CN112464882B - Method, apparatus, medium, and device for recognizing continuous motion


Info

Publication number
CN112464882B
CN112464882B (application CN202011459110.6A)
Authority
CN
China
Prior art keywords
operator
frame
limb
operation object
limb key
Prior art date
Legal status
Active
Application number
CN202011459110.6A
Other languages
Chinese (zh)
Other versions
CN112464882A (en)
Inventor
梁帆 (Liang Fan)
Current Assignee
Guangdong Prophet Big Data Co., Ltd.
Original Assignee
Dongguan Prophet Big Data Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Dongguan Prophet Big Data Co., Ltd.
Priority to CN202011459110.6A
Publication of CN112464882A
Application granted
Publication of CN112464882B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61L METHODS OR APPARATUS FOR STERILISING MATERIALS OR OBJECTS IN GENERAL; DISINFECTION, STERILISATION OR DEODORISATION OF AIR; CHEMICAL ASPECTS OF BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES; MATERIALS FOR BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES
    • A61L 2/00 Methods or apparatus for disinfecting or sterilising materials or objects other than foodstuffs or contact lenses; Accessories therefor
    • A61L 2/24 Apparatus using programmed or automatic operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61L METHODS OR APPARATUS FOR STERILISING MATERIALS OR OBJECTS IN GENERAL; DISINFECTION, STERILISATION OR DEODORISATION OF AIR; CHEMICAL ASPECTS OF BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES; MATERIALS FOR BANDAGES, DRESSINGS, ABSORBENT PADS OR SURGICAL ARTICLES
    • A61L 2202/00 Aspects relating to methods or apparatus for disinfecting or sterilising materials or objects
    • A61L 2202/10 Apparatus features
    • A61L 2202/14 Means for controlling sterilisation processes, data processing, presentation and storage means, e.g. sensors, controllers, programs


Abstract

Embodiments of this specification disclose a method, an apparatus, and an electronic device for recognizing continuous actions. The method comprises: inputting a multi-frame image sequence into an operation object recognition model and an operator recognition model, respectively, to obtain an operation object detection frame and an operator detection frame; screening out the images of operators located within the working area of the operation object; inputting the screened operator images into a limb recognition model to obtain the limb key point coordinates of the operator; normalizing the limb key point coordinates in all the frame images; and calculating the similarity score between the normalized limb key point coordinates and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action. The method and the apparatus improve the accuracy of monitoring operation conditions, overcome the inaccuracy and unreliability of manually recording how an operation object is used in the prior art, and reduce the labor and material costs of enterprises.

Description

Method, apparatus, medium, and device for recognizing continuous motion
Technical Field
The present disclosure relates to the field of computer vision, and in particular to a method for recognizing continuous motion, an apparatus for recognizing continuous motion, an electronic device, a computer-readable storage medium, and a computer program.
Background
Canteens in schools, enterprises, and factories serve large numbers of diners, and food safety touches every aspect of their operation. Besides ensuring that food materials come from reliable sources and that the processing is safe and hygienic, tableware must be disinfected: otherwise residual food scraps can produce harmful substances and breed harmful germs, threatening the health of the diners. Because correct use of a disinfection cabinet effectively disinfects and sterilizes tableware and reduces major food-poisoning accidents and food-borne diseases, studying and monitoring how disinfection cabinets are used has become increasingly important.
Traditional monitoring of disinfection cabinet usage relies on manual records of, for example, when the cabinet was used and by whom. Such written records have low authenticity and can be falsified, so cabinet usage cannot be supervised accurately and truthfully. How to monitor the use of disinfection cabinets in school, enterprise, and factory canteens efficiently and accurately has therefore become an urgent problem to be solved.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a method for recognizing continuous motion, an apparatus for recognizing continuous motion, an electronic device, a computer-readable storage medium, and a computer program that can recognize the continuous operations performed by an operator and determine, from the recognition result, whether the operator performed a given operation on an operation object. The usage of the operation object can thus be monitored efficiently and accurately, improving the efficiency of food-safety supervision and reducing labor and material costs.
To achieve the above object, in a first aspect, the present specification provides a method for recognizing continuous motion, the method comprising:
inputting a plurality of frames of images having a time-sequence relationship into an operation object recognition model and an operator recognition model, respectively, and obtaining an operation object detection frame and an operator detection frame in each frame image;
screening out the operator images located within the working area of the operation object according to the operation object detection frame and the operator detection frame;
inputting the screened operator images into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame image;
normalizing the limb key point coordinates of the operator in all the frame images;
and calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set.
Optionally, the operation object is a disinfection cabinet, and the actions in the standard action sets include opening the disinfection cabinet door, closing the disinfection cabinet door, carrying tableware into the disinfection cabinet, and carrying tableware out of the disinfection cabinet.
Optionally, the screening out the operator image located in the working area of the operation object according to the operation object detection frame and the operator detection frame includes:
calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame;
and screening out the images of the operators in the working area of the operation object when the Euclidean distance is smaller than a distance threshold value.
Optionally, the normalizing the limb key point coordinates of the operator in all the frame images includes:
smoothing the limb key point coordinates of the operator in each frame image and performing mean calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line of the operation object surface, and obtaining the projection length set $\{l_j\}$ over all the frame images;
and performing length normalization and number normalization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ over all the frame images and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of a limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.
The length normalization formula is reproduced only as an image in the source; in it, $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value of the projection length set $\{l_j\}$.
The number normalization formula is likewise reproduced only as an image; in it, $int[x]$ is the integer part of $x$, with $x = b \times r$, $r$ denotes the number of limb key points, and $n_i$, the number of the $i$-th limb key point in the coordinate set $\{p_j^i\}$, is brought to the number of the $i$-th limb key point in the standard action set, yielding the normalized limb key point set.
Optionally, the calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set, includes:
for all the frame images, sliding a time window of length $t$ along the time direction with a preset step (given only as a formula image in the source and involving $n$, the frame rate of the video to be detected), and calculating, within each time window, the similarity score between the normalized limb key point set and each standard action set, to obtain the similarity score result $s_g$ in each time window, where $g$ denotes the center time of the time window; the similarity formula is the Pearson correlation coefficient:

$$s_g = \frac{\sum_{j=1}^{m}(x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m}(y_j - \bar{y})^2}}$$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set with mean $\bar{y}$, and $X = \{x_j\}$ denotes the action set within a time window with mean $\bar{x}$. The action information is computed from the wrist, elbow, and shoulder key point coordinates of the operator (the defining formulas are reproduced only as images in the source); the equation $kx - y + b = 0$ is the equation of the straight line passing through the skeleton center of the operator and perpendicular to the operation object surface, with parameters $k$ and $b$, and $k'x - y + b' = 0$ is the equation of the straight line of the operation object surface, with parameters $k'$ and $b'$;
and composing the similarity score results $s_g$ in all time windows into a vector and multiplying it by the corresponding weights to obtain a total similarity score; when the total similarity score is greater than a score threshold, judging that the operator performed the action corresponding to the standard action set, and recording the time period during which the total similarity score is greater than the score threshold.
Optionally, after recording a time period in which the total similarity score is greater than the score threshold, the method further comprises:
judging whether the operator performed two or more actions in the same time period;
and if so, deleting the two or more actions detected in the same time period, to obtain the continuous actions in the video to be detected.
Optionally, the smoothing the limb key point coordinates of the operator in each frame image and performing mean calculation to obtain the skeleton center of the operator includes:
judging whether abnormal limb key points exist according to the coordinates of the limb key points;
when abnormal limb key points exist, performing mean calculation on each abnormal limb key point in the current frame image by using the coordinates of the limb key points of the corresponding part in a preset number of preceding and following frame images, to obtain the smoothed coordinates of each limb key point;
and calculating the mean of the smoothed coordinates of the limb key points to obtain the skeleton center of the operator.
In a second aspect, embodiments of the present specification provide a continuous motion recognition apparatus, the apparatus including:
the detection module is used for respectively inputting a plurality of frames of images with time sequence relation into the operation object identification model and the operator identification model and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
the screening module is used for screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame;
the limb identification module is used for inputting the screened operator images into a limb identification model and obtaining the limb key point coordinates of the operator in each frame of image;
the normalization module is used for normalizing the limb key point coordinates of the operator in all the frame images;
and the similarity module is used for calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action in the standard action set.
In a third aspect, the present specification provides an electronic device comprising:
a memory for storing a computer program;
a processor configured to execute the computer program stored in the memory, wherein when the computer program is executed, the method for recognizing continuous motion according to any one of the first aspect is implemented.
In a fourth aspect, the present specification provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for recognizing continuous motion according to any one of the first aspect.
With the method for recognizing continuous motion, the apparatus for recognizing continuous motion, the electronic device, the computer-readable storage medium, and the computer program provided in one or more embodiments of this specification, the operation object detection frame and the operator detection frame in the images to be detected can be obtained by deep learning; the operator images located within the working area of the operation object are then screened out and input into a limb recognition model to obtain the coordinates of each limb key point; after the limb key point coordinates in the video to be detected are normalized, the similarity score between the normalized coordinates and each standard action set is calculated, and when the similarity score is greater than a score threshold it is judged that the operator performed the action corresponding to that standard action set. The recognition method disclosed in this specification improves the accuracy of monitoring the continuous operating actions an operator performs on an operation object, overcomes the inaccuracy and unreliability of relying on manual records in the prior art, improves the efficiency of food-safety supervision, and reduces the labor and material costs of enterprises.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some of the embodiments described in this specification; those skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a flow chart illustrating an embodiment of a method for recognizing continuous motion provided herein;
FIG. 2 is a schematic illustration of operation object recognition in some embodiments provided herein;
FIG. 3 is a schematic illustration of operator identification in some embodiments provided herein;
FIG. 4 is a schematic illustration of limb identification in some embodiments provided herein;
FIG. 5 is a schematic diagram of 4 actions in a standard set of actions in some embodiments provided herein;
fig. 6 is a schematic structural diagram of an embodiment of a continuous motion recognition apparatus provided in this specification.
Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in one or more embodiments of this specification are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only a part, not all, of the embodiments of the specification. All other embodiments obtained by a person skilled in the art based on one or more embodiments of this specification without creative effort shall fall within the protection scope of the embodiments of this specification.
The embodiments provided herein are applicable to electronic devices such as terminal devices, computer systems, and servers, which can operate in numerous general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, and data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a method for recognizing a continuous motion provided in this specification. Although the present specification provides the method steps or apparatus structures as shown in the following examples or figures, more or less steps or modules may be included in the method or apparatus structures based on conventional or non-inventive efforts. In the case of steps or structures which do not logically have the necessary cause and effect relationship, the execution order of the steps or the block structure of the apparatus is not limited to the execution order or the block structure shown in the embodiments or the drawings of the present specification. When the method or the module structure is applied to a device, a server or an end product in practice, the method or the module structure shown in the embodiment or the drawings can be executed sequentially or in parallel (for example, a parallel processor or a multi-thread processing environment, or even an implementation environment including distributed processing and server clustering). In a specific embodiment, as shown in fig. 1, in an embodiment of the method for identifying a continuous motion provided in the present specification, the method may include the following steps:
and S200, respectively inputting a plurality of image sequences with time sequence relation into the operation object identification model and the operation worker identification model, and obtaining an operation object detection frame and an operation worker detection frame from a plurality of images.
In an optional example, the multiple frames of images to be detected having a time-sequence relationship may be input in real time into a pre-trained operation object recognition model for operation object recognition. These frames may be multiple consecutive video frames of a video, or multiple image blocks cut from consecutive video frames; since consecutive video frames have a time-sequence relationship, the image blocks sliced from them also have a time-sequence relationship. The size of the images to be detected should meet the operation object recognition model's preset requirement on input size; for example, the size may include, but is not limited to, 224 × 224. After the multiple frames are input into the operation object recognition model, an operation object detection frame, including its coordinate information, can be obtained in each frame image to be detected.
Meanwhile, the multiple frames of images to be detected with the time sequence relationship need to be respectively input into a pre-trained operator identification model for operator identification, so as to obtain an operator detection frame. The operator detection box may include coordinate information of the operator detection box and a time of the corresponding frame image.
It should be noted that the embodiments of the present invention do not limit the order in which the multi-frame images are input into the operation object recognition model and the operator recognition model. In some examples of the present invention, the multi-frame images may first be input into the pre-trained operation object recognition model for operation object recognition and then into the pre-trained operator recognition model for operator recognition; in other examples, they may first be input into the operator recognition model for operator recognition and then into the operation object recognition model for operation object recognition.
In this embodiment of the specification, the operation object recognition model and the operator recognition model first need to be trained; the topology of both models may be a convolutional neural network.
In an alternative example, the convolutional neural network may be a convolutional neural network with deep learning capability, including but not limited to a plurality of convolutional layers, and may further include pooling layers, fully connected layers, layers for performing classification operations, and the like. The convolutional neural network can realize deep learning; compared with other deep learning structures, deep convolutional neural networks perform outstandingly in image recognition.
Before the detection of the operation object is carried out on the image to be detected, the image classification task of the convolutional neural network can be trained by using a data set containing abundant operation object marking information as a training sample in advance, so that an operation object identification model with the operation object classification function is obtained.
Testing the multiple frames of images to be detected having a time-sequence relationship with the trained operation object recognition model yields the operation object confidence of each region in the images to be detected. The operation object confidence is the probability that the region's image is the operation object; comparing it with a preset operation object confidence threshold classifies the regions into operation object regions and non-operation-object regions, thereby obtaining the operation object detection frame and its coordinate information.
Similarly, before the operator detection is carried out on the image to be detected, the image classification task of the convolutional neural network can be trained by using the data set containing abundant operator marking information as a training sample in advance, so that the operator identification model with the operator classification function is obtained.
Testing the multiple frames of images to be detected having a time-sequence relationship with the trained operator recognition model yields the operator confidence of each region of the images to be detected. The operator confidence is the probability that the region's image is an operator; comparing it with a preset operator confidence threshold classifies the regions into operator regions and non-operator regions, thereby obtaining the coordinate information of the operator detection frame and the time of the corresponding frame.
It should be noted that the present invention does not restrict what the operation object and the operator are; any scenario in which an operator performs continuous operations on some object falls within its scope. In some examples, the operation object may be a disinfection cabinet in the canteen of an enterprise, school, or factory, and the operator may be a canteen worker who operates the cabinet; in other examples, the operation object may be an electronic product or clothing on an enterprise production line, and the operator may be a worker on that line.
In the embodiment of the present specification, an operation object is a disinfection cabinet in a canteen such as an enterprise, a school, and a factory, and an operator is an operator operating the disinfection cabinet. Referring to fig. 2, fig. 2 is a schematic diagram illustrating the operation object recognition performed in some embodiments provided in the present specification. Wherein, the image to be detected is input into the pre-trained operation object recognition model to obtain the operation object detection frame, i.e. the disinfection cabinet shown in fig. 2.
Referring to fig. 3, fig. 3 is a schematic diagram of performing operator identification in some embodiments provided in the present specification, in which an image to be detected is input into a pre-trained operator identification model, and acquisition time of an operator detection frame and a corresponding frame image is obtained, that is, the operator shown in fig. 3 is obtained.
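As a concrete illustration of this detection stage, the following minimal Python sketch runs the two recognition models over a timed frame sequence. It is a sketch under assumptions: the model wrappers, the Detection record, and all names here are invented for illustration and are not part of the patent; each model is assumed to return the best detection frame whose confidence exceeds its preset threshold, or None.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

BoxTuple = Tuple[float, float, float, float]  # (x, y, w, h): top-left + size

@dataclass
class Detection:
    x: float  # top-left x of the detection frame
    y: float  # top-left y of the detection frame
    w: float  # width of the detection frame
    h: float  # height of the detection frame
    t: float  # acquisition time of the corresponding frame image

def detect_objects_and_operators(
    frames: List[np.ndarray],
    timestamps: List[float],
    object_model: Callable[[np.ndarray], Optional[BoxTuple]],
    operator_model: Callable[[np.ndarray], Optional[BoxTuple]],
) -> List[Tuple[Optional[Detection], Optional[Detection]]]:
    """Run both recognition models on every frame of the timed sequence."""
    results = []
    for frame, t in zip(frames, timestamps):
        obj = object_model(frame)    # disinfection-cabinet detection frame
        op = operator_model(frame)   # operator detection frame
        results.append((Detection(*obj, t) if obj else None,
                        Detection(*op, t) if op else None))
    return results
```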
And S220, screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame.
In this embodiment of the specification, to improve recognition efficiency, frames that do not meet the condition need to be removed from the multi-frame sequence of images to be detected. For example, when the operator is too far from the operation object, the operator is evidently not performing an operation on it, so such images can be removed, reducing the amount of data to process and improving recognition efficiency.
In some examples of the present invention, screening out the frames in which the operator is located within the working area of the operation object is implemented by the following steps:
s221, calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame.
From the coordinate information of the operation object detection frame and of the operator detection frame obtained in the above steps, the center point coordinates of each detection frame can be calculated, and the Euclidean distance between the two centers then follows from those coordinates. Calculating the Euclidean distance between two points from their coordinates is common knowledge for those skilled in the art and is not described here.
The coordinates of the operation object detection frame are denoted as $(x, y, w, h)$, where $(x, y)$ is the top-left coordinate of the operation object detection frame and $(w, h)$ its width and height; the coordinates of the operator detection frame are denoted as $(x', y', w', h')$, where $(x', y')$ is the top-left coordinate of the operator detection frame and $(w', h')$ its width and height.
Whether the detected operator is in the neighborhood of the working area of the operation object $(x, y, w, h)$ is judged, and the coordinates of operators not in that neighborhood are deleted. The judgment formulas for the operator position coordinates $(x', y', w', h')$ are reproduced only as images in the source, where $d$ is the boundary threshold of the operation object; if the indicator $f_L$ is 0, the coordinates of the corresponding detection frame are deleted.
And S222, screening out the operator images located within the working area of the operation object when the Euclidean distance is smaller than a distance threshold.
In the invention, when the distance between the center point of the operation object detection frame and the center point of the operator detection frame is greater than the distance threshold, the corresponding operator detection frame coordinates can be deleted; the operator detection frames whose center-to-center distance is smaller than the threshold are retained.
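A short sketch of this distance filter follows, continuing the assumptions of the earlier sketch (`detections` is its return value); the 150-pixel threshold is an arbitrary illustrative value, not one given in the patent.

```python
import math

def center(x: float, y: float, w: float, h: float) -> Tuple[float, float]:
    """Center point of a detection frame given as top-left corner plus size."""
    return (x + w / 2.0, y + h / 2.0)

def within_work_area(obj: Detection, op: Detection, dist_threshold: float) -> bool:
    """True when the Euclidean distance between the two detection-frame
    centers is smaller than the distance threshold."""
    ax, ay = center(obj.x, obj.y, obj.w, obj.h)
    bx, by = center(op.x, op.y, op.w, op.h)
    return math.hypot(ax - bx, ay - by) < dist_threshold

# Keep only the frames whose operator is inside the cabinet's working area.
kept = [(obj, op) for obj, op in detections
        if obj and op and within_work_area(obj, op, dist_threshold=150.0)]
```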
And S240, inputting the screened operator image into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame of image.
In the invention, the image of the operator in the working area of the operation object can be input into a pre-trained limb recognition model for limb recognition, and the topological structure of the limb recognition model can be a convolutional neural network.
In an alternative example, the convolutional neural network may be a convolutional neural network with deep learning capability, including but not limited to a plurality of convolutional layers, and may further include pooling layers, fully connected layers, layers for performing classification operations, and the like. The convolutional neural network can realize deep learning; compared with other deep learning structures, deep convolutional neural networks perform outstandingly in image recognition.
In an optional example, the convolutional neural network can also be a lightweight convolutional neural network, so that the processing time is shortened, and the detection speed is increased.
In an alternative example, the convolutional neural network may also be several cascaded convolutional neural networks, thereby having better recognition performance.
Before limb detection is carried out, an image classification task of a convolutional neural network can be trained by using a data set containing abundant skeletal joint point labeling information as a training sample in advance, so that an operator identification model with a limb classification function is obtained. Wherein the skeletal joint points comprise at least: wrist, elbow, shoulder, etc.
The trained limb recognition model is used for testing the image to be tested, so that the limb key points of the operator, such as the wrist, the elbow, the shoulder and the like, in the image to be tested can be positioned, and the limb key points are connected in sequence to form the basic skeleton of the operator.
Referring to fig. 4, fig. 4 is a schematic diagram of limb recognition in some embodiments provided herein. The images meeting the condition are input into the pre-trained limb recognition model; the trained model tests these multi-frame images, locates the operator's wrist, elbow, shoulder, and other joint points, and connects them in sequence to form the basic skeleton of the operator.
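The fragment below sketches this step; the pose model interface, joint names, and edge list are hypothetical stand-ins (the patent names at least the wrist, elbow, and shoulder).

```python
# Joint pairs connected in sequence to form the basic skeleton
# (an assumed subset; the patent names at least wrist, elbow, shoulder).
SKELETON_EDGES = [
    ("left_wrist", "left_elbow"), ("left_elbow", "left_shoulder"),
    ("right_wrist", "right_elbow"), ("right_elbow", "right_shoulder"),
    ("left_shoulder", "right_shoulder"),
]

def extract_keypoints(operator_image, pose_model):
    """pose_model is assumed to map an image crop to {joint_name: (x, y)}."""
    joints = pose_model(operator_image)
    skeleton = [(joints[a], joints[b]) for a, b in SKELETON_EDGES
                if a in joints and b in joints]
    return joints, skeleton
```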
And S260, normalizing the limb key point coordinates of the operator in all the frame images.
In this step, the operations of S200 to S240 are performed on each frame image of the video to be detected, and the limb key point coordinates of the operator located within the working area of the operation object are obtained in each frame image, so that the limb key point coordinate set $\{p_j^i\}$ over the video to be detected is obtained.
One possible implementation of normalizing the limb key point coordinates of the operator in the video to be detected is realized by the following steps:
s261, smoothing the coordinates of the limb key points of the operator in each frame of image, and performing mean value calculation to obtain the skeleton center of the operator.
In some alternative examples, the bone center of the operator may be obtained by:
s2611, judging whether abnormal limb key points exist or not according to coordinates of the limb key points.
In this step, the distance between every two limb key points can be calculated, and when the distance is greater than a preset threshold, the abnormal limb key point of the limb key point is judged, and the abnormal limb key point needs to be smoothed.
And S2612, when abnormal limb key points exist, performing mean value calculation on the abnormal limb key points in the current frame image by using coordinates of the limb key points of corresponding parts in the front and rear preset frame images to obtain coordinates of each limb key point after smoothing processing.
In the step, when there is an abnormal bone key point, for the abnormal limb key point in the current frame image, the coordinates of the limb key point of the part in the two previous and next frame images can be used for mean value calculation to obtain the coordinates of each limb key point after smoothing processing.
And S2613, calculating an average value of coordinates of each limb key point after smoothing processing, and obtaining the skeleton center of the operator.
In this specification, after smoothing the skeletal joint points of each frame of image in the motion set, the skeletal joint point coordinates of the operator in each frame of image are added and averaged to obtain the skeletal center of the operator in each frame of image, so that the skeletal frame of the operator in each frame of image can be regarded as a point.
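A compact sketch of the smoothing and the skeleton-center computation follows. It implements one reading of the text, under assumptions: a key point is flagged abnormal when its distance to any other key point in the same frame exceeds the preset threshold, and it is replaced by the mean of the same key point in the two frames on each side; the array shapes and names are invented for the example.

```python
import numpy as np

def smooth_keypoints(seq: np.ndarray, dist_threshold: float,
                     half_window: int = 2) -> np.ndarray:
    """seq: array of shape (T, K, 2), T frame images x K limb key points.

    Abnormal key points are replaced by the mean of the corresponding key
    point in the preceding/following frames (two on each side here)."""
    seq = np.asarray(seq, dtype=float).copy()
    T, K, _ = seq.shape
    for t in range(T):
        for k in range(K):
            dists = np.linalg.norm(seq[t] - seq[t, k], axis=1)
            if np.any(dists > dist_threshold):  # abnormal limb key point
                lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
                neighbours = [seq[u, k] for u in range(lo, hi) if u != t]
                if neighbours:
                    seq[t, k] = np.mean(neighbours, axis=0)
    return seq

def skeleton_center(frame_keypoints: np.ndarray) -> np.ndarray:
    """Skeleton center: mean of the smoothed key points of one frame image."""
    return np.mean(frame_keypoints, axis=0)
```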
S262, calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line of the operation object surface, and obtaining the projection length set $\{l_j\}$ over all the frame images.
The straight line of the operation object surface can be obtained by training on historical data; for example, when the operation object is a disinfection cabinet, the surface can be the face of the cabinet toward the operator. The projection length from the skeleton center of the operator to the straight line of the operation object surface is obtained in each frame image, yielding the projection length set $\{l_j\}$ over the video to be detected.
S263, performing length normalization and number normalization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ over all the frame images and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of a limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.
The length normalization formula is reproduced only as an image in the source; in it, $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value of the projection length set $\{l_j\}$.
The number normalization formula is likewise reproduced only as an image; in it, $int[x]$ is the integer part of $x$, with $x = b \times r$, $r$ denotes the number of limb key points, and $n_i$, the number of the $i$-th limb key point in the coordinate set $\{p_j^i\}$, is brought to the number of the $i$-th limb key point in the standard action set. After the length normalization and number normalization of the limb key point set, the normalized limb key point set is obtained.
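Because the normalization formulas survive only as images in the source, the sketch below implements one plausible reading of them: scaling every coordinate by $l_{std} / \max(l_j)$, and resampling each key point's time series to the sample count used in the standard action set. Both readings are assumptions for illustration, not the patent's exact formulas.

```python
import numpy as np

def length_normalize(keypoints: np.ndarray, proj_lengths, l_std: float) -> np.ndarray:
    """keypoints: (T, K, 2); proj_lengths: the projection length set {l_j}.

    Assumed form: scale all coordinates by l_std / l with l = max(l_j), so
    sequences filmed at different distances become comparable."""
    l = max(proj_lengths)
    return np.asarray(keypoints, dtype=float) * (l_std / l)

def number_normalize(track: np.ndarray, n_standard: int) -> np.ndarray:
    """Resample one key point's time series to the n_standard samples used in
    the standard action set (assumed int[x]-style index mapping)."""
    track = np.asarray(track, dtype=float)
    n_i = len(track)
    idx = np.minimum((np.arange(n_standard) * n_i) // n_standard, n_i - 1)
    return track[idx]
```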
And S280, calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the corresponding action in the standard action sets.
For example, suppose the standard action sets comprise four sets A1, A2, A3, and A4, where A1 is opening the cabinet door, A2 is carrying tableware out, A3 is carrying tableware in, and A4 is closing the cabinet door. The similarity scores between the normalized limb key point set and each of A1, A2, A3, and A4 are calculated, and which operation the operator performed is judged from those scores. For example, if the similarity score between the limb key point set of the video to be detected and standard action set A1 is greater than the score threshold, it is judged that the operator performed the operation of opening the cabinet door. Fig. 5 is a schematic diagram of the four standard actions in some embodiments provided herein, showing respectively opening the cabinet door, carrying tableware out, carrying tableware in, and closing the cabinet door.
In some alternative examples, one possible implementation of step S280 (calculating the similarity score between the normalized limb key point coordinates of the operator and each standard action set, and judging, when the similarity score is greater than a score threshold, that the operator performed the action corresponding to the standard action set) is as follows:
S281, for the multi-frame images of the video to be detected, sliding a time window of length $t$ along the time direction with a preset step (given only as a formula image in the source and involving $n$, the frame rate of the video to be detected), and calculating, within each time window, the similarity score between the normalized limb key point set and each standard action set, to obtain the similarity score result $s_g$ in each time window, where $g$ denotes the center time of the time window. The similarity formula is the Pearson correlation coefficient:

$$s_g = \frac{\sum_{j=1}^{m}(x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{x})^2}\,\sqrt{\sum_{j=1}^{m}(y_j - \bar{y})^2}}$$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set with mean $\bar{y}$, and $X = \{x_j\}$ denotes the action set within a time window with mean $\bar{x}$. The action information is computed from the wrist, elbow, and shoulder key point coordinates of the operator (the defining formulas are reproduced only as images in the source); the equation $kx - y + b = 0$ is the equation of the straight line passing through the skeleton center of the operator and perpendicular to the operation object surface, with parameters $k$ and $b$, and $k'x - y + b' = 0$ is the equation of the straight line of the operation object surface, with parameters $k'$ and $b'$.
In some embodiments, the parameters of the linear equation passing through the center of the operator's bone and perpendicular to the surface of the object and the parameters of the linear equation at the surface of the object may be obtained after training from historical empirical data.
S282, composing the similarity score results $\{s_g\}$ in all time windows into a vector and multiplying it by the corresponding weights to obtain a total similarity score; when the total similarity score is greater than the score threshold, judging that the operator performed the action corresponding to the standard action set, and recording the time period during which the total similarity score is greater than the score threshold.
After the correlation between the motion in each time window and the standard action set is calculated with the Pearson correlation coefficient, the similarity scores in all time windows are composed into a vector $s = (s_1, \ldots, s_n)$, where $n$ denotes the total number of time windows; each element of the vector is then multiplied by the corresponding weight $u = (u_1, \ldots, u_n)^T$ to obtain the total similarity score, where $u_1, \ldots, u_n$ denote the weights corresponding to the respective time windows. When the total similarity score is greater than the score threshold, it is judged that the operator performed the operation corresponding to that action in the standard action set, and the time period during which the total similarity score is greater than the score threshold is recorded.
For example, if the similarity score between the limb key point coordinate set of the video to be detected and action A1 in the standard action sets is 0.82, the score with action A2 is 0.47, the score with action A3 is -0.23, the score with action A4 is -0.18, and the score threshold is 0.7, it is judged that the operator performed the operation corresponding to action A1.
In some alternative examples, if it is determined that the action is detected, the continuous-action time period during which the similarity score $s_g$ is greater than the similarity threshold is recorded as $t = (t_s, t_e)$, where $t_s$ denotes the start time and $t_e$ denotes the end time of that continuous-action period.
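The following sketch puts the sliding window, the Pearson scoring, and the weighted combination together for a single one-dimensional action signal. The window length, step, uniform weights, dummy data, and the 0.7 threshold are illustrative assumptions, and resampling each window to the template length $m$ is one way, assumed here, to make the two series comparable.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length 1-D series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom else 0.0

def score_windows(signal, template, window_len: int, step: int):
    """Slide a window over the action signal with the given step and score
    each window against the m-frame standard-action template.
    Returns [(start_frame, score), ...]."""
    m = len(template)
    out = []
    for g in range(0, len(signal) - window_len + 1, step):
        window = np.asarray(signal[g:g + window_len], dtype=float)
        # resample the window to the template length m before comparing
        resampled = np.interp(np.linspace(0, window_len - 1, m),
                              np.arange(window_len), window)
        out.append((g, pearson(resampled, template)))
    return out

def total_similarity(scores, weights) -> float:
    """Weighted combination of per-window scores into the total score."""
    s = np.array([sc for _, sc in scores])
    return float(s @ np.asarray(weights, dtype=float)[:len(s)])

# usage sketch with dummy data standing in for real action information
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))      # video to be detected
template_a1 = np.sin(np.linspace(0.0, 2.0 * np.pi, 40))  # standard action set A1
scores = score_windows(signal, template_a1, window_len=50, step=10)
weights = np.full(len(scores), 1.0 / len(scores))
if total_similarity(scores, weights) > 0.7:              # score threshold
    # record (t_s, t_e) as frame spans of windows above the threshold
    periods = [(g, g + 50) for g, sc in scores if sc > 0.7]
```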
In some optional examples, after the similarity judgment between each action set of the video to be detected and the standard action sets, the detection results and occurrence time periods of the four actions of opening the cabinet door, closing the cabinet door, carrying tableware into the cabinet, and carrying tableware out of the cabinet are obtained. A logic judgment is then applied to these detection results to check whether the operator performed two or more actions in the same time period, and unreasonable results are eliminated (for example, the two actions of opening the cabinet door and closing the cabinet door cannot occur simultaneously in the same time period), so that the final continuous-action detection result is obtained.
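A sketch of this logic judgment follows; the event representation is assumed, and overlapping detections of different actions are treated as mutually exclusive and dropped.

```python
def remove_conflicts(events):
    """events: list of (action_name, start, end), e.g. ('open_door', 12.0, 15.5).

    Drop mutually exclusive detections: if two or more different actions
    are detected over overlapping time periods (e.g. opening and closing
    the cabinet door at once), discard them, keeping only the
    non-conflicting results as the final continuous-action detections."""
    events = sorted(events, key=lambda e: e[1])
    keep = []
    for i, (name, start, end) in enumerate(events):
        conflicts = [o for j, o in enumerate(events)
                     if j != i and o[0] != name and o[1] < end and start < o[2]]
        if not conflicts:
            keep.append((name, start, end))
    return keep
```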
In the method for recognizing continuous actions provided in one or more embodiments of this specification, the operation object detection frame and the operator detection frame in the images to be detected can be obtained by deep learning; the operator images located within the working area of the operation object are then screened out and input into a limb recognition model to obtain the coordinates of each limb key point; after the limb key point coordinates in the video to be detected are normalized, the similarity score between the normalized coordinates and each standard action set is calculated, and when the similarity score is greater than a score threshold it is judged that the operator performed the action corresponding to that standard action set. The recognition method disclosed in this specification improves the accuracy of monitoring the continuous operating actions an operator performs on an operation object, can record the time period of the continuous operation, overcomes the inaccuracy and unreliability of relying on manual records in the prior art, improves the efficiency of food-safety supervision, and reduces the labor and material costs of enterprises.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. For details, reference may be made to the description of the related embodiments of the related processing, and details are not repeated herein.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the methods provided in the foregoing embodiments, one or more embodiments of the specification further provide a continuous motion recognition apparatus, please refer to fig. 6, where fig. 6 is a schematic structural diagram of an embodiment of the continuous motion recognition apparatus provided in the specification, and the apparatus may include a detection module 300, a screening module 320, a limb recognition module 340, a normalization module 360, and a similarity module 380.
The detection module 300 is configured to input multiple frames of images with a time sequence relationship into the operation object identification model and the operator identification model, and obtain the operation object detection frame and the operator detection frame in each frame of image.
The screening module 320 is configured to screen out an operator image located in a working area of an operation object according to the operation object detection frame and the operator detection frame.
The limb identification module 340 is configured to input the screened operator image into a limb identification model, and obtain a limb key point coordinate of the operator in each frame of image.
The normalization module 360 is configured to normalize the coordinates of the key points of the limbs of the operator in all the frame images.
The similarity module 380 is configured to calculate similarity scores between the limb key point coordinates of the operator after the normalization processing and each standard action set, and determine that the operator performs an action corresponding to the standard action set when the similarity score is greater than a score threshold.
In some optional examples, the screening module may include a calculating unit configured to calculate a euclidean distance between a center point of the operation object detection frame and a center point of the operator detection frame, and a screening unit configured to screen out an operator image located within a work area of an operation object when the euclidean distance is smaller than a distance threshold.
In some optional examples, the normalization module may include a smoothing unit, a projection length unit, and a normalization unit.
The smoothing unit is used for smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator.
In some embodiments, the smoothing unit judges whether abnormal limb key points exist according to the coordinates of the limb key points; when abnormal limb key points exist, it performs mean calculation on each abnormal limb key point in the current frame image by using the coordinates of the limb key points of the corresponding part in a preset number of preceding and following frame images, to obtain the smoothed coordinates of each limb key point; and it calculates the mean of the smoothed coordinates to obtain the skeleton center of the operator.
The projection length unit is used for calculating the projection length from the bone center to a pre-trained straight line on the surface of an operation object in each frame of image, and acquiring a projection length set { l ] in a video to be measuredj}。
The standardization unit is used for performing length standardization and number standardization on the limb key point coordinates of the operator in the video to be tested, according to the limb key point coordinate set $\{p_j^i\}$ of the operator in the video and the projection length set $\{l_j\}$, where $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image.

The length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

where $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$.

The number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

where $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, and $n_i'$ is the number of the $i$-th limb key point in the standard action set. Applying the length standardization and the number standardization to the limb key point set yields the standardized limb key point set $\{q_j^i\}$.
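The sketch below applies both standardization steps to one limb key point's coordinate sequence; the resampling ratio b = n_i / n_i' is an assumed reading of the garbled formula, and all names are illustrative:

```python
import numpy as np

def standardize_sequence(points, projection_lengths, l_std, n_target):
    """points: (n_i, 2) coordinates of one limb key point across frames.

    Length standardization rescales by l_std / max(projection_lengths);
    number standardization resamples the sequence to n_target entries by
    rounded-index lookup, so it matches the standard action set length.
    """
    l = max(projection_lengths)
    scaled = np.asarray(points, dtype=float) * (l_std / l)
    n_i = len(scaled)
    b = n_i / n_target  # assumed direction of x = b * r
    idx = np.minimum(np.round(b * np.arange(n_target)).astype(int), n_i - 1)
    return scaled[idx]
```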
The similarity module may include a similarity calculation unit and a comparison unit.
The similarity calculation unit is used for sliding a time window of length $t$ over the multi-frame images of the video to be detected along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window, obtaining a similarity score result $s_g$ for each time window, where $n$ denotes the frame rate of the video to be detected and $g$ denotes the center time of the time window. The similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

where $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window.

(The per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines.)

Here $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points; the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, with $k$ and $b$ its parameters; and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, with $k'$ and $b'$ its parameters.
The comparison unit is used for forming the similarity score results $s_g$ over all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than a score threshold, and recording the time period in which the total similarity score exceeds that threshold.
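A minimal sketch of the sliding-window scoring and the weighted total, assuming the per-window comparison is the correlation formula given above and that each window is reduced to a 1-D per-frame feature sequence (names and the feature reduction are assumptions):

```python
import numpy as np

def window_scores(features, standard, step):
    """features: 1-D per-frame action feature of the video to be detected;
    standard: 1-D per-frame feature of one standard action set (length m).

    Slides a window of length m with the given step and returns the
    correlation score s_g of each window against the standard sequence.
    """
    m = len(standard)
    y = np.asarray(standard, dtype=float)
    scores = []
    for start in range(0, len(features) - m + 1, step):
        x = np.asarray(features[start:start + m], dtype=float)
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
        scores.append(float((xc * yc).sum() / denom) if denom else 0.0)
    return scores

def total_score(scores, weights):
    """Weighted combination of per-window scores into the total similarity."""
    return float(np.dot(scores, weights))
```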
In some optional examples, after the similarity judgment between each action set of the video to be detected and the standard action sets, detection results for four actions (opening the cabinet door, closing the cabinet door, moving items into the cabinet, and moving items out of the cabinet) are obtained together with the time periods in which they occur. A logic judgment is then performed on these detection results to decide whether the operator performed two or more actions in the same time period, and unreasonable results are eliminated (for example, opening and closing the cabinet door cannot occur in the same time period), yielding the final continuous-action detection result.
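One way to implement this logic judgment, assuming each detection carries an action name, a time span, and its total similarity score (resolving conflicts in favor of the higher score is an assumption):

```python
def eliminate_conflicts(detections):
    """detections: list of (action, start, end, score) tuples.

    When two detected actions overlap in time (e.g. opening and closing
    the cabinet door in the same period), keep only the higher-scoring one.
    """
    kept = []
    for det in sorted(detections, key=lambda d: -d[3]):  # best score first
        _, start, end, _ = det
        if not any(start < e and end > s for _, s, e, _ in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d[1])  # restore time order
```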
The device for recognizing continuous actions provided in one or more embodiments of the present specification may obtain the operation object detection frame and the operator detection frame in a scene to be detected by a deep learning method, screen out the multi-frame images that meet the condition, classify the recognized images by a clustering method to obtain at least one action set, compare each action set with the actions in the standard action sets to obtain similarity scores, and determine the operation performed by the operator according to those scores. The continuous-action recognition method disclosed in this specification helps improve the accuracy of monitoring how an operator uses the operation object, overcomes the inaccuracy and unreliability of manually recording operation object usage in the prior art, improves food safety supervision efficiency, and reduces enterprises' labor and material costs.
It should be noted that the above-mentioned continuous motion recognition apparatus may also include other embodiments according to the description of the method embodiment. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
Correspondingly, the embodiment of the present specification also discloses an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method for identifying continuous actions in any of the above embodiments of the present specification are implemented.
Accordingly, the embodiments of the present specification also disclose a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for recognizing continuous actions described in any of the above embodiments of the present specification.
The embodiments of the present description are not limited to implementations that comply with a standard data model/template or that are described in the embodiments of the present description. Implementations based on certain industry standards, or slightly modified using custom modes or the examples described above, may also achieve the same, equivalent, similar, or otherwise foreseeable effects. Embodiments applying such modified or transformed data acquisition, storage, judgment, and processing may still fall within the scope of the optional embodiments of this description.
The embodiments in the present specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for relevant points, reference may be made to the partial description of the method embodiment. In this specification, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. The schematic representations of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine the various embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not contradict each other.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (8)

1. A method for recognizing a continuous motion, the method comprising:
respectively inputting a plurality of frames of images with time sequence relation into an operation object identification model and an operator identification model, and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
screening out an operator image in a working area of the operation object according to the operation object detection frame and the operator detection frame;
inputting the screened operator images into a limb recognition model, and obtaining the limb key point coordinates of the operator in each frame of image;
standardizing the limb key point coordinates of the operator in all the frame images;
calculating similarity scores of the limb key point coordinates of the operator after the standardization processing and each standard action set, and judging that the operator executes corresponding actions in the standard action sets when the similarity scores are larger than a score threshold value;
wherein the normalizing the limb key point coordinates of the operator in all the frame images comprises:
smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line on the surface of the operation object, and obtaining the projection length set $\{l_j\}$ over all frame images;

performing length standardization and number standardization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ of the operator over all frame images and the projection length set $\{l_j\}$, wherein $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image;

wherein the length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

wherein $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$;

wherein the number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

wherein $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, $n_i'$ is the number of the $i$-th limb key point in the standard action set, and $\{q_j^i\}$ denotes the standardized limb key point set obtained by the length standardization and the number standardization;
wherein the calculating of similarity scores between the standardized limb key point coordinates of the operator and each standard action set, and the judging, when a similarity score is greater than a score threshold, that the operator has performed the action corresponding to that standard action set, comprise:
for all frame images, sliding a time window of length $t$ along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window to obtain the similarity score result $s_g$ for each time window, wherein $n$ denotes the frame rate of the video to be detected, $g$ denotes the center time of the time window, and the similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

wherein $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window;

(the per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines)

wherein $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points, the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, $k$ and $b$ being its parameters, and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, $k'$ and $b'$ being its parameters;

forming the similarity score results $s_g$ in all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than the score threshold, and recording the time period in which the total similarity score is greater than the score threshold.
2. The method for recognizing a continuous motion according to claim 1, wherein the operation object is a disinfection cabinet, and the actions in the standard action sets include opening the door of the disinfection cabinet, closing the door of the disinfection cabinet, moving dishes into the disinfection cabinet, and moving dishes out of the disinfection cabinet.
3. The method for recognizing a continuous motion according to claim 1, wherein the screening out the operator image located in the work area of the operation object based on the operation object detection frame and the operator detection frame includes:
calculating the Euclidean distance between the central point of the operation object detection frame and the central point of the operator detection frame;
and screening out the images of the operators in the working area of the operation object when the Euclidean distance is smaller than a distance threshold value.
4. The method of continuous motion recognition as claimed in claim 1, wherein after recording a time period in which the overall similarity score is greater than a score threshold, the method further comprises:
judging whether the operator executes more than two actions in the same time period;
if so, deleting more than two actions executed in the same time period to obtain continuous actions in the video to be detected.
5. The method for recognizing continuous actions according to claim 1, wherein the step of smoothing the coordinates of the key points of the limbs of the operator in each frame of image and performing a mean calculation to obtain the bone center of the operator comprises:
judging whether abnormal limb key points exist or not according to the coordinates of the limb key points;
when abnormal limb key points exist, performing mean value calculation on the abnormal limb key points in the current frame image by using coordinates of the limb key points of corresponding parts in the front and rear preset frame images to obtain coordinates of each limb key point after smoothing treatment;
and calculating the average value of the coordinates of the key points of each limb after the smoothing treatment to obtain the bone center of the operator.
6. A continuous motion recognition device, the device comprising:
the detection module is used for respectively inputting a plurality of frames of images with time sequence relation into the operation object identification model and the operator identification model and respectively obtaining an operation object detection frame and an operator detection frame in each frame of image;
the screening module is used for screening out the operator image in the working area of the operation object according to the operation object detection frame and the operator detection frame;
the limb identification module is used for inputting the screened operator images into a limb identification model and obtaining the limb key point coordinates of the operator in each frame of image;
the standardization module is used for carrying out standardization processing on the limb key point coordinates of the operator in all the frame images;
the similarity module is used for calculating similarity scores of the limb key point coordinates of the operator after the standardization processing and each standard action set, and judging that the operator executes corresponding actions in the standard action sets when the similarity scores are larger than a score threshold;
the standardization module is used for standardizing the limb key point coordinates of the operator in all the frame images, and comprises the following steps:
smoothing the coordinates of the limb key points of the operator in each frame of image and performing mean value calculation to obtain the skeleton center of the operator;
calculating, in each frame image, the projection length from the skeleton center to the pre-trained straight line on the surface of the operation object, and obtaining the projection length set $\{l_j\}$ over all frame images;

performing length standardization and number standardization on the limb key point coordinates of the operator according to the limb key point coordinate set $\{p_j^i\}$ of the operator over all frame images and the projection length set $\{l_j\}$, wherein $p_j^i$ denotes the coordinates of the limb key point, $i$ denotes the type of the limb key point, and $j$ denotes the time-sequence number of the frame image;

wherein the length standardization formula is as follows:

$p_j^{i\prime} = p_j^i \times \dfrac{l_{std}}{l}$

wherein $l_{std}$ is the standard distance and $l = \max(l_j)$, i.e. $l$ is the maximum value in the projection length set $\{l_j\}$;

wherein the number standardization formula is as follows:

$q_r^i = p_{int[x]}^{i\prime}$, with $x = b \times r$ and $b = n_i / n_i'$

wherein $int[x]$ is $x$ rounded to an integer, $r$ denotes the serial number of the limb key point coordinate, $n_i$ is the number of coordinates in the coordinate set $\{p_j^i\}$ of the $i$-th limb key point, $n_i'$ is the number of the $i$-th limb key point in the standard action set, and $\{q_j^i\}$ denotes the standardized limb key point set obtained by the length standardization and the number standardization;
the similarity module is configured to calculate similarity scores between the limb key point coordinates of the operator after the normalization processing and each standard action set, and when the similarity score is greater than a score threshold, it is determined that the operator has performed an action corresponding to the standard action set, where the similarity score is greater than a score threshold, and the determining includes:
for all frame images, sliding a time window of length $t$ along the time direction with a preset step length, and calculating the similarity score between the standardized limb key point set and each standard action set within each time window to obtain the similarity score result $s_g$ for each time window, wherein $n$ denotes the frame rate of the video to be detected, $g$ denotes the center time of the time window, and the similarity calculation formula is as follows:

$s_g = \dfrac{\sum_{j=1}^{m}(x_j - \bar{X})(y_j - \bar{Y})}{\sqrt{\sum_{j=1}^{m}(x_j - \bar{X})^2 \sum_{j=1}^{m}(y_j - \bar{Y})^2}}$

wherein $m$ denotes the number of frames in a standard action set, $Y = \{y_j\}$ denotes a standard action set, $\bar{Y}$ denotes the mean of the standard action information set, $X = \{x_j\}$ denotes the action set within the time window, and $\bar{X}$ denotes the mean of the action information set within the time window;

(the per-frame action features $x_j$ and $y_j$ are given by formula images not reproduced here; they are computed from the wrist, elbow and shoulder key points relative to two reference lines)

wherein $p_w$, $p_e$ and $p_s$ respectively denote the coordinates of the operator's wrist, elbow and shoulder key points, the equation $kx - y + b = 0$ denotes the straight line passing through the operator's bone center and perpendicular to the surface of the operation object, $k$ and $b$ being its parameters, and the equation $k'x - y + b' = 0$ denotes the straight line of the operation object surface, $k'$ and $b'$ being its parameters;

forming the similarity score results $s_g$ in all time windows into a vector and multiplying it by corresponding weights to obtain a total similarity score, judging that the operator has performed the action corresponding to the standard action set when the total similarity score is greater than the score threshold, and recording the time period in which the total similarity score is greater than the score threshold.
7. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the method for recognizing a continuous motion according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of continuous action recognition according to any one of the preceding claims 1 to 5.
CN202011459110.6A 2020-12-11 2020-12-11 Method, apparatus, medium, and device for recognizing continuous motion Active CN112464882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011459110.6A CN112464882B (en) 2020-12-11 2020-12-11 Method, apparatus, medium, and device for recognizing continuous motion

Publications (2)

Publication Number Publication Date
CN112464882A CN112464882A (en) 2021-03-09
CN112464882B (en) 2021-09-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 7, No. 124 Dongbao Road, Dongcheng Street, Dongguan City, Guangdong Province, 523128

Patentee after: Guangdong Prophet Big Data Co.,Ltd.

Country or region after: China

Address before: 523128 Room 401, building 6, No.5 Weifeng Road, Dongcheng Street, Dongguan City, Guangdong Province

Patentee before: Dongguan prophet big data Co.,Ltd.

Country or region before: China