CN111104816A

CN111104816A - Target object posture recognition method and device and camera

Info

Publication number: CN111104816A
Application number: CN201811247103.2A
Authority: CN
Inventors: 吕瑞
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-10-25
Filing date: 2018-10-25
Publication date: 2020-05-05
Anticipated expiration: 2038-10-25
Also published as: CN111104816B

Abstract

The application discloses a method for recognizing the posture of a target object, which comprises the steps of obtaining a current video frame; detecting preset key points of a target object in a current video frame to obtain preset key point information of the target object in the current frame; judging whether the position change of preset key points of the current target object in the current frame and the previous frame meets a first preset posture condition or not according to the preset key point information, and/or judging whether the position relation between the preset key points of the current target object in the current frame meets a second preset posture condition or not; and if the preset posture condition is met, recognizing the current posture of the target object as a preset posture, wherein f is a preset natural number, and the preset posture condition is set according to the position characteristics among key points of the posture to be recognized of the target object. The method and the device can accurately identify the tiny change of the gesture, have wide application range, low requirement on the image in the video frame, high accuracy of the gesture identification and small false detection and missing detection of the gesture identification.

Description

Target object posture recognition method and device and camera

Technical Field

The invention relates to the field of image analysis, in particular to a method and a device for recognizing the posture of a target object and a camera.

Background

With the development of image acquisition and analysis techniques, video or image data based analysis is increasingly used, for example, gesture detection or recognition of a target object.

The existing method for realizing gesture detection or recognition of a target object through video or image analysis mainly comprises the following steps: analyzing a frame difference image of a current frame and a previous frame, obtaining target object pixel points with moving actions according to the frame difference image, taking a graph formed by the target object pixel points with the moving actions as a contour, and judging whether a specific gesture exists according to the change condition of the contour.

However, in the above method, the target object pixel points are extracted by using frame difference image analysis, and when the target motion changes slightly, the difference between the adjacent frame pixels is small, and at this time, the target object pixel points with moving motion may not be obtained from the frame difference image, so that target posture missing detection is easily caused, and the recognition accuracy is low.

Disclosure of Invention

The embodiment of the invention aims to provide a method and a device for recognizing the posture of a target object and a camera, so as to improve the accuracy of recognizing the posture of the target object in an image.

The invention provides a method for recognizing the posture of a target object, which comprises the following steps,

acquiring a current video frame;

detecting preset key points of a target object in a current video frame to obtain preset key point information of the target object in the current frame;

judging whether the position change of preset key points of the current target object in the current frame and the previous frame meets a first preset posture condition or not according to the preset key point information, and/or judging whether the position relation between the preset key points of the current target object in the current frame meets a second preset posture condition or not;

if the preset posture condition is met, recognizing the current posture of the target object as the preset posture,

and f is a preset natural number, and the preset posture condition is set according to the position characteristics among the key points of the posture to be recognized of the target object.

Preferably, the method further comprises the step of,

and inputting the current frame with the recognized current target object posture into the trained machine learning model, and if the machine learning model recognizes that the target object posture in the current frame is the preset posture, taking the preset posture as a recognition result.

Wherein, the current frame with the recognized current target object posture is input into the trained machine learning model, if the machine learning model recognizes that the target object posture in the current frame is the preset posture, the preset posture is taken as a recognition result, comprising,

collecting picture data containing the pose of the object,

calibrating a first target frame of the target object in the picture data, extracting a first target frame image in the picture data, making two classification samples of a recognition gesture and a non-recognition gesture,

inputting the two classification samples into a machine learning model, training the model, and storing the currently trained model;

and generating a second target frame of the current target object identified in the current frame based on the preset key points, extracting an image of the second target frame from the current frame, inputting the image of the second target frame into the trained model in real time for classification, and taking the classification result as an identification result if the machine learning model classifies the image of the second target frame into the identified posture.

Wherein, the obtaining of the preset key point information of the target object in the current frame further comprises,

judging whether the current frame comprises more than two target objects or not according to the obtained preset key point information, if so, tracking the target objects according to the preset key point information to obtain locked target objects; otherwise, taking the target object in the current frame as a locking target object;

the method further comprises the step of enabling the user to select the target,

and traversing the locked target object in the current frame, and recognizing the posture according to the preset key point information of the current locked target object until the posture recognition of all the target objects of the current frame is finished.

Preferably, the method further comprises the step of,

judging whether the position change of a preset key point of a locking target object in the current frame and the previous f frame is larger than a first displacement threshold value,

if yes, executing the step of judging whether the position change of the preset key point of the current target object in the current frame and the previous frame meets the first preset posture condition,

otherwise, executing the step of judging whether the position relation between the preset key points of the current target object in the current frame meets the second preset posture condition.

Wherein the first preset posture condition is a sitting posture condition,

the judging whether the position change of the preset key point of the current target object in the current frame and the previous frame meets the first preset posture condition comprises the following steps of,

and determining the longitudinal position change of the same preset human body key point of the current frame and the previous frame according to preset human body key point information, and determining whether the sitting posture starting condition is met or not according to the relative position change.

Wherein, the longitudinal position change of the same preset human body key point of the current frame and the previous frame is determined according to the preset human body key point information, and whether the sitting posture starting condition is met or not is determined according to the relative position change, comprising,

and determining the longitudinal position change of the left shoulder and right shoulder human key points in the current frame and the previous frame according to preset left shoulder and right shoulder human key point information, and determining whether the sitting-up posture condition is met according to the position change.

Wherein, the longitudinal position change of the human key points of the left shoulder and the right shoulder in the current frame and the previous frame f is determined according to the preset human key point information of the left shoulder and the right shoulder, and whether the sitting posture condition is met is determined according to the position change, comprising,

judging whether the sum of the displacement of the key point of the left shoulder in the current frame and the previous f frame and the displacement of the key point of the right shoulder in the current frame and the previous f frame is greater than a human body key point time sequence position relation judgment threshold value or not; if yes, identifying the suspected standing posture; if the value is less than the negative value of the judgment threshold, the suspected sitting posture is identified; if the judgment threshold is equal to the judgment threshold, identifying the gesture without action;

the human body key point time sequence position relation judgment threshold is in proportional relation with the distance between the two human body key points of the left shoulder part and the right shoulder part in the current frame.

collecting picture data including a standing posture and/or a sitting posture of the target human body,

calibrating a first target frame of the target human body in the picture data, extracting a first target frame image in the picture data, making two classification samples of the standing posture and the sitting posture of the human body,

the method comprises the steps of generating a second target frame based on preset key points in a current frame, extracting an image of the second target frame from the current frame, inputting the image of the second target frame to a trained model in real time for classification, identifying the image of the target frame with the suspected standing posture as the standing posture if a machine learning model classifies the image of the target frame with the suspected standing posture as the standing posture, and identifying the image of the target frame with the suspected sitting posture as the sitting posture if the image of the target frame with the suspected sitting posture is classified as the sitting posture.

Preferably, the method further comprises the step of,

judging whether the sitting posture of the target human body identified by the current frame continues to have M frames, if so, controlling a camera lens to capture a long shot;

judging whether the standing posture of the target human body identified by the current frame continues to have T frames, if so, counting whether the number of the target human bodies identified in the current frame in the standing posture is equal to 1, if so, controlling a camera lens to capture a close shot of the target human body, otherwise, controlling the camera lens to capture a long shot;

here, M, T is a preset natural number.

Preferably, the second preset posture condition is a blackboard-writing posture condition,

the judging whether the position relation between the preset key points of the current target object in the current frame meets the second preset posture condition comprises the following steps of,

and determining the relative position relation between the preset human key points according to the preset human key point information, and determining whether the blackboard-writing posture condition is met or not according to the relative position relation.

Wherein, the relative position relationship between the preset human key points is determined according to the preset human key point information, and whether the blackboard-writing posture condition is met or not is determined according to the relative position relationship, which comprises,

determining the relative position relation among the key point of the right wrist, the key point of the right elbow and the key point of the right shoulder according to the preset human key point information of the right wrist, the right elbow and the right shoulder, and determining whether the posture condition of the blackboard-writing is met according to the relative position relation.

Wherein,

the method comprises the steps of determining the relative position relationship among a right wrist key point, a right elbow key point and a right shoulder key point according to key point information of a human body, and determining whether the posture conditions of the blackboard-writing are met or not according to the relative position relationship, including,

judging whether the position of the right wrist key point is higher than that of the right elbow key point, whether the horizontal distance between the right wrist key point and the right elbow key point is smaller than a first distance threshold value, and whether the vertical distance between the right elbow key point and the right shoulder key point is smaller than a second distance threshold value;

if so, recognizing that the target human body has a blackboard-writing posture, otherwise, recognizing that the target human body is a non-blackboard-writing posture.

The embodiment of the invention provides a posture recognition device of a target object, which comprises,

the image acquisition module acquires a current video frame;

the key point detection module is used for detecting preset key points of a target object in a current video frame and acquiring preset key point information of the target object in the current frame;

the identification module is used for judging whether the position change of the preset key points of the current target object in the current frame and the previous frame meets a first preset posture condition or not according to the preset key point information and/or judging whether the position relation between the preset key points of the current target object in the current frame meets a second preset posture condition or not;

if the preset posture condition is met, recognizing the current posture of the target object as a preset posture;

and f is a natural number, and the preset posture condition is set according to the position characteristics among the key points of the target object to-be-recognized posture.

Preferably, the device further comprises a control unit,

and the detection classification module is used for inputting the current frame with the recognized current target object posture into the trained machine learning model, and if the machine learning model recognizes the target object posture in the current frame as the preset posture, the preset posture is used as a recognition result.

Wherein the detection classification module comprises a detection classification module,

the sample making unit is used for calibrating a first target frame of a target object in the picture data containing the target object gesture, extracting a first target frame image in the picture data, making two classification samples of a recognition gesture and a non-recognition gesture, and inputting the two classification samples into the machine learning model unit;

the machine learning model unit is used for training based on the input two-classification samples and storing the currently trained model; classifying a second target frame image input in real time through a trained model, wherein the second target frame image is a second target frame of a current target object identified in a current frame generated based on a preset key point and an image in the target frame extracted from the current frame.

Wherein, the device also comprises a control device,

and the target tracking module is used for determining the number of the target objects included in the current frame according to the obtained preset key point information, tracking the target objects according to the preset key point information when the number of the target objects is more than or equal to 2 to obtain locked target objects, and taking the target objects in the current frame as the locked target objects when the number of the target objects is equal to 1.

Preferably, the device further comprises a control unit,

the frame position identification module of the preset key point judges whether the position change of the preset key point of the locking target object in the current frame and the previous frame is larger than a first displacement threshold value or not; when the position change is larger than a first displacement threshold, determining whether the position change meets a first preset posture condition; and when the position change is not larger than the first displacement threshold, determining whether the position relation between preset key points of the current target object meets a second preset posture condition.

Wherein the first preset posture condition is a sitting posture condition, the recognition module comprises,

and the first identification unit is used for determining the longitudinal position change of the same preset human body key point of the current frame and the previous frame according to the preset human body key point information and determining whether the starting sitting posture condition is met or not according to the relative position change.

Wherein the first identification unit comprises a first identification unit,

the first calculating subunit calculates the sum of the displacement of the key point of the left shoulder in the current frame and the previous f frame and the displacement of the key point of the right shoulder in the current frame and the previous f frame; calculating the ratio of the human body key point time sequence position relation judgment threshold value to the distance between the left and right shoulder two human body key points in the current frame,

the first comparison subunit compares the sum of the displacement of the key point of the left shoulder in the current frame and the previous f frame and the displacement of the key point of the right shoulder in the current frame and the previous f frame with the judgment threshold value of the time sequence position relationship of the key points of the human body; if yes, identifying the suspected standing posture; if the value is less than the negative value of the judgment threshold, the suspected sitting posture is identified; and if the judgment threshold is equal to the judgment threshold, identifying the gesture without action.

Preferably, the device further comprises a control unit,

the camera shooting control module is used for controlling a camera lens to capture a long shot when the sitting posture of the target human body identified by the current frame continues to have M frames; when the standing posture of the target human body identified by the current frame is continuously provided with T frames, if the number of the standing posture target human bodies identified in the current frame is counted to be equal to 1, the camera lens is controlled to capture the close shot of the target human body, and if the number of the standing posture target human bodies identified in the current frame is counted to be more than 1, the camera lens is controlled to capture the far shot; here, M, T is a preset natural number.

Wherein the second preset posture condition is a blackboard-writing posture condition, the recognition module comprises,

and the second identification unit is used for determining the relative position relation between the preset human key points according to the preset human key point information and determining whether the blackboard-writing posture condition is met or not according to the relative position relation.

Wherein the second identification unit comprises,

the second calculating subunit calculates the horizontal distance between the right wrist key point and the right elbow key point and the vertical distance between the right elbow key point and the right shoulder key point;

a second comparison subunit, configured to compare whether the position of the right wrist key point is higher than the position of the right elbow key point, and whether the calculated horizontal distance is smaller than the first distance threshold, and whether the calculated vertical distance is smaller than the second distance threshold; if so, recognizing that the target human body has a blackboard-writing posture, otherwise, recognizing that the target human body is a non-blackboard-writing posture.

The embodiment of the invention provides a camera device, which comprises a camera, a memory and a processor, wherein,

the camera is used for shooting images;

the memory is used for storing a computer program;

the processor is used for executing the program stored in the memory and the target object posture identification method.

The gesture recognition method of the target object provided by the embodiment of the invention is used for recognizing the gesture by detecting the preset key points of the target object, and changing the positions and/or the position relations of the preset key points based on the position characteristics among the key points of the gesture to be recognized of the target object. The method and the device can accurately identify the tiny change of the gesture, have no specific requirements on the target object and the gesture, have wide application range, have low requirements on the image in the video frame, have high accuracy of the gesture identification, and have small false detection and missing detection of the gesture identification.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flow chart of the present invention for realizing gesture recognition of an object.

Fig. 2 is a schematic flow chart illustrating the process of identifying the target object in the acquired video frame according to the present invention.

Fig. 3 is a schematic diagram of key points of a human body according to an embodiment.

Fig. 4 is a flow chart of a sitting posture recognition method based on a target human body.

Fig. 5 is a schematic flow chart of a process for controlling a recording apparatus based on the sitting posture recognition of a plurality of target human bodies in a classroom recording scene.

Fig. 6 is a schematic view of the camera mounting position in the second embodiment of the present invention.

Fig. 7 is a flowchart illustrating a blackboard writing behavior recognition method.

Fig. 8 is a schematic flow chart of a method for recognizing a blackboard-writing gesture including multiple targets in a video.

FIG. 9 is a schematic diagram of an apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For the purpose of making the objects, technical means and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings.

With the rapid development of the fields of cloud computing, big data, artificial intelligence and the like, intelligent products are widely applied in various industries. The intelligent video analysis is one of intelligent products and mainly comprises target object key point detection, target tracking, key point posture detection and the like, wherein the key point detection is to detect the positions of all key points in an image by utilizing the information of an image frame and give the coordinates of each key point and the connection relation between each key point; the key point tracking is to generate a target detection frame according to the detection result of the key point and track the detection frame. And the key point posture detection is to recognize the posture according to the space-time relation of the tracking target key point.

The method comprises the steps of detecting key points of a target object, judging the detected preset target object key points according to the position relation among the preset key points and/or the position change of the same key point in the preset key points on different time sequences by combining the position relation among the key points of the posture to be recognized and/or the position relation of the key points on different time sequences, recognizing the posture of the target object to be recognized, and further detecting and classifying the recognized posture through a trained deep learning algorithm model in order to avoid misjudgment and missing detection.

Referring to fig. 1, fig. 1 is a schematic flow chart of the present invention for realizing gesture recognition of an object. The schematic diagram shows the basic flow of the technical scheme of the invention.

Acquiring a current video frame;

step 101, detecting a preset key point of a target object in a current video frame, and obtaining preset key point information of the target object in the current frame;

step 102, judging whether the position change of preset key points of a current target object in a current frame and a previous frame meets a first preset posture condition or not according to preset key point information, and/or judging whether the position relation between the preset key points of the current target object in the current frame meets a second preset posture condition or not;

and if the preset posture condition is met, recognizing the current target object posture as the preset posture, and if the preset posture condition is not met, recognizing the current target object posture as the preset posture.

The preset posture condition is set according to the position characteristics among the key points of the posture to be recognized of the target object, for example, set based on the motion trajectory rule of the key points of the posture to be recognized.

Referring to fig. 2, fig. 2 is a schematic flow chart illustrating the identification of the target object in the acquired video frame according to the present invention.

Step 201, after acquiring a current video frame, recording a current frame number f;

step 202, scaling the current video frame according to a fixed size to obtain a map frame with a consistent size;

step 203, detecting preset key points of the target object to obtain and store information of each preset key point of each target object in the current frame;

step 204, judging whether more than two target objects exist in the current frame according to the obtained preset key point information, namely, obtaining the number of the target objects contained in the current frame; if a plurality of objects are included in the current frame, step 205 is executed, to obtain a locked object through object tracking, so that the target object corresponding to the previous video frame in the target objects corresponding to the current video frame is the same target object, if there is only one object in the current frame, step 206 is performed to take the object in the current frame as the locking object, or, in this step, preferably, the target of the current frame is also tracked to avoid misjudgment after the target is changed between frames, for example, when the first object leaves in the last frame or the previous f frames and the second object enters the video in the current frame, the target tracking can ensure that the target human body corresponding to the previous video frame in the target human body corresponding to the current video frame is the target of the same target, and is favorable for improving the accuracy of gesture recognition;

step 207, determining whether the position change of the preset key points of the locking target object in the current frame and the previous frame is larger than a first displacement threshold value to determine whether the position of the same key point in the preset key points on different time sequences (intervals) is changed,

if the position is changed, step 208 is executed, which indicates that the posture to be recognized has inter-frame position change, and may be considered as mainly dynamic motion, and the posture of the locking target object in the current frame is pre-determined according to whether the displacement of the same preset key point between frames meets the first preset posture condition during posture recognition;

if the change of the preset key point position of the locking target object in the current frame and the previous f frame is less than or equal to the first displacement threshold, executing step 209, which shows that the change of the position between the frames to be identified is not obvious, and can be regarded as static, and pre-judging the posture of the locking target object in the current frame according to whether the position relation between the preset key points in the frames meets a second preset posture condition;

and step 210, in order to further improve the accuracy of gesture recognition, further detecting and classifying the suspected gesture of the target object predicted in step 208 or step 209 through a trained depth model to obtain a final recognition result of the current locked target in the current frame.

Step 211, judging whether the current frame has unidentified locking target objects, if yes, identifying the next locking target object until all the postures of the locking target objects are identified; otherwise, the target object in the current frame is processed, and the next frame is processed continuously.

Through the steps, the recognition process under the condition that the current frame contains a plurality of target objects is achieved, and different judgment strategies are adopted for recognizing the preset key points based on the characteristics of the gesture to be recognized.

In practical applications, gesture recognition of a human body as a target object has wide application requirements. The following description will be given taking gesture recognition of a human body as an example.

When the target object is a human body and the acquired image is a video, human body parts related to gesture recognition can be marked as key points based on the human body. As shown in fig. 3, fig. 3 is a schematic diagram of human body keypoints according to an embodiment, where the index of the head keypoint is 0, the index of the neck keypoint is 1, …, and so on, and the number in the diagram represents the index of the part keypoint. When the gesture to be recognized is a standing and/or sitting gesture, the head, the neck, the left and right shoulders, the left and right buttocks, the left and right elbows, the left and right wrists and the chest can be set as key points because the main change part related to the gesture is the upper half of the human body; for another example, the gesture to be recognized is a writing board gesture, and since the writing board usually drives the right hand to write through the right arm, the right wrist, the right arm, and the right shoulder can be set as key points; as another example, the gesture to be recognized is a certain motion gesture such as a sit-up, a flat support, etc., and the key point may be set in association with a part of the human body involved in the motion gesture.

The first embodiment is as follows: the following will take recognition of the sitting posture of the human body as an example.

Referring to fig. 4, fig. 4 is a flow chart illustrating a sitting posture recognition method based on a target human body.

Step 401, starting from the collection of a frame of image, recording the frame number f of the current frame;

step 402, preprocessing the image: the acquired images are zoomed according to the fixed size w multiplied by h to obtain the images with the same size, so that the difference of the sizes of the images caused by the difference of the camera devices is avoided, and the analysis accuracy of the images in the subsequent steps is improved.

Step 403, detecting human body key points according to the preset human body key points of the human body part related to the gesture to be recognized, and acquiring and storing each preset human body key point information J in the frame image_n，fFor example, coordinate information of key points of human body, denoted as J_n，f·x,J_n，fY, where n is the human body key point index number, f is the current frame number, x is the abscissa and y is the ordinate, thus J_n，fX denotes the abscissa of the human key point n in the f-th frame, J_n，fY represents the ordinate of the human keypoint n in the f-th frame.

In the step, analyzing the current video frame by using a preset machine learning algorithm to obtain a plurality of preset human body key point heat maps corresponding to the current video frame; the preset machine learning algorithm is obtained by training sample video frames marked with human key points and sample human key point heat maps corresponding to the sample video frames; any human body key point heat map comprises a part identification and a human body key point heat value of each pixel point of the current video frame corresponding to the part identification; determining human key point information of a target human body corresponding to the current video frame according to the multiple human key point heat maps; wherein,

the human key point marking can mark the position of the human key point in the picture through manual marking. In this embodiment, for the convenience of the need in the recognition of other gestures, the labeled human body key points include 15 human body key points such as head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, right hip, right knee, right ankle, left hip, left knee, left ankle, etc.

In the machine learning algorithm, the present embodiment adopts an ypn (yolo Pyramid Networks) model, which is obtained by combining and optimizing the network structure design of the open source target detection network framework YOLOv2 and the Feature Pyramid Network (FPN). The YOLOv2 network structure is used to quickly extract target features with a reduced amount of computation. Compared with a convolutional neural network structure adopted by OpenPose, the YPN has smaller calculation amount, and can realize real-time detection of human key points without losing precision. The FPN network structure is beneficial to improving the multi-scale adaptability of the characteristics and ensuring the detection performance of the network model.

Step 404, considering that in different video frames, the position relationship between the preset human body key points related to the sitting posture does not change, but the same human body key point of the same target changes in different video frames, that is, the position relationship between the human body key points in the frame is relatively inconvenient, and the positions of the same human body key points between frames are different, in order to capture the target human body better, preferably, a preset tracking algorithm is used to select the target human body corresponding to the previous video frame in the target human body corresponding to the current video frame as the target human body of the same target, and the target human body is taken as the captured action target human body;

the preset tracking algorithm may be a Multi-target tracking algorithm CMOT (Robust Online Multi-object tracking Based on tracker configuration) or other tracking algorithms, and the embodiment of the present invention is not particularly limited.

Specifically, a preset tracking algorithm is used to determine whether a first preset graph corresponding to a preset human body key point in a target human body corresponding to the current video frame and a second preset graph corresponding to the preset human body key point in the target human body corresponding to the previous video frame meet a preset overlapping condition.

The preset human body key points can be a plurality of human body key points which can form a target human body outline with stronger identification degree. For example, the preset human body key points may include all key points of the upper half of the human body, or may be a human body key point marked with a head identifier, a human body key point marked with a left shoulder identifier, and a human body key point marked with a right shoulder identifier. The target human body can be better identified by the target human body contour formed by the three key points. Those skilled in the art may also select other human body key points as preset human body key points according to actual situations, and the embodiment of the present invention is not limited specifically.

And if the first preset graph and the second preset graph meet the preset overlapping condition, taking the target human body corresponding to the human body key point corresponding to the preset graph meeting the preset overlapping condition in the previous video frame in the current video frame as the target human body of the same target corresponding to the target human body corresponding to the previous video frame.

In practical applications, this step is not necessary when the accuracy of human body keypoint detection is sufficient.

Step 405, because the sitting-up posture is represented on the image that the position relationship between the key points of the human body in the frame is kept relatively inconvenient, and the positions of the key points of the same human body in the frame are different, on the basis, the key point information of the human body of the target human body in the current frame needs to be compared with the key point information of the same human body in the previous f frame of the current frame, so as to judge whether the key point information of the human body in the previous f frame of the current frame exists, if so, step 407 can be executed, otherwise, the collected frame amount is insufficient, and the accuracy of posture identification can be influenced, step 406 is executed, and the next frame image is collected;

step 407, pre-judging the sitting posture of the target human body based on the target human body obtained by human body tracking:

after the target human body is locked through human body tracking, the preset human body key point coordinates in the current video frame of the target human body are determined and compared with the human body key point coordinates of the same target human body in the previous f frames, and the posture of the target human body is judged in advance.

Based on the fact that the displacement change of the shoulders and the width of the shoulders of the body have a certain proportional relation when a human body sits up, the specific expression is that the longitudinal coordinates of the shoulder key points corresponding to the standing posture and the sitting posture on the image are changed, and the longitudinal coordinates of the same human body key point of the current frame and the previous f frames are changed in time sequence, so that the longitudinal coordinates of the left and right shoulder key points of the current frame and the previous f frames are preferably calculated, the position relation of the left and right shoulder key points on the current frame is compared, and whether a suspected sitting-up posture exists is judged by comparing the longitudinal coordinate change with the position relation, wherein the specific formula is as follows:

d＝|J_2，1·x-J_5，1·x|×α

wherein, in conjunction with the designation of FIG. 3, J_2，1·x、J_2，1·y、J_5，1·x、J_5，1Y represents the abscissa and ordinate of the left shoulder and the abscissa and the ordinate of the right shoulder of the current frame, respectively. J. the design is a square_2，f·y、J_5，fY represents the ordinate of the left shoulder and the right shoulder of the f frames before the target student, d represents the human body key point time sequence position relation judgment threshold, and the value is the product of the shoulder width of the target student and the proportionality coefficient α.

In rectangular coordinates of the image field, the default coordinate origin is usually located at the upper left corner of the image, the x coordinate axis is positive from left to right, and the y coordinate axis is positive from top to bottom. According to the formula:

if the sum of the displacements of the left shoulder and the right shoulder is greater than the judgment threshold, the displacement direction of the human body key point of the current frame relative to the human body key point corresponding to the previous frame f is the positive direction along the y coordinate axis, and then W is set to be 1, and the suspected sitting posture is recorded;

if the sum of the displacements of the left shoulder and the right shoulder is smaller than a negative judgment threshold, the displacement direction of the current frame human key point relative to the human key point corresponding to the previous f frame is the negative direction of the y coordinate axis, and if w is 2, the suspected standing posture is recorded;

if the sum of the displacements of the left shoulder and the right shoulder is equal to the judgment threshold, which indicates that the displacement direction of the current frame human key point relative to the human key point corresponding to the previous frame f is very limited, w is made to be 0, and the record is that no action is made, and in the specific implementation, the value of the relevant parameter is α to be 0.8.

The probability of being able to identify the sitting up and down posture in the above manner is around 80-90%.

Because the moving trajectory of the shoulder is similar to the moving trajectory of the standing posture when the chest leans forward and sits up, sits up and leans back, and the like, the posture is misjudged, and preferably, the classification network algorithm is combined to classify and calibrate the suspected judgment result, so as to improve the accuracy of the sitting posture identification, which is specifically the step 408.

Step 408, for the targets judged to be suspected sitting and standing postures, adopting a machine learning algorithm to carry out sitting detection classification:

firstly, collecting picture data containing a sitting posture, preferably, screening the picture data containing the sitting posture and the standing posture from a current collected image frame set to serve as sample data for training, and calibrating first target frames of all sitting targets and the standing targets in the sample data, wherein the sitting posture and the standing posture mainly relate to key points of the upper half body of a human body, and preferably, the half body frames of the sitting targets and the standing targets can be calibrated; and performing external expansion on the basis of the first target frame in the calibration data, extracting the image of the first target frame in each image data, and making two classification samples of the sitting posture and the standing posture. Preferably, the calibrated target frame is a regular graph to conveniently and simply set the outward expansion ratio, for example, the first target frame is a rectangle, and the width of the left and right outward expansion target frames is set to be 0.2, the height of the upper outward expansion target frame is set to be 0.1, and the height of the lower outward expansion target frame is set to be 0.5.

And inputting the two classification samples into a CNN network model of the convolutional neural network, training the model, and storing the current trained CNN network model after training.

The acquisition of the sample data and the training of the CNN network model are independent of the process, and may be a task processed in parallel with the process or a task performed in advance on the basis of the sample data.

When a suspected sitting posture or standing posture in the current frame is identified, a second target frame is generated based on key points of a human body, the second target frame is expanded by a certain proportion, the second target frame which is judged to be suspected to be sitting is subjected to image matting, namely, an image of the second target frame is extracted from the current frame, and then the extracted image is sent to a CNN classification network for classification in real time. If the target frame image suspected to be standing up is classified as standing up, a standing up posture is recognized. If the target frame image suspected of sitting is classified as sitting, the sitting posture is recognized.

The shapes of the first target frame and the second target frame can be the same or different; the second target frame may be generated based on all key points of the target human body, or may be generated based on preset key points related to the gesture to be recognized.

According to the method for recognizing the sitting posture of the single human body based on the image, the suspected posture is further recognized through tracking of key points of the human body, track change of the key points of the human body among multiple frames and training of a deep learning model of the suspected posture, the accuracy of posture recognition is improved, and missing detection or false detection is reduced.

Example two: the following will take the recognition of the sitting-up posture of the student based on the classroom video resources of distance education and taking the recording of classroom video as an application scene as an example.

In the educational and teaching field, the appearance of distance education, wisdom classroom, classroom video monitoring and the like makes modern education more convenient and efficient. Currently, there are two ways to acquire course video resources in distance education. One is to record by manual shooting, and the other is to control the shooting equipment to record automatically by a method based on traditional image processing. However, the former method, while reliable, is costly; the latter approach, while less costly, is unreliable. In the following embodiments, based on the recognition result of the sitting posture, switching and adjustment of the distant view and close view shooting of the recording apparatus are controlled, and monitoring in a classroom is realized.

Referring to fig. 5, fig. 5 is a schematic flow chart illustrating a process of controlling a recording apparatus based on the sitting posture recognition of a plurality of target human bodies in a classroom recording scene.

Step 500, a current video frame is acquired.

Before recording video, a camera may be installed in the classroom, which may be installed on top of the side of the classroom desk, with a camera view angle covering at least all desk areas in the classroom, e.g., as shown in fig. 6, a panoramic camera is installed in a fixed position on the side of the desk, with a camera view angle covering the desk and all desk areas. In the embodiment of the invention, the camera can be started manually, and can also be started automatically according to the preset starting time. Specifically, a preset lecture time for a teacher to lecture may be used as the preset on time.

After the camera is started, the camera of the camera can shoot images, and a current video frame is obtained from the shot images, so that whether the sitting posture exists in the target human body in the current video frame is judged.

Step 501, recording a current frame number f;

step 502, preprocessing the image: the acquired images are zoomed according to the fixed size w multiplied by h to obtain the images with the same size, so that the difference of the sizes of the images caused by the difference of the camera devices is avoided, and the analysis accuracy of the images in the subsequent steps is improved.

Step 503, detecting key points of the human body:

first a dataset of student scenes is collected and then the position of each body key point for each target body (student) in the picture is noted. To facilitate other applications of student behavior and gesture recognition, such as monitoring for prolonged head-down gestures, etc., preferably, the labeled human body keypoints include 15 human body keypoints of head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, right hip, right knee, right ankle, left hip, left knee, left ankle, etc. In order to reduce the amount of data to be labeled, only key points of the upper half of the body, i.e., head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, right hip, left hip, may be labeled for the recognition of the sitting-up posture.

One of the implementation modes can be that after the labeled data of the key points of the human body are obtained, the labeled data are used as training samples of the YPN network model to train the YPN network model, and the trained YPN network model is stored; inputting the collected current video frame into the trained YPN network model, and extracting human body key features from the current video frame through the model, thereby realizing the real-time generation of human body key point information and obtaining each human body key point information J of each human body (student) target in the frame image_k，n，fFor example, the coordinate information of the human key point n of the target human body (student) k in the f-th frame is denoted as J_k，n，f·x,J_k，n，fY, where n is the human body key point index number, f is the current frame number, x is the abscissa and y is the ordinate, thus J_k，n，fX denotes the abscissa of the human body key point n of student k in the f-th frame, J_k，n，fY represents the ordinate of the human keypoint n of student k in frame f.

Step 504, when the video frame recorded by the camera includes a plurality of target human bodies, in order to make the current video frame and the previous video frame perform sitting posture identification on the same target human body, so that the identification process can continuously aim at the same person, the step reuses a preset tracking algorithm, and selects a target human body, which is the same target as the target human body corresponding to the previous video frame, in the target human body corresponding to the current video frame as a captured action target human body;

unlike the embodiment, which only has one target, a plurality of target human bodies are usually involved in the classroom video, so that when tracking the target human bodies, each target human body in the current video frame needs to be traversed.

In particular to a method for preparing a high-performance nano-silver alloy,

step 504a, a preset tracking algorithm is used to determine whether a first preset graph corresponding to a plurality of preset human body key points in a target human body corresponding to a current video frame and a second preset graph corresponding to a plurality of preset human body key points in a target human body corresponding to a previous video frame satisfy a preset overlapping condition. And if the first preset graph and the second preset graph meet the preset overlapping condition, taking the target human body corresponding to the first preset graph meeting the preset overlapping condition as the target human body which is the same target as the target human body corresponding to the last video frame, namely, taking the target human body corresponding to the first preset graph meeting the preset overlapping condition as a locking target.

And 504b, traversing the human body key point coordinates of each target human body, calculating to obtain preset graphs corresponding to a plurality of preset human body key points of each target, and then returning to execute the step 504a until the tracking of all target human bodies in the current video frame is completed.

The preset patterns corresponding to the plurality of preset human body key points can be patterns formed after connecting the plurality of preset human body key points, and can also be external multi-deformation of the plurality of preset human body key points, such as external quadrangles, pentagons and the like. The embodiment of the present invention does not specifically limit the specific shape of the preset pattern. Preferably, the preset graph is an upper body outline box, and the preset algorithm may be a CMOT algorithm.

Step 505, judging whether a previous f frame including a current frame exists, if so, executing step 506, otherwise, indicating that the collected frame quantity is insufficient and the accuracy of gesture recognition is influenced, executing step 507, and collecting a next frame image;

step 506, traversing the human body key point coordinates of all the locking targets, and pre-judging the sitting posture of each locking target:

in this step, different from the first embodiment, the classroom video usually includes a plurality of target human bodies, so that after a plurality of locked targets are obtained through human body tracking, each locked target needs to be traversed to obtain the human body key point coordinates of the current video frame of the locked target, and the human body key point coordinates of the same target human body in the previous f frames are compared to predict the posture of the locked target.

In particular to a method for preparing a high-performance nano-silver alloy,

step 506a, calculating changes of longitudinal coordinates of two key points of the left and right shoulders of a current frame and a previous F frame of a locking target k (e.g., student k), and a position relationship of the two key points of the left and right shoulders of the locking target k on a current F frame time sequence, and comparing the changes of the longitudinal coordinates with the position relationship to determine whether a suspected sitting posture exists, wherein a specific formula is as follows:

d_k＝|J_k，2，1·-J_k，5，1·x|×α

in the formula, J_k，2，1·x、J_k，5，1·y、J_k，2，1·x、J_k，5，1Y represents the abscissa and ordinate of the right shoulder and the abscissa and the ordinate of the left shoulder of the current frame of the locked target k, respectively. J. the design is a square_k，2，f·y、J_k，５，fY represents the ordinate of the right shoulder and the left shoulder of the f frames before the target k is locked, d_kAnd the time sequence position relation judgment threshold value of the key point of the human body representing the locking target k takes the value of the product of the shoulder width of the locking target k and the proportionality coefficient α.

According to the formula:

if the sum of the displacements of the left shoulder and the right shoulder is greater than the judgment threshold, the displacement direction of the human body key point of the current frame relative to the human body key point corresponding to the previous frame f is the positive direction along the y coordinate axis, and then W is set to be 1, and the locking target k is recorded as the suspected sitting posture;

if the sum of the displacements of the left shoulder and the right shoulder is smaller than a negative judgment threshold, the fact that the displacement direction of the current frame human key point relative to the human key point corresponding to the previous frame f is the negative direction of the y coordinate axis is shown, when w is 2, the locking target k is recorded as a suspected standing posture;

if the sum of the displacements of the left shoulder and the right shoulder is equal to the judgment threshold, which indicates that the displacement direction of the current frame human body key point relative to the human body key point corresponding to the previous frame f is very limited, w is made to be 0, and the locking target k is recorded as suspected not to act.

And step 506b, traversing the next locking target, and returning to the step 506a until all the locking targets in the traversed current video frame finish the prejudgment of the sitting posture.

Step 508, for multiple targets judged to be suspected sitting and standing postures, a machine learning algorithm is adopted to perform sitting detection classification:

firstly, collecting picture data with a student scene, preferably, screening picture data containing sitting postures and/or standing postures of a plurality of targets (students) from a currently collected video frame set to serve as sample data for training, calibrating first target frames of the sitting targets and the standing targets of all the students in the sample data, and preferably, calibrating the half frames of the sitting targets and the standing targets as the sitting postures and the standing postures mainly relate to key points of the upper half body of a human body; and carrying out external expansion on the basis of the first target frame in the calibration data, extracting the first target frame image of each student in each video frame, and making two classification samples of the sitting posture and the standing posture.

The acquisition of the sample data and the training of the CNN network model are independent of the process, and may be a process of parallel processing with the process, or a process performed in advance on the basis of sample data.

When a plurality of suspected sitting or standing posture targets in the current frame are identified, a second target frame is generated for each suspected posture target based on human body key points, the second target frame is expanded to a certain proportion, then each suspected posture target is subjected to image matting, namely, the second target frame image is extracted from the current frame, and then each extracted image is sent to a CNN classification network in real time for classification. If the target frame image suspected to be standing up is classified as standing up, a standing up posture is recognized. If the target frame image suspected of sitting is classified as sitting, the sitting posture is recognized. And repeating the steps, and traversing all the suspected posture targets in the current frame until all the suspected posture targets finish the sitting-up detection classification.

Step 509, for the sitting posture target recognized in step 508, traversing the previous M frames of the current frame to determine whether there is a sitting posture of the target, that is, determining whether there is M frames continuously in the sitting posture of the target, if yes, indicating that the posture of the target is not changed currently, triggering the camera lens to restore, so as to control the camera lens to capture a long shot, otherwise, processing the next video frame;

step 510, for the standing posture target identified in step 508, go through the previous T frame of the current frame to determine whether there is a standing posture of the target, i.e., determine whether there is a T frame in the standing posture of the target,

if so, indicating that the posture of the target is not changed currently, further counting whether the number of the standing posture targets identified in the current frame is equal to 1, if so, indicating that only 1 person in the current frame is in a standing state, triggering the camera lens to stretch to control the camera lens to capture the close shot of the target, otherwise, indicating that at least more than two persons in the current frame are in a standing state, triggering the camera lens to restore to control the camera lens to capture the far shot,

if the object's pose does not last T frames, the next video frame is processed.

The T, M is a natural number and can be set as required.

In the second embodiment, the head and the shoulders of each student are accurately positioned through human body key point detection, so that the standing posture detection of each student can be effectively carried out, meanwhile, the actions which are easy to cause false alarm, such as raising hands, raising heads, sitting up and standing down, sitting up and lying back, can be distinguished, and finally, a more accurate standing snapshot process is completed, so that the action actions of the students are tracked and shot, the students are automatically positioned, tracked, pre-judged and identified in key action, and the purpose that when the students stand up, a camera is focused on the body of the students, and the expressions and the limb actions of the students when the students stand up are clearly shot is realized; when the student sits down, the camera resumes panoramic photography, reproducing and recording the course of classroom teaching more realistically. A further application may also be that, when the suspected standing and sitting posture is identified in step 506, an alarm target of the suspected posture is output to be input to the intelligent video analysis system for corresponding analysis or alarm output.

Example three: the method is based on classroom video resources of distance education, and recognition of blackboard-writing postures of only one speaker is taken as an application scene by recording classroom videos.

When course video recording is performed, the stretching multiple of the camera is usually required to be adjusted according to whether the blackboard-writing behavior of the speaker exists, so that the blackboard-writing content of the speaker can be shot.

A known method for automatically recognizing writing behavior on a blackboard includes: analyzing a frame difference image of a current video frame and a previous video frame, obtaining human body pixel points with motion actions according to the frame difference image, taking a graph formed by the human body pixel points with the motion actions as a contour, and judging whether a blackboard-writing behavior exists according to the change condition of the contour.

However, in the method, the target human body pixel points are extracted by adopting frame difference image analysis, when the target action changes slightly, the difference between the adjacent frame pixels is very small, and at this time, the human body pixel points with the movement action may not be obtained from the frame difference image, so that the target human body is easy to miss detection, and the detection accuracy of the blackboard writing behavior is low.

Through analysis of the behavior and the posture of the blackboard-writing, the key points of the human body related to the behavior and the posture of the blackboard-writing are mainly the right wrist, the right elbow and the right shoulder, and based on the analysis, the posture of the blackboard-writing is recognized through detection, tracking and position relation change of the key points of the human body in the video frame.

Referring to fig. 7, fig. 7 is a schematic flow chart of a blackboard writing behavior recognition method.

Step 700, a current video frame is obtained.

Before recording a video, a camera can be installed in a classroom in which a speaker gives a lecture, the camera can be installed at the top of the classroom, the camera of the camera can be aligned to the position of a blackboard area, a person skilled in the art can also set the vertical distance between the camera and the blackboard according to the actual situation, and the vertical specific selection is specifically related to the pixels of the camera, the video recording quality requirement and the like. For example, any one of 3 to 6 meters may be selected as the vertical distance between the camera and the blackboard. The embodiment of the present invention does not specifically limit the vertical distance. In the embodiment of the invention, the camera can be started manually, and can also be started automatically according to the preset starting time. Specifically, a preset lecture time may be used as the preset on time.

After the camera is started, the camera of the camera can shoot images, and a current video frame is obtained from the shot images, so that whether the target human body in the current video frame has the blackboard-writing behavior or not is judged.

Step 701, preprocessing an image: the acquired images are zoomed according to the fixed size w multiplied by h to obtain the images with the same size, so that the difference of the sizes of the images caused by the difference of the camera devices is avoided, and the analysis accuracy of the images in the subsequent steps is improved.

Step 702, correlating according to the gesture to be recognizedHuman body key points preset on human body parts are detected, and the human body key point information J in the current frame is obtained and stored_n，

Similar to step 403 in the first embodiment, in this step, the current video frame is analyzed by using a preset machine learning algorithm, so as to obtain a plurality of human body key point heat maps corresponding to the current video frame; the preset machine learning algorithm is obtained by training sample video frames marked with human key points and sample human key point heat maps corresponding to the sample video frames; any human body key point heat map comprises a part identification and a human body key point heat value of each pixel point of the current video frame corresponding to the part identification; determining human key point information of a target human body corresponding to the current video frame according to the multiple human key point heat maps; wherein,

the human key point marking can mark the position of the human key point in the picture through manual marking. In this embodiment, the labeled key points of the human body are the right wrist, the right elbow, and the right shoulder.

Slightly different from step 403 in the first embodiment, in this step, since the position change of the human key points related to the writing posture between the preset human key points is obvious, and the position change of the same preset human key point between the video frames is not obvious, preferably, the coordinates of the preset human key points in the current frame are detected.

Step 703, determining the relative position relationship between the preset human body key points according to the preset human body key point information obtained and stored in step 702, and pre-judging whether the blackboard-writing behavior exists according to the relative position relationship:

in the embodiment of the invention, whether the target human body has the blackboard-writing behavior can be judged by judging whether the target human body has the preset action. Whether the preset action exists or not can be judged through the relative position relation between the preset key points of the target human body. Therefore, in the embodiment of the present invention, the camera may determine the relative position relationship between the preset key points according to the human body key point information of the target human body, and determine whether the blackboard writing behavior exists in the target human body according to the relative position relationship.

The method provided by the embodiment of the invention determines whether the current video frame has the blackboard writing posture or not through the relative position relation of the key points of the human body which belong to the same target human body and are marked with the preset part identification.

Therefore, in an implementation manner of the embodiment of the present invention, a relative position relationship between a key point marked with a right wrist identifier, a key point marked with a right elbow identifier, and a key point marked with a right shoulder identifier in a target human body is determined according to human body key point information of the target human body, and whether the target human body has a blackboard-writing posture is determined according to the relative position relationship. Whether the blackboard writing posture exists in the target human body can be judged more accurately by judging the relative position relation of the human body key points corresponding to the right arm, and whether the blackboard writing posture exists in the target human body can be judged through less key point information, so that the judgment amount is less, and the judgment speed is increased.

Specifically, judging whether the position of a key point marked with a right wrist mark is higher than the position of a key point marked with a right elbow mark in the target human body, whether the horizontal distance between the key point marked with the right wrist mark and the key point marked with the right elbow mark is smaller than a first distance threshold value, and whether the vertical distance between the key point marked with the right elbow mark and the key point marked with a right shoulder mark is smaller than a second distance threshold value; if yes, determining that the target human body has the blackboard-writing posture.

The first distance threshold and the second distance threshold may be determined according to a specific motion of a user writing on a blackboard in a real environment and a size of an arm of the user, and may be equal to or different from each other. For example, the first pitch threshold value and the second pitch threshold value may each be any value within a range of 18cm to 22 cm. Those skilled in the art may set specific values of the first pitch threshold and the second pitch threshold according to practical situations, and embodiments of the present invention are not limited in particular.

In one embodiment, the camera may use the coordinate of the key point marked with the right wrist identifier, the coordinate of the key point marked with the right elbow identifier, and the coordinate of the key point marked with the right shoulder identifier as parameters of a blackboard-writing posture judgment formula, and determine whether the target human body has a blackboard-writing posture according to whether a posture judgment value obtained by the posture judgment formula is a preset value.

In one embodiment, the above-mentioned posture determination formula may be the following formula:

wherein, as shown in FIG. 3, J₄·x、J₃X represents the abscissa of the key point marked with the right wrist mark and the key point marked with the right elbow mark, respectively, and J₄·y、J₃·y、J₂Y represents the ordinate of the key point marked with the right wrist mark, the right elbow mark, the right shoulder mark, respectively, d_j、d_iRespectively representing a first pitch threshold and a second pitch threshold. In the formula, w-1 indicates a pseudo-occurrence blackboard-writing posture, and w-0 indicates a non-blackboard-writing posture.

Since the motions related to the right arm are rich, the pre-judgment has the possibility of erroneous judgment and missed judgment, and preferably, the suspected judgment result is classified and calibrated by combining a classification network algorithm so as to improve the accuracy of the blackboard-writing posture recognition, which is specifically the step 704.

Step 704, for the target judged to be the suspected blackboard-writing posture, a machine learning algorithm is adopted to carry out sitting up detection classification:

firstly, collecting picture data containing a blackboard-writing posture, preferably screening the picture data containing the blackboard-writing posture from a currently collected image frame set to serve as sample data for training, and calibrating first target frames of all blackboard-writing targets in the sample data; and performing external expansion on the basis of the first target frame in the calibration data, extracting the image of the first target frame in each image data, and making two classification samples of the blackboard-writing posture and the non-blackboard-writing posture. Preferably, the calibrated first target frame is a regular graph, so as to conveniently, flexibly and simply set the outward expansion ratio.

When the suspected blackboard-writing posture in the current frame is identified, a second target frame is generated based on the key points of the human body, the second target frame is expanded by a certain proportion, and then the target which is judged to be the suspected blackboard-writing posture in advance is subjected to image matting, namely, the image of the second target frame is extracted from the current frame, and then the extracted image is sent to a CNN classification network for classification in real time. And if the target frame image of the suspected blackboard writing is classified as the blackboard writing, recognizing as the posture of the blackboard writing. And if the target frame image suspected of being in the non-blackboard-writing posture is classified as the non-blackboard-writing posture, identifying the target frame image as the non-blackboard-writing posture.

The embodiment of the invention determines whether the current video frame has the blackboard writing behavior or not through the relative position relation of the human key points which belong to the same human target and are marked with the preset part identification. Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.

Example four: the following takes as an example the recognition of a blackboard-writing gesture that contains multiple speakers in the video.

In practical applications, the video frame may include not only a speaker but also other people such as assistant education.

Referring to fig. 8, fig. 8 is a schematic flow chart of a method for recognizing a blackboard-writing gesture including multiple targets in a video.

Step 801, recording a current frame number f;

step 802, the same as the image preprocessing in the third embodiment,

step 803, human body key point detection:

first a scene data set is collected and then the position of each human key point of each target human in the picture is noted. To facilitate other applications of gesture recognition, the labeled human body key points preferably include 15 human body key points such as head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, chest, right hip, right knee, right ankle, left hip, left knee, left ankle, etc. For recognition of the blackboard-writing gesture, only the key points of the right upper limb, i.e. right shoulder, right elbow, right wrist, may also be labeled if the amount of data labeled is to be reduced.

One of the implementation modes can be that after the labeled data of the key points of the human body are obtained, the labeled data are used as training samples of the YPN network model to train the YPN network model, and the trained YPN network model is stored; inputting the collected current video frame into the trained YPN network model, and extracting human body key features from the current video frame through the model, thereby realizing the real-time generation of human body key point information and obtaining the human body key point information J of each human body target in the frame image_k，n，fWherein, J_k，n，fThe information of the human key point n of the target human body k in the f frame is used for distinguishing different purposes in different video framesMarking different human body key point information of human body.

Step 804, as one of implementation manners, in order to capture the gesture to be recognized of the same target human body from multiple target human bodies, the same human body key point information of the same target between frames needs to be acquired, and a frame amount at least greater than 1 is favorable for acquiring the human body key point information, so that it is determined whether the currently accumulated frame number reaches a preset threshold, if so, step 805 is executed, otherwise, step 806 is executed to acquire a next video frame;

step 805, in the embodiment of the present invention, when the video frame recorded by the camera includes a plurality of target human bodies, in order to enable the current video frame and the previous video frame to recognize the writing posture of the same target human body, so that the recognition process can be continuously performed for the same person, a preset tracking algorithm is adopted in this step, and a target human body, which is the same target as the target human body corresponding to the previous video frame, in the target human body corresponding to the current video frame is selected as a locked target; and traversing each target human body until the tracking of all the target human bodies in the current video frame is completed.

And step 807, traversing preset human body key point coordinates of all the locked targets, and prejudging the blackboard-writing postures of all the locked targets to obtain suspected blackboard-writing postures of the locked targets. The specific pre-determination method may be the determination method in step 703 in the third embodiment. Since the blackboard-writing posture is the position change between the human key points in the frame, the blackboard-writing posture can be performed based on the preset human key point coordinate relationship of the current frame.

Step 808, traversing all suspected blackboard-writing target human bodies, detecting and classifying the blackboard-writing postures and the non-blackboard-writing postures by adopting a machine learning algorithm, classifying the suspected blackboard-writing target human bodies into the blackboard-writing postures, identifying the suspected blackboard-writing target human bodies into the non-blackboard-writing postures, and identifying the suspected blackboard-writing target human bodies into the non-blackboard-writing postures.

According to the method and the device, pre-judgment of the blackboard-writing posture and detection and classification based on the pre-judged suspected blackboard-writing posture are realized by detecting key points of the human body, tracking the human body and combining the position relation among the key points of the human body of the blackboard-writing posture, the postures of a plurality of target human bodies in the current video frame are recognized, and shooting control can be further triggered based on the recognized blackboard-writing posture.

FIG. 9 is a schematic diagram of an apparatus according to an embodiment of the present invention. The device comprises a plurality of devices which are connected with each other,

the image acquisition module acquires a current video frame;

The identification module is used for identifying the position of the mobile terminal,

the first identification unit is used for determining the longitudinal position change of the same preset human body key point of the current frame and the previous f frames according to preset human body key point information and determining whether the starting sitting posture condition is met or not according to the relative position change;

and the second identification unit is used for determining the relative position relation between the preset human key points according to the preset human key point information and determining whether the blackboard-writing posture condition is met or not according to the relative position relation. .

The first identification unit may comprise a first identification unit,

The second identification unit may comprise a second identification unit,

According to an embodiment of the present invention, there is provided an image pickup apparatus including a camera, a memory, and a processor, wherein,

the camera is used for shooting images;

the memory is used for storing a computer program;

the processor is used for executing the program stored in the memory and realizing the target object gesture recognition method.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.

An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps:

acquiring a current video frame;

The storage medium provided by the embodiment of the invention can accurately identify the tiny change of the gesture, has no specific requirements on the target object and the gesture, has wide application range, low requirements on the image in the video frame, high accuracy of the gesture identification and small false detection and missing detection of the gesture identification.

For the device/camera/storage medium embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

It should be noted that the gesture recognition example of the target object provided by the present invention is not limited to the above-mentioned embodiment, and the present invention can be applied to gesture recognition of other target objects, for example, recognition and correction of gestures are performed by shooting videos of a gymnastic person during a gymnastic process, and for example, for behavior tracking and shooting of animals, the preset gesture condition can be set according to position characteristics and/or motion characteristics between key points of the gesture to be recognized of the target object.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A method for recognizing the posture of a target object is characterized by comprising the following steps,

acquiring a current video frame;

2. The identification method of claim 1, wherein the method further comprises,

3. The recognition method according to claim 2, wherein the current frame in which the current posture of the object is recognized is input to the trained machine learning model, and if the machine learning model recognizes that the posture of the object in the current frame is the preset posture, the preset posture is taken as a recognition result, including,

collecting picture data containing the pose of the object,

4. The identification method according to claim 1, further comprising after said obtaining the preset keypoint information of the target object in the current frame,

5. The identification method according to one of claims 1 to 4, characterized in that the method further comprises,

6. The recognition method of claim 5, wherein the first predetermined posture condition is an in-position posture condition,

7. The identification method according to claim 6, wherein the determining the longitudinal position change of the same preset human key point of the current frame and the previous f frames according to the preset human key point information and determining whether the sitting posture condition is met according to the relative position change comprises,

8. The identification method according to claim 7, wherein the determining the longitudinal position change of the left shoulder and right shoulder human key points in the current frame and the previous f frame according to the preset left shoulder and right shoulder human key point information, and determining whether the sitting posture condition is met according to the position change comprises,

9. The recognition method according to claim 8, wherein the current frame in which the current posture of the object is recognized is input to the trained machine learning model, and if the machine learning model recognizes that the posture of the object in the current frame is the preset posture, the preset posture is taken as a recognition result, including,

10. The identification method of claim 9, further comprising,

here, M, T is a preset natural number.

11. The recognition method of claim 5, wherein the second preset pose condition is a blackboard-writing pose condition,

12. The recognition method according to claim 11, wherein the determining a relative position relationship between the preset human key points according to the preset human key point information and determining whether the blackboard-writing posture condition is met according to the relative position relationship comprises,

13. The identification method of claim 12,

14. An apparatus for recognizing the posture of an object, comprising,

the image acquisition module acquires a current video frame;

15. The apparatus of claim 14, further comprising,

16. The apparatus of claim 15, wherein the detection classification module comprises,

17. The apparatus of claim 14, further comprising,

18. The apparatus of any of claims 14 to 17, further comprising,

19. The apparatus of claim 18, wherein the first predetermined posture condition is an in-position posture condition, the identification module comprises,

20. The apparatus of claim 19, wherein the first identification unit comprises,

21. The apparatus of claim 20, further comprising,

22. The apparatus of claim 18, wherein the second preset pose condition is a blackboard-writing pose condition, the recognition module comprises,

23. The apparatus of claim 22, wherein the second identification unit comprises,

24. A camera device, comprising a camera, a memory and a processor, wherein,

the camera is used for shooting images;

the memory is used for storing a computer program;

the processor is configured to execute the program stored in the memory to implement the object posture identifying method according to any one of claims 1 to 13.

25. A storage medium storing a computer program for implementing the object posture identifying method according to any one of claims 1 to 13.