WO2022116829A1 - Human behavior recognition method and apparatus, computer device and readable storage medium

Info

Publication number: WO2022116829A1
Application number: PCT/CN2021/131200
Authority: WO
Inventors: 林灿然; 程骏; 庞建新
Original assignee: 深圳市优必选科技股份有限公司
Priority date: 2020-12-01
Filing date: 2021-11-17
Publication date: 2022-06-09
Also published as: CN112418135A

Abstract

The present application relates to the technical field of behavior analysis, and provides a human behavior recognition method and apparatus, a computer device and a readable storage medium. The method comprises: obtaining position information and a degree of confidence for each of multiple human body key points of a target person in an image to be recognized, and normalizing the position information of the multiple human body key points to obtain a relative position relationship between them; then calling a pre-stored SVM behavior classifier to perform data analysis on the degrees of confidence of the multiple human body key points and the relative position relationship between them, to obtain the human behavior category of the target person in the image. In this manner, the accuracy of human behavior recognition is improved by combining the position information and degrees of confidence of the human body key points in the recognition process, and the efficiency of human behavior recognition is increased by the high operating efficiency of the SVM classifier.

Description

Human behavior recognition method, device, computer equipment and readable storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202011388513.6, entitled "Human Behavior Recognition Method, Device, Computer Equipment and Readable Storage Medium" and filed with the China Patent Office on December 1, 2020, the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the technical field of behavior analysis, and in particular, to a method, device, computer equipment and readable storage medium for identifying human behavior.

Background

With the improvement of hardware computing power and the rise of artificial intelligence, the application of computer vision technology is more and more extensive, among which behavior analysis technology is an important branch of current computer vision technology. Behavior analysis technology can identify the specific types of human behavior, and correspondingly analyze the specific intention of human behavior in the application field of the behavior analysis technology, so as to effectively improve the service effect of electronic equipment. For behavior analysis technology, the specific efficiency and accuracy of human behavior recognition are important factors that affect the final effect of behavior analysis.

Summary

In view of this, the purpose of the present application includes providing a human behavior recognition method, device, computer equipment and readable storage medium, which can improve the human behavior recognition accuracy while improving the human behavior recognition efficiency.

In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:

In a first aspect, the present application provides a method for recognizing human behavior, the method comprising:

Obtaining an image to be recognized, and performing human body key point detection on the image to be recognized, to obtain respective position information and confidence levels of multiple human key points corresponding to the target person in the image to be recognized;

Normalizing the respective position information of the multiple human body key points corresponding to the target person, to obtain the relative positional relationship between the multiple human body key points corresponding to the target person;

Calling a pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the image to be recognized.

In an optional implementation manner, the SVM behavior classifier corresponds to a plurality of identifiable behavior categories, and the step of calling the pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the image to be recognized, includes:

Calling the SVM behavior classifier to calculate, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, the probability value of the target person being classified into each identifiable behavior category;

Extract the maximum probability value from the calculated probability values of each identifiable behavior category, and compare the maximum probability value with a preset probability threshold;

If the maximum probability value is equal to or greater than the preset probability threshold, the identifiable behavior category corresponding to the maximum probability value is used as the human behavior category of the target person.
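
By way of illustration only, the three steps above can be sketched as follows. The function name `pick_behavior`, the dictionary input and the example probabilities are hypothetical, and returning `None` when the threshold is not reached is an assumption of this sketch, since this passage does not state what happens in that case.

```python
from typing import Dict, Optional


def pick_behavior(probabilities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable category if it reaches the threshold, else None."""
    if not probabilities:
        return None
    # Extract the category holding the maximum probability value.
    best_category = max(probabilities, key=probabilities.get)
    best_probability = probabilities[best_category]
    # Accept only when the maximum is equal to or greater than the threshold.
    if best_probability >= threshold:
        return best_category
    # Assumption of this sketch: no category is reported below the threshold.
    return None


# Hypothetical classifier output for one target person.
example = {
    "sitting": 0.08,
    "standing": 0.81,
    "lying down": 0.05,
    "raising hands": 0.04,
    "squatting": 0.02,
}
print(pick_behavior(example, threshold=0.6))  # standing
print(pick_behavior(example, threshold=0.9))  # None
```

The comparison uses `>=` because the condition above is "equal to or greater than" the preset probability threshold.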

In an optional embodiment, the method further includes:

Obtaining respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons assigned to the same behavior category, and the number of sample persons is the same for every behavior category;

For each sample person, normalizing the position information of the multiple human body key points in the corresponding sample image, to obtain the relative positional relationship between the multiple human body key points of the sample person in the sample image;

Performing model training on an initial SVM classifier according to the confidence levels of the multiple human body key points of the sample persons of the different behavior categories in their corresponding sample images and the relative positional relationship between those key points, to obtain the SVM behavior classifier.
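
As a hedged sketch of how the training inputs described above might be assembled: the names `to_feature_vector` and `build_training_set` are hypothetical, and the flat layout `[x0, y0, c0, x1, y1, c1, ...]` is an assumption, because the text does not specify how confidence levels and relative positions are combined. The sketch only prepares data and enforces the equal-count condition; it does not train a classifier.

```python
from typing import Dict, List, Sequence, Tuple

# One key point as (normalized_x, normalized_y, confidence).
KeyPoint = Tuple[float, float, float]


def to_feature_vector(key_points: Sequence[KeyPoint]) -> List[float]:
    """Flatten key points into [x0, y0, c0, x1, y1, c1, ...] (assumed layout)."""
    features: List[float] = []
    for x, y, confidence in key_points:
        features.extend((x, y, confidence))
    return features


def build_training_set(
    samples_by_category: Dict[str, List[Sequence[KeyPoint]]],
) -> Tuple[List[List[float]], List[str]]:
    """Return (features, labels), requiring the same sample count per category."""
    counts = {name: len(samples) for name, samples in samples_by_category.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"unbalanced categories: {counts}")
    features: List[List[float]] = []
    labels: List[str] = []
    for name, samples in samples_by_category.items():
        for key_points in samples:
            features.append(to_feature_vector(key_points))
            labels.append(name)
    return features, labels


# Toy data: one sample person per category, two key points each (made up).
toy = {
    "sitting": [[(0.0, 0.0, 0.9), (1.0, 1.0, 0.8)]],
    "standing": [[(0.5, 0.0, 0.7), (0.5, 1.0, 0.6)]],
}
features, labels = build_training_set(toy)
print(labels)  # ['sitting', 'standing']
```

The resulting `features` and `labels` could then be passed to any SVM implementation that reports per-class probabilities, for example scikit-learn's `SVC(probability=True)`; that choice of library is an assumption and is not named in the source.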

In an optional embodiment, for a target person or a sample person, the step of normalizing the position information of the multiple human body key points in the corresponding image to obtain the relative positional relationship between the multiple human body key points includes:

Determining the area height and area width of the minimum circumscribed rectangular area of the multiple human body key points according to the original image abscissa and ordinate values of the multiple human body key points in the corresponding image, and determining the original image abscissa value and original image ordinate value of a human body reference point;

For each human body key point, dividing the difference between the original image abscissa value of the human body key point and the original image abscissa value of the human body reference point by the area width, to obtain the normalized abscissa value of the human body key point;

For each human body key point, dividing the difference between the original image ordinate value of the human body key point and the original image ordinate value of the human body reference point by the area height, to obtain the normalized ordinate value of the human body key point.

In a second aspect, the present application provides a human behavior recognition device, the device comprising:

A human body detection module, configured to obtain an image to be recognized, and perform human body key point detection on the image to be recognized, to obtain respective position information and confidence levels of multiple human body key points corresponding to the target person in the image to be recognized;

A normalization processing module, configured to normalize the respective position information of the multiple human body key points corresponding to the target person, to obtain the relative positional relationship between the multiple human body key points corresponding to the target person;

A behavior recognition module, configured to call a pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the image to be recognized.

In an optional implementation manner, the SVM behavior classifier corresponds to a plurality of identifiable behavior categories, and the behavior identification module includes:

A probability calculation sub-module, configured to call the SVM behavior classifier to calculate, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, the probability value of the target person being classified into each identifiable behavior category;

a probability comparison submodule, used for extracting the maximum probability value from the calculated probability values of each identifiable behavior category, and comparing the maximum probability value with a preset probability threshold;

A category output sub-module, configured to use the identifiable behavior category corresponding to the maximum probability value as the human behavior category of the target person if the maximum probability value is equal to or greater than the preset probability threshold.

In an optional embodiment, the device further comprises:

A sample acquisition module, configured to acquire respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons assigned to the same behavior category, and the number of sample persons is the same for every behavior category;

The normalization processing module is further configured to normalize, for each sample person, the position information of the multiple human body key points in the corresponding sample image, to obtain the relative positional relationship between the multiple human body key points of the sample person in the sample image;

A classifier training module, configured to perform model training on an initial SVM classifier according to the confidence levels of the multiple human body key points of the sample persons of the different behavior categories in their corresponding sample images and the relative positional relationship between those key points, to obtain the SVM behavior classifier.

In an optional embodiment, the manner in which the normalization processing module normalizes, for the target person or a sample person, the position information of the multiple human body key points in the corresponding image to obtain the relative positional relationship between the multiple human body key points includes:

In a third aspect, the present application provides a computer device including a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the method for recognizing human behavior described in any one of the foregoing embodiments.

In a fourth aspect, the present application provides a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the method for recognizing human behavior described in any one of the foregoing embodiments.

The beneficial effects of the embodiments of the present application include the following:

The present application obtains the respective position information and confidence levels of multiple human body key points of the target person in the image to be recognized, and normalizes the respective position information of the multiple human body key points to obtain the relative positional relationship between them. It then calls the pre-stored SVM behavior classifier to analyze the confidence levels of the multiple human body key points and the relative positional relationship between them, and obtains the human behavior category of the target person in the image to be recognized. The accuracy of human behavior recognition is thus improved by combining the position information and confidence levels of the human body key points in the recognition process, and the efficiency of human behavior recognition is improved by the operating efficiency of the SVM classifier.

In order to make the above-mentioned objects, features and advantages of the present application more obvious and easy to understand, the preferred embodiments are exemplified below, and are described in detail as follows in conjunction with the accompanying drawings.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of the composition of a computer device provided by an embodiment of the present application;

FIG. 2 is one of the schematic flow charts of the method for recognizing human behavior provided by an embodiment of the present application;

FIG. 3 is a schematic table of position information of multiple human body key points corresponding to the same person in a person image provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the relative positional relationship between a plurality of human body key points shown in FIG. 3;

FIG. 5 is a schematic flowchart of the sub-steps included in step S230 in FIG. 2;

FIG. 6 is the second schematic flowchart of the method for recognizing human behavior provided by the embodiment of the present application;

FIG. 7 is one of the schematic diagrams of the composition of the human behavior recognition device provided by the embodiment of the present application;

FIG. 8 is a schematic diagram of the composition of the behavior recognition module in FIG. 7;

FIG. 9 is the second schematic diagram of the composition of the apparatus for recognizing human behavior provided by the embodiment of the present application.

Reference numerals: 10-computer device; 11-memory; 12-processor; 13-communication unit; 100-human behavior recognition apparatus; 110-human body detection module; 120-normalization processing module; 130-behavior recognition module; 131-probability calculation sub-module; 132-probability comparison sub-module; 133-category output sub-module; 140-sample acquisition module; 150-classifier training module.

Detailed description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.

Thus, the following detailed description of the embodiments of the application provided in the accompanying drawings is not intended to limit the scope of the application as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

In the description of this application, it is to be understood that relational terms such as "first" and "second" are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprising", "including" or any other variation thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article or device that includes a list of elements includes not only those elements, but also other elements not explicitly listed or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in a process, method, article or device that includes the element. For those of ordinary skill in the art, the specific meanings of the above terms in this application can be understood according to the specific situation.

Through painstaking research, the applicant found that existing human behavior recognition solutions are usually implemented in one of two ways. The first uses an optical flow algorithm to predict the trend of a behavior, and the second uses a convolutional neural network to perform human behavior recognition. The first approach spends a great deal of time extracting optical flow and is easily disturbed by environmental noise during recognition, so it is severely limited in real environments: once an occluding object appears in the environment, its recognition performance drops sharply. The second approach needs a huge amount of sample image data for model training; the training process is complicated, the trained model does not converge easily and its recognition efficiency is low, and deploying the model in practice reduces its recognition accuracy. In addition, when the scene of the image to be recognized differs greatly from that of the sample image data, the recognition performance of the model also drops sharply, so overall robustness is poor.

In this case, in order to reduce the interference of image background changes with human behavior recognition, improve the accuracy of human behavior recognition, and at the same time improve the efficiency of human behavior recognition, the embodiments of the present application provide a human behavior recognition method, a device, a computer device and a readable storage medium to achieve the aforementioned effects.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Please refer to FIG. 1, which is a schematic diagram of the composition of a computer device 10 provided by an embodiment of the present application. In the embodiment of the present application, the computer device 10 can effectively reduce the interference of the image background of a person image to be recognized with the human behavior recognition process, and can quickly and accurately recognize the behavior of the person in that image. The computer device 10 may be, but is not limited to, a smart phone, a tablet computer, a personal computer, a server, a robot with an image acquisition function, and the like.

In this embodiment, the computer device 10 may include a memory 11, a processor 12, a communication unit 13 and a human behavior recognition apparatus 100. The memory 11, the processor 12 and the communication unit 13 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, the memory 11, the processor 12 and the communication unit 13 can be electrically connected to each other through one or more communication buses or signal lines.

In this embodiment, the memory 11 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 11 is used for storing a computer program, and the processor 12 can execute the computer program after receiving an execution instruction.

The memory 11 is also used to store an SVM behavior classifier, which is a classifier model obtained by training an SVM (Support Vector Machine) classifier on the relevant information of human body key points. The SVM behavior classifier is used to identify human behavior categories. It corresponds to a plurality of identifiable behavior categories, which may be, but are not limited to, sitting, standing, lying down, raising hands, and squatting. The number of human body key points is usually 18, namely the nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, left eye, right eye, left ear and right ear. The relevant information of a human body key point includes its position information and confidence level in the person image; the confidence level indicates the reliability of the position extracted for that key point in the person image, and the position information may be represented by the original image abscissa and ordinate values of the key point in the person image.
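
For illustration, the relevant information of the 18 human body key points could be held in a structure such as the following. The names `KEY_POINT_NAMES`, `KeyPoint` and `make_person` are hypothetical, and the index order simply follows the order in which the key points are listed above; an actual key point detector may use a different order.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The 18 key points, in the order the description lists them (assumed indexing).
KEY_POINT_NAMES: List[str] = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "left_eye", "right_eye", "left_ear", "right_ear",
]


@dataclass(frozen=True)
class KeyPoint:
    name: str
    x: float           # original image abscissa value
    y: float           # original image ordinate value
    confidence: float  # reliability of the extracted position


def make_person(rows: Iterable[Tuple[float, float, float]]) -> List[KeyPoint]:
    """Build one person's key points from (x, y, confidence) rows."""
    rows = list(rows)
    if len(rows) != len(KEY_POINT_NAMES):
        raise ValueError(f"expected {len(KEY_POINT_NAMES)} rows, got {len(rows)}")
    return [
        KeyPoint(name, float(x), float(y), float(c))
        for name, (x, y, c) in zip(KEY_POINT_NAMES, rows)
    ]


# Made-up rows, only to show the shape of the data.
person = make_person((10.0 * i, 5.0 * i, 0.5) for i in range(18))
print(len(person), person[1].name)  # 18 neck
```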

In this embodiment, the processor 12 may be an integrated circuit chip with signal processing capability. The processor 12 may be a general-purpose processor, including at least one of a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc., and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.

In this embodiment, the communication unit 13 is configured to establish a communication connection between the computer device 10 and other electronic devices through a network, and to send and receive data through the network, wherein the network includes a wired communication network and a wireless communication network. For example, the computer device 10 may acquire, from other electronic devices through the communication unit 13, images to be recognized on which human behavior recognition is to be performed.

In this embodiment, the human behavior recognition apparatus 100 includes at least one software function module that can be stored in the memory 11 or built into the operating system of the computer device 10 in the form of software or firmware. The processor 12 may be configured to execute executable modules stored in the memory 11, such as the software function modules and computer programs included in the human behavior recognition apparatus 100. Through the human behavior recognition apparatus 100, the computer device 10 reduces the interference of the image background of the person image to be recognized with the human behavior recognition process, and quickly and accurately recognizes the behavior of the person in that image, thereby improving the human behavior recognition effect.

It can be understood that the block diagram shown in FIG. 1 is only a schematic diagram of the composition of the computer device 10, and the computer device 10 may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1. Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.

In the present application, in order to ensure that the computer device 10 can perform accurate and fast human behavior recognition on the image to be recognized, and thereby improve both the accuracy and the efficiency of human behavior recognition, the embodiments of the present application provide a human behavior recognition method to achieve the aforementioned objects. The human behavior recognition method provided by the present application is described in detail below.

Optionally, please refer to FIG. 2, which is one of the schematic flowcharts of the method for recognizing human behavior provided by the embodiment of the present application. In the embodiment of the present application, the specific flow and specific steps of the method for recognizing human behavior shown in FIG. 2 are as follows.

Step S210, acquiring an image to be recognized, and performing human body key point detection on the image to be recognized, to obtain respective position information and confidence levels of multiple human body key points corresponding to the target person in the image to be recognized.

In this embodiment, the image to be recognized may be acquired by the computer device 10 from other electronic devices through the communication unit 13, or may be captured by a camera additionally included in the computer device 10. After the computer device 10 acquires the image to be recognized, it can use a Human Pose Estimation (HPE) algorithm to perform human body key point detection on the image, take each person in the image as a target person, and obtain the position information of the multiple human body key points corresponding to the target person in the image together with the confidence level of each of those key points. The position information of each human body key point corresponding to the target person may be represented by the original image abscissa and ordinate values of the key point in the image to be recognized. In an implementation manner of this embodiment, the computer device 10 may use the AlphaPose algorithm to detect human body key points.

Step S220, normalizing the respective position information of the multiple human body key points corresponding to the target person, to obtain the relative positional relationship between the multiple human body key points corresponding to the target person.

In this embodiment, after determining, from a person image, the respective position information of the multiple human body key points corresponding to a certain person, the computer device 10 normalizes the position information of those key points in the corresponding image to obtain the relative positional relationship between them, so that the multiple human body key points corresponding to the person are not affected by where the person is located in the corresponding image, which helps maintain the recognition accuracy of the subsequent human behavior recognition process.

In an implementation of this embodiment, the relative positional relationship may be expressed as specific coordinate values, normalized to the interval from 0 to 1, that describe the position of each human body key point of the same person in the corresponding image relative to a human body reference point. The human body reference point may be any one of the human body key points of the same person; it may be the coordinate point formed by the minimum original image abscissa value and the minimum original image ordinate value among the human body key points; or it may be the coordinate average point of the human body key points of the same person, whose abscissa value is the average of the original image abscissa values of all the corresponding human body key points and whose ordinate value is the average of the original image ordinate values of all the corresponding human body key points. The specific human body reference point can be configured according to the normalization processing requirements.
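
The three choices of human body reference point described above can be sketched as follows. The function name `reference_point` and its `mode` and `index` parameters are hypothetical, and the coordinates are made up for the example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def reference_point(points: Sequence[Point], mode: str = "min_corner", index: int = 0) -> Point:
    """Pick a human body reference point from one person's key point coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if mode == "key_point":
        # Any one of the person's own key points.
        return points[index]
    if mode == "min_corner":
        # Minimum original image abscissa and minimum original image ordinate.
        return (min(xs), min(ys))
    if mode == "mean":
        # Coordinate average point over all key points.
        return (sum(xs) / len(xs), sum(ys) / len(ys))
    raise ValueError(f"unknown mode: {mode}")


points = [(10.0, 10.0), (180.0, 180.0), (50.0, 110.0)]
print(reference_point(points, "min_corner"))    # (10.0, 10.0)
print(reference_point(points, "mean"))          # (80.0, 100.0)
print(reference_point(points, "key_point", 1))  # (180.0, 180.0)
```

Note that when coordinates are later divided by the bounding area width and height, only the minimum-corner choice keeps every normalized value within 0 to 1; with the other two choices some values can be negative.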

At this time, the step in which the computer device 10 normalizes the position information of the multiple human body key points of the person in the corresponding image, to obtain the relative positional relationship between the multiple human body key points of the person in the corresponding image, may include:

In this process, the area height and area width of the minimum circumscribed rectangular area of the multiple human body key points can be calculated using the following formulas:

$H = Y_{max} - Y_{min}, \quad W = X_{max} - X_{min}$

Here, H and W respectively represent the area height and area width of the minimum circumscribed rectangular area corresponding to the multiple human body key points of the person; Y_max and Y_min respectively represent the maximum and minimum original image ordinate values among the multiple human body key points of the person; and X_max and X_min respectively represent the maximum and minimum original image abscissa values among the multiple human body key points of the person. Taking FIG. 3 as an example, Y_max and Y_min in FIG. 3 are 180 and 10 respectively, and X_max and X_min are 180 and 10 respectively, so the area height and area width of the minimum circumscribed rectangular area corresponding to FIG. 3 are both 170.

At the same time, for a certain human body key point, the following formulas can be used to calculate the normalized abscissa value and normalized ordinate value of the human body key point:

$$X_{ni} = \frac{X_i - X_T}{W}, \qquad Y_{ni} = \frac{Y_i - Y_T}{H}$$

Here, X_i represents the original-image abscissa value of the i-th human body key point of the person, Y_i represents the original-image ordinate value of the i-th human body key point, X_ni represents the normalized abscissa value of the i-th human body key point, Y_ni represents its normalized ordinate value, X_T represents the original-image abscissa value of the person's human body reference point, and Y_T represents the original-image ordinate value of that reference point. Taking FIG. 3 as an example, the person's minimum original-image abscissa value X_min can be used as the original-image abscissa value X_T of the human body reference point, and the person's minimum original-image ordinate value Y_min can be used as the original-image ordinate value Y_T of the human body reference point; the above formulas then yield the relative positional relationship between the human body key points shown in FIG. 4.
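The normalization described above can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation: the function name `normalize_keypoints`, the choice of the minimum-coordinate corner as the human body reference point, the handling of a degenerate rectangle, and the sample coordinates are all assumptions. Only the extreme values 10 and 180 are taken from the FIG. 3 example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (original-image abscissa, original-image ordinate)


def normalize_keypoints(points: List[Point]) -> List[Point]:
    """Normalize key points against their minimum circumscribed rectangle.

    Assumption: the reference point (X_T, Y_T) is the corner (X_min, Y_min),
    which is one of the options the description allows.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    width = x_max - x_min    # W = X_max - X_min
    height = y_max - y_min   # H = Y_max - Y_min
    if width == 0 or height == 0:
        # Assumption: the source does not say how a degenerate rectangle is handled.
        raise ValueError("key points span a rectangle of zero width or height")

    x_ref, y_ref = x_min, y_min
    # X_ni = (X_i - X_T) / W,  Y_ni = (Y_i - Y_T) / H
    return [((x - x_ref) / width, (y - y_ref) / height) for x, y in points]


# Hypothetical key points whose extremes match the FIG. 3 values (10 and 180).
sample_points = [(10.0, 10.0), (95.0, 52.5), (180.0, 180.0)]
normalized = normalize_keypoints(sample_points)
print(normalized)
```

With the minimum-coordinate corner as the reference point, every normalized value falls in the interval from 0 to 1. With another reference point, such as the coordinate average point, the values would be offset accordingly.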

Therefore, after the computer device 10 determines, from the to-be-recognized image, the respective position information of the multiple human body key points corresponding to a certain target person, it normalizes the position information of the target person's multiple human body key points in the to-be-recognized image to obtain the relative positional relationship between the target person's multiple human body key points in the to-be-recognized image. At this time, the step of normalizing the position information of the target person's multiple human body key points in the to-be-recognized image to obtain the relative positional relationship between them can include:

Determining, according to the original-image abscissa and ordinate values of the target person's multiple human body key points in the to-be-recognized image, the area height and area width of the minimum circumscribed rectangular area of the multiple human body key points, as well as the original-image abscissa and ordinate values of the corresponding human body reference point;

For each human body key point of the target person, dividing the difference between the original-image abscissa value of the key point and the original-image abscissa value of the human body reference point by the area width to obtain the normalized abscissa value of the key point;

For each human body key point of the target person, dividing the difference between the original-image ordinate value of the key point and the original-image ordinate value of the human body reference point by the area height to obtain the normalized ordinate value of the key point.

Step S230, calling the pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, and obtain the human behavior category of the target person in the image to be recognized.

In this embodiment, the SVM behavior classifier is obtained by training an SVM classifier on the confidence levels and relative position information of the multiple human body key points corresponding to different persons. The SVM behavior classifier has good fault tolerance, can handle high-dimensional problems, and runs fast without requiring a huge amount of training data, so the efficiency of human behavior recognition can be improved through the SVM behavior classifier. At the same time, the present application characterizes the specific behavior of the corresponding person by extracting human body key points, which is not disturbed by background noise in the image of the person and better highlights the human behavior. By combining the position information and confidence levels of the human body key points into the process of human behavior category recognition, the interference of image background changes with the human behavior recognition effect is reduced, and the accuracy of human behavior recognition is improved.

The SVM behavior classifier has a certain fault tolerance owing to the introduction of slack variables and can ignore the interference of some outlying noise points, so the model generalizes better and has strong recognition ability. Through a kernel function, the SVM behavior classifier can map a linearly inseparable problem into a high-dimensional space in which a separating plane can be found, so that the problem becomes linearly separable and easier to solve; the SVM therefore does not need to consider the sample dimension, can handle high-dimensional problems, and fits well. At the same time, the SVM behavior classifier is a machine learning algorithm that does not require building a complex neural network structure. It only needs to learn the support vectors and to operate on a small number of low-dimensional tensors, so training and inference are very fast. Unlike a neural network model, which requires a large amount of training data to become robust, the SVM behavior classifier can fit the relationship well from only a small amount of data. Therefore, by performing representation learning on the human body key point features (including the confidence levels and relative position information of the multiple human body key points corresponding to a person in a human image), the SVM behavior classifier in this application reduces the complexity of model training and the amount of data required for training, and improves the recognition accuracy and robustness of the model.
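For reference, the slack variables and the kernel function mentioned above appear in the standard textbook soft-margin SVM for the two-class case, shown below. The application does not state which formulation, penalty coefficient or kernel it uses, so the symbols here (feature vector x_i, label y_i in {-1, +1}, slack variable ξ_i, penalty coefficient C, feature map φ and kernel K) are the usual textbook notation rather than the applicant's.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i
\qquad \text{s.t.}\quad y_i\bigl(w^{\top}\varphi(x_i) + b\bigr) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m

f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{m} \alpha_i\, y_i\, K(x_i, x) + b\Bigr),
\qquad K(x_i, x) = \varphi(x_i)^{\top}\varphi(x)
```

A slack variable ξ_i greater than zero lets a noisy sample violate the margin at a cost controlled by C. Only the samples with non-zero α_i (the support vectors) contribute to the decision function, which is why inference needs only a small set of stored vectors.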

In this embodiment, when the computer device 10 has determined, from the image to be recognized, the respective normalized abscissa values, normalized ordinate values and confidence levels of the multiple human body key points corresponding to a certain target person, it takes these values as the multi-dimensional feature of that target person and inputs the multi-dimensional feature into the SVM behavior classifier. The SVM behavior classifier analyzes the multi-dimensional feature to determine which of the multiple identifiable behavior categories corresponding to the SVM behavior classifier the human behavior represented by the multi-dimensional feature belongs to.

Therefore, by combining the position information and confidence levels of the human body key points into the process of human behavior category recognition, the present application reduces the interference of image background changes with the human behavior recognition effect and improves the accuracy of human behavior recognition; and by virtue of the high operating efficiency of the SVM classifier, it improves the efficiency of human behavior recognition, thereby improving the overall effect of human behavior recognition. In an implementation of this embodiment, if the same person corresponds to 18 human body key points, the multi-dimensional feature of the person is a 54-dimensional feature, which includes the normalized abscissa value, normalized ordinate value and confidence level of each of the person's 18 human body key points.
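The 54-dimensional feature for 18 key points can be sketched as follows. The application does not specify how the three values per key point are ordered inside the feature, so the interleaved layout (x, y, confidence, x, y, confidence, ...) used here is an assumption, as are the function name `build_feature_vector` and the placeholder key point values.

```python
from typing import List, Tuple

# (normalized abscissa, normalized ordinate, confidence) for one key point
Keypoint = Tuple[float, float, float]


def build_feature_vector(keypoints: List[Keypoint]) -> List[float]:
    """Flatten per-key-point triples into one multi-dimensional feature.

    Assumption: values are interleaved per key point; the source only states
    that the feature contains all three values for every key point.
    """
    features: List[float] = []
    for x_norm, y_norm, confidence in keypoints:
        features.extend([x_norm, y_norm, confidence])
    return features


# Hypothetical placeholder data for 18 key points.
keypoints = [(i / 17, 1 - i / 17, 0.9) for i in range(18)]
feature_vector = build_feature_vector(keypoints)
print(len(feature_vector))  # 18 key points x 3 values = 54
```

Whatever ordering is chosen, it has to be identical at training time and at recognition time, since the classifier treats each position in the vector as a distinct dimension.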

In this process, please refer to FIG. 5 , which is a schematic flowchart of the sub-steps included in step S230 in FIG. 2 . In this embodiment, the SVM behavior classifier corresponds to a plurality of identifiable behavior categories, and the step S230 may include sub-steps S231 to S233.

Sub-step S231, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, call the SVM behavior classifier to calculate the probability value that the target person is divided into each identifiable behavior category.

In this embodiment, when the computer device 10 has determined, from the image to be recognized, the respective normalized abscissa values, normalized ordinate values and confidence levels of the multiple human body key points corresponding to a certain target person, it takes these values as the multi-dimensional feature of that target person and inputs the multi-dimensional feature into the SVM behavior classifier. The SVM behavior classifier then calculates, according to the multi-dimensional feature of the target person, the probability value that the target person is classified into each identifiable behavior category corresponding to the SVM behavior classifier.

Sub-step S232: Extract the maximum probability value from the calculated probability values of each identifiable behavior category, and compare the maximum probability value with a preset probability threshold.

In this embodiment, the preset probability threshold is used to verify the recognition reliability of the SVM behavior classifier. Among the probability values, calculated by the SVM behavior classifier, that the target person is classified into the different identifiable behavior categories, the maximum probability value represents the maximum possibility that the human behavior of the target person is effectively recognized. The computer device 10 determines whether the human behavior of the target person can be effectively recognized by comparing the maximum probability value with the preset probability threshold.

If the maximum probability value is less than the preset probability threshold, none of the identifiable behavior categories covered by the current SVM behavior classifier matches the current human behavior of the target person, and the current SVM behavior classifier cannot effectively recognize that behavior; if the maximum probability value is greater than or equal to the preset probability threshold, the current SVM behavior classifier can effectively recognize the current human behavior of the target person. In an implementation of this embodiment, the value of the preset probability threshold may be, but is not limited to, any one of 50%, 55% and 60%.

Sub-step S233, if the maximum probability value is equal to or greater than the preset probability threshold, the identifiable behavior category corresponding to the maximum probability value is used as the human behavior category of the target person.

In this embodiment, when the maximum probability value among the calculated probability values that the target person is classified into the different identifiable behavior categories is greater than or equal to the preset probability threshold, the current SVM behavior classifier can effectively recognize the current human behavior of the target person. In this case, the identifiable behavior category corresponding to the maximum probability value can be used as the human behavior category of the target person in the to-be-recognized image.

As a result, by performing the above sub-steps S231 to S233, the present application can use the SVM behavior classifier to effectively analyze the multi-dimensional feature of the target person in the image to be recognized, so as to identify the most likely human behavior category of the target person in the to-be-recognized image and improve the accuracy of human behavior recognition as much as possible.
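The decision logic of sub-steps S232 and S233 can be sketched as below, starting from probability values that sub-step S231 is assumed to have already produced. The function name `select_behavior_category`, the category names and the probability values are hypothetical. Returning `None` when the maximum probability is below the threshold is also an assumption, since the application only says that the behavior cannot be effectively recognized in that case.

```python
from typing import Dict, Optional


def select_behavior_category(
    probabilities: Dict[str, float], threshold: float = 0.5
) -> Optional[str]:
    """Pick the most probable category if it reaches the preset threshold.

    `probabilities` maps each identifiable behavior category to the probability
    computed by the classifier (sub-step S231).
    """
    if not probabilities:
        return None
    # Sub-step S232: extract the maximum probability value and compare it.
    best_category = max(probabilities, key=probabilities.get)
    if probabilities[best_category] >= threshold:
        # Sub-step S233: accept the category with the maximum probability.
        return best_category
    # Assumption: signal "not effectively recognized" with None.
    return None


# Hypothetical categories and probability values.
confident = select_behavior_category({"standing": 0.72, "sitting": 0.20, "lying": 0.08})
uncertain = select_behavior_category({"standing": 0.40, "sitting": 0.35, "lying": 0.25})
print(confident, uncertain)
```

The default threshold of 0.5 corresponds to the 50% example value given in the description; 0.55 or 0.60 could be passed instead.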

At the same time, by performing the above steps S210 to S230, the present application combines the position information and confidence levels of the human body key points into the process of human behavior category recognition, which reduces the interference of image background changes with the human behavior recognition effect and improves the accuracy of human behavior recognition; and by virtue of the operating efficiency of the SVM classifier, the efficiency of human behavior recognition is improved, thereby improving the overall effect of human behavior recognition.

Optionally, please refer to FIG. 6, which is the second schematic flowchart of the human behavior recognition method provided by the embodiment of the present application. In the embodiment of the present application, compared with the human behavior recognition method shown in FIG. 2, the human behavior recognition method shown in FIG. 6 may further include steps S240 to S260. By performing steps S240 to S260, the training of the SVM behavior classifier can be completed using a relatively small amount of human body key point feature sample data, which reduces the complexity and computation of model training and the amount of data required for training, and improves the recognition accuracy and robustness of the model.

Step S240, obtaining respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons classified into the same behavior category, and the number of sample persons corresponding to different behavior categories is the same.

In this embodiment, the multiple sample persons corresponding to the same sample behavior data set are each labeled with the same behavior category in the corresponding sample images, the number of sample persons corresponding to different sample behavior data sets is the same, and each sample behavior data set corresponds to one behavior category.

In an implementation of this embodiment, for each behavior category, N sample collection objects can be selected and divided into N/n batches of n people each, and behavior image data of the same behavior category is collected for each batch, with every batch taking the same amount of time to collect behavior image data. During each batch, the sample collection objects may be required to exhibit small variations in motion (for example, tilting the body or swinging the head), but the overall human behavior category needs to remain consistent, so that the final number of sample persons for each behavior category equals the number of corresponding behavior image frames multiplied by n.

Then, human body key point detection is performed on each behavior image frame to determine the position information and confidence levels of the multiple human body key points corresponding to each sample person in the behavior image frame. The position information and confidence levels of the multiple human body key points of the sample persons corresponding to the same behavior category are then integrated to obtain the respective sample behavior data sets of the different behavior categories, so that each sample behavior data set includes the position information and confidence levels of the multiple human body key points, in the corresponding sample images, of the multiple sample persons classified into the same behavior category.

Step S250, normalize the position information of multiple human body key points in the corresponding sample image for each sample person to obtain the relative positional relationship between the multiple human body key points of the sample person in the sample image.

In this embodiment, the specific execution process of the step S250 is similar to the specific execution process of the above step S220, so the step S250 may include the following content:

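For each sample person, determining, according to the original-image abscissa and ordinate values of the sample person's multiple human body key points in the corresponding sample image, the area height and area width of the minimum circumscribed rectangular area of the multiple human body key points, as well as the original-image abscissa and ordinate values of the corresponding human body reference point;

For each human body key point of the sample person, dividing the difference between the original-image abscissa value of the key point and the original-image abscissa value of the human body reference point by the area width to obtain the normalized abscissa value of the key point;

For each human body key point of the sample person, dividing the difference between the original-image ordinate value of the key point and the original-image ordinate value of the human body reference point by the area height to obtain the normalized ordinate value of the key point.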

The human body reference point may be any one of the human body key points of the same person; it may be the coordinate point formed by the minimum original-image abscissa value and the minimum original-image ordinate value among the person's human body key points; or it may be the coordinate average point of the person's human body key points, whose abscissa value is the average of the original-image abscissa values of all the corresponding human body key points and whose ordinate value is the average of their original-image ordinate values. The specific human body reference point can be configured according to the normalization processing requirements. For the specific execution process of step S250, reference may be made to the detailed description of step S220 above, which is not repeated here.

Step S260, performing model training on the initial SVM classifier according to the confidence levels of the multiple human body key points, in the corresponding sample images, of the multiple sample persons corresponding to different behavior categories and the relative positional relationship between the multiple human body key points, to obtain the SVM behavior classifier.

In this embodiment, after the confidence levels of the multiple human body key points in the corresponding sample images and the relative positional relationship between the multiple human body key points have been obtained for the multiple sample persons corresponding to different behavior categories, the confidence levels, normalized abscissa values and normalized ordinate values of the multiple human body key points corresponding to the same sample person are integrated into the multi-dimensional feature of that sample person. The multi-dimensional features of the multiple sample persons of the different behavior categories are then used as the model training data set of the SVM classifier and input into the initial SVM classifier for model training. Through fitting and learning, an SVM behavior classifier that can effectively recognize each behavior category involved in the model training data set is obtained.

Therefore, by performing the above steps S240 to S260, the present application can complete the training of the SVM behavior classifier using a relatively small amount of human body key point feature sample data, which reduces the complexity and computation of model training and the amount of data required for training, improves the recognition accuracy and robustness of the model, and ensures that the trained SVM behavior classifier achieves good human behavior recognition accuracy and efficiency in actual use.
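Steps S240 to S260 could be prototyped along the following lines with scikit-learn, assuming NumPy and scikit-learn are installed. The application does not name a library, kernel or hyperparameters, so `SVC`, the RBF kernel and `C=1.0` are illustrative choices only. The training features are synthetic stand-ins generated at random, not real key point data, and the category names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

NUM_KEYPOINTS = 18
FEATURE_DIM = NUM_KEYPOINTS * 3  # normalized x, normalized y, confidence per key point

rng = np.random.default_rng(0)


def make_synthetic_samples(center: float, count: int) -> np.ndarray:
    """Hypothetical stand-in for the real features of one behavior category."""
    features = rng.normal(loc=center, scale=0.05, size=(count, FEATURE_DIM))
    return np.clip(features, 0.0, 1.0)


# Step S240 analogue: the same number of sample persons for every category.
samples_per_category = 30
categories = ["standing", "sitting"]  # hypothetical category names
X = np.vstack([
    make_synthetic_samples(0.3, samples_per_category),
    make_synthetic_samples(0.7, samples_per_category),
])
y = np.repeat(categories, samples_per_category)

# Step S260 analogue: fit an SVM; probability=True enables predict_proba,
# which supplies the per-category probability values used at recognition time.
classifier = SVC(kernel="rbf", C=1.0, probability=True, random_state=0)
classifier.fit(X, y)

probabilities = classifier.predict_proba(X[:1])[0]
print(dict(zip(classifier.classes_, probabilities)))
```

In scikit-learn the columns returned by `predict_proba` follow the order of `classifier.classes_`, so the two should be zipped together before applying the probability threshold.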

In this application, in order to ensure that the computer equipment 10 can execute the above-mentioned human behavior recognition method through the human behavior recognition apparatus 100 , the application implements the aforementioned functions by dividing the human behavior recognition apparatus 100 into functional modules. The specific components of the human behavior recognition apparatus 100 provided by the present application will be described below accordingly.

Optionally, please refer to FIG. 7, which is the first schematic diagram of the composition of the human behavior recognition apparatus 100 provided by the embodiment of the present application. In this embodiment of the present application, the human behavior recognition apparatus 100 may include a human body detection module 110, a normalization processing module 120 and a behavior recognition module 130.

The human body detection module 110 is used for acquiring the image to be recognized, and performing human body key point detection on the image to be recognized, so as to obtain respective position information and confidence levels of multiple human key points corresponding to the target person in the image to be recognized.

The normalization processing module 120 is used for normalizing the respective position information of the multiple human body key points corresponding to the target person, so as to obtain the relative positional relationship between the multiple human body key points corresponding to the target person.

The behavior recognition module 130 is used to call the pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the image to be recognized.

Optionally, please refer to FIG. 8 , which is a schematic diagram of the composition of the behavior recognition module 130 in FIG. 7 . In this embodiment, the behavior recognition module 130 may include a probability calculation sub-module 131 , a probability comparison sub-module 132 and a category output sub-module 133 .

The probability calculation sub-module 131 is used to call the SVM behavior classifier to calculate, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, the probability value that the target person is classified into each identifiable behavior category.

The probability comparison sub-module 132 is configured to extract the maximum probability value from the calculated probability values of each identifiable behavior category, and compare the maximum probability value with a preset probability threshold.

The category output sub-module 133 is configured to use the identifiable behavior category corresponding to the maximum probability value as the human behavior category of the target person if the maximum probability value is equal to or greater than the preset probability threshold.

Optionally, please refer to FIG. 9 . FIG. 9 is the second schematic diagram of the composition of the human behavior recognition apparatus 100 provided by the embodiment of the present application. In this embodiment of the present application, the human behavior recognition apparatus 100 may further include a sample acquisition module 140 and a classifier training module 150 .

The sample acquisition module 140 is configured to acquire respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons classified into the same behavior category, and the number of sample persons corresponding to different behavior categories is the same.

The normalization processing module 120 is further configured to normalize, for each sample person, the position information of the multiple human body key points in the corresponding sample image, so as to obtain the relative positional relationship between the sample person's multiple human body key points in the sample image.

The classifier training module 150 is used to perform model training on the initial SVM classifier according to the confidence levels of the multiple human body key points, in the corresponding sample images, of the multiple sample persons corresponding to different behavior categories and the relative positional relationship between the multiple human body key points, to obtain the SVM behavior classifier.

For the target person or a sample person, the manner in which the normalization processing module 120 normalizes the position information of the person's multiple human body key points in the corresponding image to obtain the relative positional relationship between the multiple human body key points includes:

Determining, according to the original-image abscissa and ordinate values of the multiple human body key points in the corresponding image, the area height and area width of the minimum circumscribed rectangular area of the multiple human body key points, as well as the original-image abscissa and ordinate values of the corresponding human body reference point;

For each human body key point, dividing the difference between the original-image abscissa value of the key point and the original-image abscissa value of the human body reference point by the area width to obtain the normalized abscissa value of the key point;

For each human body key point, dividing the difference between the original-image ordinate value of the key point and the original-image ordinate value of the human body reference point by the area height to obtain the normalized ordinate value of the key point.

Therefore, when the normalization processing module 120 normalizes the respective position information of the multiple human body key points corresponding to the target person to obtain the relative positional relationship between them, it performs the above operations on the original-image abscissa and ordinate values of the target person's multiple human body key points in the image to be recognized.

Likewise, when the normalization processing module 120 normalizes the position information of the multiple human body key points of each sample person in the corresponding sample image to obtain the relative positional relationship between the sample person's multiple human body key points in the sample image, it performs the above operations on the original-image abscissa and ordinate values of the sample person's multiple human body key points in the corresponding sample image.

It should be noted that the basic principle and technical effects of the human behavior recognition apparatus 100 provided by the embodiment of the present application are the same as those of the aforementioned human behavior recognition method. For brevity, for parts not mentioned in this embodiment, reference may be made to the above description of the human behavior recognition method.

In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative; e.g., the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of apparatuses, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions, or can be implemented in a combination of dedicated hardware and computer instructions.

In addition, each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.

If the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned readable storage medium includes a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media that can store program code.

To sum up, in the human behavior recognition method, apparatus, computer device and readable storage medium provided by the present application, the respective position information and confidence levels of multiple human body key points of the target person in the image to be recognized are obtained; the respective position information of the multiple human body key points is then normalized to obtain the relative positional relationship between them; and the pre-stored SVM behavior classifier is then called to perform data analysis on the respective confidence levels of these human body key points and the relative positional relationship between them, to obtain the human behavior category of the target person in the image to be recognized. In this way, the accuracy of human behavior recognition is improved by combining the position information and confidence levels of the human body key points into the process of human behavior category recognition, and the efficiency of human behavior recognition is improved by virtue of the operating efficiency of the SVM classifier.

The above are only various embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A method for human behavior recognition, characterized in that the method comprises:

Obtaining an image to be recognized, and performing human body key point detection on the image to be recognized, to obtain respective position information and confidence levels of multiple human key points corresponding to the target person in the image to be recognized;

Normalizing the respective position information of the multiple human body key points corresponding to the target person, to obtain the relative positional relationship between the multiple human body key points corresponding to the target person;

Calling the pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the to-be-recognized image.
The method according to claim 1, wherein the SVM behavior classifier corresponds to a plurality of identifiable behavior categories, and the step of calling the pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points to obtain the human behavior category of the target person in the to-be-recognized image includes:

Calling the SVM behavior classifier to calculate, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, the probability value that the target person is classified into each identifiable behavior category;

Extracting the maximum probability value from the calculated probability values of the identifiable behavior categories, and comparing the maximum probability value with a preset probability threshold;

If the maximum probability value is equal to or greater than the preset probability threshold, the identifiable behavior category corresponding to the maximum probability value is used as the human behavior category of the target person.
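
The selection rule in this claim can be illustrated with a minimal Python sketch. It assumes the classifier has already produced one probability per identifiable behavior category; the function name `select_behavior` and the category names in the usage note are hypothetical, and returning `None` when the threshold is not met is an assumption, since the claim only states what happens when the threshold is met.

```python
from typing import Dict, Optional


def select_behavior(probabilities: Dict[str, float], threshold: float) -> Optional[str]:
    """Pick the category with the maximum probability if it meets the threshold.

    `probabilities` maps each identifiable behavior category to the
    probability value computed by the classifier (hypothetical input format).
    """
    if not probabilities:
        return None
    # Extract the category whose probability value is the maximum.
    best_category = max(probabilities, key=probabilities.get)
    # Accept it only if the maximum is equal to or greater than the threshold.
    if probabilities[best_category] >= threshold:
        return best_category
    # Assumption: the below-threshold case is reported as "no category".
    return None
```

For example, `select_behavior({"standing": 0.7, "waving": 0.2, "falling": 0.1}, 0.6)` selects `"standing"`, while the same call with a maximum of 0.5 yields no category.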
The method according to claim 1, wherein the method further comprises:

Obtaining respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons classified into the same behavior category, and the number of sample persons is the same for each behavior category;

For each sample person, normalizing the position information of the multiple human body key points of the sample person in the corresponding sample image, to obtain the relative positional relationship between the multiple human body key points of the sample person in the sample image;

Performing model training on an initial SVM classifier according to the confidence levels of the multiple human body key points, in the corresponding sample images, of the multiple sample persons corresponding to the different behavior categories and the relative positional relationships between the multiple human body key points, to obtain the SVM behavior classifier.
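
The data preparation implied by this claim might look like the following Python sketch. It assumes each sample is already normalized and stored as a list of `(x, y, confidence)` tuples, one per key point; the interleaved feature layout and the names `to_feature_vector` and `build_training_set` are hypothetical choices, not taken from the application. The sketch stops before the SVM fit itself.

```python
from typing import Dict, List, Sequence, Tuple

# One sample: (normalized_x, normalized_y, confidence) for each key point.
Sample = Sequence[Tuple[float, float, float]]


def to_feature_vector(sample: Sample) -> List[float]:
    """Flatten a sample into [x0, y0, c0, x1, y1, c1, ...] (assumed layout)."""
    features: List[float] = []
    for x, y, confidence in sample:
        features.extend((x, y, confidence))
    return features


def build_training_set(
    data_sets: Dict[str, List[Sample]],
) -> Tuple[List[List[float]], List[str]]:
    """Build (features, labels) and check that every category has the same size."""
    counts = {category: len(samples) for category, samples in data_sets.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"sample counts differ between categories: {counts}")
    features: List[List[float]] = []
    labels: List[str] = []
    for category, samples in data_sets.items():
        for sample in samples:
            features.append(to_feature_vector(sample))
            labels.append(category)
    return features, labels
```

The resulting feature and label lists could then be passed to any SVM implementation that supports probability outputs, for example scikit-learn's `SVC(probability=True)`; that choice of library is an illustration only.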
The method according to any one of claims 1-3, characterized in that, for the target person or a sample person, the step of normalizing the position information of the multiple human body key points of the person in the corresponding image to obtain the relative positional relationship between the multiple human body key points comprises:

Determining the region height and region width of the minimum circumscribed rectangular region of the multiple human body key points according to the original-image abscissa and ordinate values of the multiple human body key points in the corresponding image, and determining the original-image abscissa value and ordinate value of a human body reference point;

For each human body key point, dividing the difference between the original-image abscissa value of the human body key point and the original-image abscissa value of the human body reference point by the region width, to obtain the normalized abscissa value of the human body key point;

For each human body key point, dividing the difference between the original-image ordinate value of the human body key point and the original-image ordinate value of the human body reference point by the region height, to obtain the normalized ordinate value of the human body key point.
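
The normalization in this claim can be sketched in Python as follows. The sketch assumes the minimum circumscribed rectangle is axis-aligned, so its width and height are the coordinate ranges of the key points, and it takes the human body reference point as an explicit argument because the claim does not fix which point is used. The function name `normalize_keypoints` and the error raised for a zero-size rectangle are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_keypoints(points: Sequence[Point], reference: Point) -> List[Point]:
    """Normalize key points relative to a reference point and bounding rectangle."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Width and height of the axis-aligned rectangle enclosing all key points.
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0 or height == 0:
        # Assumption: a degenerate rectangle is treated as invalid input.
        raise ValueError("key points span a zero-width or zero-height region")
    ref_x, ref_y = reference
    # Offset from the reference point, divided by region width and region height.
    return [((x - ref_x) / width, (y - ref_y) / height) for x, y in points]
```

With key points `(10, 20)`, `(30, 60)`, `(20, 40)` and reference point `(20, 40)`, the region is 20 wide and 40 high, so the normalized coordinates are `(-0.5, -0.5)`, `(0.5, 0.5)` and `(0.0, 0.0)`.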
A human behavior recognition device, characterized in that the device comprises:

a human body detection module, configured to obtain an image to be recognized, and perform human body key point detection on the image to be recognized, to obtain respective position information and confidence levels of multiple human body key points corresponding to a target person in the image to be recognized;

a normalization processing module, configured to normalize the respective position information of the multiple human body key points corresponding to the target person, so as to obtain the relative positional relationship between the multiple human body key points corresponding to the target person;

a behavior recognition module, configured to call a pre-stored SVM behavior classifier to perform data analysis on the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, to obtain the human behavior category of the target person in the image to be recognized.
The device according to claim 5, wherein the SVM behavior classifier corresponds to a plurality of identifiable behavior categories, and the behavior recognition module comprises:

a probability calculation sub-module, configured to call the SVM behavior classifier to calculate, according to the respective confidence levels of the multiple human body key points and the relative positional relationship between the multiple human body key points, the probability value of the target person being classified into each identifiable behavior category;

a probability comparison submodule, used for extracting the maximum probability value from the calculated probability values of each identifiable behavior category, and comparing the maximum probability value with a preset probability threshold;

A category output sub-module, configured to use the identifiable behavior category corresponding to the maximum probability value as the human behavior category of the target person if the maximum probability value is equal to or greater than the preset probability threshold.
The device according to claim 5, wherein the device further comprises:

a sample acquisition module, configured to acquire respective sample behavior data sets of different behavior categories, wherein each sample behavior data set includes the position information and confidence levels of multiple human body key points, in the corresponding sample images, of multiple sample persons classified into the same behavior category, and the number of sample persons is the same for each behavior category;

the normalization processing module is further configured to normalize the position information of the multiple human body key points of each sample person in the corresponding sample image, so as to obtain the relative positional relationship between the multiple human body key points of the sample person in the sample image;

a classifier training module, configured to perform model training on an initial SVM classifier according to the confidence levels of the multiple human body key points, in the corresponding sample images, of the multiple sample persons corresponding to the different behavior categories and the relative positional relationships between the multiple human body key points, to obtain the SVM behavior classifier.
The device according to any one of claims 5-7, wherein the manner in which the normalization processing module, for the target person or a sample person, normalizes the position information of the multiple human body key points of the person in the corresponding image to obtain the relative positional relationship between the multiple human body key points comprises:

Determining the region height and region width of the minimum circumscribed rectangular region of the multiple human body key points according to the original-image abscissa and ordinate values of the multiple human body key points in the corresponding image, and determining the original-image abscissa value and ordinate value of a human body reference point;

For each human body key point, dividing the difference between the original-image abscissa value of the human body key point and the original-image abscissa value of the human body reference point by the region width, to obtain the normalized abscissa value of the human body key point;

For each human body key point, dividing the difference between the original-image ordinate value of the human body key point and the original-image ordinate value of the human body reference point by the region height, to obtain the normalized ordinate value of the human body key point.
A computer device, characterized in that the computer device comprises a processor and a memory, the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the human behavior recognition method according to any one of claims 1-4.
A readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the method for recognizing human behavior according to any one of claims 1-4 is implemented.