WO2022041953A1

WO2022041953A1 - Behavior recognition method and apparatus, and storage medium

Info

Publication number: WO2022041953A1
Application number: PCT/CN2021/100379
Authority: WO
Inventors: 慕晨; 黄伟; 郭红星; 王春利; 梁敬柏
Original assignee: 中兴通讯股份有限公司; 长安大学
Priority date: 2020-08-31
Filing date: 2021-06-16
Publication date: 2022-03-03
Also published as: CN114202797A

Abstract

A behavior recognition method and apparatus, and a storage medium. The behavior recognition method comprises: acquiring a video image frame corresponding to a user to be subjected to recognition, wherein the video image frame comprises a depth image and a skeleton image (S10); determining a posture feature parameter corresponding to said user according to the depth image and the skeleton image (S20); and determining the current behavioral state of said user according to the posture feature parameter corresponding to said user (S30).

Description

Behavior recognition method, device and storage medium

cross reference

This application is based on the Chinese patent application with the application number "202010901631.6" and the application date is August 31, 2020, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference. Apply.

technical field

The embodiments of the present application relate to the technical field of image processing, and in particular, to a behavior recognition method, device, and storage medium.

Background technique

With the continuous development of artificial intelligence technology, human behavior recognition has become an emerging research field. The commonly used human behavior recognition method is the behavior recognition method based on wearable devices. The wearable device-based behavior recognition method collects the motion data of the human body through the motion sensor worn on the person. The disadvantage is that the user needs to wear the sensor, which is not convenient and the recognition accuracy is low.

Therefore, how to improve the accuracy and convenience of recognizing human behavior has become an urgent problem to be solved.

SUMMARY OF THE INVENTION

An embodiment of the present application provides a method for behavior recognition, including: acquiring a video image frame corresponding to a user to be recognized, wherein the video image frame includes a depth image and a skeleton image; The gesture characteristic parameter corresponding to the user to be recognized is determined; the current behavior state of the user to be recognized is determined according to the gesture characteristic parameter corresponding to the user to be recognized.

An embodiment of the present application further provides a behavior recognition device, including: a processor and a memory; the memory is used to store a program; the processor is used to execute the program and implement the above behavior recognition when the program is executed method.

Embodiments of the present application further provide a storage medium for readable storage, where the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the following: The above-mentioned behavior recognition method.

Description of drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

1 is a schematic structural diagram of a behavior recognition system provided by an embodiment of the present application;

2 is a schematic diagram of performing part segmentation on a human body region provided by an embodiment of the present application;

3 is a schematic structural diagram of a behavior recognition device provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application;

5 is a schematic diagram of a video image frame provided by an embodiment of the present application;

6 is a schematic diagram of a human body region and a human body center of gravity provided by an embodiment of the present application;

7 is a schematic diagram of determining a head height corresponding to a user to be identified according to an embodiment of the present application;

8 is a schematic diagram of determining the posture height of a user to be recognized provided by an embodiment of the present application;

Fig. 9 is a schematic block diagram of the sub-steps of determining the current behavioral state of the user to be identified in Fig. 4;

FIG. 10 is a schematic flowchart of the sub-steps of judging whether the user to be identified has fallen in FIG. 9 .

detailed description

Embodiments of the present application provide a behavior recognition method, device, and storage medium, so as to at least solve the problems of low accuracy and poor convenience in recognizing human behavior in the related art.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

The flowcharts shown in the figures are for illustration only, and do not necessarily include all contents and operations/steps, nor do they have to be performed in the order described. For example, some operations/steps can also be decomposed, combined or partially combined, so the actual execution order may be changed according to the actual situation.

It should be understood that the terms used in the specification of the present application herein are for the purpose of describing particular embodiments only and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural unless the context clearly dictates otherwise.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and including any and all possible combinations of one or more of the associated listed items.

It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

In the following description, suffixes such as 'module', 'component' or 'unit' used to represent elements are used only to facilitate the description of the present application, and have no specific meaning per se. Thus, "module", "component" or "unit" may be used interchangeably.

An embodiment of the present application provides a behavior recognition method, apparatus, system, and storage medium. Among them, the behavior recognition method can be applied to a behavior recognition device to realize the determination of the posture feature parameters according to the depth image and the skeleton image, and to determine the current behavior parameters of the user to be recognized through the posture feature parameters, which can more conveniently and accurately identify the user to be recognized. The corresponding current behavior state.

Exemplarily, the behavior recognition device may include a server or a terminal. The server may be an independent server or a server cluster; the terminal may be an electronic device such as a smart phone, a tablet computer, a notebook computer, and a desktop computer.

Please refer to FIG. 1. FIG. 1 is a schematic structural diagram of a behavior recognition system provided by an embodiment of the present application. The behavior recognition system includes a behavior recognition device 10 and a photographing device 20 . Wherein, the behavior recognition device 10 may be connected with the photographing device 20 in wired or wireless communication. The photographing device 20 is used to collect video image frames including the user to be recognized, and the behavior recognition device 10 is used to perform image processing on the video image frames collected by the photographing device 20 to determine the current behavior state corresponding to the user to be recognized.

In some embodiments, the photographing device 20 may collect video image frames including the corresponding user to be identified, and perform human body recognition, human body part recognition, and skeletal joint point positioning on the video image frames, and the processed video image frames include depth images. and skeleton images. The behavior recognition device 10 can acquire the video image frames processed by the shooting device 20, determine the body center of gravity information corresponding to the user to be recognized according to the depth image, and determine the posture height and head height corresponding to the user to be recognized according to the skeleton image; then, according to the body center of gravity information, posture height, and head height to determine the current behavioral state of the user to be identified.

Exemplarily, the photographing apparatus 20 may include an electronic device such as a 3D camera, such as a somatosensory that may capture video image frames. In the embodiment of the present application, the somatosensory may be used to collect video image frames including the user to be identified.

It should be noted that the somatosensory may include a depth camera, a color camera and a light source emitter, and the somatosensory may acquire depth images, color images and three-dimensional data information of the background space.

Exemplarily, the somatosensory can acquire depth information. The working principle of obtaining depth information includes: projecting the light emitted by the light source emitter into the real scene. Since the emitted light will change due to the different surface shapes of the object, this light can be collected and encoded. The distance difference between each pixel in the scene and the depth camera can be obtained, and then the position and depth information of the object can be obtained.

In some embodiments, the body sensor performs human body recognition on the video image frame; for example, the background and the person in the video image frame are segmented according to a preset segmentation strategy, so as to determine the human body region or the human body contour corresponding to the user to be identified. information, and the obtained depth image includes the body region or body contour information corresponding to the user to be identified.

In some embodiments, the body sensor performs human body part recognition on the depth image obtained by human body recognition. Exemplarily, as shown in FIG. 2, FIG. 2 is a schematic diagram of part segmentation of the human body area; part segmentation is performed on the human body area in the depth image to obtain multiple part images, such as head, arms, legs, limbs and torso, etc. Part images; perform feature value classification and matching on multiple part images to determine the body part corresponding to each part image.

In some embodiments, the somatosensory performs skeletal joint point positioning on the video image frame to obtain a skeletal image. Specifically, the identified body parts are added to the virtual skeleton model, and adjusted according to the position information of the body parts to obtain a skeleton image including a plurality of joint points.

Exemplarily, the skeleton image may include, but is not limited to, joint points such as head joint points, neck joint points, knee joint points, elbow joint points, hip joint points, or ankle joint points.

Please refer to FIG. 3 , which is a schematic structural diagram of a behavior recognition apparatus 10 provided by an embodiment of the present application. The behavior recognition device 10 may include a processor 11 and a memory 12, wherein the processor 11 and the memory 12 may be connected through a bus, such as an integrated circuit bus (Inter-integrated Circuit, referred to as: IIC) and other suitable the bus.

Wherein, the memory 12 may include a non-volatile storage medium and an internal memory. The nonvolatile storage medium can store operating systems and computer programs. The computer program includes program instructions that, when executed, can cause the processor to perform any behavior recognition method.

Wherein, the processor 11 is used to provide computing and control capabilities to support the operation of the entire behavior recognition device 10 .

The processor 11 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), application specific integrated circuits (application specific integrated circuits) circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and features in the embodiments may be combined with each other without conflict.

As shown in FIG. 4 , FIG. 4 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application. The behavior recognition method can be applied to a behavior recognition device to realize the determination of the posture feature parameters according to the depth image and the skeleton image, and to determine the current behavior parameters of the user to be recognized through the posture feature parameters, which can more conveniently and accurately identify the corresponding user to be recognized. Current behavior state. The behavior recognition method includes steps S10 to S30.

Step S10: Obtain a video image frame corresponding to the user to be identified, wherein the video image frame includes a depth image and a skeleton image.

It should be noted that, in the embodiment of the present application, the video image frame corresponding to the user to be identified may be collected by the somatosensory sensor. Exemplarily, the somatosensory sensor can be installed indoors to monitor the indoor environment and people in real time.

In some embodiments, the somatosensory can collect video image frames corresponding to the user to be recognized through a built-in software development kit, and perform human body recognition, human body part recognition, and skeletal joint point positioning on the video image frames. Image frames include depth images and skeletal images.

Exemplarily, a processed video image frame sent by a body sensor may be received, wherein the video image frame includes a depth image and a skeleton image. The depth image and the skeleton image overlap each other in the same image, exemplarily, as shown in FIG. 5 , which is a video image frame. The depth image includes the human body region corresponding to the user to be identified, and the skeleton image includes bone information corresponding to the user to be identified. The skeleton information may include different joint points and connection relationships between the joint points, and the like.

Since the depth image and the skeleton image are not easily disturbed by the external environment, obtaining the depth image and the skeleton image corresponding to the user to be recognized can improve the accuracy of determining the posture feature parameters corresponding to the user to be recognized according to the depth image and the skeleton image.

Step S20: Determine, according to the depth image and the skeleton image, a posture feature parameter corresponding to the user to be recognized.

Exemplarily, the posture feature parameters may include information on the body's center of gravity, posture height, and head height.

It can be understood that the posture height refers to the height of the body of the user to be identified in different postures. Exemplarily, a person's body height when standing, body height when sitting cross-legged, body height when sitting on a chair, and the like.

Exemplarily, the information on the center of gravity of the human body may include, but is not limited to, the center of gravity point, the coordinates of the center of gravity, and the speed of falling of the center of gravity, and the like. The head height refers to the height of the entire head of the user to be identified.

In some embodiments, the center of gravity information of the human body corresponding to the user to be identified may be determined according to the depth image, and the posture height and the height of the head corresponding to the user to be identified may be determined according to the skeleton image.

In some embodiments, before determining the body center of gravity information corresponding to the user to be identified according to the depth image, the method may further include: performing format conversion on the depth data in the initial depth image according to a preset data format, to obtain the depth after the format conversion image. The preset data format may be Mat format. The initial depth image refers to the depth image in the video image frame obtained from the somatosensory sensor.

It should be noted that, in the embodiment of the present application, the recognition process of determining the current behavior state of the user to be recognized according to the depth image and the skeleton image is completed on the basis of the computer vision platform. In order to make the depth image have a better display effect, Format conversion of the depth data in the original depth image is required.

Exemplarily, according to the Mat format, the depth data in the original depth image is format-converted, and the depth data in the format-converted depth image is in the Mat format.

In some embodiments, before determining the posture height and the head height corresponding to the user to be recognized according to the skeleton image, the method may further include: smoothing the initial skeleton image according to a preset smoothing strategy to obtain a smoothed skeleton image. Wherein, the initial skeleton image refers to the skeleton image in the video image frame acquired from the somatosensory sensor.

It should be noted that, in the embodiments of the present application, the posture height and head height corresponding to the user to be recognized need to be determined according to the skeleton image; during the recognition process, the height, movement and real-time performance of the user to be recognized are highly recognized. It is required that if the skeleton image is not smoothed, it may cause the computer vision platform to shake or even crash.

It can be understood that smoothing is also called filtering. By smoothing the skeleton image, noise or distortion in the skeleton image can be reduced.

Exemplarily, the preset smoothing processing strategy may include, but is not limited to, a mean filtering algorithm, a median filtering algorithm, a Gaussian filtering algorithm, a bilateral filtering algorithm, and the like.

For example, according to the mean filtering algorithm, the initial skeleton image is smoothed to obtain the smoothed skeleton image.

For example, according to the Gaussian filtering algorithm, the initial skeleton image is smoothed to obtain the smoothed skeleton image.

In some embodiments, after format conversion is performed on the depth data in the initial depth image according to a preset data format to obtain a format-converted depth image, the depth image corresponding to the user to be recognized may be determined according to the format-converted depth image. Human body center of gravity information.

Exemplarily, the information on the center of gravity of the human body may include a center of gravity point of the human body; as shown in FIG. 6 , FIG. 6 is a schematic diagram of the area of the human body and the center of gravity point of the human body.

In some embodiments, determining the center of gravity information of the human body corresponding to the user to be identified according to the depth image may include: acquiring the total number of pixels in the human body region in the format-converted depth image, and acquiring the corresponding pixel points in the human body region. The abscissa and the ordinate; determine the sum of the abscissas and ordinates corresponding to all the pixels in the human body area; according to the total number of pixels and the sum of the abscissas, determine the mean of the abscissas corresponding to all the pixels, and according to the pixel The sum of the total number of points and the ordinate is to determine the mean of the ordinates corresponding to all the pixels; the mean of the abscissas corresponding to all the pixels is taken as the abscissa of the center of gravity of the human body, and the mean of the ordinates corresponding to all the pixels is taken as the human body The ordinate of the center of gravity, to get the center of gravity of the human body in the depth image.

Exemplarily, a rectangular coordinate system can be established in the depth image, the total number of pixels in the human body region in the depth image is obtained, and the abscissa and ordinate corresponding to all the pixels in the human body region are determined.

In some embodiments, the center of gravity coordinates corresponding to the center of gravity of the human body may be calculated according to the center of gravity calculation formula. Exemplarily, the calculation formula of the center of gravity is as follows:

In the formula, (X ₀ , Y ₀ ) represents the barycentric coordinates; s represents the total number of pixels in the human body area; i represents the ith pixel in the human body area; x _i represents the abscissa corresponding to the ith pixel; y _i Indicates the ordinate corresponding to the i-th pixel.

According to the calculation formula of the center of gravity, the mean value X ₀ of the abscissa and the mean value Y ₀ of the ordinate corresponding to all the pixel points in the human body area can be determined. Then, the mean value X ₀ is taken as the abscissa of the human body's center of gravity, and the mean value Y ₀ is taken as the ordinate of the human body's center of gravity, to obtain the barycentric coordinates (X ₀ , Y ₀ ) corresponding to the human body's center of gravity in the depth image.

By using the total number of pixels in the human body region in the depth image and the coordinates of each pixel, the center of gravity of the user to be identified can be more accurately determined.

In some embodiments, after smoothing an initial skeleton image according to a preset smoothing strategy to obtain a smoothed skeleton image, the posture height corresponding to the user to be recognized and head height.

Exemplarily, the joint point information in the smoothed skeleton image is extracted, and the posture height and the head height corresponding to the user to be recognized are determined according to the joint point information. Wherein, the joint point information includes joint point coordinates. For example, the joint point coordinates corresponding to the head joint point, the joint point coordinates corresponding to the neck joint point, and so on.

In some embodiments, extracting joint point information in the smoothed bone image may include: establishing a two-dimensional coordinate system in the bone image and determining a coordinate origin; according to the two-dimensional coordinate system of each joint point in the bone image In the position, determine the joint point coordinates of each joint point. For example, the joint point coordinates corresponding to the head joint point are (X ₁ , Y ₁ ), and the joint point coordinates corresponding to the neck joint point are (X ₂ , Y ₂ ).

In some embodiments, before determining the posture height and the head height corresponding to the user to be recognized according to the joint point information, the method may further include: acquiring the head joint points in the skeleton image; when the head joint points are located in the human body region in the depth image Inside, the posture height and head height corresponding to the user to be recognized are determined according to the joint point information.

Exemplarily, the head joint points can be positioned according to the connection relationship between the joint points in the skeleton image to determine the specific position of the head joint point in the skeleton image and whether the head joint point is located in the depth image. in the human body area.

It should be noted that, when the head of the user to be identified is not at the highest position or the head is blocked by other parts of the body, the head height cannot be accurately determined through the joint point information in the skeleton image.

Exemplarily, when the head joint point in the skeleton image is located in the human body area in the depth image, it means that the head of the user to be identified is at the highest position or the head is not blocked, at this time, it can be accurately determined according to the joint point information. The posture height and head height corresponding to the user to be recognized.

Exemplarily, when the head joint points in the skeleton image are not within the human body area in the depth image, it means that the head of the user to be identified is not in the highest position or the head is occluded. At this time, the joint point information cannot be accurately determined. Identify the user's corresponding posture height and head height.

By judging whether the head joint point is located in the human body area in the depth image, the accuracy of determining the posture height and the head height corresponding to the user to be recognized can be improved.

In some embodiments, after it is determined that the head joint points are located in the human body region in the depth image, the posture height and the head height corresponding to the user to be identified may be determined according to the joint point information.

In some embodiments, determining the height of the head corresponding to the user to be identified according to the joint point information may include: determining the head joint point and the neck joint point according to the joint point coordinates corresponding to the head joint point and the joint point coordinates corresponding to the neck joint point. The first height difference between the neck joint points; according to the product of the preset height ratio and the first height difference, the height of the head corresponding to the user to be identified is obtained.

Exemplarily, as shown in FIG. 7 , FIG. 7 is a schematic diagram of determining the height of the head corresponding to the user to be identified. In the two-dimensional coordinate system established by the skeleton image, if the joint point coordinates corresponding to the head joint points are (X ₁ , Y ₁ ), and the joint point coordinates corresponding to the neck joint points are (X ₂ , Y ₂ ), then the head The first height difference h ₁ between the neck joint point and the neck joint point is |Y ₁ -Y ₂ |.

In the embodiment of the present application, if the preset height ratio is 1.8726, then according to the product of the preset height ratio and the _first height difference _h1 , the head height H1 corresponding to the user to be identified can be obtained; wherein, H1 ₁ = 1.8726h ₁ .

By using the joint point coordinates corresponding to the head joint points and the joint point coordinates corresponding to the neck joint points, the head height can be obtained more accurately, which improves the accuracy of subsequent judgment of the current behavior state of the user to be identified.

In some embodiments, determining the posture height corresponding to the user to be recognized according to the joint point information may include: determining the highest joint point and the lowest joint point in the skeleton image; according to the joint point coordinates corresponding to the highest joint point and the lowest joint point The joint point coordinates are used to determine the second height difference between the highest joint point and the lowest joint point, and the second height difference is used as the posture height corresponding to the user to be identified.

It can be understood that when the user to be recognized is standing, the highest joint point in the skeleton image is the head joint point, and the lowest joint point is the ankle joint point. When the user to be identified is sitting cross-legged, the highest joint point in the skeleton image is the head joint point, and the lowest joint point is the hip joint point.

In some embodiments, when the user to be identified is standing, the highest joint point in the skeleton image is the head joint point, and the lowest joint point is the ankle joint point; the joint point coordinates corresponding to the head joint point and the foot joint point can be used according to The joint point coordinates corresponding to the wrist joint point determine the second height difference between the head joint point and the ankle joint point, that is, the posture height of the user to be identified is obtained. At this time, the posture height is the height of the body when the user to be recognized is standing.

Exemplarily, if the joint point coordinates corresponding to the head joint point are (X ₁ , Y ₁ ), and the joint point coordinates corresponding to the ankle joint point are (X ₃ , Y ₃ ), then in the skeleton image, the head joint The second height difference between the point and the ankle joint point is |Y ₁ -Y ₃ |, that is, the height of the gesture of the user to be recognized is |Y ₁ -Y ₃ |. As shown in FIG. 8 , FIG. 8 is a schematic diagram of determining the gesture height of the user to be recognized.

In some embodiments, when the user to be identified is sitting cross-legged, the highest joint point in the skeletal image is the head joint point, and the lowest joint point is the hip joint point; the joint point coordinates corresponding to the head joint point and the hip joint The joint point coordinates corresponding to the points determine the second height difference between the head joint point and the hip joint point, that is, the posture height of the user to be recognized is obtained. At this time, the posture height is the height of the body when the user to be identified sits cross-legged.

Exemplarily, if the joint point coordinates corresponding to the head joint point are (X ₁ , Y ₁ ), and the joint point coordinates corresponding to the hip joint point are (X ₄ , Y ₄ ), then in the skeleton image, the head joint point is The second height difference between it and the hip joint point is |Y ₁ -Y ₄ |, that is, the height of the posture of the user to be recognized is |Y ₁ -Y ₄ |.

By determining the posture height corresponding to the user to be recognized according to the second height difference between the highest joint point and the lowest joint point in the skeleton image, the posture height corresponding to the current state of the user to be recognized can be more truly and accurately reflected.

Step S30: Determine the current behavior state of the to-be-identified user according to the gesture characteristic parameter corresponding to the to-be-identified user.

In some embodiments, determining the current behavioral state of the to-be-recognized user according to the posture feature parameters corresponding to the to-be-recognized user may include: determining the current behavioral state of the to-be-recognized user according to the body's center of gravity information, posture height, and head height.

In the embodiments of the present application, the current behavioral state of the user to be identified is comprehensively determined according to parameters such as the user's body center of gravity information, posture height, and head height, and no additional wearable equipment is required, and the to-be-identified user can be more conveniently and accurately determined. The current behavioral state of the user.

Referring to FIG. 9 , determining the current behavioral state of the user to be identified according to the body's center of gravity information, posture height, and head height may include the following step S31 or step S32.

Step S31: If the ratio of the posture height to the head height is within a preset ratio range, determine the posture corresponding to the to-be-identified user according to the preset correspondence between the ratio range and the posture type Types of.

Exemplarily, the preset ratio range may include a first ratio range, a second ratio range and a third ratio range.

Exemplarily, posture types may include, but are not limited to, standing, cross-sitting, sitting, kneeling, and the like.

Among them, the first ratio range is the ratio of the standing height of the human body to the height of the head; the second ratio range is the ratio of the sitting height of the human body to the height of the head; the third ratio range is the sitting height of the human body and the head height The ratio of the height of the waist or the height of the kneeling position to the height of the head.

In some embodiments, the first ratio range, the second ratio range, and the third ratio range may be determined from the ratio measurement data between the body height and the head height.

Exemplarily, the ratio measurement data between body height and head height are shown in Table 1.

Table 1: Measurement data

It should be noted that, the measurement data in Table 1 are obtained by measuring a preset number of testers. In Table 1, the measurement data corresponding to the 1% column represents the measurement data corresponding to 1% of the testers; the measurement data corresponding to the 99% column represents the measurement data corresponding to 99% of the testers. A is the average of the ratio of height to head height for all people in each percentile; H is the ratio of the posture height of 99% of testers to the head height of 1% of testers; L is the ratio of 1% of testers The ratio of the posture height to the head height of 99% of the subjects; P represents the average of all ratios in each row; H/H represents the ratio of standing height to head height; S/H represents the ratio of sitting height to head height ; C/H represents the ratio of the height of the cross sitting posture to the height of the head.

Exemplarily, according to the measurement data in Table 1, it can be determined: the average ratio of the standing height to the head height is 7.551 for the males and 7.322 for the females; the average of the ratio of the sitting height to the head height, Males were 5.953 and females were 5.726; the average ratio of the height of the cross-sitting posture to the head height was 4.084 for males and 3.981 for females.

The Chinese national standard GB 10000-88 adult body shape and related standardization literature show that when the ratio is greater than or equal to 6.5, it can be determined that the human body is in a standing state; when the ratio is [3.3, 4.5], it can be determined that the human body is in a cross-sitting state; When the ratio is [5,6], it can be determined that the human body is in a sitting or kneeling state. Therefore, in the embodiment of the present invention, the first ratio range is set to [6.5, +∞], the second ratio range is [3.3, 4.5], and the third ratio range is [5, 6].

Exemplarily, the preset correspondence between the ratio range and the gesture type may be as shown in Table 2.

Table 2: Comparison of Ratio Ranges and Posture Types

比率范围Ratio range	姿势类型Posture Type
第一比率范围first ratio range	站姿standing
第二比率范围Second Ratio Range	盘坐姿sitting cross-legged
第三比率范围third ratio range	坐姿或跪姿Sitting or kneeling

In some embodiments, the gesture type corresponding to the user to be recognized may be determined according to a preset correspondence between the ratio range and the gesture type.

Exemplarily, if the ratio of the posture height to the head height is in the first ratio range, it is determined that the posture type corresponding to the user to be recognized is a standing posture.

Exemplarily, if the ratio of the posture height to the head height is within the second ratio range, it is determined that the posture type corresponding to the user to be identified is the cross-sitting posture.

Exemplarily, if the ratio of the posture height to the head height is in the third ratio range, then according to the positional relationship between the lowest joint point corresponding to the to-be-identified user and the head joint point, it is determined that the posture type corresponding to the to-be-identified user is: Sitting or kneeling. It can be understood that, when the ratio of the posture height to the head height corresponding to the user to be identified is in the third ratio range, since the posture type corresponding to the third ratio range is a sitting posture or a kneeling posture, it is necessary to further determine the posture of the user to be identified. Posture type.

It should be noted that, in the skeleton image, if the user to be identified is in a sitting posture, the lowest joint point in the skeleton image is the ankle joint point, and the ankle joint point and the head joint point are not on the same vertical line; if If the user to be identified is in a kneeling posture, the lowest joint point in the skeleton image is the knee joint point, and at this time the knee joint point and the head joint point are on the same vertical line. Therefore, according to the positional relationship between the lowest joint point in the skeleton image and the head joint point, it can be determined that the posture type corresponding to the user to be identified is a sitting posture or a kneeling posture.

In some embodiments, if the lowest joint point and the head joint point are on the same vertical line, it is determined that the posture type corresponding to the user to be identified is a kneeling posture.

In other embodiments, if the lowest joint point and the head joint point are not on the same vertical line, it is determined that the posture type corresponding to the user to be identified is a sitting posture.

By judging whether the ratio of the height of the gesture corresponding to the user to be recognized to the height of the head is within the first ratio range, the second ratio range or the third ratio range, the gesture type corresponding to the user to be recognized can be more accurately determined. By judging whether the lowest joint point and the head joint point are on the same vertical line, it can be more accurately determined that the posture type of the user to be recognized is a sitting posture or a kneeling posture.

Step S32 : if the ratio of the posture height to the head height is not within the preset ratio range, determine whether the user to be identified falls according to the body center of gravity information according to a preset detection strategy.

Exemplarily, in addition to the center of gravity point, the information about the center of gravity of the human body may also include the rate of descent of the center of gravity. The detection strategy may include: determining whether the center of gravity descending rate of the user to be identified is greater than a preset descending rate; when the descending rate of the center of gravity is greater than the preset descending rate, determining a hip height value corresponding to the user to be identified; when the hip height value is less than the preset descending rate When the height value is , it is determined that the user to be identified is in a falling state, wherein the height value of the buttocks is the distance between the buttocks of the user to be identified and the ground.

Please refer to FIG. 10 . FIG. 10 is a schematic flowchart of determining whether the user to be identified falls according to the information of the center of gravity of the human body according to the preset detection strategy in step S32 , which may specifically include the following steps S321 to S323 .

Step S321: Detect the drop rate of the center of gravity of the user to be identified.

In some embodiments, before detecting the drop rate of the center of gravity of the user to be identified, the method further includes: acquiring the center of gravity of the human body in the depth image, and marking the center of gravity of the human body in the skeleton image.

It should be noted that, in the above-mentioned embodiments, the human body center of gravity point corresponding to the user to be identified has been determined according to the depth image, wherein the human body center point of gravity in the depth image can be directly obtained and marked in the skeleton image.

By marking the center of gravity of the human body in the skeleton image, it is possible to determine the drop rate of the center of gravity corresponding to the user to be identified by changing the position of the center of gravity of the human body in the skeleton image within a preset interval.

In some embodiments, a first skeletal image and a second skeletal image at adjacent preset intervals in the video image frame are acquired, the first skeletal image includes the center of gravity of the first human body, and the second skeletal image includes the center of gravity of the second human body point; according to the coordinates corresponding to the center of gravity of the first person, the coordinates corresponding to the center of gravity of the second person, and the interval time, determine the drop rate of the center of gravity of the user to be identified, wherein the preset interval time can be set according to the actual situation, and the specific value It is not limited here.

Exemplarily, take the time of acquiring 10 video image frames as the interval time t, for example, the skeleton image in the video image frame of the first frame is used as the first skeleton image, and the skeleton image in the video image frame of the 10th frame is used as the interval time t. Second bone image. Since the center of gravity of the human body has been marked in the skeleton image, the first skeleton image includes the center of gravity of the first human body, and the second skeleton image includes the center of gravity of the second human body.

Exemplarily, if the coordinates corresponding to the center of gravity of the first person are (X ₀₁ , Y ₀₁ ) and the coordinates corresponding to the center of gravity of the second person (X ₁₀ , Y ₁₀ ), then according to the coordinates corresponding to the center of gravity of the first person (X ₀₁ , Y ₀₁ ), the coordinates (X ₁₀ , Y ₁₀ ) corresponding to the center of gravity of the second person, and the interval time t, can determine the drop rate of the center of gravity of the user to be identified. The descent rate of the center of gravity is represented by v, and the descent rate of the center of gravity v can be calculated by the following formula:

Step S322: When the lowering rate of the center of gravity is greater than a preset lowering rate, detect the hip height value of the user to be identified.

Exemplarily, the preset falling rate may be represented by V, where the falling rate V may be set according to the actual situation, and the specific value is not limited herein.

In some embodiments, when the gravity center descending rate v is greater than the preset descending rate V, the hip height value of the user to be identified is detected.

In other embodiments, when the gravity center descending rate v is not greater than the preset descending rate V, continue to detect the gravity center descending rate v of the user to be identified until a gravity center descending rate v greater than the preset descending rate V is detected.

Exemplarily, acquiring the skeleton image in the video image frame, for example, using the skeleton image in the video image frame of the 11th frame as the first skeleton image, and using the skeleton image in the video image frame of the 20th frame as the second skeleton image; continue According to the coordinates and the interval time corresponding to the center of gravity point in the first skeleton image and the second skeleton image, the drop rate of the center of gravity of the user to be identified is determined.

In some embodiments, detecting the height value of the buttocks of the user to be identified may include: acquiring the coordinates corresponding to the hip joint points in the skeleton image; determining the vertical distance between the coordinates corresponding to the hip joint points and the ground, and using the vertical distance as the to-be-identified distance The user's hip height value.

Since the coordinates of the joint points in the skeleton image have been obtained in the above embodiment, the coordinates corresponding to the hip joint points in the skeleton image can be directly obtained; exemplarily, in the skeleton image, the coordinates corresponding to the hip joint points are (X ₄ , Y4 ₎ .

Exemplarily, a three-dimensional coordinate system can be established with the somatosensory as the coordinate origin, wherein the distance between the hip joint point and the somatosensory can be determined by the depth information in the depth image; for example, if the distance between the hip joint point and the somatosensory If the distance is Z ₄ , in the three-dimensional coordinate system, the coordinates corresponding to the hip joint point are (X ₄ , Y ₄ , Z ₄ ).

Exemplarily, the ground can be expressed as AX+BY+CZ+D=0; in the formula, A, B, C are coefficients, and D is a constant.

In the embodiments of the present application, the constant D represents the distance between the somatosensory sensor and the ground. In some embodiments, the get_FloorClipPlane function in the kinect SDK can be used to obtain three point coordinates that are not on the same straight line, and the three point coordinates can be substituted into AX+BY+CZ+D=0 to determine the coefficients A, B, C .

Exemplarily, the coordinates corresponding to the hip joint point are determined to be the vertical distance between (X ₄ , Y ₄ , Z ₄ ) and the ground, which can be determined by the point-to-surface distance formula, as shown below:

In the formula, H represents the vertical distance between the hip joint point and the ground, that is, the hip height value of the user to be identified is H.

Step S323: When the hip height value is less than a preset height value, determine that the user to be identified is in a falling state. Wherein, the preset height value can be set according to the average width of the waist and buttocks of an adult male and the average width of the waist and buttocks of an adult female, and the specific value is not limited herein.

Exemplarily, when the hip height value H is smaller than the preset height value, it means that the buttocks of the user to be identified are relatively close to the ground, and at this time, it can be determined that the user to be identified is in a falling state.

By first judging whether the lowering rate of the center of gravity of the user to be recognized is greater than the preset lowering rate, and then detecting the height of the hips of the user to be recognized, it is possible to combine the lowering rate of the center of gravity and the height of the hips to determine whether the user to be recognized is in a fall state, and it is extremely Greatly improved accuracy.

In some embodiments, after determining that the to-be-identified user is in a fall state, the method further includes: sending an emergency notification to the family or hospital corresponding to the to-be-identified user, so that the family or hospital finds that the to-be-identified user has fallen and handles it in time according to the emergency notification.

Exemplarily, the manner of sending the emergency notification may include, but is not limited to, text messages, phone calls, emails, and the like.

Exemplarily, the emergency notification may include location information of the user to be identified, and may also include depth images and skeleton images corresponding to the user to be identified.

By sending an emergency notification to a family member or hospital corresponding to the user to be identified when it is determined that the user to be identified is in a fall state, the fall of the user to be identified can be detected in time and time can be saved for rescue.

The behavior recognition method, device, system and storage medium provided by the above embodiments can more conveniently determine the body center of gravity information according to the depth image and determine the posture height and head according to the skeleton image by acquiring the depth image and skeleton image corresponding to the user to be recognized. Height; the center of gravity of the user to be identified can be more accurately determined according to the total number of pixels in the human body area in the depth image and the coordinates of each pixel; The coordinates of the joint points corresponding to the points can obtain the height of the head more accurately, which improves the accuracy of the subsequent judgment of the current behavior state of the user to be identified; difference, determine the posture height corresponding to the user to be recognized, which can more truly and accurately reflect the posture height corresponding to the user to be recognized in the current state; by judging whether the ratio of the posture height corresponding to the user to be recognized to the head height is within the first ratio range , the second ratio range or the third ratio range, the gesture type corresponding to the user to be recognized can be more accurately determined. By judging whether the lowest joint point and the head joint point are on the same vertical line, it can be more accurately determined that the posture type of the user to be identified is sitting or kneeling; by first judging whether the downward rate of the center of gravity of the user to be identified is greater than the preset drop Then, the hip height value of the user to be identified can be detected, so that it can be determined whether the user to be identified is in a fall state by combining the lowering rate of the center of gravity and the hip height value, which greatly improves the identification accuracy.

The embodiments of the present application further provide a storage medium for readable storage, the storage medium stores a program, the program includes program instructions, and the processor executes the program instructions to implement the embodiments of the present application Any of the behavioral identification methods provided.

For example, the program is loaded by the processor and can perform the following steps:

Obtain a video image frame corresponding to the user to be identified, wherein the video image frame includes a depth image and a skeleton image; determine the posture feature parameter corresponding to the user to be identified according to the depth image and the skeleton image; The gesture characteristic parameters corresponding to the user are identified, and the current behavior state of the to-be-identified user is determined.

Wherein, the storage medium may be an internal storage unit of the behavior recognition apparatus described in the foregoing embodiments, such as a hard disk or a memory of the behavior recognition apparatus. The storage medium may also be an external storage device of the behavior recognition device, such as a plug-in hard disk equipped on the behavior recognition device, a smart memory card (Smart Media Card, SMC), a Secure Digital Card (Secure Digital Card, SD Card), flash memory card (Flash Card), etc.

Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, functional modules/units in the systems, and devices can be implemented as software, firmware, hardware, and appropriate combinations thereof.

In hardware embodiments, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be composed of several physical components Components execute cooperatively. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit . Such software may be distributed on storable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those of ordinary skill in the art, the term storage medium includes both volatile and nonvolatile implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules or other data , removable and non-removable media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or may Any other medium used to store desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media, as is well known to those of ordinary skill in the art .

The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but are not intended to limit the scope of the rights of the present invention. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the present invention shall fall within the right scope of the present invention.

Claims

A behavioral identification method comprising:

acquiring a video image frame corresponding to the user to be identified, wherein the video image frame includes a depth image and a skeleton image;

Determine the posture feature parameter corresponding to the user to be recognized according to the depth image and the skeleton image;

The current behavior state of the to-be-identified user is determined according to the gesture characteristic parameter corresponding to the to-be-identified user.
The behavior recognition method according to claim 1, wherein the posture feature parameters include body center of gravity information, posture height and head height; Pose feature parameters, including:

Determine the body center of gravity information corresponding to the to-be-identified user according to the depth image, and determine the posture height and head height corresponding to the to-be-identified user according to the skeleton image;

The determining the current behavioral state of the user to be identified according to the gesture characteristic parameter corresponding to the user to be identified includes:

The current behavior state of the user to be identified is determined according to the information on the center of gravity of the human body, the posture height, and the head height.
The behavior recognition method according to claim 2, wherein before the determining according to the depth image the information on the body center of gravity corresponding to the user to be recognized, the method further comprises:

According to a preset data format, format conversion is performed on the depth data in the initial depth image to obtain the format-converted depth image;

Before determining the posture height and the head height corresponding to the user to be identified according to the skeleton image, the method further includes:

According to a preset smoothing processing strategy, the initial skeleton image is smoothed to obtain the smoothed skeleton image.
The behavior recognition method according to claim 3, wherein the information on the center of gravity of the human body includes a center of gravity point of the human body; and the determining the information on the center of gravity of the human body corresponding to the user to be identified according to the depth image comprises:

Obtain the total number of pixels in the human body region in the depth image after format conversion, and obtain the abscissa and vertical coordinates corresponding to all the pixels in the human body region;

Determine the sum of the abscissas and ordinates corresponding to all the pixels in the human body area;

According to the sum of the total number of pixels and the abscissa, the mean value of the abscissa corresponding to all the pixels is determined, and according to the sum of the total number of pixels and the ordinate, the average value of the corresponding the mean of the ordinate;

Taking the mean value of the abscissas corresponding to all the pixels as the abscissa of the body's center of gravity, and using the mean of the ordinates corresponding to all the pixels as the ordinate of the body's center of gravity to obtain the depth image The center of gravity of the human body.
The behavior recognition method according to claim 3 or 4, wherein the determining the posture height and the head height corresponding to the to-be-recognized user according to the skeleton image, comprises:

Extracting joint point information in the smoothed skeleton image, and determining a posture height and a head height corresponding to the user to be identified according to the joint point information.
The behavior recognition method according to claim 5, wherein the skeleton image includes head joint points; before determining the posture height and the head height corresponding to the user to be recognized according to the joint point information, further comprising: :

obtaining the head joint points in the skeleton image;

When the head joint point is located in the human body area in the depth image, the posture height and the head height corresponding to the user to be identified are determined according to the joint point information.
The behavior recognition method according to claim 6, wherein the skeleton image further includes neck joint points; and the determining the head height corresponding to the user to be identified according to the joint point information comprises:

determining a first height difference between the head joint point and the neck joint point according to the joint point coordinates corresponding to the head joint point and the joint point coordinates corresponding to the neck joint point;

obtaining the head height corresponding to the user to be identified according to the product of the preset height ratio and the first height difference;

The determining the posture height corresponding to the user to be identified according to the joint point information includes:

determining the highest joint point and the lowest joint point in the skeleton image;

According to the joint point coordinates corresponding to the highest joint point and the joint point coordinates corresponding to the lowest joint point, a second height difference between the highest joint point and the lowest joint point is determined, and the second height difference as the gesture height corresponding to the to-be-identified user.
The behavior recognition method according to any one of claims 2-7, wherein the current behavior state includes posture type and whether to fall; height, determine the current behavior status of the user to be identified, including:

If the ratio of the gesture height to the head height is within a preset ratio range, determine the gesture type corresponding to the to-be-identified user according to the preset correspondence between the ratio range and the gesture type; or

If the ratio of the posture height to the head height is not within a preset ratio range, it is determined whether the user to be identified falls down according to the information on the center of gravity of the human body according to a preset detection strategy.
The behavior recognition method according to claim 8, wherein the preset ratio range includes a first ratio range, a second ratio range and a third ratio range, and the posture types include standing posture, cross-sitting posture, sitting posture and kneeling posture posture;

Wherein, the first ratio range is the ratio of the standing height of the human body to the height of the head, the second ratio range is that the second ratio is the ratio of the sitting height of the human body to the head height, and the third ratio range is The ratio of the sitting height to the head height of the human body or the ratio of the kneeling height to the head height.
The behavior recognition method according to claim 9, wherein, if the ratio of the posture height to the head height is in a preset ratio range, then the ratio between the ratio range and the posture type is determined according to a preset ratio. Correspondence, determine the gesture type corresponding to the user to be identified, including:

If the ratio of the posture height to the head height is within the first ratio range, determining that the posture type corresponding to the user to be identified is a standing posture; or

If the ratio of the posture height to the head height is within the second ratio range, determining that the posture type corresponding to the user to be identified is a cross-sitting posture; or

If the ratio of the posture height to the head height is within the third ratio range, determine the to-be-identified user according to the positional relationship between the lowest joint point corresponding to the to-be-identified user and the head joint point The user's corresponding posture type is sitting or kneeling.
The behavior recognition method according to claim 10, wherein, according to the positional relationship between the lowest joint point corresponding to the to-be-identified user and the head joint point, it is determined that the posture type corresponding to the to-be-identified user is a sitting posture or a Kneeling positions, including:

If the lowest joint point and the head joint point are on the same vertical line, it is determined that the posture type corresponding to the to-be-identified user is kneeling posture; or

If the lowest joint point and the head joint point are not on the same vertical line, it is determined that the posture type corresponding to the to-be-identified user is a sitting posture.
The behavior recognition method according to any one of claims 8-11, wherein the information on the center of gravity of the human body further includes a rate of descent of the center of gravity; if the ratio of the posture height to the head height is not within a preset In the ratio range, according to the information of the center of gravity of the human body, it is determined whether the user to be identified falls according to the preset detection strategy, including:

Detecting the drop rate of the center of gravity of the user to be identified;

When the lowering rate of the center of gravity is greater than a preset lowering rate, detecting the height value of the buttocks of the user to be identified;

When the hip height value is smaller than a preset height value, it is determined that the user to be identified is in a falling state.
The behavior identification method according to claim 12, wherein the detecting the gravity center drop rate of the to-be-identified user comprises:

Acquire a first skeleton image and a second skeleton image at adjacent preset intervals in the video image frame, wherein the first skeleton image includes a first human body center of gravity, and the second skeleton image includes a second skeleton image. center of gravity

According to the coordinates corresponding to the center of gravity of the first person, the coordinates corresponding to the center of gravity of the second person, and the interval time, determine the rate of descent of the center of gravity of the user to be identified;

The detecting the hip height value of the user to be identified includes:

obtaining the coordinates corresponding to the hip joint points in the skeleton image;

Determine the vertical distance between the coordinates corresponding to the hip joint point and the ground, and use the vertical distance as the hip height value of the user to be identified.
A behavior recognition device, comprising:

memory for storing programs;

The processor is configured to execute the program and implement the behavior recognition method according to any one of claims 1 to 13 when the program is executed.
A storage medium for readable storage, the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to realize any one of claims 1 to 13 The behavior recognition method described in item.