CN114582012A - Skeleton human behavior recognition method, device and equipment - Google Patents


Info

Publication number
CN114582012A
CN114582012A (application CN202111616700.XA)
Authority
CN
China
Prior art keywords
data
skeleton
joint
splicing
bone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111616700.XA
Other languages
Chinese (zh)
Inventor
邓浩阳
柯少杰
罗印威
张阳
何志强
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202111616700.XA
Publication of CN114582012A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

An embodiment of the invention provides a skeleton human behavior recognition method, device and equipment. The method includes: acquiring skeleton data of a target object; computing joint difference data and bone difference data based on key points of the skeleton data; performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features; performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result. By recognizing features of the detailed information of the human body, the invention makes human behavior recognition more accurate.

Description

Skeleton human behavior recognition method, device and equipment
Technical Field
The invention relates to the technical field of human behavior recognition, in particular to a skeleton human behavior recognition method, device and equipment.
Background
Human behavior recognition is a popular research topic in the field of machine vision. It aims to capture and extract the spatial and temporal feature information of human motion and to determine the motion type of the human body from that feature information. Skeleton human behavior recognition is a human behavior recognition method that uses skeleton data, extracted directly or indirectly from human actions, as input data. Compared with RGB video data, skeleton data has the advantages of insensitivity to environmental interference, high effective information density, and small data storage space.
In current skeleton human behavior identification methods, a Microsoft Kinect sensor is used to directly collect labeled skeleton data, or the OpenPose algorithm is used to extract skeleton data from RGB human motion video, as input data; human behaviors are then recognized from the skeleton data using deep learning methods such as recurrent neural networks, convolutional neural networks, and graph convolutional neural networks. However, existing human behavior recognition methods either modify a single data feature extraction network, or fuse skeleton data with its derived data by adding up prediction scores obtained from the individual data streams. These methods pay insufficient attention to the derived data and the fused data, which leads to insufficient use of the information in the skeleton data and in turn affects the final recognition accuracy.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the insufficient use of skeleton data information in the prior art, and to provide a skeleton human behavior recognition method, device and equipment.
According to a first aspect, an embodiment of the present invention provides a skeleton human behavior recognition method, including: acquiring skeleton data of a target object, wherein the skeleton data includes joint data and bone data; computing joint difference data and bone difference data based on key points of the skeleton data; performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features; performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result.
Optionally, the calculating joint difference data based on the key points of the skeleton data includes: extracting joint data based on the skeleton data; establishing a joint coordinate system based on the joint data; and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
Optionally, the calculating skeletal difference data based on the key points of the skeletal data includes: extracting skeletal data based on the skeletal data; establishing a bone coordinate system based on the bone data; and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
Optionally, the extracting features based on the skeleton difference data to obtain skeleton data features and skeleton difference data features includes: constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time; extracting a skeleton difference change image within preset time based on the skeleton difference data coordinate system; and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
Optionally, performing feature data splicing and fusion based on the joint data features and the joint difference data features to obtain the joint splicing features includes: constructing a first network layer based on the joint data features and the joint difference data features; performing data sorting based on the first network layer to obtain a first sorting result; and performing feature data splicing and fusion based on the first sorting result to obtain the joint splicing features.
Optionally, performing feature data splicing and fusion based on the skeleton data features and the skeleton difference data features to obtain the bone splicing features includes: constructing a second network layer based on the skeleton data features and the skeleton difference data features; performing data sorting based on the second network layer to obtain a second sorting result; and performing feature data splicing and fusion based on the second sorting result to obtain the bone splicing features.
Optionally, performing enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features and the bone splicing features respectively to obtain motion classification prediction results, including: establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics; extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature; obtaining a prediction value of the skeleton data based on the key position feature information; and obtaining an action classification prediction result based on the prediction numerical value.
According to a second aspect, an embodiment of the present invention provides a skeletal human behavior recognition apparatus, including: an acquisition module for acquiring skeleton data of a target object, the skeleton data including joint data and bone data; a computing module for computing joint difference data and bone difference data based on key points of the skeleton data; a feature extraction module for performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and for obtaining joint data features and joint difference data features based on the skeleton data features; a fusion module for performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and a prediction module for performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result.
According to a third aspect, a skeletal human behavior recognition device includes a memory and a processor that are communicatively connected, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the skeleton human behavior recognition method of the first aspect or any one of its optional embodiments.
According to a fourth aspect, a computer-readable storage medium is characterized by storing computer instructions for causing a computer to execute the skeletal human behavior recognition method of the first aspect or any one of the optional embodiments.
The technical scheme of the invention has the following advantages:
The embodiments of the invention provide a skeleton human behavior recognition method, device and equipment. The method obtains skeleton data of a target object and computes joint difference data and bone difference data from key points of the skeleton data. Feature information is extracted based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and joint data features and joint difference data features are obtained based on the skeleton data features. The joint data features and joint difference data features, and the skeleton data features and skeleton difference data features, are then spliced and fused to obtain joint splicing features and bone splicing features. Finally, the key position features of branches of different dimensions are fused in an enhanced way with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result. By extracting features separately from the joints and the bones of the human skeleton, the invention can recognize the detailed information of the human body while eliminating more noise interference, so that human behavior recognition is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating a specific example of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an example of a joint coordinate system of a skeleton human behavior recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an example of joint difference data of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating skeleton difference data of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a network layer of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 6 is an exemplary diagram of an overall network layer of a skeleton human behavior recognition method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating an example of enhanced fusion of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating an exemplary structure of a skeleton human behavior recognition device according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating an example of connection of a skeleton human behavior recognition device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
According to the embodiment of the invention, the relevant data of the joints and the bones are extracted based on the skeleton data of the target object, and then the relevant data of all the bones of the joints are fused, so that the motion classification prediction result is obtained. In the following embodiments, the human skeleton is taken as an example, and the present invention can also be applied to the motion recognition of other skeletons, which is not limited in this application.
Fig. 1 shows a flowchart of a skeleton human behavior recognition method according to an embodiment of the present invention, where the method specifically includes the following steps:
s100: skeleton data of a target object is acquired, the skeleton data including joint data and skeleton data.
Specifically, motion information of the target object is collected by a capture device, and skeleton data of the target object, including its joint data and bone data, is extracted from the collected motion information. In practical applications, the capture device may be, for example, a Microsoft Kinect sensor or a video capture device.
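As a concrete illustration (not part of the claimed method), when skeleton data is extracted from RGB video with OpenPose, each frame is written as JSON whose `people` entries carry a flat `pose_keypoints_2d` array of (x, y, confidence) triples; the sketch below shows one way to turn such a frame into a joint array, with the toy frame content being hypothetical:

```python
import json
import numpy as np

def parse_openpose_frame(json_text: str) -> np.ndarray:
    """Parse one OpenPose output frame into an (num_joints, 3) array
    of (x, y, confidence) rows for the first detected person."""
    frame = json.loads(json_text)
    people = frame.get("people", [])
    if not people:
        return np.zeros((0, 3))  # no person detected in this frame
    flat = np.asarray(people[0]["pose_keypoints_2d"], dtype=float)
    return flat.reshape(-1, 3)  # one row per joint: (x, y, confidence)

# A toy two-joint frame in OpenPose's JSON layout.
sample = '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.8]}]}'
joints = parse_openpose_frame(sample)
print(joints.shape)  # (2, 3)
```

Stacking such per-frame arrays over time yields the (channel, time, joint) skeleton tensor used in the following steps.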
S200: and calculating to obtain joint difference data and skeleton difference data based on the key points of the skeleton data.
Specifically, key points of the skeleton data are obtained by preprocessing the skeleton data, and joint data in the channel dimension, time dimension and space dimension are obtained based on the key points. Data frames are computed in the time dimension based on the joint data D, whose form is

D ∈ R^(C×T×S),

where t0 denotes a time frame of the joint data, and C, T and S respectively denote the channel dimension, time dimension and space dimension of the skeleton data. Joint difference data generated by the action transformation of the target object within a preset time are obtained based on the joint data; bone data are obtained based on the space dimension; and bone difference data generated by the action transformation of the target object within the preset time are obtained based on the bone data and the time dimension.
S300: and performing feature extraction based on the joint difference data and the skeleton difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features.
Specifically, feature extraction is performed according to the joint difference data and the bone difference data, skeleton data features and skeleton difference data features are obtained according to the joint change difference features and the bone change difference features of the target object in the action process, and the joint data features and the joint difference data features are extracted from the skeleton data features.
S400: and performing characteristic data splicing fusion based on the joint data characteristic and the joint difference data characteristic, and the skeleton data characteristic and the skeleton difference data characteristic respectively to obtain a joint splicing characteristic and a skeleton splicing characteristic.
Specifically, feature data splicing and fusion are performed based on the joint data features and the joint difference data features, the skeleton data features and the skeleton difference data features, and joint splicing features and skeleton splicing features are obtained according to the integrated skeleton and the joints after fusion.
S500: and performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain action classification prediction results.
Specifically, the key position features of the skeleton of the channel dimension, the time dimension and the space dimension, the joint splicing features and the skeleton splicing features are subjected to enhanced fusion again to obtain a complete skeleton subjected to multi-view attention enhanced fusion, and the action of the target object is analyzed based on the complete skeleton to obtain an action classification prediction result.
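The patent does not give the exact enhanced-fusion formula; a common realization for fusing per-branch classification results is an attention-style weighted sum of branch scores followed by normalization, sketched here under that assumption (the branch values and weights are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_branch_scores(branch_scores, branch_weights):
    """Fuse per-branch class scores (e.g. the joint-splicing branch, the
    bone-splicing branch, and key-position branches of different
    dimensions) with a weighting over branches, then normalize the
    result into action-class probabilities."""
    scores = np.stack(branch_scores)            # (num_branches, num_classes)
    w = softmax(np.asarray(branch_weights))     # attention over branches
    fused = (w[:, None] * scores).sum(axis=0)   # weighted sum per class
    return softmax(fused)                       # action classification prediction

joint_branch = np.array([2.0, 0.5, -1.0])  # illustrative 3-class scores
bone_branch = np.array([1.5, 1.0, -0.5])
probs = fuse_branch_scores([joint_branch, bone_branch], [0.7, 0.3])
print(probs.argmax())  # index of the predicted action class
```

In a trained network the branch weights would themselves be learned parameters rather than the fixed values used here.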
In the embodiment of the invention, motion information of a target object is collected by a capture device, and skeleton data of the target object is extracted from the collected motion information. The skeleton data is preprocessed to obtain its key points, joint data of different dimensions are obtained from the key points, and the joint difference data and bone difference data generated by the action transformation of the target object within a preset time are obtained from the joint data. Feature data splicing and fusion are performed based on the joint data features and the joint difference data features, and the joint splicing features and bone splicing features are obtained from the fused overall skeleton and joints. The key position features of the skeleton in different dimensions, the joint splicing features and the bone splicing features are then fused again in an enhanced way to obtain a complete skeleton, and the action of the target object is analyzed based on the complete skeleton to obtain an action classification prediction result. By collecting skeleton data of the target object's actions and obtaining data feature information of different dimensions to analyze the action and behavior of the target object, the embodiment can improve the accuracy of the recognition task and reduce the influence of image noise.
In an optional embodiment of the present invention, the step S200 of calculating joint difference data based on the key points of the skeleton data includes the following steps:
(1) extracting joint data based on the skeleton data;
(2) establishing a joint coordinate system based on the joint data;
(3) and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
Specifically, joint data are extracted from the skeleton data, and a joint coordinate system is established from the joint data, as shown in Fig. 2. In the joint coordinate system, a joint point of the target object serves as the origin, and the joint points are connected naturally according to the human skeleton. Joint change data within a preset time are obtained based on the joint coordinate system, and the difference between the joint change data and the joint data within the preset time is calculated to obtain the joint difference data.
Illustratively, the joint data extracted at time t within the preset time are denoted (x_t, y_t, z_t). The difference of the same joint node between adjacent time frames is then computed as (x_t - x_{t-1}, y_t - y_{t-1}, z_t - z_{t-1}). In practical applications, in order to align the data and facilitate subsequent calculation, the value at the preset time 0 may, for example, be set to the average of all the data; the process is shown in Fig. 3, i.e.

d_0 = (1/T) Σ_{t=1}^{T} d_t.
In the embodiment of the invention, joint data are extracted from the skeleton data, and a joint coordinate system is established from the joint data, with a joint point of the target object as the origin and the joint points connected naturally according to the human skeleton. Joint change data within a preset time are obtained based on the joint coordinate system, and the difference between the joint change data and the joint data within the preset time is calculated to obtain the joint difference data. By establishing the joint coordinate system, the difference data of each joint within the preset time can be calculated more accurately, which further improves the accuracy of recognizing the action behavior of the target object.
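The adjacent-frame joint difference computation can be sketched in NumPy as follows, assuming a (C, T, S) tensor layout (channels x/y/z, time frames, joints); filling frame 0 with the mean over all frames is one reading of the data-alignment step described above, and the array names are illustrative:

```python
import numpy as np

def joint_difference(joints: np.ndarray) -> np.ndarray:
    """Compute joint difference data for a (C, T, S) tensor of joint
    coordinates: C channels (x, y, z), T time frames, S joints.
    Frame t holds joints[:, t] - joints[:, t - 1]; frame 0, which has
    no predecessor, is filled with the mean over all frames so the
    output stays aligned with the input shape."""
    diff = np.empty_like(joints)
    diff[:, 1:] = joints[:, 1:] - joints[:, :-1]  # adjacent-frame differences
    diff[:, 0] = joints.mean(axis=1)              # alignment value at time 0
    return diff

# 3 channels (x, y, z), 4 frames, 2 joints
rng = np.random.default_rng(0)
J = rng.standard_normal((3, 4, 2))
D = joint_difference(J)
print(D.shape)  # (3, 4, 2)
```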
In an optional embodiment of the present invention, the step S200 of obtaining the bone difference data based on the key point calculation of the skeleton data includes the following steps:
(1) extracting skeletal data based on the skeletal data;
(2) establishing a bone coordinate system based on the bone data;
(3) and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
Specifically, bone data are extracted from a space dimension according to the skeleton data, a bone coordinate system is established according to the bone data, the bone coordinate system is shown in fig. 2, joint points of a target object are taken as an origin in the bone coordinate system, the joint points are naturally connected according to a human body skeleton, connecting lines are bone coordinates of the target object, bone change data in a preset time are obtained based on the bone coordinate system, and a difference value of the bone change data and the bone data in the preset time is calculated based on a time dimension of the skeleton data, so that bone difference data are obtained.
Illustratively, the bone data extracted at time t within the preset time are denoted (x_t, y_t, z_t). The difference of the corresponding bone between adjacent time frames is then computed as (x_t - x_{t-1}, y_t - y_{t-1}, z_t - z_{t-1}). In practical applications, in order to align the data and facilitate subsequent calculation, the value at the preset time 0 may, for example, be set to the average of all the data; the process is shown in Fig. 4, i.e.

b_0 = (1/T) Σ_{t=1}^{T} b_t.
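Bone data can be derived from the joint coordinates as vectors between naturally connected joints, and the bone differences then mirror the joint case; the two-bone chain below is hypothetical, and the (C, T, S) NumPy layout is the same assumption as before:

```python
import numpy as np

def bone_data(joints: np.ndarray, bones: list) -> np.ndarray:
    """Turn (C, T, S) joint coordinates into (C, T, len(bones)) bone
    vectors, one vector per (child, parent) joint-index pair."""
    child = [c for c, _ in bones]
    parent = [p for _, p in bones]
    return joints[:, :, child] - joints[:, :, parent]

def bone_difference(bone: np.ndarray) -> np.ndarray:
    """Adjacent-frame differences of the bone vectors; frame 0 is the
    mean over all frames, mirroring the joint-difference alignment."""
    diff = np.empty_like(bone)
    diff[:, 1:] = bone[:, 1:] - bone[:, :-1]
    diff[:, 0] = bone.mean(axis=1)
    return diff

rng = np.random.default_rng(1)
J = rng.standard_normal((3, 4, 3))  # 3 channels, 4 frames, 3 joints
bones = [(1, 0), (2, 1)]            # hypothetical chain: joint 0 -> 1 -> 2
B = bone_data(J, bones)
print(B.shape, bone_difference(B).shape)
```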
In an optional embodiment of the present invention, the step S300 of performing feature extraction based on the skeleton difference data to obtain skeleton data features and skeleton difference data features includes the following steps:
(1) constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time;
(2) extracting a skeleton difference change image within a preset time based on the skeleton difference data coordinate system;
(3) and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
In the embodiment of the present invention, a skeleton difference data coordinate system is constructed according to the skeleton difference data and the preset time, the skeleton difference data coordinate system is shown in fig. 4, a change condition of a skeleton position within the preset time is obtained based on the skeleton difference data coordinate system, a skeleton difference change image is extracted according to the change condition, a whole skeleton change image is constructed according to the skeleton difference change image, and a skeleton data feature and a skeleton difference data feature are extracted from the skeleton change image. According to the embodiment of the invention, the skeleton data image and the skeleton difference data are obtained through the skeleton data, so that all action change conditions of the target object can be analyzed more comprehensively, the loss of characteristic information of a lower layer is avoided, and the identification accuracy of the action behavior of the target object is further improved.
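One possible realization of the skeleton difference change image (an assumption, since the patent does not fix the layout) is to arrange the (C, T, S) difference tensor as a C-channel pseudo-image with time as the height and joints as the width, normalized per channel so an image-style feature extractor can consume it:

```python
import numpy as np

def difference_change_image(diff: np.ndarray) -> np.ndarray:
    """Arrange a (C, T, S) difference tensor as a (T, S, C) pseudo-image
    (T rows, S columns, C channels), min-max normalized per channel to
    [0, 1], ready for an image-style feature extractor."""
    img = np.moveaxis(diff, 0, -1)  # (C, T, S) -> (T, S, C)
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / np.where(hi > lo, hi - lo, 1.0)  # per-channel min-max

# 3 channels, 16 frames, 25 joints (illustrative sizes)
diff = np.random.default_rng(2).standard_normal((3, 16, 25))
img = difference_change_image(diff)
print(img.shape)  # (16, 25, 3)
```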
In an optional embodiment of the present invention, the performing feature data splicing and fusion based on the joint data features and the joint difference data features in step S400 to obtain the joint splicing features includes the following steps:
(1) constructing a first network layer based on the joint data characteristics and the joint difference data characteristics;
(2) performing data sorting based on the first network layer to obtain a first sorting result;
(3) and performing characteristic data splicing and fusion based on the first sequencing result to obtain joint splicing characteristics.
Specifically, a network layer is constructed based on the joint data features and the joint difference data features, the bone data and the bone difference data features, and the network layer is as shown in fig. 5, wherein a first network layer is constructed based on the joint data features and the joint difference data features, the joint data features and the joint difference data features of different layers are comprehensively sequenced according to the first network layer, joint feature data are spliced based on a first sequencing result to obtain joint splicing features, and joint behavior prediction scores are obtained based on the joint splicing features.
Illustratively, the joint data features and the joint difference data features of the different layers are denoted f_J^l and f_JD^l (l = 1, …, L), respectively, where L is the maximum number of feature layers and the layer index may, for example, increase from the deep layers to the shallow layers. The joint data features and the joint difference data features are spliced by channel-dimension splicing, i.e.

f^l = cat(f_J^l, f_JD^l),

where the cat function splices the feature f_J^l and the feature f_JD^l in the channel dimension into f^l.
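The channel-dimension splicing of per-layer feature pairs can be sketched as a plain concatenation along the channel axis; the layer count, channel sizes and array names below are illustrative:

```python
import numpy as np

def splice_features(feats_a, feats_b):
    """For each layer l = 1..L, splice the feature pair along the channel
    axis, as the cat function does for the joint / joint-difference
    (or bone / bone-difference) branches."""
    return [np.concatenate([fa, fb], axis=0) for fa, fb in zip(feats_a, feats_b)]

L = 3  # maximum number of feature layers (illustrative)
# Per-layer (C_l, H, W) feature maps with channel counts doubling per layer.
joint_feats = [np.ones((8 * 2**l, 4, 4)) for l in range(L)]
joint_diff_feats = [np.zeros((8 * 2**l, 4, 4)) for l in range(L)]
spliced = splice_features(joint_feats, joint_diff_feats)
print([f.shape[0] for f in spliced])  # doubled channel counts: [16, 32, 64]
```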
In the embodiment of the invention, a first network layer is constructed based on the joint data characteristics and the joint difference data characteristics, the joint data characteristics and the joint difference data characteristics of different layers are comprehensively sequenced according to the first network layer, and the joint characteristic data are spliced based on a first sequencing result to obtain joint splicing characteristics. According to the embodiment of the invention, the joint data characteristics and the joint difference data characteristics are spliced and combined by constructing the first network layer, so that the change condition of the joint caused by the movement of the target object in the preset time can be accurately obtained, and the behavior characteristics of the target object can be more comprehensively analyzed.
In an optional embodiment of the present invention, the performing feature data splicing and fusion based on the bone data and the bone difference data feature in the step S400 to obtain a bone splicing feature includes the following steps:
(1) constructing a second network layer based on the bone data and the bone difference data characteristics;
(2) performing data sorting based on the second network layer to obtain a second sorting result;
(3) and performing feature data splicing and fusion based on the second sequencing result to obtain bone splicing features.
Specifically, a network layer is constructed based on the joint data feature and the joint difference data feature, bone data and a bone difference data feature, and the network layer is as shown in fig. 5, wherein a second network layer is constructed based on the bone data and the bone difference data feature, the bone data and the bone difference data feature of different layers are comprehensively sorted according to the second network layer, the bone feature data are spliced based on a second sorting result to obtain a bone splicing feature, and a bone behavior prediction score is obtained based on the bone splicing feature.
Illustratively, the bone data and bone difference data features of the different layers are respectively recorded as f_b^(l) and f_bd^(l), l = 1, …, L, wherein L is the maximum number of feature layers, and the layer index l may, for example, increase gradually from the deep layers to the shallow layers. The bone data and bone difference data features are spliced by a channel-dimension splicing method, that is

f_B^(l) = cat(f_b^(l), f_bd^(l))

wherein the cat function splices the feature f_b^(l) and the feature f_bd^(l) in the channel dimension to obtain f_B^(l).
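The channel-dimension splicing described above can be sketched in Python with NumPy; the feature shape C×T×S and the concrete sizes below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

# Illustrative shapes: C channels, T frames, S skeleton joints (assumed values)
C, T, S = 64, 30, 25

# Bone data feature and bone difference data feature of one layer l
f_b = np.random.rand(C, T, S)    # f_b^(l)
f_bd = np.random.rand(C, T, S)   # f_bd^(l)

# cat: concatenate along the channel dimension (axis 0)
f_B = np.concatenate([f_b, f_bd], axis=0)  # f_B^(l), shape (2C, T, S)

print(f_B.shape)  # (128, 30, 25)
```

The same operation is applied per layer l, so the spliced feature doubles the channel count while preserving the temporal and spatial dimensions.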
In the embodiment of the invention, a second network layer is constructed based on the bone data features and the bone difference data features, the bone data features and the bone difference data features of different layers are comprehensively sorted according to the second network layer, and the bone feature data are spliced based on the second sorting result to obtain the bone splicing feature. By constructing the second network layer and splicing and combining the bone data features with the bone difference data features, the embodiment of the invention can accurately capture the change of the bone positions caused by the movement of the target object within the preset time, so that the behavior features of the target object can be analyzed more comprehensively.
In an optional embodiment of the present invention, the performing of enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features and the bone splicing features in step S500 to obtain the motion classification prediction result includes the following steps:
(1) establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics;
(2) extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature;
(3) obtaining a prediction value of the skeleton data based on the key position feature information;
(4) and obtaining an action classification prediction result based on the prediction numerical value.
Specifically, as shown in fig. 6, an average fusion layer is established based on the joint splicing features and the bone splicing features, and key position feature information is extracted from the fusion layer; the key positions are obtained by applying multi-view attention to the skeleton data features and the skeleton difference data features. The weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch are calculated according to the joint splicing feature data and the bone splicing feature data of the average fusion layer, a prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information, and the motion classification prediction result of the target object is obtained according to the prediction value.
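The attention-weighted fusion of the two input streams can be sketched as follows; the sigmoid-gated random weights stand in for the multi-view attention weights W_com1 and W_com2, and the shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C, T, S = 64, 30, 25
f_I1 = np.random.rand(C, T, S)   # joint data feature
f_I2 = np.random.rand(C, T, S)   # joint difference data feature

# Stand-ins for the multi-view attention weights W_com1, W_com2 (values in (0, 1))
W_com1 = sigmoid(np.random.randn(C, T, S))
W_com2 = sigmoid(np.random.randn(C, T, S))

# Element-wise (Hadamard) weighting of each stream, then summation
f_fused = W_com1 * f_I1 + W_com2 * f_I2
print(f_fused.shape)  # (64, 30, 25)
```

Each stream is re-weighted element by element before the two are added, so positions the attention deems important contribute more to the fused feature.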
Illustratively, as shown in fig. 7, the fusion of the joint data and the joint difference data may be represented as

f_I' = W_com1 ⊙ f_I1 + W_com2 ⊙ f_I2

wherein the joint data is f_I1 ∈ R^(C×T×S), the joint difference data is f_I2 ∈ R^(C×T×S), W_com1 ∈ R^(C×T×S) is the attention weight obtained by multi-view attention calculation on the input feature f_I1, W_com2 ∈ R^(C×T×S) is the attention weight obtained by multi-view attention calculation on the input feature f_I2, and ⊙ denotes element-wise multiplication of corresponding matrix elements. The weight W_com ∈ R^(C×T×S) is calculated from the three branch attention weights, broadcast to the full feature shape, as

W_com = Sig(W_s + W_t + W_c)

wherein Sig is a Sigmoid function. The spatial dimension branch attention weight W_s ∈ R^(1×1×S) is calculated as

W_s = reshape(W_as + W_ms), W_as = FC_w2(ReLU(FC_w1(GAP(reshape(f_I))))), W_ms = FC_w2(ReLU(FC_w1(GMP(reshape(f_I)))))

wherein reshape is a matrix dimension transform operation that converts the input f_I ∈ R^(C×T×S) into f_I ∈ R^(S×C×T), GAP is the global average sampling operation, GMP is the global maximum sampling operation, FC_w1 is a fully connected layer with weight w1 ∈ R^((S/r)×S), FC_w2 is a fully connected layer with weight w2 ∈ R^(S×(S/r)), r is the channel reduction factor, and ReLU is the ReLU function. The parameter channel branch attention weight W_c ∈ R^(C×1×1) is calculated as

W_c = W_ac + W_mc, W_ac = FC_w4(ReLU(FC_w3(GAP(f_I)))), W_mc = FC_w4(ReLU(FC_w3(GMP(f_I))))

wherein FC_w3 is a fully connected layer with weight w3 ∈ R^((C/r)×C) and FC_w4 is a fully connected layer with weight w4 ∈ R^(C×(C/r)). The time dimension branch attention weight W_t ∈ R^(1×T×1) is calculated as

W_t = AP_s(Sig(Conv9(AP_c(f_I))))

wherein AP_c is the average sampling operation on the parameter channel branch, AP_s is the average sampling operation on the spatial dimension branch, and Conv9 is a one-dimensional convolution operation with a convolution kernel size of 9.
For example, as shown in fig. 7, the bone data and the bone difference data are likewise fused and spliced to obtain the weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch. The calculation process of these weight ratios is the same as that used for the joint data and the joint difference data, and is not repeated here.
Illustratively, a prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information. For each stream Mod, the prediction score of the data stream, the prediction score of the difference-data stream and the prediction score of the fused stream are respectively recorded as S_Mod, S_Mod^d and S_Mod^f, and the prediction value of the network layer is calculated from these prediction scores as

S = Σ_{Mod∈{j,b}} (α·S_Mod + β·S_Mod^d + γ·S_Mod^f)

wherein α, β, γ are the weight parameters of the spatial dimension branch, the time dimension branch and the parameter channel branch, and Mod is in the range of {j, b}, j denoting the joint stream and b the bone stream.
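The final prediction can be sketched as a weighted sum of per-stream scores; the class count, the α, β, γ values, and the random score arrays are illustrative assumptions:

```python
import numpy as np

num_classes = 60                     # assumed number of action classes
alpha, beta, gamma = 0.6, 0.6, 0.4   # assumed branch weight parameters

rng = np.random.default_rng(0)
scores = {
    # per-stream prediction scores for Mod in {j, b}:
    # (data stream, difference-data stream, fused stream)
    "j": [rng.random(num_classes) for _ in range(3)],
    "b": [rng.random(num_classes) for _ in range(3)],
}

# S = sum over Mod of (alpha*S_Mod + beta*S_Mod^d + gamma*S_Mod^f)
S = sum(alpha * s + beta * sd + gamma * sf for (s, sd, sf) in scores.values())
predicted_class = int(np.argmax(S))
print(S.shape, predicted_class)
```

The argmax over the combined score vector yields the motion classification prediction result for the target object.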
In the embodiment of the invention, an average fusion layer is established based on the joint splicing features and the bone splicing features, and key position feature information is extracted from the fusion layer; the key positions are obtained by applying multi-view attention to the skeleton data features and the skeleton difference data features. The weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch are calculated according to the joint splicing feature data and the bone splicing feature data of the average fusion layer, the prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information, and the motion classification prediction result of the target object is obtained according to the prediction value. By separately calculating the joint prediction score, the bone prediction score and the fused skeleton prediction score and combining them into the prediction value of the whole network layer, the embodiment of the invention can recognize the behavior features of the target object more comprehensively and accurately, so that the prediction is more accurate.
As shown in fig. 8, an embodiment of the present invention provides a skeletal human behavior recognition apparatus, which includes an obtaining module 1, a calculating module 2, a feature extracting module 3, a fusing module 4, and a predicting module 5, wherein,
an obtaining module 1, configured to obtain skeleton data of a target object, where the skeleton data includes joint data and skeleton data, and details may be referred to in the related description of step S100 of any of the above method embodiments;
a calculating module 2, configured to calculate joint difference data and bone difference data based on the key points of the skeleton data, for details, see the related description of step S200 in any of the above embodiments of the method;
a feature extraction module 3, configured to perform feature extraction based on the joint difference data and the bone difference data to obtain a skeleton data feature and a skeleton difference data feature, and obtain a joint data feature and a joint difference data feature based on the skeleton data feature, where details may refer to relevant description of step S300 in any of the above method embodiments;
a fusion module 4, configured to perform feature data splicing and fusion based on the joint data feature and the joint difference data feature, the skeleton data feature, and the skeleton difference data feature, respectively, to obtain a joint splicing feature and a skeleton splicing feature, for details, see the related description of step S400 in any of the above method embodiments;
the prediction module 5 is configured to perform enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features, and the bone splicing features, to obtain an action classification prediction result, and the detailed contents may refer to the related description of step S500 in any of the above method embodiments.
The embodiment of the invention provides a skeleton human behavior recognition device. The device collects motion information of a target object through a collection device and extracts skeleton data of the target object from the collected motion information. Key points of the skeleton data are obtained by preprocessing the skeleton data, joint data of different dimensions are obtained based on the key points, and joint difference data and bone difference data generated by the motion of the target object within a preset time are obtained based on the joint data. Feature data splicing and fusion are performed based on the joint data features and the joint difference data features, and on the bone data features and the bone difference data features, and the joint splicing feature and the bone splicing feature are obtained from the fused overall skeleton and joints. The key position features of the skeleton with different dimensions, the joint splicing features and the bone splicing features are then fused again to acquire a complete skeleton, and the motion of the target object is analyzed based on the complete skeleton to obtain the motion classification prediction result. By collecting the skeleton data of the motion of the target object to obtain feature information of different dimensions and analyzing the motion and behavior of the target object accordingly, the embodiment of the invention can improve the accuracy of the recognition task and reduce the influence of image noise.
For specific limitations and beneficial effects of the skeleton human behavior recognition device, reference may be made to the above limitations on the skeleton human behavior recognition method, which are not described herein again. All modules of the skeleton human behavior recognition device can be realized completely or partially through software, hardware or a combination thereof. The modules can be embedded in hardware form in, or independent of, a processor in the electronic device, or can be stored in software form in a memory in the electronic device, so that the processor can call and execute the operations corresponding to the modules.
Fig. 9 is a schematic structural diagram of a skeleton human behavior recognition apparatus according to an alternative embodiment of the present invention. The skeleton human behavior recognition apparatus may include at least one processor 41, at least one communication interface 42, at least one communication bus 43, and at least one memory 44, where the communication interface 42 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 42 may further include a standard wired interface and a standard wireless interface. The memory 44 may be a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. The memory 44 may alternatively be at least one memory device located remotely from the aforementioned processor 41. The processor 41 may be combined with the apparatus described in fig. 8; the memory 44 stores an application program, and the processor 41 calls the program code stored in the memory 44 to execute the steps of the skeleton human behavior recognition method of any of the above-mentioned method embodiments.
The communication bus 43 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 43 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The memory 44 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 44 may also comprise a combination of the above kinds of memory.
The processor 41 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 44 is also used to store program instructions. Processor 41 may invoke program instructions to implement the skeletal human behavior recognition method as shown in the embodiment of fig. 1 of the present invention.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer-executable instructions that can execute the skeleton human behavior recognition method of any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom are within the scope of the invention.

Claims (10)

1. A skeleton human behavior recognition method is characterized by comprising the following steps:
acquiring skeleton data of a target object, wherein the skeleton data comprises joint data and skeleton data;
calculating joint difference data and skeleton difference data based on key points of the skeleton data;
performing feature extraction based on the joint difference data and the skeleton difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features;
performing feature data splicing and fusion respectively based on the joint data features and the joint difference data features, and on the bone data and the bone difference data features, to obtain joint splicing features and bone splicing features;
and performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain an action classification prediction result.
2. The method for recognizing skeleton human behavior according to claim 1, wherein calculating joint difference data based on key points of the skeleton data comprises:
extracting joint data based on the skeleton data;
establishing a joint coordinate system based on the joint data;
and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
3. The skeletal human behavior recognition method according to claim 1, wherein calculating skeletal difference data based on the key points of the skeletal data comprises:
extracting skeletal data based on the skeletal data;
establishing a bone coordinate system based on the bone data;
and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
4. The method for recognizing skeleton human behavior according to claim 1, wherein performing feature extraction based on the skeleton difference data to obtain skeleton data features and skeleton difference data features comprises:
constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time;
extracting a skeleton difference change image within a preset time based on the skeleton difference data coordinate system;
and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
5. The skeleton human behavior recognition method according to claim 1, wherein performing feature data stitching fusion based on the joint data features and joint difference data features to obtain joint stitching features comprises:
constructing a first network layer based on the joint data characteristics and the joint difference data characteristics;
performing data sorting based on the first network layer to obtain a first sorting result;
and performing characteristic data splicing and fusion based on the first sequencing result to obtain joint splicing characteristics.
6. The method for recognizing skeleton human behavior according to claim 1, wherein performing feature data splicing fusion based on the skeleton data and the skeleton difference data features to obtain skeleton splicing features comprises:
constructing a second network layer based on the bone data and the bone difference data characteristics;
performing data sorting based on the second network layer to obtain a second sorting result;
and performing feature data splicing and fusion based on the second sequencing result to obtain bone splicing features.
7. The method for recognizing the skeleton human behavior according to claim 1, wherein the step of performing enhanced fusion on key position features of branches with different dimensions, the joint splicing features and the bone splicing features respectively to obtain motion classification prediction results comprises the steps of:
establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics;
extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature;
obtaining a prediction value of the skeleton data based on the key position feature information;
and obtaining an action classification prediction result based on the prediction numerical value.
8. A skeletal human behavior recognition device, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring skeleton data of a target object, and the skeleton data comprises joint data and skeleton data;
the computing module is used for computing key points based on the skeleton data to obtain joint difference data and skeleton difference data;
the characteristic extraction module is used for extracting characteristics based on the joint difference data and the skeleton difference data to obtain skeleton data characteristics and skeleton difference data characteristics, and obtaining joint data characteristics and joint difference data characteristics based on the skeleton data characteristics;
the fusion module is used for performing characteristic data splicing fusion respectively based on the joint data characteristic and the joint difference data characteristic, and the skeleton data characteristic and the skeleton difference data characteristic to obtain a joint splicing characteristic and a skeleton splicing characteristic;
and the prediction module is used for performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain an action classification prediction result.
9. A skeletal human behavior recognition device, comprising:
a communication unit, a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor performing the steps of the method according to any one of claims 1 to 7 by executing the computer instructions.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the steps of the method of any one of claims 1-7.
CN202111616700.XA 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment Pending CN114582012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111616700.XA CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111616700.XA CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Publications (1)

Publication Number Publication Date
CN114582012A true CN114582012A (en) 2022-06-03

Family

ID=81769829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111616700.XA Pending CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Country Status (1)

Country Link
CN (1) CN114582012A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117238026A (en) * 2023-07-10 2023-12-15 中国矿业大学 Gesture reconstruction interactive behavior understanding method based on skeleton and image features
CN117238026B (en) * 2023-07-10 2024-03-08 中国矿业大学 Gesture reconstruction interactive behavior understanding method based on skeleton and image features


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination