CN114582012A - Skeleton human behavior recognition method, device and equipment - Google Patents


Info

Publication number
CN114582012A
CN114582012A (application CN202111616700.XA)
Authority
CN
China
Prior art keywords
data
skeleton
joint
splicing
bone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111616700.XA
Other languages
Chinese (zh)
Inventor
邓浩阳
柯少杰
罗印威
张阳
何志强
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202111616700.XA
Publication of CN114582012A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

An embodiment of the invention provides a skeleton human behavior recognition method, device and equipment. The method includes: acquiring skeleton data of a target object; computing joint difference data and bone difference data based on key points of the skeleton data; performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features; performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result. By recognizing features of the detailed information of the human body, the invention makes human behavior recognition more accurate.

Description

Skeleton human behavior recognition method, device and equipment
Technical Field
The invention relates to the technical field of human behavior recognition, in particular to a skeleton human behavior recognition method, device and equipment.
Background
Human behavior recognition is a popular research topic in the field of machine vision. It aims to capture and extract the spatial and temporal feature information of human motion and to determine the motion type of the human body from that feature information. Skeleton human behavior recognition is a human behavior recognition method that uses skeleton data, extracted directly or indirectly from human actions, as input data. Compared with RGB video data, skeleton data has the advantages of insensitivity to environmental interference, high effective information density, and small data storage space.
In current skeleton human behavior identification methods, a Microsoft Kinect sensor is used to directly collect labeled skeleton data, or the OpenPose algorithm is used to extract skeleton data from RGB human motion video, as input data; human behaviors are then recognized from the skeleton data using deep learning methods such as recurrent neural networks, convolutional neural networks, and graph convolutional neural networks. However, existing human behavior recognition methods either modify a single data feature extraction network, or fuse skeleton data with its derived data by adding up prediction scores obtained from the individual data streams. These methods pay insufficient attention to the derived data and the fused data, which leads to insufficient use of the information in the skeleton data and in turn affects the final recognition accuracy.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the insufficient use of skeleton data information in the prior art, and to provide a skeleton human behavior recognition method, device and equipment.
According to a first aspect, an embodiment of the present invention provides a skeleton human behavior recognition method, including: acquiring skeleton data of a target object, wherein the skeleton data includes joint data and bone data; computing joint difference data and bone difference data based on key points of the skeleton data; performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features; performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result.
Optionally, the calculating joint difference data based on the key points of the skeleton data includes: extracting joint data based on the skeleton data; establishing a joint coordinate system based on the joint data; and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
Optionally, the calculating skeletal difference data based on the key points of the skeletal data includes: extracting skeletal data based on the skeletal data; establishing a bone coordinate system based on the bone data; and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
Optionally, the extracting features based on the skeleton difference data to obtain skeleton data features and skeleton difference data features includes: constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time; extracting a skeleton difference change image within preset time based on the skeleton difference data coordinate system; and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
Optionally, performing feature data splicing and fusion based on the joint data features and the joint difference data features to obtain the joint splicing features includes: constructing a first network layer based on the joint data features and the joint difference data features; performing data sorting based on the first network layer to obtain a first sorting result; and performing feature data splicing and fusion based on the first sorting result to obtain the joint splicing features.
Optionally, performing feature data splicing and fusion based on the skeleton data features and the skeleton difference data features to obtain the bone splicing features includes: constructing a second network layer based on the skeleton data features and the skeleton difference data features; performing data sorting based on the second network layer to obtain a second sorting result; and performing feature data splicing and fusion based on the second sorting result to obtain the bone splicing features.
Optionally, performing enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features and the bone splicing features respectively to obtain motion classification prediction results, including: establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics; extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature; obtaining a prediction value of the skeleton data based on the key position feature information; and obtaining an action classification prediction result based on the prediction numerical value.
According to a second aspect, an embodiment of the present invention provides a skeletal human behavior recognition apparatus, including: an acquisition module for acquiring skeleton data of a target object, the skeleton data including joint data and bone data; a computing module for computing joint difference data and bone difference data based on key points of the skeleton data; a feature extraction module for performing feature extraction based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and for obtaining joint data features and joint difference data features based on the skeleton data features; a fusion module for performing feature data splicing and fusion based on the joint data features and the joint difference data features, and on the skeleton data features and the skeleton difference data features, respectively, to obtain joint splicing features and bone splicing features; and a prediction module for performing enhanced fusion of the key position features of branches of different dimensions with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result.
According to a third aspect, a skeletal human behavior recognition device includes a memory and a processor that are communicatively connected, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the skeleton human behavior recognition method of the first aspect or any one of its optional embodiments.
According to a fourth aspect, a computer-readable storage medium is characterized by storing computer instructions for causing a computer to execute the skeletal human behavior recognition method of the first aspect or any one of the optional embodiments.
The technical scheme of the invention has the following advantages:
The embodiments of the invention provide a skeleton human behavior recognition method, device and equipment. The method obtains skeleton data of a target object and computes joint difference data and bone difference data from key points of the skeleton data. Feature information is extracted based on the joint difference data and the bone difference data to obtain skeleton data features and skeleton difference data features, and joint data features and joint difference data features are obtained based on the skeleton data features. The joint data features and joint difference data features, and the skeleton data features and skeleton difference data features, are then spliced and fused to obtain joint splicing features and bone splicing features. Finally, the key position features of branches of different dimensions are fused in an enhanced way with the joint splicing features and the bone splicing features respectively, to obtain an action classification prediction result. By extracting features separately from the joints and the bones of the human skeleton, the invention can recognize the detailed information of the human body while eliminating more noise interference, so that human behavior recognition is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart illustrating a specific example of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an example of a joint coordinate system of a skeleton human behavior recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an example of joint difference data of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating skeleton difference data of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a network layer of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 6 is an exemplary diagram of an overall network layer of a skeleton human behavior recognition method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating an example of enhanced fusion of a skeleton human behavior recognition method according to an embodiment of the present invention;
fig. 8 is a schematic diagram illustrating an exemplary structure of a skeleton human behavior recognition device according to an embodiment of the present invention;
fig. 9 is a schematic diagram illustrating an example of connection of a skeleton human behavior recognition device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
According to the embodiment of the invention, the relevant data of the joints and the bones are extracted based on the skeleton data of the target object, and then the relevant data of all the bones of the joints are fused, so that the motion classification prediction result is obtained. In the following embodiments, the human skeleton is taken as an example, and the present invention can also be applied to the motion recognition of other skeletons, which is not limited in this application.
Fig. 1 shows a flowchart of a skeleton human behavior recognition method according to an embodiment of the present invention, where the method specifically includes the following steps:
s100: skeleton data of a target object is acquired, the skeleton data including joint data and skeleton data.
Specifically, motion information of the target object is collected by a capture device, and skeleton data of the target object, including its joint data and bone data, is extracted from the collected motion information. In practical applications, the capture device may be, for example, a Microsoft Kinect sensor or a video capture device.
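As a concrete illustration (not part of the claimed method), when skeleton data is extracted from RGB video with OpenPose, each frame is written as JSON whose `people` entries carry a flat `pose_keypoints_2d` array of (x, y, confidence) triples; the sketch below shows one way to turn such a frame into a joint array, with the toy frame content being hypothetical:

```python
import json
import numpy as np

def parse_openpose_frame(json_text: str) -> np.ndarray:
    """Parse one OpenPose output frame into an (num_joints, 3) array
    of (x, y, confidence) rows for the first detected person."""
    frame = json.loads(json_text)
    people = frame.get("people", [])
    if not people:
        return np.zeros((0, 3))  # no person detected in this frame
    flat = np.asarray(people[0]["pose_keypoints_2d"], dtype=float)
    return flat.reshape(-1, 3)  # one row per joint: (x, y, confidence)

# A toy two-joint frame in OpenPose's JSON layout.
sample = '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.8]}]}'
joints = parse_openpose_frame(sample)
print(joints.shape)  # (2, 3)
```

Stacking such per-frame arrays over time yields the (channel, time, joint) skeleton tensor used in the following steps.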
S200: and calculating to obtain joint difference data and skeleton difference data based on the key points of the skeleton data.
Specifically, key points of the skeleton data are obtained by preprocessing the skeleton data, and joint data in the channel dimension, time dimension and space dimension are obtained based on the key points. Data frames are computed in the time dimension based on the joint data D, whose form is

D ∈ R^(C×T×S),

where t0 denotes a time frame of the joint data, and C, T and S respectively denote the channel dimension, time dimension and space dimension of the skeleton data. Joint difference data generated by the action transformation of the target object within a preset time are obtained based on the joint data; bone data are obtained based on the space dimension; and bone difference data generated by the action transformation of the target object within the preset time are obtained based on the bone data and the time dimension.
S300: and performing feature extraction based on the joint difference data and the skeleton difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features.
Specifically, feature extraction is performed according to the joint difference data and the bone difference data, skeleton data features and skeleton difference data features are obtained according to the joint change difference features and the bone change difference features of the target object in the action process, and the joint data features and the joint difference data features are extracted from the skeleton data features.
S400: and performing characteristic data splicing fusion based on the joint data characteristic and the joint difference data characteristic, and the skeleton data characteristic and the skeleton difference data characteristic respectively to obtain a joint splicing characteristic and a skeleton splicing characteristic.
Specifically, feature data splicing and fusion are performed based on the joint data features and the joint difference data features, the skeleton data features and the skeleton difference data features, and joint splicing features and skeleton splicing features are obtained according to the integrated skeleton and the joints after fusion.
S500: and performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain action classification prediction results.
Specifically, the key position features of the skeleton of the channel dimension, the time dimension and the space dimension, the joint splicing features and the skeleton splicing features are subjected to enhanced fusion again to obtain a complete skeleton subjected to multi-view attention enhanced fusion, and the action of the target object is analyzed based on the complete skeleton to obtain an action classification prediction result.
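The patent does not give the exact enhanced-fusion formula; a common realization for fusing per-branch classification results is an attention-style weighted sum of branch scores followed by normalization, sketched here under that assumption (the branch values and weights are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_branch_scores(branch_scores, branch_weights):
    """Fuse per-branch class scores (e.g. the joint-splicing branch, the
    bone-splicing branch, and key-position branches of different
    dimensions) with a weighting over branches, then normalize the
    result into action-class probabilities."""
    scores = np.stack(branch_scores)            # (num_branches, num_classes)
    w = softmax(np.asarray(branch_weights))     # attention over branches
    fused = (w[:, None] * scores).sum(axis=0)   # weighted sum per class
    return softmax(fused)                       # action classification prediction

joint_branch = np.array([2.0, 0.5, -1.0])  # illustrative 3-class scores
bone_branch = np.array([1.5, 1.0, -0.5])
probs = fuse_branch_scores([joint_branch, bone_branch], [0.7, 0.3])
print(probs.argmax())  # index of the predicted action class
```

In a trained network the branch weights would themselves be learned parameters rather than the fixed values used here.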
In the embodiment of the invention, motion information of a target object is collected by a capture device, and skeleton data of the target object is extracted from the collected motion information. The skeleton data is preprocessed to obtain its key points, joint data of different dimensions are obtained from the key points, and the joint difference data and bone difference data generated by the action transformation of the target object within a preset time are obtained from the joint data. Feature data splicing and fusion are performed based on the joint data features and the joint difference data features, and the joint splicing features and bone splicing features are obtained from the fused overall skeleton and joints. The key position features of the skeleton in different dimensions, the joint splicing features and the bone splicing features are then fused again in an enhanced way to obtain a complete skeleton, and the action of the target object is analyzed based on the complete skeleton to obtain an action classification prediction result. By collecting skeleton data of the target object's actions and obtaining data feature information of different dimensions to analyze the action and behavior of the target object, the embodiment can improve the accuracy of the recognition task and reduce the influence of image noise.
In an optional embodiment of the present invention, the step S200 of calculating joint difference data based on the key points of the skeleton data includes the following steps:
(1) extracting joint data based on the skeleton data;
(2) establishing a joint coordinate system based on the joint data;
(3) and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
Specifically, joint data are extracted from the skeleton data, and a joint coordinate system is established from the joint data, as shown in Fig. 2. In the joint coordinate system, a joint point of the target object serves as the origin, and the joint points are connected naturally according to the human skeleton. Joint change data within a preset time are obtained based on the joint coordinate system, and the difference between the joint change data and the joint data within the preset time is calculated to obtain the joint difference data.
Illustratively, the joint data extracted at time t within the preset time are denoted (x_t, y_t, z_t). The difference of the same joint node between adjacent time frames is then computed as (x_t - x_{t-1}, y_t - y_{t-1}, z_t - z_{t-1}). In practical applications, in order to align the data and facilitate subsequent calculation, the value at the preset time 0 may, for example, be set to the average of all the data; the process is shown in Fig. 3, i.e.

d_0 = (1/T) Σ_{t=1}^{T} d_t.
In the embodiment of the invention, joint data are extracted from the skeleton data, and a joint coordinate system is established from the joint data, with a joint point of the target object as the origin and the joint points connected naturally according to the human skeleton. Joint change data within a preset time are obtained based on the joint coordinate system, and the difference between the joint change data and the joint data within the preset time is calculated to obtain the joint difference data. By establishing the joint coordinate system, the difference data of each joint within the preset time can be calculated more accurately, which further improves the accuracy of recognizing the action behavior of the target object.
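The adjacent-frame joint difference computation can be sketched in NumPy as follows, assuming a (C, T, S) tensor layout (channels x/y/z, time frames, joints); filling frame 0 with the mean over all frames is one reading of the data-alignment step described above, and the array names are illustrative:

```python
import numpy as np

def joint_difference(joints: np.ndarray) -> np.ndarray:
    """Compute joint difference data for a (C, T, S) tensor of joint
    coordinates: C channels (x, y, z), T time frames, S joints.
    Frame t holds joints[:, t] - joints[:, t - 1]; frame 0, which has
    no predecessor, is filled with the mean over all frames so the
    output stays aligned with the input shape."""
    diff = np.empty_like(joints)
    diff[:, 1:] = joints[:, 1:] - joints[:, :-1]  # adjacent-frame differences
    diff[:, 0] = joints.mean(axis=1)              # alignment value at time 0
    return diff

# 3 channels (x, y, z), 4 frames, 2 joints
rng = np.random.default_rng(0)
J = rng.standard_normal((3, 4, 2))
D = joint_difference(J)
print(D.shape)  # (3, 4, 2)
```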
In an optional embodiment of the present invention, the step S200 of obtaining the bone difference data based on the key point calculation of the skeleton data includes the following steps:
(1) extracting skeletal data based on the skeletal data;
(2) establishing a bone coordinate system based on the bone data;
(3) and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
Specifically, bone data are extracted from a space dimension according to the skeleton data, a bone coordinate system is established according to the bone data, the bone coordinate system is shown in fig. 2, joint points of a target object are taken as an origin in the bone coordinate system, the joint points are naturally connected according to a human body skeleton, connecting lines are bone coordinates of the target object, bone change data in a preset time are obtained based on the bone coordinate system, and a difference value of the bone change data and the bone data in the preset time is calculated based on a time dimension of the skeleton data, so that bone difference data are obtained.
Illustratively, the bone data extracted at time t within the preset time are denoted (x_t, y_t, z_t). The difference of the corresponding bone between adjacent time frames is then computed as (x_t - x_{t-1}, y_t - y_{t-1}, z_t - z_{t-1}). In practical applications, in order to align the data and facilitate subsequent calculation, the value at the preset time 0 may, for example, be set to the average of all the data; the process is shown in Fig. 4, i.e.

b_0 = (1/T) Σ_{t=1}^{T} b_t.
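Bone data can be derived from the joint coordinates as vectors between naturally connected joints, and the bone differences then mirror the joint case; the two-bone chain below is hypothetical, and the (C, T, S) NumPy layout is the same assumption as before:

```python
import numpy as np

def bone_data(joints: np.ndarray, bones: list) -> np.ndarray:
    """Turn (C, T, S) joint coordinates into (C, T, len(bones)) bone
    vectors, one vector per (child, parent) joint-index pair."""
    child = [c for c, _ in bones]
    parent = [p for _, p in bones]
    return joints[:, :, child] - joints[:, :, parent]

def bone_difference(bone: np.ndarray) -> np.ndarray:
    """Adjacent-frame differences of the bone vectors; frame 0 is the
    mean over all frames, mirroring the joint-difference alignment."""
    diff = np.empty_like(bone)
    diff[:, 1:] = bone[:, 1:] - bone[:, :-1]
    diff[:, 0] = bone.mean(axis=1)
    return diff

rng = np.random.default_rng(1)
J = rng.standard_normal((3, 4, 3))  # 3 channels, 4 frames, 3 joints
bones = [(1, 0), (2, 1)]            # hypothetical chain: joint 0 -> 1 -> 2
B = bone_data(J, bones)
print(B.shape, bone_difference(B).shape)
```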
In an optional embodiment of the present invention, the step S300 of performing feature extraction based on the skeleton difference data to obtain skeleton data features and skeleton difference data features includes the following steps:
(1) constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time;
(2) extracting a skeleton difference change image within a preset time based on the skeleton difference data coordinate system;
(3) and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
In the embodiment of the present invention, a skeleton difference data coordinate system is constructed according to the skeleton difference data and the preset time, the skeleton difference data coordinate system is shown in fig. 4, a change condition of a skeleton position within the preset time is obtained based on the skeleton difference data coordinate system, a skeleton difference change image is extracted according to the change condition, a whole skeleton change image is constructed according to the skeleton difference change image, and a skeleton data feature and a skeleton difference data feature are extracted from the skeleton change image. According to the embodiment of the invention, the skeleton data image and the skeleton difference data are obtained through the skeleton data, so that all action change conditions of the target object can be analyzed more comprehensively, the loss of characteristic information of a lower layer is avoided, and the identification accuracy of the action behavior of the target object is further improved.
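One possible realization of the skeleton difference change image (an assumption, since the patent does not fix the layout) is to arrange the (C, T, S) difference tensor as a C-channel pseudo-image with time as the height and joints as the width, normalized per channel so an image-style feature extractor can consume it:

```python
import numpy as np

def difference_change_image(diff: np.ndarray) -> np.ndarray:
    """Arrange a (C, T, S) difference tensor as a (T, S, C) pseudo-image
    (T rows, S columns, C channels), min-max normalized per channel to
    [0, 1], ready for an image-style feature extractor."""
    img = np.moveaxis(diff, 0, -1)  # (C, T, S) -> (T, S, C)
    lo = img.min(axis=(0, 1), keepdims=True)
    hi = img.max(axis=(0, 1), keepdims=True)
    return (img - lo) / np.where(hi > lo, hi - lo, 1.0)  # per-channel min-max

# 3 channels, 16 frames, 25 joints (illustrative sizes)
diff = np.random.default_rng(2).standard_normal((3, 16, 25))
img = difference_change_image(diff)
print(img.shape)  # (16, 25, 3)
```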
In an optional embodiment of the present invention, the performing feature data splicing and fusion based on the joint data features and the joint difference data features in step S400 to obtain the joint splicing features includes the following steps:
(1) constructing a first network layer based on the joint data characteristics and the joint difference data characteristics;
(2) performing data sorting based on the first network layer to obtain a first sorting result;
(3) and performing characteristic data splicing and fusion based on the first sequencing result to obtain joint splicing characteristics.
Specifically, a network layer is constructed based on the joint data features and the joint difference data features, the bone data and the bone difference data features, and the network layer is as shown in fig. 5, wherein a first network layer is constructed based on the joint data features and the joint difference data features, the joint data features and the joint difference data features of different layers are comprehensively sequenced according to the first network layer, joint feature data are spliced based on a first sequencing result to obtain joint splicing features, and joint behavior prediction scores are obtained based on the joint splicing features.
Illustratively, the joint data features and the joint difference data features of the different layers are denoted f_J^l and f_JD^l (l = 1, …, L), respectively, where L is the maximum number of feature layers and the layer index may, for example, increase from the deep layers to the shallow layers. The joint data features and the joint difference data features are spliced by channel-dimension splicing, i.e.

f^l = cat(f_J^l, f_JD^l),

where the cat function splices the feature f_J^l and the feature f_JD^l in the channel dimension into f^l.
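The channel-dimension splicing of per-layer feature pairs can be sketched as a plain concatenation along the channel axis; the layer count, channel sizes and array names below are illustrative:

```python
import numpy as np

def splice_features(feats_a, feats_b):
    """For each layer l = 1..L, splice the feature pair along the channel
    axis, as the cat function does for the joint / joint-difference
    (or bone / bone-difference) branches."""
    return [np.concatenate([fa, fb], axis=0) for fa, fb in zip(feats_a, feats_b)]

L = 3  # maximum number of feature layers (illustrative)
# Per-layer (C_l, H, W) feature maps with channel counts doubling per layer.
joint_feats = [np.ones((8 * 2**l, 4, 4)) for l in range(L)]
joint_diff_feats = [np.zeros((8 * 2**l, 4, 4)) for l in range(L)]
spliced = splice_features(joint_feats, joint_diff_feats)
print([f.shape[0] for f in spliced])  # doubled channel counts: [16, 32, 64]
```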
In the embodiment of the invention, a first network layer is constructed based on the joint data characteristics and the joint difference data characteristics, the joint data characteristics and the joint difference data characteristics of different layers are comprehensively sequenced according to the first network layer, and the joint characteristic data are spliced based on a first sequencing result to obtain joint splicing characteristics. According to the embodiment of the invention, the joint data characteristics and the joint difference data characteristics are spliced and combined by constructing the first network layer, so that the change condition of the joint caused by the movement of the target object in the preset time can be accurately obtained, and the behavior characteristics of the target object can be more comprehensively analyzed.
In an optional embodiment of the present invention, the performing feature data splicing and fusion based on the bone data and the bone difference data feature in the step S400 to obtain a bone splicing feature includes the following steps:
(1) constructing a second network layer based on the bone data and the bone difference data characteristics;
(2) performing data sorting based on the second network layer to obtain a second sorting result;
(3) and performing feature data splicing and fusion based on the second sequencing result to obtain bone splicing features.
Specifically, a network layer is constructed based on the joint data feature and the joint difference data feature, bone data and a bone difference data feature, and the network layer is as shown in fig. 5, wherein a second network layer is constructed based on the bone data and the bone difference data feature, the bone data and the bone difference data feature of different layers are comprehensively sorted according to the second network layer, the bone feature data are spliced based on a second sorting result to obtain a bone splicing feature, and a bone behavior prediction score is obtained based on the bone splicing feature.
Illustratively, the bone data and bone difference data features of the different layers are respectively recorded as f_b^(l) and f_bd^(l), l = 1, …, L, wherein L is the maximum number of feature layers, and the layer index l may, for example, increase gradually from the deep layers to the shallow layers. The bone data and bone difference data features are spliced by a channel-dimension splicing method, that is

f_B^(l) = cat(f_b^(l), f_bd^(l))

wherein the cat function splices the feature f_b^(l) and the feature f_bd^(l) in the channel dimension to obtain f_B^(l).
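The channel-dimension splicing described above can be sketched in Python with NumPy; the feature shape C×T×S and the concrete sizes below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

# Illustrative shapes: C channels, T frames, S skeleton joints (assumed values)
C, T, S = 64, 30, 25

# Bone data feature and bone difference data feature of one layer l
f_b = np.random.rand(C, T, S)    # f_b^(l)
f_bd = np.random.rand(C, T, S)   # f_bd^(l)

# cat: concatenate along the channel dimension (axis 0)
f_B = np.concatenate([f_b, f_bd], axis=0)  # f_B^(l), shape (2C, T, S)

print(f_B.shape)  # (128, 30, 25)
```

The same operation is applied per layer l, so the spliced feature doubles the channel count while preserving the temporal and spatial dimensions.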
In the embodiment of the invention, a second network layer is constructed based on the bone data features and the bone difference data features, the bone data features and the bone difference data features of different layers are comprehensively sorted according to the second network layer, and the bone feature data are spliced based on the second sorting result to obtain the bone splicing feature. By constructing the second network layer and splicing and combining the bone data features with the bone difference data features, the embodiment of the invention can accurately capture the change of the bone positions caused by the movement of the target object within the preset time, so that the behavior features of the target object can be analyzed more comprehensively.
In an optional embodiment of the present invention, the performing of enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features and the bone splicing features in step S500 to obtain the motion classification prediction result includes the following steps:
(1) establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics;
(2) extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature;
(3) obtaining a prediction value of the skeleton data based on the key position feature information;
(4) and obtaining an action classification prediction result based on the prediction numerical value.
Specifically, as shown in fig. 6, an average fusion layer is established based on the joint splicing features and the bone splicing features, and key position feature information is extracted from the fusion layer; the key positions are obtained by applying multi-view attention to the skeleton data features and the skeleton difference data features. The weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch are calculated according to the joint splicing feature data and the bone splicing feature data of the average fusion layer, a prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information, and the motion classification prediction result of the target object is obtained according to the prediction value.
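The attention-weighted fusion of the two input streams can be sketched as follows; the sigmoid-gated random weights stand in for the multi-view attention weights W_com1 and W_com2, and the shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

C, T, S = 64, 30, 25
f_I1 = np.random.rand(C, T, S)   # joint data feature
f_I2 = np.random.rand(C, T, S)   # joint difference data feature

# Stand-ins for the multi-view attention weights W_com1, W_com2 (values in (0, 1))
W_com1 = sigmoid(np.random.randn(C, T, S))
W_com2 = sigmoid(np.random.randn(C, T, S))

# Element-wise (Hadamard) weighting of each stream, then summation
f_fused = W_com1 * f_I1 + W_com2 * f_I2
print(f_fused.shape)  # (64, 30, 25)
```

Each stream is re-weighted element by element before the two are added, so positions the attention deems important contribute more to the fused feature.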
Illustratively, as shown in fig. 7, the fusion of the joint data and the joint difference data may be represented as

f_I' = W_com1 ⊙ f_I1 + W_com2 ⊙ f_I2

wherein the joint data is f_I1 ∈ R^(C×T×S), the joint difference data is f_I2 ∈ R^(C×T×S), W_com1 ∈ R^(C×T×S) is the attention weight obtained by multi-view attention calculation on the input feature f_I1, W_com2 ∈ R^(C×T×S) is the attention weight obtained by multi-view attention calculation on the input feature f_I2, and ⊙ denotes element-wise multiplication of corresponding matrix elements. The weight W_com ∈ R^(C×T×S) is calculated from the three branch attention weights, broadcast to the full feature shape, as

W_com = Sig(W_s + W_t + W_c)

wherein Sig is a Sigmoid function. The spatial dimension branch attention weight W_s ∈ R^(1×1×S) is calculated as

W_s = reshape(W_as + W_ms), W_as = FC_w2(ReLU(FC_w1(GAP(reshape(f_I))))), W_ms = FC_w2(ReLU(FC_w1(GMP(reshape(f_I)))))

wherein reshape is a matrix dimension transform operation that converts the input f_I ∈ R^(C×T×S) into f_I ∈ R^(S×C×T), GAP is the global average sampling operation, GMP is the global maximum sampling operation, FC_w1 is a fully connected layer with weight w1 ∈ R^((S/r)×S), FC_w2 is a fully connected layer with weight w2 ∈ R^(S×(S/r)), r is the channel reduction factor, and ReLU is the ReLU function. The parameter channel branch attention weight W_c ∈ R^(C×1×1) is calculated as

W_c = W_ac + W_mc, W_ac = FC_w4(ReLU(FC_w3(GAP(f_I)))), W_mc = FC_w4(ReLU(FC_w3(GMP(f_I))))

wherein FC_w3 is a fully connected layer with weight w3 ∈ R^((C/r)×C) and FC_w4 is a fully connected layer with weight w4 ∈ R^(C×(C/r)). The time dimension branch attention weight W_t ∈ R^(1×T×1) is calculated as

W_t = AP_s(Sig(Conv9(AP_c(f_I))))

wherein AP_c is the average sampling operation on the parameter channel branch, AP_s is the average sampling operation on the spatial dimension branch, and Conv9 is a one-dimensional convolution operation with a convolution kernel size of 9.
For example, as shown in fig. 7, the bone data and the bone difference data are likewise fused and spliced to obtain the weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch. The calculation process of these weight ratios is the same as that used for the joint data and the joint difference data, and is not repeated here.
Illustratively, a prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information. For each stream Mod, the prediction score of the data stream, the prediction score of the difference-data stream and the prediction score of the fused stream are respectively recorded as S_Mod, S_Mod^d and S_Mod^f, and the prediction value of the network layer is calculated from these prediction scores as

S = Σ_{Mod∈{j,b}} (α·S_Mod + β·S_Mod^d + γ·S_Mod^f)

wherein α, β, γ are the weight parameters of the spatial dimension branch, the time dimension branch and the parameter channel branch, and Mod is in the range of {j, b}, j denoting the joint stream and b the bone stream.
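The final prediction can be sketched as a weighted sum of per-stream scores; the class count, the α, β, γ values, and the random score arrays are illustrative assumptions:

```python
import numpy as np

num_classes = 60                     # assumed number of action classes
alpha, beta, gamma = 0.6, 0.6, 0.4   # assumed branch weight parameters

rng = np.random.default_rng(0)
scores = {
    # per-stream prediction scores for Mod in {j, b}:
    # (data stream, difference-data stream, fused stream)
    "j": [rng.random(num_classes) for _ in range(3)],
    "b": [rng.random(num_classes) for _ in range(3)],
}

# S = sum over Mod of (alpha*S_Mod + beta*S_Mod^d + gamma*S_Mod^f)
S = sum(alpha * s + beta * sd + gamma * sf for (s, sd, sf) in scores.values())
predicted_class = int(np.argmax(S))
print(S.shape, predicted_class)
```

The argmax over the combined score vector yields the motion classification prediction result for the target object.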
In the embodiment of the invention, an average fusion layer is established based on the joint splicing features and the bone splicing features, and key position feature information is extracted from the fusion layer; the key positions are obtained by applying multi-view attention to the skeleton data features and the skeleton difference data features. The weight ratios of the spatial dimension branch, the time dimension branch and the parameter channel branch are calculated according to the joint splicing feature data and the bone splicing feature data of the average fusion layer, the prediction value of the fused skeleton data is obtained based on the weight ratios and the key position feature information, and the motion classification prediction result of the target object is obtained according to the prediction value. By separately calculating the joint prediction score, the bone prediction score and the fused skeleton prediction score and combining them into the prediction value of the whole network layer, the embodiment of the invention can recognize the behavior features of the target object more comprehensively and accurately, so that the prediction is more accurate.
As shown in fig. 8, an embodiment of the present invention provides a skeletal human behavior recognition apparatus, which includes an obtaining module 1, a calculating module 2, a feature extracting module 3, a fusing module 4, and a predicting module 5, wherein,
an obtaining module 1, configured to obtain skeleton data of a target object, where the skeleton data includes joint data and skeleton data, and details may be referred to in the related description of step S100 of any of the above method embodiments;
a calculating module 2, configured to calculate joint difference data and bone difference data based on the key points of the skeleton data, for details, see the related description of step S200 in any of the above embodiments of the method;
a feature extraction module 3, configured to perform feature extraction based on the joint difference data and the bone difference data to obtain a skeleton data feature and a skeleton difference data feature, and obtain a joint data feature and a joint difference data feature based on the skeleton data feature, where details may refer to relevant description of step S300 in any of the above method embodiments;
a fusion module 4, configured to perform feature data splicing and fusion based on the joint data feature and the joint difference data feature, the skeleton data feature, and the skeleton difference data feature, respectively, to obtain a joint splicing feature and a skeleton splicing feature, for details, see the related description of step S400 in any of the above method embodiments;
the prediction module 5 is configured to perform enhanced fusion on the key position features of the branches with different dimensions, the joint splicing features, and the bone splicing features, to obtain an action classification prediction result, and the detailed contents may refer to the related description of step S500 in any of the above method embodiments.
The embodiment of the invention provides a skeleton human behavior recognition device. The device collects motion information of a target object through a collection device and extracts skeleton data of the target object from the collected motion information. Key points of the skeleton data are obtained by preprocessing the skeleton data, joint data of different dimensions are obtained based on the key points, and joint difference data and bone difference data generated by the motion of the target object within a preset time are obtained based on the joint data. Feature data splicing and fusion are performed based on the joint data features and the joint difference data features, and on the bone data features and the bone difference data features, and the joint splicing feature and the bone splicing feature are obtained from the fused overall skeleton and joints. The key position features of the skeleton with different dimensions, the joint splicing features and the bone splicing features are then fused again to acquire a complete skeleton, and the motion of the target object is analyzed based on the complete skeleton to obtain the motion classification prediction result. By collecting the skeleton data of the motion of the target object to obtain feature information of different dimensions and analyzing the motion and behavior of the target object accordingly, the embodiment of the invention can improve the accuracy of the recognition task and reduce the influence of image noise.
For specific limitations and beneficial effects of the skeleton human behavior recognition device, reference may be made to the above limitations on the skeleton human behavior recognition method, which are not described herein again. All modules of the skeleton human behavior recognition device can be realized completely or partially through software, hardware or a combination thereof. The modules can be embedded in hardware form in, or independent of, a processor in the electronic device, or can be stored in software form in a memory in the electronic device, so that the processor can call and execute the operations corresponding to the modules.
Fig. 9 is a schematic structural diagram of a skeleton human behavior recognition apparatus according to an alternative embodiment of the present invention. The skeleton human behavior recognition apparatus may include at least one processor 41, at least one communication interface 42, at least one communication bus 43, and at least one memory 44, where the communication interface 42 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 42 may further include a standard wired interface and a standard wireless interface. The memory 44 may be a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. The memory 44 may alternatively be at least one memory device located remotely from the aforementioned processor 41. The processor 41 may be combined with the apparatus described in fig. 8; the memory 44 stores an application program, and the processor 41 calls the program code stored in the memory 44 to execute the steps of the skeleton human behavior recognition method of any of the above-mentioned method embodiments.
The communication bus 43 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 43 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The memory 44 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 44 may also comprise a combination of the above kinds of memory.
The processor 41 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 41 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 44 is also used to store program instructions. Processor 41 may invoke program instructions to implement the skeletal human behavior recognition method as shown in the embodiment of fig. 1 of the present invention.
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer-executable instructions that can execute the skeleton human behavior recognition method of any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom are within the scope of the invention.

Claims (10)

1. A skeleton human behavior recognition method is characterized by comprising the following steps:
acquiring skeleton data of a target object, wherein the skeleton data comprises joint data and skeleton data;
calculating joint difference data and skeleton difference data based on key points of the skeleton data;
performing feature extraction based on the joint difference data and the skeleton difference data to obtain skeleton data features and skeleton difference data features, and obtaining joint data features and joint difference data features based on the skeleton data features;
performing feature data splicing and fusion respectively based on the joint data features and the joint difference data features, and on the bone data and the bone difference data features, to obtain joint splicing features and bone splicing features;
and performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain an action classification prediction result.
2. The method for recognizing skeleton human behavior according to claim 1, wherein calculating joint difference data based on key points of the skeleton data comprises:
extracting joint data based on the skeleton data;
establishing a joint coordinate system based on the joint data;
and extracting joint change data in preset time based on the joint coordinate system, and calculating a difference value in the preset time based on the joint change data and the joint data to obtain joint difference data.
3. The skeletal human behavior recognition method according to claim 1, wherein calculating skeletal difference data based on the key points of the skeletal data comprises:
extracting skeletal data based on the skeletal data;
establishing a bone coordinate system based on the bone data;
and extracting bone position change data within preset time based on the bone coordinate system, and calculating a difference value within the preset time based on the bone position change data and the bone data to obtain bone difference data.
4. The method for recognizing skeleton human behavior according to claim 1, wherein performing feature extraction based on the skeleton difference data to obtain skeleton data features and skeleton difference data features comprises:
constructing a skeleton difference data coordinate system based on the skeleton difference data and preset time;
extracting a skeleton difference change image within a preset time based on the skeleton difference data coordinate system;
and obtaining skeleton data characteristics and skeleton difference data characteristics based on the skeleton difference change image.
5. The skeleton human behavior recognition method according to claim 1, wherein performing feature data stitching fusion based on the joint data features and joint difference data features to obtain joint stitching features comprises:
constructing a first network layer based on the joint data characteristics and the joint difference data characteristics;
performing data sorting based on the first network layer to obtain a first sorting result;
and performing characteristic data splicing and fusion based on the first sequencing result to obtain joint splicing characteristics.
6. The method for recognizing skeleton human behavior according to claim 1, wherein performing feature data splicing fusion based on the skeleton data and the skeleton difference data features to obtain skeleton splicing features comprises:
constructing a second network layer based on the bone data and the bone difference data characteristics;
performing data sorting based on the second network layer to obtain a second sorting result;
and performing feature data splicing and fusion based on the second sequencing result to obtain bone splicing features.
7. The method for recognizing the skeleton human behavior according to claim 1, wherein the step of performing enhanced fusion on key position features of branches with different dimensions, the joint splicing features and the bone splicing features respectively to obtain motion classification prediction results comprises the steps of:
establishing a fusion layer based on the joint splicing characteristics and the bone splicing characteristics;
extracting key position feature information based on the fusion layer, wherein the key position is obtained based on the skeleton data feature and the skeleton difference data feature;
obtaining a prediction value of the skeleton data based on the key position feature information;
and obtaining an action classification prediction result based on the prediction numerical value.
8. A skeletal human behavior recognition device, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring skeleton data of a target object, and the skeleton data comprises joint data and skeleton data;
the computing module is used for computing key points based on the skeleton data to obtain joint difference data and skeleton difference data;
the characteristic extraction module is used for extracting characteristics based on the joint difference data and the skeleton difference data to obtain skeleton data characteristics and skeleton difference data characteristics, and obtaining joint data characteristics and joint difference data characteristics based on the skeleton data characteristics;
the fusion module is used for performing characteristic data splicing fusion respectively based on the joint data characteristic and the joint difference data characteristic, and the skeleton data characteristic and the skeleton difference data characteristic to obtain a joint splicing characteristic and a skeleton splicing characteristic;
and the prediction module is used for performing reinforced fusion on the key position characteristics of the branches with different dimensions, the joint splicing characteristics and the bone splicing characteristics respectively to obtain an action classification prediction result.
9. A skeletal human behavior recognition device, comprising:
a communication unit, a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor performing the steps of the method according to any one of claims 1 to 7 by executing the computer instructions.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the steps of the method of any one of claims 1-7.
CN202111616700.XA 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment Pending CN114582012A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111616700.XA CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111616700.XA CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Publications (1)

Publication Number Publication Date
CN114582012A true CN114582012A (en) 2022-06-03

Family

ID=81769829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111616700.XA Pending CN114582012A (en) 2021-12-27 2021-12-27 Skeleton human behavior recognition method, device and equipment

Country Status (1)

Country Link
CN (1) CN114582012A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117238026A (en) * 2023-07-10 2023-12-15 中国矿业大学 Gesture reconstruction interactive behavior understanding method based on skeleton and image features
CN117238026B (en) * 2023-07-10 2024-03-08 中国矿业大学 Gesture reconstruction interactive behavior understanding method based on skeleton and image features


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination