CN113963202A - Skeleton point action recognition method and device, electronic equipment and storage medium - Google Patents

Skeleton point action recognition method and device, electronic equipment and storage medium

Info

Publication number
CN113963202A
CN113963202A
Authority
CN
China
Prior art keywords
data
convolution
module
layer
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111215393.4A
Other languages
Chinese (zh)
Inventor
陈恩庆
辛华磊
高猛
郭佳乐
郭新
吕小永
丁英强
马龙
酒明远
马双双
张楠楠
张爱菊
刘晓娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Xintong Intelligent Iot Co ltd
Zhengzhou University
Original Assignee
Henan Xintong Intelligent Iot Co ltd
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Xintong Intelligent Iot Co ltd, Zhengzhou University filed Critical Henan Xintong Intelligent Iot Co ltd
Priority to CN202111215393.4A
Publication of CN113963202A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a skeleton point action recognition method and device, an electronic device, and a storage medium, which address the low accuracy of skeleton point action recognition. The method comprises the following steps: obtaining skeleton point data of a target organism; performing a batch normalization operation on the skeleton point data with a batch normalization layer in a neural network model to obtain normalized data; computing the normalized data with a first module in the neural network model to obtain a first feature map; computing the first feature map with a plurality of second modules in the neural network model to obtain a second feature map, wherein the first module and the second modules each comprise a batch-aware attention (BAM) network; and classifying the second feature map with a fully connected layer in the neural network model to obtain a classification result, the classification result representing the action category recognized for the target organism.

Description

Skeleton point action recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of skeleton point action recognition, and in particular to a skeleton point action recognition method and device, an electronic device, and a storage medium.
Background
The current human motion recognition field generally uses various modality data for motion recognition, such as RGB modality, depth modality, optical flow modality, and skeleton modality.
Early skeleton-sequence-based action recognition methods classified actions using hand-crafted features such as joint angles or bone orientations. Traditional deep-learning methods manually structure the skeleton as a sequence of joint coordinate vectors or as pseudo-images, which are fed into a recurrent neural network (RNN) or a convolutional neural network to generate predictions. However, the recognition accuracy of these existing neural-network-based methods remains low.
Disclosure of Invention
Embodiments of the present invention aim to provide a skeleton point action recognition method and device, an electronic device, and a storage medium that improve the accuracy of skeleton point action recognition.
In a first aspect, an embodiment of the present application provides a skeleton point action recognition method, including: obtaining skeleton point data of a target organism; performing a batch normalization operation on the skeleton point data with a batch normalization layer in a neural network model to obtain normalized data; computing the normalized data with a first module in the neural network model to obtain a first feature map; computing the first feature map with a plurality of second modules in the neural network model to obtain a second feature map, wherein the first module and the second modules each comprise a batch-aware attention (BAM) network; and classifying the second feature map with a fully connected layer in the neural network model to obtain a classification result, the classification result representing the action category recognized for the target organism. In this implementation, the normalized data is computed sequentially by the first module and the plurality of second modules, and the results are classified by the fully connected layer to perform action recognition. The BAM networks in the first and second modules of the neural network model learn similarity weights between different samples, which alleviates the low accuracy of skeleton point action recognition.
Optionally, in an embodiment of the present application, the first module further includes a spatial graph convolutional network and a temporal convolutional network. Computing the normalized data with the first module in the neural network model to obtain the first feature map includes: performing a convolution operation on the normalized data with the spatial graph convolutional network in the first module to obtain first feature data; computing the first feature data with the BAM network in the first module to obtain second feature data; and performing a convolution operation on the second feature data with the temporal convolutional network in the first module to obtain the first feature map. In this implementation, the normalized data is processed in turn by the spatial graph convolutional network, the BAM network, and the temporal convolutional network of the first module to obtain the first feature map; the BAM network learns importance weights among the different samples in a batch, improving the discrimination of different samples within a batch in the classification task.
Optionally, in an embodiment of the present application, the BAM network includes a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer. Computing the first feature data with the BAM network in the first module to obtain the second feature data includes: performing convolution operations on the first feature data with the channel-dimension, spatial-dimension, and time-dimension convolution layers to obtain a first convolution feature; and performing a similarity operation on the first convolution feature with the softmax layer to obtain the second feature data. In this implementation, the convolutions average over the channel, time, and spatial dimensions to obtain the first convolution feature, and the softmax layer performs a similarity operation on it to obtain the second feature data. The batch-aware attention (BAM) network learns importance weights among the different samples in a batch and improves their discrimination in the classification task, thereby effectively improving the accuracy of recognizing action categories from skeleton point data.
Optionally, in an embodiment of the present application, performing the similarity operation on the first convolution feature with the softmax layer to obtain the second feature data includes: performing a similarity operation on the first convolution feature with the softmax layer to obtain similarity data; and performing a residual operation on the similarity data and the first feature data to obtain the second feature data. In this implementation, the residual operation in the BAM network effectively mitigates the vanishing or exploding gradients that come with deepening the network.
Optionally, in an embodiment of the present application, the second module further includes a spatial graph convolutional network and a temporal convolutional network. Computing the first feature map with the plurality of second modules in the neural network model to obtain the second feature map includes: performing a convolution operation on the first feature map with the spatial graph convolutional network in the second module to obtain third feature data; computing the third feature data with the BAM network in the second module to obtain fourth feature data; performing a convolution operation on the fourth feature data with the temporal convolutional network in the second module to obtain fifth feature data; and performing a residual operation on the first feature map and the fifth feature data to obtain the second feature map. In this implementation, the data is processed in turn by the spatial graph convolutional network, the BAM network, and the temporal convolutional network of the second module, followed by a residual operation; the BAM network learns importance weights among the different samples in a batch, improving their discrimination in the classification task, while the residual connection avoids vanishing or exploding gradients and improves the accuracy of the neural network model.
Optionally, in an embodiment of the present application, the BAM network includes a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer. Computing the third feature data with the BAM network in the second module to obtain the fourth feature data includes: performing convolution operations on the third feature data with the channel-dimension, spatial-dimension, and time-dimension convolution layers to obtain a second convolution feature; and performing a similarity operation on the second convolution feature with the softmax layer to obtain the fourth feature data. In this implementation, the softmax layer and the channel-dimension, spatial-dimension, and time-dimension convolution layers of the batch-aware attention network learn importance weights among the different samples in a batch from the channel, time, and spatial dimensions, improving their discrimination in the classification task and thereby effectively improving the accuracy of recognizing action categories from skeleton point data.
Optionally, in an embodiment of the present application, the spatial graph convolution includes a natural human-joint adjacency matrix, an adaptive parameter matrix, and a similarity matrix. Performing the convolution operation on the normalized data with the spatial graph convolutional network in the first module to obtain the first feature data includes: obtaining the similarity matrix, and adding the similarity matrix, the natural human-joint adjacency matrix, and the adaptive parameter matrix to obtain an adjacency matrix; performing a matrix multiplication of the normalized data and the adjacency matrix to obtain first convolution data; and performing a two-dimensional convolution operation on the first convolution data to obtain the first feature data. In this implementation, the spatial graph convolutional network models the action features with three different matrices, attends to the differing features between samples, and exploits second-order information of the skeleton data, thereby improving the accuracy of action recognition.
Optionally, in an embodiment of the present application, the temporal convolutional network includes a two-dimensional convolution layer, a batch normalization layer, and an activation function layer. Performing the convolution operation on the second feature data with the temporal convolutional network in the first module to obtain the first feature map includes: performing a two-dimensional convolution operation on the second feature data to obtain second convolution data; and processing the second convolution data in turn with the batch normalization layer and the activation function layer to obtain the first feature map.
An embodiment of the present application further provides a skeleton point action recognition device, including: a data acquisition module for obtaining skeleton point data of a target organism; a data normalization module for performing a batch normalization operation on the skeleton point data to obtain normalized data; a first computation module for computing the normalized data to obtain a first feature map; a second computation module for computing the first feature map to obtain a second feature map, the first computation module and the second computation module each comprising a batch-aware attention (BAM) network; and a classification module for classifying the second feature map to obtain a classification result, the classification result representing the action category recognized for the target organism.
An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing processor-executable machine-readable instructions, the machine-readable instructions when executed by the processor performing the method as described above.
Embodiments of the present application also provide a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the above-described method.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a skeleton point action recognition method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a neural network model provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a first module provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of a BAM network structure provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a second module provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a skeleton point action recognition device provided in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are merely used to more clearly illustrate the technical solutions of the present application, and therefore are only examples, and the protection scope of the present application is not limited thereby.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In the description of the embodiments of the present application, the technical terms "first", "second", and the like are used only for distinguishing different objects, and are not to be construed as indicating or implying relative importance or implicitly indicating the number, specific order, or primary-secondary relationship of the technical features indicated. In the description of the embodiments of the present application, "a plurality" means two or more unless specifically defined otherwise.
Before describing the skeleton point action recognition method provided by the embodiments of the present application, some related concepts are introduced:
the neural network model refers to a neural network model obtained by training an untrained neural network by using preset training data, where the preset training data may be set according to specific actual conditions, for example: in the task of image recognition, the preset training data refers to the image to be recognized, and in the process of supervised learning training, a correct label needs to be set for the training data.
Skeleton point action recognition: human pose estimation (Human Pose Estimation) is usually adopted, i.e., the human keypoints detected in a picture are correctly associated so that the human pose can be estimated and the human body detected. Human keypoints usually correspond to joints with some degree of freedom, such as the neck, shoulders, elbows, wrists, waist, knees, and ankles. For example, the current pose of a human body can be estimated by computing the relative positions of its keypoints in three-dimensional space. If a time series is added and the positional changes of the keypoints are observed over a period of time, the pose can be detected more accurately, the pose of the target at a future moment can be predicted, and more abstract human-behavior analysis becomes possible, such as judging whether a person is playing badminton.
It should be noted that the skeleton point action recognition method provided in the embodiments of the present application may be executed by an electronic device, where an electronic device refers to a device terminal or a server capable of executing a computer program. Such device terminals include, for example: a smart phone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), a network switch, or a network router.
Before introducing the skeleton point action recognition method provided by the embodiments of the present application, its applicable application scenarios are introduced, including but not limited to: human behavior recognition, human-computer interaction, clothing analysis, and the like. The method has broad application prospects in physical fitness, motion capture, 3D fitting, public-opinion monitoring, and other fields; concrete applications focus on intelligent video surveillance, patient monitoring systems, human-computer interaction, virtual reality, human-body animation, smart homes, intelligent security, athlete training assistance, and the like.
Please refer to fig. 1, a schematic flow chart of the skeleton point action recognition method provided in an embodiment of the present application.
The skeleton point action recognition method performs action recognition with a neural network model whose BAM networks, in the first module and the second modules, learn similarity weights among different samples, thereby alleviating the low accuracy of skeleton point action recognition. The method may include the following steps:
step S110: skeletal point data of a target organism is obtained.
Embodiments of step S110 include the following. In a first way, the skeleton point data of the target organism is obtained from a data set, such as a human skeleton keypoint data set or the COCO keypoint data set. In a second way, an image to be processed is acquired and the skeleton point data of the target organism is extracted from it; the image can come from a motion capture device or from video, and the skeleton points can be obtained from a motion capture device or from a pose estimation algorithm applied to the video. Typically the data form a series of frames, each with one set of joint coordinates. Given a sequence of body joints in a 2D or 3D coordinate system, a skeleton-sequence spatio-temporal graph is constructed from the joint sequence. For example, skeleton data can be extracted with a Kinect via the Microsoft SDK and displayed with OpenCV.
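As a concrete illustration (an assumption for exposition, not part of the patent text), such per-frame joint coordinates are commonly packed into a (C, T, V) array of C coordinate channels, T frames, and V joints before being fed to the model. A minimal sketch with hypothetical shapes:

```python
import numpy as np

# Hypothetical helper: pack a sequence of per-frame joint coordinates into
# the (C, T, V) layout commonly used by skeleton-based models.
# 'frames' is a list of T arrays, each of shape (V, 3): V joints, (x, y, z).
def frames_to_skeleton_tensor(frames):
    data = np.stack(frames, axis=0)   # (T, V, 3)
    data = data.transpose(2, 0, 1)    # (C=3, T, V)
    return data.astype(np.float32)

# Usage with dummy data: 64 frames, 25 joints (e.g. the Kinect v2 skeleton).
dummy = [np.random.randn(25, 3) for _ in range(64)]
x = frames_to_skeleton_tensor(dummy)  # shape (3, 64, 25)
```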
After step S110, step S120 is performed: perform a batch normalization operation on the skeleton point data with the batch normalization layer in the neural network model to obtain normalized data.
Please refer to fig. 2, a schematic diagram of the neural network model provided in an embodiment of the present application. The neural network model includes a batch normalization layer, a first module, a plurality of second modules, and a fully connected layer, connected in sequence. The neural network model may be a batch-attention adaptive spatio-temporal graph convolutional network (BA-AGCN).
An embodiment of step S120 is, for example: perform a batch normalization operation on the skeleton-sequence spatio-temporal graph using the batch normalization layer of the batch-attention adaptive spatio-temporal graph convolutional network together with a data preprocessing method, obtaining normalized data. The purpose of the normalization operation is to keep the input of each layer in the same distribution throughout deep-network training. For a deep neural network, let the net input of layer l be z^(l) and the output after the activation function be a^(l), i.e. a^(l) = f(z^(l)) = f(W a^(l-1) + b), where f(·) is the activation function and W and b are the weight and bias parameters.
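A minimal sketch of this input normalization step, assuming the (N, C, T, V) batch layout from above and an ST-GCN-style per-joint normalization (the patent does not spell out the exact layout):

```python
import torch
import torch.nn as nn

# Sketch of input batch normalization over skeleton data. Flattening the
# joint and channel axes into V*C BatchNorm channels follows common
# ST-GCN-style implementations; this is an assumption, not the patent text.
class InputBatchNorm(nn.Module):
    def __init__(self, in_channels: int, num_joints: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(in_channels * num_joints)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, v = x.shape
        x = x.permute(0, 3, 1, 2).reshape(n, v * c, t)    # (N, V*C, T)
        x = self.bn(x)                                    # normalize over the batch
        return x.reshape(n, v, c, t).permute(0, 2, 3, 1)  # back to (N, C, T, V)

# Usage: a batch of 8 skeleton sequences, 3 channels, 64 frames, 25 joints.
x = torch.randn(8, 3, 64, 25)
out = InputBatchNorm(3, 25)(x)  # same shape, batch-normalized
```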
After step S120, step S130 is performed: compute the normalized data with the first module in the neural network model to obtain a first feature map.
Please refer to fig. 3 for a schematic structural diagram of the first module provided in an embodiment of the present application. The first module comprises a spatial graph convolutional network, a batch-aware attention (BAM) network, and a temporal convolutional network, connected in sequence. The implementation of step S130 may include:
Step S131: perform a convolution operation on the normalized data with the spatial graph convolutional network in the first module to obtain first feature data.
The adjacency matrix of the spatial graph convolution consists of three parts: a natural human-joint adjacency matrix, an adaptive parameter matrix, and a similarity matrix.
An embodiment of step S131 is, for example: pass the normalized data through the W_φ function and the W_θ function respectively for dimensionality-reducing mappings, obtaining first mapping data and second mapping data; multiply the first mapping data by the second mapping data and generate the similarity matrix through a softmax function. Add the similarity matrix, the natural human-joint adjacency matrix, and the adaptive parameter matrix to obtain the adjacency matrix; matrix-multiply the normalized data by the adjacency matrix to obtain first convolution data; and perform a two-dimensional convolution operation and a residual operation on the first convolution data to obtain the first feature data.
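For illustration, here is a sketch of such an adaptive spatial graph convolution in the spirit of the description above. It uses a single adjacency subset; the embedding width, 1x1 kernels, and class name are assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch: the adjacency used for aggregation is the sum of a fixed natural
# human-joint matrix A, a learned parameter matrix B, and a data-dependent
# similarity matrix built from the W_theta / W_phi embeddings.
class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, A, embed_ch=None):
        super().__init__()
        embed_ch = embed_ch or max(in_ch // 4, 1)
        self.register_buffer("A", A)                  # (V, V) natural adjacency
        self.B = nn.Parameter(torch.zeros_like(A))    # learned adjacency
        self.theta = nn.Conv2d(in_ch, embed_ch, 1)    # W_theta: dim-reducing map
        self.phi = nn.Conv2d(in_ch, embed_ch, 1)      # W_phi: dim-reducing map
        self.conv = nn.Conv2d(in_ch, out_ch, 1)       # 2D conv after aggregation
        self.residual = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):                             # x: (N, C, T, V)
        n, c, t, v = x.shape
        # Similarity matrix: softmax over embedded pairwise products.
        th = self.theta(x).permute(0, 3, 1, 2).reshape(n, v, -1)  # (N, V, C'*T)
        ph = self.phi(x).reshape(n, -1, v)                        # (N, C'*T, V)
        sim = F.softmax(torch.bmm(th, ph), dim=-1)                # (N, V, V)
        adj = self.A + self.B + sim                               # combined adjacency
        # Aggregate joint features: matrix multiplication along the joint axis.
        agg = torch.einsum("nctv,nvw->nctw", x, adj)
        out = self.bn(self.conv(agg))
        return F.relu(out + self.residual(x))                     # residual branch
```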
Please refer to fig. 4, a schematic diagram of the BAM network structure provided in an embodiment of the present application; the embodiment does not restrict the order of the channel-dimension, spatial-dimension, and time-dimension convolution layers within the BAM network.
Step S132: compute the first feature data with the BAM network in the first module to obtain second feature data.
The BAM network in the first module comprises: a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer. The implementation of step S132 may include:
step S1321: and performing convolution operation on the first feature data by using the channel dimension convolution layer, the space dimension convolution layer and the time dimension convolution layer to obtain a first convolution feature.
Specifically, the first convolution feature includes: a channel dimension mean, a temporal frame dimension mean, and a spatial dimension mean. Performing convolution operation on channel dimensions by using a two-dimensional convolution network with an input channel as the number of input channels and an output channel as 1 to obtain a channel dimension mean value;
exchanging the time dimension with the channel dimension, and carrying out convolution operation on the time dimension by using a two-dimensional convolution network with an input channel as a time frame number and an output channel as a 1 so as to obtain a time frame dimension mean value; exchanging the space dimension with the channel dimension, and performing convolution operation on the space dimension by using a two-dimensional convolution network with an input channel as the number of joint points and an output channel as 1 to obtain a space dimension mean value;
step S1322: and performing similarity operation on the first volume characteristic by using a softmax layer to obtain second characteristic data.
Optionally, in the step S1322, a residual error operation may be performed, specifically for example: and performing similarity operation on the first volume characteristic by using a softmax layer to obtain second characteristic data, wherein the similarity operation comprises the following steps: performing similarity operation on the first convolution characteristics by using a softmax layer to obtain similarity data; and performing residual error operation on the similarity data and the first characteristic data to obtain second characteristic data. The residual error operation on the similarity data and the first characteristic data can be that the first characteristic data and the similarity data are multiplied, and the obtained result is added with the first characteristic data to obtain second characteristic data, so that the attention mechanism in the batch training process is realized.
In this implementation, the first feature data is computed by the BAM network in the first module to obtain the second feature data. The BAM network adaptively re-weights the spatial graph features of the different samples in a batch; by learning importance weights among those samples, it improves their discrimination in the classification task.
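As a hedged sketch of this batch-aware attention module: the patent does not fully specify the wiring, so the descriptor construction and the (N, N) batch-similarity step below are one plausible reading, not a definitive implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of batch-aware attention (BAM): three convolutions reduce the
# channel, time, and joint dimensions to per-sample descriptors, a softmax
# over their pairwise products yields inter-sample similarity weights, and
# a residual connection preserves the input (assumed wiring).
class BatchAwareAttention(nn.Module):
    def __init__(self, in_channels, num_frames, num_joints):
        super().__init__()
        self.channel_conv = nn.Conv2d(in_channels, 1, 1)  # channel dim -> 1
        self.time_conv = nn.Conv2d(num_frames, 1, 1)      # time dim -> 1 (after swap)
        self.space_conv = nn.Conv2d(num_joints, 1, 1)     # joint dim -> 1 (after swap)

    def forward(self, x):                                 # x: (N, C, T, V)
        n = x.size(0)
        c_mean = self.channel_conv(x)                     # (N, 1, T, V)
        t_mean = self.time_conv(x.transpose(1, 2))        # swap C<->T: (N, 1, C, V)
        s_mean = self.space_conv(x.permute(0, 3, 2, 1))   # swap C<->V: (N, 1, T, C)
        # One per-sample descriptor from the three reductions.
        desc = torch.cat([c_mean.reshape(n, -1),
                          t_mean.reshape(n, -1),
                          s_mean.reshape(n, -1)], dim=1)  # (N, D)
        sim = F.softmax(desc @ desc.t(), dim=-1)          # (N, N) batch similarity
        # Re-weight each sample by its similarity to the rest of the batch.
        mixed = torch.einsum("nm,mctv->nctv", sim, x)
        return mixed + x                                  # residual connection
```

With x = torch.randn(8, 64, 16, 25), BatchAwareAttention(64, 16, 25)(x) returns a tensor of the same shape.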
Step S133: perform a convolution operation on the second feature data with the temporal convolutional network in the first module to obtain the first feature map.
The temporal convolutional network includes a two-dimensional convolution layer, a batch normalization layer, and an activation function layer. An embodiment of step S133 is, for example: perform a two-dimensional convolution operation on the second feature data to obtain second convolution data; and process the second convolution data in turn with the batch normalization layer and the activation function layer to obtain the first feature map.
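A minimal sketch of such a temporal convolution; the kernel size (9 frames) and padding are common ST-GCN-style assumptions, not values given in the patent:

```python
import torch
import torch.nn as nn

# Sketch: a 2D convolution along the frame axis, followed by batch
# normalization and an activation function, as the steps above describe.
class TemporalConv(nn.Module):
    def __init__(self, channels, kernel_t=9, stride=1):
        super().__init__()
        pad = (kernel_t - 1) // 2
        self.conv = nn.Conv2d(channels, channels, (kernel_t, 1),
                              stride=(stride, 1), padding=(pad, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):            # x: (N, C, T, V)
        return self.relu(self.bn(self.conv(x)))
```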
After step S130, step S140 is performed: compute the first feature map with the plurality of second modules in the neural network model to obtain a second feature map, wherein the first module and the second modules each comprise a batch-aware attention (BAM) network.
Please refer to fig. 5, a schematic structural diagram of the second module provided in an embodiment of the present application. The neural network model includes a plurality of second modules, each comprising a spatial graph convolutional network, a batch-aware attention (BAM) network, and a temporal convolutional network. The implementation of step S140 may include:
Step S141: perform a convolution operation on the first feature map with the spatial graph convolutional network in the second module to obtain third feature data.
The adjacency matrix of the spatial graph convolution consists of three parts: a natural human-joint adjacency matrix, an adaptive parameter matrix, and a similarity matrix. An embodiment of step S141 is, for example: pass the first feature map through the W_φ function and the W_θ function respectively for dimensionality-reducing mappings, obtaining third mapping data and fourth mapping data; multiply the third mapping data by the fourth mapping data and generate the similarity matrix through a softmax function. Add the similarity matrix, the natural human-joint adjacency matrix, and the adaptive parameter matrix to obtain the adjacency matrix; matrix-multiply the first feature map by the adjacency matrix to obtain third convolution data; and perform a two-dimensional convolution operation and a residual operation on the third convolution data to obtain the third feature data. In this implementation, the spatial graph convolutional network models the action features with three different matrices, attends to the differing features between samples, and exploits second-order information of the skeleton data, thereby improving the accuracy of action recognition.
Step S142: compute the third feature data with the BAM network in the second module to obtain fourth feature data.
Referring again to fig. 4, this embodiment does not restrict the order of the channel-dimension, spatial-dimension, and time-dimension convolution layers within the BAM network. The BAM network in the second module comprises: a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer. The implementation of step S142 may include:
step S1421: and performing convolution operation on the third feature data by using the channel dimension convolution layer, the space dimension convolution layer and the time dimension convolution layer to obtain a second convolution feature.
Specifically, the second convolution characteristic includes: a channel dimension mean, a temporal frame dimension mean, and a spatial dimension mean. Performing convolution operation on channel dimensions by using a two-dimensional convolution network with an input channel as the number of input channels and an output channel as 1 to obtain a channel dimension mean value; exchanging the time dimension with the channel dimension, and carrying out convolution operation on the time dimension by using a two-dimensional convolution network with an input channel as a time frame number and an output channel as a 1 so as to obtain a time frame dimension mean value;
exchanging the space dimension with the channel dimension, and performing convolution operation on the space dimension by using a two-dimensional convolution network with an input channel as the number of joint points and an output channel as 1 to obtain a space dimension mean value;
step S1422: and performing similarity operation on the second convolution characteristics by using a softmax layer to obtain fourth characteristic data.
Optionally, in step S1422, a residual operation may be further performed, and a softmax layer is used to perform a similarity operation on the second convolution feature, so as to obtain fourth feature data, where the method includes: performing similarity operation on the second convolution characteristics by using a softmax layer to obtain similarity data; and performing residual error operation on the similarity data and the third characteristic data to obtain fourth characteristic data. The residual error operation on the similarity data and the third feature data may be that the third feature data and the similarity data are multiplied, and the fourth feature data is obtained by adding the third feature data to the obtained result. In the implementation process, residual error operation is carried out on the BAM network, and the problem that the accuracy of the training set is reduced along with the deepening of the network is effectively solved.
In the implementation process, the fourth feature data is obtained by calculating the third feature data by using the BAM network in the second module. The BAM network block is used for adaptively re-weighting the spatial map features of different samples in a batch, and the importance weight between different samples in a batch is learned through the BAM network, so that the discrimination of different samples in a batch and different samples in a batch in an image classification task is improved.
Step S143: perform a convolution operation on the fourth feature data with the temporal convolutional network in the second module to obtain fifth feature data.
The temporal convolutional network includes a two-dimensional convolution layer, a batch normalization layer, and an activation function layer. An embodiment of step S143 is, for example: perform a two-dimensional convolution operation on the fourth feature data to obtain fourth convolution data; and process the fourth convolution data in turn with the batch normalization layer and the activation function layer to obtain the fifth feature data.
Step S144: perform a residual operation on the first feature map and the fifth feature data to obtain the second feature map.
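Putting steps S141 to S144 together, a sketch of the second module's forward path, reusing the illustrative classes sketched above; the wiring around the module-level residual is an assumption where the text is silent:

```python
import torch.nn as nn

# Sketch: spatial graph conv -> BAM -> temporal conv, with the module-level
# residual of step S144 added around the whole chain. AdaptiveGraphConv,
# BatchAwareAttention, and TemporalConv are the illustrative sketches above.
class SecondModule(nn.Module):
    def __init__(self, in_ch, out_ch, A, num_frames, num_joints):
        super().__init__()
        self.gcn = AdaptiveGraphConv(in_ch, out_ch, A)
        self.bam = BatchAwareAttention(out_ch, num_frames, num_joints)
        self.tcn = TemporalConv(out_ch)
        self.residual = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))

    def forward(self, x):
        out = self.tcn(self.bam(self.gcn(x)))  # steps S141 to S143
        return out + self.residual(x)          # step S144: module residual
```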
Notably, the optimization of the neural network is batch-based, and under batch optimization every image contributes equally. When the features of two samples from different classes are roughly similar, the usual attention mechanisms cannot effectively separate them in feature space. To address this, the first module and the second modules of the neural network model place a BAM network between each spatial graph convolution layer and temporal convolution layer, the BAM network comprising: a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer. The BAM network learns importance weights among the different samples in a batch and improves their discrimination in the classification task, thereby improving the accuracy of skeleton point action recognition.
After step S140, step S150 is performed: classify the second feature map with the fully connected layer in the neural network model to obtain a classification result, the classification result representing the action category recognized for the target organism.
An embodiment of S150 is, for example: when recognizing the result with the fully connected layer, a weighted sum is taken of the weights obtained by training the neural network model and the results computed by the deep network (convolutions, activation functions, and so on), giving a predicted value for each candidate; the largest value is then taken as the recognition result. The fully connected layer thus acts as the classifier of the whole convolutional neural network.
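A sketch of this classification step; the global average pooling before the fully connected layer is a standard assumption the patent does not state, and the class count is illustrative:

```python
import torch
import torch.nn as nn

# Sketch: pool the second feature map over time and joints, take a weighted
# sum via the fully connected layer, and pick the largest predicted value.
class ClassificationHead(nn.Module):
    def __init__(self, channels, num_classes=60):
        super().__init__()
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):                    # x: (N, C, T, V) second feature map
        pooled = x.mean(dim=(2, 3))          # (N, C): pool over time and joints
        return self.fc(pooled)               # per-class predicted values

# Usage: the recognized action is the class with the largest predicted value.
head = ClassificationHead(channels=256)
feats = torch.randn(8, 256, 16, 25)
action = head(feats).argmax(dim=-1)          # (8,) predicted action categories
```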
In this implementation, the skeleton point data of the target organism is obtained, a batch normalization operation yields the normalized data, the first module in the neural network model computes the normalized data to obtain the first feature map, and the plurality of second modules compute the first feature map to obtain the second feature map, with the first module and the second modules each comprising a batch-aware attention (BAM) network; the fully connected layer then classifies the second feature map to recognize the action of the target organism. Because the model's skeleton point action recognition algorithm learns similarity weights among different samples, the low accuracy of skeleton point action recognition is alleviated.
Optionally, in the above step of convolving the first feature data with the channel-dimension, spatial-dimension, and time-dimension convolution layers, the order of the three layers is not limited; the order may be:
applying the channel-dimension, spatial-dimension, and time-dimension convolution layers to the first feature data in that order to obtain the first convolution feature;
or applying the channel-dimension, time-dimension, and spatial-dimension convolution layers in that order;
or applying the spatial-dimension, channel-dimension, and time-dimension convolution layers in that order;
or applying the spatial-dimension, time-dimension, and channel-dimension convolution layers in that order;
or applying the time-dimension, channel-dimension, and spatial-dimension convolution layers in that order;
or applying the time-dimension, spatial-dimension, and channel-dimension convolution layers in that order.
Please refer to fig. 6, a schematic structural diagram of the skeleton point action recognition device provided in an embodiment of the present application. The embodiment provides a skeleton point action recognition device 200, including:
a data acquisition module 210 for obtaining skeleton point data of a target organism;
a data normalization module 220 for performing a batch normalization operation on the skeleton point data to obtain normalized data;
a first computation module 230 for computing the normalized data to obtain a first feature map;
a second computation module 240 for computing the first feature map to obtain a second feature map, the first computation module and the second computation module each comprising a batch-aware attention (BAM) network;
a classification module 250 for classifying the second feature map to obtain a classification result, the classification result representing the action category recognized for the target organism.
It should be understood that this device corresponds to the skeleton point action recognition method embodiment above and can perform the steps of that embodiment; its specific functions can be found in the description above, and detailed description is omitted here to avoid redundancy. The device includes at least one software function unit that can be stored in memory as software or firmware or solidified in the operating system (OS) of the device.
Please refer to fig. 7 for a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 300 provided in an embodiment of the present application includes: a processor 310 and a memory 320, the memory 320 storing machine readable instructions executable by the processor 310, the machine readable instructions when executed by the processor 310 performing the method as above.
The embodiment of the present application further provides a storage medium 330, where the storage medium 330 stores thereon a computer program, and the computer program is executed by the processor 310 to perform the method as above.
The storage medium 330 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only an alternative embodiment of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application.

Claims (10)

1. A skeleton point action recognition method, characterized by comprising the following steps:
obtaining skeleton point data of a target organism;
performing a batch normalization operation on the skeleton point data with a batch normalization layer in a neural network model to obtain normalized data;
computing the normalized data with a first module in the neural network model to obtain a first feature map;
computing the first feature map with a plurality of second modules in the neural network model to obtain a second feature map, wherein the first module and the second modules each comprise a batch-aware attention (BAM) network;
and classifying the second feature map with a fully connected layer in the neural network model to obtain a classification result, the classification result representing the action category recognized for the target organism.
2. The method of claim 1, wherein the first module further comprises a spatial graph convolutional network and a temporal convolutional network, and computing the normalized data with the first module in the neural network model to obtain the first feature map comprises:
performing a convolution operation on the normalized data with the spatial graph convolutional network in the first module to obtain first feature data;
computing the first feature data with the BAM network in the first module to obtain second feature data;
and performing a convolution operation on the second feature data with the temporal convolutional network in the first module to obtain the first feature map.
3. The method of claim 2, wherein the BAM network comprises a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer, and computing the first feature data with the BAM network in the first module to obtain the second feature data comprises:
performing convolution operations on the first feature data with the channel-dimension, spatial-dimension, and time-dimension convolution layers to obtain a first convolution feature;
and performing a similarity operation on the first convolution feature with the softmax layer to obtain the second feature data.
4. The method of claim 3, wherein performing the similarity operation on the first convolution feature with the softmax layer to obtain the second feature data comprises:
performing a similarity operation on the first convolution feature with the softmax layer to obtain similarity data;
and performing a residual operation on the similarity data and the first feature data to obtain the second feature data.
5. The method of any one of claims 1 to 4, wherein the second module further comprises a spatial graph convolutional network and a temporal convolutional network, and computing the first feature map with the plurality of second modules in the neural network model to obtain the second feature map comprises:
performing a convolution operation on the first feature map with the spatial graph convolutional network in the second module to obtain third feature data;
computing the third feature data with the BAM network in the second module to obtain fourth feature data;
performing a convolution operation on the fourth feature data with the temporal convolutional network in the second module to obtain fifth feature data;
and performing a residual operation on the first feature map and the fifth feature data to obtain the second feature map.
6. The method of claim 5, wherein the BAM network comprises a softmax layer, a channel-dimension convolution layer, a spatial-dimension convolution layer, and a time-dimension convolution layer, and computing the third feature data with the BAM network in the second module to obtain the fourth feature data comprises:
performing convolution operations on the third feature data with the channel-dimension, spatial-dimension, and time-dimension convolution layers to obtain a second convolution feature;
and performing a similarity operation on the second convolution feature with the softmax layer to obtain the fourth feature data.
7. The method of claim 2, wherein the temporal convolutional network comprises a two-dimensional convolution layer, a batch normalization layer, and an activation function layer, and performing the convolution operation on the second feature data with the temporal convolutional network in the first module to obtain the first feature map comprises:
performing a two-dimensional convolution operation on the second feature data to obtain second convolution data;
and processing the second convolution data in turn with the batch normalization layer and the activation function layer to obtain the first feature map.
8. A skeleton point action recognition device, characterized by comprising:
a data acquisition module for obtaining skeleton point data of a target organism;
a data normalization module for performing a batch normalization operation on the skeleton point data to obtain normalized data;
a first computation module for computing the normalized data to obtain a first feature map;
a second computation module for computing the first feature map to obtain a second feature map, the first computation module and the second computation module each comprising a batch-aware attention (BAM) network;
and a classification module for classifying the second feature map to obtain a classification result, the classification result representing the action category recognized for the target organism.
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1 to 7.
CN202111215393.4A 2021-10-19 2021-10-19 Skeleton point action recognition method and device, electronic equipment and storage medium Pending CN113963202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111215393.4A CN113963202A (en) 2021-10-19 2021-10-19 Skeleton point action recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111215393.4A CN113963202A (en) 2021-10-19 2021-10-19 Skeleton point action recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113963202A true CN113963202A (en) 2022-01-21

Family

ID=79465419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111215393.4A Pending CN113963202A (en) 2021-10-19 2021-10-19 Skeleton point action recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113963202A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550308A (en) * 2022-04-22 2022-05-27 成都信息工程大学 Human skeleton action recognition method based on space-time diagram

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378213A (en) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 Activity recognition method, apparatus, computer equipment and storage medium
CN111597929A (en) * 2020-04-30 2020-08-28 青岛科技大学 Group behavior identification method based on channel information fusion and group relation space structured modeling
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN112131943A (en) * 2020-08-20 2020-12-25 深圳大学 Video behavior identification method and system based on dual attention model
CN113326981A (en) * 2021-05-26 2021-08-31 北京交通大学 Atmospheric environment pollutant prediction model based on dynamic space-time attention mechanism
CN113408455A (en) * 2021-06-29 2021-09-17 山东大学 Action identification method, system and storage medium based on multi-stream information enhanced graph convolution network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378213A (en) * 2019-06-11 2019-10-25 中国科学院自动化研究所南京人工智能芯片创新研究院 Activity recognition method, apparatus, computer equipment and storage medium
CN111597929A (en) * 2020-04-30 2020-08-28 青岛科技大学 Group behavior identification method based on channel information fusion and group relation space structured modeling
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN112131943A (en) * 2020-08-20 2020-12-25 深圳大学 Video behavior identification method and system based on dual attention model
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN113326981A (en) * 2021-05-26 2021-08-31 北京交通大学 Atmospheric environment pollutant prediction model based on dynamic space-time attention mechanism
CN113408455A (en) * 2021-06-29 2021-09-17 山东大学 Action identification method, system and storage medium based on multi-stream information enhanced graph convolution network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PARK J et al.: "BAM: Bottleneck Attention Module", arXiv:1807.06514, 17 July 2018, pages 1-14 *
WANG LI et al.: "Skeleton-Based Action Recognition Using Multi-Scale and Multi-Stream Improved Graph Convolutional Network", IEEE Access, vol. 8, 18 August 2020, pages 144529-144542, XP011804736, DOI: 10.1109/ACCESS.2020.3014445 *
Z CHEN et al.: "Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, 18 May 2021, pages 1113-1122 *
LIU Wei: "Research on Dangerous-Driving Action Recognition Algorithms Based on Deep Learning", China Master's Theses Full-Text Database, Engineering Science and Technology I, no. 03, 15 March 2021, pages 026-102 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550308A (en) * 2022-04-22 2022-05-27 成都信息工程大学 Human skeleton action recognition method based on space-time diagram

Similar Documents

Publication Publication Date Title
CN111666857B (en) Human behavior recognition method, device and storage medium based on environment semantic understanding
CN109657533B (en) Pedestrian re-identification method and related product
CN113196289B (en) Human body action recognition method, human body action recognition system and equipment
CN110135249B (en) Human behavior identification method based on time attention mechanism and LSTM (least Square TM)
JP5285575B2 (en) Human behavior determination device and program thereof
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN110472612B (en) Human behavior recognition method and electronic equipment
CN109685037B (en) Real-time action recognition method and device and electronic equipment
CN111931701A (en) Gesture recognition method and device based on artificial intelligence, terminal and storage medium
CN114582030B (en) Behavior recognition method based on service robot
CN110874865A (en) Three-dimensional skeleton generation method and computer equipment
CN110633004B (en) Interaction method, device and system based on human body posture estimation
CN111291718A (en) Behavior prediction method and device, gait recognition method and device
CN113516005B (en) Dance action evaluation system based on deep learning and gesture estimation
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112084952B (en) Video point location tracking method based on self-supervision training
CN113348465A (en) Method, device, equipment and storage medium for predicting relevance of object in image
CN114902299A (en) Method, device, equipment and storage medium for detecting associated object in image
CN117218709A (en) Household old man real-time state monitoring method based on time deformable attention mechanism
CN113963202A (en) Skeleton point action recognition method and device, electronic equipment and storage medium
CN117593792A (en) Abnormal gesture detection method and device based on video frame
JP7488674B2 (en) OBJECT RECOGNITION DEVICE, OBJECT RECOGNITION METHOD, AND OBJECT RECOGNITION PROGRAM
CN113916223B (en) Positioning method and device, equipment and storage medium
CN114511877A (en) Behavior recognition method and device, storage medium and terminal
CN114782994A (en) Gesture recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination