CN107704819B - Action identification method and system and terminal equipment - Google Patents


Info

Publication number
CN107704819B
CN107704819B (application CN201710901427.2A)
Authority
CN
China
Prior art keywords
depth
sub
cube
motion
pixel
Prior art date
Legal status
Active
Application number
CN201710901427.2A
Other languages
Chinese (zh)
Other versions
CN107704819A (en)
Inventor
李懿
程俊
姬晓鹏
方璡
Current Assignee
Tencent Technology Shenzhen Co Ltd
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Tencent Technology Shenzhen Co Ltd
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Shenzhen Institute of Advanced Technology of CAS filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710901427.2A priority Critical patent/CN107704819B/en
Publication of CN107704819A publication Critical patent/CN107704819A/en
Application granted granted Critical
Publication of CN107704819B publication Critical patent/CN107704819B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467: Encoded features or binary features, e.g. local binary patterns [LBP]

Abstract

The invention is applicable to the technical field of human-computer interaction and provides a motion recognition method, system and terminal device. The method does not require pre-segmentation of the target object and has the notable advantages of fast algorithm processing, high recognition accuracy and high recognition efficiency.

Description

Action identification method and system and terminal equipment
Technical Field
The invention belongs to the technical field of human-computer interaction, and particularly relates to a method, a system and a terminal device for recognizing actions.
Background
With the continuous development of computer vision technology, the motion recognition technology has been widely applied in many fields such as human-computer interaction, intelligent monitoring and virtual reality, and has very important research value. The depth camera has the capability of sensing depth distance information of a target object in an imaging range, and is widely applied to three-dimensional information reconstruction, motion model estimation and human body action recognition of the target object.
However, existing depth-image-based motion recognition methods usually need to perform preprocessing operations such as detection and segmentation on the target object in order to extract a region of interest from the depth image sequence and reduce interference from a complex background. Although this improves the accuracy of subsequent motion recognition, it also increases the computational complexity of the algorithm, and the recognition accuracy then depends on the accuracy of the preprocessing. In addition, the features extracted by existing depth-image-based motion recognition methods are often high-dimensional, so a large amount of time is spent on feature detection and recognition efficiency drops.
Disclosure of Invention
In view of this, embodiments of the present invention provide a motion recognition method, a motion recognition system and a terminal device, so as to solve the problems in the prior art that depth-image-based motion recognition methods are computationally complex, spend a long time on feature detection, and have low recognition efficiency.
A first aspect of an embodiment of the present invention provides an action recognition method, including:
dividing the depth image sequence of the target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the depth cubes;
acquiring a time motion response map of the sub-depth cube;
extracting preset image characteristics of the time motion response image;
splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and classifying the feature descriptors through a linear support vector machine to identify the action type of the target object.
In one embodiment, the obtaining the temporal motion response map of the sub-depth cube includes:
acquiring pixel values of non-noise pixel points in the sub-depth cube;
and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
In one embodiment, the obtaining the pixel values of the non-noise pixel points in the sub-depth cube includes:
carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube;
performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In an embodiment, the obtaining, according to the pixel value, a pixel point representing a motion of a target node in the non-noise pixel points to obtain a time motion response map of the sub-depth cube includes:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
In one embodiment, the preset time step is a fixed time step, the fixed time steps are either non-overlapping or partially overlapping, adjacent time domains are correspondingly either non-overlapping or partially overlapping, and the image feature is a histogram of oriented gradients feature.
A second aspect of an embodiment of the present invention provides an action recognition system, including:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and the motion identification module is used for classifying the feature descriptors through a linear support vector machine so as to identify the motion type of the target object.
In one embodiment, the obtaining module comprises:
the pixel value acquisition unit is used for acquiring the pixel value of a non-noise pixel point in the sub-depth cube;
and the response map acquisition unit is used for acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values to acquire the time motion response map of the sub-depth cube.
In one embodiment, the pixel value obtaining unit includes:
the averaging processing subunit is used for carrying out zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
the function transformation subunit is used for performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and the pixel value acquisition subunit is used for performing time domain convolution operation on the sign function pixel point signal to acquire the pixel value of the non-noise pixel point in the sub-depth cube.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method.
According to the method, the depth image sequence of the target object is divided in the time dimension and the space dimension to obtain a plurality of sub-depth cubes; the time motion response map of each sub-depth cube is obtained, the preset image features of the time motion response maps are extracted, and all the preset image features are spliced to obtain the feature descriptor of the depth image sequence, so that the action type of the target object can be identified from the feature descriptor.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of a motion recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of the implementation of step S30 according to an embodiment of the present invention;
FIG. 3 is an exemplary graph of a temporal motion response plot provided by one embodiment of the present invention;
fig. 4 is a schematic flow chart of the implementation of step S31 according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a motion recognition system provided by one embodiment of the present invention;
FIG. 6 is a schematic diagram of an acquisition module provided by one embodiment of the present invention;
fig. 7 is a schematic diagram of a pixel value obtaining unit according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
As shown in fig. 1, an embodiment of the present invention provides a motion recognition method, which includes:
step S10: and dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain.
In a specific application, the preset time step is a fixed time step that can be set according to actual needs; the fixed time steps may be non-overlapping or partially overlapping, and correspondingly the adjacent time domains may be non-overlapping or partially overlapping. For example, if the total duration of all image frames in the depth image sequence is 10 seconds and the fixed time step is 2 seconds, then dividing with non-overlapping fixed time steps yields depth cubes in five time domains: 0 to 2 seconds, 2 to 4 seconds, 4 to 6 seconds, 6 to 8 seconds, and 8 to 10 seconds. Dividing with fixed time steps that overlap by 1 second yields depth cubes in nine time domains: 0 to 2 seconds, 1 to 3 seconds, 2 to 4 seconds, 3 to 5 seconds, 4 to 6 seconds, 5 to 7 seconds, 6 to 8 seconds, 7 to 9 seconds, and 8 to 10 seconds.
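By way of illustration only, and not as the patent's own code, a minimal Python/NumPy sketch of this temporal division might look as follows; the function name split_into_depth_cubes, the 30 fps frame rate and the (frames, height, width) array layout are assumptions made for the example.

    import numpy as np

    def split_into_depth_cubes(depth_sequence, window, stride):
        """Split a depth image sequence (frames x H x W) into depth cubes.

        `window` is the number of frames per cube (the fixed time step);
        `stride` controls overlap: stride == window gives non-overlapping
        cubes, stride < window gives partially overlapping cubes.
        """
        cubes = []
        num_frames = depth_sequence.shape[0]
        for start in range(0, num_frames - window + 1, stride):
            cubes.append(depth_sequence[start:start + window])
        return cubes

    # Example matching the text: 10 s of depth frames at an assumed 30 fps,
    # divided with a 2 s fixed time step.
    sequence = np.random.rand(300, 240, 320)                                  # placeholder depth data
    non_overlapping = split_into_depth_cubes(sequence, window=60, stride=60)  # 5 depth cubes
    overlapping = split_into_depth_cubes(sequence, window=60, stride=30)      # 9 depth cubes
    print(len(non_overlapping), len(overlapping))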
In a specific application, the target object generally refers to a human body in motion, but may also be a living animal, or a non-living robot, bionic animal, or mechanical or electronic device.
In a specific application, the depth image sequence can be obtained through a depth camera. A depth camera can sense the distance information of a target area and output a depth image of the target object in real time; the depth image is not affected by illumination changes, is insensitive to the color and texture features of the object, and therefore has good robustness.
Step S20: and dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes.
In a specific application, step S20 specifically refers to dividing each depth cube in a spatial dimension to obtain a plurality of sub depth cubes corresponding to each depth cube.
In a specific application, the preset plane grid is a two-dimensional grid parallel to the plane of each depth image in the depth image sequence. The grid density can be set according to actual needs; the greater the grid density, the smaller the surface area of each sub-depth cube parallel to the plane of the depth image. The number of image frames of a sub-depth cube is the same as that of the depth cube; specifically, in the direction perpendicular to the plane of the depth image, the sub-depth cube and the depth cube in each time domain have the same number of image frames.
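As a hedged sketch of this spatial division (not the patent's own code), assuming each depth cube is a NumPy array of shape (frames, height, width) and that the plane grid divides the image plane evenly; the function name split_into_sub_depth_cubes and the 4 x 4 grid are assumptions for the example.

    import numpy as np

    def split_into_sub_depth_cubes(depth_cube, grid_rows, grid_cols):
        """Divide a depth cube (frames x H x W) into grid_rows x grid_cols
        sub-depth cubes of equal spatial size; every sub-depth cube keeps all
        frames, so its frame count equals that of the parent depth cube."""
        frames, height, width = depth_cube.shape
        cell_h, cell_w = height // grid_rows, width // grid_cols
        sub_cubes = []
        for r in range(grid_rows):
            for c in range(grid_cols):
                sub_cubes.append(depth_cube[:,
                                            r * cell_h:(r + 1) * cell_h,
                                            c * cell_w:(c + 1) * cell_w])
        return sub_cubes

    cube = np.random.rand(60, 240, 320)                                # one depth cube
    subs = split_into_sub_depth_cubes(cube, grid_rows=4, grid_cols=4)  # 16 sub-depth cubes
    print(len(subs), subs[0].shape)                                    # (60, 60, 80): same 60 frames as the parent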
Step S30: and acquiring a time motion response map of the sub-depth cube.
In a specific application, the step S30 specifically refers to obtaining a time motion response map of a plurality of sub depth cubes corresponding to each depth cube.
Step S40: and extracting preset image characteristics of the time motion response image.
In a specific application, the preset image feature specifically refers to a Histogram of Oriented Gradient (HOG) feature, and may also refer to a Local Binary Pattern (LBP) feature or a Haar feature.
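For illustration, a HOG feature of a single temporal motion response map could be extracted with scikit-image as sketched below; the HOG parameters shown are common defaults, not values specified by the patent, and the random response map is only a stand-in.

    import numpy as np
    from skimage.feature import hog

    # Stand-in for one sub-depth cube's temporal motion response map.
    response_map = np.random.rand(60, 80)

    # Orientation count and cell/block sizes are illustrative, not from the patent.
    hog_feature = hog(response_map,
                      orientations=9,
                      pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2),
                      feature_vector=True)
    print(hog_feature.shape)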
Step S50: and splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain the feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain the feature descriptors of the depth image sequence.
In a specific application, the step S50 specifically refers to sequentially stitching the preset image features of each sub-depth cube according to the arrangement order of all the sub-depth cubes, and sequentially stitching the feature vectors of each depth cube according to the arrangement order of all the depth cubes.
It should be understood that the depth cube or sub-depth cube in this embodiment is not limited to a cube with equal length, width and height, but may be a cube with different length, width and height partially or completely.
Step S60: the feature descriptors are classified by a linear support vector machine (SVW) to identify a motion type of the target object.
According to the method, the target object does not need to be pre-segmented, and the method has the remarkable advantages of high algorithm processing speed and high recognition precision and efficiency.
As shown in fig. 2, in an embodiment of the present invention, step S30 specifically includes:
step S31: and acquiring the pixel value of the non-noise pixel point in the sub-depth cube.
In a specific application, a time domain convolution operation can be performed at each pixel position in each sub-depth cube to screen out the non-noise pixel points, and the pixel values of those non-noise pixel points are then calculated; these pixel values are the time motion responses of the non-noise pixel points.
Step S32: and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
In a specific application, the target node specifically refers to a motion node on the target object, and when the target object is a human body, the target node may refer to each limb or joint of the human body.
In one embodiment, step S32 specifically includes:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
In a specific application, different motion types have different temporal change characteristics of pixel values. For example, when a hand is lifted, the pixel value of the non-noise pixel point at the original hand position decreases to 0 over time, while the pixel value of the pixel point above the original hand position increases; the pixel points whose pixel values change are the pixel points representing the motion of the target node.
In a specific application, the visualization process is to restore the pixel values represented in numerical form to an image that can be seen by the human eye.
As shown in fig. 3, the time motion response map of a depth image sequence sample is obtained by performing visualization processing on the pixel points representing the motion of the target node of all the sub-depth cubes contained in the depth image sequence sample of a certain human body. The numerical scale on the right side of fig. 3 represents the gray-scale value. As can be seen from fig. 3, both the human shape information and the limb movement information in the depth image sequence sample are well characterized, while the noise and the background are well filtered out.
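The kind of gray-scale rendering shown in fig. 3 could, for example, be produced with matplotlib as sketched below; the colormap, image size and random data are assumptions made only so the snippet runs on its own.

    import numpy as np
    import matplotlib.pyplot as plt

    # Stand-in for the temporal motion response values assembled over the image plane.
    response_map = np.random.rand(240, 320)

    plt.imshow(response_map, cmap='gray')
    plt.colorbar(label='gray-scale value')    # the numerical scale on the right of fig. 3
    plt.title('Temporal motion response map')
    plt.show()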
As shown in fig. 4, in an embodiment of the present invention, step S31 specifically includes:
step S311: and carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube.
In a specific application, in order to screen out the pixel points representing motion of a target node in the depth image sequence, each pixel point, denoted (i, j), may be regarded as a one-dimensional time signal; correspondingly, in an embodiment, step S311 may be implemented by the following formula:
f*(i, j)[n] = f(i, j)[n] - (1/N) * Σ_{m=1}^{N} f(i, j)[m];
where N is the number of frames in the depth image sequence, f(i, j)[n] is the pixel value of the nth frame image in the depth image sequence at the pixel point (i, j), and f*(i, j) is the pixel point signal after zero equalization processing, i.e., with its temporal mean subtracted.
Step S312: and performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube.
In one embodiment, step S312 may be implemented by the following formula:
sf(i,j)=sign(f*(i,j));
where sf(i, j) is the sign-function pixel point signal obtained by applying the sign function to f*(i, j), and sign(·) is the sign function.
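A small sketch of steps S311 and S312 follows, under the assumption that a sub-depth cube is stored as a NumPy array of shape (N, h, w) with the frame index first; the function names are illustrative.

    import numpy as np

    def zero_mean_signals(sub_depth_cube):
        """Step S311: treat each pixel position (i, j) as a 1-D temporal signal of
        length N and subtract its temporal mean, giving f*(i, j)."""
        return sub_depth_cube - sub_depth_cube.mean(axis=0, keepdims=True)

    def sign_signals(zero_mean_cube):
        """Step S312: apply the sign function to the zero-mean signals, giving sf(i, j)."""
        return np.sign(zero_mean_cube)

    sub_cube = np.random.rand(60, 60, 80)  # placeholder sub-depth cube (N x h x w)
    sf = sign_signals(zero_mean_signals(sub_cube))
    print(np.unique(sf))                   # values drawn from {-1, 0, 1}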
Step S313: and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In a specific application, for a given pixel point (i, j), the number of pixel points with sf(i, j) > 0 in the sub-depth cube is called the positive sample number and is denoted P(i, j), the number of pixel points with sf(i, j) < 0 is called the negative sample number and is denoted Q(i, j), and the number of pixel points whose time motion response is non-zero is called the non-zero response sample number and is denoted NZ(i, j); the template [-1, 0, -1] is taken as the convolution kernel. Correspondingly, in an embodiment, step S313 is specifically implemented by the following formula (given as an image in the original text and not reproduced here):
where M (i, j) represents the pixel value of the temporal motion response at pixel point (i, j).
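Because the patent's closing formula for M(i, j) is not reproduced in this text, the sketch below stops at the three counts that step S313 defines; interpreting each count as being taken over the frames of the temporal signal at a pixel position is an assumption of the example.

    import numpy as np

    def motion_response_counts(sf):
        """Per-pixel counts defined in step S313, from the sign-function signals
        sf of shape (N, h, w): P = count of sf > 0, Q = count of sf < 0,
        NZ = count of non-zero responses."""
        P = (sf > 0).sum(axis=0)
        Q = (sf < 0).sum(axis=0)
        NZ = (sf != 0).sum(axis=0)
        return P, Q, NZ

    sf = np.sign(np.random.randn(60, 60, 80))  # stand-in sign-function signals
    P, Q, NZ = motion_response_counts(sf)
    # M(i, j) would be obtained by combining P, Q and NZ according to the patent's
    # formula, which is given only as an image in the original and is not guessed here.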
In this embodiment, the human body action is recognized by calculating the time motion response of the obtained sub-depth cubes; the algorithm is simple and the processing speed is fast.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
As shown in fig. 5, an embodiment of the present invention provides an action recognition system 100 for performing the method steps in the embodiment corresponding to fig. 1, which includes:
the first dividing module 10 is configured to divide a depth image sequence of a target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
a second dividing module 20, configured to divide the depth cube in a spatial dimension according to a preset plane grid, to obtain multiple sub-depth cubes with the same dimension, where the number of image frames of the sub-depth cubes is the same as the number of image frames of the depth cube;
an obtaining module 30, configured to obtain a time motion response map of the sub-depth cube;
an extraction module 40, configured to extract preset image features of the temporal motion response map;
the splicing module 50 is configured to splice preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splice the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
a classification module 60 for classifying the feature descriptors by a linear Support Vector Machine (SVM) to identify the action type of the target object.
According to the system, the target object does not need to be pre-segmented, and the system has the remarkable advantages of high algorithm processing speed and high recognition precision and efficiency.
As shown in fig. 6, in an embodiment of the present invention, the obtaining module 30 is configured to execute the method steps in the embodiment corresponding to fig. 2 and includes:
a pixel value obtaining unit 31, configured to obtain a pixel value of a non-noise pixel point in the sub-depth cube;
and the response map obtaining unit 32 is configured to obtain, according to the pixel value, a pixel point representing motion of a target node in the non-noise pixel points, and obtain a time motion response map of the sub-depth cube.
In one embodiment, the response map obtaining unit 32 specifically includes:
the screening subunit is used for screening out pixel points representing the motion of the target node from the non-noise pixel points according to the time change characteristic of the pixel values;
and the response map acquisition unit is used for carrying out visualization processing on the pixel points representing the motion of the target node to acquire the time motion response map of the sub-depth cube.
As shown in fig. 7, in an embodiment of the present invention, the pixel value obtaining unit 31 is configured to execute the method steps in the embodiment corresponding to fig. 4 and includes:
an averaging processing subunit 311, configured to perform zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
a function transformation subunit 312, configured to perform sign function transformation on the pixel point signal after zero equalization processing, so as to obtain a sign function pixel point signal of each pixel point in the sub-depth cube;
and the pixel value obtaining subunit 313 is configured to perform time domain convolution operation on the sign function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In an embodiment, the equalization processing subunit 311 is specifically configured to perform zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube according to the following formula:
f*(i, j)[n] = f(i, j)[n] - (1/N) * Σ_{m=1}^{N} f(i, j)[m];
where N is the number of frames in the depth image sequence, f(i, j)[n] is the pixel value of the nth frame image in the depth image sequence at the pixel point (i, j), and f*(i, j) is the pixel point signal after zero equalization processing, i.e., with its temporal mean subtracted.
In one embodiment, the function transformation subunit 312 is specifically configured to calculate the sign function pixel point signal according to the following formula:
sf(i,j)=sign(f*(i,j));
where sf(i, j) is the sign-function pixel point signal obtained by applying the sign function to f*(i, j), and sign(·) is the sign function.
In an embodiment, the pixel value obtaining subunit 313 is specifically configured to calculate the pixel value of the non-noise pixel according to the following formula:
[The formula for M(i, j), given as an image in the original text, is not reproduced here.]
In that formula, M(i, j) represents the pixel value of the time motion response at the pixel point (i, j); P(i, j) is the number of pixel points with sf(i, j) > 0 in the sub-depth cube, called the number of positive samples; Q(i, j) is the number of pixel points with sf(i, j) < 0 in the sub-depth cube, called the number of negative samples; and NZ(i, j) is the number of pixel points with a non-zero time motion response in the sub-depth cube, called the number of non-zero response samples.
In this embodiment, the human body action is recognized by calculating the time motion response of the obtained sub-depth cubes; the algorithm is simple and the processing speed is fast.
Fig. 8 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 8, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and operable on the processor 70, and when the processor 70 executes the computer program 72, the steps in the above-described method embodiments, such as steps S10 to S60 shown in fig. 1, are implemented. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 10 to 60 shown in fig. 5.
Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a first dividing module, a second dividing module, an obtaining module, an extracting module, a splicing module and an action identifying module, and the specific functions of the modules are as follows:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and the motion identification module is used for classifying the feature descriptors through a linear support vector machine so as to identify the motion type of the target object.
The terminal device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. The terminal device may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device 7 and does not constitute a limitation of the terminal device 7, which may include more or fewer components than those shown, may combine some components, or may have different components; for example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (8)

1. A motion recognition method, comprising:
dividing the depth image sequence of the target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the depth cubes;
acquiring a time motion response map of the sub-depth cube;
extracting preset image characteristics of the time motion response image;
splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
classifying the feature descriptors through a linear support vector machine to identify the action type of the target object;
the obtaining of the temporal motion response map of the sub-depth cube includes:
acquiring pixel values of non-noise pixel points in the sub-depth cube;
and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
2. The motion recognition method of claim 1, wherein the obtaining pixel values of non-noise pixel points in the sub-depth cube comprises:
carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube;
performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
3. The motion recognition method according to claim 1, wherein the obtaining, according to the pixel values, pixel points characterizing a motion of a target node from among the non-noise pixel points to obtain a time motion response map of the sub-depth cube comprises:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
4. The motion recognition method according to claim 1, wherein the preset time step is a fixed time step, the fixed time steps are non-overlapping or partially overlapping, adjacent time domains are non-overlapping or partially overlapping, and the image feature is a histogram of oriented gradients feature.
5. A motion recognition system, comprising:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
the motion recognition module is used for classifying the feature descriptors through a linear support vector machine so as to recognize the motion type of the target object;
the acquisition module includes:
the pixel value acquisition unit is used for acquiring the pixel value of a non-noise pixel point in the sub-depth cube;
and the response map acquisition unit is used for acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values to acquire the time motion response map of the sub-depth cube.
6. The motion recognition system according to claim 5, wherein the pixel value acquisition unit includes:
the averaging processing subunit is used for carrying out zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
the function transformation subunit is used for performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and the pixel value acquisition subunit is used for performing time domain convolution operation on the sign function pixel point signal to acquire the pixel value of the non-noise pixel point in the sub-depth cube.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201710901427.2A 2017-09-28 2017-09-28 Action identification method and system and terminal equipment Active CN107704819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710901427.2A CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710901427.2A CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Publications (2)

Publication Number Publication Date
CN107704819A CN107704819A (en) 2018-02-16
CN107704819B true CN107704819B (en) 2020-01-24

Family

ID=61175218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710901427.2A Active CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN107704819B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608421A (en) * 2015-12-18 2016-05-25 中国科学院深圳先进技术研究院 Human movement recognition method and device
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device

Also Published As

Publication number Publication date
CN107704819A (en) 2018-02-16

Similar Documents

Publication Publication Date Title
Hashemi et al. Template matching advances and applications in image analysis
CN109165538B (en) Bar code detection method and device based on deep neural network
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN110991533B (en) Image recognition method, recognition device, terminal device and readable storage medium
US11145080B2 (en) Method and apparatus for three-dimensional object pose estimation, device and storage medium
CN111145209A (en) Medical image segmentation method, device, equipment and storage medium
KR101436369B1 (en) Apparatus and method for detecting multiple object using adaptive block partitioning
Richardson et al. Learning convolutional filters for interest point detection
CN109410246B (en) Visual tracking method and device based on correlation filtering
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
CN111145196A (en) Image segmentation method and device and server
CN114444565A (en) Image tampering detection method, terminal device and storage medium
CN113191189A (en) Face living body detection method, terminal device and computer readable storage medium
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN112418089A (en) Gesture recognition method and device and terminal
CN108229498B (en) Zipper piece identification method, device and equipment
CN111126248A (en) Method and device for identifying shielded vehicle
CN111488811A (en) Face recognition method and device, terminal equipment and computer readable medium
CN107704819B (en) Action identification method and system and terminal equipment
Bozkurt et al. Multi-scale directional-filtering-based method for follicular lymphoma grading
CN113705660A (en) Target identification method and related equipment
Salau An effective graph-cut segmentation approach for license plate detection
CN111382632A (en) Target detection method, terminal device and computer-readable storage medium
CN117456284B (en) Image classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant