CN107704819B - Action identification method and system and terminal equipment - Google Patents


Info

Publication number
CN107704819B
CN107704819B (application CN201710901427.2A)
Authority
CN
China
Prior art keywords
depth
sub
cube
motion
pixel
Prior art date
Legal status
Active
Application number
CN201710901427.2A
Other languages
Chinese (zh)
Other versions
CN107704819A (en)
Inventor
李懿
程俊
姬晓鹏
方璡
Current Assignee
Tencent Technology Shenzhen Co Ltd
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Tencent Technology Shenzhen Co Ltd
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Shenzhen Institute of Advanced Technology of CAS filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710901427.2A priority Critical patent/CN107704819B/en
Publication of CN107704819A publication Critical patent/CN107704819A/en
Application granted granted Critical
Publication of CN107704819B publication Critical patent/CN107704819B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467: Encoded features or binary features, e.g. local binary patterns [LBP]

Abstract

The invention is applicable to the technical field of human-computer interaction and provides a motion recognition method, system and terminal device. The method does not require pre-segmentation of the target object and has the notable advantages of fast algorithm processing, high recognition accuracy and high recognition efficiency.

Description

Action identification method and system and terminal equipment
Technical Field
The invention belongs to the technical field of human-computer interaction, and particularly relates to a method, a system and a terminal device for recognizing actions.
Background
With the continuous development of computer vision technology, the motion recognition technology has been widely applied in many fields such as human-computer interaction, intelligent monitoring and virtual reality, and has very important research value. The depth camera has the capability of sensing depth distance information of a target object in an imaging range, and is widely applied to three-dimensional information reconstruction, motion model estimation and human body action recognition of the target object.
However, existing depth-image-based motion recognition methods usually need to perform preprocessing operations such as detection and segmentation on the target object in order to extract a region of interest from the depth image sequence and reduce interference from a complex background. Although this improves the accuracy of subsequent motion recognition, it also increases the computational complexity of the algorithm, and the recognition accuracy then depends on the accuracy of the preprocessing. In addition, the features extracted by existing depth-image-based motion recognition methods are often high-dimensional, so a large amount of time is spent on feature detection and recognition efficiency drops.
Disclosure of Invention
In view of this, embodiments of the present invention provide a motion recognition method, a motion recognition system and a terminal device, so as to solve the problems in the prior art that depth-image-based motion recognition methods are computationally complex, spend a long time on feature detection, and have low recognition efficiency.
A first aspect of an embodiment of the present invention provides an action recognition method, including:
dividing the depth image sequence of the target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the depth cubes;
acquiring a time motion response map of the sub-depth cube;
extracting preset image characteristics of the time motion response image;
splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and classifying the feature descriptors through a linear support vector machine to identify the action type of the target object.
In one embodiment, the obtaining the temporal motion response map of the sub-depth cube includes:
acquiring pixel values of non-noise pixel points in the sub-depth cube;
and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
In one embodiment, the obtaining the pixel values of the non-noise pixel points in the sub-depth cube includes:
carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube;
performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In an embodiment, the obtaining, according to the pixel value, a pixel point representing a motion of a target node in the non-noise pixel points to obtain a time motion response map of the sub-depth cube includes:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
In one embodiment, the preset time step is a fixed time step, the fixed time steps are either non-overlapping or partially overlapping, adjacent time domains are correspondingly either non-overlapping or partially overlapping, and the image feature is a histogram of oriented gradients feature.
A second aspect of an embodiment of the present invention provides an action recognition system, including:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and the motion identification module is used for classifying the feature descriptors through a linear support vector machine so as to identify the motion type of the target object.
In one embodiment, the obtaining module comprises:
the pixel value acquisition unit is used for acquiring the pixel value of a non-noise pixel point in the sub-depth cube;
and the response map acquisition unit is used for acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values to acquire the time motion response map of the sub-depth cube.
In one embodiment, the pixel value obtaining unit includes:
the averaging processing subunit is used for carrying out zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
the function transformation subunit is used for performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and the pixel value acquisition subunit is used for performing time domain convolution operation on the sign function pixel point signal to acquire the pixel value of the non-noise pixel point in the sub-depth cube.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method.
According to the method, the depth image sequence of the target object is divided in the time dimension and the space dimension to obtain a plurality of sub-depth cubes; the time motion response map of each sub-depth cube is obtained, the preset image features of the time motion response maps are extracted, and all the preset image features are spliced to obtain the feature descriptor of the depth image sequence, so that the action type of the target object can be identified from the feature descriptor.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without inventive effort.
Fig. 1 is a schematic flow chart of an implementation of a motion recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of the implementation of step S30 according to an embodiment of the present invention;
FIG. 3 is an exemplary graph of a temporal motion response plot provided by one embodiment of the present invention;
fig. 4 is a schematic flow chart of the implementation of step S31 according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a motion recognition system provided by one embodiment of the present invention;
FIG. 6 is a schematic diagram of an acquisition module provided by one embodiment of the present invention;
fig. 7 is a schematic diagram of a pixel value obtaining unit according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
As shown in fig. 1, an embodiment of the present invention provides a motion recognition method, which includes:
step S10: and dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain.
In a specific application, the preset time step is a fixed time step that can be set according to actual needs; the fixed time steps may be non-overlapping or partially overlapping, and correspondingly the adjacent time domains may be non-overlapping or partially overlapping. For example, if the total duration of all image frames in the depth image sequence is 10 seconds and the fixed time step is 2 seconds, then dividing with non-overlapping fixed time steps yields depth cubes in five time domains: 0 to 2 seconds, 2 to 4 seconds, 4 to 6 seconds, 6 to 8 seconds, and 8 to 10 seconds. Dividing with fixed time steps that overlap by 1 second yields depth cubes in nine time domains: 0 to 2 seconds, 1 to 3 seconds, 2 to 4 seconds, 3 to 5 seconds, 4 to 6 seconds, 5 to 7 seconds, 6 to 8 seconds, 7 to 9 seconds, and 8 to 10 seconds.
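By way of illustration only, and not as the patent's own code, a minimal Python/NumPy sketch of this temporal division might look as follows; the function name split_into_depth_cubes, the 30 fps frame rate and the (frames, height, width) array layout are assumptions made for the example.

    import numpy as np

    def split_into_depth_cubes(depth_sequence, window, stride):
        """Split a depth image sequence (frames x H x W) into depth cubes.

        `window` is the number of frames per cube (the fixed time step);
        `stride` controls overlap: stride == window gives non-overlapping
        cubes, stride < window gives partially overlapping cubes.
        """
        cubes = []
        num_frames = depth_sequence.shape[0]
        for start in range(0, num_frames - window + 1, stride):
            cubes.append(depth_sequence[start:start + window])
        return cubes

    # Example matching the text: 10 s of depth frames at an assumed 30 fps,
    # divided with a 2 s fixed time step.
    sequence = np.random.rand(300, 240, 320)                                  # placeholder depth data
    non_overlapping = split_into_depth_cubes(sequence, window=60, stride=60)  # 5 depth cubes
    overlapping = split_into_depth_cubes(sequence, window=60, stride=30)      # 9 depth cubes
    print(len(non_overlapping), len(overlapping))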
In a specific application, the target object generally refers to a human body in motion, but may also be a living animal, or a non-living robot, bionic animal, or mechanical or electronic device.
In a specific application, the depth image sequence can be obtained through a depth camera. A depth camera can sense the distance information of a target area and output a depth image of the target object in real time; the depth image is not affected by illumination changes, is insensitive to the color and texture features of the object, and therefore has good robustness.
Step S20: and dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes.
In a specific application, step S20 specifically refers to dividing each depth cube in a spatial dimension to obtain a plurality of sub depth cubes corresponding to each depth cube.
In a specific application, the preset plane grid is a two-dimensional grid parallel to the plane of each depth image in the depth image sequence. The grid density can be set according to actual needs; the greater the grid density, the smaller the surface area of each sub-depth cube parallel to the plane of the depth image. The number of image frames of a sub-depth cube is the same as that of the depth cube; specifically, in the direction perpendicular to the plane of the depth image, the sub-depth cube and the depth cube in each time domain have the same number of image frames.
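As a hedged sketch of this spatial division (not the patent's own code), assuming each depth cube is a NumPy array of shape (frames, height, width) and that the plane grid divides the image plane evenly; the function name split_into_sub_depth_cubes and the 4 x 4 grid are assumptions for the example.

    import numpy as np

    def split_into_sub_depth_cubes(depth_cube, grid_rows, grid_cols):
        """Divide a depth cube (frames x H x W) into grid_rows x grid_cols
        sub-depth cubes of equal spatial size; every sub-depth cube keeps all
        frames, so its frame count equals that of the parent depth cube."""
        frames, height, width = depth_cube.shape
        cell_h, cell_w = height // grid_rows, width // grid_cols
        sub_cubes = []
        for r in range(grid_rows):
            for c in range(grid_cols):
                sub_cubes.append(depth_cube[:,
                                            r * cell_h:(r + 1) * cell_h,
                                            c * cell_w:(c + 1) * cell_w])
        return sub_cubes

    cube = np.random.rand(60, 240, 320)                                # one depth cube
    subs = split_into_sub_depth_cubes(cube, grid_rows=4, grid_cols=4)  # 16 sub-depth cubes
    print(len(subs), subs[0].shape)                                    # (60, 60, 80): same 60 frames as the parent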
Step S30: and acquiring a time motion response map of the sub-depth cube.
In a specific application, the step S30 specifically refers to obtaining a time motion response map of a plurality of sub depth cubes corresponding to each depth cube.
Step S40: and extracting preset image characteristics of the time motion response image.
In a specific application, the preset image feature specifically refers to a Histogram of Oriented Gradient (HOG) feature, and may also refer to a Local Binary Pattern (LBP) feature or a Haar feature.
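For illustration, a HOG feature of a single temporal motion response map could be extracted with scikit-image as sketched below; the HOG parameters shown are common defaults, not values specified by the patent, and the random response map is only a stand-in.

    import numpy as np
    from skimage.feature import hog

    # Stand-in for one sub-depth cube's temporal motion response map.
    response_map = np.random.rand(60, 80)

    # Orientation count and cell/block sizes are illustrative, not from the patent.
    hog_feature = hog(response_map,
                      orientations=9,
                      pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2),
                      feature_vector=True)
    print(hog_feature.shape)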
Step S50: and splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain the feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain the feature descriptors of the depth image sequence.
In a specific application, the step S50 specifically refers to sequentially stitching the preset image features of each sub-depth cube according to the arrangement order of all the sub-depth cubes, and sequentially stitching the feature vectors of each depth cube according to the arrangement order of all the depth cubes.
It should be understood that the depth cube or sub-depth cube in this embodiment is not limited to a cube with equal length, width and height, but may be a cube with different length, width and height partially or completely.
Step S60: the feature descriptors are classified by a linear support vector machine (SVW) to identify a motion type of the target object.
According to the method, the target object does not need to be pre-segmented, and the method has the remarkable advantages of high algorithm processing speed and high recognition precision and efficiency.
As shown in fig. 2, in an embodiment of the present invention, step S30 specifically includes:
step S31: and acquiring the pixel value of the non-noise pixel point in the sub-depth cube.
In a specific application, a time domain convolution operation can be performed at each pixel position in each sub-depth cube to screen out the non-noise pixel points, and the pixel values of those non-noise pixel points are then calculated; these pixel values are the time motion responses of the non-noise pixel points.
Step S32: and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
In a specific application, the target node specifically refers to a motion node on the target object, and when the target object is a human body, the target node may refer to each limb or joint of the human body.
In one embodiment, step S32 specifically includes:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
In a specific application, different motion types have different temporal change characteristics of pixel values. For example, when a hand is lifted, the pixel value of the non-noise pixel point at the original hand position decreases to 0 over time, while the pixel value of the pixel point above the original hand position increases; the pixel points whose pixel values change are the pixel points representing the motion of the target node.
In a specific application, the visualization process is to restore the pixel values represented in numerical form to an image that can be seen by the human eye.
As shown in fig. 3, the time motion response map of a depth image sequence sample is obtained by performing visualization processing on the pixel points representing the motion of the target node of all the sub-depth cubes contained in the depth image sequence sample of a certain human body. The numerical scale on the right side of fig. 3 represents the gray-scale value. As can be seen from fig. 3, both the human shape information and the limb movement information in the depth image sequence sample are well characterized, while the noise and the background are well filtered out.
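The kind of gray-scale rendering shown in fig. 3 could, for example, be produced with matplotlib as sketched below; the colormap, image size and random data are assumptions made only so the snippet runs on its own.

    import numpy as np
    import matplotlib.pyplot as plt

    # Stand-in for the temporal motion response values assembled over the image plane.
    response_map = np.random.rand(240, 320)

    plt.imshow(response_map, cmap='gray')
    plt.colorbar(label='gray-scale value')    # the numerical scale on the right of fig. 3
    plt.title('Temporal motion response map')
    plt.show()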
As shown in fig. 4, in an embodiment of the present invention, step S31 specifically includes:
step S311: and carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube.
In a specific application, in order to screen out the pixel points representing motion of a target node in the depth image sequence, each pixel point, denoted (i, j), may be regarded as a one-dimensional time signal; correspondingly, in an embodiment, step S311 may be implemented by the following formula:
f*(i, j)[n] = f(i, j)[n] - (1/N) * Σ_{m=1}^{N} f(i, j)[m];
where N is the number of frames in the depth image sequence, f(i, j)[n] is the pixel value of the nth frame image in the depth image sequence at the pixel point (i, j), and f*(i, j) is the pixel point signal after zero equalization processing, i.e., with its temporal mean subtracted.
Step S312: and performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube.
In one embodiment, step S312 may be implemented by the following formula:
sf(i,j)=sign(f*(i,j));
where sf(i, j) is the sign-function pixel point signal obtained by applying the sign function to f*(i, j), and sign(·) is the sign function.
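A small sketch of steps S311 and S312 follows, under the assumption that a sub-depth cube is stored as a NumPy array of shape (N, h, w) with the frame index first; the function names are illustrative.

    import numpy as np

    def zero_mean_signals(sub_depth_cube):
        """Step S311: treat each pixel position (i, j) as a 1-D temporal signal of
        length N and subtract its temporal mean, giving f*(i, j)."""
        return sub_depth_cube - sub_depth_cube.mean(axis=0, keepdims=True)

    def sign_signals(zero_mean_cube):
        """Step S312: apply the sign function to the zero-mean signals, giving sf(i, j)."""
        return np.sign(zero_mean_cube)

    sub_cube = np.random.rand(60, 60, 80)  # placeholder sub-depth cube (N x h x w)
    sf = sign_signals(zero_mean_signals(sub_cube))
    print(np.unique(sf))                   # values drawn from {-1, 0, 1}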
Step S313: and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In a specific application, for a given pixel point (i, j), the number of pixel points with sf(i, j) > 0 in the sub-depth cube is called the positive sample number and is denoted P(i, j), the number of pixel points with sf(i, j) < 0 is called the negative sample number and is denoted Q(i, j), and the number of pixel points whose time motion response is non-zero is called the non-zero response sample number and is denoted NZ(i, j); the template [-1, 0, -1] is taken as the convolution kernel. Correspondingly, in an embodiment, step S313 is specifically implemented by the following formula (given as an image in the original text and not reproduced here):
where M (i, j) represents the pixel value of the temporal motion response at pixel point (i, j).
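Because the patent's closing formula for M(i, j) is not reproduced in this text, the sketch below stops at the three counts that step S313 defines; interpreting each count as being taken over the frames of the temporal signal at a pixel position is an assumption of the example.

    import numpy as np

    def motion_response_counts(sf):
        """Per-pixel counts defined in step S313, from the sign-function signals
        sf of shape (N, h, w): P = count of sf > 0, Q = count of sf < 0,
        NZ = count of non-zero responses."""
        P = (sf > 0).sum(axis=0)
        Q = (sf < 0).sum(axis=0)
        NZ = (sf != 0).sum(axis=0)
        return P, Q, NZ

    sf = np.sign(np.random.randn(60, 60, 80))  # stand-in sign-function signals
    P, Q, NZ = motion_response_counts(sf)
    # M(i, j) would be obtained by combining P, Q and NZ according to the patent's
    # formula, which is given only as an image in the original and is not guessed here.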
In this embodiment, the human body action is recognized by calculating the time motion response of the obtained sub-depth cubes; the algorithm is simple and the processing speed is fast.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
As shown in fig. 5, an embodiment of the present invention provides an action recognition system 100 for performing the method steps in the embodiment corresponding to fig. 1, which includes:
the first dividing module 10 is configured to divide a depth image sequence of a target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
a second dividing module 20, configured to divide the depth cube in a spatial dimension according to a preset plane grid, to obtain multiple sub-depth cubes with the same dimension, where the number of image frames of the sub-depth cubes is the same as the number of image frames of the depth cube;
an obtaining module 30, configured to obtain a time motion response map of the sub-depth cube;
an extraction module 40, configured to extract preset image features of the temporal motion response map;
the splicing module 50 is configured to splice preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splice the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
a classification module 60 for classifying the feature descriptors by a linear Support Vector Machine (SVM) to identify the action type of the target object.
According to the system, the target object does not need to be pre-segmented, and the system has the remarkable advantages of high algorithm processing speed and high recognition precision and efficiency.
As shown in fig. 6, in an embodiment of the present invention, the obtaining module 30 is configured to execute the method steps in the embodiment corresponding to fig. 2 and includes:
a pixel value obtaining unit 31, configured to obtain a pixel value of a non-noise pixel point in the sub-depth cube;
and the response map obtaining unit 32 is configured to obtain, according to the pixel value, a pixel point representing motion of a target node in the non-noise pixel points, and obtain a time motion response map of the sub-depth cube.
In one embodiment, the response map obtaining unit 32 specifically includes:
the screening subunit is used for screening out pixel points representing the motion of the target node from the non-noise pixel points according to the time change characteristic of the pixel values;
and the response map acquisition unit is used for carrying out visualization processing on the pixel points representing the motion of the target node to acquire the time motion response map of the sub-depth cube.
As shown in fig. 7, in an embodiment of the present invention, the pixel value obtaining unit 31 is configured to execute the method steps in the embodiment corresponding to fig. 4 and includes:
an averaging processing subunit 311, configured to perform zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
a function transformation subunit 312, configured to perform sign function transformation on the pixel point signal after zero equalization processing, so as to obtain a sign function pixel point signal of each pixel point in the sub-depth cube;
and the pixel value obtaining subunit 313 is configured to perform time domain convolution operation on the sign function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
In an embodiment, the equalization processing subunit 311 is specifically configured to perform zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube according to the following formula:
f*(i, j)[n] = f(i, j)[n] - (1/N) * Σ_{m=1}^{N} f(i, j)[m];
where N is the number of frames in the depth image sequence, f(i, j)[n] is the pixel value of the nth frame image in the depth image sequence at the pixel point (i, j), and f*(i, j) is the pixel point signal after zero equalization processing, i.e., with its temporal mean subtracted.
In one embodiment, the function transformation subunit 312 is specifically configured to calculate the sign function pixel point signal according to the following formula:
sf(i,j)=sign(f*(i,j));
where sf(i, j) is the sign-function pixel point signal obtained by applying the sign function to f*(i, j), and sign(·) is the sign function.
In an embodiment, the pixel value obtaining subunit 313 is specifically configured to calculate the pixel value of the non-noise pixel according to the following formula:
[The formula for M(i, j), given as an image in the original text, is not reproduced here.]
In that formula, M(i, j) represents the pixel value of the time motion response at the pixel point (i, j); P(i, j) is the number of pixel points with sf(i, j) > 0 in the sub-depth cube, called the number of positive samples; Q(i, j) is the number of pixel points with sf(i, j) < 0 in the sub-depth cube, called the number of negative samples; and NZ(i, j) is the number of pixel points with a non-zero time motion response in the sub-depth cube, called the number of non-zero response samples.
In this embodiment, the human body action is recognized by calculating the time motion response of the obtained sub-depth cubes; the algorithm is simple and the processing speed is fast.
Fig. 8 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 8, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and operable on the processor 70, and when the processor 70 executes the computer program 72, the steps in the above-described method embodiments, such as steps S10 to S60 shown in fig. 1, are implemented. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the modules 10 to 60 shown in fig. 5.
Illustratively, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a first dividing module, a second dividing module, an obtaining module, an extracting module, a splicing module and an action identifying module, and the specific functions of the modules are as follows:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
and the motion identification module is used for classifying the feature descriptors through a linear support vector machine so as to identify the motion type of the target object.
The terminal device 7 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. The terminal device may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device 7 and does not constitute a limitation of the terminal device 7, which may include more or fewer components than those shown, may combine some components, or may have different components; for example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be suitably increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (8)

1. A motion recognition method, comprising:
dividing the depth image sequence of the target object in a time dimension according to a preset time step to obtain a depth cube in each time domain;
dividing the depth cube on a spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, wherein the number of image frames of the sub-depth cubes is the same as that of the depth cubes;
acquiring a time motion response map of the sub-depth cube;
extracting preset image characteristics of the time motion response image;
splicing preset image features of all sub-depth cubes corresponding to the depth cube to obtain feature vectors of the depth cube, and splicing the feature vectors of all depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
classifying the feature descriptors through a linear support vector machine to identify the action type of the target object;
the obtaining of the temporal motion response map of the sub-depth cube includes:
acquiring pixel values of non-noise pixel points in the sub-depth cube;
and acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values, and acquiring a time motion response graph of the sub-depth cube.
2. The motion recognition method of claim 1, wherein the obtaining pixel values of non-noise pixel points in the sub-depth cube comprises:
carrying out zero equalization processing on the pixel point signal of each pixel point in the sub-depth cube;
performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and performing time domain convolution operation on the symbol function pixel point signal to obtain a pixel value of a non-noise pixel point in the sub-depth cube.
3. The motion recognition method according to claim 1, wherein the obtaining, according to the pixel values, pixel points characterizing a motion of a target node from among the non-noise pixel points to obtain a time motion response map of the sub-depth cube comprises:
screening out pixel points which characterize the motion of a target node from the non-noise pixel points according to the time change characteristic of the pixel value;
and carrying out visualization processing on the pixel points representing the motion of the target node to obtain a time motion response graph of the sub-depth cube.
4. The motion recognition method according to claim 1, wherein the preset time step is a fixed time step, the fixed time steps are non-overlapping or partially overlapping, adjacent time domains are non-overlapping or partially overlapping, and the image feature is a histogram of oriented gradients feature.
5. A motion recognition system, comprising:
the first dividing module is used for dividing the depth image sequence of the target object in the time dimension according to a preset time step to obtain a depth cube in each time domain;
the second division module is used for dividing the depth cube on the spatial dimension according to a preset plane grid to obtain a plurality of sub-depth cubes with the same dimension, and the number of image frames of the sub-depth cubes is the same as that of the image frames of the depth cubes;
the acquisition module is used for acquiring a time motion response map of the sub-depth cube;
the extraction module is used for extracting preset image characteristics of the time motion response image;
the splicing module is used for splicing preset image features of all sub-depth cubes corresponding to the depth cubes to obtain feature vectors of the depth cubes, splicing the feature vectors of all the depth cubes corresponding to the depth image sequence to obtain feature descriptors of the depth image sequence;
the motion recognition module is used for classifying the feature descriptors through a linear support vector machine so as to recognize the motion type of the target object;
the acquisition module includes:
the pixel value acquisition unit is used for acquiring the pixel value of a non-noise pixel point in the sub-depth cube;
and the response map acquisition unit is used for acquiring pixel points which characterize the motion of the target node in the non-noise pixel points according to the pixel values to acquire the time motion response map of the sub-depth cube.
6. The motion recognition system according to claim 5, wherein the pixel value acquisition unit includes:
the averaging processing subunit is used for carrying out zero averaging processing on the pixel point signal of each pixel point in the sub-depth cube;
the function transformation subunit is used for performing sign function transformation on the pixel point signals subjected to zero equalization processing to obtain sign function pixel point signals of each pixel point in the sub-depth cube;
and the pixel value acquisition subunit is used for performing time domain convolution operation on the sign function pixel point signal to acquire the pixel value of the non-noise pixel point in the sub-depth cube.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201710901427.2A 2017-09-28 2017-09-28 Action identification method and system and terminal equipment Active CN107704819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710901427.2A CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710901427.2A CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Publications (2)

Publication Number Publication Date
CN107704819A CN107704819A (en) 2018-02-16
CN107704819B true CN107704819B (en) 2020-01-24

Family

ID=61175218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710901427.2A Active CN107704819B (en) 2017-09-28 2017-09-28 Action identification method and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN107704819B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608421A (en) * 2015-12-18 2016-05-25 中国科学院深圳先进技术研究院 Human movement recognition method and device
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device

Also Published As

Publication number Publication date
CN107704819A (en) 2018-02-16

Similar Documents

Publication Publication Date Title
Hashemi et al. Template matching advances and applications in image analysis
CN109165538B (en) Bar code detection method and device based on deep neural network
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN110991533B (en) Image recognition method, recognition device, terminal device and readable storage medium
US11145080B2 (en) Method and apparatus for three-dimensional object pose estimation, device and storage medium
CN111145209A (en) Medical image segmentation method, device, equipment and storage medium
KR101436369B1 (en) Apparatus and method for detecting multiple object using adaptive block partitioning
Richardson et al. Learning convolutional filters for interest point detection
CN109410246B (en) Visual tracking method and device based on correlation filtering
CN114049499A (en) Target object detection method, apparatus and storage medium for continuous contour
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
CN111145196A (en) Image segmentation method and device and server
CN114444565A (en) Image tampering detection method, terminal device and storage medium
CN113191189A (en) Face living body detection method, terminal device and computer readable storage medium
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN112418089A (en) Gesture recognition method and device and terminal
CN108229498B (en) Zipper piece identification method, device and equipment
CN111126248A (en) Method and device for identifying shielded vehicle
CN111488811A (en) Face recognition method and device, terminal equipment and computer readable medium
CN107704819B (en) Action identification method and system and terminal equipment
Bozkurt et al. Multi-scale directional-filtering-based method for follicular lymphoma grading
CN113705660A (en) Target identification method and related equipment
Salau An effective graph-cut segmentation approach for license plate detection
CN111382632A (en) Target detection method, terminal device and computer-readable storage medium
CN117456284B (en) Image classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant