CN112580442B - Behavior identification method based on multi-dimensional pyramid hierarchical model - Google Patents

Behavior identification method based on multi-dimensional pyramid hierarchical model

Info

Publication number
CN112580442B
CN112580442B (application number CN202011398484.1A)
Authority
CN
China
Prior art keywords
pyramid
action
features
scale
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011398484.1A
Other languages
Chinese (zh)
Other versions
CN112580442A (en
Inventor
黄倩
李畅
陈斯斯
李兴
毛莺池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Huiying Electronic Technology Co ltd
Hohai University HHU
Original Assignee
Nanjing Huiying Electronic Technology Co ltd
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Huiying Electronic Technology Co ltd, Hohai University HHU filed Critical Nanjing Huiying Electronic Technology Co ltd
Priority to CN202011398484.1A priority Critical patent/CN112580442B/en
Publication of CN112580442A publication Critical patent/CN112580442A/en
Application granted granted Critical
Publication of CN112580442B publication Critical patent/CN112580442B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135: Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F 18/24: Classification techniques
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/507: Summing image-intensity values; Histogram projection analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior identification method based on a multi-dimensional pyramid hierarchical model. A multi-dimensional pyramid hierarchical model covering the spatial and temporal dimensions is constructed to model the behaviors in a video and capture structured multi-scale features, and behavior identification is then carried out by a classifier. The invention fully describes behavior characteristics at different scales across multiple dimensions, provides additional, more discriminative information for behavior identification, and effectively improves the accuracy and robustness of behavior identification.

Description

Behavior identification method based on multi-dimensional pyramid hierarchical model
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a behavior recognition method.
Background
Behavior recognition is one of the important research topics in the field of computer vision, with broad application prospects in intelligent security monitoring, novel human-computer interaction, intelligent traffic management, smart cities, smart homes, and the like. Early behavior recognition techniques were based primarily on RGB data acquired by ordinary cameras, and are susceptible to external factors such as the shooting environment, lighting conditions, and clothing texture. With the growing demand for intelligent behavior analysis, a series of behavior recognition technologies based on depth data, skeletal data, and multi-modal fusion have emerged, driven by big data and machine learning algorithms.
To construct a behavior recognition model based on depth video data, an intuitive approach is to extend the feature descriptors commonly used for RGB image videos to depth videos. To this end, many effective motion feature coding techniques have been studied for describing the depth sequence of an action, such as the motion energy map (MEI), the motion history map (MHI), and the depth motion map (DMI). Skeleton-based methods represent human motion through dynamic three-dimensional skeleton sequence data and mainly mine the relative positions of the key skeleton points for recognition. More recently, behavior recognition methods based on multi-modal fusion data have attracted attention; these methods combine two or more kinds of data for behavior recognition, providing more complementary information for action description and improving recognition accuracy.
Although research on behavior recognition has made many advances, many problems remain. Behaviors contain information in different dimensions, such as space and time, and within each dimension they also contain rich multi-scale information. When the dimension and scale at which an observer views a behavior change, the way the action presents itself changes as well. In an unknown scene, however, computer vision cannot perceive scale changes the way the human eye does. Existing behavior recognition methods ignore the multi-scale information of actions, so they lack robustness and are difficult to apply in practical environments.
In summary, the main problem of the existing behavior identification method is that multi-scale motion features under different dimensions cannot be sufficiently extracted to identify similar behaviors. Therefore, designing a behavior model for describing different scale features under multiple dimensions and extracting structured multi-scale features from the behavior model is an urgent problem to be solved.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the invention provides a behavior identification method based on a multi-dimensional pyramid hierarchical model.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a behavior identification method based on a multi-dimensional pyramid hierarchical model comprises the following steps:
(1) constructing a multi-dimensional pyramid hierarchical model: projecting a depth video frame obtained by a depth camera onto a coordinate plane to obtain an action characteristic diagram, wherein the action characteristic diagram is used for representing a depth video sequence of each action sample, and generating a Gaussian pyramid as a space dimension pyramid through Gaussian low-pass filtering and downsampling; dividing the depth video sequence of each action sample into a plurality of partitions, and calculating an action characteristic map of each partition to construct a time dimension pyramid; the space dimension pyramid and the time dimension pyramid together form a multi-dimensional pyramid level model;
(2) extracting structured multi-scale features: firstly, sequentially extracting action features from bottom to top according to the hierarchical structure of a space dimension pyramid, then extracting the action features according to the hierarchical structure of a time dimension pyramid, and then cascading the action features extracted twice to generate space-time multi-scale action features;
(3) behavior recognition: and (3) inputting the multi-scale features extracted in the step (2) into a trained classifier or a neural network for classification to obtain a behavior recognition result.
Further, in the step (1), on the basis of generating the gaussian pyramid, a laplacian pyramid is further generated to enhance the multi-scale dynamic information as an optimized spatial dimension pyramid.
Further, in step (1), the process of constructing the spatial dimension pyramid is as follows:
(1a) projecting the depth video frames obtained by a depth camera onto three orthogonal Cartesian planes, taking the minimum value at each pixel position in the depth video sequence as the pixel value of the action feature map; each depth frame generates three action feature maps with different views, corresponding to a front view, a side view and a top view respectively; performing brightness normalization on the generated action feature maps and cropping a region of interest;
(1b) generating a Gaussian pyramid by performing Gaussian low-pass filtering and downsampling on the action characteristic diagram of each view angle;
(1c) and (3) obtaining a prediction pyramid by carrying out interpolation and Gaussian smoothing on each layer of the Gaussian pyramid, and correspondingly subtracting each layer of the prediction pyramid from the Gaussian pyramid to obtain the Laplacian pyramid.
Further, in step (1), the process of constructing the time-dimensional pyramid is as follows:
(1A) dividing the depth video sequence of each action sample into a plurality of partitions, wherein each partition comprises the same or a different number of frames; the divisions form different levels according to the dividing method: the undivided depth video sequence is regarded as level 0, a division into two partitions is regarded as level 1, and so on;
(1B) and respectively calculating the action characteristic graph of each partition as a time dimension pyramid so as to capture sub-actions of different time scales in the depth video sequence.
Further, the specific process of step (2) is as follows:
(2a) normalizing the action characteristic graphs under the same visual angle to be the same size;
(2b) cascading the action features extracted from the action feature maps of the same level in the spatial dimension pyramid to obtain the action features of the three views at that scale;
(2c) sequentially extracting action features of different levels from bottom to top according to the spatial dimension pyramid hierarchical structure and cascading them to generate action features of different scales;
(2d) extracting multi-scale time features according to the levels of the time dimension pyramid, firstly extracting action features of level 0, then sequentially extracting action features of other levels, and cascading the action features in each level;
(2e) and (3) cascading the action characteristics in the steps (2c) and (2d) into structured space-time multi-scale action characteristics, and carrying out normalization and dimension reduction treatment on the structured space-time multi-scale action characteristics.
Further, the motion feature adopts a direction gradient histogram, a local binary pattern or a scale-invariant feature transform.
Further, the action feature map adopts a depth motion map, a motion energy map or a motion history map.
Further, the number of layers of the space dimension pyramid and the time dimension pyramid is determined according to the computing resources and the storage resources, and the CPU utilization rate, the memory occupancy rate, the video card performance and the GPU video memory utilization rate are used as evaluation indexes for measuring the computing resources and the storage resources.
Further, a four-layer space dimension pyramid and a two-layer time dimension pyramid are adopted; in addition, a higher-level pyramid is used when the CPU utilization rate, the memory occupancy rate and the GPU video memory utilization rate are all lower than 30% and the video card performance is better than in the standard state, and a lower-level pyramid is used when these rates are higher than 70% and the video card performance is worse than in the standard state.
Further, in the step (3), the multi-scale features extracted in the step (2) are divided into a training set and a testing set, the classifier is initialized randomly at first, parameters in the classifier are trained according to cross entropy loss by using action samples of the training set, and then the testing set is input into the trained classifier to obtain a final behavior recognition result; the classifiers include, but are not limited to, extreme learning machines, support vector machines, and random forest classifiers.
The above technical scheme brings the following beneficial effects:
1. the invention provides a multi-dimensional pyramid hierarchical model, which is a modeling method for describing structured multi-scale features of an identified object in different dimensions. Firstly, the model can realize dynamic compression and expansion of dimensions and layers to meet the requirements of different application fields, and therefore, the model has wider applicability. Secondly, in each dimension, the model can increase the feature types by expanding the number of the child nodes, and the model can adjust the scale diversity of the features by setting the number of layers of the pyramid in the same dimension, so that the features of the identification object can be more fully mined and described. In addition, the model integrally presents a tree-shaped hierarchical structure, and structured multi-scale features can be effectively extracted.
2. The behavior identification method based on the multi-dimensional pyramid hierarchical model fully extracts the structured multi-scale action features in the space and time dimensions, captures more discriminative space-time information, has an important effect on solving the identification problem of similar behaviors and opposite behaviors, and improves the accuracy and robustness of behavior identification.
Drawings
FIG. 1 is a general framework schematic of the present invention;
FIG. 2 is a schematic diagram of a depth profile DMI in an embodiment;
FIG. 3 is a block diagram of an embodiment of a time dimension pyramid;
FIG. 4 is a diagram of a multi-dimensional pyramid hierarchy model in an embodiment.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
As shown in fig. 1, a depth motion map is used to represent a depth sequence of motion, and a gaussian pyramid (laplacian pyramid) is constructed as a spatial dimension pyramid to capture more discriminative spatial multi-scale motion information. Then, different levels of feature maps are generated as a time-dimensional pyramid by dividing the video sequence into different segments to capture the time-multiscale information of the motion. And calculating the action characteristics of the multi-dimensional pyramid hierarchical model, cascading to obtain the structured multi-scale action characteristics, and inputting into a classifier for behavior recognition. In addition, pyramids with other dimensions can be constructed to jointly form a multi-dimensional pyramid level model.
The multidimensional pyramid hierarchical model provides a modeling method for describing structured multi-scale features of an identified object in different dimensions, and dynamic compression and expansion of dimensions and layer numbers can be realized. The model integrally presents a tree-shaped hierarchical structure, and structured multi-scale features can be effectively extracted. The dimensionality comprises time, space and the like as parent nodes, and the video sequence can be divided according to the time sequence in the time dimensionality to further extract the characteristics of the whole and the local parts and the like as child nodes. Correspondingly, in the spatial dimension, the characteristics of static and dynamic states and the like can be further extracted as child nodes. In each dimension, the model can increase the types of the features by expanding the number of the child nodes, and the model can also set the layer number of the pyramid to adjust the scale diversity of the features in the same dimension. The multi-dimensional pyramid hierarchical model provided by the invention can effectively capture the multi-scale characteristics of the target under different dimensions and is suitable for various identification tasks.
The invention is further described below with reference to a specific embodiment.
1. Generating a spatial dimension pyramid
The depth frames obtained by the depth camera are projected onto three orthogonal Cartesian planes, so that each 3D depth frame generates three 2D action maps, denoted map_v (v ∈ {f, s, t}), corresponding to the front view, side view, and top view respectively, as shown in FIG. 2. The DMI takes the minimum value at each pixel position over the depth map sequence as the pixel value of the feature map. For a depth sequence of N frames it is calculated as:

DMI_v(i, j) = 255 - min_{1 ≤ t ≤ N} map_v(i, j, t)

where map_v(i, j, t) is the pixel value at position (i, j) in the action map of the t-th frame under view v. The resulting image may be intensity-normalized by dividing each pixel value by the maximum over all pixels in the image. In addition, excess black pixels can be excluded by cropping the region of interest of the DMI. This further normalization reduces intra-class differences and lessens the interference of body shape and action amplitude on action recognition.
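The DMI computation above can be sketched in a few lines. This is an illustrative reimplementation, not the patented code; it assumes the per-view projection map_v has already been rendered into an (N, H, W) array of depth values in [0, 255], and the function name is hypothetical.

```python
import numpy as np

def compute_dmi(depth_frames):
    """Depth motion image (DMI): per-pixel minimum over the sequence,
    inverted so that close/moving regions appear bright.

    depth_frames: array of shape (N, H, W) with values in [0, 255].
    """
    min_map = depth_frames.min(axis=0)      # min over the N frames at each pixel
    dmi = 255.0 - min_map                   # DMI(i, j) = 255 - min_t map(i, j, t)
    return dmi / max(dmi.max(), 1e-8)       # intensity-normalize to [0, 1]
```

Cropping the region of interest (to discard the excess black pixels) would follow as a separate step on the returned map.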
Gaussian pyramid decomposition is then performed on the DMI to generate a family of feature images at different scales, simulating the scale changes of human observation of an action. A structured multi-scale DMI image set is obtained by Gaussian filtering and downsampling, and the layers of the spatial pyramid are numbered from bottom to top, as shown in FIG. 3. Let G_l denote the l-th layer of the Gaussian pyramid; the image at layer G_{l+1} is smaller in scale than that at layer G_l. To obtain the pyramid image of layer G_{l+1}, the image of layer G_l is convolved with a Gaussian kernel and downsampled. In general, the gray value at coordinate (i, j) of the l-th layer image is:

G_l(i, j) = sum_{m=-c}^{c} sum_{n=-c}^{c} w(m, n) G_{l-1}(2i + m, 2j + n),   1 ≤ l ≤ L, 0 ≤ i < R_l, 0 ≤ j < C_l

where L is the number of layers of the Gaussian pyramid, and R_l and C_l are the number of rows and columns of the l-th layer feature map. w(m, n) is a Gaussian window of size (2c+1) × (2c+1), which can be expressed as:

w(m, n) = (1 / (2πσ²)) exp(-(m² + n²) / (2σ²))

where m and n index the rows and columns of the window, and σ, called the scale-space factor, is the standard deviation of the Gaussian distribution and reflects how strongly the image is blurred. The original feature map G_1 serves as the lowest layer of the Gaussian pyramid, and G_2, G_3, ..., G_L are computed in turn by the formula above, forming an L-level Gaussian pyramid. The series of images {I_1, I_2, I_3, ..., I_L} generated by the Gaussian convolution and downsampling operations forms the Gaussian pyramid of the DMI, taken as the spatial dimension pyramid and denoted GP-DMI. The pyramid algorithm reduces the filter bandwidth between levels by an octave and reduces the sampling interval by the same factor. The number of downsamplings is related to the size of the original image: for an image of size M × N, the maximum number of layers of the Gaussian pyramid is L_max = log₂ min(M, N).
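The REDUCE step above (Gaussian window w(m, n), then dropping every other row and column) can be sketched in plain numpy. The window size c = 2, σ = 1.0, and the edge padding are illustrative choices not fixed by the text.

```python
import numpy as np

def gaussian_kernel(c=2, sigma=1.0):
    # (2c+1) x (2c+1) Gaussian window w(m, n), normalized to sum to 1
    ax = np.arange(-c, c + 1)
    m, n = np.meshgrid(ax, ax, indexing="ij")
    w = np.exp(-(m**2 + n**2) / (2 * sigma**2))
    return w / w.sum()

def reduce_once(img, w):
    # Gaussian low-pass filter, then keep every other row and column
    c = w.shape[0] // 2
    padded = np.pad(img, c, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + 2*c + 1, j:j + 2*c + 1] * w).sum()
    return out[::2, ::2]

def gaussian_pyramid(img, levels):
    # G_1 is the original map; each further level is REDUCE of the previous
    pyr = [img.astype(float)]
    w = gaussian_kernel()
    for _ in range(levels - 1):
        pyr.append(reduce_once(pyr[-1], w))
    return pyr
```

Each halving of the side length corresponds to one octave of bandwidth reduction, matching the L_max = log₂ min(M, N) bound above.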
Gaussian pyramid decomposition inevitably causes the number of action maps to grow multiplicatively, and the redundant static information generated reduces the accuracy of behavior recognition. To address this, a Laplacian pyramid is further generated to obtain a more compact and more discriminative multi-scale action feature map and reduce the interference of redundant static information on behavior recognition. The l-th layer feature map G_l of the Gaussian pyramid is interpolated (inserting zeros into the even rows and columns) and then filtered with the Gaussian kernel, yielding a feature map G_l* of the same size as the layer below:

G_l*(i, j) = 4 sum_{m=-c}^{c} sum_{n=-c}^{c} w(m, n) G_l((i + m)/2, (j + n)/2)

where only the terms for which (i + m)/2 and (j + n)/2 are integers are included. The generation of the Laplacian pyramid can then be expressed as:

LP_l = G_l - G_{l+1}*,  1 ≤ l < L;   LP_L = G_L

where L is the top layer number of the Laplacian pyramid and LP_l is the l-th layer image of the Laplacian decomposition. It should be noted that, in order to preserve the integrity of the motion information in the feature maps, the top-level image of the Gaussian pyramid is directly taken as the top level of the Laplacian pyramid. The Laplacian pyramid is the optimized spatial dimension pyramid and is denoted LP-DMI. FIG. 4 illustrates the spatial dimension pyramid generated for an example action sample.
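The interpolate-and-subtract construction can be sketched as follows, assuming a Gaussian pyramid whose layers were produced by 2x decimation; the ×4 gain compensates for the zeros inserted during upsampling. A toy decimated pyramid stands in for GP-DMI here, and the helper names are hypothetical.

```python
import numpy as np

def _smooth(img, c=2, sigma=1.0, gain=1.0):
    # Gaussian filtering with edge padding (window and sigma are illustrative)
    ax = np.arange(-c, c + 1)
    m, n = np.meshgrid(ax, ax, indexing="ij")
    w = np.exp(-(m**2 + n**2) / (2 * sigma**2))
    w /= w.sum()
    padded = np.pad(img, c, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = gain * (padded[i:i + 2*c + 1, j:j + 2*c + 1] * w).sum()
    return out

def expand(img, out_shape):
    """Interpolation step: insert zeros at odd rows/cols, then smooth (x4 gain)."""
    up = np.zeros(out_shape)
    up[::2, ::2] = img
    return _smooth(up, gain=4.0)

def laplacian_pyramid(gauss_pyr):
    """LP_l = G_l - G*_{l+1}; the top Gaussian level is kept unchanged."""
    lap = [g - expand(g_next, g.shape)
           for g, g_next in zip(gauss_pyr[:-1], gauss_pyr[1:])]
    lap.append(gauss_pyr[-1].astype(float))
    return lap
```

By construction, adding the expanded next level back to LP_l recovers G_l exactly, so no motion information is lost by the decomposition.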
2. Generating a time dimension pyramid
The video sequence is first divided into a number of partitions each containing an equal number of frames, with different divisions forming different levels: the undivided sequence is regarded as level 0, an even division into two partitions as level 1, and so on. The action feature map DMI of each partition is then calculated by the method of step 1 to capture the sub-actions at different time scales in the video sequence; the result, denoted HP-DMI, serves as the time dimension pyramid. The structure of the generated time dimension pyramid is shown in fig. 4. In addition, pyramids in other dimensions can be constructed to jointly form the multi-dimensional pyramid hierarchical model.
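The partitioning rule (level 0 = whole sequence, level 1 = two partitions, and so on) reduces to index arithmetic; computing a DMI per returned (start, end) range then yields the HP-DMI. The function name and the near-equal-split policy are illustrative.

```python
import numpy as np

def temporal_partitions(num_frames, levels):
    """Level k splits the frame range into k+1 near-equal partitions.
    Returns {level: [(start, end), ...]} with end exclusive."""
    pyramid = {}
    for k in range(levels):
        parts = k + 1
        bounds = np.linspace(0, num_frames, parts + 1).astype(int)
        pyramid[k] = [(int(bounds[p]), int(bounds[p + 1])) for p in range(parts)]
    return pyramid
```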
In a specific implementation, the number of layers of the multi-dimensional pyramid hierarchical model is selected according to computing resources, storage resources, and the practical application requirements. The CPU utilization rate, memory occupancy rate, video card performance, GPU video memory utilization rate, and the like are used as evaluation indexes for measuring computing resources. A cmd window is first opened and the nvidia-smi command is used to check the use of computing resources; the performance of the current computer is then evaluated against these indexes to determine the number of layers of each dimension pyramid. When the CPU utilization rate, memory occupancy rate and GPU video memory utilization rate are all below 30% and the video card performance is better than P2, a higher-level pyramid is recommended; when they are above 70% and the video card performance is worse than P8, a lower-level pyramid is recommended. In other cases, a four-layer space dimension pyramid and a two-layer time dimension pyramid are recommended, and the setting can be adjusted according to the actual application scene.
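The rule of thumb above can be captured in a small helper. The deeper/shallower layer counts (5/3 and 3/1) are assumptions, since the text only fixes the default of four spatial and two temporal layers, and the boolean `gpu_ok` stands in for the P2/P8 video-card comparison.

```python
def choose_pyramid_levels(cpu_util, mem_util, gpu_mem_util, gpu_ok):
    """Pick pyramid depths from resource utilization rates in [0, 1]."""
    utils = (cpu_util, mem_util, gpu_mem_util)
    if all(u < 0.30 for u in utils) and gpu_ok:
        return {"spatial": 5, "temporal": 3}   # assumed deeper setting
    if all(u > 0.70 for u in utils) and not gpu_ok:
        return {"spatial": 3, "temporal": 1}   # assumed shallower setting
    return {"spatial": 4, "temporal": 2}       # default stated in the text
```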
3. Extracting structured multi-scale features
The HOG describes the distribution of gradient and edge information in a local image; it captures gradient changes well and enhances the contour information of the image, so multi-scale HOG features are selected and extracted for action classification. Alternatively, features such as the local binary pattern (LBP) and the scale-invariant feature transform (SIFT) can be chosen. First, the feature maps of different sizes under the same view are normalized to the same size by replicating adjacent pixels, which avoids over-small images caused by too many decomposition layers. The HOGs extracted from LP-DMIs at the same level of the spatial dimension pyramid are cascaded to obtain the action features of the three views at that scale. The action features of the different levels are then extracted sequentially from bottom to top according to the hierarchical structure of the LP-DMI and cascaded to generate action features of different scales, i.e., the spatial multi-scale features. Multi-scale temporal features are extracted by HP-DMI level: the features of the level-0 HP-DMI are extracted first, followed by those of the other levels in turn. The N-th level comprises N feature maps, and all sub-action features within each level are extracted sequentially and then cascaded. The features extracted from the multi-dimensional pyramid hierarchical model are concatenated into structured multi-scale features, on which further feature processing can be performed: the action features are first normalized by the max-min method, and their dimensionality is then reduced by the principal component analysis (PCA) algorithm. Other normalization and dimension reduction methods may also be used.
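The cascade-normalize-reduce pipeline can be sketched with numpy. The per-view descriptors here are placeholders for real HOG vectors, and the SVD-based PCA is one standard realization of the dimension-reduction step; all function names are illustrative.

```python
import numpy as np

def concat_features(per_level_views):
    """per_level_views: list over pyramid levels; each entry is a list of
    per-view descriptor vectors (e.g., HOG from front/side/top DMIs).
    Cascades within each level, then across levels."""
    return np.concatenate([np.concatenate(views) for views in per_level_views])

def minmax_normalize(X):
    # max-min normalization, per feature column of the (samples, dims) matrix
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi - lo > 0, hi - lo, 1.0)

def pca_reduce(X, k):
    # project centered data onto the top-k principal directions via SVD
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```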
4. Behavior recognition
The processed behavior features are divided into a training set and a test set. The ELM (extreme learning machine) is first randomly initialized, its parameters are trained with the action samples of the training set according to the cross-entropy loss, and the prediction results on the action samples of the test set are then taken as the final recognition results and used to evaluate the effectiveness of the method. The ELM classifier may also be replaced by a support vector machine, a random forest or another classifier, or by other deep networks.
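A minimal ELM sketch in numpy: random hidden weights are frozen and the output weights get a closed-form ridge solution. Note this is the classical ELM recipe rather than the cross-entropy training described above, and all sizes, seeds, and names are illustrative.

```python
import numpy as np

class SimpleELM:
    """Extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # random input weights and biases, never updated (the "random init")
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden activations
        T = np.eye(n_classes)[y]                  # one-hot targets
        # ridge-regularized least squares for the output weights
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)
```

Swapping in a support vector machine or random forest only changes this last stage; the structured multi-scale features feed in unchanged.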
The embodiments are only for illustrating the technical idea of the present invention, and the technical idea of the present invention is not limited thereto, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the scope of the present invention.

Claims (8)

1. A behavior identification method based on a multi-dimensional pyramid hierarchical model is characterized by comprising the following steps:
(1) constructing a multi-dimensional pyramid hierarchical model: projecting a depth video frame obtained by a depth camera onto a coordinate plane to obtain an action characteristic diagram, wherein the action characteristic diagram is used for representing a depth video sequence of each action sample, and then generating a Gaussian pyramid as a spatial dimension pyramid through Gaussian low-pass filtering and downsampling operation; dividing the depth video sequence of each action sample into a plurality of partitions, and calculating an action characteristic map of each partition to construct a time dimension pyramid; the space dimension pyramid and the time dimension pyramid together form a multi-dimensional pyramid level model; on the basis of generating the Gaussian pyramid, further generating a Laplacian pyramid to enhance multi-scale dynamic information to serve as an optimized space dimension pyramid; the process of constructing the spatial dimension pyramid is as follows:
(1a) projecting the depth video frames obtained by a depth camera onto three orthogonal Cartesian planes, taking the minimum value at each pixel position in the depth video sequence as the pixel value of the action feature map; each depth frame generates three action feature maps with different views, corresponding to a front view, a side view and a top view respectively; performing brightness normalization on the generated action feature maps and cropping a region of interest;
(1b) gaussian low-pass filtering and down-sampling operation are carried out on the action characteristic diagram of each view angle to generate a Gaussian pyramid;
(1c) obtaining a prediction pyramid by carrying out interpolation and Gaussian smoothing on each layer of the Gaussian pyramid, and correspondingly subtracting each layer of the prediction pyramid from the Gaussian pyramid to obtain a Laplacian pyramid;
(2) extracting structured multi-scale features: firstly, sequentially extracting action features from bottom to top according to the hierarchical structure of a space dimension pyramid, then extracting the action features according to the hierarchical structure of a time dimension pyramid, and then cascading the action features extracted twice to generate space-time multi-scale action features;
(3) behavior recognition: inputting the multi-scale features extracted in the step (2) into a trained classifier or a neural network for classification to obtain a behavior recognition result.
2. The behavior recognition method based on the multi-dimensional pyramid hierarchy model of claim 1, wherein in the step (1), the process of constructing the time-dimensional pyramid is as follows:
(1A) dividing the depth video sequence of each action sample into a plurality of partitions, wherein each partition comprises the same or a different number of frames; the divisions form different levels according to the dividing method: the undivided depth video sequence is regarded as level 0, a division into two partitions is regarded as level 1, and so on;
(1B) computing the action feature map of each partition to form the time-dimension pyramid, so as to capture sub-actions at different time scales in the depth video sequence.
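The level scheme of claim 2 (level l splits the sequence into l+1-style finer partitions; this sketch assumes the common dyadic variant with 2^l partitions per level) might look like the following. The per-partition map here is a simple sum of absolute inter-frame differences, a stand-in for the depth motion map / motion energy map named in claim 5:

```python
import numpy as np

def temporal_pyramid_maps(frames, levels=2):
    """frames: array of shape (T, H, W). For each pyramid level l, split the
    sequence into 2**l partitions (level 0 = the whole, un-partitioned sequence)
    and compute one motion feature map per partition."""
    T = len(frames)
    maps = []
    for level in range(levels):
        n_parts = 2 ** level
        bounds = np.linspace(0, T, n_parts + 1).astype(int)  # partition boundaries
        for s, e in zip(bounds[:-1], bounds[1:]):
            seg = frames[s:e]
            # DMM-style stand-in: accumulate absolute inter-frame motion
            maps.append(np.abs(np.diff(seg, axis=0)).sum(axis=0))
    return maps
```

A two-level pyramid over a 16-frame clip thus yields three maps: one for the whole sequence (level 0) and one for each half (level 1), capturing sub-actions at two time scales.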
3. The behavior recognition method based on the multi-dimensional pyramid hierarchy model as claimed in claim 1, wherein the specific process of step (2) is as follows:
(2a) normalizing the action feature maps under the same view angle to the same size;
(2b) cascading the action features extracted from the action feature maps at the same level of the spatial-dimension pyramid to obtain the action features of the three view angles at that scale;
(2c) sequentially extracting action features at different levels from bottom to top according to the spatial-dimension pyramid hierarchy and cascading them to generate action features at different scales;
(2d) extracting multi-scale temporal features according to the levels of the time-dimension pyramid: first extracting the action features of level 0, then sequentially extracting those of the other levels, cascading the action features within each level;
(2e) cascading the action features of steps (2c) and (2d) into structured spatio-temporal multi-scale action features, then normalizing them and reducing their dimensionality.
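Steps (2a)-(2e) amount to a fixed concatenation order over per-map descriptors. In this sketch a plain intensity histogram stands in for the HOG/LBP/SIFT descriptors of claim 4, and the view names are illustrative:

```python
import numpy as np

def map_descriptor(fmap, bins=8):
    """Toy per-map descriptor (normalized intensity histogram) standing in
    for the HOG / LBP / SIFT features of claim 4."""
    h, _ = np.histogram(fmap, bins=bins, range=(0.0, 1.0))
    return h.astype(float) / max(h.sum(), 1)

def spatiotemporal_feature(spatial_pyr_views, temporal_maps):
    """spatial_pyr_views: {view: [level0_map, level1_map, ...]} for the three
    projection views; temporal_maps: per-partition maps of the time pyramid."""
    views = sorted(spatial_pyr_views)              # e.g. front, side, top
    n_levels = len(spatial_pyr_views[views[0]])
    spatial = np.concatenate([
        np.concatenate([map_descriptor(spatial_pyr_views[v][lv])
                        for v in views])           # (2b) cascade the three views
        for lv in range(n_levels)                  # (2c) bottom-up over scales
    ])
    temporal = np.concatenate([map_descriptor(m) for m in temporal_maps])  # (2d)
    feat = np.concatenate([spatial, temporal])     # (2e) spatio-temporal cascade
    return feat / (np.linalg.norm(feat) + 1e-12)   # (2e) L2 normalization
```

Dimensionality reduction (e.g. PCA) would follow the normalization; it is omitted here since the claims do not specify the method.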
4. The behavior recognition method based on the multi-dimensional pyramid hierarchy model according to claim 3, wherein the action features are extracted by a histogram of oriented gradients, a local binary pattern, or a scale-invariant feature transform.
5. The behavior recognition method based on the multi-dimensional pyramid hierarchy model according to claim 1, wherein the action feature map is a depth motion map, a motion energy map or a motion history map.
6. The behavior recognition method based on the multi-dimensional pyramid hierarchy model according to claim 1, wherein the numbers of layers of the spatial-dimension and time-dimension pyramids are determined according to the available computing and storage resources, using the CPU utilization, memory occupancy, graphics card performance, and GPU memory utilization as evaluation indexes for measuring those resources.
7. The behavior recognition method based on the multi-dimensional pyramid hierarchy model of claim 6, characterized in that a four-level spatial-dimension pyramid and a two-level time-dimension pyramid are adopted; in addition, a higher-level pyramid is used when the CPU utilization, memory occupancy and GPU memory utilization are all below 30% and the graphics card outperforms the standard state, and a lower-level pyramid is used when they are all above 70% and the graphics card underperforms the standard state.
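The resource-aware selection of claims 6-7 is a pure decision rule over the measured utilizations. The thresholds (30% / 70%) and default depths (four spatial, two temporal levels) come from claim 7; the one-level adjustment and the fractional encoding of utilization are assumptions of this sketch:

```python
def choose_pyramid_levels(cpu, mem, gpu_mem, gpu_better_than_standard,
                          base_spatial=4, base_temporal=2):
    """cpu, mem, gpu_mem: utilizations in [0, 1]. Returns
    (spatial_levels, temporal_levels) per the rule in claim 7."""
    if max(cpu, mem, gpu_mem) < 0.30 and gpu_better_than_standard:
        return base_spatial + 1, base_temporal + 1   # resources plentiful: go deeper
    if min(cpu, mem, gpu_mem) > 0.70 and not gpu_better_than_standard:
        return base_spatial - 1, base_temporal - 1   # resources scarce: go shallower
    return base_spatial, base_temporal               # default: 4 spatial, 2 temporal
```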
8. The behavior recognition method based on the multi-dimensional pyramid hierarchy model according to claim 1, wherein in step (3) the multi-scale features extracted in step (2) are divided into a training set and a test set; the classifier is first initialized randomly, its parameters are trained on the action samples of the training set according to a cross-entropy loss, and the test set is then input into the trained classifier to obtain the final behavior recognition result; the classifiers include, but are not limited to, extreme learning machines, support vector machines, and random forest classifiers.
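The train/test protocol of claim 8, with a support vector machine as the classifier, can be sketched with scikit-learn as below. Note this is illustrative only: the synthetic features merely stand in for the multi-scale features of step (2), and an SVM is trained with a hinge-type loss rather than the cross-entropy loss the claim mentions for other classifiers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated synthetic "action" classes standing in for real
# 72-dimensional multi-scale features from step (2).
X = np.vstack([rng.normal(0.0, 0.3, (40, 72)),
               rng.normal(3.0, 0.3, (40, 72))])
y = np.array([0] * 40 + [1] * 40)

# Divide the features into a training set and a test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="rbf")       # claim 8 also allows ELM or random forest
clf.fit(X_tr, y_tr)           # train the classifier on the training set
acc = clf.score(X_te, y_te)   # recognition accuracy on the test set
```

`clf.predict` on unseen feature vectors then yields the final behavior label, completing step (3).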
CN202011398484.1A 2020-12-02 2020-12-02 Behavior identification method based on multi-dimensional pyramid hierarchical model Active CN112580442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011398484.1A CN112580442B (en) 2020-12-02 2020-12-02 Behavior identification method based on multi-dimensional pyramid hierarchical model


Publications (2)

Publication Number Publication Date
CN112580442A CN112580442A (en) 2021-03-30
CN112580442B true CN112580442B (en) 2022-08-09

Family

ID=75127163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011398484.1A Active CN112580442B (en) 2020-12-02 2020-12-02 Behavior identification method based on multi-dimensional pyramid hierarchical model

Country Status (1)

Country Link
CN (1) CN112580442B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469056A (en) * 2021-07-02 2021-10-01 上海商汤智能科技有限公司 Behavior recognition method and device, electronic equipment and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473530B (en) * 2013-08-30 2016-06-15 天津理工大学 Self adaptation action identification method based on multi views and multi-modal feature
CN110197116B (en) * 2019-04-15 2023-05-23 深圳大学 Human behavior recognition method, device and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant