CN113486734B - Gait recognition method, system, equipment and storage medium - Google Patents

Gait recognition method, system, equipment and storage medium

Info

Publication number
CN113486734B
CN113486734B (granted from application CN202110678960.3A)
Authority
CN
China
Prior art keywords
feature
slice
gait
pooling
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110678960.3A
Other languages
Chinese (zh)
Other versions
CN113486734A (en)
Inventor
利华康
赵慧民
邱怡丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority application: CN202110678960.3A
Publication of CN113486734A
Application granted
Publication of CN113486734B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

The application relates to the technical field of computer vision, and in particular to a gait recognition method, system, equipment and storage medium based on spatio-temporal slice features. Features from adjacent body parts are associated from top to bottom by a slice extraction device, and a residual frame attention mechanism learns a weight for each frame of the gait contour sequence and applies it to that frame, so that the network pays more attention to frames with a high contribution. The method combines the residual frame attention mechanism with the slice features in parallel and can flexibly screen the features of each group of adjacent body parts, so that it achieves higher gait recognition accuracy under cross-view and complex conditions; in addition, the application can accept any number of input video frames and has the advantages of a simple and flexible model, a wide application range and good real-time performance.

Description

Gait recognition method, system, equipment and storage medium
Technical Field
The application relates to the technical field of computer vision, and in particular to a gait recognition method, system, equipment and storage medium based on spatio-temporal slice features.
Background
Gait recognition has been a hot topic in the field of computer vision in recent years. Gait is a biometric feature and, because it can be recognized at long range, has wide application in many fields. Gait recognition can identify a subject at a distance without the subject's cooperation and without intrusion, whereas other common biometrics such as the face, fingerprints and the iris are limited by distance and require the subject's cooperation; compared with traditional recognition methods such as face recognition and pupil recognition, gait recognition has therefore attracted more and more attention and has broad application prospects in crime prevention, forensic identification, criminal investigation and other areas.
Although the accuracy of gait recognition for human target recognition exceeds 95% under normal clothing and standard viewing angles, and even reaches 100% on individual datasets, its accuracy remains low under viewing-angle changes, abnormal environments, clothing changes and similar conditions, which are ubiquitous in real scenes, so the existing technology greatly limits the application range of gait recognition. In addition, during walking the features of the different body parts are correlated with one another from top to bottom, and each frame of a gait video contains a different amount of information; however, existing gait recognition methods generally ignore both phenomena, analysing the features of each body part separately and extracting the features of every frame of the gait video with equal probability, so that recognition accuracy under cross-view and complex conditions is low.
Disclosure of Invention
The application provides a gait recognition method, system, equipment and storage medium based on spatio-temporal slice features, which solve the technical problem that existing gait recognition methods analyse the features of each body part independently and extract the features of every frame of a gait video with equal probability, resulting in low recognition accuracy under cross-view and complex conditions.
In order to solve the technical problems, the application provides a gait recognition method, a gait recognition system, gait recognition equipment and a storage medium.
In a first aspect, the present application provides a gait recognition method, the method comprising the steps of:
inputting the preprocessed gait contour sequence into a spatial feature extractor to obtain gait spatial features;
processing the gait space features by using a space horizontal pooling device to obtain gait space pooling features;
inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature;
pooling the first slice feature and inputting the pooled first slice feature into a multi-scale channel attention mechanism to obtain a second slice feature;
inputting the second slice feature into a residual frame attention mechanism to obtain a third slice feature;
inputting the spliced third slice characteristic into a time width aggregator to obtain a refined characteristic;
and inputting the refined features into a full connection layer to obtain feature descriptors for gait recognition.
In a further embodiment, the step of inputting the preprocessed gait contour sequence into the spatial feature extractor to obtain the gait spatial features comprises:
preprocessing the acquired gait contour sequence, and inputting the preprocessed gait contour sequence into each convolution layer of a spatial feature extractor to obtain spatial features output by each convolution layer;
and connecting the spatial features output by the preset convolution layer in the channel dimension to obtain gait spatial features.
In a further embodiment, the spatial feature extractor includes a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer, and a sixth convolution layer;
wherein the first convolution layer comprises 32 filters of size 5×5, with a step size of 2, which are activated by a first activation function;
the second convolution layer comprises 32 filters with the size of 3×3, the step size is 1, and the second convolution layer is activated by using a second activation function and performs 2×2 maximum pooling;
the third convolution layer comprises 64 filters of size 3×3, with a step size of 1, which are activated with a third activation function;
the fourth convolution layer comprises 64 filters of 3×3 size, with a step size of 1, activated with a fourth activation function and 2×2 max pooling;
the fifth convolution layer comprises 128 filters of size 3 x 3, with a step size of 1, activated with a fifth activation function;
the sixth convolution layer comprises 128 filters of size 3×3, with a step size of 1, which are activated using a sixth activation function.
In a further embodiment, the step of processing the gait spatial features with a spatial horizontal pooler to obtain gait spatial pooling features comprises:
mapping the gait space features into horizontal stripes to obtain space horizontal features;
and respectively carrying out global maximum pooling and global average pooling on the space horizontal features, and combining pooling results to obtain gait space pooling features.
In a further embodiment, the step of inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature comprises:
based on the preset association feature quantity and association degree, horizontally dividing the gait space pooling feature in a space dimension to obtain a plurality of slices;
and concentrating the slices with the preset number of slices in the space dimension based on the maximum value statistical function and the average value statistical function to obtain a first slice characteristic.
In a further embodiment, the number of associated features is greater than the degree of association.
In a further embodiment, the step of inputting the second slice feature into a residual frame attention mechanism, resulting in a third slice feature comprises:
carrying out channel compression on the second slice characteristic by utilizing global maximum pooling and global average pooling to obtain a slice compression characteristic;
inputting the slice compression characteristics into a frame attention network to obtain frame attention weights;
residual weighting is carried out on the frame attention weight and the second slice feature, so that a third slice feature is obtained;
wherein the frame attention network comprises a convolution layer with a channel number of 1.
In a second aspect, the present application provides a gait recognition system, the system comprising:
the feature extraction module is used for inputting the preprocessed gait contour sequence into the space feature extractor to obtain gait space features, and processing the gait space features by using the space horizontal pooling device to obtain gait space pooling features;
the feature processing module is used for inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature, and inputting the pooled first slice feature into a multi-scale channel attention mechanism to obtain a second slice feature;
the key frame analysis module is used for inputting the second slice characteristic into a residual frame attention mechanism to obtain a third slice characteristic;
and the feature identification module is used for inputting the third slice feature into a time width aggregator after being spliced to obtain a refined feature, and inputting the refined feature into the full-connection layer to obtain a feature descriptor for gait identification.
In a third aspect, the present application also provides a computer device comprising a processor and a memory, the processor being connected to the memory, the memory being used to store a computer program, and the processor being used to execute the computer program stored in the memory, so that the computer device performs the steps of the above method.
In a fourth aspect, the present application also provides a computer readable storage medium having stored therein a computer program which when executed by a processor performs the steps of the above method.
The application provides a gait recognition method, system, equipment and storage medium that realise an end-to-end gait recognition scheme based on spatio-temporal slice features. Compared with the prior art, the method uses deep learning to extract features from each frame of the gait contour sequence and refines them in space and in time through the slice extraction device and the residual frame attention mechanism, so that the features are correlated with one another and frames with a high contribution receive more attention, achieving a better recognition effect.
Drawings
FIG. 1 is a schematic flow chart of a gait recognition method according to an embodiment of the application;
FIG. 2 is a simplified schematic diagram of a gait recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of correlation of human body neighboring features provided by an embodiment of the present application;
fig. 4 is a schematic flow chart of a slice extraction device according to an embodiment of the present application;
FIG. 5 is a schematic view of slice association degrees provided by an embodiment of the present application;
FIG. 6 is a flow chart of a residual frame attention mechanism provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a gait recognition system according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the application; the drawings are included for reference and description only and likewise do not limit the scope of the application, since many variations are possible without departing from its spirit and scope.
To address the problems that existing gait recognition methods both treat the features of each body part separately and extract the features of every frame of a gait video with equal probability, resulting in low recognition accuracy under cross-view and complex conditions, the embodiments of the application provide a gait recognition method, system, equipment and storage medium. Referring to fig. 1, fig. 1 is a flow diagram of the gait recognition method provided by an embodiment of the application; the method comprises the following steps:
s1, inputting the preprocessed gait contour sequence into a spatial feature extractor to obtain gait spatial features.
In one embodiment, before the acquired gait contour sequence is input into the spatial feature extractor, each frame of the gait contour map is preprocessed to further improve the accuracy of gait recognition. Specifically:
Each frame of the gait contour map in the gait contour sequence is aligned and resized to 64×44 and normalized, and the normalized gait contour sequence is input into the spatial feature extractor; the gait contour sequence input into the spatial feature extractor has 4 dimensions, namely the batch size, the height dimension, the width dimension and the channel dimension.
It should be noted that, during the training phase, the length (number of frames) of the gait contour sequences should be kept consistent. This embodiment divides the gait video into segments of 30 to 40 frames and, for each segment, uses 30 frames as training data; when a segment is shorter than 15 frames it is discarded, i.e. no samples are drawn from it, and when a segment is between 15 and 30 frames long it is extended to 30 frames by repeated sampling and the extended segment is used as training data.
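By way of illustration only, the following Python sketch shows one possible reading of the clip-sampling rule described above (segments shorter than 15 frames are discarded, segments of 15 to 30 frames are extended to 30 frames by repeated sampling); the function name sample_training_clip and the use of random index selection are assumptions of this sketch, not details given in the embodiment.

```python
import numpy as np

def sample_training_clip(segment, clip_len=30, min_len=15):
    """Return a clip_len-frame training clip from a gait-silhouette segment,
    following the sampling rule described above (assumed implementation)."""
    n = len(segment)
    if n < min_len:
        return None                       # too short: the segment is discarded
    if n < clip_len:
        # extend to clip_len frames by repeated (random) sampling
        extra = np.random.choice(n, clip_len - n, replace=True)
        idx = np.concatenate([np.arange(n), extra])
    else:
        # draw clip_len frames from the segment
        idx = np.sort(np.random.choice(n, clip_len, replace=False))
    return [segment[i] for i in idx]
```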
In one embodiment, the preprocessed gait contour sequence is input into each convolution layer of a spatial feature extractor, and the spatial features output by each convolution layer are obtained.
In one embodiment, the spatial feature extractor includes a plurality of convolution layers and corresponding activation functions, and in this embodiment, the spatial feature extractor employs six 2-D convolution layers, where the six 2-D convolution layers are a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer, and a sixth convolution layer, respectively.
Wherein the first convolution layer comprises 32 filters of size 5×5, with a step size of 2, which are activated by a first activation function; in this embodiment, the first activation function is preferably a Leaky ReLU activation function.
The second convolution layer comprises 32 filters of size 3×3, with a step size of 1, which are activated by a second activation function followed by 2×2 max pooling; in this embodiment, the second activation function is preferably a Leaky ReLU activation function.
The third convolution layer comprises 64 filters of size 3×3, with a step size of 1, which are activated by a third activation function; in this embodiment, the third activation function is preferably a Leaky ReLU activation function.
The fourth convolution layer comprises 64 filters of size 3×3, with a step size of 1, which are activated by a fourth activation function followed by 2×2 max pooling; in this embodiment, the fourth activation function is preferably a Leaky ReLU activation function.
The fifth convolution layer comprises 128 filters of size 3×3, with a step size of 1, which are activated by a fifth activation function; in this embodiment, the fifth activation function is preferably a Leaky ReLU activation function.
The sixth convolution layer comprises 128 filters of size 3×3, with a step size of 1, which are activated by a sixth activation function; in this embodiment, the sixth activation function is preferably a Leaky ReLU activation function.
In one embodiment, in order to obtain semantic information of a plurality of convolution layers, the embodiment connects spatial features output by a preset convolution layer in a channel dimension to obtain gait spatial features including the spatial information.
It should be noted that, in this embodiment, the second convolution layer, the fourth convolution layer, and the sixth convolution layer in the spatial feature extractor are preferentially selected as preset convolution layers, where the second convolution layer and the fourth convolution layer both include pooling layers, and a person skilled in the art may adjust the preset convolution layers to other convolution layers according to specific implementation conditions, and in this embodiment, spatial features output by the three convolution layers are connected in a channel dimension, so as to obtain gait spatial features that fuse multiple layers of semantic information.
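As an illustrative sketch only, the six-layer spatial feature extractor described above could be written in PyTorch roughly as follows; the padding values, the Leaky ReLU slope, and the use of adaptive max pooling to bring the outputs of the second, fourth and sixth layers to a common spatial size before channel-wise concatenation are assumptions, since the embodiment only states that these three outputs are connected in the channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialFeatureExtractor(nn.Module):
    """Six 2-D convolution layers as specified above; the outputs of the
    2nd, 4th and 6th layers are concatenated along the channel dimension."""
    def __init__(self, in_ch=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 32, 5, stride=2, padding=2)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.conv5 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.conv6 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
        self.act = nn.LeakyReLU(inplace=True)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):                          # x: (N, 1, 64, 44) silhouettes
        f1 = self.act(self.conv1(x))
        f2 = self.pool(self.act(self.conv2(f1)))   # 2nd layer output (pooled)
        f3 = self.act(self.conv3(f2))
        f4 = self.pool(self.act(self.conv4(f3)))   # 4th layer output (pooled)
        f5 = self.act(self.conv5(f4))
        f6 = self.act(self.conv6(f5))              # 6th layer output
        # Bring all three maps to the spatial size of f6 before concatenation;
        # the resizing strategy is an assumption, the patent only states that
        # the three outputs are connected in the channel dimension.
        size = f6.shape[-2:]
        f2 = F.adaptive_max_pool2d(f2, size)
        f4 = F.adaptive_max_pool2d(f4, size)
        return torch.cat([f2, f4, f6], dim=1)      # gait spatial features
```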
In fig. 2, the spatial feature extractor includes three convolution blocks, namely a first convolution block, a second convolution block, and a third convolution block; each convolution block comprises two convolution layers, and the embodiment realizes the extraction of the gait contour sequence spatial characteristics through each convolution layer.
S2, processing the gait space features by using a space horizontal pooling device to obtain gait space pooling features.
In one embodiment, the spatial horizontal pooling device comprises feature mapping and pooling operations, in particular:
mapping the gait space features into horizontal stripes to obtain space horizontal features;
and respectively carrying out global maximum pooling and global average pooling on the space horizontal features, and combining pooling results to obtain gait space pooling features with time dimension.
In this embodiment, the gait spatial features are mapped into horizontal stripes so that each stripe represents different information about the human body, and pooling reduces the dimensionality of the spatial horizontal features, thereby reducing the amount of computation; performing the feature mapping and pooling operations on the gait spatial features effectively improves both performance and recognition accuracy.
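A minimal sketch of the spatial horizontal pooling described above is given below; the number of horizontal stripes and the combination of the global max and global average results by summation are assumptions of the sketch.

```python
import torch

def spatial_horizontal_pooling(feat, num_strips=16):
    """Map a (N, C, H, W) gait spatial feature into horizontal strips and pool
    each strip; H must be divisible by num_strips in this sketch."""
    n, c, h, w = feat.shape
    # each contiguous block of h // num_strips rows forms one horizontal strip
    strips = feat.reshape(n, c, num_strips, -1)            # (N, C, S, h/S * W)
    # global max pooling and global average pooling per strip, combined by sum
    pooled = strips.max(dim=-1).values + strips.mean(dim=-1)
    return pooled                                           # (N, C, S) gait space pooling feature
```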
S3, inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature.
In one embodiment, the slice extraction device is used to simulate the association between adjacent body parts during walking, as shown in fig. 3, and the slice extraction device is used to connect adjacent features of the human body from top to bottom, and in this embodiment, the slice extraction device includes a horizontal slice and a concentration operation.
In one embodiment, as shown in fig. 4, the horizontal slicing operation is specifically:
Based on the preset number of associated features d and degree of association s, the gait space pooling feature is divided horizontally in the spatial dimension to obtain a plurality of slices of equal size. In operation, horizontal slicing works like a convolution: the number of associated features d corresponds to the convolution kernel size and the degree of association s corresponds to the convolution stride, so that the gait space pooling feature is progressively sliced into a plurality of slices.
In this embodiment, the number of associated features d and the degree of association s are both positive integers. In addition, in order to associate the features of adjacent body parts from top to bottom, this embodiment preferably sets d > s, in which case the first slice features obtained by the slice extraction device are strongly associated, further improving the accuracy of gait recognition; those skilled in the art can form slice features with other degrees of association by adjusting d and s according to the specific implementation.
When d > s, the obtained slices partially overlap one another; when 1 < d ≤ s, the obtained slices do not overlap; when d = 1, the obtained slices are independent of one another regardless of the value of s. As shown in fig. 5, when d = 1 and s = 1 the slices are independent of one another; when d = 2 and s = 2 the slices do not overlap; and when d = 2 and s = 1 the slices partially overlap.
In one embodiment, the concentrating operation in the slice extraction device is:
and processing the slices in the space dimension by using a maximum value statistical function and an average value statistical function respectively, splicing output results of the two statistical functions, and concentrating the slices with the preset number of slices into 1 first slice feature in the space dimension, wherein in fig. 4, the preset number of slices is set to d preferentially, namely, d slices are concentrated into 1 first slice feature.
In this embodiment, the properties of the first slice features are determined by the number of associated features d and the degree of association s of the slices, namely: when d > s, the obtained first slice features are associated with one another from top to bottom; when 1 < d ≤ s, the obtained first slice features are associated with one another, but not from top to bottom; when d = 1, the obtained first slice features are independent of one another and have maximum autocorrelation. In this embodiment, d > s is preferably set so that the first slice features are associated with one another from top to bottom, thereby improving gait recognition accuracy under cross-view and complex conditions.
In this embodiment, slices are generated by the horizontal slicing operation and the associated slices are then concentrated to obtain the first slice features, so that the features of the different body parts are analysed jointly rather than in isolation; this yields higher precision in actual gait recognition and reduces the probability of misjudgement. In this embodiment, the dimensions of each batch of the first slice features include a time dimension and a channel dimension.
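The slice extraction device described above can be sketched as follows; the (batch, channel, stripe) layout of the gait space pooling feature, the folding of the time dimension into the batch dimension, and the concatenation of the max and mean statistics along the channel dimension are assumptions of this sketch.

```python
import torch

def slice_extraction(pooled, d=2, s=1):
    """Slice extraction over the stripe (spatial) dimension of a (N, C, S)
    gait space pooling feature; d is the number of associated features
    (window size) and s the degree of association (step). Setting d > s
    gives top-to-bottom overlapping slices, as recommended above."""
    # like a 1-D convolution window: kernel size d, stride s
    windows = pooled.unfold(dimension=2, size=d, step=s)     # (N, C, K, d)
    # concentrate each group of d neighbouring stripes into one slice feature
    # by splicing the max- and mean-statistics (channel-wise concatenation is
    # an assumption of this sketch)
    first_slice = torch.cat([windows.max(dim=-1).values,
                             windows.mean(dim=-1)], dim=1)   # (N, 2C, K)
    return first_slice
```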
S4, pooling the first slice characteristics and inputting the pooled first slice characteristics into a multi-scale channel attention mechanism to obtain second slice characteristics.
First, the first slice features are pooled using one-dimensional global max pooling and one-dimensional global average pooling, and the pooled first slice features are then input into a multi-scale channel attention mechanism to obtain second slice features, which contain spatial information of different body parts. In this embodiment, the multi-scale channel attention mechanism first pools the first slice feature with one-dimensional max pooling and one-dimensional average pooling over several different kernel sizes; the channels of the pooled first slice feature are then adjusted by convolution and channel weights are obtained through a sigmoid function; finally, the channel weights are multiplied with the pooled first slice feature to obtain the second slice feature.
The embodiment can capture multi-scale information and model the connection among channels by using the multi-scale channel attention mechanism; in addition, the embodiment combines the slice extraction device with the multi-scale channel attention mechanism, improves the identification performance of the second slice feature, and greatly improves the gait recognition rate.
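The following is a hedged sketch of a multi-scale channel attention of the kind described above; the kernel sizes, the 1×1 convolution used to adjust the channels, and the summation of the different scales before the sigmoid are assumptions, since the embodiment does not fix these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleChannelAttention(nn.Module):
    """Sketch of a multi-scale channel attention: max/average pooling at
    several kernel sizes, a channel-adjusting convolution, a sigmoid to
    obtain channel weights, and multiplication with the input."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        self.fc = nn.Conv1d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x):                        # x: (N, C, L) pooled slice features
        att = 0
        for k in self.kernel_sizes:              # one scale per kernel size
            mx = F.max_pool1d(x, kernel_size=k, stride=1, padding=k // 2)
            av = F.avg_pool1d(x, kernel_size=k, stride=1, padding=k // 2)
            att = att + self.fc(mx) + self.fc(av)
        weight = torch.sigmoid(att)              # channel weights
        return weight * x                        # second slice features
```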
S5, inputting the second slice characteristic into a residual error frame attention mechanism to obtain a third slice characteristic.
Since each second slice feature contains spatial information of a different body part, in order to capture the key frames of each body-part sequence this embodiment models the timing of the gait contour sequence using a residual frame attention mechanism, thereby obtaining the importance of each frame of the second slice features.
In one embodiment, the present embodiment uses global maximum pooling and global average pooling to perform channel compression on the second slice feature, so as to obtain a slice compression feature, thereby implementing amplification of channel data.
Compared with using global max pooling or global average pooling alone, using the two in combination highlights the important information regions of the channels more effectively.
In one embodiment, as shown in fig. 6, the present embodiment first inputs the slice compression feature into a frame attention network to obtain a frame attention weight; and then, carrying out residual weighting on the frame attention weight and the second slice characteristic to obtain a third slice characteristic.
Wherein the frame attention network comprises a convolution layer with a channel number of 1, the convolution layer is unbiased, the embodiment normalizes the output of the frame attention network to a value between 0 and 1 by a sigmoid function, and takes the value as the frame attention weight.
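A minimal sketch of the residual frame attention mechanism described above is given below; the (batch, frame, channel) layout, the kernel size of the bias-free single-output-channel convolution, and the exact form of the residual weighting (x + w * x) are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class ResidualFrameAttention(nn.Module):
    """Channel compression by global max and global average pooling, a
    bias-free convolution with one output channel plus a sigmoid as the
    frame attention network, and residual weighting onto the input."""
    def __init__(self, kernel_size=3):
        super().__init__()
        # 2 input channels: the max- and average-compressed descriptors
        self.frame_att = nn.Conv1d(2, 1, kernel_size,
                                   padding=kernel_size // 2, bias=False)

    def forward(self, x):                        # x: (N, T, C) second slice feature
        mx = x.max(dim=-1).values                # (N, T) channel max pooling
        av = x.mean(dim=-1)                       # (N, T) channel average pooling
        comp = torch.stack([mx, av], dim=1)       # (N, 2, T) slice compression feature
        w = torch.sigmoid(self.frame_att(comp))   # (N, 1, T) frame attention weights
        w = w.transpose(1, 2)                     # (N, T, 1)
        return x + w * x                          # residual weighting -> third slice feature
```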
By combining the residual frame attention mechanism with the second slice features in parallel, key frames can be screened flexibly for the features of each group of adjacent body parts, so that the different amounts of information contained in the individual frames are exploited.
In this embodiment, because the residual frame attention mechanisms are constructed in parallel, the key frames of the second slice features containing temporal and spatial information can be captured simultaneously; through the residual frame attention mechanism, the frame attention weights are learned from the inter-frame features and are residually weighted onto each frame of the sequence, yielding the third slice feature of each body part and thereby interpreting the whole gait sequence. It should be noted that the residual frame attention mechanism differs from the channel attention mechanism and the spatial attention mechanism: channel attention focuses on "what" in the input image and spatial attention focuses on "where", whereas the residual frame attention mechanism in this embodiment emphasizes "when" the input contributes to classification and recognition.
S6, inputting the third slice feature into a time width aggregator after splicing to obtain a refined feature.
In one embodiment, as shown in fig. 6, the third slice feature output by each residual frame attention mechanism is spliced together in the time dimension, and then the spliced third slice feature is input into a time width aggregator to obtain a refined feature, where the refined feature represents key frame information of a body part related to each other.
In general, there are two types of aggregation, namely max aggregation and average aggregation. Average aggregation dilutes the contribution of the most critical frames in the sequence, especially when only a few frames are critical, whereas max aggregation acts to the greatest extent on the most critical frames; the time width aggregator in this embodiment therefore uses max aggregation.
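For illustration, max aggregation over the time dimension reduces the spliced third slice features to a single refined feature per sequence; the (batch, frame, slice, channel) layout in the sketch below is an assumption.

```python
import torch

def temporal_max_aggregation(third_slice_feat):
    """Max aggregation over the time (frame) dimension, as chosen above."""
    # third_slice_feat: (N, T, K, C) spliced third slice features
    return third_slice_feat.max(dim=1).values    # (N, K, C) refined features
```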
The third slice feature output by each residual frame attention mechanism is aggregated by the time width aggregator, so that the gait recognition method provided by the embodiment is not limited by the number of video frames, and videos from a person in different scenes can be flexibly integrated.
S7, inputting the refined features into a full connection layer to obtain feature descriptors for gait recognition.
In this embodiment, the fully connected layer encodes the refined features containing the spatiotemporal information into high-dimensional vectors, and obtains feature descriptors suitable for gait recognition.
It should be noted that, in the training stage, the embodiment of the present application uses the Batch All (BA+) triplet loss as the loss function and optimizes the gait recognition method provided by this embodiment through gradient descent and back propagation.
In the test stage, the gait recognition method provided in this embodiment is used to obtain the gait feature descriptors of the gallery set and of the probe set, and the average Euclidean distances between the gallery descriptors and the probe descriptors are then compared to obtain the gait recognition result; the probe set is the acquired video to be identified.
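A simple sketch of the test-stage matching described above: the probe descriptor is compared with each gallery descriptor by mean Euclidean distance and the nearest gallery identity is returned; the tensor layout and the function name identify are assumptions of the sketch.

```python
import torch

def identify(probe_desc, gallery_descs, gallery_labels):
    """Rank gallery identities by mean Euclidean distance to the probe
    feature descriptor (each descriptor assumed shaped (K, C))."""
    dists = []
    for g in gallery_descs:                       # one descriptor per gallery sequence
        dists.append(torch.norm(probe_desc - g, dim=-1).mean())
    best = int(torch.argmin(torch.stack(dists)))  # nearest gallery sample
    return gallery_labels[best]
```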
In this embodiment, the slice extraction device associates, from top to bottom, the adjacent features generated when the human body walks, while the residual frame attention mechanism analyses the second slice features frame by frame to obtain the frame attention weights, which are residually weighted onto each frame of the gait contour sequence, greatly improving the accuracy of gait recognition. In addition, by combining the residual frame attention mechanism with the second slice features in parallel, key frames can be flexibly screened for the features of each group of adjacent body parts, which models well the real situation of the human body during movement and achieves a good gait recognition effect in most practical application scenarios. Finally, this embodiment makes full use of the spatio-temporal information in gait recognition and improves its precision.
It should be noted that the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In one embodiment, as shown in fig. 7, there is provided a gait recognition system, the system comprising:
the feature extraction module 101 is configured to input the preprocessed gait contour sequence into the spatial feature extractor to obtain gait spatial features, and process the gait spatial features by using the spatial horizontal pooling device to obtain gait spatial pooling features;
the feature processing module 102 is configured to input the gait space pooling feature into a slice extraction device to obtain a first slice feature, and input the pooled first slice feature into a multi-scale channel attention mechanism to obtain a second slice feature;
a key frame analysis module 103, configured to input the second slice feature into a residual frame attention mechanism to obtain a third slice feature;
and the feature recognition module 104 is configured to splice the third slice feature, input the spliced third slice feature into a time width aggregator to obtain a refined feature, and input the refined feature into a full-connection layer to obtain a feature descriptor for gait recognition.
For specific limitations of a gait recognition system, reference may be made to the above-mentioned limitations of a gait recognition method, which are not repeated here. Those of ordinary skill in the art will appreciate that the various modules and steps described in connection with the disclosed embodiments of the application may be implemented in hardware, software, or a combination of both. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Compared with the prior art, the gait recognition system provided by this embodiment uses the feature processing module 102 and the key frame analysis module 103 to associate, from top to bottom, the adjacent features generated when a human body walks, and at the same time improves gait recognition performance under cross-view and complex walking conditions by screening key frames for the features of each group of adjacent body parts; in addition, the feature extraction of this embodiment is simple and greatly reduces the complexity of the system, so that human gait can be recognized quickly and accurately.
FIG. 8 is a diagram of a computer device including a memory, a processor, and a transceiver connected by a bus, according to an embodiment of the present application; the memory is used to store a set of computer program instructions and data and the stored data may be transferred to the processor, which may execute the program instructions stored by the memory to perform the steps of the above-described method.
Wherein the memory may comprise volatile memory or nonvolatile memory, or may comprise both volatile and nonvolatile memory; the processor may be a central processing unit, a microprocessor, an application specific integrated circuit, a programmable logic device, or a combination thereof. By way of example and not limitation, the programmable logic device described above may be a complex programmable logic device, a field programmable gate array, general purpose array logic, or any combination thereof.
In addition, the memory may be a physically separate unit or may be integrated with the processor.
It will be appreciated by those of ordinary skill in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be implemented, and that a particular computer device may include more or fewer components than those shown, or may combine some of the components, or have the same arrangement of components.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps of the above method.
The embodiments of the application provide a gait recognition method, system, equipment and storage medium based on spatio-temporal slice features. The method models and analyses human gait and uses a slice extraction device and a residual frame attention mechanism to extract and refine the gait features in space and in time respectively. In space, the slice extraction device associates features from adjacent body parts from top to bottom, and slice features with strong association are formed by adjusting the number of associated features and the degree of association; in time, the residual frame attention mechanism focuses on the temporal characteristics of the gait contour sequence and learns the weight of each frame, so that the network attends more to the frames with a high contribution. In space-time, the embodiments combine the residual frame attention mechanism and the slice features in parallel, so that the key frames of each body part can be selected simultaneously and flexibly, thereby improving gait recognition accuracy under cross-view and complex conditions.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions which, when loaded and executed on a computer, produce, in whole or in part, the flows or functions according to the embodiments of the present application. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via a wired (e.g., coaxial cable, optical fibre, digital subscriber line) or wireless (e.g., infrared, radio, microwave) connection.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed, may comprise the steps of embodiments of the methods described above.
The foregoing examples represent only a few preferred embodiments of the present application; although they are described in some detail, they are not to be construed as limiting the scope of the application. It should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and substitutions should also be considered to be within the protection scope of the application. Therefore, the protection scope of this patent is subject to the claims.

Claims (8)

1. A gait recognition method, comprising the steps of:
inputting the preprocessed gait contour sequence into each convolution layer of a spatial feature extractor to obtain spatial features output by each convolution layer, and connecting the spatial features output by the preset convolution layers in the channel dimension to obtain gait spatial features fused with multi-layer semantic information; the spatial feature extractor comprises six convolution layers, wherein the preset convolution layers are a second convolution layer, a fourth convolution layer and a sixth convolution layer;
performing feature mapping and pooling operation on the gait space features by using a space horizontal pooling device to obtain gait space pooling features;
inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature; the slice extraction device is used for simulating the association between adjacent body parts during walking;
pooling the first slice feature and inputting the pooled first slice feature into a multi-scale channel attention mechanism to obtain a second slice feature;
inputting the second slice feature into a residual frame attention mechanism to obtain a third slice feature;
after the third slice feature is spliced in the time dimension, inputting the spliced third slice feature into a time width aggregator to obtain a refined feature; the time width aggregator adopts maximum aggregation;
inputting the refined features into a full connection layer to obtain feature descriptors for gait recognition;
the step of inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature comprises the following steps:
based on the preset association feature quantity and association degree, horizontally dividing the gait space pooling feature in a space dimension to obtain a plurality of slices;
processing the slices in the space dimension by using a maximum value statistical function and an average value statistical function respectively to obtain output results of two statistical functions, splicing the output results of the two statistical functions, and concentrating the slices with the preset number of slices in the space dimension to obtain a first slice characteristic;
the step of inputting the second slice feature into a residual frame attention mechanism to obtain a third slice feature comprises:
carrying out channel compression on the second slice characteristic by utilizing global maximum pooling and global average pooling to obtain a slice compression characteristic;
inputting the slice compression characteristics into a frame attention network to obtain frame attention weights;
and carrying out residual weighting on the frame attention weight and the second slice feature to obtain a third slice feature.
2. A gait recognition method as claimed in claim 1, wherein: the spatial feature extractor comprises a first convolution layer, a second convolution layer, a third convolution layer, a fourth convolution layer, a fifth convolution layer and a sixth convolution layer;
wherein the first convolution layer comprises 32 filters of size 5×5, with a step size of 2, which are activated by a first activation function;
the second convolution layer comprises 32 filters with the size of 3×3, the step size is 1, and the second convolution layer is activated by using a second activation function and performs 2×2 maximum pooling;
the third convolution layer comprises 64 filters of size 3×3, with a step size of 1, which are activated with a third activation function;
the fourth convolution layer comprises 64 filters of 3×3 size, with a step size of 1, activated with a fourth activation function and 2×2 max pooling;
the fifth convolution layer comprises 128 filters of size 3 x 3, with a step size of 1, activated with a fifth activation function;
the sixth convolution layer comprises 128 filters of size 3×3, with a step size of 1, which are activated using a sixth activation function.
3. The gait recognition method of claim 1, wherein the step of performing feature mapping and pooling operations on the gait spatial features using a spatial horizontal pooler, to obtain gait spatial pooling features comprises:
mapping the gait space features into horizontal stripes to obtain space horizontal features;
and respectively carrying out global maximum pooling and global average pooling on the space horizontal features, and combining pooling results to obtain gait space pooling features.
4. A gait recognition method as claimed in claim 1, wherein: the number of the association features is greater than the association degree.
5. A gait recognition method as claimed in claim 1, wherein said frame attention network comprises a convolutional layer with a channel number of 1.
6. A gait recognition system, the system comprising:
the feature extraction module is used for inputting the preprocessed gait outline sequences into each convolution layer of the space feature extractor to obtain space features output by each convolution layer, splicing the space features output by the preset convolution layers in the channel dimension to obtain gait space features fused with multiple layers of semantic information, and performing feature mapping and pooling operation on the gait space features by using the space horizontal pooling device to obtain gait space pooling features; the spatial feature extractor comprises six convolution layers, wherein the preset convolution layers are a second convolution layer, a fourth convolution layer and a sixth convolution layer;
the feature processing module is used for inputting the gait space pooling feature into a slice extraction device to obtain a first slice feature, and inputting the pooled first slice feature into a multi-scale channel attention mechanism to obtain a second slice feature; the slice extraction device is used for simulating the association between adjacent body parts during walking;
the key frame analysis module is used for inputting the second slice characteristic into a residual frame attention mechanism to obtain a third slice characteristic;
the feature identification module is used for inputting the spliced third slice feature into a time width aggregator after the third slice feature is spliced in the time dimension to obtain a refined feature, and inputting the refined feature into a full-connection layer to obtain a feature descriptor for gait identification; the time width aggregator adopts maximum aggregation;
the feature processing module is specifically configured to:
based on the preset association feature quantity and association degree, horizontally dividing the gait space pooling feature in a space dimension to obtain a plurality of slices;
processing the slices in the space dimension by using a maximum value statistical function and an average value statistical function respectively to obtain output results of two statistical functions, splicing the output results of the two statistical functions, and concentrating the slices with the preset number of slices in the space dimension to obtain a first slice characteristic;
the key frame analysis module is specifically configured to:
carrying out channel compression on the second slice characteristic by utilizing global maximum pooling and global average pooling to obtain a slice compression characteristic;
inputting the slice compression characteristics into a frame attention network to obtain frame attention weights;
and carrying out residual weighting on the frame attention weight and the second slice feature to obtain a third slice feature.
7. A computer device, characterized by: comprising a processor and a memory, the processor being connected to the memory, the memory being for storing a computer program, the processor being for executing the computer program stored in the memory to cause the computer device to perform the method of any one of claims 1 to 5.
8. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein a computer program which, when executed, implements the method of any of claims 1 to 5.
CN202110678960.3A 2021-06-18 2021-06-18 Gait recognition method, system, equipment and storage medium Active CN113486734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110678960.3A CN113486734B (en) 2021-06-18 2021-06-18 Gait recognition method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110678960.3A CN113486734B (en) 2021-06-18 2021-06-18 Gait recognition method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113486734A CN113486734A (en) 2021-10-08
CN113486734B true CN113486734B (en) 2023-11-21

Family

ID=77935522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110678960.3A Active CN113486734B (en) 2021-06-18 2021-06-18 Gait recognition method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113486734B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863567B (en) * 2022-05-19 2023-03-10 北京中科睿医信息科技有限公司 Method and device for determining gait information


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10291460B2 (en) * 2012-12-05 2019-05-14 Origin Wireless, Inc. Method, apparatus, and system for wireless motion monitoring
US9984284B2 (en) * 2016-09-19 2018-05-29 King Fahd University Of Petroleum And Minerals Apparatus and method for gait recognition

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609460A (en) * 2017-05-24 2018-01-19 南京邮电大学 A kind of Human bodys' response method for merging space-time dual-network stream and attention mechanism
CN108960140A (en) * 2018-07-04 2018-12-07 国家新闻出版广电总局广播科学研究院 The pedestrian's recognition methods again extracted and merged based on multi-region feature
CN110807789A (en) * 2019-08-23 2020-02-18 腾讯科技(深圳)有限公司 Image processing method, model, device, electronic equipment and readable storage medium
CN110969087A (en) * 2019-10-31 2020-04-07 浙江省北大信息技术高等研究院 Gait recognition method and system
CN111126135A (en) * 2019-11-11 2020-05-08 上海蠡图信息科技有限公司 Feature self-adaptive pedestrian re-identification method based on unified division
CN112232134A (en) * 2020-09-18 2021-01-15 杭州电子科技大学 Human body posture estimation method based on hourglass network and attention mechanism
CN112507920A (en) * 2020-12-16 2021-03-16 重庆交通大学 Examination abnormal behavior identification method based on time displacement and attention mechanism

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Applications of a Simple Characterization of Human Gait in Surveillance; Y. Ran et al.; IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics); vol. 40, no. 4; pp. 1009-1020 *
GaitPart: Temporal Part-based Model for Gait Recognition; Chao Fan et al.; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); pp. 14213-14221 *
GaitSlice: A gait recognition model based on spatio-temporal slice features; Huakang Li et al.; Pattern Recognition; vol. 124 (2022); pp. 1-12 *
Gait recognition system based on an attention model (基于注意力模型的步态识别系统); Huang Yuanyuan; China Master's Theses Full-text Database, Information Science and Technology; no. 6; I138-893 *
Research and implementation of video action recognition algorithms with joint spatio-temporal description (时空联合描述的视频行为识别算法研究与实现); He Jin; China Master's Theses Full-text Database, Information Science and Technology; vol. 2021, no. 3; I138-521 *
Research on key technologies of gait recognition (步态识别关键技术研究); Lu Guanming; Computer Technology and Development (计算机技术与发展); vol. 25, no. 7; pp. 100-106 *
Cross-view gait recognition combining non-local and block features (结合非局部与分块特征的跨视角步态识别); Feng Shiling et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); vol. 32, no. 9; pp. 821-827 *

Also Published As

Publication number Publication date
CN113486734A (en) 2021-10-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant