CN116645697A - Multi-view gait recognition method and device, electronic equipment and storage medium - Google Patents

Multi-view gait recognition method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116645697A
Authority
CN
China
Prior art keywords
gait
target person
walking
view
different
Legal status
Pending
Application number
CN202310639621.3A
Other languages
Chinese (zh)
Inventor
侯赛辉
曹春水
刘旭
Current Assignee
Watrix Technology Beijing Co ltd
Original Assignee
Watrix Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Watrix Technology Beijing Co ltd filed Critical Watrix Technology Beijing Co ltd
Priority to CN202310639621.3A
Publication of CN116645697A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V10/77: Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Arrangements using neural networks
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/40: Scenes; scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The application provides a multi-view gait recognition method and device, an electronic device and a storage medium, relating to the technical field of feature recognition. Cameras placed at different angles simultaneously record walking state videos of a target person to obtain walking sequences of the target person at different viewing angles; gait features are extracted from the walking sequences of the different perspectives of the target person; and the gait features are compared with a base library to obtain identification objects satisfying a set threshold. Because multiple cameras record the walking sequences of the target person at the same time, recognition no longer depends on a single-angle sequence, which improves the accuracy of gait recognition and meets the application-scenario requirements of currently deployed surveillance systems.

Description

Multi-view gait recognition method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of feature recognition, in particular to a multi-view gait recognition method and device, electronic equipment and a storage medium.
Background
Gait recognition is a long-range biometric identification technique that needs no cooperation from its subject; its aim is to identify a person, or to detect physiological, pathological and psychological characteristics, from the way the person walks. Gait recognition works even at low image quality, requires no cooperation from the recognized subject, operates over long distances, and is difficult to disguise or hide, so it has clear advantages over traditional biometric recognition.
However, current gait recognition technology mostly works on a single viewing angle of a person's walk, so the observed walking posture and limb movement are one-sided, which limits the accuracy of gait recognition.
Disclosure of Invention
In view of the above, an object of the present application is to provide a multi-view gait recognition method, a multi-view gait recognition device, an electronic device, and a storage medium capable of performing gait recognition with higher accuracy.
In a first aspect, an embodiment of the present application provides a multi-view gait recognition method, the method including the steps of:
recording walking state videos of a target person simultaneously by using cameras placed at different angles to obtain walking sequences of the target person at different angles;
extracting gait features from walking sequences of different perspectives of the target person;
and comparing the gait characteristics with a base to obtain an identification object meeting a set threshold.
In some embodiments, based on the recording time and the walking area, the target persons appearing in the videos recorded by the cameras placed at different angles are associated, so as to obtain the walking sequences of the same target person at different viewing angles.
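A minimal sketch of this association step, assuming an illustrative record layout and a one-second time tolerance (the field names and tolerance are assumptions, not part of the application):

```python
# Sketch: associate the same target person across cameras by recording time
# and walking area (record layout and tolerance are assumed for illustration).
def associate(records, time_tol=1.0):
    """Group per-camera records into per-person multi-view walking sequences.

    Two records are associated when they share the walking area and their
    recording start times differ by at most time_tol seconds.
    """
    groups = []
    for rec in records:
        for group in groups:
            ref = group[0]
            if rec["area"] == ref["area"] and abs(rec["start"] - ref["start"]) <= time_tol:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups

records = [
    {"camera": 1, "start": 10.0, "area": "A"},
    {"camera": 2, "start": 10.3, "area": "A"},
    {"camera": 3, "start": 9.8, "area": "A"},
    {"camera": 1, "start": 45.0, "area": "B"},
]
groups = associate(records)
print([len(g) for g in groups])  # [3, 1]: one person seen by three cameras, plus one other record
```

A production system would associate tracked detections rather than whole recordings, but the grouping criterion (same area, overlapping time) is the same.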
In some embodiments, the walking sequence of the target person at different perspectives is obtained by:
disassembling the recorded walking state video of the target person frame by frame into a plurality of consecutive images;
and detecting and segmenting the target person in each disassembled frame based on a trained image segmentation model, so as to obtain the walking sequences of the same target person at different viewing angles.
In some embodiments, the extracting gait features from the walking sequence of different perspectives of the target person comprises the steps of:
performing 3D modeling on the walking sequences of the same target person at different viewing angles based on a neural radiance field (NeRF), so as to obtain a three-dimensional implicit expression of the human body parameters of the target person;
and extracting gait characteristics by using a convolutional neural network based on the three-dimensional implicit expression of the human body parameters of the target person.
In some embodiments, the comparing the gait features with the base library to obtain the identification objects satisfying the set comparison threshold includes the following steps:
normalizing the gait features, and calculating the similarity between the gait features and each walking sequence in the base library;
and determining the identification objects meeting the set threshold according to the calculated similarities, and arranging the determined identification objects in order of similarity from high to low.
In some embodiments, the similarity is calculated by the following formula:

$$s = \frac{p \cdot g}{\|p\|\,\|g\|} = \frac{\sum_{i=1}^{C} p_i g_i}{\sqrt{\sum_{i=1}^{C} p_i^2}\,\sqrt{\sum_{i=1}^{C} g_i^2}}$$

where $p$ represents the gait feature of the target person and $g$ represents a single walking sequence in the base library.
In some embodiments, the angle at which the camera is positioned includes a horizontal angle and a pitch angle.
In a second aspect, embodiments of the present application provide a multi-view gait recognition device, the device comprising:
the acquisition module is used for simultaneously recording walking state videos of the target person with cameras placed at different angles to obtain walking sequences of the target person at different viewing angles;
the feature extraction module is used for extracting gait features from walking sequences of different visual angles of the target person;
and the identification module is used for comparing the gait features with a base library to obtain identification objects meeting a set threshold.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate via the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the multi-view gait recognition method of any one of the first aspect above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, the computer program when executed by a processor performing the steps of the multi-view gait recognition method of any of the first aspects described above.
According to the multi-view gait recognition method and device, electronic device and storage medium of the application, cameras placed at different angles simultaneously record walking state videos of a target person, so that walking sequences of the target person at different viewing angles are obtained; gait features are extracted from the walking sequences of the different perspectives of the target person; and the gait features are compared with a base library to obtain identification objects meeting a set threshold. That is, multiple cameras record the walking sequences of the target person at the same time, which avoids relying on a single-angle sequence and thereby improves gait recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-view gait recognition method according to an embodiment of the application;
fig. 2 is a schematic diagram showing that cameras placed at different angles record walking state videos of a target person at the same time according to an embodiment of the application;
FIG. 3 shows a flow chart of a walking sequence that results in different perspectives of the target person according to an embodiment of the present application;
FIG. 4 illustrates a flow chart of extracting gait features from a walking sequence of different perspectives of the target person in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of acquiring human body parameters based on a neural radiance field (NeRF) according to an embodiment of the present application;
FIG. 6 is a flowchart of comparing the gait feature with a base library to obtain an identification object satisfying a set comparison threshold, according to an embodiment of the application;
FIG. 7 is a schematic structural diagram of a multi-view gait recognition device according to an embodiment of the application;
fig. 8 shows a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
In view of the technical problems set forth in the background art, the application provides a multi-view gait recognition method, a device, electronic equipment and a storage medium, which can improve the gait recognition precision.
Referring to fig. 1 of the specification, the multi-view gait recognition method provided by the embodiment of the application comprises the following steps:
s1, recording walking state videos of a target person by using cameras placed at different angles, and obtaining walking sequences of the target person at different angles;
s2, extracting gait features from walking sequences of different visual angles of the target person;
s3, comparing the gait characteristics with a base to obtain an identification object meeting a set threshold.
According to the multi-view gait recognition method provided by the embodiment of the application, the walking sequences of the target person are recorded by multiple cameras, which avoids judging from a single-angle sequence and can improve the accuracy of gait recognition. Moreover, the method fits the application scenario of the video surveillance systems increasingly deployed in public places, where pedestrians inevitably appear at diverse viewing angles.
Specifically, in step S1, the walking state of the target person is recorded simultaneously by cameras placed at different angles, where the angles include a horizontal angle and a pitch angle; the placement of the cameras is shown in fig. 2 of the specification. In one embodiment, three groups of cameras placed at different positions record the walking state of the target person: the first group can be flush with the horizontal plane, while the second and third groups are above it, so that the target person can be recorded at, for example, 0 degrees, a depression angle of 45 degrees, or an elevation angle of 30 degrees. In other embodiments, the number of cameras, their placement positions and their recording angles can be adjusted as required; the application does not limit or fix these.
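As a small illustrative sketch, the three-group rig of this embodiment could be described as configuration data (the dictionary layout and sign convention below are assumptions for this example, not a requirement of the application):

```python
# Hypothetical configuration of the three camera groups described above;
# negative pitch denotes a depression angle, positive an elevation angle.
CAMERA_RIG = [
    {"group": 1, "above_horizontal": False, "pitch_deg": 0},    # flush with the horizontal plane
    {"group": 2, "above_horizontal": True,  "pitch_deg": -45},  # 45-degree depression angle
    {"group": 3, "above_horizontal": True,  "pitch_deg": 30},   # 30-degree elevation angle
]

def pitch_angles(rig):
    """Pitch angle of every camera group in the rig."""
    return [cam["pitch_deg"] for cam in rig]

print(pitch_angles(CAMERA_RIG))  # [0, -45, 30]
```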
In the application, based on the recording time and the walking area, the target persons appearing in the videos recorded by the cameras placed at different angles are associated to obtain the walking sequences of the same target person at different viewing angles. Referring to fig. 3 of the specification, the walking sequences of the target person at different viewing angles are obtained through the following steps:
s101, decomposing recorded walking state video of a target person into a plurality of continuous images frame by frame;
s102, detecting and segmenting target personnel on the basis of the trained image segmentation model on each frame of the disassembled image, and obtaining walking sequences of the same target personnel at different visual angles.
The walking sequence specifically refers to the contour sequence of the target person while walking; it characterizes the posture and behavior of the human body during walking, which can be regarded as a series of continuous movements of the hips, knees, ankles and toes.
In step S101, the video file recorded by the camera is first disassembled frame by frame to obtain a series of consecutive pictures. For example, if the recording frame rate (fps) of the camera is 50, i.e., one frame every 20 ms, a one-second video file can be disassembled into 50 consecutive pictures; the frame rate is the frequency (rate) at which bitmap images, in units of frames, successively appear on a display. If the camera records a video file of 10 s duration for a target person, that video file can be disassembled into 500 consecutive pictures by frame;
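The frame arithmetic in this step can be sketched as follows (the function names are illustrative):

```python
def frame_period_ms(fps):
    """Duration of one frame in milliseconds at the given frame rate."""
    return 1000.0 / fps

def frame_count(duration_s, fps=50):
    """Number of pictures obtained when a video of duration_s seconds is
    disassembled frame by frame."""
    return int(duration_s * fps)

print(frame_period_ms(50))  # 20.0 -> at 50 fps, one frame every 20 ms
print(frame_count(1))       # 50  -> a one-second video yields 50 pictures
print(frame_count(10))      # 500 -> a 10 s video yields 500 pictures
```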
Then, in step S102, the contour image of the target person is segmented from each disassembled frame in turn using the trained image segmentation model. The image segmentation model comprises a target detection network and an image segmentation network. A large number of walking state videos of target persons can be recorded and each frame annotated, for example by marking the region where the target person is located with a rectangular box; the annotated images serve as samples for training the target detection network. Likewise, a large number of human-figure region images are collected and the body contours annotated; these serve as samples for training the image segmentation network. After deep learning, the image segmentation model can accurately detect the target person in each frame and segment the person's contour, reducing the interference of the environmental background on gait information extraction.
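Since the trained detection and segmentation networks themselves cannot be reproduced here, the sketch below only illustrates the interface of step S102 with NumPy, using a plain intensity threshold as a stand-in for the trained image segmentation model (an assumption, not the model described above):

```python
import numpy as np

def segment_silhouette(frame, threshold=0.5):
    """Stand-in for the trained image segmentation model: return a binary
    silhouette mask for the person in `frame`.  A real system would run a
    detection network to locate the person and a segmentation network to
    cut out the contour; a simple intensity threshold is used here purely
    to illustrate the interface."""
    return (frame > threshold).astype(np.uint8)

def walking_sequence(frames, threshold=0.5):
    """Turn the disassembled frames of one view into a silhouette sequence."""
    return [segment_silhouette(f, threshold) for f in frames]

# Example: 500 frames (10 s at 50 fps) of a 64x44 view, filled with noise.
frames = [np.random.rand(64, 44) for _ in range(500)]
seq = walking_sequence(frames)
print(len(seq), seq[0].shape)  # 500 (64, 44)
```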
In step S2, referring to fig. 4 of the specification, the step of extracting gait features from walking sequences of different perspectives of the target person includes the steps of:
s201, carrying out 3D modeling on walking sequences of the same target person at different view angles based on a nerve radiation field NeRF to obtain three-dimensional implicit expression of human parameters of the target person;
s202, extracting gait features by using a convolutional neural network based on three-dimensional implicit expression of human body parameters of the target person.
That is, in the application, 3D modeling is first performed using the neural radiance field technique to obtain refined human body parameters, and then the gait features are extracted using a neural network.
In step S201, the neural radiance field (Neural Radiance Field, NeRF) is a novel-view-synthesis technique based on an implicit scene representation that has attracted much attention in the field of computer vision. A NeRF model performs volume rendering of an implicitly represented neural scene using multi-layer perceptrons (MLPs), and is widely applied in robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, and other fields. Specifically, referring to fig. 5 of the specification, the input of the neural radiance field in the application is a set of continuously captured images and the source pose, where the source pose refers to the transformation matrix from camera coordinates to world coordinates. The related concepts are as follows:
first, the camera coordinate system [ X c ,Y c ,Z c ] T And three-dimensional world coordinate system [ X, Y, Z ]] T The coordinate conversion of (2) is as follows:
wherein ,is an affine transformation matrix,/>Is rotation information->Is translation information;
second, coordinates [ x, y ] of the two-dimensional image] T And coordinates in camera coordinate system [ X ] c ,Y c ,Z c ] T There is a conversion relationship as follows:
wherein the matrixIs an internal reference of the camera, including the focal length (f x ,f y ) And the coordinates (c) of the center point of the image x ,c y )。
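The two coordinate transformations above compose into a standard pinhole projection. A minimal NumPy sketch, using the same symbols $R$, $t$, $(f_x, f_y)$ and $(c_x, c_y)$ (the numeric values are illustrative):

```python
import numpy as np

def world_to_camera(Xw, R, t):
    """Extrinsics: [Xc, Yc, Zc]^T = R [X, Y, Z]^T + t."""
    return R @ Xw + t

def camera_to_pixel(Xc, fx, fy, cx, cy):
    """Project a camera-frame point to image coordinates with the intrinsic
    matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]."""
    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])
    uvw = K @ Xc
    return uvw[:2] / uvw[2]  # perspective division by the depth Zc

# Identity pose; a point on the optical axis, 2 m in front of the camera,
# projects exactly to the image center point (cx, cy).
R, t = np.eye(3), np.zeros(3)
Xc = world_to_camera(np.array([0.0, 0.0, 2.0]), R, t)
print(camera_to_pixel(Xc, fx=500.0, fy=500.0, cx=320.0, cy=240.0))  # [320. 240.]
```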
The spatial coordinates $(x, y, z, \theta, \phi)$ thus obtained are used as the input of the neural radiance field, and the weights of an MLP network $F:(\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma)$ are optimized so that each input 5D coordinate (a 3D position $\mathbf{x}$ and a 2D viewing direction $\mathbf{d}$) is mapped to its color and volume density; the output of the neural radiance field is thus RGB plus voxel density, $(r, g, b, \sigma)$. The loss between the predicted color values and the corresponding input pictures under the current pose is computed, and by optimizing it the model gradually converges. The rendering process of the network can be summarized by the following formula:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$$

where the voxel density $\sigma(\mathbf{x})$ reflects the particle density of the model at a point on the ray, i.e., the particle density at a specific three-dimensional coordinate; the color $\mathbf{c}(\mathbf{x})$ reflects the light reflected by the particles at that coordinate as seen from the ray direction; and the accumulated transmittance $T(t)$, which integrates the voxel density along the path of the light, decreases as the light travels deeper and weights the contribution of each point.
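In practice the rendering integral is evaluated by numerical quadrature over discrete samples along each ray; a minimal NumPy sketch (the sample counts, densities and colors are illustrative):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Numerical quadrature of the volume rendering integral along one ray.

    sigmas: (N,) voxel densities at the N samples; colors: (N, 3) RGB values;
    deltas: (N,) distances between adjacent samples.
    Returns the accumulated RGB value seen along the ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity contributed by each segment
    # Transmittance: fraction of light surviving up to each sample.
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]]))
    weights = trans * alphas
    return weights @ colors

# A dense red "wall" at the third sample: the ray should come back red,
# and the samples behind the wall should contribute almost nothing.
sigmas = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
deltas = np.full(4, 0.5)
print(render_ray(sigmas, colors, deltas))  # approximately [1. 0. 0.]
```

The design choice here mirrors the continuous formula term by term: `alphas` plays the role of the sigma term over a segment, and `trans` is the discretized transmittance prefix product.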
Then, the rendered pixel value of the camera ray $\mathbf{r}$ is compared with the corresponding real pixel value, and the loss function is defined as:

$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| \hat{C}(\mathbf{r}) - C(\mathbf{r}) \right\|_2^2$$

where $\mathcal{R}$ is the set of sampled camera rays, $\hat{C}(\mathbf{r})$ is the rendered color and $C(\mathbf{r})$ is the ground-truth pixel color.
in the three-dimensional implicit expression input identification network of human body obtained in the above-mentioned nerve radiation field optimization process, said identification network is formed from another convolutional neural network, and uses ReLU to make activation, and uses said network to extract gait characteristics so as to obtain the invented human body
In step S3, referring to fig. 6 of the specification, comparing the gait features with the base library to obtain identification objects satisfying the set comparison threshold includes the following steps:
s301, normalizing the gait characteristics, and respectively calculating the similarity between the gait characteristics and all walking sequences in a bottom library;
s302, determining recognition objects meeting a set threshold according to the calculated similarity, and arranging the determined recognition objects in the order from high similarity to low similarity.
That is, in the present application, objects are identified from the base library by calculating similarity. In step S301, before calculating the similarity between the gait feature and all walking sequences in the base library, the extracted gait feature is normalized to eliminate the dimensional differences between feature components. The base library consists of a number of candidate walking sequences determined from gait data captured in real scenes. For example, a ring of cameras is built around a target site, different pedestrians walk back and forth in the site, walking state videos from the various viewing angles are collected, and the corresponding candidate walking sequences are determined to form the base library. Sufficient viewing-angle data in the base library further improves the accuracy of cross-angle recognition.
In this embodiment, objects are identified from the base library by computing cosine similarity. Specifically, the average gait feature finally output in step S2 is used as the retrieval query $p \in \mathbb{R}^C$, a single walking sequence in the base library is expressed as $g \in \mathbb{R}^C$, where $C$ is the dimension of the gait feature, and the calculation formula is as follows:

$$s = \frac{p \cdot g}{\|p\|\,\|g\|} = \frac{\sum_{i=1}^{C} p_i g_i}{\sqrt{\sum_{i=1}^{C} p_i^2}\,\sqrt{\sum_{i=1}^{C} g_i^2}}$$
in step S302, the cosine similarity between p and all sequences in the base is calculated sequentially by using the above formula, and then the query result satisfying a certain threshold is returned and arranged in order of high-to-low similarity. For example, if the threshold is set to 0.7, the structure with cosine similarity s >0.7 of all sequences in the bottom library is returned, and the sequences are sorted from high to low.
Therefore, according to the multi-view gait recognition method provided by the application, cameras placed at different angles simultaneously record walking state videos of the target person, so that walking sequences of the target person at different viewing angles are obtained; gait features are extracted from the walking sequences of the different perspectives of the target person; and the gait features are compared with the base library to obtain identification objects meeting a set threshold. Because multiple cameras record the walking sequences of the target person at the same time, judging from a single-angle sequence is avoided, which improves the accuracy of gait recognition and meets the application-scenario requirements of currently deployed surveillance systems.
Based on the same inventive concept, the embodiment of the application further provides a multi-view gait recognition device, and because the principle of solving the problem of the device in the embodiment of the application is similar to that of the multi-view gait recognition method in the embodiment of the application, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
As shown in fig. 7 of the specification, the present application further provides a multi-view gait recognition device, which includes:
the acquisition module 701 is used for recording walking state videos of a target person simultaneously by using cameras placed at different angles to obtain walking sequences of the target person at different angles;
a feature extraction module 702, configured to extract gait features from walking sequences of different perspectives of the target person;
and the recognition module 703 is configured to compare the gait feature with a base library, and obtain a recognition object that meets a set threshold.
In some embodiments, the acquisition module 701 associates the target persons appearing in the videos recorded by the cameras placed at different angles based on the recording time and the walking area, so as to obtain the walking sequences of the same target person at different viewing angles; the angles at which the cameras are placed include a horizontal angle and a pitch angle.
In some embodiments, the acquisition module 701 records the walking state video of the target person simultaneously by using cameras placed at different angles, so as to obtain a walking sequence of the target person at different angles, including:
disassembling the recorded walking state video of the target person frame by frame into a plurality of consecutive images;
and detecting and segmenting the target person in each disassembled frame based on the trained deep learning model, so as to obtain the walking sequences of the same target person at different viewing angles.
In some embodiments, the feature extraction module 702 extracts gait features from a walking sequence of different perspectives of the target person, comprising:
performing 3D modeling on the walking sequences of the same target person at different viewing angles based on a neural radiance field (NeRF), so as to obtain a three-dimensional implicit expression of the human body parameters of the target person;
and extracting gait characteristics by using a convolutional neural network based on the three-dimensional implicit expression of the human body parameters of the target person.
In some embodiments, the recognition module 703 compares the gait features with the base library to obtain recognition objects that meet a set comparison threshold, including:
normalizing the gait features and computing the similarity between the gait features and every walking sequence in the base library;
and determining the recognition objects that meet the set threshold according to the computed similarities, arranging them in descending order of similarity.
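The normalize-compare-threshold-sort procedure above can be sketched as follows. The patent's similarity formula is reproduced only as an image in the original publication, so this sketch assumes cosine similarity, a common choice consistent with the normalization step; the threshold value and base-library layout are likewise illustrative.

```python
import math

def normalize(v):
    """L2-normalize a feature vector."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors (assumption:
    the patent's actual formula is not reproduced in this text)."""
    return sum(x * y for x, y in zip(a, b))

def identify(query, base, threshold=0.8):
    """Compare a query gait feature with every entry in the base
    library; keep matches above `threshold`, sorted high to low."""
    q = normalize(query)
    scored = [(name, cosine(q, normalize(feat))) for name, feat in base.items()]
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)
```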
The recognition module 703 calculates the similarity with a formula whose two operands denote, respectively, the gait feature of the target person and a single walking sequence in the base library (the formula itself appears only as an image in the original publication).
With the multi-view gait recognition device provided by the application, the acquisition module records walking-state videos of the target person simultaneously with cameras placed at different angles to obtain walking sequences at different viewing angles; the feature extraction module extracts gait features from those sequences; and the recognition module compares the gait features with the base library to obtain recognition objects that meet a set threshold. Because several cameras record the target person's walking sequences at the same time, the decision no longer rests on a single-angle sequence, which improves gait recognition accuracy.
Based on the same concept of the present application, fig. 8 shows the structure of an electronic device 800 according to an embodiment of the application. The electronic device 800 includes: at least one processor 801, at least one network interface 804 or other user interface 803, a memory 805, and at least one communication bus 802. The communication bus 802 enables communication among these components. The electronic device 800 optionally includes a user interface 803 with a display (e.g., a touch screen, LCD, CRT, holographic or projection display) and a keyboard or pointing device (e.g., a mouse, trackball, touch pad, or touch screen).
Memory 805 may include read only memory and random access memory and provide instructions and data to the processor 801. A portion of the memory 805 may also include non-volatile random access memory (NVRAM).
In some implementations, the memory 805 stores the following elements, executable modules or data structures, or a subset or an extended set of them:
an operating system 8051 containing various system programs for implementing various basic services and handling hardware-based tasks;
the application program module 8052 contains various application programs such as a desktop (desktop), a Media Player (Media Player), a Browser (Browser), and the like for implementing various application services.
In the embodiment of the present application, the processor 801 executes the steps of the multi-view gait recognition method by calling programs or instructions stored in the memory 805, thereby improving the accuracy of gait recognition.
The present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the multi-view gait recognition method. Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on it is run, the multi-view gait recognition method described above is executed.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above examples are merely specific embodiments illustrating, not limiting, the technical solution of the present application. Although the application has been described in detail with reference to the foregoing examples, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, without departing from the spirit and scope of the corresponding technical solutions; such modifications, changes, and substitutions are intended to fall within the scope of the present application. The protection scope of the application is therefore defined by the claims.

Claims (10)

1. A multi-view gait recognition method, the method comprising the steps of:
recording walking state videos of a target person simultaneously by using cameras placed at different angles to obtain walking sequences of the target person at different angles;
extracting gait features from walking sequences of different perspectives of the target person;
and comparing the gait characteristics with a base to obtain an identification object meeting a set threshold.
2. The multi-view gait recognition method according to claim 1, wherein the target person appearing in the videos recorded by the cameras placed at different angles is associated based on recording time and walking area to obtain walking sequences of the same target person at different viewing angles.
3. The multi-view gait recognition method according to claim 1, wherein the walking sequences of the target person at different viewing angles are obtained by:
decomposing the recorded walking-state video of the target person, frame by frame, into a series of consecutive images;
and detecting and segmenting the target person in each decomposed frame with a trained image segmentation model to obtain walking sequences of the same target person at different viewing angles.
4. The multi-view gait recognition method according to claim 3, wherein said extracting gait features from the walking sequences of the target person at different viewing angles comprises the steps of:
performing 3D modeling of the walking sequences of the same target person at different viewing angles based on a neural radiance field (NeRF) to obtain a three-dimensional implicit representation of the target person's body parameters;
and extracting gait features from the three-dimensional implicit representation of the target person's body parameters with a convolutional neural network.
5. The method according to claim 4, wherein comparing the gait features with the base library to obtain recognition objects that meet a set comparison threshold comprises the steps of:
normalizing the gait features and computing the similarity between the gait features and every walking sequence in the base library;
and determining the recognition objects that meet the set threshold according to the computed similarities, arranging them in descending order of similarity.
6. The multi-view gait recognition method of claim 5, wherein the similarity is calculated with a formula whose operands denote, respectively, the gait feature of the target person and the gait features in the base library (the formula itself appears only as an image in the original publication).
7. The method of claim 6, wherein the camera placement angle comprises a horizontal angle and a pitch angle.
8. A multi-view gait recognition device, the device comprising:
the acquisition module is used for recording walking state videos of the target person simultaneously by using cameras placed at different angles to obtain walking sequences of the target person at different angles;
the feature extraction module is used for extracting gait features from walking sequences of different visual angles of the target person;
and the identification module is used for comparing the gait characteristics with a base to obtain an identification object meeting a set threshold.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the multi-view gait recognition method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the multi-view gait recognition method as claimed in any one of claims 1 to 7.
CN202310639621.3A 2023-05-31 2023-05-31 Multi-view gait recognition method and device, electronic equipment and storage medium Pending CN116645697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310639621.3A CN116645697A (en) 2023-05-31 2023-05-31 Multi-view gait recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310639621.3A CN116645697A (en) 2023-05-31 2023-05-31 Multi-view gait recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116645697A true CN116645697A (en) 2023-08-25

Family

ID=87639671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310639621.3A Pending CN116645697A (en) 2023-05-31 2023-05-31 Multi-view gait recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116645697A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117238032A (en) * 2023-09-18 2023-12-15 以萨技术股份有限公司 Gait feature comparison method, storage medium and electronic equipment


Similar Documents

Publication Publication Date Title
EP4002198A1 (en) Posture acquisition method and device, and key point coordinate positioning model training method and device
US9330307B2 (en) Learning based estimation of hand and finger pose
Ruan et al. Multi-correlation filters with triangle-structure constraints for object tracking
CN106203423B (en) Weak structure perception visual target tracking method fusing context detection
US20100296702A1 (en) Person tracking method, person tracking apparatus, and person tracking program storage medium
WO2022174523A1 (en) Method for extracting gait feature of pedestrian, and gait recognition method and system
CN112883896A (en) Micro-expression detection method based on BERT network
Guo et al. Gesture recognition of traffic police based on static and dynamic descriptor fusion
CN116645697A (en) Multi-view gait recognition method and device, electronic equipment and storage medium
CN117710760B (en) Method for detecting chest X-ray focus by using residual noted neural network
CN115205933A (en) Facial expression recognition method, device, equipment and readable storage medium
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
Lingaswamy et al. An efficient moving object detection and tracking system based on fractional derivative
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram
CN116862920A (en) Portrait segmentation method, device, equipment and medium
Jiménez et al. Face tracking and pose estimation with automatic three-dimensional model construction
CN115769276A (en) Body type identification method, health assessment acquisition method, device and equipment
Fihl et al. Invariant gait continuum based on the duty-factor
Shamalik et al. Effective and efficient approach for gesture detection in video through monocular RGB frames
Lan et al. Correction Method of Visual Errors in Pole Vault Take-off Action Images
Mancas et al. People groups analysis for ar applications
Rolff et al. Gaze Mapping for Immersive Virtual Environments Based on Image Retrieval
WO2023152973A1 (en) Image processing device, image processing method, and program
JP3122290B2 (en) Gesture video recognition method
Al Mudawi et al. Machine learning Based on Body Points Estimation for Sports Event Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination