CN113537121A - Identity recognition method and device, storage medium and electronic equipment

Info

Publication number: CN113537121A
Application number: CN202110858664.1A
Authority: CN
Inventors: 潘华东; 桂青; 殷俊
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2021-07-28
Filing date: 2021-07-28
Publication date: 2021-10-22
Anticipated expiration: 2041-07-28
Also published as: CN113537121B

Abstract

The invention discloses an identity recognition method and device, a storage medium and electronic equipment. The method includes: performing image processing on each initial image to obtain a first contour map sequence corresponding to the initial images; acquiring a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence; inputting the four sequences into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector; calculating the similarity distances between the reference vectors of each of multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector; and determining a target reference image sequence according to the similarity distances, and determining the identity information corresponding to the target reference image sequence as the identity information of the target object. The invention solves the technical problem that using a single gait contour map yields low gait recognition accuracy and therefore low identity recognition accuracy.

Description

Identity recognition method and device, storage medium and electronic equipment

Technical Field

The invention relates to the field of computers, in particular to an identity recognition method and device, a storage medium and electronic equipment.

Background

At present, how to identify a person quickly and efficiently is an important problem of the information era. Traditional identity recognition methods, such as recognition based on an identifying item or on identifying knowledge, have inherent drawbacks: the item or knowledge is easily lost or leaked, so these methods struggle to meet current requirements. Against this background, biometric identification techniques have been developed. Gait recognition is a newer biometric technique that identifies people by their walking posture, and it has the advantages of being contactless and of working at long distance.

Most of the current mainstream gait identification algorithms use a statistical model or a deep learning model to perform parameter learning on a gait contour map, obtain a feature extractor on the basis, and then perform feature matching by using the extracted gait features, thereby realizing identification of people.

Although the gait contour map effectively removes the interference that the color of clothes, carried objects and similar articles causes in gait recognition, clothes and carried objects have contours of their own. As a result, the difference between contour maps of the same person before and after a change of clothes or carried objects may be larger than the difference between contour maps of different persons, which reduces the accuracy of gait recognition.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides an identity recognition method and device, a storage medium and electronic equipment, which are used for at least solving the technical problem that the gait recognition accuracy is low and further the identity recognition accuracy is low due to the fact that a single gait contour map is used.

According to an aspect of an embodiment of the present invention, there is provided an identity recognition method, including: performing image processing on each initial image in an initial image sequence corresponding to a target video to obtain a first contour map sequence consisting of the first contour maps corresponding to the initial images, wherein the initial images are video frames of the target video that contain a target object; acquiring a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence, wherein the first skeleton map sequence comprises first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence comprises second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence comprises second skeleton maps respectively corresponding to the first contour maps; inputting the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector, wherein the gait recognition model is a multilayer convolutional neural network model; calculating the similarity distance between each reference vector of each group of reference image sequences among multiple groups of reference image sequences and the corresponding one of the first contour vector, the first skeleton vector, the second contour vector and the second skeleton vector; and determining a target reference image sequence according to the similarity distances, and determining the identity information corresponding to the target reference image sequence as the identity information of the target object.

According to another aspect of the embodiments of the present invention, there is also provided an identity recognition apparatus, including: a processing unit, configured to perform image processing on each initial image in an initial image sequence corresponding to a target video to obtain a first contour map sequence formed by the first contour maps corresponding to the initial images, wherein the initial images are video frames of the target video that contain a target object; an acquiring unit, configured to acquire a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence, wherein the first skeleton map sequence comprises first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence comprises second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence comprises second skeleton maps respectively corresponding to the first contour maps; an input unit, configured to input the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector, wherein the gait recognition model is a multilayer convolutional neural network model; a calculating unit, configured to calculate the similarity distance between each reference vector of each group of reference image sequences among multiple groups of reference image sequences and the corresponding one of the first contour vector, the first skeleton vector, the second contour vector and the second skeleton vector; and a determining unit, configured to determine a target reference image sequence according to the similarity distances and to determine the identity information corresponding to the target reference image sequence as the identity information of the target object.

According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned identification method when running.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the above-mentioned identification method through the computer program.

In the embodiment of the invention, video frames containing the target object form an initial image sequence; a first contour map sequence, a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the initial image sequence are obtained; a gait recognition model produces the first contour vector, first skeleton vector, second contour vector and second skeleton vector corresponding to these sequences; and the similarity distances between the reference vectors of each group of reference image sequences and these four vectors are calculated to determine a target reference image sequence. Because the initial image sequence yields several image sequences and several gait feature vectors, and the target reference image sequence is determined from several groups of similarity distances rather than from a single one, the gait matching result is more accurate and so is the resulting identity information. This achieves the technical effect of improving the accuracy of gait-based identity recognition, and solves the technical problem that using a single gait contour map yields low gait recognition accuracy and therefore low identity recognition accuracy.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of an application environment of an alternative identification method according to an embodiment of the invention;

FIG. 2 is a schematic flow chart diagram of an alternative method of identification according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a gait recognition model of an alternative identity recognition method according to an embodiment of the invention;

FIG. 4 is a schematic flow chart diagram illustrating an alternative method of identifying an identity according to an embodiment of the present invention;

FIG. 5 is a schematic flow chart diagram of an alternative method of identification according to an embodiment of the present invention;

FIG. 6 is a flow chart illustrating an alternative method of identity recognition according to embodiments of the present invention;

FIG. 7 is a schematic flow chart diagram illustrating an alternative method of identifying an identity according to an embodiment of the present invention;

FIG. 8 is a schematic flow chart diagram illustrating an alternative method of identifying an identity according to an embodiment of the present invention;

FIG. 9 is a schematic flow chart diagram illustrating an alternative method of identifying an identity according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of an alternative identification device in accordance with embodiments of the present invention;

FIG. 11 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an aspect of the embodiments of the present invention, there is provided an identity recognition method, which may be applied to, but is not limited to, the environment shown in fig. 1. The terminal device 102 performs data interaction with the server 112 through the network 110: the terminal device 102 sends the captured target video to the server 112 through the network 110, and the server 112 performs identity recognition on the target object in the received target video and sends the identity information to the terminal device 102.

A processing engine 116 runs in the server 112, which may perform identity recognition by, but not limited to, executing S102 to S110 in sequence.

S102: obtain a first contour map sequence. Image processing is performed on each initial image in the initial image sequence corresponding to the target video to obtain a first contour map sequence formed by the first contour maps corresponding to the initial images, wherein the initial images are video frames of the target video that contain the target object.

S104: acquire a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence. The first skeleton map sequence comprises first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence comprises second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence comprises second skeleton maps respectively corresponding to the first contour maps.

S106: obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector. The first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence are respectively input into a gait recognition model to obtain the four vectors, wherein the gait recognition model is a multilayer convolutional neural network model.

S108: calculate the similarity distance between each reference vector of each group of reference image sequences among the multiple groups of reference image sequences and the corresponding one of the first contour vector, first skeleton vector, second contour vector and second skeleton vector.

S110: determine a target reference image sequence and acquire the identity information. The target reference image sequence is determined according to the similarity distances, and the identity information corresponding to the target reference image sequence is determined as the identity information of the target object.

Optionally, in this embodiment, the terminal device 102 may be a terminal device configured with a target client, and may include, but is not limited to, at least one of the following: mobile phones (such as Android phones and iOS phones), notebook computers, tablet computers, palmtop computers, MIDs (Mobile Internet Devices), PADs, desktop computers, smart televisions, smart cameras, and the like. The target client is a client with a video shooting function and may be, but is not limited to, a video client, an instant messaging client, a browser client, an education client, and the like. The network 110 may include, but is not limited to, a wired network and a wireless network, wherein the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, Wi-Fi and other networks that enable wireless communication. The server 112 may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and this embodiment is not limited thereto.

As an alternative implementation, as shown in fig. 2, the identity recognition method includes:

S202, performing image processing on each initial image in the initial image sequence corresponding to the target video to obtain a first contour map sequence formed by the first contour maps corresponding to the initial images.

In step S202, the initial image is a video frame of the target video that contains the target object. The initial images may be obtained by, but not limited to, taking a video clip containing the target object as the target video and splitting the target video frame by frame.

Optionally, the initial image sequence is an image sequence comprising a plurality of video frames. The initial images are arranged in the time order of the corresponding video frames, so that combining the initial image sequence in order restores the target video.

Optionally, the first contour map may be obtained by, but not limited to, a background subtraction method that converts the initial RGB image into a gait contour map. In the gait contour map, black pixels mark the background and white pixels mark the human gait contour.
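The background subtraction step can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes a static background frame is available, represents images as nested lists of RGB tuples, and uses a hypothetical function name `silhouette` and a hypothetical `threshold` value.

```python
def silhouette(frame, background, threshold=30):
    """Return a binary gait contour map for one frame.

    frame, background: H x W nested lists of (R, G, B) tuples.
    Output pixels are 255 (white, person) or 0 (black, background).
    The threshold is an assumed value, not taken from the source.
    """
    mask = []
    for frame_row, back_row in zip(frame, background):
        row = []
        for (r, g, b), (r0, g0, b0) in zip(frame_row, back_row):
            # Sum of absolute channel differences against the background.
            diff = abs(r - r0) + abs(g - g0) + abs(b - b0)
            row.append(255 if diff > threshold else 0)
        mask.append(row)
    return mask


def contour_sequence(frames, background):
    """Apply the same subtraction to every frame, preserving time order."""
    return [silhouette(frame, background) for frame in frames]
```

In practice a learned segmentation model or an adaptive background model would likely replace the fixed background frame assumed here.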

S204, acquiring a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence.

In step S204, the first skeleton map sequence includes first skeleton maps corresponding to the first contour maps, the second contour map sequence includes second contour maps corresponding to the first contour maps, and the second skeleton map sequence includes second skeleton maps corresponding to the first contour maps.

Optionally, the first skeleton map sequence is derived from the first contour map sequence, and the second contour map sequence is also derived from the first contour map sequence; each second contour map in the second contour map sequence may be, but is not limited to, a partial image of the corresponding first contour map. The second skeleton map sequence is derived from the first skeleton map sequence; each second skeleton map in the second skeleton map sequence may be, but is not limited to, a partial image of the corresponding first skeleton map.

S206, inputting the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into the gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector.

In the above step S206, the gait recognition model is a multilayer convolutional neural network model. The network structure of the gait recognition model may be, but is not limited to, that shown in fig. 3, and includes a feature extraction part and a discrimination part. The feature extraction part consists mainly of a plurality of convolutional layers and extracts gait feature information of the target object from the image sequence. The discrimination part comprises a fully connected layer and a SoftMax layer; it discriminates and classifies according to the extracted gait feature information to obtain and output the vector corresponding to gait recognition.

Optionally, different model parameters are trained for different types of image sequences, and the different types of image sequences are processed by using different sets of model parameters to obtain an output gait feature vector.

S208, calculating the similarity distance between each reference vector of each group of reference image sequences among the multiple groups of reference image sequences and the corresponding one of the first contour vector, first skeleton vector, second contour vector and second skeleton vector.

Optionally, each group of reference image sequences is the image sequence of one reference object; that is, all reference images in a group contain the same reference object.

Optionally, for each group of reference image sequences, the reference vectors corresponding respectively to the first contour vector, the first skeleton vector, the second contour vector and the second skeleton vector may be, but are not limited to being, acquired so as to calculate the similarity distance between each pair of vectors.

S210, determining a target reference image sequence according to the similar distance, and determining identity information corresponding to the target reference image sequence as the identity information of the target object.

Optionally, the reference image sequences are ranked according to the similarity distances corresponding to the first contour vector, the first skeleton vector, the second contour vector and the second skeleton vector respectively, and a target reference image sequence is determined from the multiple groups of reference image sequences based on the resulting rankings. The target object corresponding to the initial image sequence and the target reference object corresponding to the target reference image sequence are thereby determined to be the same object, and the identity information of the target reference object is determined as the identity information of the target object.

Optionally, the identity information of the reference object corresponding to each reference image sequence is known. The target reference image sequence is determined by comparing the vectors of the several image sequences derived from the initial image sequence with those of the reference image sequences, thereby determining the identity information of the target object.

As an alternative implementation manner, in the case that the target reference image sequence is not determined from the first candidate image sequence, the second candidate image sequence, the third candidate image sequence, and the fourth candidate image sequence, it is prompted that the target object identification fails.
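The ranking and selection described above can be sketched as follows. The selection rule used here (top-k candidates per distance, intersection of the four candidate lists, smallest summed distance) is an assumption made for illustration; the text states only that reference sequences are ranked by each distance and that identification fails when no target is found among the four candidate sequences. The names `pick_target` and `top_k` are hypothetical.

```python
def pick_target(distances, top_k=3):
    """Choose a target reference sequence from four per-type distances.

    distances: {reference_id: (d1, d2, d3, d4)}, smaller means more similar.
    Returns the chosen reference_id, or None when no reference appears in
    all four top-k candidate lists (treated here as identification failure).
    """
    candidate_sets = []
    for i in range(4):
        # Rank all references by the i-th distance and keep the best top_k.
        ranked = sorted(distances, key=lambda ref: distances[ref][i])
        candidate_sets.append(set(ranked[:top_k]))
    common = set.intersection(*candidate_sets)
    if not common:
        return None
    # Assumed tie-break rule: smallest total distance among common candidates.
    return min(sorted(common), key=lambda ref: sum(distances[ref]))


gallery_distances = {
    "alice": (0.1, 0.2, 0.1, 0.3),
    "bob": (0.5, 0.1, 0.6, 0.2),
    "carol": (0.9, 0.8, 0.7, 0.9),
}
target = pick_target(gallery_distances, top_k=2)
```

A `None` result corresponds to the failure prompt; a weighted sum or a voting scheme would be equally plausible fusion rules.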

In the embodiment of the application, video frames containing the target object form an initial image sequence; a first contour map sequence, a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the initial image sequence are obtained; a gait recognition model produces the first contour vector, first skeleton vector, second contour vector and second skeleton vector corresponding to these sequences; and the similarity distances between the reference vectors of each group of reference image sequences and these four vectors are calculated to determine a target reference image sequence. Because the initial image sequence yields several image sequences and several gait feature vectors, and the target reference image sequence is determined from several groups of similarity distances rather than from a single one, the gait matching result is more accurate and so is the resulting identity information. This achieves the technical effect of improving the accuracy of gait-based identity recognition, and solves the technical problem that using a single gait contour map yields low gait recognition accuracy and therefore low identity recognition accuracy.

As an alternative embodiment, as shown in fig. 4, the acquiring of the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence corresponding to the first contour map sequence includes:

S402, extracting the joint positions of the initial image by using a joint extraction model to obtain joint coordinates corresponding to the joint positions;

S404, mapping the joint coordinates to the first contour map to obtain a first skeleton map;

S406, constructing a first skeleton map sequence corresponding to the initial image sequence by using the first skeleton map.

Optionally, the joint extraction model may, but is not limited to, extract preset joint positions in the initial image, and the extracted joint coordinates are mapped into the first contour map. For a human subject, the preset joint positions may include, but are not limited to: left eye, right eye, nose tip, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle. When there are a plurality of joint positions, the joint extraction model may, but is not limited to, output a joint coordinate matrix. The joint extraction model may use, but is not limited to, a pose estimation algorithm.

Optionally, mapping the joint coordinates into the first contour map may be, but is not limited to, mapping the joint coordinate matrix into a first contour matrix. The first contour matrix is an image matrix that matches the first contour map in image size and characteristics. Mapping the joint coordinates into the first contour matrix avoids the situation in which joint coordinates drawn directly on the first contour map collide with its pixels and cannot be fully represented.

Optionally, after the joint coordinates are mapped into the first contour map, the method further comprises: connecting the joint points represented by the joint coordinates according to the skeletal structure of the human body to obtain skeleton information. The image containing the skeleton information is used as the first skeleton map.
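Steps S402 to S406 can be illustrated with a small sketch that draws joints and their connections onto a blank matrix of the same size as the contour map. The joint coordinates are assumed to come from some pose estimator, which is not shown. The bone list `BONES` is hypothetical and covers only the legs; the source names the joints but does not list the connections.

```python
# Hypothetical subset of joint pairs to connect; not specified in the source.
BONES = [
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]


def draw_line(canvas, p0, p1, value=255):
    """Draw a Bresenham line between pixel coordinates p0 and p1, given as (x, y)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= y0 < len(canvas) and 0 <= x0 < len(canvas[0]):
            canvas[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def skeleton_map(joints, height, width, bones=BONES):
    """Build a skeleton map with the same size as the contour map.

    joints: {joint_name: (x, y)} in pixel coordinates.
    """
    canvas = [[0] * width for _ in range(height)]
    for a, b in bones:
        if a in joints and b in joints:
            draw_line(canvas, joints[a], joints[b])
    return canvas
```

Drawing on a separate matrix rather than on the contour map itself mirrors the point made above about avoiding collisions with contour pixels.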

In the embodiment of the application, the first skeleton map containing joint and skeleton information is obtained by extracting and mapping the joint coordinates, so that a first skeleton vector carrying joint and skeleton information is obtained. This enriches the means of determining identity information based on gait and improves recognition accuracy.

As an alternative embodiment, as shown in fig. 5, the acquiring of the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence corresponding to the first contour map sequence includes:

S502, performing image segmentation processing on the first contour map to obtain a second contour map, and constructing a second contour map sequence corresponding to the initial image sequence by using the second contour map, wherein the second contour map comprises a partial image of the first contour map;

S504, performing image segmentation processing on the first skeleton map to obtain a second skeleton map, and constructing a second skeleton map sequence corresponding to the initial image sequence by using the second skeleton map, wherein the second skeleton map comprises a partial image of the first skeleton map.

Optionally, image segmentation processing is performed on the first contour map to obtain a second contour map containing the lower half of the human body, and on the first skeleton map to obtain a second skeleton map containing the lower half of the human body. The lower half of the body may be, but is not limited to, the part below the left and right hip joint points.
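The lower-body segmentation can be sketched as a simple row crop at the hip line. This is an illustrative assumption: the function name `lower_body` is hypothetical, and cutting at the higher of the two hip points is one possible reading of "below the left and right hip joint points".

```python
def lower_body(image, left_hip, right_hip):
    """Keep only the rows from the hip line downward.

    image: H x W nested list (a contour map or a skeleton map).
    left_hip, right_hip: (x, y) pixel coordinates of the hip joints.
    """
    # Image rows grow downward, so the smaller y is the higher hip.
    hip_row = min(left_hip[1], right_hip[1])
    return [row[:] for row in image[hip_row:]]


# The same crop is applied to both map types of a frame.
contour = [[r] * 2 for r in range(6)]
second_contour = lower_body(contour, (0, 3), (1, 4))
```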

In the embodiment of the application, the first contour map and the first skeleton map are segmented to obtain a second contour map and a second skeleton map of the lower body, which carries more gait features. These additional maps are obtained with a simple image processing step and enrich the dimensions of the gait feature vectors, which increases the number of vectors available for calculating gait similarity distances. Combining several gait similarity distances improves the accuracy of identity recognition through gait features.

As an alternative implementation, as shown in fig. 6, before the calculating the similar distances between the respective reference vectors of each of the sets of reference image sequences and the corresponding first contour vector, first bone vector, second contour vector and second bone vector, respectively, the method further includes:

S602, acquiring a first reference contour image sequence, a first reference skeleton image sequence, a second reference contour image sequence and a second reference skeleton image sequence corresponding to each group of reference image sequences;

S604, inputting the first reference contour image sequence, the first reference skeleton image sequence, the second reference contour image sequence and the second reference skeleton image sequence into the gait recognition model respectively to obtain a corresponding first reference contour vector, first reference skeleton vector, second reference contour vector and second reference skeleton vector.

Optionally, the same image processing as for the initial image sequence is performed on each group of reference image sequences to obtain a first reference contour image sequence, a first reference skeleton image sequence, a second reference contour image sequence and a second reference skeleton image sequence respectively.

Optionally, the gait recognition model processes the first reference contour image sequence with the same model parameters used for the first contour map sequence to obtain the first reference contour vector. Likewise, it processes the first reference skeleton image sequence with the model parameters used for the first skeleton map sequence to obtain the first reference skeleton vector, the second reference contour image sequence with the model parameters used for the second contour map sequence to obtain the second reference contour vector, and the second reference skeleton image sequence with the model parameters used for the second skeleton map sequence to obtain the second reference skeleton vector.
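The pairing of sequence types with parameter sets can be sketched as a lookup table of encoders that is shared between the probe (initial) sequences and the reference sequences. Everything named here is hypothetical: the keys in `SEQUENCE_TYPES`, the function `embed_sequences`, and the stub encoder, which merely stands in for a trained convolutional model with its own parameter set.

```python
SEQUENCE_TYPES = ("contour1", "skeleton1", "contour2", "skeleton2")  # hypothetical keys


def embed_sequences(sequences, encoders):
    """Encode each of the four sequences with the encoder trained for its type."""
    return {name: encoders[name](sequences[name]) for name in SEQUENCE_TYPES}


def make_stub_encoder(scale):
    """Placeholder for a trained gait model; returns a 1-element feature vector."""
    def encode(sequence):
        pixels = [p for image in sequence for row in image for p in row]
        return [scale * sum(pixels) / len(pixels)]
    return encode


# One encoder (one parameter set) per sequence type, reused for probe and reference.
encoders = {name: make_stub_encoder(i + 1) for i, name in enumerate(SEQUENCE_TYPES)}
probe = {name: [[[0, 255]]] for name in SEQUENCE_TYPES}
reference = {name: [[[0, 255]]] for name in SEQUENCE_TYPES}
probe_vectors = embed_sequences(probe, encoders)
reference_vectors = embed_sequences(reference, encoders)
```

Because the same `encoders` table serves both sides, identical inputs map to identical vectors, which is the property the text relies on when comparing distances.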

In the embodiment of the application, the reference image sequence undergoes the same image processing as the initial image sequence, and the gait recognition model produces the corresponding reference gait vectors with the same model parameters. The reference image sequence and the initial image sequence therefore yield matching pairs of gait feature vectors through the same processing and the same parameters. This ensures that the vectors used to calculate the gait similarity distance are obtained in the same way, which improves the accuracy of that calculation.
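The shared-parameter requirement described above can be sketched as follows. This is only an illustrative sketch: the dictionary keys, the `Embedder` type and the `toy_embedder` stand-in are hypothetical names that do not appear in the original, and in practice each embedder would be the trained gait recognition model with the parameters assigned to that sequence type.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[int]]  # hypothetical: a binary map stored as nested lists
Embedder = Callable[[Sequence[Image]], List[float]]

# Hypothetical keys for the four sequence types described in the text.
KINDS = ("contour_full", "skeleton_full", "contour_part", "skeleton_part")


def embed_all(sequences: Dict[str, Sequence[Image]],
              embedders: Dict[str, Embedder]) -> Dict[str, List[float]]:
    """Embed each of the four sequences with the embedder assigned to its kind.

    Passing the same `embedders` mapping for the initial sequence and for every
    reference sequence keeps both sides on identical model parameters.
    """
    return {kind: embedders[kind](sequences[kind]) for kind in KINDS}


def toy_embedder(sequence: Sequence[Image]) -> List[float]:
    """Stand-in for the gait recognition model (assumption, not the real network).

    Returns the mean foreground ratio over the frames and the frame count.
    """
    ratios = [sum(map(sum, img)) / (len(img) * len(img[0])) for img in sequence]
    return [sum(ratios) / len(ratios), float(len(sequence))]
```

Because the initial sequence and each reference sequence go through the same `embed_all` call with the same `embedders`, identical inputs produce identical vectors, which is the property the similarity distance relies on.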

As an alternative embodiment, as shown in fig. 7, the calculating the similarity distance between each reference vector of each of the sets of reference image sequences and the corresponding first contour vector, first bone vector, second contour vector and second bone vector includes:

S702, calculating the cosine distance between the first contour vector and the first reference contour vector to obtain a first candidate similarity distance;

S704, calculating the cosine distance between the first skeleton vector and the first reference skeleton vector to obtain a second candidate similarity distance;

S706, calculating the cosine distance between the second contour vector and the second reference contour vector to obtain a third similarity distance;

S708, calculating the cosine distance between the second skeleton vector and the second reference skeleton vector to obtain a fourth similarity distance.

Optionally, the cosine similarity between the two vectors is calculated, and the cosine distance is determined from it. Taking two example vectors A and B, the cosine similarity may be calculated by, but is not limited to, the following formula (1):

$$\mathrm{similarity}[AB] = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}} \quad (1)$$

The cosine distance between vector A and vector B may be calculated by, but is not limited to, the following formula (2):

$$D[AB] = 1 - \mathrm{similarity}[AB] \quad (2)$$

where D[AB] denotes the cosine distance between vector A and vector B, similarity[AB] denotes the cosine similarity between vector A and vector B, A_i and B_i denote the i-th components of the two vectors, and n is their common dimension.
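A minimal sketch of the cosine similarity and the cosine distance of formula (2), written with the Python standard library only. The function names are illustrative and do not come from the original; rejecting zero vectors is an added assumption, since the cosine similarity is undefined for them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Formula (2): D[AB] = 1 - similarity[AB]."""
    return 1.0 - cosine_similarity(a, b)
```

With this definition the distance is 0 for vectors pointing in the same direction, 1 for orthogonal vectors and 2 for opposite vectors, so a smaller value indicates a closer gait match.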

In the embodiment of the application, the gait similar distance between the gait feature vectors is determined by calculating the cosine similarity, the gait similar distance between four pairs of gait feature vectors is sequentially calculated, and the target reference image sequence is determined from the multiple groups of reference image sequences by the gait similar distance of each group of reference image sequences in the multiple groups of reference image sequences so as to acquire the identity information of the target object in the initial image sequence.

As an alternative embodiment, as shown in fig. 8, after the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence corresponding to the first contour map sequence are obtained, the method further includes:

S802, extracting an initial central axis deflection angle value from the first skeleton image, and constructing an initial deflection angle vector corresponding to the initial image sequence by using the initial central axis deflection angle value;

S804, extracting a reference central axis deflection angle value from the first reference skeleton image, and constructing a reference deflection angle vector corresponding to the reference image sequence by using the reference central axis deflection angle value;

S806, calculating the cosine distance between the initial deflection angle vector and the reference deflection angle vector to obtain the correction distance.

Optionally, extracting the initial central axis deflection angle value may include, but is not limited to:

s1, connecting the left shoulder joint point and the right shoulder joint point, and taking the midpoint of a connecting line segment of the left shoulder joint point and the right shoulder joint point as P1;

s2, connecting the left hip joint point and the right hip joint point, and taking the midpoint of a connecting line segment of the left hip joint point and the right hip joint point as P2;

s3, connecting P1 and P2, wherein the line segment P1P2 is the central axis of the human body object;

and S4, acquiring an included angle between the central axis and the vertical direction of the image as an initial central axis deflection angle.

Optionally, following the above steps, an initial central axis deflection angle value is determined for each first skeleton image in turn, and the resulting values are assembled into the initial deflection angle vector in the order in which the first skeleton images appear in the first skeleton image sequence. The reference deflection angle vector of a reference image sequence is obtained in the same way.
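Steps S1 to S4 and the assembly of the deflection angle vector can be sketched as follows. The joint names used as dictionary keys, the use of (x, y) pixel coordinates, and the choice of an unsigned angle in degrees are assumptions made for illustration; the original does not specify them.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) pixel coordinates


def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def central_axis_angle(joints: Dict[str, Point]) -> float:
    """Unsigned angle, in degrees, between segment P1P2 and the image vertical."""
    p1 = midpoint(joints["left_shoulder"], joints["right_shoulder"])  # S1
    p2 = midpoint(joints["left_hip"], joints["right_hip"])            # S2
    dx = p2[0] - p1[0]                                                # S3: axis P1P2
    dy = p2[1] - p1[1]
    if dx == 0.0 and dy == 0.0:
        raise ValueError("shoulder and hip midpoints coincide")
    # S4: horizontal offset over vertical offset gives the tilt from vertical.
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def deflection_vector(frames: Sequence[Dict[str, Point]]) -> List[float]:
    """One angle per skeleton image, kept in sequence order."""
    return [central_axis_angle(joints) for joints in frames]
```

An upright torso gives an angle of 0, and the angle grows as the torso leans. The same two functions would be applied to the reference skeleton images to build the reference deflection angle vector.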

Optionally, the cosine similarity between the initial deflection angle vector and the reference deflection angle vector is calculated, and the cosine distance determined from it is taken as the correction distance.

In the embodiment of the application, the line connecting the midpoint of the segment between the left and right shoulder joints with the midpoint of the segment between the left and right hip joints is used as the central axis, from which the central axis deflection angle vector is obtained. The deflection angle distance is calculated and used as the correction distance to correct the obtained gait similarity distance, so that the final gait similarity distance is more accurate.

As an alternative embodiment, the calculating the similarity distance between each reference vector of each of the reference image sequences in each of the multiple sets of reference image sequences and the corresponding first contour vector, first bone vector, second contour vector, and second bone vector includes: under the condition of obtaining the first candidate similar distance, correcting the first candidate similar distance by using the corrected distance to obtain the first similar distance; and under the condition of obtaining the second candidate similar distance, correcting the second candidate similar distance by using the corrected distance to obtain the second similar distance.

Optionally, the correction distance is used to correct the gait similarity distance; this may be, but is not limited to, correcting the first candidate similarity distance and the second candidate similarity distance. One specific manner, without limitation, is to take the product of the correction distance and the first candidate similarity distance as the first similarity distance, and the product of the correction distance and the second candidate similarity distance as the second similarity distance.

In the embodiment of the application, the gait similarity distances calculated from the complete gait features are corrected with the correction distance, so that the resulting first similarity distance and second similarity distance more accurately represent the gait similarity distance between the complete contour map sequences and between the complete skeleton map sequences, respectively.
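The four similarity distances, with the product-based correction applied to the first two, can be sketched as follows. The dictionary keys and function names are hypothetical, and the product rule is only the one manner named in the text, which states that the correction is not limited to it.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def four_distances(query: Dict[str, Vector],
                   reference: Dict[str, Vector],
                   correction: float) -> Dict[str, float]:
    """Return the first to fourth similarity distances for one reference group.

    `correction` is the cosine distance between the two deflection angle vectors.
    Only the distances computed from the complete maps are multiplied by it.
    """
    raw = {kind: cosine_distance(query[kind], reference[kind]) for kind in query}
    return {
        "first": raw["contour_full"] * correction,    # corrected candidate distance
        "second": raw["skeleton_full"] * correction,  # corrected candidate distance
        "third": raw["contour_part"],                 # partial contour, uncorrected
        "fourth": raw["skeleton_part"],               # partial skeleton, uncorrected
    }
```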

As an alternative implementation, as shown in fig. 9, the determining the target reference image sequence according to the similarity distance includes:

s902, sorting the reference image sequences in the multiple groups of reference image sequences according to the sequence of the numerical values of the first similar distances from small to large to obtain a first candidate image sequence;

s904, sequencing the reference image sequences in the multiple groups of reference image sequences according to the sequence of the numerical values of the second similar distances from small to large to obtain a second candidate image sequence;

s906, sequencing the reference image sequences in the multiple groups of reference image sequences according to the sequence of the numerical values of the third similarity distance from small to large to obtain a third candidate image sequence;

s908, according to the sequence of the numerical values of the fourth similar distance from small to large, the reference image sequences in the multiple groups of reference image sequences are sequenced to obtain a fourth candidate image sequence;

s910, determining a target reference image sequence according to the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence.

Optionally, the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence are each an ordering of the multiple groups of reference image sequences under a different sorting criterion. Sorting changes only the position of each group of reference image sequences within the ordering; the order of the reference images inside each group is neither adjusted nor changed.
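Steps S902 to S908 amount to sorting the same set of reference groups four times, once per distance. A minimal sketch, assuming each reference group is identified by a string id and its four distances are stored under the hypothetical keys "first" to "fourth":

```python
from typing import Dict, List

DISTANCE_NAMES = ("first", "second", "third", "fourth")  # hypothetical keys


def rank_references(distances: Dict[str, float]) -> List[str]:
    """Reference ids ordered from the smallest distance to the largest."""
    return sorted(distances, key=lambda ref_id: distances[ref_id])


def build_candidate_orderings(
        per_reference: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Build the four candidate image sequences from per-reference distances."""
    return {
        name: rank_references({ref_id: d[name] for ref_id, d in per_reference.items()})
        for name in DISTANCE_NAMES
    }
```

Only the ids are reordered; the frames inside each reference group are not touched, which matches the statement that the position of each reference image within its group does not change. Python's `sorted` is stable, so groups with equal distances keep their original relative order.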

As an alternative implementation, determining the target reference image sequence according to the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence includes:

S1, sequentially acquiring the candidate reference image sequences located at the current ordinal position in the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence;

S2, taking the candidate reference image sequence as the target reference image sequence when the four candidate reference image sequences are the same reference image sequence;

S3, acquiring the candidate reference image sequences corresponding to the next ordinal position when the four candidate reference image sequences are not the same reference image sequence.

Optionally, starting from the first ordinal position, it is determined whether the candidate reference image sequences located at the first position in the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence are the same reference image sequence. If the four candidate reference image sequences at the first position are the same reference image sequence, that reference image sequence is taken as the target reference image sequence. If they are not, the second position is examined, and so on position by position, until a position is found at which the four candidate reference image sequences are the same reference image sequence; that sequence is taken as the target reference image sequence.
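The position-by-position agreement check can be sketched as follows. The function name and the use of string ids for reference groups are assumptions; returning `None` when no position agrees is an illustrative way to signal that no target reference image sequence was determined.

```python
from typing import List, Optional, Sequence


def select_target(orderings: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the first reference id that all orderings place at the same position.

    `orderings` holds the four candidate image sequences, each a list of
    reference ids sorted by one of the four similarity distances.
    """
    if not orderings:
        return None
    depth = min(len(ordering) for ordering in orderings)
    for position in range(depth):
        candidates = {ordering[position] for ordering in orderings}
        if len(candidates) == 1:
            # All orderings agree at this ordinal position.
            return orderings[0][position]
    return None  # no agreement at any position


rankings: List[List[str]] = [
    ["b", "a", "c"],
    ["b", "c", "a"],
    ["b", "a", "c"],
    ["b", "c", "a"],
]
target = select_target(rankings)  # "b": all four orderings agree at position 0
```

A caller can treat a `None` result as a failed recognition and report it accordingly.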

In the embodiment of the application, the multiple groups of reference image sequences are reordered according to the similarity distances calculated in the four ways, the obtained four candidate reference image sequences are sequentially judged according to the sequence positions, and under the condition that the four candidate reference image sequences located on the same sequence position are the same reference image sequence, the four candidate reference image sequences are determined as the target reference image sequence, so that the target reference image sequence matched with the initial image sequence is obtained. By comparing the four candidate reference image sequences with each other, the gait contour map and the joint skeleton map as well as all gait features and partial gait features are synthesized, and the accuracy of identity recognition based on the gait features is improved by four different gait similar distances from two different dimensions.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

According to another aspect of the embodiment of the invention, an identity recognition device for implementing the identity recognition method is also provided. As shown in fig. 10, the apparatus includes:

the processing unit 1002 is configured to perform image processing on each initial image in an initial image sequence corresponding to a target video, to obtain a first contour map sequence formed by first contour maps corresponding to the initial images, where the initial images are video frames containing a target object in the target video;

an obtaining unit 1004, configured to obtain a first skeleton map sequence, a second contour map sequence, and a second skeleton map sequence corresponding to the first contour map sequence, where the first skeleton map sequence includes first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence includes second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence includes second skeleton maps respectively corresponding to the first contour maps;

the input unit 1006 is configured to input the first contour map sequence, the first skeleton map sequence, the second contour map sequence, and the second skeleton map sequence into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector, and a second skeleton vector, where the gait recognition model is a multilayer convolutional neural network model;

a calculating unit 1008, configured to calculate similar distances between each reference vector of each group of reference image sequences in the multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector, and second skeleton vector, respectively;

the determining unit 1010 is configured to determine a target reference image sequence according to the similar distance, and determine identity information corresponding to the target reference image sequence as identity information of the target object.

Optionally, the obtaining unit 1004 further includes a first constructing module, configured to perform joint position extraction on the initial image by using a joint extraction model, so as to obtain joint coordinates corresponding to the joint position; mapping the joint coordinates to the first contour map to obtain a first skeleton map; using the first bone map, a first bone map sequence corresponding to the initial image sequence is constructed.
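One way to picture mapping joint coordinates onto a contour map is sketched below. The representation is entirely an assumption: the contour map is taken to be a nested list of 0 and 1 values, joints are (x, y) pixel coordinates, and joint pixels are marked with the value 2. The original does not describe how the skeleton map is rendered.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def map_joints(contour: Image,
               joints: Sequence[Tuple[float, float]],
               joint_value: int = 2) -> Image:
    """Return a copy of the contour map with each joint pixel set to joint_value."""
    height = len(contour)
    width = len(contour[0]) if height else 0
    skeleton = [row[:] for row in contour]  # leave the contour map untouched
    for x, y in joints:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:  # ignore joints outside the frame
            skeleton[row][col] = joint_value
    return skeleton


def build_skeleton_sequence(contours: Sequence[Image],
                            joints_per_frame: Sequence[Sequence[Tuple[float, float]]]
                            ) -> List[Image]:
    """Apply map_joints frame by frame, preserving the sequence order."""
    return [map_joints(c, j) for c, j in zip(contours, joints_per_frame)]
```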

Optionally, the obtaining unit 1004 further includes:

the second construction module is used for carrying out image segmentation processing on the first contour map to obtain a second contour map, and constructing a second contour map sequence corresponding to the initial image sequence by using the second contour map, wherein the second contour map comprises a partial image of the first contour map;

and the third construction module is used for carrying out image segmentation processing on the first bone image to obtain a second bone image, and constructing a second bone image sequence corresponding to the initial image sequence by using the second bone image, wherein the second bone image comprises a partial image of the first bone image.
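The segmentation performed by the second and third construction modules produces a partial image of each map. A minimal sketch, assuming the partial image is a horizontal band selected by row fractions; which body region is kept is not stated in this passage, so the band limits below are purely illustrative.

```python
from typing import List, Sequence

Image = List[List[int]]


def crop_rows(image: Image, start_fraction: float, end_fraction: float) -> Image:
    """Keep the rows between the two fractions of the image height."""
    if not 0.0 <= start_fraction < end_fraction <= 1.0:
        raise ValueError("fractions must satisfy 0 <= start < end <= 1")
    height = len(image)
    start = int(height * start_fraction)
    end = int(height * end_fraction)
    return [row[:] for row in image[start:end]]


def build_partial_sequence(images: Sequence[Image],
                           start_fraction: float,
                           end_fraction: float) -> List[Image]:
    """Crop every map of a sequence with the same band, preserving order."""
    return [crop_rows(img, start_fraction, end_fraction) for img in images]
```

Applying the same band to the first contour maps and to the first skeleton maps keeps the second contour map sequence and the second skeleton map sequence aligned on the same body region.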

Optionally, the identity recognition apparatus further includes a reference obtaining unit, configured to obtain a first reference contour image sequence, a first reference skeleton image sequence, a second reference contour image sequence, and a second reference skeleton image sequence corresponding to each group of reference image sequences before the similarity distances between each reference vector of each group of reference image sequences in the plurality of groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector, and second skeleton vector are calculated; and to input the first reference contour image sequence, the first reference skeleton image sequence, the second reference contour image sequence and the second reference skeleton image sequence into the gait recognition model respectively to obtain a corresponding first reference contour vector, first reference skeleton vector, second reference contour vector and second reference skeleton vector.

Optionally, the calculating unit 1008 is further configured to calculate a cosine distance between the first contour vector and the first reference contour vector, so as to obtain a first candidate similar distance; calculating the cosine distance between the first skeleton vector and the first reference skeleton vector to obtain a second candidate similar distance; calculating the cosine distance between the second contour vector and the second reference contour vector to obtain a third similarity distance; and calculating the cosine distance between the second skeleton vector and the second reference skeleton vector to obtain a fourth similar distance.

Optionally, the identity recognition apparatus further includes a correction unit, configured to extract an initial central axis deflection angle value from the first skeleton image after the first skeleton map sequence, the second contour map sequence, and the second skeleton map sequence corresponding to the first contour map sequence are acquired, and construct an initial deflection angle vector corresponding to the initial image sequence by using the initial central axis deflection angle value; extract a reference central axis deflection angle value from the first reference skeleton image, and construct a reference deflection angle vector corresponding to the reference image sequence by using the reference central axis deflection angle value; and calculate the cosine distance between the initial deflection angle vector and the reference deflection angle vector to obtain the correction distance.

Optionally, the calculating unit 1008 is further configured to, in a case that the first candidate similar distance is obtained, correct the first candidate similar distance by using the corrected distance, so as to obtain the first similar distance; and under the condition of obtaining the second candidate similar distance, correcting the second candidate similar distance by using the corrected distance to obtain the second similar distance.

Optionally, the determining unit 1010 is further configured to sort the reference image sequences in the multiple groups of reference image sequences in ascending order of the first similarity distance to obtain a first candidate image sequence; sort the reference image sequences in the multiple groups of reference image sequences in ascending order of the second similarity distance to obtain a second candidate image sequence; sort the reference image sequences in the multiple groups of reference image sequences in ascending order of the third similarity distance to obtain a third candidate image sequence; sort the reference image sequences in the multiple groups of reference image sequences in ascending order of the fourth similarity distance to obtain a fourth candidate image sequence; and determine a target reference image sequence according to the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence.

Optionally, the determining unit 1010 is further configured to sequentially obtain the candidate reference image sequences located at the current ordinal position in the first candidate image sequence, the second candidate image sequence, the third candidate image sequence, and the fourth candidate image sequence; take the candidate reference image sequence as the target reference image sequence when the four candidate reference image sequences are the same reference image sequence; and obtain the candidate reference image sequences corresponding to the next ordinal position when the four candidate reference image sequences are not the same reference image sequence.

Optionally, the identification apparatus further includes a prompting unit, configured to prompt that identification of the target object fails when the target reference image sequence is not determined from the first candidate image sequence, the second candidate image sequence, the third candidate image sequence, and the fourth candidate image sequence.

According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the above identity recognition method, where the electronic device may be the terminal device or the server shown in fig. 1. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 11, the electronic device comprises a memory 1102 and a processor 1104, wherein the memory 1102 stores a computer program and the processor 1104 is arranged to execute the steps of any of the above method embodiments by means of the computer program.

Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

s1, performing image processing on each initial image in the initial image sequence corresponding to the target video to obtain a first contour map sequence formed by first contour maps corresponding to the initial images, wherein the initial images are video frames containing the target object in the target video;

S2, acquiring a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence, wherein the first skeleton map sequence comprises first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence comprises second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence comprises second skeleton maps respectively corresponding to the first contour maps;

s3, inputting the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector, wherein the gait recognition model is a multilayer convolutional neural network model;

s4, respectively calculating the similarity distance between each reference vector of each group of reference image sequence in the multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector;

and S5, determining the target reference image sequence according to the similar distance, and determining the identity information corresponding to the target reference image sequence as the identity information of the target object.

Optionally, those skilled in the art will understand that the structure shown in fig. 11 is only illustrative, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 11, or have a different configuration from that shown in fig. 11.

The memory 1102 may be used to store software programs and modules, such as program instructions/modules corresponding to the identification method and apparatus in the embodiments of the present invention, and the processor 1104 executes various functional applications and data processing by operating the software programs and modules stored in the memory 1102, that is, implementing the identification method described above. The memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1102 can further include memory located remotely from the processor 1104 and such remote memory can be coupled to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be specifically, but not limited to, used for storing information such as an initial image sequence, a reference image sequence, a gait recognition model, and the like. As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, a processing unit 1002, an obtaining unit 1004, an input unit 1006, a calculating unit 1008, and a determining unit 1010 of the identification device. In addition, other module units in the above identity recognition apparatus may also be included, but are not limited to these, and are not described in detail in this example.

Optionally, the transmission device 1106 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1106 includes a Network Interface Controller (NIC), which can be connected to a router via a network cable to communicate with the internet or a local area network. In another example, the transmission device 1106 is a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

In addition, the electronic device further includes: a display 1108 for displaying the initial image sequence; and a connection bus 1110 for connecting the respective module components in the above-described electronic apparatus.

In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.

According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the identification aspect described above. Wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.

Optionally, in this embodiment, the above computer-readable storage medium may be configured to store a computer program for executing steps S1 to S5 described above for the processor.

Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. An identity recognition method, comprising:

respectively carrying out image processing on each initial image in an initial image sequence corresponding to a target video to obtain a first contour map sequence formed by first contour maps corresponding to the initial images, wherein the initial images are video frames containing target objects in the target video;

acquiring a first skeleton map sequence, a second contour map sequence and a second skeleton map sequence corresponding to the first contour map sequence, wherein the first skeleton map sequence comprises first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence comprises second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence comprises second skeleton maps respectively corresponding to the first contour maps;

inputting the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into a gait recognition model respectively to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector, wherein the gait recognition model is a multilayer convolutional neural network model;

respectively calculating similarity distances between the reference vectors of each group of reference image sequences in multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector;

and determining a target reference image sequence according to the similarity distances, and determining identity information corresponding to the target reference image sequence as the identity information of the target object.

2. The method of claim 1, wherein said acquiring a first skeleton map sequence, a second contour map sequence, and a second skeleton map sequence corresponding to said first contour map sequence comprises:

extracting the joint position of the initial image by using a joint extraction model to obtain joint coordinates corresponding to the joint position;

mapping the joint coordinates into the first contour map to obtain the first skeleton map;

and constructing the first skeleton map sequence corresponding to the initial image sequence by using the first skeleton map.

3. The method of claim 2, wherein said acquiring a first skeleton map sequence, a second contour map sequence, and a second skeleton map sequence corresponding to said first contour map sequence further comprises:

performing image segmentation processing on the first contour map to obtain a second contour map, and constructing a second contour map sequence corresponding to the initial image sequence by using the second contour map, wherein the second contour map comprises a partial image of the first contour map;

and performing image segmentation processing on the first skeleton map to obtain a second skeleton map, and constructing a second skeleton map sequence corresponding to the initial image sequence by using the second skeleton map, wherein the second skeleton map comprises a partial image of the first skeleton map.
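By way of illustration only, the segmentation recited in claim 3 can be sketched as a row-wise crop applied to every map in a sequence. The claim states only that the second map comprises a partial image of the first map; the choice of the lower half of the frame, the nested-list image representation, and the function names `crop_rows` and `build_partial_sequence` are assumptions made for this sketch and are not taken from the disclosure.

```python
def crop_rows(image, start_fraction, end_fraction):
    """Return a copy of the rows of `image` between two height fractions.

    `image` is a list of rows, each row a list of pixel values
    (an assumed representation of a contour or skeleton map).
    """
    height = len(image)
    start = int(height * start_fraction)
    end = int(height * end_fraction)
    if not 0 <= start < end <= height:
        raise ValueError("crop fractions select no rows")
    # Copy each row so the partial map does not alias the original map.
    return [row[:] for row in image[start:end]]


def build_partial_sequence(sequence, start_fraction=0.5, end_fraction=1.0):
    """Crop every map in a sequence; the lower half is an assumed default."""
    return [crop_rows(image, start_fraction, end_fraction) for image in sequence]
```

Under these assumptions the same helper would be applied once to the first contour map sequence and once to the first skeleton map sequence, so that the two partial sequences cover the same region of each frame.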

4. The method of claim 1, wherein before respectively calculating the similarity distances between the reference vectors of each group of reference image sequences in the multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector, the method further comprises:

acquiring a first reference contour image sequence, a first reference skeleton image sequence, a second reference contour image sequence and a second reference skeleton image sequence corresponding to each group of reference image sequences;

and respectively inputting the first reference contour image sequence, the first reference skeleton image sequence, the second reference contour image sequence and the second reference skeleton image sequence into the gait recognition model to obtain a corresponding first reference contour vector, a corresponding first reference skeleton vector, a corresponding second reference contour vector and a corresponding second reference skeleton vector.

5. The method of claim 4, wherein respectively calculating the similarity distances between the reference vectors of each group of reference image sequences in the multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector comprises:

calculating the cosine distance between the first contour vector and the first reference contour vector to obtain a first candidate similarity distance;

calculating the cosine distance between the first skeleton vector and the first reference skeleton vector to obtain a second candidate similarity distance;

calculating the cosine distance between the second contour vector and the second reference contour vector to obtain a third similarity distance;

and calculating the cosine distance between the second skeleton vector and the second reference skeleton vector to obtain a fourth similarity distance.
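By way of illustration only, the four cosine distances of claim 5 can be sketched as follows. The claims do not define the cosine distance; the sketch assumes the common convention of one minus the cosine similarity. The dictionary keys in `FEATURE_KEYS` and the function names are hypothetical and serve only to pair each probe vector with the reference vector of the same type.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity (assumed convention) of two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical keys for the first/second contour and skeleton vectors.
FEATURE_KEYS = ("contour_1", "skeleton_1", "contour_2", "skeleton_2")


def pairwise_distances(probe, reference):
    """Compute one cosine distance per feature type.

    `probe` and `reference` map each key in FEATURE_KEYS to the vector
    produced by the gait recognition model for that input sequence.
    """
    return {key: cosine_distance(probe[key], reference[key]) for key in FEATURE_KEYS}
```

With this convention a distance near 0 indicates vectors pointing in nearly the same direction, which is consistent with claim 8 ranking reference image sequences from the smallest distance to the largest.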

6. The method of claim 5, wherein after acquiring the first skeleton map sequence, the second contour map sequence, and the second skeleton map sequence corresponding to the first contour map sequence, the method further comprises:

extracting an initial central axis deflection angle value from the first skeleton map, and constructing an initial deflection angle vector corresponding to the initial image sequence by using the initial central axis deflection angle value;

extracting a reference central axis deflection angle value from the first reference skeleton image, and constructing a reference deflection angle vector corresponding to the reference image sequence by using the reference central axis deflection angle value;

and calculating the cosine distance between the initial deflection angle vector and the reference deflection angle vector to obtain a corrected distance.

7. The method of claim 6, wherein respectively calculating the similarity distances between the reference vectors of each group of reference image sequences in the multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector comprises:

in a case where the first candidate similarity distance is obtained, correcting the first candidate similarity distance by using the corrected distance to obtain a first similarity distance;

and in a case where the second candidate similarity distance is obtained, correcting the second candidate similarity distance by using the corrected distance to obtain a second similarity distance.

8. The method of claim 7, wherein the determining the target reference image sequence according to the similarity distance comprises:

sorting the reference image sequences in the multiple groups of reference image sequences in ascending order of the first similarity distance to obtain a first candidate image sequence;

sorting the reference image sequences in the multiple groups of reference image sequences in ascending order of the second similarity distance to obtain a second candidate image sequence;

sorting the reference image sequences in the multiple groups of reference image sequences in ascending order of the third similarity distance to obtain a third candidate image sequence;

sorting the reference image sequences in the multiple groups of reference image sequences in ascending order of the fourth similarity distance to obtain a fourth candidate image sequence;

and determining a target reference image sequence according to the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence.

9. The method of claim 8, wherein determining a target reference image sequence from the first candidate image sequence, the second candidate image sequence, the third candidate image sequence, and the fourth candidate image sequence comprises:

sequentially acquiring, from the first candidate image sequence, the second candidate image sequence, the third candidate image sequence and the fourth candidate image sequence, the candidate reference image sequences located at a current ordinal position;

taking the candidate reference image sequence as the target reference image sequence in a case where the candidate reference image sequences are all the same reference image sequence;

and in a case where the candidate reference image sequences are different reference image sequences, acquiring the candidate reference image sequences located at the next ordinal position.

10. The method according to any one of claims 1 to 9, further comprising:

prompting that identification of the target object has failed if no target reference image sequence is determined from the first candidate image sequence, the second candidate image sequence, the third candidate image sequence, and the fourth candidate image sequence.
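By way of illustration only, the ranking and selection of claims 8 to 10 can be sketched as follows. Each input is assumed to be a dictionary mapping a hypothetical reference identifier to one of the four similarity distances; the function names and the use of `None` to signal a failed identification are assumptions of this sketch, not part of the disclosure.

```python
def rank_by_distance(distances):
    """Return reference ids sorted by ascending distance (claim 8)."""
    return sorted(distances, key=distances.get)


def select_target(dist_1, dist_2, dist_3, dist_4):
    """Walk the four rankings position by position (claim 9).

    Returns the first reference id that occupies the same ordinal
    position in all four rankings, or None when no position agrees,
    in which case the caller would report a failed identification
    (claim 10).
    """
    rankings = [rank_by_distance(d) for d in (dist_1, dist_2, dist_3, dist_4)]
    depth = min(len(ranking) for ranking in rankings)
    for position in range(depth):
        candidates = {ranking[position] for ranking in rankings}
        if len(candidates) == 1:
            return candidates.pop()
    return None
```

In this sketch, reference image sequences with equal distances keep their insertion order, because `sorted` is stable; the claims do not say how ties are to be resolved.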

11. An identification device, comprising:

a processing unit, configured to respectively perform image processing on each initial image in an initial image sequence corresponding to a target video to obtain a first contour map sequence formed by first contour maps corresponding to the initial images, wherein the initial images are video frames containing target objects in the target video;

an obtaining unit, configured to obtain a first skeleton map sequence, a second contour map sequence, and a second skeleton map sequence corresponding to the first contour map sequence, wherein the first skeleton map sequence includes first skeleton maps respectively corresponding to the first contour maps, the second contour map sequence includes second contour maps respectively corresponding to the first contour maps, and the second skeleton map sequence includes second skeleton maps respectively corresponding to the first contour maps;

an input unit, configured to respectively input the first contour map sequence, the first skeleton map sequence, the second contour map sequence and the second skeleton map sequence into a gait recognition model to obtain a first contour vector, a first skeleton vector, a second contour vector and a second skeleton vector, wherein the gait recognition model is a multilayer convolutional neural network model;

a calculating unit, configured to respectively calculate similarity distances between the reference vectors of each group of reference image sequences in multiple groups of reference image sequences and the corresponding first contour vector, first skeleton vector, second contour vector and second skeleton vector;

and a determining unit, configured to determine a target reference image sequence according to the similarity distances and to determine the identity information corresponding to the target reference image sequence as the identity information of the target object.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program which when executed performs the method of any of claims 1 to 10.

13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 10 by means of the computer program.