CN113420697B - Reloading video pedestrian re-identification method and system based on appearance and shape characteristics - Google Patents

Reloading video pedestrian re-identification method and system based on appearance and shape characteristics

Info

Publication number
CN113420697B
CN113420697B CN202110748180.1A CN202110748180A CN113420697B
Authority
CN
China
Prior art keywords
pedestrian
shape
video
features
apparent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110748180.1A
Other languages
Chinese (zh)
Other versions
CN113420697A (en)
Inventor
Wang Liang
Huang Yan
Shan Caifeng
Han Ke
Wang Haibin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cas Artificial Intelligence Research Qingdao Co ltd
Original Assignee
Cas Artificial Intelligence Research Qingdao Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cas Artificial Intelligence Research Qingdao Co ltd filed Critical Cas Artificial Intelligence Research Qingdao Co ltd
Priority to CN202110748180.1A priority Critical patent/CN113420697B/en
Publication of CN113420697A publication Critical patent/CN113420697A/en
Application granted granted Critical
Publication of CN113420697B publication Critical patent/CN113420697B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

An acquired query video to be identified and the search library (gallery) videos are input together into a trained deep neural network model, and pedestrian features are extracted; the Euclidean distances between the features of the query video and the features of the search library videos are calculated, and identity matches are ranked according to these distances. The feature fusion module of the deep neural network model adaptively fuses the appearance features and the shape features into more discriminative pedestrian features, so that the model generalizes to different degrees of clothing change and achieves a better recognition effect.

Description

Reloading video pedestrian re-identification method and system based on appearance and shape characteristics
Technical Field
The disclosure belongs to the technical field of machine learning, and particularly relates to a reloading video pedestrian re-identification method and system based on appearance and shape features.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Video pedestrian re-identification aims, given a query video of a pedestrian under one camera, to match videos of the same pedestrian in the video library (gallery) of another camera. As street cameras are deployed ever more widely, pedestrian re-identification shows broad application prospects in security and related fields; for example, scenarios such as finding lost children and tracking suspects can be assisted by pedestrian re-identification technology.
Reloading (clothes-changing) video pedestrian re-identification refers to the situation in which the same pedestrian wears different clothes in the query video and the video library (gallery). Many previous methods discard the appearance information of pedestrians (the colors and styles of clothes and shoes, etc.) and use only shape information (human body contours, etc.) to identify pedestrians. Such methods suit only drastic clothing changes, for example a person changing from a red dress into a white shirt and black pants. In practical application scenarios, however, a person may well change clothes only slightly, for example replacing gray short sleeves with black short sleeves, and in this case the appearance information of pedestrians can still provide useful identity cues.
Disclosure of Invention
The present disclosure can adaptively learn coarse-grained appearance information and fine-grained shape information of pedestrians, perform adaptive feature fusion, and construct richer pedestrian features, so as to achieve a better pedestrian recognition effect.
In order to achieve the above purpose, the present disclosure is achieved by the following technical solutions:
In a first aspect, the present disclosure provides a reloading video pedestrian re-identification method based on appearance and shape features, including:
establishing a deep neural network model;
based on the deep neural network model, extracting pedestrian apparent characteristics and pedestrian shape characteristics of the pedestrian video in the training set;
fusing the pedestrian appearance characteristic and the pedestrian shape characteristic to obtain a final pedestrian characteristic;
carrying out deep neural network model training by adopting the pedestrian appearance characteristic, the pedestrian shape characteristic and the fused final pedestrian characteristic to obtain a trained model;
inputting the query video and the search library videos into the trained model to extract pedestrian features;
calculating the Euclidean distances between the pedestrian features of the query video and those of the search library videos, and ranking identity matches according to these distances.
Further, the deep neural network model comprises an appearance encoder for extracting coarse-grained appearance features of the pedestrian video, a shape encoder for extracting fine-grained shape features of the pedestrian video, and a feature fusion module for adaptively fusing the appearance features and the shape features into the final pedestrian features.
Further, extracting the pedestrian appearance features and pedestrian shape features of the pedestrian videos in the training set comprises:
inputting the pedestrian video into the appearance encoder and extracting pedestrian appearance features frame by frame to obtain appearance feature maps; average-pooling the appearance feature maps corresponding to the frames and aggregating them into an appearance feature map of the video; then applying global average pooling to the appearance feature map of the video to obtain the appearance feature vector of the video; and defining an appearance loss function using the identity label data in the dataset;
simultaneously performing human body segmentation on the pedestrian video to obtain a binary pedestrian segmentation video; inputting the segmentation video into the shape encoder and extracting pedestrian shape features frame by frame to obtain shape feature maps; max-pooling the shape feature maps corresponding to the frames, aggregating them into a shape feature map of the pedestrian, and evenly dividing it horizontally into a plurality of sub-feature maps; performing max pooling within each sub-feature map and passing the pooled feature vectors through a plurality of fully connected layers respectively, obtaining a plurality of feature vectors that respectively represent a plurality of horizontal regions of the pedestrian; concatenating the plurality of feature vectors to form the final feature vector representing the pedestrian's shape; and defining a shape loss function using the identity label data in the dataset. A sketch of these two branches follows.
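By way of non-limiting illustration, the two extraction branches described above might be sketched in PyTorch as follows; the module names, the stand-in encoder architectures, the stripe count J and all tensor shapes are assumptions for illustration only, not the patented implementation.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class AppearanceBranch(nn.Module):
        """Frame-wise appearance maps -> temporal average pooling -> global average pooling."""
        def __init__(self):
            super().__init__()
            backbone = models.resnet50(weights=None)  # assumed ResNet50-style backbone
            self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # keep conv feature maps

        def forward(self, video):                        # video: (B, T, 3, H, W)
            B, T = video.shape[:2]
            maps = self.encoder(video.flatten(0, 1))     # (B*T, 2048, h, w)
            maps = maps.view(B, T, *maps.shape[1:]).mean(dim=1)  # average-pool over frames
            return maps.mean(dim=(2, 3))                 # global average pooling -> (B, 2048)

    class ShapeBranch(nn.Module):
        """Frame-wise shape maps -> temporal max pooling -> J horizontal stripes,
        each max-pooled and passed through its own fully connected layer."""
        def __init__(self, in_dim=256, out_dim=128, J=8):
            super().__init__()
            self.encoder = nn.Sequential(                # stand-in for the shape encoder CNN
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, in_dim, 3, padding=1), nn.ReLU())
            self.fcs = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(J))
            self.J = J

        def forward(self, masks):                        # masks: (B, T, 1, H, W) binary silhouettes
            B, T = masks.shape[:2]
            maps = self.encoder(masks.flatten(0, 1))
            maps = maps.view(B, T, *maps.shape[1:]).max(dim=1).values  # max-pool over frames
            stripes = maps.chunk(self.J, dim=2)          # split evenly along the height axis
            parts = [fc(s.amax(dim=(2, 3))) for fc, s in zip(self.fcs, stripes)]
            return torch.cat(parts, dim=1)               # concatenated shape feature vector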
Further, fusing the pedestrian appearance features and the pedestrian shape features comprises:
normalizing the feature vectors of the pedestrian appearance features and the pedestrian shape features respectively;
performing weight prediction: concatenating the normalized feature vectors and generating two weight vectors through two convolution layers with kernel size 1 × 2 respectively, then obtaining an appearance weight vector and a shape weight vector after a Softmax function;
performing feature transformation: inputting the normalized feature vectors into two convolution layers with kernel size 1 × 1 respectively, and obtaining the transformed appearance feature vector and shape feature vector through a Sigmoid function;
the feature vector that ultimately represents the pedestrian is a weighted sum of the appearance and shape feature vectors, as sketched below.
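A minimal sketch of this fusion step, assuming both input vectors have already been projected to a common dimension d; the way the two vectors are stacked so that a 1 × 2 kernel mixes them is an illustrative reading of the description, not necessarily the patented layout.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureIntegration(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv_wa = nn.Conv2d(1, 1, kernel_size=(1, 2))  # weight prediction, 1x2 kernels
            self.conv_ws = nn.Conv2d(1, 1, kernel_size=(1, 2))
            self.conv_ta = nn.Conv2d(1, 1, kernel_size=1)       # feature transformation, 1x1 kernels
            self.conv_ts = nn.Conv2d(1, 1, kernel_size=1)

        def forward(self, f_a, f_s):                     # each: (B, d)
            f_a = F.normalize(f_a, p=2, dim=1)           # L2 normalization
            f_s = F.normalize(f_s, p=2, dim=1)
            pair = torch.stack([f_a, f_s], dim=-1).unsqueeze(1)  # (B, 1, d, 2)
            w_a = self.conv_wa(pair).squeeze(-1).squeeze(1)      # (B, d)
            w_s = self.conv_ws(pair).squeeze(-1).squeeze(1)
            w = torch.softmax(torch.stack([w_a, w_s]), dim=0)    # appearance/shape weights
            t_a = torch.sigmoid(self.conv_ta(f_a[:, None, :, None]).squeeze(-1).squeeze(1))
            t_s = torch.sigmoid(self.conv_ts(f_s[:, None, :, None]).squeeze(-1).squeeze(1))
            return w[0] * t_a + w[1] * t_s               # weighted (Hadamard) sum of the two vectors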
Further, the deep neural network model is trained in two stages:
in stage I, the appearance encoder and the shape encoder are optimized only through the appearance loss function and the shape loss function, learning discriminative coarse-grained appearance features and fine-grained shape features respectively;
in stage II, the entire network is trained jointly through a weighted sum of the appearance loss function, the shape loss function and the fusion loss function.
Further, the query video under test and the search library videos are input together into the model to extract features; in the feature space, the Euclidean distance between the features of the query video and each search library video is calculated to measure similarity.
Further, search library videos with higher similarity are ranked higher in the identity matching result.
In a second aspect, the present disclosure further provides a reloading video pedestrian re-identification system based on appearance and shape features, which includes a model establishing module, a feature extracting module, a model optimizing module and a testing module;
the model building module configured to: establishing a deep neural network model;
the feature extraction module configured to: based on a deep neural network model, extracting pedestrian appearance characteristics and pedestrian shape characteristics of pedestrian videos in a training set; fusing the pedestrian appearance characteristic and the pedestrian shape characteristic to obtain a final pedestrian characteristic;
the model optimization module configured to: carrying out deep neural network model training by adopting the pedestrian appearance characteristic, the pedestrian shape characteristic and the fused final pedestrian characteristic to obtain a trained model;
the testing module configured to: inputting the query video and the search library videos into the trained model to extract pedestrian features; calculating the Euclidean distances between the pedestrian features of the query video and those of the search library videos, and ranking identity matches according to these distances.
In a third aspect, the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the above-mentioned reloading video pedestrian re-identification method based on appearance and shape features.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the above-mentioned reloading video pedestrian re-identification method based on appearance and shape features.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. The coarse-grained appearance branch extracts global coarse-grained appearance features to handle cases where pedestrians change their outfits slightly, and the fine-grained shape branch extracts part-based fine-grained shape features to handle cases where pedestrians change their outfits drastically.
2. The extracted appearance and shape features are fused adaptively, and the fused features generalize better to clothing changes of different degrees, thereby improving the recognition of pedestrians who change clothes.
Drawings
The accompanying drawings, which form a part hereof, are included to provide a further understanding of the present embodiments, and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the present embodiments and together with the description serve to explain the present embodiments without unduly limiting the present embodiments.
Fig. 1 is a flow chart of embodiment 1 of the present disclosure.
Detailed Description of the Embodiments
the present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The term "change of clothing" as used in this disclosure means that the same pedestrian is considered to have changed in clothing (such as a change in color, pattern or style of upper garment, lower garment or shoes) which is recognizable to the human eye in two video sequences.
Example 1:
the embodiment provides a reloading video pedestrian re-identification method based on appearance and shape characteristics; the method specifically comprises the following steps:
inputting the acquired query video to be identified and the search library videos into a trained deep neural network model, and extracting pedestrian features;
calculating the Euclidean distances between the pedestrian features of the query video and the features of the search library videos, and ranking identity matches according to these distances.
Specifically, in this embodiment, the establishing of the deep neural network model and the training process of the model specifically include:
establishing a deep neural network model, and setting a corresponding module network structure; as shown in FIG. 1, the model includes three parts, namely an Appearance Encoder (Appearance Encoder), a Shape Encoder (Shape Encoder) and a Feature Integration Module (Feature Integration Module).
Suppose a pedestrian video in the training set is denoted $v_i = \{I_t\}_{t=1}^{T}$, i.e. the video consists of $T$ frames, where $I_t$ ($t = 1, 2, \ldots, T$) denotes the $t$-th frame. $v_i$ is input into the appearance encoder, which extracts pedestrian appearance features frame by frame to obtain per-frame appearance feature maps. The appearance feature maps corresponding to the frames are average-pooled and aggregated into an appearance feature map of the video; global average pooling is then applied to this map to obtain the appearance feature vector $f_i^a$ of the video. Using the identity label data in the dataset, the appearance loss $L_a$ is defined as:

$$L_a = L_{ce}^{a} + \alpha L_{tri}^{a}$$

where $L_{ce}^{a}$ is the cross-entropy loss, $L_{tri}^{a}$ is the triplet loss, and $\alpha$ is a weighting factor. A sketch of this loss follows.
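The two terms of $L_a$ can be sketched with standard PyTorch losses as follows; the batch-hard mining used for the triplet term is a common choice and an assumption here, since the description does not fix the mining scheme.

    import torch
    import torch.nn.functional as F

    def triplet_loss(features, labels, margin=0.3):
        """Batch-hard triplet loss over a batch of feature vectors."""
        dist = torch.cdist(features, features)                 # pairwise Euclidean distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        hardest_pos = (dist * same.float()).max(dim=1).values  # farthest same-identity sample
        masked = dist.masked_fill(same, float("inf"))
        hardest_neg = masked.min(dim=1).values                 # closest different-identity sample
        return F.relu(hardest_pos - hardest_neg + margin).mean()

    def appearance_loss(logits, features, labels, alpha=1.0):
        """L_a = cross-entropy + alpha * triplet, as defined above."""
        return F.cross_entropy(logits, labels) + alpha * triplet_loss(features, labels)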
$v_i$ is simultaneously input into a human body segmentation module to obtain a binary pedestrian segmentation video $s_i$. $s_i$ is input into the shape encoder, which extracts pedestrian shape features frame by frame to obtain per-frame shape feature maps. The shape feature maps corresponding to the frames are max-pooled and aggregated into a shape feature map of the pedestrian, which is evenly divided horizontally into $J$ sub-feature maps $\{f_{i,j}^{s}\}_{j=1}^{J}$. Max pooling is performed within each sub-feature map, and the pooled vectors are passed through $J$ fully connected layers respectively, yielding $J$ feature vectors that respectively represent $J$ horizontal regions of the pedestrian. The $J$ feature vectors are concatenated to form the final feature vector $f_i^s$ representing the pedestrian's shape. Using the identity label data in the dataset, the shape loss $L_s$ is defined as:

$$L_s = \sum_{j=1}^{J} \left( L_{ce}^{s,j} + \beta L_{tri}^{s,j} \right)$$

where $L_{ce}^{s,j}$ is the cross-entropy loss corresponding to the $j$-th shape feature vector, $L_{tri}^{s,j}$ is the corresponding triplet loss, and $\beta$ is a weighting factor. A sketch follows.
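Correspondingly, $L_s$ sums a cross-entropy and a triplet term over the $J$ part-level vectors; a sketch reusing the triplet_loss helper above, where the per-part classifier logits are an assumed input:

    def shape_loss(part_logits, part_features, labels, beta=1.0):
        """L_s: sum over the J horizontal parts of CE + beta * triplet."""
        return sum(F.cross_entropy(lg, labels) + beta * triplet_loss(ft, labels)
                   for lg, ft in zip(part_logits, part_features))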
Feature fusion is performed on the pedestrian appearance feature $f_i^a$ and shape feature $f_i^s$. The feature fusion comprises two steps, weight prediction and feature transformation. First, $L_2$ normalization is applied to the two feature vectors, yielding $\hat{f}_i^a$ and $\hat{f}_i^s$. For weight prediction, the $L_2$-normalized feature vectors are concatenated and passed through two convolution layers with kernel size $1 \times 2$ respectively, generating two weight vectors $w_a'$ and $w_s'$; after a Softmax function, the appearance weight vector $w_a$ and the shape weight vector $w_s$ are obtained. This process can be expressed as:

$$w_a' = \mathrm{Conv}_{a}\left(\left[\hat{f}_i^a; \hat{f}_i^s\right]\right), \qquad w_s' = \mathrm{Conv}_{s}\left(\left[\hat{f}_i^a; \hat{f}_i^s\right]\right)$$

$$\left[w_a, w_s\right] = \mathrm{Softmax}\left(\left[w_a', w_s'\right]\right)$$
For feature transformation, the $L_2$-normalized feature vectors are input into two convolution layers with kernel size $1 \times 1$ respectively, followed by a Sigmoid function, to obtain the transformed appearance feature vector $\tilde{f}_i^a$ and shape feature vector $\tilde{f}_i^s$. This process can be expressed as:

$$\tilde{f}_i^a = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times1}^{a}(\hat{f}_i^a)\right), \qquad \tilde{f}_i^s = \mathrm{Sigmoid}\left(\mathrm{Conv}_{1\times1}^{s}(\hat{f}_i^s)\right)$$

The feature vector $f_i$ that ultimately represents the pedestrian is a weighted sum of the appearance and shape feature vectors, expressed as:

$$f_i = w_a \odot \tilde{f}_i^a + w_s \odot \tilde{f}_i^s$$
where $\odot$ denotes the Hadamard product. Using the identity label data in the dataset, the fusion loss $L_c$ is defined as:

$$L_c = L_{ce}^{c} + \alpha L_{tri}^{c}$$

where $L_{ce}^{c}$ is the cross-entropy loss, $L_{tri}^{c}$ is the triplet loss, and $\alpha$ is a weighting factor.
The total loss $L_{all}$ when training the model is:

$$L_{all} = L_a + \lambda_1 L_s + \lambda_2 L_c$$

where $\lambda_1$ and $\lambda_2$ are weighting factors. The objective function is minimized using the error back-propagation algorithm, thereby optimizing the parameters of the model. A sketch of one optimization step follows.
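One stage-II optimization step might look as follows, reusing the branch and fusion modules sketched earlier; the optimizer, learning rate, lambda values, data-loader fields and the loss helper names are illustrative assumptions.

    import torch

    lambda1, lambda2 = 1.0, 1.0                      # assumed weighting factors
    model_params = (list(appearance_branch.parameters())
                    + list(shape_branch.parameters())
                    + list(fusion.parameters()))
    optimizer = torch.optim.Adam(model_params, lr=3e-4)

    for videos, masks, labels in train_loader:       # hypothetical training loader
        f_a = appearance_branch(videos)              # appearance feature vectors
        f_s = shape_branch(masks)                    # shape feature vectors
        f = fusion(f_a, f_s)                         # fused pedestrian features
        loss = (appearance_loss_term(f_a, labels)          # L_a (hypothetical helper)
                + lambda1 * shape_loss_term(f_s, labels)   # lambda_1 * L_s
                + lambda2 * fusion_loss_term(f, labels))   # lambda_2 * L_c
        optimizer.zero_grad()
        loss.backward()                              # error back-propagation
        optimizer.step()                             # update model parameters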
In this embodiment, a two-stage training strategy is used to train the model. In stage I, the feature fusion module is not used; discriminative $f_i^a$ and $f_i^s$ are learned separately through $L_a$ and $L_s$ alone, optimizing the parameters of the appearance encoder and the shape encoder. In stage II, the whole model (except the human body segmentation module) is trained jointly through the total loss $L_{all}$, and the model parameters are optimized.
During testing, the query video and the search library (gallery) videos are input into the trained model to extract features; in the feature space, similarity is measured by their Euclidean distance: the smaller the Euclidean distance between two features, the higher the similarity, and the higher the similarity of a gallery video, the higher it ranks in the identity matching result for the query video. A sketch of this matching step follows.
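A minimal sketch of this matching step; the variable names are illustrative.

    import torch

    def rank_gallery(query_feat, gallery_feats):
        """query_feat: (d,), gallery_feats: (N, d) -> gallery indices, best match first."""
        dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)  # (N,) Euclidean
        order = torch.argsort(dists)     # smaller distance = higher similarity = earlier rank
        return order, dists[order]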
To explain the specific embodiment of the present disclosure in detail, the clothes-changing video pedestrian re-identification dataset COCV is taken as an example. In this embodiment, the appearance encoder is a ResNet50 pre-trained on the ImageNet dataset, the pedestrian segmentation module uses a JPPNet model pre-trained on the human parsing dataset LIP, and the shape encoder uses the GaitSet model. The procedure is as follows:
Stage I training is performed: without the feature fusion module, discriminative $f_i^a$ and $f_i^s$ are learned separately through the appearance loss $L_a$ and the shape loss $L_s$, optimizing the parameters of the appearance encoder and the shape encoder; the number of training epochs for this pre-training is set to 400.
In stage II, with the weighted sum $L_{all}$ of the appearance loss $L_a$, the shape loss $L_s$ and the fusion loss $L_c$ as the total loss function, the whole model (except the human body segmentation module) is trained and the model parameters are optimized; the number of training epochs is set to 400.
During testing, the query video and the search library (gallery) videos are both input into the trained model to extract features; in the feature space, similarity is measured by the Euclidean distance between features: the smaller the Euclidean distance between two features, the higher the similarity, and the higher the similarity, the higher a gallery video ranks in the identity matching result for the query video.
Example 2:
the embodiment provides a reloading video pedestrian re-identification system based on appearance and shape characteristics, which comprises a model establishing module, a characteristic extracting module, a model optimizing module and a testing module;
the model building module configured to: establishing a deep neural network model;
the feature extraction module configured to: based on the deep neural network model, extracting pedestrian apparent characteristics and pedestrian shape characteristics of the pedestrian video in the training set; fusing the pedestrian appearance characteristic and the pedestrian shape characteristic to obtain a final pedestrian characteristic;
the model optimization module configured to: carrying out deep neural network model training by adopting the pedestrian appearance characteristic, the pedestrian shape characteristic and the fused final pedestrian characteristic to obtain a trained model;
the test module configured to: inputting the query video and the search library videos into the trained model to extract pedestrian features; and calculating the Euclidean distances between the pedestrian features of the query video and those of the search library videos, and ranking identity matches according to these distances.
Example 3:
the present embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the reloaded video pedestrian re-identification method based on appearance and shape features as described in embodiment 1.
It is understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on.
A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The memory may include both read-only memory and random access memory and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
Example 4:
the present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the reloading video pedestrian re-identification method based on appearance and shape features as described in embodiment 1.
The reloading video pedestrian re-identification method based on appearance and shape features described in embodiment 1 can be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, the details are not described again here.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and those skilled in the art can make various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present embodiment should be included in the protection scope of the present embodiment.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (9)

1. The reloading video pedestrian re-identification method based on appearance and shape features is characterized by comprising the following steps of:
establishing a deep neural network model;
based on the deep neural network model, extracting pedestrian apparent characteristics and pedestrian shape characteristics of the pedestrian video in the training set;
fusing the pedestrian appearance characteristic and the pedestrian shape characteristic to obtain a final pedestrian characteristic;
carrying out deep neural network model training by adopting the pedestrian appearance characteristic, the pedestrian shape characteristic and the fused final pedestrian characteristic to obtain a trained model;
inputting the inquiry video and the search library video into the trained model to extract the pedestrian characteristics;
calculating Euclidean distances between the pedestrian features of the query video and the pedestrian features of the search library video, and sequencing identity matching according to the Euclidean distances;
fusing the pedestrian appearance features and the pedestrian shape features includes:
respectively normalizing the feature vectors of the pedestrian apparent features and the pedestrian shape features;
performing weight prediction, connecting the normalized feature vectors, and generating two weight vectors through two convolution layers with convolution kernel size of 1 × 2 respectively; obtaining an apparent weight vector and a shape weight vector after passing through a Softmax function;
performing feature conversion, respectively inputting the normalized feature vectors into two convolution layers with convolution kernel sizes of 1 × 1, and obtaining converted apparent feature vectors and shape feature vectors through a Sigmoid function;
the feature vector that ultimately represents the pedestrian is a weighted sum of the apparent and shape feature vectors.
2. The method of claim 1, wherein the deep neural network model comprises an appearance encoder for extracting appearance features of pedestrian video coarse granularity, a shape encoder for extracting shape features of pedestrian video fine granularity, and a feature fusion module for adaptively fusing the appearance features and the shape features into final pedestrian features.
3. The reloading video pedestrian re-identification method of claim 2, wherein extracting pedestrian appearance features and pedestrian shape features of the pedestrian videos in the training set comprises:
inputting the pedestrian video into an apparent encoder, and extracting the pedestrian apparent features frame by frame to obtain an apparent feature map; carrying out average pooling on the apparent feature maps corresponding to each frame, and aggregating the apparent feature maps into an apparent feature map of the video; then carrying out global average pooling on the apparent feature map of the video to obtain an apparent feature vector of the video; defining an apparent loss function by using the identity tag data in the data set;
simultaneously carrying out human body segmentation on the pedestrian video to obtain a binary pedestrian segmentation video; inputting the human segmentation video into a shape encoder, and extracting the shape features of the pedestrians frame by frame to obtain a shape feature map; performing maximum pooling on the shape characteristic graph corresponding to each frame, aggregating the shape characteristic graphs into a shape characteristic graph of the pedestrian, and uniformly and horizontally dividing the shape characteristic graph into a plurality of sub-characteristic graphs; performing maximum pooling in each sub-feature map, and allowing the pooled feature vectors to pass through a plurality of full-connection layers respectively to obtain a plurality of feature vectors respectively representing a plurality of horizontal areas of the pedestrians; connecting a plurality of feature vectors to form a final feature vector representing the shape of the pedestrian; a shape loss function is defined using the identity tag data in the dataset.
4. The method of claim 1, wherein the deep neural network model training process is performed in two stages, including:
in stage I, an apparent encoder and a shape encoder are optimized only through an apparent loss function and a shape loss function, and discriminative coarse-grained apparent features and fine-grained shape features are learned respectively;
in stage II, the entire network is jointly trained by a weighted sum of the apparent loss function, the shape loss function, and the fusion loss function.
5. The method for pedestrian re-identification of reloading video based on appearance and shape features as claimed in claim 1, wherein the query video of the test is input to the model together with the search library video to extract features; in the feature space, the Euclidean distance between the features of the query video and the search library video is calculated to measure the similarity.
6. The reloading video pedestrian re-identification method based on appearance and shape features as recited in claim 5, wherein search library videos with higher similarity are ranked higher in the identity matching result.
7. The reloading video pedestrian re-identification system based on appearance and shape features is characterized by comprising a model establishing module, a feature extraction module, a model optimization module and a testing module;
the model building module configured to: establishing a deep neural network model;
the feature extraction module configured to: based on the deep neural network model, extracting pedestrian apparent characteristics and pedestrian shape characteristics of the pedestrian video in the training set; fusing the pedestrian appearance characteristic and the pedestrian shape characteristic to obtain a final pedestrian characteristic;
the model optimization module configured to: performing deep neural network model training by using the pedestrian appearance characteristics, the pedestrian shape characteristics and the fused final pedestrian characteristics to obtain a trained model;
the test module configured to: inputting the inquiry video and the search library video into the trained model to extract the pedestrian characteristics; calculating Euclidean distances between the pedestrian features of the inquiry video and the pedestrian features of the search library video, and sequencing identity matching according to the Euclidean distances;
fusing the pedestrian appearance features and the pedestrian shape features includes:
respectively normalizing the feature vectors of the pedestrian apparent features and the pedestrian shape features;
performing weight prediction, connecting the normalized feature vectors, and generating two weight vectors through two convolution layers with convolution kernel size of 1 × 2 respectively; obtaining an apparent weight vector and a shape weight vector after passing through a Softmax function;
performing feature conversion, respectively inputting the normalized feature vectors into two convolution layers with convolution kernel sizes of 1 × 1, and obtaining converted apparent feature vectors and shape feature vectors through a Sigmoid function;
the feature vector that ultimately represents the pedestrian is a weighted sum of the apparent and shape feature vectors.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the reloading video pedestrian re-identification method based on appearance and shape features of any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method for pedestrian re-identification based on reloaded video of apparent and shape features according to any one of claims 1 to 6.
CN202110748180.1A 2021-07-01 2021-07-01 Reloading video pedestrian re-identification method and system based on appearance and shape characteristics Active CN113420697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110748180.1A CN113420697B (en) 2021-07-01 2021-07-01 Reloading video pedestrian re-identification method and system based on appearance and shape characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110748180.1A CN113420697B (en) 2021-07-01 2021-07-01 Reloading video pedestrian re-identification method and system based on appearance and shape characteristics

Publications (2)

Publication Number Publication Date
CN113420697A CN113420697A (en) 2021-09-21
CN113420697B true CN113420697B (en) 2022-12-09

Family

ID=77720004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110748180.1A Active CN113420697B (en) 2021-07-01 2021-07-01 Reloading video pedestrian re-identification method and system based on appearance and shape characteristics

Country Status (1)

Country Link
CN (1) CN113420697B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821689A (en) * 2021-09-22 2021-12-21 沈春华 Pedestrian retrieval method and device based on video sequence and electronic equipment
CN114758362B (en) * 2022-06-15 2022-10-11 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942502A (en) * 2019-11-29 2020-03-31 中山大学 Voice lip fitting method and system and storage medium
CN111191526A (en) * 2019-12-16 2020-05-22 汇纳科技股份有限公司 Pedestrian attribute recognition network training method, system, medium and terminal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11443165B2 (en) * 2018-10-18 2022-09-13 Deepnorth Inc. Foreground attentive feature learning for person re-identification
CN109816701B (en) * 2019-01-17 2021-07-27 北京市商汤科技开发有限公司 Target tracking method and device and storage medium
CN110046599A (en) * 2019-04-23 2019-07-23 东北大学 Intelligent control method based on depth integration neural network pedestrian weight identification technology
CN110070066B (en) * 2019-04-30 2022-12-09 福州大学 Video pedestrian re-identification method and system based on attitude key frame
CN111259786B (en) * 2020-01-14 2022-05-03 浙江大学 Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111339908B (en) * 2020-02-24 2023-08-15 青岛科技大学 Group behavior identification method based on multi-mode information fusion and decision optimization
CN111523470B (en) * 2020-04-23 2022-11-18 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium
CN111967408B (en) * 2020-08-20 2022-06-21 中科人工智能创新技术研究院(青岛)有限公司 Low-resolution pedestrian re-identification method and system based on prediction-recovery-identification
CN112507953B (en) * 2020-12-21 2022-10-14 重庆紫光华山智安科技有限公司 Target searching and tracking method, device and equipment
CN112926396B (en) * 2021-01-28 2022-05-13 杭州电子科技大学 Action identification method based on double-current convolution attention

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942502A (en) * 2019-11-29 2020-03-31 中山大学 Voice lip fitting method and system and storage medium
CN111191526A (en) * 2019-12-16 2020-05-22 汇纳科技股份有限公司 Pedestrian attribute recognition network training method, system, medium and terminal

Also Published As

Publication number Publication date
CN113420697A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
Li et al. Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios
Li et al. Person search with natural language description
Matsukawa et al. Person re-identification using CNN features learned from combination of attributes
Liu et al. Matching-cnn meets knn: Quasi-parametric human parsing
CN109325952B (en) Fashionable garment image segmentation method based on deep learning
Johnson et al. Clustered pose and nonlinear appearance models for human pose estimation.
Yamaguchi et al. Paper doll parsing: Retrieving similar styles to parse clothing items
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
Miksik et al. Efficient temporal consistency for streaming video scene analysis
CN113420697B (en) Reloading video pedestrian re-identification method and system based on appearance and shape characteristics
Aurangzeb et al. Human behavior analysis based on multi-types features fusion and Von Nauman entropy based features reduction
Chai et al. Boosting palmprint identification with gender information using DeepNet
CN112215180A (en) Living body detection method and device
CN112001353A (en) Pedestrian re-identification method based on multi-task joint supervised learning
CN112669343A (en) Zhuang minority nationality clothing segmentation method based on deep learning
CN112084998A (en) Pedestrian re-identification method based on attribute information assistance
CN115205903A (en) Pedestrian re-identification method for generating confrontation network based on identity migration
Galiyawala et al. Person retrieval in surveillance videos using deep soft biometrics
Kurnianggoro et al. Identification of pedestrian attributes using deep network
CN117333901A (en) Clothing changing pedestrian re-identification method based on uniform and various fusion of clothing
Khan et al. Deep semantic pyramids for human attributes and action recognition
Galiyawala et al. Person retrieval in surveillance videos using attribute recognition
CN113807200B (en) Multi-row person identification method and system based on dynamic fitting multi-task reasoning network
Li et al. SAT-Net: Self-attention and temporal fusion for facial action unit detection
Oh et al. Visual adversarial attacks and defenses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant