CN110766093A - Video target re-identification method based on multi-frame feature fusion - Google Patents

Video target re-identification method based on multi-frame feature fusion

Info

Publication number
CN110766093A
CN110766093A (application number CN201911055853.4A)
Authority
CN
China
Prior art keywords
target
feature
orientation
fusion
identifying
Prior art date
Legal status
Pending
Application number
CN201911055853.4A
Other languages
Chinese (zh)
Inventor
李冠华
徐晓刚
管慧艳
刘静
Current Assignee
Smart Vision Hangzhou Technology Development Co Ltd
Original Assignee
Smart Vision Hangzhou Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Smart Vision Hangzhou Technology Development Co., Ltd.
Priority to CN201911055853.4A
Publication of CN110766093A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video target re-identification method based on multi-frame feature fusion, which comprises the following steps: acquiring multi-frame continuous images of the same target; classifying the images according to the orientation of the target; extracting target features from all the images; performing feature fusion and pooling on the images with the same orientation to obtain fused features; identifying the orientation of a target to be identified and extracting its feature to be identified; taking the product of the similarity between the feature to be identified and the fused feature of the corresponding orientation and the weight factor of that orientation as the final similarity; if the maximum value of the final similarity is larger than a given threshold, recognition succeeds and the target corresponding to that maximum is output as the re-identification result, otherwise recognition fails. The invention correlates the target along the time axis and solves the problem of matching the target across different orientations.

Description

Video target re-identification method based on multi-frame feature fusion
Technical Field
The invention relates to the technical field of image recognition, and in particular to a video target re-identification method based on multi-frame feature fusion.
Background
Searching for a specific pedestrian in ordinary surveillance video is a problem that urgently needs to be solved, in particular for locating a suspected target during case investigation. Because pedestrian targets in video are generally small, the resolution of the pedestrian image region is low and identity cannot be confirmed by face recognition methods. Pedestrian re-identification methods based on human-body appearance features have therefore been widely studied, but most current methods focus on extracting features from single images and have the following shortcomings:
1. Video is temporally continuous, and picture-based feature extraction ignores the continuous features along the time axis, so feature extraction is not accurate enough;
2. The same pedestrian can appear in different orientations in the video, and these different orientations strongly affect the final recognition.
Disclosure of Invention
The invention aims to provide a video target re-identification method based on multi-frame feature fusion, which correlates a target along the time axis and solves the problem of matching the target across different orientations.
In order to achieve the purpose, the invention provides the following technical scheme:
a video target re-identification method based on multi-frame feature fusion is characterized by comprising the following steps:
S1, acquiring multi-frame continuous images of the same target;
S2, classifying the images according to the orientation of the target;
S3, extracting target features of all the images;
S4, performing feature fusion and pooling on the images in the same orientation to obtain fusion features;
S5, identifying the orientation of the target to be identified, and extracting the feature to be identified according to S3 and S4;
S6, taking the product of the similarity between the feature to be identified and the fused feature of the corresponding orientation and the weight factor of that orientation as the final similarity;
S7, if the maximum value of the final similarity is larger than a given threshold, recognition succeeds and the target corresponding to the maximum value of the final similarity is output as the re-identification result; otherwise, recognition fails.
Further, the classification of the orientation in S2 employs a deep neural network model.
Further, S2 includes training the deep neural network model, with images whose orientations have been manually labeled used as training samples.
Further, the extraction of the target feature in S3 adopts a CNN network.
Further, the S4 feature fusion uses an RNN network, and uses a linear combination of the target feature input at the current time and the feature vector of the RNN network at the previous time as an output, specifically:
o^(t) = W_i f^(t) + W_s r^(t-1)
r^(t) = Tanh(o^(t))
where o^(t) is the output of the RNN network at the current time t; W_i and W_s are weight coefficients; f^(t) is the target feature input at the current time t; r^(t-1) is the feature vector of the RNN at the previous time t-1; Tanh(·) is the activation function.
Further, the pooling is an average pooling:
V_y = (1/T) · Σ_{t=1}^{T} o^(t)
where V_y is the fused feature and T is the duration (the number of fused frames).
Further, the calculation of the final similarity in S6 is specifically as follows:
S_o = w · S(V_x, V_y)
where V_x is the feature to be identified; V_y is a fused feature; S(·) is a similarity calculation function; w is the weight factor of the orientation, w ∈ W, W = {w_s, w_d, w_n}, where w_s is the weight factor when V_x and V_y have the same orientation, w_d when they have opposite orientations, and w_n when they have adjacent orientations; S_o is the final similarity.
Further, the values of the weight factors are w_s ∈ [0.8, 0.9], w_d ∈ [0.4, 0.5], w_n ∈ [0.55, 0.65].
Further, the given threshold is 0.6.
Compared with the prior art, the invention has the following beneficial effects: the human body is divided into four orientations and corresponding weight factors are set for the different orientations, which solves the problem of matching the target across different orientations; in addition, when multi-frame features are fused, the features are first grouped by orientation and temporal fusion is performed only on features with the same orientation, so that the target is correlated along the time axis.
Drawings
FIG. 1 is an overall process flow diagram of the present invention.
Fig. 2 is a diagram of an RNN network model.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a video object re-identification method based on multi-frame feature fusion, which includes the following steps:
S1, acquiring multi-frame continuous images of the same target;
S2, classifying the images according to the orientation of the target to obtain corresponding orientation information;
specifically, the present invention divides the pose of the target into four orientations, front, back, right and left, respectively. The classification algorithm adopts a deep neural network model trained in advance. And in the training of the deep neural network model, the artificially marked image is used as a sample to train the deep neural network model. The multiple frames of consecutive images in S1 are thus classified into multiple classes, one for each class, according to the orientation of the object.
S3, target feature extraction is performed on all the images, preferably using a CNN network.
Let i^(t) denote the image at time t. The image i^(t) is input into the CNN network C, which computes and outputs the target feature, expressed as f^(t) = C(i^(t)).
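A minimal sketch of such a feature extractor C(·), again assuming PyTorch and a ResNet-18 backbone (neither is prescribed by the patent), simply drops the classification head and returns the pooled backbone output as f^(t):
# Illustrative sketch of the CNN feature extractor C(.), so that f^(t) = C(i^(t)).
# Using ResNet-18 without its final classification layer is an assumption made for illustration.
import torch
import torch.nn as nn
from torchvision import models

class TargetFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18()
        # Keep all layers up to and including the global average pooling.
        self.body = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        """image: (3, H, W) tensor; returns a 512-dimensional feature vector f^(t)."""
        f = self.body(image.unsqueeze(0))   # shape (1, 512, 1, 1)
        return f.flatten()                  # shape (512,)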
S4, feature fusion and pooling are performed on the images with the same orientation to obtain fused features.
Specifically, the feature fusion adopts an RNN network and only images in the same orientation are fused, so that feature instability caused by different orientations is avoided. As shown in fig. 2, a linear combination of the target feature input at the current time and the feature vector of the RNN network at the previous time is used as an output, and specifically:
o^(t) = W_i f^(t) + W_s r^(t-1)
r^(t) = Tanh(o^(t))
where o^(t) is the output of the RNN network at the current time t; W_i and W_s are weight coefficients; f^(t) is the target feature input at the current time t; r^(t-1) is the feature vector of the RNN at the previous time t-1; Tanh(·) is the activation function.
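For illustration, this recurrence maps directly onto a small recurrent cell; the feature dimension (512, matching the extractor sketch above) and the absence of bias terms are assumptions:
# Illustrative sketch of the fusion recurrence:
#   o^(t) = W_i f^(t) + W_s r^(t-1)
#   r^(t) = Tanh(o^(t))
import torch
import torch.nn as nn

class FusionRNNCell(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.W_i = nn.Linear(dim, dim, bias=False)  # weight applied to the current feature f^(t)
        self.W_s = nn.Linear(dim, dim, bias=False)  # weight applied to the previous state r^(t-1)

    def forward(self, f_t: torch.Tensor, r_prev: torch.Tensor):
        o_t = self.W_i(f_t) + self.W_s(r_prev)      # o^(t)
        r_t = torch.tanh(o_t)                       # r^(t)
        return o_t, r_t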
After RNN network fusion, pooling operations, specifically average pooling, need to be performed:
V_y = (1/T) · Σ_{t=1}^{T} o^(t)
where V_y is the resulting fused feature and T is the duration (the number of fused frames).
Thus, each orientation in the image set classified in S2 has a corresponding fusion feature V_y, and together these form the fused-feature set.
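Combining the recurrence with the average pooling just described, a sketch of building the fused-feature set could look as follows; it reuses FusionRNNCell from the previous sketch, and pooling the per-frame outputs o^(t) with a zero-initialized r^(0) is an assumption:
# Illustrative sketch: one fused feature V_y per orientation, obtained by running the
# recurrence over the T frames of that orientation and average-pooling the outputs o^(t).
from typing import Dict, List
import torch

def fuse_orientation_features(frame_features: List[torch.Tensor],
                              cell: "FusionRNNCell") -> torch.Tensor:
    """frame_features: the per-frame features f^(1)..f^(T) of a single orientation."""
    r_prev = torch.zeros_like(frame_features[0])    # r^(0) initialized to zero (assumption)
    outputs = []
    for f_t in frame_features:
        o_t, r_prev = cell(f_t, r_prev)
        outputs.append(o_t)
    return torch.stack(outputs).mean(dim=0)         # V_y = (1/T) * sum over t of o^(t)

def build_fused_feature_set(features_by_orientation: Dict[str, List[torch.Tensor]],
                            cell: "FusionRNNCell") -> Dict[str, torch.Tensor]:
    """Maps each orientation present in the sequence to its fused feature V_y."""
    return {orientation: fuse_orientation_features(feats, cell)
            for orientation, feats in features_by_orientation.items()}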
S5, identifying the orientation of the target to be identified, and extracting the characteristic to be identified according to S3 and S4;
specifically, for a given target to be recognized, the orientation recognition is performed according to the deep neural network model trained in S2. Following the steps of S3 and S4,inputting the target to be identified into the CNN network to obtain the characteristic V to be identifiedx
S6, taking the product of the similarity of the feature to be identified and the fused feature corresponding to the orientation and the weight factor of the orientation as the final similarity;
specifically, the final similarity is calculated as follows:
S_o = w · S(V_x, V_y)
where V_x is the feature to be identified; V_y is a fused feature; S(·) is a similarity calculation function, specifically the cosine distance; w is the weight factor of the orientation, w ∈ W, W = {w_s, w_d, w_n}, where w_s is the weight factor when V_x and V_y have the same orientation, w_d when they have opposite orientations, and w_n when they have adjacent orientations; S_o is the final similarity. Preferably, the weight factors take values in w_s ∈ [0.8, 0.9], w_d ∈ [0.4, 0.5], w_n ∈ [0.55, 0.65]; in particular, w_s = 0.85, w_d = 0.45, w_n = 0.6.
It is worth mentioning that the mutual relationship of the orientations is as follows: taking the front face as an example, the opposite direction is the back face, and the directions adjacent to it are the left side and the right side.
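As an illustrative sketch of S6, the orientation relations and the weighted similarity can be coded as below; cosine similarity follows the description above, and the single values w_s = 0.85, w_d = 0.45, w_n = 0.6 are the particular values mentioned earlier:
# Illustrative sketch of the orientation-weighted similarity S_o = w * S(V_x, V_y).
import torch
import torch.nn.functional as F

OPPOSITE = {"front": "back", "back": "front", "left": "right", "right": "left"}
W_SAME, W_OPPOSITE, W_NEIGHBOR = 0.85, 0.45, 0.60   # w_s, w_d, w_n (particular values above)

def orientation_weight(ori_x: str, ori_y: str) -> float:
    if ori_x == ori_y:
        return W_SAME          # same orientation
    if OPPOSITE[ori_x] == ori_y:
        return W_OPPOSITE      # opposite orientations
    return W_NEIGHBOR          # adjacent orientations

def final_similarity(v_x: torch.Tensor, ori_x: str,
                     v_y: torch.Tensor, ori_y: str) -> float:
    s = F.cosine_similarity(v_x, v_y, dim=0)        # S(V_x, V_y), cosine similarity
    return orientation_weight(ori_x, ori_y) * float(s)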
S7, if the maximum value of the final similarity is greater than the given threshold of 0.6, recognition succeeds and the target corresponding to that maximum is output as the re-identification result; otherwise, recognition fails.
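Putting S5 to S7 together, a sketch of the final decision is given below; the gallery structure (mapping each candidate target to its fused features per orientation) is an assumption made for illustration, and final_similarity is reused from the previous sketch:
# Illustrative sketch of S7: choose the candidate with the highest final similarity and
# accept it only if that similarity exceeds the given threshold of 0.6.
from typing import Dict, Optional
import torch

THRESHOLD = 0.6

def re_identify(v_x: torch.Tensor, ori_x: str,
                gallery: Dict[str, Dict[str, torch.Tensor]]) -> Optional[str]:
    """gallery maps a candidate target id to its fused features {orientation: V_y}."""
    best_id, best_score = None, float("-inf")
    for target_id, fused_features in gallery.items():
        for ori_y, v_y in fused_features.items():
            s_o = final_similarity(v_x, ori_x, v_y, ori_y)
            if s_o > best_score:
                best_id, best_score = target_id, s_o
    return best_id if best_score > THRESHOLD else None   # None means recognition failed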
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (9)

1. A video target re-identification method based on multi-frame feature fusion is characterized by comprising the following steps:
S1, acquiring multi-frame continuous images of the same target;
S2, classifying the images according to the orientation of the target;
S3, extracting target features of all the images;
S4, performing feature fusion and pooling on the images in the same orientation to obtain fusion features;
S5, identifying the orientation of the target to be identified, and extracting the feature to be identified according to S3 and S4;
S6, taking the product of the similarity between the feature to be identified and the fused feature of the corresponding orientation and the weight factor of that orientation as the final similarity;
S7, if the maximum value of the final similarity is larger than a given threshold, recognition succeeds and the target corresponding to the maximum value of the final similarity is output as the re-identification result; otherwise, recognition fails.
2. The method for re-identifying the video target based on the multi-frame feature fusion of claim 1, wherein the classification of the orientation in S2 adopts a deep neural network model.
3. The method for re-identifying the video target based on the multi-frame feature fusion of claim 2, wherein S2 further includes training the deep neural network model, and the deep neural network model is trained using images with manually labeled orientations as samples.
4. The method for re-identifying the video target based on the multi-frame feature fusion as claimed in claim 1, wherein the extraction of the target feature in S3 adopts a CNN network.
5. The method according to claim 1, wherein the S4 feature fusion uses RNN network, and uses a linear combination of the target feature input at the current time and the feature vector of the RNN network at the previous time as an output, specifically:
o^(t) = W_i f^(t) + W_s r^(t-1)
r^(t) = Tanh(o^(t))
where o^(t) is the output of the RNN network at the current time t; W_i and W_s are weight coefficients; f^(t) is the target feature input at the current time t; r^(t-1) is the feature vector of the RNN at the previous time t-1; Tanh(·) is the activation function.
6. The method for re-identifying the video target based on the multi-frame feature fusion as claimed in claim 5, wherein the pooling is an average pooling:
V_y = (1/T) · Σ_{t=1}^{T} o^(t)
where V_y is the fused feature and T is the duration (the number of fused frames).
7. The method for re-identifying the video target based on the multi-frame feature fusion as claimed in claim 1, wherein the final similarity in S6 is calculated as follows:
S_o = w · S(V_x, V_y)
where V_x is the feature to be identified; V_y is a fused feature; S(·) is a similarity calculation function; w is the weight factor of the orientation, w ∈ W, W = {w_s, w_d, w_n}, where w_s is the weight factor when V_x and V_y have the same orientation, w_d when they have opposite orientations, and w_n when they have adjacent orientations; S_o is the final similarity.
8. The method for re-identifying the video target based on the multi-frame feature fusion of claim 7, wherein the values of the weight factors are w_s ∈ [0.8, 0.9], w_d ∈ [0.4, 0.5], w_n ∈ [0.55, 0.65].
9. The method according to claim 1, wherein the given threshold is 0.6.
CN201911055853.4A 2019-10-31 2019-10-31 Video target re-identification method based on multi-frame feature fusion Pending CN110766093A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911055853.4A CN110766093A (en) 2019-10-31 2019-10-31 Video target re-identification method based on multi-frame feature fusion


Publications (1)

Publication Number Publication Date
CN110766093A true CN110766093A (en) 2020-02-07

Family

ID=69335446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911055853.4A Pending CN110766093A (en) 2019-10-31 2019-10-31 Video target re-identification method based on multi-frame feature fusion

Country Status (1)

Country Link
CN (1) CN110766093A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709449A (en) * 2016-12-22 2017-05-24 深圳市深网视界科技有限公司 Pedestrian re-recognition method and system based on deep learning and reinforcement learning
CN107767416A (en) * 2017-09-05 2018-03-06 华南理工大学 The recognition methods of pedestrian's direction in a kind of low-resolution image
CN109784130A (en) * 2017-11-15 2019-05-21 株式会社日立制作所 Pedestrian recognition methods and its device and equipment again
CN109145777A (en) * 2018-08-01 2019-01-04 北京旷视科技有限公司 Vehicle recognition methods, apparatus and system again

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO LIU et al.: "Video-Based Person Re-Identification With Accumulative Motion Context", IEEE Transactions on Circuits and Systems for Video Technology *
LUO Hao et al.: "Research progress of person re-identification based on deep learning" (in Chinese), Acta Automatica Sinica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909210A (en) * 2020-02-18 2020-03-24 北京海天瑞声科技股份有限公司 Video screening method and device and storage medium
CN111444817A (en) * 2020-03-24 2020-07-24 咪咕文化科技有限公司 Person image identification method and device, electronic equipment and storage medium
CN111444817B (en) * 2020-03-24 2023-07-07 咪咕文化科技有限公司 Character image recognition method and device, electronic equipment and storage medium


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200207)