CN107480178A - Pedestrian re-identification method based on cross-modal comparison of image and video - Google Patents

Pedestrian re-identification method based on cross-modal comparison of image and video Download PDF

Info

Publication number
CN107480178A
CN107480178A (application CN201710536118.XA)
Authority
CN
China
Prior art keywords
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710536118.XA
Other languages
Chinese (zh)
Other versions
CN107480178B (en)
Inventor
林倞
张冬雨
吴文熙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DMAI Guangzhou Co Ltd
Original Assignee
Guangzhou Deep Domain Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Deep Domain Mdt Infotech Ltd filed Critical Guangzhou Deep Domain Mdt Infotech Ltd
Priority to CN201710536118.XA priority Critical patent/CN107480178B/en
Publication of CN107480178A publication Critical patent/CN107480178A/en
Application granted granted Critical
Publication of CN107480178B publication Critical patent/CN107480178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian re-identification method based on cross-modal comparison of an image and videos, for retrieving, from multiple videos, those containing the person shown in an input query image. The method comprises the following steps: S1, building a configurable depth model; S2, obtaining training samples and feeding them into the depth model to train it, the parameters of each part of the model being learned with forward and backward propagation; S3, initializing the depth model with the parameters learned in S2, then inputting the query image and the videos into the depth model, which computes a similarity measure between each video and the query image; S4, listing the videos whose similarity measure with the query image exceeds a threshold, sorted by the magnitude of the similarity measure. The invention thereby achieves pedestrian re-identification based on cross-modal comparison of an image and videos while maintaining high accuracy.

Description

Pedestrian re-identification method based on cross-modal comparison of image and video
Technical field
The present invention relates to the fields of computer vision and pattern recognition, and in particular to a pedestrian re-identification method based on cross-modal comparison of an image and videos.
Background technology
Pedestrian re-identification is an important basic research problem in computer vision, originating from person tracking in video. When a tracked person leaves the field of view of a camera and later re-enters it, the person must be identified again and assigned the same ID as before. With the wide deployment of video surveillance, research on pedestrian re-identification has received increasing attention. At present, re-identification is not limited to recognizing the same person under a single viewpoint; the more common setting is recognizing a person across different times and different viewpoints.
Most existing research on pedestrian re-identification is confined to similarity comparison between images: the input query is an image, and the search database likewise consists of images. Although this line of research has a long history and has made significant progress, it remains a very challenging topic. The chief reason is that differences in illumination, viewing angle, and background between cameras, combined with changes in a person's pose, cause the same person to look very different in photos taken by different cameras.
With the accelerating construction of smart cities, surveillance video containing person information is easy to obtain. In criminal investigation and security, retrieving the surveillance videos that contain a given person from an input image of that person has been raised as a new demand. Compared with a static image, a video carries much richer person information and can better characterize a person's appearance. A video also contains multiple frames, so compared with a single static image it copes better with occlusion. Moreover, because a video is a sequence of consecutive frames, the continuous dynamic spatio-temporal information it carries can assist person identification. Pedestrian re-identification using video information is therefore a more natural approach.
However, in image-to-video pedestrian re-identification, one difficulty is how to extract and exploit the video information effectively: a video contains a large amount of information that is redundant with respect to a single image, and improper handling can actually reduce recognition accuracy. A second difficulty is how to perform the comparison across modalities reasonably, since comparing an image with a video involves two different modalities.
Summary of the invention
In view of this, to address the problems in the prior art, there is provided a pedestrian re-identification method based on cross-modal comparison of an image and videos. By computing the similarity between the image and each video in a video database, the method retrieves all videos containing the person shown in the input image.
To achieve the above object, the present invention adopts the following technical scheme:
A pedestrian re-identification method based on cross-modal comparison of an image and videos, for retrieving from multiple videos those containing the person shown in an input query image, comprising the following steps:
S1, building a configurable depth model;
The depth model comprises a convolutional neural network, a long short-term memory (LSTM) network, and a similarity learning network. The convolutional neural network extracts the image feature of the query image and the per-frame features of the video; the LSTM network embeds the spatio-temporal information of the video into the extracted frame features and outputs a video feature containing this information; the similarity learning network maps the image feature and the video feature to the same dimension and learns a similarity measure between them.
S2, obtaining training samples and feeding them into the depth model to train it; the parameters of each part of the depth model are learned with forward and backward propagation.
S3, initializing the depth model with the parameters learned in S2; the query image and the videos are input into the depth model, which computes a similarity measure between each video and the query image.
S4, listing the videos whose similarity measure with the query image exceeds a threshold, sorted by the magnitude of the similarity measure.
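Steps S3 and S4 amount to scoring every gallery video against the query, thresholding, and sorting. A minimal sketch in Python follows; `toy_similarity` is a hypothetical stand-in for the learned measure S(x, y) (the patent's measure is a trained deep network, not this placeholder):

```python
import numpy as np

def retrieve(query_feat, video_feats, similarity, threshold):
    """Score every gallery video against the query (step S3), keep those
    above the threshold, and sort by descending similarity (step S4)."""
    scores = [(i, similarity(query_feat, v)) for i, v in enumerate(video_feats)]
    kept = [(i, s) for i, s in scores if s > threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)

# Stand-in similarity: negative mean squared distance to the frame features.
def toy_similarity(fx, fy):
    return -float(np.mean((fy - fx) ** 2))

query = np.zeros(4)
videos = [np.full((5, 4), 0.1), np.full((5, 4), 2.0), np.full((5, 4), 0.5)]
ranking = retrieve(query, videos, toy_similarity, threshold=-1.0)
```

Here video 1 falls below the threshold and is dropped; videos 0 and 2 are returned in descending order of similarity.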
Further, in S2, before the training samples are input, the parameters of the depth model are initialized randomly.
Further, in S2, each training sample consists of one query image, one video, and a precomputed similarity label between the two.
Further, in the convolutional neural network, let x denote the input query image and Y = {y_t | t = 1, ..., N} the video, where y_t is the t-th frame of Y and N is the total number of frames. Let Cnn denote the function computed by the convolutional neural network; the image feature obtained by passing the query image x through the network is then:
f_x = Cnn(x);
For the video Y, the convolutional neural network produces a feature for each frame y_t, so the video features are:
{f(y_t) = Cnn(y_t) | t = 1, ..., N}.
Further, for the t-th frame of the video, the feature f(y_t) obtained from the convolutional neural network is used as the input to the LSTM network; the corresponding LSTM output is the state h_t, the feature of that frame after passing through the LSTM. This is done for every frame of the video, and finally all states h_t are combined into the spatio-temporal video feature:
f_y = {h_t | t = 1, ..., N}.
Further, in the similarity learning network, let S(x, y) denote the similarity measure between the query image x and the video Y; then:

$$S(x,y)=\begin{bmatrix} f_x & f_y & 1 \end{bmatrix}\begin{bmatrix} A & C & d \\ C^{T} & B & e \\ d^{T} & e^{T} & f \end{bmatrix}\begin{bmatrix} f_x \\ f_y \\ 1 \end{bmatrix};$$

where A, B, C, d, e, f are the parameters of the similarity measure S(x, y), with A and B positive definite matrices and C a positive semidefinite matrix;
Let $A = L_A^{T} L_A$, $B = L_B^{T} L_B$, and $C = -(L_C^{x})^{T} L_C^{y}$; then:

$$S(x,y)=\|L_A f_x\|^{2}+\|L_B f_y\|^{2}+2d^{T}f_x-2(L_C^{x} f_x)^{T}(L_C^{y} f_y)+2e^{T}f_y;$$

where L_A, L_B, L_C are the parameters of the similarity learning network, learned by training the depth model.
Further, in S2, a hinge-like loss function is used, and the depth model is trained with standard stochastic gradient descent;
Let W denote the parameters of the network; the loss function is then defined as:

$$W=\arg\min_{W}\sum_{i=1}^{N}\big(1-l_i\,S(x_i,y_i)\big)_{+}+\lambda\|W\|^{2};$$

which drives the similarity to satisfy

$$S(x_i,y_i)\ \begin{cases} <-1, & \text{if } l_i=-1 \\ \ge 1, & \text{otherwise;} \end{cases}$$

where l_i is an indicator label: l_i = 1 if the query image x_i and the video y_i contain the same person, and l_i = -1 otherwise.
The present invention automatically compares an input query image containing a person against the videos in a database, returns the videos containing the same person as the query image, and ranks them by the similarity between the person in each video and the person in the input image. By embedding similarity learning into a deep neural network, the invention fuses deep feature representation learning and similarity learning into a single network architecture, so the two can be learned and optimized jointly. This overcomes the defect of conventional methods, in which feature learning and similarity learning are carried out in isolation and cannot be optimized end to end. The invention uses a deep neural network to adaptively learn the features of the image and the video, thereby achieving pedestrian re-identification based on cross-modal comparison of an image and videos and solving a major difficulty of image-to-video pedestrian re-identification.
Brief description of the drawings
Fig. 1 is a flow diagram of the pedestrian re-identification method based on cross-modal comparison of an image and videos provided by the present invention.
Fig. 2 is a structural diagram of the depth model used in the present invention.
Embodiment
The technical scheme of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a pedestrian re-identification method based on cross-modal comparison of an image and videos, for retrieving from multiple videos those containing the person shown in an input query image. It should be noted that the multiple videos described herein may be stored together in a single video database or stored separately.
As shown in Fig. 1, the pedestrian re-identification method based on cross-modal comparison of an image and videos provided by the present invention comprises the following steps:
S1, building a configurable depth model;
The depth model comprises a convolutional neural network, a long short-term memory (LSTM) network, and a similarity learning network. The convolutional neural network extracts the image feature of the query image and the per-frame features of the video; the LSTM network embeds the spatio-temporal information of the video into the extracted frame features and outputs a video feature containing this information; the similarity learning network maps the image feature and the video feature to the same dimension and learns a similarity measure between them.
S2, obtaining training samples and feeding them into the depth model to train it; the parameters of each part of the depth model are learned with forward and backward propagation.
S3, initializing the depth model with the parameters learned in S2; the query image and the videos are input into the depth model, which computes a similarity measure between each video and the query image.
S4, listing the videos whose similarity measure with the query image exceeds a threshold, sorted by the magnitude of the similarity measure.
As shown in Fig. 2, the depth model used in the present invention is structured as follows. The model comprises three parts: a convolutional neural network, an LSTM network, and a similarity learning network. The model has two inputs, the query image and the video, and accordingly its data flow is divided into two branches. One branch extracts the feature of the query image with the convolutional neural network and feeds its output into the first input of the similarity learning network; the other branch extracts the feature of the video with the convolutional neural network and the LSTM network and feeds its output into the second input of the similarity learning network.
The specific function of each network is described in detail below.
Convolutional neural network: in the invention, the convolutional neural network is responsible for extracting the features of the query image and of the video. Its structure follows the classical GoogLeNet architecture. The input query image is passed through the convolutional neural network to obtain its feature, which is output to the similarity learning network for learning the similarity measure.
Let x denote the input query image and Y = {y_t | t = 1, ..., N} the video, where y_t is the t-th frame of Y and N is the total number of frames. Let Cnn denote the function computed by the convolutional neural network; the image feature obtained by passing the query image x through the network is then:
f_x = Cnn(x);
For the video Y, the convolutional neural network produces a feature for each frame y_t, so the video features are:
{f(y_t) = Cnn(y_t) | t = 1, ..., N}.
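The per-frame feature extraction above can be sketched as follows. The `cnn` function here is a hypothetical stand-in (a fixed random projection of the flattened frame), not the GoogLeNet network the patent actually uses; only the shapes of f_x and {f(y_t)} are meant to be illustrative:

```python
import numpy as np

# Stand-in for the GoogLeNet-based extractor Cnn(.): a fixed random linear
# projection of the flattened image to a D_FEAT-dimensional descriptor.
D_IN, D_FEAT = 32 * 32 * 3, 1024
_W = np.random.default_rng(0).standard_normal((D_FEAT, D_IN)) / np.sqrt(D_IN)

def cnn(image):
    return _W @ image.reshape(-1)

# f_x = Cnn(x) for the query image ...
x = np.random.default_rng(1).random((32, 32, 3))
f_x = cnn(x)

# ... and {f(y_t) = Cnn(y_t) | t = 1..N} for every frame of the video Y.
Y = np.random.default_rng(2).random((8, 32, 32, 3))   # N = 8 frames
frame_feats = np.stack([cnn(y_t) for y_t in Y])
```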
After each frame of the video has been passed through the convolutional neural network, the LSTM network embeds the spatio-temporal information of the video.
Long short-term memory network: as a kind of recurrent neural network, the LSTM can process video data of arbitrary length and output video features. In the present invention, the LSTM further encodes the per-frame information obtained from the convolutional neural network. A defining property of the LSTM is that its current state depends not only on the current input but also on its previous state.
For the t-th frame of the video, the feature f(y_t) obtained from the convolutional neural network is used as the input to the LSTM; the corresponding LSTM output is the state h_t, the feature of that frame after passing through the LSTM. This is done for every frame of the video, and finally all states h_t are combined into the spatio-temporal video feature:
f_y = {h_t | t = 1, ..., N}.
Similarity learning network: after the feature of the query image and the feature of the video have been extracted, the similarity learning network learns the similarity between the query image and the video.
Specifically, let S(x, y) denote the similarity measure between the query image x and the video Y; then:

$$S(x,y)=\begin{bmatrix} f_x & f_y & 1 \end{bmatrix}\begin{bmatrix} A & C & d \\ C^{T} & B & e \\ d^{T} & e^{T} & f \end{bmatrix}\begin{bmatrix} f_x \\ f_y \\ 1 \end{bmatrix};$$

where A, B, C, d, e, f are the parameters of the similarity measure S(x, y), with A and B positive definite matrices and C a positive semidefinite matrix. Let $A = L_A^{T} L_A$, $B = L_B^{T} L_B$, and $C = -(L_C^{x})^{T} L_C^{y}$; then:

$$S(x,y)=\|L_A f_x\|^{2}+\|L_B f_y\|^{2}+2d^{T}f_x-2(L_C^{x} f_x)^{T}(L_C^{y} f_y)+2e^{T}f_y;$$

where L_A, L_B, L_C are the parameters of the similarity learning network, learned by training the depth model.
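The equivalence between the block-matrix form of S(x, y) and its expanded form can be checked numerically. The sketch below assumes the decomposition A = L_A^T L_A, B = L_B^T L_B, C = -(L_C^x)^T L_C^y with the constant f set to zero (a constant offset does not affect the ranking), and treats f_y as a single pooled vector purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 6, 4
f_x, f_y = rng.standard_normal(dx), rng.standard_normal(dy)

# Factor parameters of the similarity learning network (random for the check).
L_A  = rng.standard_normal((dx, dx))
L_B  = rng.standard_normal((dy, dy))
L_Cx = rng.standard_normal((3, dx))
L_Cy = rng.standard_normal((3, dy))
d, e = rng.standard_normal(dx), rng.standard_normal(dy)

# Block-matrix form: S = [f_x f_y 1] M [f_x f_y 1]^T.
A, B, C = L_A.T @ L_A, L_B.T @ L_B, -L_Cx.T @ L_Cy
M = np.block([[A,          C,          d[:, None]],
              [C.T,        B,          e[:, None]],
              [d[None, :], e[None, :], np.zeros((1, 1))]])
z = np.concatenate([f_x, f_y, [1.0]])
S_block = z @ M @ z

# Expanded form stated by the patent.
S_expanded = (np.sum((L_A @ f_x) ** 2) + np.sum((L_B @ f_y) ** 2)
              + 2 * d @ f_x - 2 * (L_Cx @ f_x) @ (L_Cy @ f_y) + 2 * e @ f_y)
```

Expanding the quadratic form term by term (the cross block contributes 2 f_x^T C f_y, the linear blocks contribute 2 d^T f_x and 2 e^T f_y) reproduces the expanded expression exactly.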
During the model training described in S2, each training sample consists of one query image, one video, and a precomputed similarity label between the two. Before the training samples are input, the parameters of the depth model are initialized randomly.
In the present invention, a hinge-like loss function is used, and the depth model is trained with standard stochastic gradient descent. Let W denote the parameters of the network; the loss function is then defined as:

$$W=\arg\min_{W}\sum_{i=1}^{N}\big(1-l_i\,S(x_i,y_i)\big)_{+}+\lambda\|W\|^{2};$$

which drives the similarity to satisfy

$$S(x_i,y_i)\ \begin{cases} <-1, & \text{if } l_i=-1 \\ \ge 1, & \text{otherwise;} \end{cases}$$

where l_i is an indicator label: l_i = 1 if the query image x_i and the video y_i contain the same person, and l_i = -1 otherwise.
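The hinge-like objective for one batch can be sketched as follows; the regularization weight `lam`, the weight norm, and the example scores are illustrative values, not taken from the patent:

```python
import numpy as np

def hinge_like_loss(scores, labels, weight_norm_sq, lam=0.01):
    """Sum of (1 - l_i * S(x_i, y_i))_+ over the batch plus the L2 penalty
    lam * ||W||^2; l_i is +1 for a matching image/video pair, -1 otherwise."""
    margins = np.maximum(0.0, 1.0 - labels * scores)
    return float(np.sum(margins) + lam * weight_norm_sq)

scores = np.array([ 2.0, -3.0, 0.5])   # S(x_i, y_i) for three pairs
labels = np.array([ 1.0, -1.0, 1.0])   # l_i
loss = hinge_like_loss(scores, labels, weight_norm_sq=4.0, lam=0.01)
# pairs 1 and 2 already satisfy their margins; pair 3 contributes (1 - 0.5)_+
```

Pairs that satisfy their margin (l_i S >= 1) contribute nothing, so gradient descent concentrates on pairs whose similarity is still on the wrong side of the margin.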
Once the parameters of each part of the depth model have been learned in S2, the depth model is re-initialized with these parameters, after which pedestrian re-identification can formally be carried out.
In S3 and S4, the present invention uses the trained depth model to compute the similarity between the input query image and each of the multiple videos in the video database, then sorts the videos in the database in descending order of similarity and returns the ranking.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (7)

1. A pedestrian re-identification method based on cross-modal comparison of an image and videos, for retrieving from multiple videos those containing the person shown in an input query image, characterized by comprising the following steps:
S1, building a configurable depth model;
the depth model comprising a convolutional neural network, a long short-term memory (LSTM) network, and a similarity learning network, wherein the convolutional neural network extracts the image feature of the query image and the per-frame features of the video; the LSTM network embeds the spatio-temporal information of the video into the extracted frame features and outputs a video feature containing this information; and the similarity learning network maps the image feature and the video feature to the same dimension and learns a similarity measure between them;
S2, obtaining training samples and feeding them into the depth model to train it, the parameters of each part of the depth model being learned with forward and backward propagation;
S3, initializing the depth model with the parameters learned in S2, inputting the query image and the videos into the depth model, and computing with the depth model a similarity measure between each video and the query image;
S4, listing the videos whose similarity measure with the query image exceeds a threshold, sorted by the magnitude of the similarity measure.
2. The method according to claim 1, characterized in that in S2, before the training samples are input, the parameters of the depth model are initialized randomly.
3. The method according to claim 1, characterized in that in S2, each training sample consists of one query image, one video, and a precomputed similarity label between the two.
4. The method according to claim 1, characterized in that in the convolutional neural network, letting x denote the input query image and Y = {y_t | t = 1, ..., N} the video, where y_t is the t-th frame of Y and N is the total number of frames, and letting Cnn denote the function computed by the convolutional neural network, the image feature obtained by passing the query image x through the network is:
f_x = Cnn(x);
and for the video Y, the convolutional neural network produces a feature for each frame y_t, the video features being:
{f(y_t) = Cnn(y_t) | t = 1, ..., N}.
5. The method according to claim 4, characterized in that for the t-th frame of the video, the feature f(y_t) obtained from the convolutional neural network is used as the input to the LSTM network; the corresponding LSTM output is the state h_t, the feature of that frame after passing through the LSTM; this is done for every frame of the video, and all states h_t are finally combined into the spatio-temporal video feature:
f_y = {h_t | t = 1, ..., N}.
6. The method according to claim 5, characterized in that in the similarity learning network, letting S(x, y) denote the similarity measure between the query image x and the video Y:

$$S(x,y)=\begin{bmatrix} f_x & f_y & 1 \end{bmatrix}\begin{bmatrix} A & C & d \\ C^{T} & B & e \\ d^{T} & e^{T} & f \end{bmatrix}\begin{bmatrix} f_x \\ f_y \\ 1 \end{bmatrix};$$

wherein A, B, C, d, e, f are the parameters of the similarity measure S(x, y), A and B being positive definite matrices and C a positive semidefinite matrix;
letting $A = L_A^{T} L_A$, $B = L_B^{T} L_B$, and $C = -(L_C^{x})^{T} L_C^{y}$:

$$S(x,y)=\|L_A f_x\|^{2}+\|L_B f_y\|^{2}+2d^{T}f_x-2(L_C^{x} f_x)^{T}(L_C^{y} f_y)+2e^{T}f_y;$$

wherein L_A, L_B, L_C are the parameters of the similarity learning network, learned by training the depth model.
7. The method according to claim 1, characterized in that in S2, a hinge-like loss function is used and the depth model is trained with standard stochastic gradient descent;
letting W denote the parameters of the network, the loss function is defined as:

$$W=\arg\min_{W}\sum_{i=1}^{N}\big(1-l_i\,S(x_i,y_i)\big)_{+}+\lambda\|W\|^{2};$$

which drives the similarity to satisfy

$$S(x_i,y_i)\ \begin{cases} <-1, & \text{if } l_i=-1 \\ \ge 1, & \text{otherwise;} \end{cases}$$

wherein l_i is an indicator label: l_i = 1 if the query image x_i and the video y_i contain the same person, and l_i = -1 otherwise.
CN201710536118.XA 2017-07-01 2017-07-01 Pedestrian re-identification method based on cross-modal comparison of image and video Active CN107480178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710536118.XA CN107480178B (en) 2017-07-01 2017-07-01 Pedestrian re-identification method based on cross-modal comparison of image and video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710536118.XA CN107480178B (en) 2017-07-01 2017-07-01 Pedestrian re-identification method based on cross-modal comparison of image and video

Publications (2)

Publication Number Publication Date
CN107480178A true CN107480178A (en) 2017-12-15
CN107480178B CN107480178B (en) 2020-07-07

Family

ID=60595188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710536118.XA Active CN107480178B (en) 2017-07-01 2017-07-01 Pedestrian re-identification method based on cross-modal comparison of image and video

Country Status (1)

Country Link
CN (1) CN107480178B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN104915643A (en) * 2015-05-26 2015-09-16 Sun Yat-sen University Deep-learning-based pedestrian re-identification method
CN106096568A (en) * 2016-06-21 2016-11-09 Tongji University Pedestrian re-identification method based on CNN and convolutional LSTM network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONG YI et al.: "Deep Metric Learning for Person Re-identification", 2014 22nd International Conference on Pattern Recognition *
LIANG ZHENG et al.: "MARS: A video benchmark for large-scale person re-identification", Computer Vision – ECCV 2016 *
YICHAO YAN et al.: "Person Re-identification via recurrent feature aggregation", Computer Vision – ECCV 2016 *
MA LIANYANG: "Research on Cross-Camera Person Re-identification", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228915A (en) * 2018-03-29 2018-06-29 South China University of Technology Video retrieval method based on deep learning
CN108960141B (en) * 2018-07-04 2021-04-23 Academy of Broadcasting Science, State Administration of Press, Publication, Radio, Film and Television Pedestrian re-identification method based on enhanced deep convolutional neural network
CN108960141A (en) * 2018-07-04 2018-12-07 Academy of Broadcasting Science, State Administration of Press, Publication, Radio, Film and Television Pedestrian re-identification method based on enhanced deep convolutional neural network
CN109063589A (en) * 2018-07-12 2018-12-21 Hangzhou Dianzi University Neural-network-based online monitoring method and system for instruments and equipment
CN109165563A (en) * 2018-07-27 2019-01-08 Beijing SenseTime Technology Development Co., Ltd. Pedestrian re-identification method and apparatus, electronic device, storage medium, and program product
CN109165563B (en) * 2018-07-27 2021-03-23 Beijing SenseTime Technology Development Co., Ltd. Pedestrian re-identification method and apparatus, electronic device, storage medium, and program product
CN108932509A (en) * 2018-08-16 2018-12-04 Xinzhi Digital Technology Co., Ltd. Cross-scene object retrieval method and device based on video tracking
CN111050219A (en) * 2018-10-12 2020-04-21 Adobe Inc. Spatio-temporal memory network for locating target objects in video content
CN109635676A (en) * 2018-11-23 2019-04-16 Tsinghua University Method for locating sound sources in video
CN110084979B (en) * 2019-04-23 2022-05-10 DMAI (Guangzhou) Co., Ltd. Human-computer interaction method and device, controller, and interaction equipment
CN110084979A (en) * 2019-04-23 2019-08-02 DMAI (Guangzhou) Co., Ltd. Human-computer interaction method and device, controller, and interaction equipment
CN110245267A (en) * 2019-05-17 2019-09-17 Tianjin University Multi-user video stream deep learning shared computation multiplexing method
CN110245267B (en) * 2019-05-17 2023-08-11 Tianjin University Multi-user video stream deep learning shared computation multiplexing method
CN110334743A (en) * 2019-06-10 2019-10-15 Zhejiang University Progressive transfer learning method based on convolutional long short-term memory network
CN110334743B (en) * 2019-06-10 2021-05-04 Zhejiang University Progressive transfer learning method based on convolutional long short-term memory network
CN112651262A (en) * 2019-10-09 2021-04-13 Sichuan University Cross-modal pedestrian re-identification method based on adaptive pedestrian alignment
CN112651262B (en) * 2019-10-09 2022-10-14 Sichuan University Cross-modal pedestrian re-identification method based on adaptive pedestrian alignment
CN111931637A (en) * 2020-08-07 2020-11-13 South China University of Technology Cross-modal pedestrian re-identification method and system based on dual-stream convolutional neural network
CN111931637B (en) * 2020-08-07 2023-09-15 South China University of Technology Cross-modal pedestrian re-identification method and system based on dual-stream convolutional neural network
CN113761995A (en) * 2020-08-13 2021-12-07 Sichuan University Cross-modal pedestrian re-identification method based on dual-transform alignment and partitioning
CN113283362A (en) * 2021-06-04 2021-08-20 China University of Mining and Technology Cross-modal pedestrian re-identification method
CN113283362B (en) * 2021-06-04 2024-03-22 China University of Mining and Technology Cross-modal pedestrian re-identification method

Also Published As

Publication number Publication date
CN107480178B (en) 2020-07-07

Similar Documents

Publication Publication Date Title
CN107480178A (en) Pedestrian re-identification method based on cross-modal comparison of image and video
CN107358257B (en) Incrementally learnable image classification training method for big-data scenarios
CN108875674B (en) Driver behavior identification method based on multi-column fusion convolutional neural network
Li et al. Building-a-nets: Robust building extraction from high-resolution remote sensing images with adversarial networks
CN106204449B (en) Single-image super-resolution reconstruction method based on symmetric deep network
CN106503687B (en) Surveillance video person identification system and method fusing multi-angle facial features
CN104361363B (en) Deep deconvolution feature learning network, generation method, and image classification method
Zhou et al. Learning deep features for scene recognition using places database
CN108062574B (en) Weakly supervised object detection method based on category-specific spatial constraints
CN110008842A (en) Pedestrian re-identification method based on deep multi-loss fusion model
CN109543602A (en) Pedestrian re-identification method based on multi-view image feature decomposition
CN106897714A (en) Video action detection method based on convolutional neural networks
CN109815826A (en) Method and device for generating face attribute model
CN107220604A (en) Video-based fall detection method
CN105574510A (en) Gait identification method and device
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN109410190B (en) Pole tower breakage detection model training method based on high-resolution remote sensing satellite imagery
CN104063721B (en) Human behavior recognition method based on automatic semantic feature learning and screening
CN105718960A (en) Image ranking model based on convolutional neural network and spatial pyramid matching
CN106682628B (en) Face attribute classification method based on multilayer deep feature information
CN109784288B (en) Pedestrian re-identification method based on discrimination-aware fusion
CN104462494A (en) Remote sensing image retrieval method and system based on unsupervised feature learning
CN105005798B (en) Target recognition method based on local structure statistical matching
CN107633226A (en) Human action tracking and recognition method and system
CN104281572B (en) Target matching method and system based on mutual information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200612

Address after: Floor 16, No. 37 Jinlong Road, Nansha District, Guangzhou, Guangdong Province, 511400 (office only)

Applicant after: DMAI (GUANGZHOU) Co.,Ltd.

Address before: Room 210-5, Chongkai Building 1, No. 63 Genkai Road, Shilou Town, Panyu District, Guangzhou, Guangdong Province, 510000

Applicant before: GUANGZHOU SHENYU INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant