CN107085609A - A pedestrian retrieval method performing multi-feature fusion based on a neural network - Google Patents
A pedestrian retrieval method performing multi-feature fusion based on a neural network
- Publication number
- CN107085609A CN107085609A CN201710270659.2A CN201710270659A CN107085609A CN 107085609 A CN107085609 A CN 107085609A CN 201710270659 A CN201710270659 A CN 201710270659A CN 107085609 A CN107085609 A CN 107085609A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- vector
- cnn
- characteristic
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
The present invention relates to video analysis technology, and in particular to a pedestrian retrieval method that performs multi-feature fusion based on a neural network. Features are computed for the pedestrians detected in a video and stored in the feature database of the pedestrian set under test; features are then computed for the query pedestrian and compared against the feature database, and the results with high similarity are ranked first. By computing multiple retrieval features and feature distances and fusing them with an optimal distance weight vector W, the present invention makes both the upper body and the lower body of the retrieval results similar to the query pedestrian, and ranks results by a single feature distance to improve the convenience of retrieval. It has the characteristics of wide applicability, high accuracy and ease of use, and can effectively solve the problems that, at present, pedestrian detection accuracy and retrieval similarity in surveillance video are both low, that various detection methods and feature distances cannot be combined well, that applicability is narrow, and that application is complicated.
Description
Technical field
The present invention relates to video analysis technology, and in particular to a pedestrian retrieval method that performs multi-feature fusion based on a neural network.
Background technology
There are many existing methods for image retrieval and pedestrian retrieval, such as the image retrieval method CEDD (CEDD: Color and Edge Directivity Descriptor. A Compact Descriptor for Image Indexing and Retrieval, Savvas A., 2008) and the pedestrian retrieval method WHOS (Person Re-Identification by Iterative Re-Weighted Sparse Ranking, Giuseppe Lisanti, 2015). These methods achieve good retrieval results on some scientific data sets, such as the pedestrian retrieval data set ViPeR Dataset (https://vision.soe.ucsc.edu/node/178), but for pedestrians in actual surveillance video the retrieval results are unsatisfactory, and new retrieval features must be formed by integration.
From the point of view of retrieval results, some methods such as WHOS do include the query pedestrian among the retrieval results, i.e. the retrieval succeeds, but the similarity between the other retrieved pedestrians and the query pedestrian is not high, so not much reference information is provided to the user. For example, if the query pedestrian wears blue clothes on the upper body and black trousers on the lower body, then among the pedestrians ranked first by similarity, not many are actually "blue upper body, black lower body". Other methods that retrieve by body part, such as: A General Method for Appearance-based People Search Based on Textual Queries, R. Satta, 2012, produce results with high similarity to the query pedestrian, but the retrieval process and feature distances are complicated, requiring the explicit use of multiple feature distances and retrieval filtering stages, so similarity cannot be represented by a single feature distance.
At present, there are several methods for computing feature distances, such as the Bhattacharyya method (abbreviated Bh distance, https://en.wikipedia.org/wiki/Bhattacharyya_distance) and the Tanimoto method (Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition, 1996). Different distances suit different features and scenes, and it is now desirable to integrate these features for application to surveillance video.
In view of the rapid development of convolutional neural networks (CNN) and the excellent results they have obtained in many fields (such as image recognition) (ImageNet Classification with Deep Convolutional Neural Networks, 2012; OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2014), how to use CNN models to combine various detection methods and feature distances well, so as to achieve the ideal effect of wide applicability, high accuracy and ease of use, has become a research direction.
Content of the invention
The object of the present invention is to address the above deficiencies of the prior art by providing a pedestrian retrieval method that performs multi-feature fusion based on a neural network, which computes multiple retrieval features and feature distances and fuses them with an optimal distance weight vector W, so that both the upper body and the lower body of the retrieval results are similar to the query pedestrian, and which ranks results by a single feature distance to improve the convenience of retrieval; it is widely applicable, highly accurate and easy to use, and can effectively solve the problem that, at present, pedestrian detection accuracy and retrieval similarity in surveillance video are both low.
The present invention achieves the above object by the following technical solution:
The basic idea of this neural-network-based multi-feature-fusion pedestrian retrieval method is as follows:
The initial step is pedestrian detection, whose result is a detection box (see Fig. 1). For the basic retrieval process, see Fig. 2: features are computed for the pedestrians detected in the video and stored in the feature database of the pedestrian set under test; features are then computed for the query pedestrian and compared with the feature database to obtain the retrieval results, ranked with high similarity first, e.g. rank1 has the highest similarity, and in Fig. 2 rank1 and the query pedestrian are the same pedestrian. The pedestrian foreground mask is extremely important for pedestrian retrieval; see Fig. 3. The height and width of the pedestrian foreground mask obtained by this method are identical to those of the RGB image block corresponding to the pedestrian box, and the foreground mask has only 3 values: background 0, upper body 1 and lower body 2. This method contains 2 important steps: the computation of the pedestrian foreground mask and the computation of the feature distance, each corresponding to a CNN model. The "foreground mask CNN" (see Fig. 4) computes a pedestrian foreground mask that distinguishes the upper and lower body, and the "optimizing CNN" (see Fig. 7) computes the optimal distance weight vector W, which is used to fuse the various feature distances and compute the degree of similarity between 2 pedestrians. The concrete steps are as follows:
A pedestrian retrieval method that performs multi-feature fusion based on a neural network, characterized in that it comprises the following steps:
(1) Extract the CNN foreground mask: For the input video and the pedestrians already detected, use a GMM (Gaussian Mixture Model) to compute the GMM foreground mask within the pedestrian box, and change to grey the color of the parts of the RGB image block contained in the box that correspond to the background of the GMM foreground mask, thus eliminating interference from the background region; then use the motion information in the video to compute the optical flow vector of each pixel in the pedestrian box, and combine the GMM foreground mask, the magnitude of the optical flow vectors, the direction of the optical flow vectors and the modified RGB image block into the "pedestrian mask combination feature", which is input into the "foreground mask CNN" to obtain a CNN foreground mask that distinguishes the upper and lower body, i.e. the mask values are only: background 0, upper body 1 and lower body 2;
(2) Compute the retrieval feature vectors: For each pedestrian's RGB image block and the corresponding CNN foreground mask distinguishing the upper and lower body, compute the HS, RGB, improved CEDD and improved WHOS features for the regions corresponding to the whole-body, upper-body and lower-body foreground masks respectively, giving 12 kinds of retrieval feature vectors in total, and store the retrieval feature vectors of the pedestrian set under test in the feature database; the improved CEDD feature computes only the pixels of the region corresponding to the foreground mask, and the improved WHOS likewise computes only the pixels of the region corresponding to the foreground mask while omitting the HOG features;
(3) Compute the feature distance: For the retrieval feature vectors of 2 pedestrians, use the Bhattacharyya method and the Tanimoto method to compute the distances between the corresponding sub-feature vectors of the same type, obtaining 24 distances for the 12 kinds of features, which form a 24-dimensional distance vector D; then use the weight vector W obtained by the "optimizing CNN" to convert this distance vector into a single distance value, using the conversion formula d = W'D, where W is a 24-dimensional weight column vector, D is the 24-dimensional distance column vector, and the result d is a single 1x1 value; to obtain retrieval results whose upper-body and lower-body features are both similar to the query pedestrian, the method of the present invention uses a single feature distance and requires no filtering;
(4) Rank and output the retrieval results: For the query pedestrian, compute the feature vectors using the methods of (1) and (2), then compute the distance to each pedestrian under test in the feature database using the method of (3), and finally rank these distance values to obtain the retrieval results; a small distance indicates high similarity and a large distance indicates low similarity.
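The fusion and ranking in steps (3) and (4) can be sketched in a few lines of plain Python. The weight values and gallery distance vectors below are made-up placeholders chosen only for illustration; a trained W would come from the "optimizing CNN":

```python
# Sketch of steps (3)-(4): fuse a 24-dim distance vector D into one
# scalar d = W'D, then rank gallery pedestrians by that scalar.
# All numbers below are illustrative placeholders.

def fused_distance(W, D):
    """d = W'D: dot product of the 24-dim weight and distance vectors."""
    assert len(W) == len(D) == 24
    return sum(w * d for w, d in zip(W, D))

def rank_gallery(W, gallery):
    """gallery: dict name -> 24-dim distance vector to the query pedestrian.
    Returns names sorted by fused distance, most similar (smallest d) first."""
    return sorted(gallery, key=lambda name: fused_distance(W, gallery[name]))

# Uniform placeholder weights; a trained W would come from the optimizing CNN.
W = [1.0 / 24] * 24
gallery = {
    "B1": [0.1] * 24,   # small distances -> likely the same pedestrian
    "B2": [0.8] * 24,
    "B3": [0.5] * 24,
}
print(rank_gallery(W, gallery))  # ['B1', 'B3', 'B2']
```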
The concrete steps for computing the pedestrian CNN foreground mask described in step (1) are: First, train the "foreground mask CNN". Prepare the training samples: take pedestrians detected in surveillance video as samples, scale them to the standard size PxQ, and manually label each pedestrian's expected foreground mask, assigning different mask values to the upper and lower body, i.e. labelling background, upper body and lower body as 0, 1 and 2 respectively, to serve as the output values of the training samples. Then compute the mask combination feature of each sample: first compute the GMM foreground mask within the pedestrian detection box using a GMM, and set to grey the regions of the pedestrian box's RGB image block corresponding to the background of the GMM foreground mask, then compute the optical flow vector of each pixel in the pedestrian detection box; in this way 6 PxQ matrices are obtained for each pedestrian: the GMM foreground mask, the magnitude of the optical flow vectors, the direction of the optical flow vectors, and the modified R, G and B, which constitute the feature data of the sample and serve as the input of the training sample. Finally, train the "foreground mask CNN" with the above training samples. This CNN has 6 layers: an input layer, a convolutional layer, a max-pooling layer, a convolutional layer, a max-pooling layer and a fully connected layer; the input is the 6 PxQ matrices above, and the output is a PxQ image whose values are 0, 1 or 2, representing background, upper body and lower body respectively. Second, use the "foreground mask CNN" to compute the CNN foreground masks.
The weight vector W described in step (3) is obtained as follows:
First, prepare the training samples: For the pedestrians in the pedestrian sample library, compute and store the retrieval feature vectors; then select 1 pedestrian A, find 1 other pedestrian B1 who is the same pedestrian as A, and select N-1 (N>3) other pedestrians who are not the same pedestrian as A, forming the sample group {B1, B2, ..., BN}, where A and B1 are the same pedestrian; then compute the 24-dimensional feature distance vector between pedestrian A and each pedestrian in the sample group {B1, B2, ..., BN}, and L2-normalize each feature distance vector, giving an Nx24 feature distance matrix in which each row represents the feature distance vector between 2 pedestrian samples; 1 matrix forms 1 training sample. The expected output of each training sample is fixed as the vector Y obtained by L2-normalizing the set of N elements {0, 1, ..., 1}. Multiple training samples are generated in this way.
Second, train the "optimizing CNN" to compute W: train the "optimizing CNN" with the above training samples; the 1x24 convolution kernels obtained are the optimal weight vector W. The structure of the "optimizing CNN" comprises 3 layers: the input layer, the convolutional layer and the max-pooling output layer; the input layer corresponds to the Nx24 feature distance vectors D, the convolutional layer consists of K 1x24 convolution kernel matrices, and the max-pooling output layer is the N-dimensional vector Y. After training finishes, the K 1x24 convolution kernel matrices are averaged to obtain the optimal W.
Compared with the prior art, the beneficial effects of the present invention are:
This neural-network-based multi-feature-fusion pedestrian retrieval method computes multiple retrieval features and feature distances and fuses them with the optimal distance weight vector W, so that both the upper body and the lower body of the retrieval results are similar to the query pedestrian, and ranks results by a single feature distance to improve the convenience of retrieval. It is widely applicable, highly accurate and easy to use, and can effectively solve the problems that, at present, pedestrian detection accuracy and retrieval similarity in surveillance video are both low, that various detection methods and feature distances cannot be combined well, that applicability is narrow, and that application is complicated.
Brief description of the drawings
Fig. 1 is a pedestrian detection box diagram of the present invention;
Fig. 2 is a schematic diagram of the pedestrian retrieval process of the present invention;
Fig. 3 is a schematic diagram of the pedestrian foreground mask of the present invention;
Fig. 4 is a schematic diagram of the computation of the pedestrian mask combination feature and the training of the foreground mask CNN of the present invention;
Fig. 5 is a schematic diagram of the CNN foreground mask computation of the present invention;
Fig. 6 is a schematic diagram of the computation of the retrieval features and feature distance vectors of the present invention;
Fig. 7 is a schematic diagram of the structure of the CNN that computes the optimal W of the present invention;
Fig. 8 is a schematic diagram of the structure of the "optimizing CNN" of the present invention;
Fig. 9 is a schematic diagram describing the neural-network-based multi-feature-fusion pedestrian retrieval algorithm of the present invention.
Embodiments
The pedestrian retrieval method that performs multi-feature fusion based on a neural network is further described below with reference to Figs. 1 to 9.
(1) Dimensions of the foreground mask:
For a detected pedestrian, a GMM (Gaussian Mixture Model) is used to compute the GMM foreground mask within the pedestrian box, and the color of the parts of the RGB image block contained in the box that correspond to the background of the GMM foreground mask is changed to grey, to eliminate interference from the background region; the height and width are then scaled to the standard size PxQ, and the CNN foreground mask is a PxQ matrix whose elements have only 3 values: background 0, upper body 1 and lower body 2. Referring to Fig. 4, the following data constitute the pedestrian mask combination feature, all PxQ matrices: the magnitude of the optical flow vectors, the direction of the optical flow vectors, the GMM foreground mask, and the R, G and B parts of the RGB image block.
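As an illustration of how the six planes form one input, here is a minimal sketch in plain Python; P, Q and all plane contents are placeholder values chosen only for the example:

```python
# Sketch: assemble the "pedestrian mask combination feature" from six
# PxQ planes (optical-flow magnitude, optical-flow direction, GMM mask,
# and the R, G, B parts of the background-greyed image block).
# P, Q and the plane values are illustrative placeholders.

P, Q = 4, 3  # the real standard size PxQ is chosen when training

def make_plane(value):
    return [[value] * Q for _ in range(P)]

flow_mag = make_plane(0.5)
flow_dir = make_plane(1.2)
gmm_mask = make_plane(1)      # 0 background, 1 upper body, 2 lower body
r, g, b = make_plane(0.3), make_plane(0.6), make_plane(0.9)

# The combination feature is the six planes stacked channel-wise,
# i.e. a 6 x P x Q input for the foreground-mask CNN.
combo = [flow_mag, flow_dir, gmm_mask, r, g, b]
print(len(combo), len(combo[0]), len(combo[0][0]))  # 6 4 3
```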
(2) Computation of the feature distance and its dimensions:
To obtain retrieval results whose upper-body and lower-body features are both similar to the query pedestrian, many retrieval methods require manually selecting and computing multiple feature distances and applying filtering; the method of the present invention can use a single feature distance and requires no filtering. Referring to Figs. 6 to 8, the feature distance between 2 pedestrians computed by the method of the present invention is a single 1x1 value; a small distance indicates high similarity and a large distance indicates low similarity. With this feature distance, obtaining pedestrians with high similarity to the query pedestrian, i.e. with upper-body and lower-body features similar to the query pedestrian (such as white upper-body clothes and black lower-body trousers), is more easily done accurately than with other methods. When computing the feature distance, the 12 kinds of retrieval features of each pedestrian are first computed, then the Bh distances and Tanimoto distances of these 12 kinds of retrieval features between the 2 pedestrians are computed, giving a 24-dimensional feature distance vector D, and finally the feature distance d is computed with the following formula:
d = W'D; W and D are 24-dimensional vectors; W is the optimal distance weight vector.
The calculation formula of the Bh distance is:
D_B(p, q) = -ln( Σ_{i=1}^{n} √(p_i · q_i) ); p and q are 2 n-dimensional vectors.
The calculation formula of the Tanimoto distance is:
T_ij = (x_i · x_j) / (‖x_i‖² + ‖x_j‖² − x_i · x_j),
where x_i and x_j in the formula are 2 vectors and the range of T_ij is [0, 1].
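For reference, the two distance measures can be sketched directly from the standard definitions in the cited sources (plain Python; the input histogram is a made-up example):

```python
import math

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln( sum_i sqrt(p_i * q_i) ) for two n-dim vectors
    (typically normalized histograms)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)

def tanimoto_coefficient(xi, xj):
    """T_ij = (xi . xj) / (|xi|^2 + |xj|^2 - xi . xj), in [0, 1]
    for non-negative vectors."""
    dot = sum(a * b for a, b in zip(xi, xj))
    ni = sum(a * a for a in xi)
    nj = sum(b * b for b in xj)
    return dot / (ni + nj - dot)

# Identical normalized histograms: Bh distance ~0, Tanimoto coefficient 1.
h = [0.25, 0.25, 0.25, 0.25]
print(bhattacharyya_distance(h, h))  # ~ 0.0
print(tanimoto_coefficient(h, h))    # 1.0
```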
(3) Structure and use of the foreground mask CNN model:
Referring to Figs. 4 and 5, the foreground mask CNN is used to compute the foreground mask of a detected pedestrian. Its input is 1 pedestrian's mask combination feature, comprising 6 PxQ matrices: the magnitude of the optical flow vectors, the direction of the optical flow vectors, the GMM foreground mask, and the R, G and B parts of the RGB image block; its output is the CNN foreground mask, a PxQ matrix. This CNN has 6 layers:
[a] the input layer is 6 PxQ matrices;
[b] a convolutional layer whose kernels are M1 5x5x6 matrices, outputting M1 PxQ matrices;
[c] a max-pooling layer with 2x2 processing units, outputting M1 (P/2)x(Q/2) matrices;
[d] a convolutional layer whose kernels are M2 3x3xM1 matrices, outputting M2 PxQ matrices;
[e] a max-pooling layer with 2x2 processing units, outputting M2 (P/2)x(Q/2) matrices;
[f] a fully connected layer, outputting 1 PxQ matrix, the CNN foreground mask, with only 3 values: background 0, upper body 1 and lower body 2.
The loss function of the foreground mask CNN is the Euclidean distance between the CNN's output mask and the expected mask.
(4) Training of the foreground mask CNN model:
Referring to Figs. 1 to 5, the training samples come from pedestrians in surveillance video, so that the trained model suits actual conditions. For the surveillance video, pedestrians are first detected, for which Piotr Dollar's toolbox can be used (http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html), giving the pedestrian detection boxes; then the PxQ expected foreground mask of each pedestrian is labelled by hand, with only 3 values: background 0, upper body 1, lower body 2. Then the pedestrian mask combination feature of each pedestrian is computed, comprising 6 PxQ matrices: the magnitude of the optical flow vectors, the direction of the optical flow vectors, the GMM foreground mask, and the R, G and B parts of the modified RGB image block. The data of each pedestrian thus form 1 training sample, whose input data is the pedestrian mask combination feature and whose output is the expected foreground mask. In the RGB image block of the pedestrian box, the regions corresponding to the background of the GMM foreground mask are changed to grey, which eliminates some background noise and improves accuracy. The number of collected training samples is >5000. Training uses stochastic gradient descent (SGD: Stochastic Gradient Descent) to obtain the foreground mask CNN. The CNN uses the open-source MatConvNet (http://www.vlfeat.org/matconvnet/); the optical flow vectors can be computed with Piotr Dollar's toolbox.
(5) Computation of the retrieval features:
Referring to Fig. 6, the present invention uses several retrieval features, including: HS, RGB, improved CEDD and improved WHOS. Computing the retrieval features of 1 pedestrian takes 2 steps:
[a] compute the CNN foreground mask distinguishing the upper and lower body;
[b] for the pedestrian regions corresponding to the 3 kinds of CNN foreground mask (upper body, lower body, whole body), compute the above 4 kinds of features respectively, obtaining 12 kinds of retrieval features.
The 12 kinds of retrieval features of each pedestrian are stored in the feature database of the pedestrian set under test.
CEDD comes from http://chatzichristofis.info; the original algorithm does not support ROI (Region Of Interest), while the improved CEDD supports ROI, where the ROI can be any of the 3 kinds of CNN foreground mask.
The WHOS feature comes from http://www.micc.unifi.it/lisanti/source-code/re-id/; the original algorithm does not support ROI, while the improved WHOS supports ROI and deletes the HOG features of the original algorithm.
(6) Structure of the optimizing CNN and computation of the optimal distance weight vector W:
The optimizing CNN is used to compute the optimal distance weight vector W. Referring to Figs. 6 and 8, it comprises the following layers:
[a] input layer: 1 Nx24 matrix, containing the feature distance vectors between N pairs of pedestrians;
[b] convolutional layer: the kernels are K 1x24 matrices, outputting K Nx1 matrices;
[c] output layer: contains a 1xK processing unit and outputs 1 Nx1 matrix; each value represents the feature distance between 1 pair of pedestrians.
After training of the optimizing CNN finishes, the K 1x24 matrices (i.e. K 24-dimensional vectors) are averaged to obtain the 24-dimensional optimal distance weight vector W, using the formula:
W_i = (1/K) Σ_{j=1}^{K} V_ij; i = 1, ..., 24; V_ij is the i-th element of the j-th 24-dimensional vector.
The loss function is shown in Fig. 7, and the formula for L2 normalization is:
Y = X / ‖X‖₂; X and Y are 24-dimensional vectors.
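The kernel averaging and the L2 normalization above can be sketched as follows (plain Python; K and the kernel values are made-up placeholders, not trained weights):

```python
import math

def l2_normalize(x):
    """L2 normalization: x / ||x||_2."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def average_kernels(V):
    """W_i = (1/K) * sum_j V[j][i]: average K learned 1x24 kernels
    element-wise to get the optimal 24-dim distance weight vector W."""
    K = len(V)
    return [sum(V[j][i] for j in range(K)) / K for i in range(24)]

# Two made-up 24-dim "kernels" standing in for trained convolution kernels.
V = [[1.0] * 24, [3.0] * 24]  # K = 2
W = average_kernels(V)
print(W[0])                                  # 2.0
print(sum(v * v for v in l2_normalize(W)))   # ~ 1.0 (unit L2 norm)
```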
(7) Training of the optimizing CNN:
Referring to Figs. 6 to 8, the training samples come from pedestrians in surveillance video, so that the trained model suits actual conditions. 1 training sample is 1 Nx24 distance matrix, requiring N pedestrians. For 1 pedestrian A, find 1 other pedestrian B1 who is the same pedestrian as A, then select N-1 (N>3) other pedestrians B2~BN who are not the same pedestrian as A, forming the sample group {B1, B2, ..., BN}, where A and B1 are the same pedestrian; then compute the 24-dimensional feature distance vector between sample A and each pedestrian in the sample group {B1, B2, ..., BN}, giving an Nx24 feature distance matrix. The expected output is the vector obtained by L2-normalizing the N-dimensional vector {0, 1, 1, ..., 1}, where 0 indicates a distance of 0, i.e. the same pedestrian, and 1 indicates not the same pedestrian; the expected output represents the similarity ordering of the retrieval results. The definition of the loss function is shown in Fig. 7. The number of collected samples is >5000.
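The expected output Y of one training sample, as described above, can be sketched as follows (N = 5 is a small example value only; the patent requires N > 3):

```python
import math

def l2_normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def expected_output(N):
    """Target vector for one optimizing-CNN training sample:
    {0, 1, 1, ..., 1} with N elements (the 0 is for the matching
    pedestrian B1), then L2-normalized."""
    return l2_normalize([0.0] + [1.0] * (N - 1))

# With N = 5 the norm is sqrt(4) = 2, so the target is [0, 1, 1, 1, 1] / 2.
Y = expected_output(5)
print(Y)  # [0.0, 0.5, 0.5, 0.5, 0.5]
```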
The above is merely a preferred embodiment of the present invention, and the example above does not impose any formal limitation on the substantive content of the present invention. Any simple modification or variation that a person of ordinary skill in the art makes to the above embodiment according to the technical essence of the present invention after reading this specification, and any equivalent embodiment obtained by equivalent variation or modification using the technical content disclosed above, still fall within the scope of the technical solution of the present invention without departing from the spirit and scope of the invention.
Claims (3)
1. a kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net, it is characterised in that it comprises the following steps:
(1)Extract CNN foreground masks:Video and the pedestrian having been detected by for input, using GMM(Gaussian
Mixture Model) GMM foreground masks in pedestrian's square frame are calculated, covered with GMM prospects in the RGB image block that square frame is included
The color of the corresponding part of background in code is changed to grey, can so eliminate the interference of background area;Recycle in video
Movable information calculates the light stream vector of each pixel in pedestrian's square frame, then by GMM foreground masks, the light in pedestrian's square frame
The amplitude of flow vector, the direction of light stream vector and amended RGB image block are combined into " pedestrian's mask assemblage characteristic ", input
To " in foreground mask CNN ", obtaining distinguishing the CNN foreground masks of the lower part of the body, i.e. mask value only has:Background 0, the upper part of the body 1
With the lower part of the body 2;
(2)Calculate searching characteristic vector:The CNN prospects of the lower part of the body on RGB image block and corresponding differentiation for each pedestrian
Mask, calculates whole body, the upper part of the body and the HS of lower part of the body foreground mask corresponding region, RGB, improves CEDD and improves WHOS and be special respectively
Levy, altogether 12 kinds of searching characteristic vectors, and the searching characteristic vector that pedestrian to be measured collects is stored in property data base;Wherein,
Improve CEDD features and only calculate the pixel of foreground mask corresponding region, and improve WHOS and also only calculate foreground mask corresponding region
Pixel, while not calculating HOG features;
(3) Compute the feature distance: for the retrieval feature vectors of two pedestrians, the Bhattacharyya method and the Tanimoto method are each used to compute the distance between corresponding sub-feature vectors of the same type, so the 12 sub-features yield 24 distances, which form one 24-dimensional distance vector D. The weight vector W obtained from the "weight-optimizing CNN" then converts this distance vector into a single distance value by the formula d = W'D, where W is a 24-dimensional weight column vector, D is the 24-dimensional distance column vector, and the result d is a 1x1 scalar. To obtain retrieval results whose upper-body and lower-body features are both similar to those of the query pedestrian, the method of the invention uses a single feature distance and requires no filtering step;
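The distance fusion of step (3) can be sketched as follows. This is a minimal sketch under stated assumptions: the Bhattacharyya and Tanimoto definitions used here are the standard histogram forms, and the uniform weight vector is a placeholder for the W that the patent learns with its weight-optimizing CNN.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two L1-normalized histograms."""
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return -np.log(np.clip(bc, 1e-12, None))

def tanimoto(p, q):
    """Tanimoto distance = 1 - Tanimoto (extended Jaccard) similarity."""
    num = np.dot(p, q)
    den = np.dot(p, p) + np.dot(q, q) - num
    return 1.0 - num / den if den > 0 else 1.0

def fused_distance(feats_a, feats_b, W):
    """Both distances for each of the 12 sub-features -> 24-dim D; d = W'D."""
    D = np.array([f(a, b) for a, b in zip(feats_a, feats_b)
                  for f in (bhattacharyya, tanimoto)])
    return float(W @ D)

# Placeholder data: 12 random normalized sub-feature histograms per pedestrian.
rng = np.random.default_rng(1)
feats = [rng.random(16) for _ in range(12)]
feats = [f / f.sum() for f in feats]
W = np.full(24, 1.0 / 24)  # uniform stand-in for the learned weight vector

d_same = fused_distance(feats, feats, W)  # distance of a pedestrian to itself
```

Identical feature sets give both component distances of (approximately) zero, so the fused distance of a pedestrian to itself is near zero, consistent with "small distance = high similarity".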
(4) Sort and output the retrieval result: for the query pedestrian, compute the feature vectors by the methods of (1) and (2), then use the method of (3) to compute the distance to each candidate pedestrian in the feature database, and finally sort these distance values to obtain the retrieval result, where a small distance indicates a high degree of similarity and a large distance indicates a low degree of similarity.
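The final ranking step can be sketched as a simple argsort over the fused distances; the gallery identifiers and distance table below are hypothetical.

```python
import numpy as np

def retrieve(distance_fn, gallery_ids, top_k=5):
    """Rank gallery pedestrians by ascending fused distance to the query.

    distance_fn(pid) -> fused distance between the query and pedestrian pid.
    """
    dists = np.array([distance_fn(pid) for pid in gallery_ids])
    order = np.argsort(dists)  # small distance = high similarity
    return [(gallery_ids[i], float(dists[i])) for i in order[:top_k]]

# Hypothetical precomputed distances from the query to three candidates.
table = {"p1": 0.9, "p2": 0.1, "p3": 0.5}
results = retrieve(lambda pid: table[pid], list(table), top_k=3)
```

The first entry of `results` is the candidate most similar to the query, matching the claim's "small distance = high similarity" ordering.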
2. The neural-network-based multi-feature-fusion pedestrian retrieval method according to claim 1, characterized in that the concrete steps of computing the pedestrian CNN foreground mask in step (1) are as follows. First step: train the "foreground-mask CNN". First prepare the training samples: take pedestrians detected in the surveillance video as samples and scale them to the standard size PxQ; then manually label the expected foreground mask of each pedestrian, marking different mask values for the upper and lower body, i.e. background, upper body and lower body are labeled 0, 1 and 2 respectively, which serves as the output value of the training sample. Then compute the mask combined feature of each sample: first use GMM to compute the GMM foreground mask inside the pedestrian detection box, set the regions of the pedestrian-box RGB image block that the GMM foreground mask marks as background to grey, and then compute the optical-flow vector of each pixel inside the detection box. Each pedestrian thus yields 6 PxQ matrices — the GMM foreground mask, the magnitude of the optical flow, the direction of the optical flow, and the modified R, G and B channels — which constitute the sample's characteristic data and serve as the input of the training sample. Finally, train the "foreground-mask CNN" with these training samples. This CNN has 6 layers, namely: input layer, convolutional layer, max-pooling layer, convolutional layer, max-pooling layer and fully connected layer; its input is the 6 PxQ matrices above and its output is a PxQ image whose values are 0, 1 or 2, representing background, upper body and lower body respectively. Second step: use the trained "foreground-mask CNN" to compute the CNN foreground mask.
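Assembling the 6-channel "pedestrian mask combined feature" of claim 2 can be sketched as stacking the six PxQ matrices. The sizes, the grey value 0.5, and the random placeholder inputs are assumptions for illustration; in practice the GMM mask and optical flow would come from a background model and an optical-flow estimator.

```python
import numpy as np

P, Q = 64, 32  # PxQ standard pedestrian size (placeholder values)

rng = np.random.default_rng(0)
gmm_mask = rng.integers(0, 2, size=(P, Q)).astype(np.float32)  # GMM foreground mask
flow_mag = rng.random((P, Q), dtype=np.float32)   # optical-flow magnitude
flow_dir = rng.random((P, Q), dtype=np.float32)   # optical-flow direction
rgb      = rng.random((P, Q, 3), dtype=np.float32)  # pedestrian RGB image block

# Grey out the GMM-background regions of the RGB block before stacking
# (grey = 0.5 is an assumed value, not specified by the patent).
grey = np.float32(0.5)
rgb_mod = np.where(gmm_mask[..., None] > 0, rgb, grey)

# "Pedestrian mask combined feature": 6 channels of shape P x Q,
# the input of the foreground-mask CNN.
combined = np.stack([gmm_mask, flow_mag, flow_dir,
                     rgb_mod[..., 0], rgb_mod[..., 1], rgb_mod[..., 2]])
```

The resulting (6, P, Q) tensor matches the 6 PxQ input matrices the claim lists, and the CNN's target is the manually labeled PxQ mask with values 0/1/2.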
3. The neural-network-based multi-feature-fusion pedestrian retrieval method according to claim 1, characterized in that the weight vector W in step (3) is obtained as follows:
First step, prepare the training samples: for the pedestrians in the pedestrian sample library, compute and store the retrieval feature vectors. Then select one pedestrian A, find one other sample B1 that belongs to the same pedestrian as A, and select N-1 (N>3) further samples that do not belong to the same pedestrian as A, forming the sample group {B1, B2, ..., BN}, where A and B1 belong to the same pedestrian. Compute the 24-dimensional feature distance vector between pedestrian A and each member of {B1, B2, ..., BN}, and L2-normalize each feature distance vector, thereby obtaining an Nx24 feature distance matrix in which each row is the feature distance vector between two pedestrian samples; one matrix forms one training sample. The desired output of each training sample is fixed as the vector Y obtained by L2-normalizing the set {0, 1, ..., 1} of N elements. Multiple training samples are generated in this way.
Second step, train the "weight-optimizing CNN" to compute W: feed the training samples above into the "weight-optimizing CNN"; the trained 1x24 convolution kernel is the optimal weight vector W. The structure of the "weight-optimizing CNN" has 3 layers: an input layer, a convolutional layer and a max-pooling output layer. The input layer corresponds to the Nx24 feature distance matrix D, the convolutional layer consists of K convolution kernels of dimension 1x24, and the max-pooling output layer is the N-dimensional vector Y. After training finishes, the K 1x24 convolution kernels are averaged to obtain the optimal W.
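Two small pieces of claim 3 lend themselves to a direct sketch: building the L2-normalized target vector Y, and averaging the K trained kernels into W. The assumption that the 0 in {0, 1, ..., 1} corresponds to the matching pedestrian B1 follows from the claim's statement that A and B1 are the same pedestrian (same identity should yield the smallest distance).

```python
import numpy as np

def make_target(N):
    """Desired output of one training sample: 0 for the matching pedestrian
    B1, 1 for the N-1 non-matching pedestrians, then L2-normalized."""
    y = np.ones(N)
    y[0] = 0.0  # B1 (same pedestrian as A) -> desired distance 0
    return y / np.linalg.norm(y)

def average_kernels(kernels):
    """After training, average the K 1x24 convolution kernels to obtain W."""
    return np.mean(np.asarray(kernels), axis=0)

Y = make_target(5)                                   # e.g. N = 5
W = average_kernels([np.zeros(24), np.ones(24)])     # toy K = 2 kernels
```

With N = 5 the target is the normalization of (0, 1, 1, 1, 1), i.e. four entries of 1/2; the averaged W keeps the 1x24 shape required by d = W'D in claim 1.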
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710270659.2A CN107085609A (en) | 2017-04-24 | 2017-04-24 | A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107085609A true CN107085609A (en) | 2017-08-22 |
Family
ID=59611511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710270659.2A Withdrawn CN107085609A (en) | 2017-04-24 | 2017-04-24 | A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107085609A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050129285A1 (en) * | 2003-09-29 | 2005-06-16 | Fuji Photo Film Co., Ltd. | Collation system and computer readable medium storing thereon program |
CN104484324A (en) * | 2014-09-26 | 2015-04-01 | 徐晓晖 | Pedestrian retrieval method based on multiple models and fuzzy color |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766794B (en) * | 2017-09-22 | 2021-05-14 | 天津大学 | Image semantic segmentation method with learnable feature fusion coefficient |
CN107766794A (en) * | 2017-09-22 | 2018-03-06 | 天津大学 | The image, semantic dividing method that a kind of Fusion Features coefficient can learn |
CN108416266A (en) * | 2018-01-30 | 2018-08-17 | 同济大学 | A kind of video behavior method for quickly identifying extracting moving target using light stream |
US11270158B2 (en) | 2018-02-09 | 2022-03-08 | Beijing Sensetime Technology Development Co., Ltd. | Instance segmentation methods and apparatuses, electronic devices, programs, and media |
CN108460411A (en) * | 2018-02-09 | 2018-08-28 | 北京市商汤科技开发有限公司 | Example dividing method and device, electronic equipment, program and medium |
CN108960331A (en) * | 2018-07-10 | 2018-12-07 | 重庆邮电大学 | A kind of recognition methods again of the pedestrian based on pedestrian image feature clustering |
US20220012885A1 (en) * | 2019-07-26 | 2022-01-13 | Adobe Inc. | Utilizing a two-stream encoder neural network to generate composite digital images |
US11568544B2 (en) * | 2019-07-26 | 2023-01-31 | Adobe Inc. | Utilizing a two-stream encoder neural network to generate composite digital images |
CN111046724A (en) * | 2019-10-21 | 2020-04-21 | 武汉大学 | Pedestrian retrieval method based on area matching network |
CN111046724B (en) * | 2019-10-21 | 2021-09-14 | 武汉大学 | Pedestrian retrieval method based on area matching network |
CN110929770A (en) * | 2019-11-15 | 2020-03-27 | 云从科技集团股份有限公司 | Intelligent tracking method, system and equipment based on image processing and readable medium |
CN110929619A (en) * | 2019-11-15 | 2020-03-27 | 云从科技集团股份有限公司 | Target object tracking method, system and device based on image processing and readable medium |
CN111951189A (en) * | 2020-08-13 | 2020-11-17 | 神思电子技术股份有限公司 | Data enhancement method for multi-scale texture randomization |
CN111951189B (en) * | 2020-08-13 | 2022-05-06 | 神思电子技术股份有限公司 | Data enhancement method for multi-scale texture randomization |
CN113192101A (en) * | 2021-05-06 | 2021-07-30 | 影石创新科技股份有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN113192101B (en) * | 2021-05-06 | 2024-03-29 | 影石创新科技股份有限公司 | Image processing method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107085609A (en) | A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net | |
CN111079602B (en) | Vehicle fine granularity identification method and device based on multi-scale regional feature constraint | |
Zhou et al. | Point to set similarity based deep feature learning for person re-identification | |
CN105512684B (en) | Logo automatic identifying method based on principal component analysis convolutional neural networks | |
CN104063719B (en) | Pedestrian detection method and device based on depth convolutional network | |
CN112801015B (en) | Multi-mode face recognition method based on attention mechanism | |
CN107463920A (en) | A kind of face identification method for eliminating partial occlusion thing and influenceing | |
CN104504395A (en) | Method and system for achieving classification of pedestrians and vehicles based on neural network | |
CN104881671B (en) | A kind of high score remote sensing image Local Feature Extraction based on 2D Gabor | |
CN107767416B (en) | Method for identifying pedestrian orientation in low-resolution image | |
CN112862849B (en) | Image segmentation and full convolution neural network-based field rice ear counting method | |
CN106022223B (en) | A kind of higher-dimension local binary patterns face identification method and system | |
CN104077742B (en) | Human face sketch synthetic method and system based on Gabor characteristic | |
CN113610046B (en) | Behavior recognition method based on depth video linkage characteristics | |
CN109190458A (en) | A kind of person of low position's head inspecting method based on deep learning | |
CN108876776B (en) | Classification model generation method, fundus image classification method and device | |
CN112396036A (en) | Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction | |
CN105868711A (en) | Method for identifying human body behaviors based on sparse and low rank | |
CN116524255A (en) | Wheat scab spore identification method based on Yolov5-ECA-ASFF | |
Sehree et al. | Olive trees cases classification based on deep convolutional neural network from unmanned aerial vehicle imagery | |
CN107704509A (en) | A kind of method for reordering for combining stability region and deep learning | |
Yun et al. | Part-level convolutional neural networks for pedestrian detection using saliency and boundary box alignment | |
Zhang et al. | Hyperspectral Image Classification Based on Spectral-Spatial Attention Tensor Network | |
CN105718858B (en) | A kind of pedestrian recognition method based on positive and negative broad sense maximum pond | |
CN116778468A (en) | Three-dimensional target detection method based on point cloud structure perception |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20170822 ||