CN107085609A - A pedestrian retrieval method for multi-feature fusion based on a neural network - Google Patents

A pedestrian retrieval method for multi-feature fusion based on a neural network - Download PDF

Info

Publication number
CN107085609A
CN107085609A CN201710270659.2A
Authority
CN
China
Prior art keywords
pedestrian
vector
cnn
characteristic
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710270659.2A
Other languages
Chinese (zh)
Inventor
吴耀文
周学平
廖宜良
张修
吴颖波
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUBEI KENENG POWER ELECTRONICS CO Ltd
Jingzhou Power Supply Co of State Grid Hubei Electric Power Co Ltd
Original Assignee
HUBEI KENENG POWER ELECTRONICS CO Ltd
Jingzhou Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HUBEI KENENG POWER ELECTRONICS CO Ltd, Jingzhou Power Supply Co of State Grid Hubei Electric Power Co Ltd filed Critical HUBEI KENENG POWER ELECTRONICS CO Ltd
Priority to CN201710270659.2A priority Critical patent/CN107085609A/en
Publication of CN107085609A publication Critical patent/CN107085609A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to video analysis technology, and in particular to a pedestrian retrieval method that performs multi-feature fusion based on a neural network. Features are computed for each pedestrian detected in video and stored in a feature database for the gallery set; features are then computed for the query pedestrian and compared against the database, yielding a result list ranked with the most similar pedestrians first. By computing multiple retrieval features and feature distances and fusing them with an optimal distance weight vector W, the invention makes both the upper body and the lower body of the retrieved pedestrians resemble the query pedestrian, and ranking by a single fused feature distance keeps retrieval convenient. The method is widely applicable, accurate, and simple to use. It effectively addresses the shortcomings of current surveillance-video pedestrian retrieval: low detection accuracy and retrieval similarity, poor integration of the various detection methods and feature distances, narrow applicability, and complicated application.

Description

A pedestrian retrieval method for multi-feature fusion based on a neural network
Technical field
The present invention relates to video analysis technology, and in particular to a pedestrian retrieval method that performs multi-feature fusion based on a neural network.
Background art
There are many existing methods for image retrieval and pedestrian retrieval, for example the image retrieval method CEDD (CEDD: Color and Edge Directivity Descriptor. A Compact Descriptor for Image Indexing and Retrieval, Savvas A., 2008) and the pedestrian retrieval method WHOS (Person Re-Identification by Iterative Re-Weighted Sparse Ranking, Giuseppe Lisanti, 2015). These methods achieve good retrieval results on scientific data sets such as the ViPeR pedestrian retrieval data set (https://vision.soe.ucsc.edu/node/178), but perform poorly on pedestrians in real surveillance video; it is therefore necessary to integrate them into a new retrieval feature.
From the standpoint of retrieval results, some methods such as WHOS may succeed in retrieval, i.e. the query pedestrian appears in the result list, yet the other pedestrians in the results are not very similar to the query and so give the user little reference information. For example, if the query pedestrian wears blue clothes on the upper body and black trousers on the lower body, not many of the pedestrians ranked as most similar are also "blue upper body, black lower body". Other methods can retrieve by body part, e.g. A General Method for Appearance-based People Search Based on Textual Queries (R. Satta, 2012), and their results are more similar to the query pedestrian, but the retrieval process and feature distances are more complicated, requiring the explicit use of multiple feature distances plus a retrieval filtering stage; similarity cannot be expressed with a single feature distance.
At present there are many ways to compute a feature distance, for example the Bhattacharyya method (Bh distance for short; https://en.wikipedia.org/wiki/Bhattacharyya_distance) and the Tanimoto method (Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition, 1996). Different distances suit different features and scenes, and it is desirable to integrate these features for use on surveillance video.
In view of the rapid development of convolutional neural networks (CNN) and the excellent results they have achieved in many fields such as image recognition (ImageNet Classification with Deep Convolutional Neural Networks, 2012; OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2014), how to use CNN models to combine the various detection methods and feature distances well, reaching the more desirable goal of wide applicability, high accuracy, and easy application, has become a direction of research.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a neural-network-based multi-feature-fusion pedestrian retrieval method that computes multiple retrieval features and feature distances and fuses them with an optimal distance weight vector W, so that both the upper body and the lower body of the retrieved pedestrians resemble the query pedestrian, and so that ranking by a single feature distance keeps retrieval convenient; the method is widely applicable, accurate, and simple to use, and effectively solves the problem that current surveillance-video pedestrian retrieval suffers from low detection accuracy and low retrieval similarity.
The present invention achieves the above object through the following technical solution:
The basic idea of this neural-network-based multi-feature-fusion pedestrian retrieval method is as follows:
The initial step is pedestrian detection, whose result is a detection box (see Fig. 1). The basic retrieval process (see Fig. 2) computes features for the pedestrians detected in video and stores them in a feature database for the gallery set; features are then computed for the query pedestrian and compared against the database to obtain retrieval results ranked by similarity, highest first. For example, rank1 has the highest similarity, and in Fig. 2 rank1 and the query are the same pedestrian. The pedestrian foreground mask is crucial for pedestrian retrieval (see Fig. 3); the foreground mask obtained by this method has the same height and width as the RGB image block of the pedestrian box and takes only three values: background 0, upper body 1, and lower body 2. The method has two key steps, each served by its own CNN model: the computation of the pedestrian foreground mask, handled by the "foreground mask CNN" (see Fig. 4), which distinguishes upper body from lower body; and the computation of the feature distance, for which the "optimizing CNN" (see Fig. 7) learns the optimal distance weight vector W used to fuse the various feature distances into the degree of similarity between two pedestrians. The concrete steps are as follows:
A pedestrian retrieval method for multi-feature fusion based on a neural network, characterized in that it comprises the following steps:
(1) Extract the CNN foreground mask: For the input video and the pedestrians already detected in it, compute the GMM foreground mask inside each pedestrian box using a GMM (Gaussian Mixture Model), and recolor to gray the parts of the box's RGB image block that correspond to background in the GMM mask, thereby suppressing background interference. Using the motion information in the video, compute the optical-flow vector of every pixel in the pedestrian box. Then combine the GMM foreground mask, the optical-flow magnitude, the optical-flow direction, and the modified RGB image block into the "pedestrian mask combination feature" and feed it to the "foreground mask CNN", obtaining a CNN foreground mask that distinguishes upper body from lower body, i.e. whose values are only: background 0, upper body 1, and lower body 2.
(2) Compute the retrieval feature vectors: For each pedestrian's RGB image block and its corresponding upper/lower-body CNN foreground mask, compute the HS, RGB, improved-CEDD, and improved-WHOS features over the regions of the whole-body, upper-body, and lower-body foreground masks, giving 12 retrieval feature vectors in total, and store the feature vectors of the gallery set in the feature database. The improved CEDD feature is computed only over the pixels of the mask region; the improved WHOS likewise uses only mask-region pixels, and omits the HOG feature.
(3) Compute the feature distance: For the retrieval feature vectors of two pedestrians, compute the distance between each pair of same-type sub-feature vectors with both the Bhattacharyya method and the Tanimoto method, giving 24 distances for the 12 features and forming one 24-dimensional distance vector D. Then convert this distance vector to a single distance value with the weight vector W learned by the "optimizing CNN"; the conversion formula is d = W'D, where W is a 24-dimensional weight column vector, D is the 24-dimensional distance column vector, and the result d is a 1x1 scalar. To obtain results whose upper-body and lower-body features both resemble the query pedestrian, the method of the invention uses this single feature distance and needs no filtering step.
(4) Sort and output the retrieval results: For the query pedestrian, compute the feature vectors with steps (1) and (2), then compute the distance to each gallery pedestrian in the feature database with step (3), and finally sort these distance values to obtain the retrieval results; a small distance indicates high similarity and a large distance low similarity.
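At query time, steps (3) and (4) reduce to computing a 24-dimensional distance vector against every gallery pedestrian, fusing each with d = W'D, and sorting. A minimal NumPy sketch of that ranking stage, with random stand-in feature distances and a uniform placeholder W (the real distances would come from the Bhattacharyya/Tanimoto computations, and the real W from the "optimizing CNN"):

```python
import numpy as np

def rank_gallery(W, D_matrix):
    """Fuse each 24-dim distance vector with weights W (d = W'D) and
    return gallery indices sorted by ascending fused distance
    (smallest distance = most similar, i.e. rank-1)."""
    d = D_matrix @ W           # fused scalar distance per gallery pedestrian
    order = np.argsort(d)      # ascending: most similar first
    return order, d

# Stand-in data: 5 gallery pedestrians, 24 feature distances each.
rng = np.random.default_rng(0)
D_matrix = rng.random((5, 24))
D_matrix[2] *= 0.1                 # pedestrian 2 is close in every feature
W = np.full(24, 1 / 24)            # uniform weights standing in for the learned W

order, d = rank_gallery(W, D_matrix)
print(order[0])                    # → 2 (the near-duplicate ranks first)
```

The fusion makes the ranking a single sort over one scalar per gallery entry, which is the "1 feature distance, no filtering" convenience the method claims.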
The concrete steps for computing a pedestrian's CNN foreground mask in step (1) are as follows. The first step trains the "foreground mask CNN". First prepare the training samples: take pedestrians detected in surveillance video as samples, scale them to the standard size PxQ, and manually label each pedestrian's expected foreground mask with distinct mask values for the upper and lower body, i.e. background, upper body, and lower body labeled 0, 1, and 2 respectively; this serves as the output of a training sample. Then compute each sample's mask combination feature: use a GMM to compute the GMM foreground mask inside the pedestrian detection box, set to gray the regions of the box's RGB image block that correspond to GMM background, and compute the optical-flow vector of every pixel in the box. Each pedestrian thus yields six PxQ matrices: the GMM foreground mask, the optical-flow magnitude, the optical-flow direction, and the modified R, G, and B; these constitute the sample's input data. Finally, train the "foreground mask CNN" with the above training samples. This CNN has six layers: an input layer, a convolutional layer, a max-pooling layer, a convolutional layer, a max-pooling layer, and a fully connected layer; its input is the six PxQ matrices above and its output is a PxQ image whose values are 0, 1, or 2, representing background, upper body, and lower body respectively. The second step uses the trained "foreground mask CNN" to compute CNN foreground masks.
The weight vector W in step (3) is obtained as follows:
The first step prepares the training samples. For the pedestrians in the sample library, compute and store the retrieval feature vectors. Then select one pedestrian A and find another sample B1 belonging to the same pedestrian as A, then select N-1 (N > 3) other samples that do not belong to the same pedestrian as A, forming the sample group {B1, B2, ..., BN}, where A and B1 are the same pedestrian. Compute the 24-dimensional feature-distance vector between pedestrian A and each member of {B1, B2, ..., BN} and L2-normalize each distance vector, obtaining one Nx24 feature-distance matrix in which each row is the feature-distance vector between two pedestrian samples; one matrix forms one training sample. The desired output of every training sample is fixed as the N-element vector {0, 1, ..., 1} after L2 normalization, denoted Y. Many training samples are generated in this way.
The second step trains the "optimizing CNN" to compute W: feed the above training samples into the "optimizing CNN"; the learned 1x24 convolution kernels give the optimal weight vector W. The structure of the "optimizing CNN" comprises three layers: an input layer, a convolutional layer, and a max-pooling output layer. The input layer corresponds to the Nx24 feature-distance matrix D, the convolutional layer is a matrix of K 1x24 convolution kernels, and the max-pooling output layer is an N-dimensional vector Y. After training finishes, the K 1x24 convolution kernels are averaged to obtain the optimal W.
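A NumPy sketch of the "optimizing CNN" forward pass and of the final kernel-averaging step; the distance matrix and kernel values are random stand-ins for real data and trained parameters, and N, K are arbitrary example sizes:

```python
import numpy as np

def optimizing_cnn_forward(D, kernels):
    """Forward-pass sketch of the 'optimizing CNN': the input D is N x 24
    (one feature-distance vector per pedestrian pair), each 1x24 kernel
    produces an N x 1 response, and the output layer max-pools the K
    responses into one N-vector of fused distances."""
    responses = D @ kernels.T          # N x K: each column is one kernel's output
    return responses.max(axis=1)       # max-pooling across the K kernels

rng = np.random.default_rng(1)
D = rng.random((6, 24))                # N = 6 pedestrian pairs (stand-in distances)
kernels = rng.random((4, 24))          # K = 4 hypothetical trained kernels
y = optimizing_cnn_forward(D, kernels)
assert y.shape == (6,)

# After training, the K kernels are averaged into the single weight vector W.
W = kernels.mean(axis=0)               # W_i = (1/K) * sum_j V_ji
assert W.shape == (24,)
```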
Compared with the prior art, the beneficial effects of the present invention are as follows:
This neural-network-based multi-feature-fusion pedestrian retrieval method computes multiple retrieval features and feature distances and fuses them with the optimal distance weight vector W, so that both the upper and the lower body of the retrieved pedestrians resemble the query pedestrian, while ranking by a single feature distance keeps retrieval convenient. It is widely applicable, accurate, and simple to use, and effectively solves the problems that the accuracy and retrieval similarity of current surveillance-video pedestrian retrieval are low, that the various detection methods and feature distances cannot be combined well, that applicability is narrow, and that application is complicated.
Brief description of the drawings
Fig. 1 is the pedestrian detection box diagram of the present invention;
Fig. 2 is a schematic diagram of the pedestrian retrieval process of the present invention;
Fig. 3 is a schematic diagram of the pedestrian foreground mask of the present invention;
Fig. 4 is a schematic diagram of the computation of the pedestrian mask combination feature and the training of the foreground mask CNN of the present invention;
Fig. 5 is a schematic diagram of the CNN foreground mask computation of the present invention;
Fig. 6 is a schematic diagram of the computation of the retrieval features and feature-distance vector of the present invention;
Fig. 7 is a schematic diagram of the CNN structure for computing the optimal W of the present invention;
Fig. 8 is a schematic diagram of the structure of the "optimizing CNN" of the present invention;
Fig. 9 is a schematic diagram of the description of the neural-network-based multi-feature-fusion pedestrian retrieval algorithm of the present invention.
Embodiment
The pedestrian retrieval method for multi-feature fusion based on a neural network is described further below with reference to Figs. 1 to 9.
(1) The dimensions of the foreground mask:
For a detected pedestrian, compute the GMM foreground mask inside the pedestrian box using a GMM (Gaussian Mixture Model), and recolor to gray the parts of the box's RGB image block that correspond to background in the GMM mask, to suppress background interference; then scale height and width to the standard size PxQ. The CNN foreground mask is a PxQ matrix whose elements take only three values: background 0, upper body 1, and lower body 2. Referring to Fig. 4, the following data, all PxQ matrices, constitute the pedestrian mask combination feature: the optical-flow magnitude, the optical-flow direction, the GMM foreground mask, and the R, G, and B parts of the RGB image block.
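As an illustration, the six PxQ input channels can be stacked as below. All arrays here are synthetic stand-ins; real values would come from GMM background subtraction and optical flow, and P = 64, Q = 32 are example sizes (the patent leaves P and Q as parameters):

```python
import numpy as np

P, Q = 64, 32  # hypothetical normalized pedestrian size

rng = np.random.default_rng(2)
rgb = rng.integers(0, 256, size=(P, Q, 3)).astype(float)   # pedestrian RGB block
gmm_mask = (rng.random((P, Q)) > 0.4).astype(float)        # 1 = foreground, 0 = background
flow = rng.standard_normal((P, Q, 2))                       # per-pixel optical-flow vectors

# Gray out background pixels in the RGB block to suppress background clutter.
gray = 128.0
rgb_mod = np.where(gmm_mask[..., None] == 1, rgb, gray)

# Flow magnitude and direction channels.
flow_mag = np.hypot(flow[..., 0], flow[..., 1])
flow_dir = np.arctan2(flow[..., 1], flow[..., 0])

# Stack the six P x Q matrices into the CNN input.
combo = np.stack([gmm_mask, flow_mag, flow_dir,
                  rgb_mod[..., 0], rgb_mod[..., 1], rgb_mod[..., 2]])
assert combo.shape == (6, P, Q)
```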
(2) The computation of the feature distance and its dimensions:
To obtain retrieval results whose upper-body and lower-body features both resemble the query pedestrian, many retrieval methods require manually selecting and computing multiple feature distances and applying filtering; the method of the invention instead uses a single feature distance and needs no filtering. Referring to Figs. 6 to 8, the feature distance between two pedestrians computed by this method is one 1x1 scalar; a small distance indicates high similarity and a large distance low similarity. With this feature distance it is easier than with other methods to accurately retrieve pedestrians highly similar to the query, i.e. with similar upper- and lower-body features such as "white upper-body clothes, black lower-body trousers". To compute the feature distance, first compute the 12 retrieval features of each pedestrian, then compute the Bh and Tanimoto distances of these 12 features between the two pedestrians, obtaining one 24-dimensional feature-distance vector D, and finally compute the feature distance d with the formula:
d = W'D, where W and D are 24-dimensional vectors and W is the optimal distance weight vector.
The Bh distance follows the standard Bhattacharyya definition: for two n-dimensional vectors p and q,
d_Bh(p, q) = -ln( sum over i of sqrt(p_i * q_i) ).
The Tanimoto distance follows the standard Tanimoto coefficient: for two vectors x_i and x_j,
T_ij = (x_i . x_j) / (|x_i|^2 + |x_j|^2 - x_i . x_j),
and the range of T_ij is [0, 1].
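A small NumPy sketch of the two distance measures named above, under their standard textbook definitions (assumed here, since the method prescribes only the named measures):

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bh distance between two n-dim histograms:
    d = -ln( sum_i sqrt(p_i * q_i) ); identical distributions give d = 0."""
    bc = np.sum(np.sqrt(p * q))
    return -np.log(bc)

def tanimoto_similarity(xi, xj):
    """Tanimoto coefficient T_ij = xi.xj / (|xi|^2 + |xj|^2 - xi.xj);
    lies in [0, 1] for non-negative vectors, 1 for identical vectors."""
    dot = np.dot(xi, xj)
    return dot / (np.dot(xi, xi) + np.dot(xj, xj) - dot)

p = np.array([0.25, 0.25, 0.5])
q = np.array([0.25, 0.25, 0.5])
d0 = bhattacharyya_distance(p, q)
t0 = tanimoto_similarity(p, q)
assert abs(d0) < 1e-12 and t0 == 1.0   # identical histograms: distance 0, similarity 1
```

Note that Tanimoto yields a similarity in [0, 1]; as one entry of the distance vector D it would in practice be mapped to a distance, e.g. 1 - T_ij.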
(3) The structure and use of the foreground mask CNN model:
Referring to Figs. 4 and 5, the foreground mask CNN computes the foreground mask of a detected pedestrian. Its input is one pedestrian's mask combination feature, comprising six PxQ matrices: the optical-flow magnitude, the optical-flow direction, the GMM foreground mask, and the R, G, and B parts of the RGB image block. Its output is the PxQ CNN foreground mask. This CNN has six layers:
[a] the input layer is six PxQ matrices;
[b] a convolutional layer with M1 convolution kernels of dimension 5x5x6, outputting M1 PxQ matrices;
[c] a max-pooling layer with 2x2 processing units, outputting M1 (P/2)x(Q/2) matrices;
[d] a convolutional layer with M2 convolution kernels of dimension 3x3xM1, outputting M2 (P/2)x(Q/2) matrices;
[e] a max-pooling layer with 2x2 processing units, outputting M2 (P/4)x(Q/4) matrices;
[f] a fully connected layer, outputting one PxQ matrix, the CNN foreground mask, with only three values: background 0, upper body 1, and lower body 2.
The loss function of the foreground mask CNN is the Euclidean distance between the CNN output mask and the expected mask.
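The layer dimensions can be traced with a small helper. This is only a shape walk-through of the architecture, not a trained model; it assumes "same"-padded convolutions, so that only the 2x2 max-pooling layers change the spatial size, and M1 = 16, M2 = 32 are hypothetical channel counts:

```python
def mask_cnn_shapes(P, Q, M1=16, M2=32):
    """Trace tensor shapes through the 6-layer foreground-mask CNN,
    assuming 'same'-padded convolutions (only pooling changes size)."""
    return [
        ("input",                  (6, P, Q)),
        ("conv 5x5x6 -> M1",       (M1, P, Q)),
        ("max-pool 2x2",           (M1, P // 2, Q // 2)),
        ("conv 3x3xM1 -> M2",      (M2, P // 2, Q // 2)),
        ("max-pool 2x2",           (M2, P // 4, Q // 4)),
        ("fully connected -> mask", (1, P, Q)),  # values in {0, 1, 2}
    ]

for name, shape in mask_cnn_shapes(64, 32):
    print(name, shape)
```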
(4) The training of the foreground mask CNN model:
Referring to Figs. 1 to 5, the training samples come from pedestrians in surveillance video, so that the trained model fits real conditions. First detect the pedestrians in the surveillance video, e.g. with the Piotr Dollar toolbox (http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html), obtaining pedestrian detection boxes; then manually label each pedestrian's PxQ expected foreground mask, with only three values: background 0, upper body 1, lower body 2. Next compute each pedestrian's mask combination feature, comprising six PxQ matrices: the optical-flow magnitude, the optical-flow direction, the GMM foreground mask, and the R, G, and B parts of the modified RGB image block. The data of each pedestrian thus form one training sample whose input data is the mask combination feature and whose output is the expected foreground mask. In the RGB image block of the pedestrian box, the regions corresponding to GMM-mask background are recolored gray, which removes some background noise and improves accuracy. More than 5000 training samples are collected, and training is performed with stochastic gradient descent (SGD), yielding the foreground mask CNN. The CNN uses the open-source MatConvNet (http://www.vlfeat.org/matconvnet/); the optical-flow vectors can be computed with the Piotr Dollar toolbox.
(5) The computation of the retrieval features:
Referring to Fig. 6, the present invention uses several retrieval features: HS, RGB, improved CEDD, and improved WHOS. Computing one pedestrian's retrieval features takes two steps:
[a] compute the CNN foreground mask distinguishing upper and lower body;
[b] for the pedestrian regions corresponding to the three CNN foreground masks (upper body, lower body, whole body), compute the above four features, obtaining 12 retrieval features.
The 12 retrieval features of each pedestrian are stored in the feature database of the gallery set.
CEDD comes from http://chatzichristofis.info; the original algorithm does not support an ROI (Region Of Interest), while the improved CEDD does, and the ROI can be any of the three CNN foreground masks.
The WHOS feature comes from http://www.micc.unifi.it/lisanti/source-code/re-id/; the original algorithm does not support an ROI, while the improved WHOS does, and the HOG feature of the original algorithm is removed.
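A sketch of the region-restricted feature idea: a color histogram computed only over the pixels a mask selects, evaluated for the whole-body, upper-body, and lower-body regions. The plain histogram here merely stands in for HS/RGB/CEDD/WHOS; the three regions times the four feature types give the 12 retrieval features. All data is synthetic:

```python
import numpy as np

def masked_histogram(channel, roi, bins=8):
    """Histogram of one image channel restricted to the ROI pixels,
    L1-normalized so regions of different size are comparable."""
    vals = channel[roi]
    hist, _ = np.histogram(vals, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

# Synthetic mask with values 0 (background), 1 (upper body), 2 (lower body).
rng = np.random.default_rng(3)
mask = rng.integers(0, 3, size=(64, 32))
channel = rng.integers(0, 256, size=(64, 32)).astype(float)

regions = {"whole": mask > 0, "upper": mask == 1, "lower": mask == 2}
features = {name: masked_histogram(channel, roi) for name, roi in regions.items()}
assert all(abs(h.sum() - 1.0) < 1e-9 for h in features.values())
```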
(6) The structure of the optimizing CNN and the computation of the optimal distance weights W:
The optimizing CNN computes the optimal distance weight vector W. Referring to Figs. 6 and 8, it comprises three layers:
[a] input layer: one Nx24 matrix containing the feature-distance vectors of N pedestrian pairs;
[b] convolutional layer: K convolution kernels of dimension 1x24, outputting K Nx1 matrices;
[c] output layer: one processing unit of dimension 1xK, outputting one Nx1 matrix, each value representing the feature distance between one pedestrian pair.
After the optimizing CNN is trained, the K 1x24 kernels (i.e. K 24-dimensional vectors) are averaged to obtain the 24-dimensional optimal distance weight vector W, by the formula:
W_i = (1/K) * sum over j of V_ij, i = 1, ..., 24, where V_ij is the i-th element of the j-th 24-dimensional vector.
The loss function is shown in Fig. 7; L2 normalization follows the standard formula
X_hat = X / sqrt( sum over i of X_i^2 ), applied likewise to Y, where X and Y are 24-dimensional vectors.
(7) The training of the optimizing CNN:
Referring to Figs. 6 to 8, the training samples come from pedestrians in surveillance video, so that the trained model fits real conditions. One training sample is one Nx24 distance matrix and requires N pedestrians. For one pedestrian A, find another sample B1 belonging to the same pedestrian as A, then select N-1 (N > 3) other pedestrians B2 to BN that do not belong to the same pedestrian as A, forming the sample group {B1, B2, ..., BN}, where A and B1 are the same pedestrian. Compute the 24-dimensional feature-distance vector between sample A and each member of {B1, B2, ..., BN}, obtaining one Nx24 feature-distance matrix. The desired output is the N-dimensional vector {0, 1, 1, ..., 1} after L2 normalization, where 0 means distance 0 (the same pedestrian) and 1 means a different pedestrian; the desired output expresses the similarity ordering of the retrieval results. The loss function is defined in Fig. 7. More than 5000 samples are collected.
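One training sample can be assembled as follows; the 24-dimensional distance vectors are random stand-ins for the real Bh/Tanimoto distances between pedestrian A and each of B1..BN, and N = 5 is an arbitrary example size:

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

def make_training_sample(dist_vectors):
    """Build one 'optimizing CNN' training sample from the N 24-dim
    feature-distance vectors between pedestrian A and the group
    {B1, ..., BN} (B1 being the same pedestrian as A).  Each row is
    L2-normalized; the target is {0, 1, ..., 1}, L2-normalized."""
    X = np.array([l2_normalize(d) for d in dist_vectors])   # N x 24 input matrix
    target = np.ones(len(dist_vectors))
    target[0] = 0.0                                         # A and B1: same pedestrian
    return X, l2_normalize(target)

rng = np.random.default_rng(4)
N = 5
X, Y = make_training_sample(rng.random((N, 24)))
assert X.shape == (N, 24) and Y.shape == (N,)
assert Y[0] == 0.0 and abs(np.linalg.norm(Y) - 1.0) < 1e-12
```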
The above describes only preferred embodiments of the present invention, and the examples do not limit the substantive content of the invention in any form. Any simple modification or variation that a person of ordinary skill in the art makes to the above embodiments according to the technical essence of the invention after reading this specification, and any equivalent embodiment obtained by equivalent variation or modification possibly using the technical content disclosed above, still falls within the scope of the technical solution of the invention without departing from its spirit and scope.

Claims (3)

1. a kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net, it is characterised in that it comprises the following steps:
(1)Extract CNN foreground masks:Video and the pedestrian having been detected by for input, using GMM(Gaussian Mixture Model) GMM foreground masks in pedestrian's square frame are calculated, covered with GMM prospects in the RGB image block that square frame is included The color of the corresponding part of background in code is changed to grey, can so eliminate the interference of background area;Recycle in video Movable information calculates the light stream vector of each pixel in pedestrian's square frame, then by GMM foreground masks, the light in pedestrian's square frame The amplitude of flow vector, the direction of light stream vector and amended RGB image block are combined into " pedestrian's mask assemblage characteristic ", input To " in foreground mask CNN ", obtaining distinguishing the CNN foreground masks of the lower part of the body, i.e. mask value only has:Background 0, the upper part of the body 1 With the lower part of the body 2;
(2) Compute retrieval feature vectors: for each pedestrian's RGB image block and the corresponding CNN foreground mask distinguishing the upper and lower body, compute the HS, RGB, improved CEDD, and improved WHOS features over each of the regions corresponding to the whole-body, upper-body, and lower-body foreground masks, giving 12 retrieval feature vectors in total, and store the retrieval feature vectors collected for the pedestrians to be searched in a feature database; the improved CEDD feature is computed only over the pixels of the foreground-mask region, and the improved WHOS feature is likewise computed only over the pixels of the foreground-mask region while omitting the HOG feature;
(3) Compute feature distances: for the retrieval feature vectors of 2 pedestrians, the distances between sub-feature vectors of the same type are computed with the Bhattacharyya method and the Tanimoto method respectively, yielding 24 distances for the 12 features, which form one 24-dimensional distance vector D; the weight vector W obtained from the "optimizing CNN" then converts this distance vector into a single distance value by the formula d = W'D, where W is a 24-dimensional weight column vector, D is the 24-dimensional distance column vector, and the result d is a 1×1 scalar; in order to obtain retrieval results whose upper-body and lower-body features are all similar to those of the pedestrian to be queried, the method of the invention uses this single feature distance and requires no filtering step;
(4) Sort and output retrieval results: for the pedestrian to be queried, compute the feature vectors by the methods of (1) and (2), then compute the distance to each pedestrian in the feature database by the method of (3), and finally sort these distance values to obtain the retrieval results, a small distance indicating high similarity and a large distance indicating low similarity.
2. The pedestrian retrieval method that performs multi-feature fusion based on a neural network according to claim 1, characterized in that the concrete steps of computing the CNN foreground mask of a pedestrian in step (1) are as follows. In the first step, the "foreground mask CNN" is trained. Training samples are prepared first: pedestrians detected in surveillance video are taken as samples and scaled to a standard size of P×Q; the expected foreground mask of each pedestrian is then labelled by hand with different mask values for the upper and lower body, i.e. background, upper body, and lower body are labelled 0, 1, and 2 respectively, serving as the output of the training sample. The mask combination feature of each sample is then computed: the GMM foreground mask within the pedestrian detection box is computed with a GMM, the region of the pedestrian-box RGB image block corresponding to the background of the GMM foreground mask is set to grey, and the optical-flow vector of each pixel in the detection box is computed, so that for each pedestrian 6 P×Q matrices are obtained: the GMM foreground mask, the magnitude of the optical flow, the direction of the optical flow, and the modified R, G, and B channels, which constitute the feature data of the sample and serve as the input of the training sample. Finally the "foreground mask CNN" is trained with these samples. This CNN has 6 layers: an input layer, a convolutional layer, a max-pooling layer, a convolutional layer, a max-pooling layer, and a fully connected layer; the input is the 6 P×Q matrices above, and the output is a P×Q image whose values are 0, 1, or 2, representing background, upper body, and lower body respectively. In the second step, the trained "foreground mask CNN" is used to compute the CNN foreground mask.
3. The pedestrian retrieval method that performs multi-feature fusion based on a neural network according to claim 1, characterized in that the weight vector W in step (3) is obtained as follows:
In the first step, training samples are prepared: for the pedestrians in the pedestrian sample library, the retrieval feature vectors are computed and stored; a pedestrian A is then selected, one other pedestrian B1 belonging to the same person as A is found, and N-1 (N>3) other pedestrians not belonging to the same person as A are selected, forming the sample group {B1, B2, ..., BN}, in which A and B1 belong to the same person; the 24-dimensional feature-distance vector between pedestrian A and each pedestrian in the sample group {B1, B2, ..., BN} is then computed, and each feature-distance vector is L2-normalized, yielding an N×24-dimensional feature-distance matrix in which each row is the feature-distance vector between 2 pedestrian samples, one matrix forming one training sample; the desired output of each training sample is fixed as the vector Y obtained by L2-normalizing the set {0, 1, ..., 1} of N elements; multiple training samples are generated in this way;
In the second step, the "optimizing CNN" is trained to compute W: the above training samples are input to the "optimizing CNN" for training, and the resulting 1×24-dimensional convolution kernel is the optimal weight vector W; the structure of the "optimizing CNN" has 3 layers: an input layer, a convolutional layer, and a max-pooling output layer, where the input layer corresponds to the N×24-dimensional feature-distance matrix, the convolutional layer consists of K convolution kernels of dimension 1×24, and the max-pooling output layer is the N-dimensional vector Y; after training finishes, the K 1×24 convolution kernels are averaged to obtain the optimal W.
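Steps (3) and (4) of claim 1 can be illustrated with a short sketch of the fused distance d = W'D and the subsequent ranking. This is a non-authoritative illustration: in the method, W comes from the trained "optimizing CNN", whereas here a hypothetical uniform weight vector stands in for it, and the gallery distance vectors are made-up constants.

```python
import numpy as np

def fused_distance(W, D):
    """Collapse a 24-dimensional distance vector D into one scalar d = W'D."""
    return float(np.dot(W, D))

def rank_gallery(W, gallery):
    """Return gallery indices sorted by fused distance, ascending:
    a smaller distance means higher similarity (claim 1, step (4))."""
    return sorted(range(len(gallery)),
                  key=lambda i: fused_distance(W, gallery[i]))

# Hypothetical uniform weights stand in for the learned W.
W = np.full(24, 1.0 / 24.0)
gallery = [np.full(24, 0.9), np.full(24, 0.1), np.full(24, 0.5)]
order = rank_gallery(W, gallery)  # gallery entry 1 has the smallest distance
```

With uniform weights the fused distance is simply the mean of the 24 per-feature distances, so the three gallery entries rank as 1, 2, 0.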
CN201710270659.2A 2017-04-24 2017-04-24 A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net Withdrawn CN107085609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710270659.2A CN107085609A (en) 2017-04-24 2017-04-24 A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net


Publications (1)

Publication Number Publication Date
CN107085609A true CN107085609A (en) 2017-08-22

Family

ID=59611511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710270659.2A Withdrawn CN107085609A (en) 2017-04-24 2017-04-24 A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net

Country Status (1)

Country Link
CN (1) CN107085609A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050129285A1 (en) * 2003-09-29 2005-06-16 Fuji Photo Film Co., Ltd. Collation system and computer readable medium storing thereon program
CN104484324A (en) * 2014-09-26 2015-04-01 徐晓晖 Pedestrian retrieval method based on multiple models and fuzzy color


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766794B (en) * 2017-09-22 2021-05-14 天津大学 Image semantic segmentation method with learnable feature fusion coefficient
CN107766794A (en) * 2017-09-22 2018-03-06 天津大学 The image, semantic dividing method that a kind of Fusion Features coefficient can learn
CN108416266A (en) * 2018-01-30 2018-08-17 同济大学 A kind of video behavior method for quickly identifying extracting moving target using light stream
US11270158B2 (en) 2018-02-09 2022-03-08 Beijing Sensetime Technology Development Co., Ltd. Instance segmentation methods and apparatuses, electronic devices, programs, and media
CN108460411A (en) * 2018-02-09 2018-08-28 北京市商汤科技开发有限公司 Example dividing method and device, electronic equipment, program and medium
CN108960331A (en) * 2018-07-10 2018-12-07 重庆邮电大学 A kind of recognition methods again of the pedestrian based on pedestrian image feature clustering
US20220012885A1 (en) * 2019-07-26 2022-01-13 Adobe Inc. Utilizing a two-stream encoder neural network to generate composite digital images
US11568544B2 (en) * 2019-07-26 2023-01-31 Adobe Inc. Utilizing a two-stream encoder neural network to generate composite digital images
CN111046724A (en) * 2019-10-21 2020-04-21 武汉大学 Pedestrian retrieval method based on area matching network
CN111046724B (en) * 2019-10-21 2021-09-14 武汉大学 Pedestrian retrieval method based on area matching network
CN110929770A (en) * 2019-11-15 2020-03-27 云从科技集团股份有限公司 Intelligent tracking method, system and equipment based on image processing and readable medium
CN110929619A (en) * 2019-11-15 2020-03-27 云从科技集团股份有限公司 Target object tracking method, system and device based on image processing and readable medium
CN111951189A (en) * 2020-08-13 2020-11-17 神思电子技术股份有限公司 Data enhancement method for multi-scale texture randomization
CN111951189B (en) * 2020-08-13 2022-05-06 神思电子技术股份有限公司 Data enhancement method for multi-scale texture randomization
CN113192101A (en) * 2021-05-06 2021-07-30 影石创新科技股份有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113192101B (en) * 2021-05-06 2024-03-29 影石创新科技股份有限公司 Image processing method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107085609A (en) A kind of pedestrian retrieval method that multiple features fusion is carried out based on neutral net
CN111079602B (en) Vehicle fine granularity identification method and device based on multi-scale regional feature constraint
Zhou et al. Point to set similarity based deep feature learning for person re-identification
CN105512684B (en) Logo automatic identifying method based on principal component analysis convolutional neural networks
CN104063719B (en) Pedestrian detection method and device based on depth convolutional network
CN112801015B (en) Multi-mode face recognition method based on attention mechanism
CN107463920A (en) A kind of face identification method for eliminating partial occlusion thing and influenceing
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN104881671B (en) A kind of high score remote sensing image Local Feature Extraction based on 2D Gabor
CN107767416B (en) Method for identifying pedestrian orientation in low-resolution image
CN112862849B (en) Image segmentation and full convolution neural network-based field rice ear counting method
CN106022223B (en) A kind of higher-dimension local binary patterns face identification method and system
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN109190458A (en) A kind of person of low position's head inspecting method based on deep learning
CN108876776B (en) Classification model generation method, fundus image classification method and device
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN105868711A (en) Method for identifying human body behaviors based on sparse and low rank
CN116524255A (en) Wheat scab spore identification method based on Yolov5-ECA-ASFF
Sehree et al. Olive trees cases classification based on deep convolutional neural network from unmanned aerial vehicle imagery
CN107704509A (en) A kind of method for reordering for combining stability region and deep learning
Yun et al. Part-level convolutional neural networks for pedestrian detection using saliency and boundary box alignment
Zhang et al. Hyperspectral Image Classification Based on Spectral-Spatial Attention Tensor Network
CN105718858B (en) A kind of pedestrian recognition method based on positive and negative broad sense maximum pond
CN116778468A (en) Three-dimensional target detection method based on point cloud structure perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20170822)