CN110688512A - Pedestrian image search algorithm based on PTGAN region gap and deep neural network

Pedestrian image search algorithm based on PTGAN region gap and deep neural network

Info

Publication number
CN110688512A
Authority
CN
China
Prior art keywords
pedestrian
image
branch
region
ptgan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910751899.3A
Other languages
Chinese (zh)
Inventor
张斯尧
谢喜林
王思远
黄晋
蒋杰
张�诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiu Ling Software Engineering Co Ltd
Original Assignee
Shenzhen Jiu Ling Software Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiu Ling Software Engineering Co Ltd filed Critical Shenzhen Jiu Ling Software Engineering Co Ltd
Priority to CN201910751899.3A priority Critical patent/CN110688512A/en
Publication of CN110688512A publication Critical patent/CN110688512A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a big data pedestrian image search algorithm based on the combination of a PTGAN region gap and a deep neural network, which comprises the following steps: constructing a Spark big data platform based on the MLbase machine learning library; building a deep learning neural network combining PTGAN with multiple branches, training on a pedestrian image database, extracting the corresponding image features, and completing a pedestrian re-identification image database; transmitting a video file into the Spark big data platform, segmenting video key frames, and extracting feature information of the target image with a deep learning algorithm; and detecting and calculating the similarity between the target pedestrian features and all pedestrian object features in the target image, then ranking and retrieving the most similar pedestrian information and pedestrian images. The invention can be applied to video surveillance systems for pedestrian feature extraction and real-time pedestrian detection and search; it offers high reliability, good discrimination, good robustness and simple computation, and maintains high efficiency and real-time performance.

Description

Pedestrian image search algorithm based on PTGAN region gap and deep neural network
Technical Field
The invention relates to the field of computer vision and video investigation, in particular to a big data pedestrian image search algorithm based on the combination of a PTGAN (Person Transfer GAN) region gap and a deep neural network.
Background
Pedestrian re-identification (Re-ID) retrieves a given monitored pedestrian image across devices. In surveillance video, high-quality face pictures are often unavailable due to camera resolution and shooting angle, so Re-ID becomes a very important alternative technology when face recognition fails. A defining characteristic of Re-ID is that it works across cameras; retrieving pictures of the same pedestrian under different cameras is therefore the key to Re-ID.
Although the detection capability of pedestrian re-identification has improved significantly, many challenging problems remain unsolved in practice: complex scenes, differences in lighting, changes in viewpoint and pose, large numbers of pedestrians in a surveillance camera network, and so on. Under these conditions cross-camera retrieval is generally difficult; meanwhile, the labeling work in the early stage of video image sample training is expensive and consumes a large amount of manpower, so existing algorithms generally fail to achieve the expected effect and re-identification accuracy is low.
Disclosure of Invention
The invention mainly aims to provide a pedestrian image search algorithm based on a PTGAN (Person Transfer GAN) region gap and a deep neural network, in order to solve the problems that in actual complex scenes cross-camera retrieval is usually difficult, the labeling work in the early stage of video image sample training is costly and consumes a large amount of manpower, conventional algorithms usually cannot achieve the expected effect, and re-identification accuracy is low.
In order to achieve the above object, the big data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network provided by the invention comprises the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library;
s2, building a deep learning neural network based on the combination of the PTGAN and multiple branches, training on the pedestrian image database, extracting the corresponding image features, and completing the pedestrian re-identification image database;
s3, transmitting the video file into a Spark big data platform, segmenting a video key frame, and extracting feature information of a target image based on a deep learning algorithm;
and S4, detecting and calculating the similarity between the target pedestrian feature and all pedestrian object features in the target image, and then ranking and retrieving the most similar pedestrian information and pedestrian images.
Preferably, the step S2 includes the steps of:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, wherein the image to be identified is an image in which the pedestrian foreground is unchanged while the background difference region has been migrated;
s2.2, performing multi-branch combined training on the image to be recognized, wherein the specific steps are as follows:
s2.2.1, inputting the image to be recognized into a training model and obtaining the feature vectors corresponding to a plurality of branches, specifically as follows: given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch;
s2.2.2, local feature extraction, using the local region branch to generate region features, as follows: the local region branch divides the feature map M evenly from top to bottom into K overlapping local regions, the overlap being used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them; a classification task with pedestrian identity (ID) tags supervises the learning of each region feature;
s2.2.3 extracting attribute features, wherein the attribute branch takes the output of the first FC layer in the global branch as input, then the FC layer generates the attribute features, and finally the attribute features are learned in the attribute classification task;
s2.2.4, training the feature vector model: the front and back of the pedestrian are treated as two different classes for training, and the training processes of S2.2.1, S2.2.2 and S2.2.3 above are repeated to form the feature vectors; each branch of the RAM is trained by a separate classification task with softmax loss, and model training proceeds by successively adding the global branch, BN branch, attribute branch and local region branch until a feature vector model meeting the requirements is trained.
Preferably, the loss function employed in performing the PTGAN process in said step S2.1 is:
$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$
where $L_{Style}$ denotes the style loss, or region difference domain loss, $L_{ID}$ denotes the identity loss of the generated image, and $\lambda_1$ is the weight balancing the style loss against the identity loss.
Preferably, the concrete formula of $L_{Style}$ is:
$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$
where A and B are the two image domains processed by the GAN, $G$ is the A→B style mapping function, $\bar{G}$ is the B→A style mapping function, and $\lambda_2$ weights the cycle consistency term.
Preferably, in step S2.1, the video image is further subjected to foreground segmentation using PSPNet to obtain a mask region, and the concrete formula of $L_{ID}$ is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions.
Preferably, in the step S2.2.1, the generated features are $f_c$, $f_b$, $f_a$, $f_{rt}$, $f_{rm}$ and $f_{rb}$, where $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively; the global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features; the local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
Preferably, the global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task. The BN branch extracts its feature as follows: a BN layer is embedded between the feature map M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$.
Preferably, in the step S2.2.4, the overall objective function adopted by the RAM in the plurality of classification tasks is:
$$L(\Theta) = l_{conv} + \lambda_3 l_{BN} + \lambda_4 l_{re} + \lambda_5 l_{att}$$
where $\Theta$ denotes the parameters of the deep model; $l_{conv}$, $l_{BN}$, $l_{re}$ and $l_{att}$ denote the classification losses of the global, BN, local region and attribute branches, respectively; $\lambda_3$, $\lambda_4$, $\lambda_5$ are the weights of the corresponding losses; and $l_{re}$ consists of three equally weighted classification losses for the different regions.
Preferably, the segmenting the video key frame step in step S3 includes:
distinguishing I-frame and P-frame data in the video file, and extracting the key frame information of the video;
judging when a moving target appears or disappears, and obtaining the accurate segmentation time and file position accordingly;
intelligently segmenting and outputting the video file according to the basis of intelligent video segmentation, which comprises the following constraints: 1) the video segmentation points are video key frames; 2) the time points at which a moving object begins to appear or just disappears in the video; 3) the clip length cannot be less than 30 seconds and cannot exceed 6 minutes.
Preferably, in step S4, the similarity calculation formula is:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
According to the technical scheme provided by the invention, on top of the structural composition of a Spark big data platform and intelligent big-data analysis, a deep-learning-based pedestrian recognition algorithm, a video-based pedestrian search technique and an intelligent video key-frame segmentation algorithm are realized in turn; meanwhile, pedestrian re-identification is carried out with a deep learning algorithm under the open-source Spark framework, and the intelligent segmentation of video data is improved, achieving a better practical application effect. The method has high reliability, good discrimination, good robustness and simple computation, maintains high efficiency, and meets real-time requirements.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is an overall algorithm flow diagram of the present invention;
FIG. 2 is a schematic diagram of the assembly structure of the Spark platform of the present invention;
FIG. 3 is a diagram of a video intelligence analysis architecture in accordance with the present invention;
FIG. 4 is a comparison graph of the image conversion effect of pedestrians in the present invention;
FIG. 5 is an overall structural view of the multi-branch structure of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, an embodiment of a big data pedestrian image search algorithm based on a combination of PTGAN region differences and a deep neural network according to the present invention includes the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library.
Over the past decade, extensible distributed programming frameworks have emerged to manage big data. The first programming model was MapReduce and its open-source implementation Apache Hadoop. In recent years a new distributed framework, Apache Spark, has emerged: a platform for fast, general-purpose large-scale data processing. Based on in-memory computation, the Spark platform is naturally suited to big data processing and analysis.
Spark retains the advantages of Hadoop MapReduce; unlike MapReduce, however, intermediate job outputs can be kept in memory rather than written to and read from HDFS, so Spark is better suited to MapReduce-style algorithms that require iteration, such as data mining and machine learning. Since the video data is stored in the HDFS file system, Spark accesses the data source over TCP sockets and performs intelligent video analysis with the Map-Reduce distributed computing model.
The component structure of Spark is shown in fig. 2. MLlib is Spark's library of commonly used machine learning algorithms. MLlib currently supports four common machine learning problems: binary classification, regression, clustering and collaborative filtering, and also includes a basic gradient descent optimization algorithm. A machine learning algorithm comprises two parts, training and prediction: a model is trained first, and unknown samples are then predicted. MLbase automatically optimizes for distributed execution, selecting algorithms according to MLbase best practices and cost-based models. The system of the invention uses MLbase as a tool for feature detection and training on vehicles, faces, pedestrians, abandoned objects and the like in videos.
After the Spark big data platform based on the MLbase machine learning library has been constructed from the corresponding components, video images are accessed on the platform for the subsequent algorithm operations.
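For orientation, a minimal MLlib example of the training/prediction split described above is sketched below; the toy two-class data and the parameters are illustrative assumptions, not the feature detectors actually used by the system.

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("MLbaseDemo").getOrCreate()

    # Toy binary-classification set in MLlib's (label, features) form; in the
    # system described here the features would be image descriptors.
    train = spark.createDataFrame(
        [(1.0, Vectors.dense([0.9, 0.1])), (0.0, Vectors.dense([0.1, 0.8]))],
        ["label", "features"])

    model = LogisticRegression(maxIter=10).fit(train)   # training step
    model.transform(train).select("label", "prediction").show()  # prediction step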
S2, building a deep learning neural network based on combination of the PTGAN and the multiple branches, training a pedestrian image database, extracting corresponding image features, and completing the pedestrian re-recognition image database.
The backbone of the neural network is based on ResNet-50.
S2 specifically includes the following steps:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, i.e., an image in which the pedestrian foreground is unchanged while the background difference region has been migrated.
PTGAN is a generative adversarial network aimed at the re-identification (Re-ID) problem. In the invention, the biggest characteristic of PTGAN is that it realizes the migration of background difference regions while keeping the pedestrian foreground as unchanged as possible. First, the loss function of the PTGAN network is: $L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$,
where $L_{Style}$ denotes the style loss of the generated image, or region difference domain loss, i.e., whether the generated image matches the style of the new dataset; $L_{ID}$ denotes the identity loss of the generated image, i.e., whether the generated image shows the same person as the original image; and $\lambda_1$ is the weight balancing style loss and identity loss.
the style loss LStyleThe concrete formula of (1) is as follows:
Figure BDA0002167459780000061
wherein A and B are two frames of image processed by GAN, G is image A-B style mapping function,
Figure BDA0002167459780000062
for the B to A style mapping function, λ2Is the weight of the segmentation loss and the identity information loss. The purpose of this partial loss function is to ensure that the difference region (domain) of the generated picture and the desired data set is the same.
Secondly, to ensure that the foreground is unchanged during picture migration, PSPNet is used to perform foreground segmentation on the video image and obtain a mask region. Conventional generative adversarial networks such as CycleGAN were not designed for Re-ID tasks, so they need not keep the identity information of the foreground object unchanged; as a result the foreground may be of poor quality, e.g. blurred, and, worse, the appearance of pedestrians may change. To solve this problem, the invention proposes the $L_{ID}$ loss: PSPNet segments the foreground of the video image, the foreground forms the mask region, and the final identity loss is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions; the identity loss function constrains the pedestrian foreground to remain as unchanged as possible during migration.
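For illustration, a minimal PyTorch sketch of this identity loss is given below, assuming the style mapping networks G and G_bar and PSPNet-style foreground masks are supplied by the caller; it is a sketch of the loss above, not the patented implementation, and the lambda1 value is assumed.

    import torch

    def ptgan_id_loss(G, G_bar, a, b, mask_a, mask_b):
        """Identity loss L_ID: penalize changes to the pedestrian foreground.

        a, b          : image batches from domains A and B, shape (N, 3, H, W)
        mask_a/mask_b : PSPNet foreground masks in [0, 1], shape (N, 1, H, W)
        G, G_bar      : the A->B and B->A style mapping networks
        """
        # Difference between transferred and original image, restricted to the
        # segmented pedestrian foreground by the mask (elementwise product).
        diff_a = (G(a) - a) * mask_a
        diff_b = (G_bar(b) - b) * mask_b
        # L2 norm per sample, averaged over the batch (the expectation terms).
        return (diff_a.flatten(1).norm(p=2, dim=1).mean()
                + diff_b.flatten(1).norm(p=2, dim=1).mean())

    def ptgan_total_loss(style_loss, id_loss, lambda1=10.0):
        # L_PTGAN = L_Style + lambda1 * L_ID (the lambda1 value is assumed).
        return style_loss + lambda1 * id_loss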
The final conversion effect is shown in fig. 4; intuitively, the algorithm of the invention preserves the identity information of pedestrians better than the traditional cycle-consistent generative adversarial network (CycleGAN).
S2.2, performing multi-branch joint training on the video pedestrian images whose background difference regions have been migrated by the PTGAN algorithm; global and local pedestrian information are combined effectively in order to obtain more accurate identity information.
the overall structure diagram of the multiple branch structure is shown in fig. 5, and the specific steps of the algorithm are as follows:
s2.2.1, inputting the processed image to be recognized into the training model to obtain the feature vectors of each branch, as follows:
given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch. Of the generated features, $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively. The global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features. The local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
The global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected (FC) layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task. This network structure and training strategy encourage the network to localize and focus on regions that are discriminative for classifying the target pedestrian, i.e., the local regions that effectively minimize the classification loss. The feature map M learned by the global branch shows high activation values in these regions; the highly activated regions cover different areas of the pedestrian ID image and are critical for pedestrian classification.
Beyond the regions highlighted on M, other regions can also be used for pedestrian re-identification (Re-ID). To make the model focus on more and larger context regions, the invention also designs a BN branch, whose structure is shown in the overall structure of fig. 5.
The BN branch extracts its feature as follows: a BN layer is embedded between M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$; similarly, a classification task based on pedestrian identity information is used to train the BN branch.
BN operations tend to suppress highly activated local regions on the feature map and increase the visibility of other regions; this lets the BN branch capture context cues beyond those captured by the global branch. Clearly, $M_b$ depicts a larger context region, which can yield complementary global features.
S2.2.2 local feature extraction
In general, the differences between the identities of similar pedestrians may exist in certain local areas; therefore, the invention designs a local region branch to generate the region features, as follows:
first, the local region branch divides the feature map M evenly from top to bottom into K = 3 overlapping local regions: $R_t$ represents the head, $R_m$ the upper body, and $R_b$ the lower body, each region corresponding to only a portion of the whole pedestrian;
the overlap is used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them, i.e. $f_{rt}$ from $R_t$, $f_{rm}$ from $R_m$ and $f_{rb}$ from $R_b$;
finally, a classification task with pedestrian identity (ID) tags is used to supervise the feature learning of each region; during the training of each branch, the FC layers are updated to identify pedestrian images from only a portion of the feature map as input;
in this process the network is forced to extract discriminative details in each region, and it can clearly identify more distinctive local regions than the feature map of the global branch.
S2.2.3 Attribute feature extraction
Pedestrian attributes such as clothing and color can be regarded as mid-level descriptions of a pedestrian; compared with visual features, attribute features are more robust to appearance changes caused by variations in viewpoint, lighting, background and the like. The attribute features are therefore complementary to the visual features extracted from the global and local images, so the invention uses attributes to learn features for pedestrian re-identification (Re-ID).
In general, attribute prediction can be considered an easier recognition task than fine-grained pedestrian recognition. The invention learns attribute features through an attribute branch of the pedestrian Re-ID network: the attribute branch takes the output of the first FC layer of the global branch as input, an FC layer then generates the attribute feature $f_a$, and finally the attribute features are learned in an attribute classification task. Compared with learning attribute features directly from the input image, this strategy introduces fewer parameters and makes the training process easier.
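As a concrete reference for S2.2.1 through S2.2.3, the following PyTorch sketch outlines the four-branch structure over a shared backbone; the 6 × 6 × 512 pooling and three-region split follow the text above, while the number of identities and attributes, the feature dimension and the exact region boundaries are assumptions rather than the patented configuration.

    import torch
    import torch.nn as nn

    class RAM(nn.Module):
        """Sketch of the four-branch structure: global, BN, attribute and
        local region branches over a shared feature map M (N, 512, H, W)."""

        def __init__(self, backbone, num_ids=1000, num_attrs=30, dim=512):
            super().__init__()
            self.backbone = backbone                   # five shared conv layers
            self.gpool = nn.AdaptiveMaxPool2d((6, 6))  # pool M to 6 x 6 x 512

            # Global branch: two FC layers -> f_c (fc1 also feeds the attr branch).
            self.fc1 = nn.Linear(6 * 6 * 512, dim)
            self.fc2 = nn.Linear(dim, dim)
            # BN branch: batch-normalize M before pooling -> complementary f_b.
            self.bn = nn.BatchNorm2d(512)
            self.fc_b = nn.Sequential(nn.Flatten(), nn.Linear(6 * 6 * 512, dim),
                                      nn.ReLU(inplace=True), nn.Linear(dim, dim))
            # Attribute branch: one FC layer on the first global FC output -> f_a.
            self.fc_a = nn.Linear(dim, dim)
            # Local region branch: one FC set per overlapping region.
            self.rpool = nn.AdaptiveMaxPool2d((1, 1))
            self.fc_r = nn.ModuleList(nn.Linear(512, dim) for _ in range(3))

            # One classification head per branch (identity or attributes).
            self.head_c = nn.Linear(dim, num_ids)
            self.head_b = nn.Linear(dim, num_ids)
            self.head_a = nn.Linear(dim, num_attrs)
            self.head_r = nn.ModuleList(nn.Linear(dim, num_ids) for _ in range(3))

        def forward(self, x):
            M = self.backbone(x)                        # shared feature map
            f1 = torch.relu(self.fc1(self.gpool(M).flatten(1)))
            f_c = self.fc2(f1)                          # global feature
            f_b = self.fc_b(self.gpool(self.bn(M)))    # BN-branch feature
            f_a = self.fc_a(f1)                         # attribute feature

            # Top/middle/bottom overlapping regions (head, upper, lower body).
            H = M.size(2)
            bounds = [(0, H // 2), (H // 4, 3 * H // 4), (H // 2, H)]
            f_r = [fc(self.rpool(M[:, :, s:e, :]).flatten(1))
                   for fc, (s, e) in zip(self.fc_r, bounds)]

            logits = [self.head_c(f_c), self.head_b(f_b), self.head_a(f_a)] + \
                     [h(f) for h, f in zip(self.head_r, f_r)]
            return (f_c, f_b, f_a, *f_r), logits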
S2.2.4 feature vector model training
To identify the identity information of pedestrians more quickly and accurately, the front and the back of the pedestrian are treated as two different classes for training; the training process is repeated and the feature vectors are formed.
in the algorithm, each branch of the RAM is trained by a single classification task with softmax loss, the RAM is optimized in a plurality of classification tasks, and the overall objective function can be expressed as:
L(Θ)=lconv3lBN4lre5latt
wherein Θ represents a parameter in the deep model; lconv,lBN,lreAnd lattRepresents classification losses in global, BN, local and attribute branches, respectively; lambda [ alpha ]3,λ4,λ5A weight representing the corresponding penalty; wherein lreConsists of three equal-weight classification losses of different regions;
Training all four branches from the start usually makes convergence difficult, so the algorithm of the invention adopts step-by-step model training: a model with only the global branch is trained first, and the other branches, i.e. the BN, local region and attribute branches, are added in order; the convolutional layers are shared by the different branches and fine-tuned across the multiple classification tasks as shown above, finally training a feature vector model that meets the requirements.
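A hedged sketch of this combined objective and the step-by-step branch schedule follows; the logits tuple matches the RAM sketch above, and the λ values and epoch thresholds are illustrative assumptions.

    import torch.nn.functional as F

    def ram_loss(logits, id_labels, attr_labels,
                 lam3=1.0, lam4=1.0, lam5=0.5,
                 active=("global", "bn", "region", "attr")):
        """L(Theta) = l_conv + lam3*l_BN + lam4*l_re + lam5*l_att; branches
        not yet added by the step-by-step schedule are simply skipped."""
        l_c, l_b, l_a, l_rt, l_rm, l_rb = logits
        loss = 0.0
        if "global" in active:
            loss = loss + F.cross_entropy(l_c, id_labels)            # l_conv
        if "bn" in active:
            loss = loss + lam3 * F.cross_entropy(l_b, id_labels)     # l_BN
        if "region" in active:
            l_re = sum(F.cross_entropy(l, id_labels)                 # three equal-
                       for l in (l_rt, l_rm, l_rb)) / 3.0            # weight losses
            loss = loss + lam4 * l_re
        if "attr" in active:
            loss = loss + lam5 * F.cross_entropy(l_a, attr_labels)   # l_att
        return loss

    def branch_schedule(epoch):
        # Start with the global branch only, then add the BN, local region
        # and attribute branches in order (the epoch thresholds are assumed).
        if epoch < 10:
            return ("global",)
        if epoch < 20:
            return ("global", "bn")
        if epoch < 30:
            return ("global", "bn", "region")
        return ("global", "bn", "region", "attr")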
And S3, transmitting the video file into a Spark big data platform, segmenting the video key frame, and extracting characteristic information of the target image based on a deep learning algorithm.
Real-time video or video files are transmitted to the Spark big data platform. Real-time video has no end and carries no starting-point information, so it does not support parallel operation; historical video, once the files are intelligently segmented, can be processed in parallel. Video image data is automatically segmented into video clips by a Map method, each clip is then processed by the deep learning algorithm for video images, and the processing results are passed to a Reduce method for automatic aggregation and storage. The structure of the intelligent video analysis is shown in figure 3.
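A minimal PySpark sketch of this Map-Reduce flow follows; split_video and extract_features are hypothetical stand-ins for the key-frame segmentation and the deep network of step S2, and the HDFS paths are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PedestrianSearch").getOrCreate()
    sc = spark.sparkContext

    def split_video(path, raw_bytes):
        """Hypothetical stand-in: cut one file into key-frame-aligned clips,
        yielding (segment_id, segment_bytes) pairs (see constraints below)."""
        yield (path + "#0", raw_bytes)

    def extract_features(segment):
        """Hypothetical stand-in for the deep network of step S2: returns
        (segment_id, list_of_feature_vectors) for the pedestrians found."""
        seg_id, _ = segment
        return (seg_id, [])

    segments = sc.binaryFiles("hdfs:///video/history/*.mp4") \
                 .flatMap(lambda kv: split_video(*kv))     # Map: segmentation
    features = segments.map(extract_features)              # per-clip deep model
    merged = features.reduceByKey(lambda a, b: a + b)      # Reduce: aggregate
    merged.saveAsPickleFile("hdfs:///features/pedestrians")  # persist to HDFS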
The video segmentation processing is carried out based on the video key frames, so that the video can be better parallelized.
The segmentation process is roughly divided into the following two steps: 1. distinguish I-frame and P-frame data in the video file and extract the key frame information of the video to serve as candidate segmentation points of the video file; 2. combining an existing moving-target detection method, judge when a moving target appears or disappears and obtain the accurate segmentation time and file position accordingly. The intelligent video segmentation rests on the following 3 constraints: the time point at which a moving object in the video, such as a target pedestrian, begins to appear or has just disappeared; the video segmentation points must be video key frames, since only files segmented at key frames yield complete video images; and the clip length cannot be less than 30 seconds or exceed 6 minutes. Finally, the intelligently segmented video images are output.
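The cut-point selection can be sketched as follows under the three constraints above; the frame list and motion-event inputs stand in for the I/P-frame parsing and moving-target detection, and the snapping rule is a simplifying assumption.

    MIN_LEN, MAX_LEN = 30.0, 360.0      # constraint 3: 30 s <= clip <= 6 min

    def choose_cut_points(frames, motion_events):
        """Pick cut points satisfying the three constraints.

        frames        : list of (timestamp, frame_type), frame_type 'I' or 'P'
        motion_events : timestamps at which a moving target appears/disappears
        """
        keyframes = [t for t, ftype in frames if ftype == 'I']    # constraint 1
        cuts, last = [], 0.0
        for event in sorted(motion_events):                       # constraint 2
            # Snap each motion event to the first key frame at or after it.
            later = [t for t in keyframes if t >= event]
            if not later:
                break
            cut = later[0]
            if MIN_LEN <= cut - last <= MAX_LEN:                  # constraint 3
                cuts.append(cut)
                last = cut
        return cuts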
In addition, considering the requirements of target detection in practical applications, the aspect ratio of the picture is variable when the scale of the search region is set, while the overall picture size stays unchanged. This not only helps meet the processing requirements of the video image but also greatly reduces the amount of computation. For the original input picture, the RPN network produces about twenty thousand search boxes. In practical application, search boxes extending beyond the picture boundary can be eliminated; meanwhile, search boxes that overlap and cover the same target are processed with Non-Maximum Suppression (NMS) to remove the overlapping search boxes. This strategy significantly improves the search efficiency for candidate target boxes.
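A standard greedy NMS routine of the kind referred to here is sketched below; the 0.7 IoU threshold is an assumed value, not one specified in the text.

    def nms(boxes, scores, iou_thresh=0.7):
        """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes."""
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            i = order.pop(0)               # highest-scoring remaining box
            keep.append(i)
            # Drop every remaining box that overlaps it too strongly.
            order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
        return keep

    def iou(a, b):
        """Intersection-over-union of two axis-aligned boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0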
Finally, feature vectors are extracted and stored by the deep learning network established in step S2; note that the more training images are input, the more accurate the model and the wider its coverage.
Pedestrian target training is carried out on a huge number of pedestrian image learning samples, with extensive on-site system tuning and testing; characteristics such as appearance contour, relative position, and the color and texture of parts such as clothes, face, upper body and lower body are collected to form a large amount of auxiliary classification information, which, together with results such as the pedestrian's age and gender, finally yields a comprehensive confidence score.
And S4, detecting and calculating the similarity of the target pedestrian feature and all the pedestrian object features in the target image, and then sequencing and searching the most similar pedestrian information and pedestrian images.
The similarity calculation usually adopts the cosine distance. Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals; compared with distance metrics, it emphasizes the difference of two vectors in direction rather than in distance or length. The formula is as follows:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
the smaller the calculated numerical value is, the higher the similarity is, and finally the final re-recognition system model is output in combination;
The method provided by the invention can in practice be embedded in an FPGA (field programmable gate array) and applied to systems requiring real-time pedestrian re-identification.
In the description herein, references to the description of the term "one embodiment," "another embodiment," or "first through xth embodiments," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, method steps, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A big data pedestrian image search algorithm based on combination of a PTGAN region gap and a deep neural network is characterized by comprising the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library;
s2, building a deep learning neural network based on the combination of the PTGAN and multiple branches, training on the pedestrian image database, extracting the corresponding image features, and completing the pedestrian re-identification image database;
s3, transmitting the video file into a Spark big data platform, segmenting a video key frame, and extracting feature information of a target image based on a deep learning algorithm;
and S4, detecting and calculating the similarity between the target pedestrian feature and all pedestrian object features in the target image, and then ranking and retrieving the most similar pedestrian information and pedestrian images.
2. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 1, wherein the step S2 comprises the steps of:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, wherein the image to be identified is an image in which the pedestrian foreground is unchanged while the background difference region has been migrated;
s2.2, performing multi-branch combined training on the image to be recognized, wherein the specific steps are as follows:
s2.2.1, inputting the image to be recognized into a training model and obtaining the feature vectors corresponding to a plurality of branches, specifically as follows: given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch;
s2.2.2, local feature extraction, using the local region branch to generate region features, as follows: the local region branch divides the feature map M evenly from top to bottom into K overlapping local regions, the overlap being used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them; a classification task with pedestrian identity (ID) tags supervises the learning of each region feature;
s2.2.3 extracting attribute features, wherein the attribute branch takes the output of the first FC layer in the global branch as input, then the FC layer generates the attribute features, and finally the attribute features are learned in the attribute classification task;
s2.2.4, training the feature vector model: the front and back of the pedestrian are treated as two different classes for training, and the training processes of S2.2.1, S2.2.2 and S2.2.3 above are repeated to form the feature vectors; each branch of the RAM is trained by a separate classification task with softmax loss, and model training proceeds by successively adding the global branch, BN branch, attribute branch and local region branch until a feature vector model meeting the requirements is trained.
3. The big-data pedestrian image search algorithm based on combination of PTGAN region gaps and deep neural network as claimed in claim 2, wherein the loss function adopted in performing the PTGAN process in the step S2.1 is:
$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$
where $L_{Style}$ denotes the style loss, or region difference domain loss, $L_{ID}$ denotes the identity loss of the generated image, and $\lambda_1$ is the weight balancing the style loss against the identity loss.
4. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 3, wherein the concrete formula of $L_{Style}$ is:
$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$
where A and B are the two image domains processed by the GAN, $G$ is the A→B style mapping function, $\bar{G}$ is the B→A style mapping function, and $\lambda_2$ weights the cycle consistency term.
5. The big data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 3, characterized in that in step S2.1 the video image is further subjected to foreground segmentation by PSPNet to obtain a mask region, and the concrete formula of $L_{ID}$ is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions.
6. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 2, wherein in the step S2.2.1 the generated features are $f_c$, $f_b$, $f_a$, $f_{rt}$, $f_{rm}$ and $f_{rb}$, where $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively; the global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features; the local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
7. The big-data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 6, wherein the global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task; and the BN branch extracts its feature as follows: a BN layer is embedded between the feature map M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$.
8. The big-data pedestrian image searching algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 7, wherein in the step S2.2.4, the overall objective function adopted by the RAM in the plurality of classification tasks is:
$$L(\Theta) = l_{conv} + \lambda_3 l_{BN} + \lambda_4 l_{re} + \lambda_5 l_{att}$$
where $\Theta$ denotes the parameters of the deep model; $l_{conv}$, $l_{BN}$, $l_{re}$ and $l_{att}$ denote the classification losses of the global, BN, local region and attribute branches, respectively; $\lambda_3$, $\lambda_4$, $\lambda_5$ are the weights of the corresponding losses; and $l_{re}$ consists of three equally weighted classification losses for the different regions.
9. The big-data pedestrian image searching algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 1, wherein the step of segmenting the video key frames in the step S3 comprises:
distinguishing I-frame and P-frame data in the video file, and extracting the key frame information of the video;
judging when a moving target appears or disappears, and obtaining the accurate segmentation time and file position accordingly;
intelligently segmenting and outputting the video file according to the basis of intelligent video segmentation, which comprises the following constraints: 1) the video segmentation points are video key frames; 2) the time points at which a moving object begins to appear or just disappears in the video; 3) the clip length cannot be less than 30 seconds and cannot exceed 6 minutes.
10. The big-data pedestrian image searching algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 1, wherein in the step S4, the similarity calculation formula is as follows:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
CN201910751899.3A 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network Pending CN110688512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910751899.3A CN110688512A (en) 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network


Publications (1)

Publication Number Publication Date
CN110688512A true CN110688512A (en) 2020-01-14

Family

ID=69108256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910751899.3A Pending CN110688512A (en) 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network

Country Status (1)

Country Link
CN (1) CN110688512A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256439A (en) * 2017-12-26 2018-07-06 北京大学 A kind of pedestrian image generation method and system based on cycle production confrontation network
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110110755A (en) * 2019-04-04 2019-08-09 长沙千视通智能科技有限公司 Based on the pedestrian of PTGAN Regional disparity and multiple branches weight recognition detection algorithm and device
CN110096982A (en) * 2019-04-22 2019-08-06 长沙千视通智能科技有限公司 A kind of video frequency vehicle big data searching method based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111565303A (en) * 2020-05-29 2020-08-21 深圳市易链信息技术有限公司 Video monitoring method, system and readable storage medium based on fog calculation and deep learning
CN111565303B (en) * 2020-05-29 2021-12-14 广东省电子口岸管理有限公司 Video monitoring method, system and readable storage medium based on fog calculation and deep learning
CN112733920A (en) * 2020-12-31 2021-04-30 中国地质调查局成都地质调查中心 Image identification method and system based on deep learning
CN113239782A (en) * 2021-05-11 2021-08-10 广西科学院 Pedestrian re-identification system and method integrating multi-scale GAN and label learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200114)