CN110688512A - Pedestrian image search algorithm based on PTGAN region gap and deep neural network

Pedestrian image search algorithm based on PTGAN region gap and deep neural network

Info

Publication number
CN110688512A
Authority
CN
China
Prior art keywords
pedestrian
image
branch
region
ptgan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910751899.3A
Other languages
Chinese (zh)
Inventor
张斯尧
谢喜林
王思远
黄晋
蒋杰
张�诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiu Ling Software Engineering Co Ltd
Original Assignee
Shenzhen Jiu Ling Software Engineering Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiu Ling Software Engineering Co Ltd filed Critical Shenzhen Jiu Ling Software Engineering Co Ltd
Priority to CN201910751899.3A priority Critical patent/CN110688512A/en
Publication of CN110688512A publication Critical patent/CN110688512A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a big data pedestrian image search algorithm based on the combination of a PTGAN region gap and a deep neural network, which comprises the following steps: constructing a Spark big data platform based on the MLbase machine learning library; building a deep learning neural network combining PTGAN with multiple branches, training on a pedestrian image database, extracting the corresponding image features, and completing a pedestrian re-identification image database; transmitting a video file into the Spark big data platform, segmenting video key frames, and extracting feature information of the target image with a deep learning algorithm; and detecting and calculating the similarity between the target pedestrian features and all pedestrian object features in the target image, then ranking and retrieving the most similar pedestrian information and pedestrian images. The invention can be applied to video surveillance systems for pedestrian feature extraction and real-time pedestrian detection and search; it offers high reliability, good discrimination, good robustness and simple computation, and maintains high efficiency and real-time performance.

Description

Pedestrian image search algorithm based on PTGAN region gap and deep neural network
Technical Field
The invention relates to the field of computer vision and video investigation, in particular to a big data pedestrian image search algorithm based on the combination of a PTGAN (Person Transfer GAN) region gap and a deep neural network.
Background
Pedestrian re-identification (Re-ID) retrieves a given monitored pedestrian image across devices. In surveillance video, high-quality face pictures are often unavailable due to camera resolution and shooting angle, so Re-ID becomes a very important alternative technology when face recognition fails. A defining characteristic of Re-ID is that it works across cameras; retrieving pictures of the same pedestrian under different cameras is therefore the key to Re-ID.
Although the detection capability of pedestrian re-identification has improved significantly, many challenging problems remain unsolved in practice: complex scenes, differences in lighting, changes in viewpoint and pose, large numbers of pedestrians in a surveillance camera network, and so on. Under these conditions cross-camera retrieval is generally difficult; meanwhile, the labeling work in the early stage of video image sample training is expensive and consumes a large amount of manpower, so existing algorithms generally fail to achieve the expected effect and re-identification accuracy is low.
Disclosure of Invention
The invention mainly aims to provide a pedestrian image search algorithm based on a PTGAN (Person Transfer GAN) region gap and a deep neural network, in order to solve the problems that in actual complex scenes cross-camera retrieval is usually difficult, the labeling work in the early stage of video image sample training is costly and consumes a large amount of manpower, conventional algorithms usually cannot achieve the expected effect, and re-identification accuracy is low.
In order to achieve the above object, the big data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network provided by the invention comprises the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library;
s2, building a deep learning neural network based on the combination of the PTGAN and multiple branches, training on the pedestrian image database, extracting the corresponding image features, and completing the pedestrian re-identification image database;
s3, transmitting the video file into a Spark big data platform, segmenting a video key frame, and extracting feature information of a target image based on a deep learning algorithm;
and S4, detecting and calculating the similarity between the target pedestrian feature and all pedestrian object features in the target image, and then ranking and retrieving the most similar pedestrian information and pedestrian images.
Preferably, the step S2 includes the steps of:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, wherein the image to be identified is an image in which the pedestrian foreground is unchanged while the background difference region has been migrated;
s2.2, performing multi-branch combined training on the image to be recognized, wherein the specific steps are as follows:
s2.2.1, inputting the image to be recognized into a training model and obtaining the feature vectors corresponding to a plurality of branches, specifically as follows: given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch;
s2.2.2, local feature extraction, using the local region branch to generate region features, as follows: the local region branch divides the feature map M evenly from top to bottom into K overlapping local regions, the overlap being used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them; a classification task with pedestrian identity (ID) tags supervises the learning of each region feature;
s2.2.3 extracting attribute features, wherein the attribute branch takes the output of the first FC layer in the global branch as input, then the FC layer generates the attribute features, and finally the attribute features are learned in the attribute classification task;
s2.2.4, training the feature vector model: the front and back of the pedestrian are treated as two different classes for training, and the training processes of S2.2.1, S2.2.2 and S2.2.3 above are repeated to form the feature vectors; each branch of the RAM is trained by a separate classification task with softmax loss, and model training proceeds by successively adding the global branch, BN branch, attribute branch and local region branch until a feature vector model meeting the requirements is trained.
Preferably, the loss function employed in performing the PTGAN process in said step S2.1 is:
$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$
where $L_{Style}$ denotes the style loss, or region difference domain loss, $L_{ID}$ denotes the identity loss of the generated image, and $\lambda_1$ is the weight balancing the style loss against the identity loss.
Preferably, the concrete formula of $L_{Style}$ is:
$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$
where A and B are the two image domains processed by the GAN, $G$ is the A→B style mapping function, $\bar{G}$ is the B→A style mapping function, and $\lambda_2$ weights the cycle consistency term.
Preferably, in step S2.1, the video image is further subjected to foreground segmentation using PSPNet to obtain a mask region, and the concrete formula of $L_{ID}$ is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions.
Preferably, in the step S2.2.1, the generated features are $f_c$, $f_b$, $f_a$, $f_{rt}$, $f_{rm}$ and $f_{rb}$, where $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively; the global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features; the local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
Preferably, the global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task. The BN branch extracts its feature as follows: a BN layer is embedded between the feature map M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$.
Preferably, in the step S2.2.4, the overall objective function adopted by the RAM in the plurality of classification tasks is:
$$L(\Theta) = l_{conv} + \lambda_3 l_{BN} + \lambda_4 l_{re} + \lambda_5 l_{att}$$
where $\Theta$ denotes the parameters of the deep model; $l_{conv}$, $l_{BN}$, $l_{re}$ and $l_{att}$ denote the classification losses of the global, BN, local region and attribute branches, respectively; $\lambda_3$, $\lambda_4$, $\lambda_5$ are the weights of the corresponding losses; and $l_{re}$ consists of three equally weighted classification losses for the different regions.
Preferably, the segmenting the video key frame step in step S3 includes:
distinguishing I-frame and P-frame data in the video file, and extracting the key frame information of the video;
judging when a moving target appears or disappears, and obtaining the accurate segmentation time and file position accordingly;
intelligently segmenting and outputting the video file according to the basis of intelligent video segmentation, which comprises the following constraints: 1) the video segmentation points are video key frames; 2) the time points at which a moving object begins to appear or just disappears in the video; 3) the clip length cannot be less than 30 seconds and cannot exceed 6 minutes.
Preferably, in step S4, the similarity calculation formula is:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
According to the technical scheme provided by the invention, on top of the structural composition of a Spark big data platform and intelligent big-data analysis, a deep-learning-based pedestrian recognition algorithm, a video-based pedestrian search technique and an intelligent video key-frame segmentation algorithm are realized in turn; meanwhile, pedestrian re-identification is carried out with a deep learning algorithm under the open-source Spark framework, and the intelligent segmentation of video data is improved, achieving a better practical application effect. The method has high reliability, good discrimination, good robustness and simple computation, maintains high efficiency, and meets real-time requirements.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is an overall algorithm flow diagram of the present invention;
FIG. 2 is a schematic diagram of the assembly structure of the Spark platform of the present invention;
FIG. 3 is a diagram of a video intelligence analysis architecture in accordance with the present invention;
FIG. 4 is a comparison graph of the image conversion effect of pedestrians in the present invention;
FIG. 5 is an overall structural view of the multi-branch structure of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, an embodiment of a big data pedestrian image search algorithm based on a combination of PTGAN region differences and a deep neural network according to the present invention includes the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library.
Over the past decade, extensible distributed programming frameworks have emerged to manage big data. The first programming model was MapReduce and its open-source implementation Apache Hadoop. In recent years a new distributed framework, Apache Spark, has emerged: a platform for fast, general-purpose large-scale data processing. Based on in-memory computation, the Spark platform is naturally suited to big data processing and analysis.
Spark retains the advantages of Hadoop MapReduce; unlike MapReduce, however, intermediate job outputs can be kept in memory rather than written to and read from HDFS, so Spark is better suited to MapReduce-style algorithms that require iteration, such as data mining and machine learning. Since the video data is stored in the HDFS file system, Spark accesses the data source over TCP sockets and performs intelligent video analysis with the Map-Reduce distributed computing model.
The component structure of Spark is shown in fig. 2. MLlib is Spark's library of commonly used machine learning algorithms. MLlib currently supports four common machine learning problems: binary classification, regression, clustering and collaborative filtering, and also includes a basic gradient descent optimization algorithm. A machine learning algorithm comprises two parts, training and prediction: a model is trained first, and unknown samples are then predicted. MLbase automatically optimizes for distributed execution, selecting algorithms according to MLbase best practices and cost-based models. The system of the invention uses MLbase as a tool for feature detection and training on vehicles, faces, pedestrians, abandoned objects and the like in videos.
After the Spark big data platform based on the MLbase machine learning library has been constructed from the corresponding components, video images are accessed on the platform for the subsequent algorithm operations.
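For orientation, a minimal MLlib example of the training/prediction split described above is sketched below; the toy two-class data and the parameters are illustrative assumptions, not the feature detectors actually used by the system.

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("MLbaseDemo").getOrCreate()

    # Toy binary-classification set in MLlib's (label, features) form; in the
    # system described here the features would be image descriptors.
    train = spark.createDataFrame(
        [(1.0, Vectors.dense([0.9, 0.1])), (0.0, Vectors.dense([0.1, 0.8]))],
        ["label", "features"])

    model = LogisticRegression(maxIter=10).fit(train)   # training step
    model.transform(train).select("label", "prediction").show()  # prediction step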
S2, building a deep learning neural network based on combination of the PTGAN and the multiple branches, training a pedestrian image database, extracting corresponding image features, and completing the pedestrian re-recognition image database.
The backbone of the neural network is based on ResNet-50.
S2 specifically includes the following steps:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, i.e., an image in which the pedestrian foreground is unchanged while the background difference region has been migrated.
PTGAN is a generative adversarial network aimed at the re-identification (Re-ID) problem. In the invention, the biggest characteristic of PTGAN is that it realizes the migration of background difference regions while keeping the pedestrian foreground as unchanged as possible. First, the loss function of the PTGAN network is: $L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$,
where $L_{Style}$ denotes the style loss of the generated image, or region difference domain loss, i.e., whether the generated image matches the style of the new dataset; $L_{ID}$ denotes the identity loss of the generated image, i.e., whether the generated image shows the same person as the original image; and $\lambda_1$ is the weight balancing style loss and identity loss.
the style loss LStyleThe concrete formula of (1) is as follows:
Figure BDA0002167459780000061
wherein A and B are two frames of image processed by GAN, G is image A-B style mapping function,
Figure BDA0002167459780000062
for the B to A style mapping function, λ2Is the weight of the segmentation loss and the identity information loss. The purpose of this partial loss function is to ensure that the difference region (domain) of the generated picture and the desired data set is the same.
Secondly, to ensure that the foreground is unchanged during picture migration, PSPNet is used to perform foreground segmentation on the video image and obtain a mask region. Conventional generative adversarial networks such as CycleGAN were not designed for Re-ID tasks, so they need not keep the identity information of the foreground object unchanged; as a result the foreground may be of poor quality, e.g. blurred, and, worse, the appearance of pedestrians may change. To solve this problem, the invention proposes the $L_{ID}$ loss: PSPNet segments the foreground of the video image, the foreground forms the mask region, and the final identity loss is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions; the identity loss function constrains the pedestrian foreground to remain as unchanged as possible during migration.
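For illustration, a minimal PyTorch sketch of this identity loss is given below, assuming the style mapping networks G and G_bar and PSPNet-style foreground masks are supplied by the caller; it is a sketch of the loss above, not the patented implementation, and the lambda1 value is assumed.

    import torch

    def ptgan_id_loss(G, G_bar, a, b, mask_a, mask_b):
        """Identity loss L_ID: penalize changes to the pedestrian foreground.

        a, b          : image batches from domains A and B, shape (N, 3, H, W)
        mask_a/mask_b : PSPNet foreground masks in [0, 1], shape (N, 1, H, W)
        G, G_bar      : the A->B and B->A style mapping networks
        """
        # Difference between transferred and original image, restricted to the
        # segmented pedestrian foreground by the mask (elementwise product).
        diff_a = (G(a) - a) * mask_a
        diff_b = (G_bar(b) - b) * mask_b
        # L2 norm per sample, averaged over the batch (the expectation terms).
        return (diff_a.flatten(1).norm(p=2, dim=1).mean()
                + diff_b.flatten(1).norm(p=2, dim=1).mean())

    def ptgan_total_loss(style_loss, id_loss, lambda1=10.0):
        # L_PTGAN = L_Style + lambda1 * L_ID (the lambda1 value is assumed).
        return style_loss + lambda1 * id_loss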
The final conversion effect is shown in fig. 4; intuitively, the algorithm of the invention preserves the identity information of pedestrians better than the traditional cycle-consistent generative adversarial network (CycleGAN).
S2.2, performing multi-branch joint training on the video pedestrian images whose background difference regions have been migrated by the PTGAN algorithm; global and local pedestrian information are combined effectively in order to obtain more accurate identity information.
the overall structure diagram of the multiple branch structure is shown in fig. 5, and the specific steps of the algorithm are as follows:
s2.2.1, inputting the processed image to be recognized into the training model to obtain the feature vectors of each branch, as follows:
given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch. Of the generated features, $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively. The global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features. The local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
The global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected (FC) layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task. This network structure and training strategy encourage the network to localize and focus on regions that are discriminative for classifying the target pedestrian, i.e., the local regions that effectively minimize the classification loss. The feature map M learned by the global branch shows high activation values in these regions; the highly activated regions cover different areas of the pedestrian ID image and are critical for pedestrian classification.
Beyond the regions highlighted on M, other regions can also be used for pedestrian re-identification (Re-ID). To make the model focus on more and larger context regions, the invention also designs a BN branch, whose structure is shown in the overall structure of fig. 5.
The BN branch extracts its feature as follows: a BN layer is embedded between M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$; similarly, a classification task based on pedestrian identity information is used to train the BN branch.
BN operations tend to suppress highly activated local regions on the feature map and increase the visibility of other regions; this lets the BN branch capture context cues beyond those captured by the global branch. Clearly, $M_b$ depicts a larger context region, which can yield complementary global features.
S2.2.2 local feature extraction
In general, the differences between the identities of similar pedestrians may exist in certain local areas; therefore, the invention designs a local region branch to generate the region features, as follows:
first, the local region branch divides the feature map M evenly from top to bottom into K = 3 overlapping local regions: $R_t$ represents the head, $R_m$ the upper body, and $R_b$ the lower body, each region corresponding to only a portion of the whole pedestrian;
the overlap is used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them, i.e. $f_{rt}$ from $R_t$, $f_{rm}$ from $R_m$ and $f_{rb}$ from $R_b$;
finally, a classification task with pedestrian identity (ID) tags is used to supervise the feature learning of each region; during the training of each branch, the FC layers are updated to identify pedestrian images from only a portion of the feature map as input;
in this process the network is forced to extract discriminative details in each region, and it can clearly identify more distinctive local regions than the feature map of the global branch.
S2.2.3 Attribute feature extraction
Pedestrian attributes such as clothing and color can be regarded as mid-level descriptions of a pedestrian; compared with visual features, attribute features are more robust to appearance changes caused by variations in viewpoint, lighting, background and the like. The attribute features are therefore complementary to the visual features extracted from the global and local images, so the invention uses attributes to learn features for pedestrian re-identification (Re-ID).
In general, attribute prediction can be considered an easier recognition task than fine-grained pedestrian recognition. The invention learns attribute features through an attribute branch of the pedestrian Re-ID network: the attribute branch takes the output of the first FC layer of the global branch as input, an FC layer then generates the attribute feature $f_a$, and finally the attribute features are learned in an attribute classification task. Compared with learning attribute features directly from the input image, this strategy introduces fewer parameters and makes the training process easier.
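As a concrete reference for S2.2.1 through S2.2.3, the following PyTorch sketch outlines the four-branch structure over a shared backbone; the 6 × 6 × 512 pooling and three-region split follow the text above, while the number of identities and attributes, the feature dimension and the exact region boundaries are assumptions rather than the patented configuration.

    import torch
    import torch.nn as nn

    class RAM(nn.Module):
        """Sketch of the four-branch structure: global, BN, attribute and
        local region branches over a shared feature map M (N, 512, H, W)."""

        def __init__(self, backbone, num_ids=1000, num_attrs=30, dim=512):
            super().__init__()
            self.backbone = backbone                   # five shared conv layers
            self.gpool = nn.AdaptiveMaxPool2d((6, 6))  # pool M to 6 x 6 x 512

            # Global branch: two FC layers -> f_c (fc1 also feeds the attr branch).
            self.fc1 = nn.Linear(6 * 6 * 512, dim)
            self.fc2 = nn.Linear(dim, dim)
            # BN branch: batch-normalize M before pooling -> complementary f_b.
            self.bn = nn.BatchNorm2d(512)
            self.fc_b = nn.Sequential(nn.Flatten(), nn.Linear(6 * 6 * 512, dim),
                                      nn.ReLU(inplace=True), nn.Linear(dim, dim))
            # Attribute branch: one FC layer on the first global FC output -> f_a.
            self.fc_a = nn.Linear(dim, dim)
            # Local region branch: one FC set per overlapping region.
            self.rpool = nn.AdaptiveMaxPool2d((1, 1))
            self.fc_r = nn.ModuleList(nn.Linear(512, dim) for _ in range(3))

            # One classification head per branch (identity or attributes).
            self.head_c = nn.Linear(dim, num_ids)
            self.head_b = nn.Linear(dim, num_ids)
            self.head_a = nn.Linear(dim, num_attrs)
            self.head_r = nn.ModuleList(nn.Linear(dim, num_ids) for _ in range(3))

        def forward(self, x):
            M = self.backbone(x)                        # shared feature map
            f1 = torch.relu(self.fc1(self.gpool(M).flatten(1)))
            f_c = self.fc2(f1)                          # global feature
            f_b = self.fc_b(self.gpool(self.bn(M)))    # BN-branch feature
            f_a = self.fc_a(f1)                         # attribute feature

            # Top/middle/bottom overlapping regions (head, upper, lower body).
            H = M.size(2)
            bounds = [(0, H // 2), (H // 4, 3 * H // 4), (H // 2, H)]
            f_r = [fc(self.rpool(M[:, :, s:e, :]).flatten(1))
                   for fc, (s, e) in zip(self.fc_r, bounds)]

            logits = [self.head_c(f_c), self.head_b(f_b), self.head_a(f_a)] + \
                     [h(f) for h, f in zip(self.head_r, f_r)]
            return (f_c, f_b, f_a, *f_r), logits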
S2.2.4 feature vector model training
To identify the identity information of pedestrians more quickly and accurately, the front and the back of the pedestrian are treated as two different classes for training; the training process is repeated and the feature vectors are formed.
in the algorithm, each branch of the RAM is trained by a single classification task with softmax loss, the RAM is optimized in a plurality of classification tasks, and the overall objective function can be expressed as:
L(Θ)=lconv3lBN4lre5latt
wherein Θ represents a parameter in the deep model; lconv,lBN,lreAnd lattRepresents classification losses in global, BN, local and attribute branches, respectively; lambda [ alpha ]3,λ4,λ5A weight representing the corresponding penalty; wherein lreConsists of three equal-weight classification losses of different regions;
Training all four branches from the start usually makes convergence difficult, so the algorithm of the invention adopts step-by-step model training: a model with only the global branch is trained first, and the other branches, i.e. the BN, local region and attribute branches, are added in order; the convolutional layers are shared by the different branches and fine-tuned across the multiple classification tasks as shown above, finally training a feature vector model that meets the requirements.
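A hedged sketch of this combined objective and the step-by-step branch schedule follows; the logits tuple matches the RAM sketch above, and the λ values and epoch thresholds are illustrative assumptions.

    import torch.nn.functional as F

    def ram_loss(logits, id_labels, attr_labels,
                 lam3=1.0, lam4=1.0, lam5=0.5,
                 active=("global", "bn", "region", "attr")):
        """L(Theta) = l_conv + lam3*l_BN + lam4*l_re + lam5*l_att; branches
        not yet added by the step-by-step schedule are simply skipped."""
        l_c, l_b, l_a, l_rt, l_rm, l_rb = logits
        loss = 0.0
        if "global" in active:
            loss = loss + F.cross_entropy(l_c, id_labels)            # l_conv
        if "bn" in active:
            loss = loss + lam3 * F.cross_entropy(l_b, id_labels)     # l_BN
        if "region" in active:
            l_re = sum(F.cross_entropy(l, id_labels)                 # three equal-
                       for l in (l_rt, l_rm, l_rb)) / 3.0            # weight losses
            loss = loss + lam4 * l_re
        if "attr" in active:
            loss = loss + lam5 * F.cross_entropy(l_a, attr_labels)   # l_att
        return loss

    def branch_schedule(epoch):
        # Start with the global branch only, then add the BN, local region
        # and attribute branches in order (the epoch thresholds are assumed).
        if epoch < 10:
            return ("global",)
        if epoch < 20:
            return ("global", "bn")
        if epoch < 30:
            return ("global", "bn", "region")
        return ("global", "bn", "region", "attr")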
And S3, transmitting the video file into a Spark big data platform, segmenting the video key frame, and extracting characteristic information of the target image based on a deep learning algorithm.
Real-time video or video files are transmitted to the Spark big data platform. Real-time video has no end and carries no starting-point information, so it does not support parallel operation; historical video, once the files are intelligently segmented, can be processed in parallel. Video image data is automatically segmented into video clips by a Map method, each clip is then processed by the deep learning algorithm for video images, and the processing results are passed to a Reduce method for automatic aggregation and storage. The structure of the intelligent video analysis is shown in figure 3.
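A minimal PySpark sketch of this Map-Reduce flow follows; split_video and extract_features are hypothetical stand-ins for the key-frame segmentation and the deep network of step S2, and the HDFS paths are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PedestrianSearch").getOrCreate()
    sc = spark.sparkContext

    def split_video(path, raw_bytes):
        """Hypothetical stand-in: cut one file into key-frame-aligned clips,
        yielding (segment_id, segment_bytes) pairs (see constraints below)."""
        yield (path + "#0", raw_bytes)

    def extract_features(segment):
        """Hypothetical stand-in for the deep network of step S2: returns
        (segment_id, list_of_feature_vectors) for the pedestrians found."""
        seg_id, _ = segment
        return (seg_id, [])

    segments = sc.binaryFiles("hdfs:///video/history/*.mp4") \
                 .flatMap(lambda kv: split_video(*kv))     # Map: segmentation
    features = segments.map(extract_features)              # per-clip deep model
    merged = features.reduceByKey(lambda a, b: a + b)      # Reduce: aggregate
    merged.saveAsPickleFile("hdfs:///features/pedestrians")  # persist to HDFS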
The video segmentation processing is carried out based on the video key frames, so that the video can be better parallelized.
The segmentation process is roughly divided into the following two steps: 1. distinguish I-frame and P-frame data in the video file and extract the key frame information of the video to serve as candidate segmentation points of the video file; 2. combining an existing moving-target detection method, judge when a moving target appears or disappears and obtain the accurate segmentation time and file position accordingly. The intelligent video segmentation rests on the following 3 constraints: the time point at which a moving object in the video, such as a target pedestrian, begins to appear or has just disappeared; the video segmentation points must be video key frames, since only files segmented at key frames yield complete video images; and the clip length cannot be less than 30 seconds or exceed 6 minutes. Finally, the intelligently segmented video images are output.
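The cut-point selection can be sketched as follows under the three constraints above; the frame list and motion-event inputs stand in for the I/P-frame parsing and moving-target detection, and the snapping rule is a simplifying assumption.

    MIN_LEN, MAX_LEN = 30.0, 360.0      # constraint 3: 30 s <= clip <= 6 min

    def choose_cut_points(frames, motion_events):
        """Pick cut points satisfying the three constraints.

        frames        : list of (timestamp, frame_type), frame_type 'I' or 'P'
        motion_events : timestamps at which a moving target appears/disappears
        """
        keyframes = [t for t, ftype in frames if ftype == 'I']    # constraint 1
        cuts, last = [], 0.0
        for event in sorted(motion_events):                       # constraint 2
            # Snap each motion event to the first key frame at or after it.
            later = [t for t in keyframes if t >= event]
            if not later:
                break
            cut = later[0]
            if MIN_LEN <= cut - last <= MAX_LEN:                  # constraint 3
                cuts.append(cut)
                last = cut
        return cuts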
In addition, considering the requirements of target detection in practical applications, the aspect ratio of the picture is variable when the scale of the search region is set, while the overall picture size stays unchanged. This not only helps meet the processing requirements of the video image but also greatly reduces the amount of computation. For the original input picture, the RPN network produces about twenty thousand search boxes. In practical application, search boxes extending beyond the picture boundary can be eliminated; meanwhile, search boxes that overlap and cover the same target are processed with Non-Maximum Suppression (NMS) to remove the overlapping search boxes. This strategy significantly improves the search efficiency for candidate target boxes.
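A standard greedy NMS routine of the kind referred to here is sketched below; the 0.7 IoU threshold is an assumed value, not one specified in the text.

    def nms(boxes, scores, iou_thresh=0.7):
        """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes."""
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            i = order.pop(0)               # highest-scoring remaining box
            keep.append(i)
            # Drop every remaining box that overlaps it too strongly.
            order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
        return keep

    def iou(a, b):
        """Intersection-over-union of two axis-aligned boxes."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0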
Finally, feature vectors are extracted and stored by the deep learning network established in step S2; note that the more training images are input, the more accurate the model and the wider its coverage.
Pedestrian target training is carried out on a huge number of pedestrian image learning samples, with extensive on-site system tuning and testing; characteristics such as appearance contour, relative position, and the color and texture of parts such as clothes, face, upper body and lower body are collected to form a large amount of auxiliary classification information, which, together with results such as the pedestrian's age and gender, finally yields a comprehensive confidence score.
And S4, detecting and calculating the similarity of the target pedestrian feature and all the pedestrian object features in the target image, and then sequencing and searching the most similar pedestrian information and pedestrian images.
The similarity calculation usually adopts the cosine distance. Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals; compared with distance metrics, it emphasizes the difference of two vectors in direction rather than in distance or length. The formula is as follows:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
the smaller the calculated numerical value is, the higher the similarity is, and finally the final re-recognition system model is output in combination;
The method provided by the invention can in practice be embedded in an FPGA (field programmable gate array) and applied to systems requiring real-time pedestrian re-identification.
In the description herein, references to the description of the term "one embodiment," "another embodiment," or "first through xth embodiments," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, method steps, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A big data pedestrian image search algorithm based on combination of a PTGAN region gap and a deep neural network is characterized by comprising the following steps:
s1, constructing a Spark big data platform based on the MLbase machine learning library;
s2, building a deep learning neural network based on the combination of the PTGAN and multiple branches, training on the pedestrian image database, extracting the corresponding image features, and completing the pedestrian re-identification image database;
s3, transmitting the video file into a Spark big data platform, segmenting a video key frame, and extracting feature information of a target image based on a deep learning algorithm;
and S4, detecting and calculating the similarity between the target pedestrian feature and all pedestrian object features in the target image, and then ranking and retrieving the most similar pedestrian information and pedestrian images.
2. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 1, wherein the step S2 comprises the steps of:
s2.1, carrying out PTGAN processing on the common video image to obtain the image to be identified, wherein the image to be identified is an image in which the pedestrian foreground is unchanged while the background difference region has been migrated;
s2.2, performing multi-branch combined training on the image to be recognized, wherein the specific steps are as follows:
s2.2.1, inputting the image to be recognized into a training model and obtaining the feature vectors corresponding to a plurality of branches, specifically as follows: given an input processed pedestrian image, the RAM generates a set of feature vectors; specifically, a feature map M is computed by five shared convolutional layers, and M is then fed to four branches to generate different features, the four branches comprising a global branch, a BN branch, an attribute branch and a local region branch;
s2.2.2, local feature extraction, using the local region branch to generate region features, as follows: the local region branch divides the feature map M evenly from top to bottom into K overlapping local regions, the overlap being used to enhance the robustness of the learned features to possible misalignment or viewpoint changes; a pooling layer is embedded after each region and an FC layer is applied to generate a region feature from each of them; a classification task with pedestrian identity (ID) tags supervises the learning of each region feature;
s2.2.3 extracting attribute features, wherein the attribute branch takes the output of the first FC layer in the global branch as input, then the FC layer generates the attribute features, and finally the attribute features are learned in the attribute classification task;
s2.2.4, training the feature vector model: the front and back of the pedestrian are treated as two different classes for training, and the training processes of S2.2.1, S2.2.2 and S2.2.3 above are repeated to form the feature vectors; each branch of the RAM is trained by a separate classification task with softmax loss, and model training proceeds by successively adding the global branch, BN branch, attribute branch and local region branch until a feature vector model meeting the requirements is trained.
3. The big-data pedestrian image search algorithm based on combination of PTGAN region gaps and deep neural network as claimed in claim 2, wherein the loss function adopted in performing the PTGAN process in the step S2.1 is:
$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$
where $L_{Style}$ denotes the style loss, or region difference domain loss, $L_{ID}$ denotes the identity loss of the generated image, and $\lambda_1$ is the weight balancing the style loss against the identity loss.
4. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 3, wherein the concrete formula of $L_{Style}$ is:
$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$
where A and B are the two image domains processed by the GAN, $G$ is the A→B style mapping function, $\bar{G}$ is the B→A style mapping function, and $\lambda_2$ weights the cycle consistency term.
5. The big data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 3, characterized in that in step S2.1 the video image is further subjected to foreground segmentation by PSPNet to obtain a mask region, and the concrete formula of $L_{ID}$ is:
$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)} \left\| (G(a) - a) \odot M(a) \right\|_2 + \mathbb{E}_{b \sim p_{data}(b)} \left\| (\bar{G}(b) - b) \odot M(b) \right\|_2$$
where $G(a)$ is the pedestrian image transferred from image A, $\bar{G}(b)$ is the pedestrian image transferred from image B, $\mathbb{E}_{a \sim p_{data}(a)}$ denotes expectation over the data distribution of image A, $\mathbb{E}_{b \sim p_{data}(b)}$ over that of B, and $M(a)$ and $M(b)$ are the two segmented mask regions.
6. The big-data pedestrian image search algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 2, wherein in the step S2.2.1 the generated features are $f_c$, $f_b$, $f_a$, $f_{rt}$, $f_{rm}$ and $f_{rb}$, where $f_c$ comes from the global branch, $f_b$ from the BN branch, $f_a$ from the attribute branch, and $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the local region branches for the head, upper body and lower body of the pedestrian, respectively; the global and BN branches generate the global features $f_c$ and $f_b$ from the entire feature map; the BN branch adds a batch normalization operation to the global branch to learn complementary global features; the local region branch first divides the feature map into three overlapping regions, denoted top $R_t$, middle $R_m$ and bottom $R_b$, and then uses three sets of fully connected layers to generate the region features $f_{rt}$, $f_{rm}$ and $f_{rb}$ from the corresponding regions.
7. The big-data pedestrian image search algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 6, wherein the global branch extracts its feature as follows: it first pools the feature map M into a 6 × 6 × 512 tensor and then uses two fully connected layers to generate the feature $f_c$; $f_c$ is trained with the pedestrian identity ID in a classification task; and the BN branch extracts its feature as follows: a BN layer is embedded between the feature map M and the pooling layer to generate a new feature map $M_b$, and two FC layers then generate the feature $f_b$.
8. The big-data pedestrian image searching algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 7, wherein in the step S2.2.4, the overall objective function adopted by the RAM in the plurality of classification tasks is:
$$L(\Theta) = l_{conv} + \lambda_3 l_{BN} + \lambda_4 l_{re} + \lambda_5 l_{att}$$
where $\Theta$ denotes the parameters of the deep model; $l_{conv}$, $l_{BN}$, $l_{re}$ and $l_{att}$ denote the classification losses of the global, BN, local region and attribute branches, respectively; $\lambda_3$, $\lambda_4$, $\lambda_5$ are the weights of the corresponding losses; and $l_{re}$ consists of three equally weighted classification losses for the different regions.
9. The big-data pedestrian image searching algorithm based on the combination of PTGAN region gaps and deep neural network as claimed in claim 1, wherein the step of segmenting the video key frames in the step S3 comprises:
distinguishing I-frame and P-frame data in the video file, and extracting the key frame information of the video;
judging when a moving target appears or disappears, and obtaining the accurate segmentation time and file position accordingly;
intelligently segmenting and outputting the video file according to the basis of intelligent video segmentation, which comprises the following constraints: 1) the video segmentation points are video key frames; 2) the time points at which a moving object begins to appear or just disappears in the video; 3) the clip length cannot be less than 30 seconds and cannot exceed 6 minutes.
10. The big-data pedestrian image searching algorithm based on the combination of the PTGAN region gap and the deep neural network as claimed in claim 1, wherein in the step S4, the similarity calculation formula is as follows:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\sqrt{\sum_{i=1}^{n} y_i^2}}$$
CN201910751899.3A 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network Pending CN110688512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910751899.3A CN110688512A (en) 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network


Publications (1)

Publication Number Publication Date
CN110688512A true CN110688512A (en) 2020-01-14

Family

ID=69108256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910751899.3A Pending CN110688512A (en) 2019-08-15 2019-08-15 Pedestrian image search algorithm based on PTGAN region gap and depth neural network

Country Status (1)

Country Link
CN (1) CN110688512A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256439A (en) * 2017-12-26 2018-07-06 北京大学 A kind of pedestrian image generation method and system based on cycle production confrontation network
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110110755A (en) * 2019-04-04 2019-08-09 长沙千视通智能科技有限公司 Based on the pedestrian of PTGAN Regional disparity and multiple branches weight recognition detection algorithm and device
CN110096982A (en) * 2019-04-22 2019-08-06 长沙千视通智能科技有限公司 A kind of video frequency vehicle big data searching method based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111565303A (en) * 2020-05-29 2020-08-21 深圳市易链信息技术有限公司 Video monitoring method, system and readable storage medium based on fog calculation and deep learning
CN111565303B (en) * 2020-05-29 2021-12-14 广东省电子口岸管理有限公司 Video monitoring method, system and readable storage medium based on fog calculation and deep learning
CN112733920A (en) * 2020-12-31 2021-04-30 中国地质调查局成都地质调查中心 Image identification method and system based on deep learning
CN113239782A (en) * 2021-05-11 2021-08-10 广西科学院 Pedestrian re-identification system and method integrating multi-scale GAN and label learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200114)