CN111126250A - Pedestrian re-identification method and device based on PTGAN - Google Patents

Pedestrian re-identification method and device based on PTGAN

Info

Publication number
CN111126250A
Authority
CN
China
Prior art keywords
image
pedestrian
ptgan
camera
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911327963.1A
Other languages
Chinese (zh)
Inventor
张斯尧
王思远
谢喜林
张�诚
黄晋
文戎
田磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Qianshitong Intelligent Technology Co ltd
Original Assignee
Changsha Qianshitong Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Qianshitong Intelligent Technology Co ltd filed Critical Changsha Qianshitong Intelligent Technology Co ltd
Priority to CN201911327963.1A
Publication of CN111126250A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention discloses a pedestrian re-identification method and device based on PTGAN, wherein the method comprises the following steps: acquiring a first image which is acquired by a first camera and contains a target object; inputting the first image into a trained PTGAN model, which migrates the background difference region while keeping the pedestrian foreground unchanged, to obtain a second image with the same style as images shot by a second camera; extracting the pedestrian features of the second image; and calculating, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and acquiring the pedestrian image with the highest similarity to the target object according to the similarity. The invention solves the problems of difficult cross-camera retrieval and low re-identification accuracy in prior-art pedestrian re-identification methods.

Description

Pedestrian re-identification method and device based on PTGAN
Technical Field
The invention relates to the technical field of computer vision and smart cities, in particular to a pedestrian re-identification method and device based on PTGAN, terminal equipment and a computer readable medium.
Background
With the continuous development of artificial intelligence, computer vision and hardware technology, video image processing technology has been widely applied to intelligent city systems.
Pedestrian re-identification (Person Re-identification, abbreviated Re-ID) is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence; it is widely regarded as a sub-problem of image retrieval. Given a monitored pedestrian image, the task is to retrieve images of that pedestrian across devices. Because of the differences between camera devices, and because the pedestrian body is both rigid and flexible, appearance is easily affected by clothing, scale, occlusion, pose, viewing angle and the like, making pedestrian re-identification a research-worthy and highly challenging hot topic in the field of computer vision.
Currently, although the detection capability of pedestrian re-identification has improved significantly, many challenging problems remain unsolved in practice. For example, cross-camera retrieval is often difficult: because of domain differences, different cameras have different styles (background, lighting conditions, camera parameters, etc.), so a pedestrian picture captured by camera A is difficult to retrieve among the pictures from camera B, and re-identification accuracy is low.
Disclosure of Invention
In view of the above, the present invention provides a pedestrian re-identification method, apparatus, terminal device and computer-readable medium based on PTGAN, which improve the accuracy of pedestrian re-identification across different cameras and solve the prior-art problems of difficult cross-camera retrieval and low re-identification accuracy.
The first aspect of the embodiment of the invention provides a pedestrian re-identification method based on PTGAN, which comprises the following steps:
acquiring a first image which is acquired by a first camera and contains a target object;
inputting the first image into a trained PTGAN model, and realizing the migration of a background difference region on the premise of keeping the foreground of the pedestrian unchanged to obtain a second image with the same style as the image shot by the second camera;
extracting pedestrian features of the second image;
and calculating, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and acquiring the pedestrian image with the highest similarity to the target object according to the similarity.
Further, before inputting the first image into the trained PTGAN model, the method further comprises:
constructing a network model based on PTGAN;
taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
wherein the loss function of the PTGAN-based network model is expressed as:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss (domain difference loss), $L_{ID}$ represents the identity loss of the generated image, and $\lambda_1$ is the weight balancing $L_{Style}$ and $L_{ID}$.
Further, after constructing the PTGAN-based network model, the method further comprises:
performing foreground segmentation on the first video image sequence by using PSPNet to obtain a mask layer region, wherein the identity loss $L_{ID}$ is expressed as:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of the video images acquired by the first camera, $p_{data}(b)$ is the data distribution of the video images acquired by the second camera, and $M(a)$ and $M(b)$ are the two segmented mask regions.
Further, extracting the pedestrian features of the second image includes:
extracting appearance features of the second image based on the trained AlexNet model; and
extracting facial features of the second image based on the trained VGG-16 model.
A second aspect of an embodiment of the present invention provides a pedestrian re-identification apparatus based on PTGAN, including:
the acquisition module is used for acquiring a first image which is acquired by the first camera and contains a target object;
the PTGAN module is used for inputting the first image into a trained PTGAN model, and realizing the migration of a background difference region on the premise of keeping the foreground of a pedestrian unchanged to obtain a second image with the same style as the image shot by the second camera;
the feature extraction module is used for extracting the pedestrian features of the second image;
and the identification module is used for calculating, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and for acquiring the pedestrian image with the highest similarity to the target object according to the similarity.
Further, the apparatus further comprises:
the PTGAN construction module is used for constructing a network model based on the PTGAN;
the PTGAN training module is used for taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and for converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
wherein the loss function of the PTGAN-based network model is expressed as:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss (domain difference loss), $L_{ID}$ represents the identity loss of the generated image, and $\lambda_1$ is the weight balancing $L_{Style}$ and $L_{ID}$.
Further, the apparatus further comprises:
a foreground segmentation module for performing foreground segmentation on the first video image sequence using PSPNet to obtain a mask layer region, wherein the identity loss $L_{ID}$ is expressed as:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of the video images acquired by the first camera, $p_{data}(b)$ is the data distribution of the video images acquired by the second camera, and $M(a)$ and $M(b)$ are the two segmented mask regions.
Further, the feature extraction module comprises:
the appearance characteristic module is used for extracting appearance characteristics of the second image based on the trained AlexNet model;
and the facial feature module is used for extracting the facial features of the second image based on the trained VGG-16 model.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the PTGAN-based pedestrian re-identification method when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable medium, which stores a computer program that, when executed by a processor, implements the steps of the above-mentioned PTGAN-based pedestrian re-identification method.
In the embodiment of the invention, the first image acquired by the first camera is input into the trained PTGAN model, so that the migration of the background difference area is realized on the premise that the foreground of the pedestrian is not changed, and the second image with the same style as the image shot by the second camera can be obtained, thereby improving the accuracy of pedestrian re-identification under different cameras, and solving the problem that the image shot in one camera is difficult to search in the other camera due to the field difference or the different styles of the cameras in the prior art.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a flowchart of a pedestrian re-identification method based on PTGAN according to an embodiment of the present invention;
FIG. 2 is a comparison graph of real-time conversion effects of different pedestrian re-identification methods provided by the embodiment of the invention;
fig. 3 is a schematic structural diagram of a pedestrian re-identification apparatus based on PTGAN according to an embodiment of the present invention;
FIG. 4 is a detailed structure diagram of a feature extraction module provided in an embodiment of the present invention;
fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a flowchart illustrating a pedestrian re-identification method based on PTGAN according to an embodiment of the present invention. As shown in fig. 1, the PTGAN-based pedestrian re-identification method of the present embodiment includes the following steps:
step S102, acquiring a first image which is acquired by a first camera and contains a target object;
step S104, inputting the first image into a trained PTGAN model, and realizing the migration of the background difference region on the premise that the foreground of the pedestrian is unchanged, to obtain a second image with the same style as the image shot by the second camera;
further, before inputting the first image into the trained PTGAN model, the method further comprises:
constructing a network model based on PTGAN;
taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
PTGAN (Person Transfer GAN) is a generative adversarial network aimed at the person re-identification (Re-ID) problem. In the present invention, the most distinctive feature of PTGAN is that it migrates the background region while keeping the pedestrian foreground unchanged as far as possible. The loss function of the PTGAN network consists of two parts:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss, or domain difference loss, which measures whether the generated image resembles the style of the new dataset; $L_{ID}$ represents the identity loss of the generated image, which verifies that the generated image depicts the same person as the original image; and $\lambda_1$ is a weight balancing the two losses. These two losses are defined as follows:
First, the style loss $L_{Style}$ is given by:

$$L_{Style} = L_{GAN}(G, D_B, A, B) + L_{GAN}(\bar{G}, D_A, B, A) + \lambda_2 L_{Cyc}(G, \bar{G})$$

where $L_{GAN}$ represents the standard adversarial loss, $L_{Cyc}$ represents the cycle consistency loss, $A$ and $B$ are the two image domains processed by the GAN, $G$ is the style mapping function from $A$ to $B$, $\bar{G}$ is the style mapping function from $B$ to $A$, and $\lambda_2$ is the weight of the cycle consistency loss.
These terms are the standard losses of PTGAN; they ensure that the generated picture lies in the same domain as the target dataset.
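For illustration, a minimal sketch of this style loss is given below, assuming a PyTorch implementation in which G, G_bar, D_A and D_B are CycleGAN-style generator and discriminator modules; the least-squares adversarial form and the default value of λ2 are illustrative assumptions rather than limitations of the present embodiment.

```python
# A minimal sketch of the style loss L_Style, assuming PyTorch modules
# G (A -> B), G_bar (B -> A) and discriminators D_A, D_B; not the
# patent's reference implementation.
import torch
import torch.nn.functional as F

def style_loss(G, G_bar, D_A, D_B, real_a, real_b, lambda2=10.0):
    fake_b = G(real_a)        # transfer camera-1 style to camera-2 style
    fake_a = G_bar(real_b)    # transfer camera-2 style to camera-1 style
    # standard adversarial terms (least-squares variant shown here)
    pred_b, pred_a = D_B(fake_b), D_A(fake_a)
    adv = F.mse_loss(pred_b, torch.ones_like(pred_b)) \
        + F.mse_loss(pred_a, torch.ones_like(pred_a))
    # cycle-consistency term: mapping back should recover the input
    cyc = F.l1_loss(G_bar(fake_b), real_a) + F.l1_loss(G(fake_a), real_b)
    return adv + lambda2 * cyc
```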
Second, to ensure that the foreground does not change during image migration, foreground segmentation is first performed on the video image with PSPNet to obtain a mask region. Conventional generative adversarial networks such as CycleGAN are not designed for the Re-ID task and therefore do not need to keep the identity information of the foreground object unchanged; as a result, the foreground may be of poor quality (e.g., blurred) and, worse, the appearance of the pedestrian may change. To solve this problem, the present invention proposes the $L_{ID}$ loss. The foreground extracted by PSPNet serves as a mask, and the final identity loss is:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of domain $A$, $p_{data}(b)$ is the data distribution of domain $B$, and $M(a)$ and $M(b)$ are the two segmented foreground mask regions. This identity loss constrains the pedestrian foreground to remain as unchanged as possible during migration.
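A corresponding sketch of the masked identity loss and the combined objective follows; the tensor layout, the mask format (foreground values in [0, 1]) and the value of λ1 are again illustrative assumptions.

```python
# A minimal sketch of the identity loss L_ID and the combined PTGAN
# objective; the masks come from PSPNet foreground segmentation.
import torch

def identity_loss(real_a, fake_b, real_b, fake_a, mask_a, mask_b):
    # L2 distance between each image and its transferred version,
    # restricted to the pedestrian foreground by the masks
    loss_a = torch.norm((fake_b - real_a) * mask_a, p=2)  # G(a) vs a
    loss_b = torch.norm((fake_a - real_b) * mask_b, p=2)  # G_bar(b) vs b
    return loss_a + loss_b

def ptgan_loss(l_style, l_id, lambda1=10.0):
    # L_PTGAN = L_Style + lambda1 * L_ID
    return l_style + lambda1 * l_id
```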
Fig. 2 shows a comparison of the real-time conversion effects of different pedestrian re-identification methods: the first row shows the pictures to be converted, and the fourth row shows the PTGAN conversion results. It can be seen that the image quality generated by PTGAN is higher than that of the Cycle-GAN conversion results shown in the third row. For example, the appearance of the person remains the same while the style is effectively transferred to another camera: shadows, road markings and backgrounds are generated automatically, similar to what the other camera would capture. Meanwhile, PTGAN handles the noisy segmentation results produced by PSPNet well. Compared with the traditional cycle generative adversarial network (CycleGAN), the proposed algorithm visibly preserves the identity information of the pedestrian.
And step S106, extracting the pedestrian feature of the second image.
Appearance-based attributes are first extracted from the human detection; they capture the traits and characteristics of an individual in the form of appearance. The convolutional neural network (CNN) is the common choice for this image representation. The present invention uses an AlexNet model pre-trained on ImageNet as the appearance feature extractor: the top output layer is removed and the activation of the last fully connected layer is used as the feature (length 4096). The AlexNet architecture comprises five convolutional layers, three fully connected layers, and three max-pooling layers immediately following the first, second, and fifth convolutional layers. The first convolutional layer has 96 filters of size 11 × 11, the second has 256 filters of size 5 × 5, and the third, fourth and fifth layers, which are connected to each other without any intervening pooling layers, have 384, 384 and 256 filters of size 3 × 3, respectively. Each fully connected layer learns a nonlinear function

$$X_{i+1} = f(W_i X_i + b_i)$$

where $W_i$ and $b_i$ are the weights and biases applied to the input data $X_i$, and $f$ is a rectified linear unit (ReLU) that activates the hidden layer.
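For illustration, the following is a minimal sketch of this appearance-feature extraction, assuming torchvision's ImageNet-pretrained AlexNet; dropping the final classification layer leaves the 4096-dimensional activation of the last fully connected layer as the descriptor.

```python
# A minimal sketch of appearance-feature extraction with a pretrained
# AlexNet (assumed torchvision); not the patent's reference code.
import torch
from torchvision import models

alexnet = models.alexnet(pretrained=True).eval()
# remove the top output layer; the last remaining FC activation is 4096-d
alexnet.classifier = torch.nn.Sequential(
    *list(alexnet.classifier.children())[:-1])

def appearance_features(batch):
    """batch: (N, 3, 224, 224) ImageNet-normalized pedestrian crops."""
    with torch.no_grad():
        return alexnet(batch)  # (N, 4096) appearance descriptors
```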
Further, facial features are extracted. Facial biometrics is an established technology for identity recognition and verification, and facial morphology can be used for re-identification because it is essentially a non-contact biometric that can be extracted remotely. The invention extracts facial features from the face bounding box using a VGG-16 model pre-trained on ImageNet. This is done by removing the top output layer and using the activation of the last fully connected layer as the facial feature (length 4096). VGG-16 is a convolutional neural network consisting of 13 convolutional layers and 3 fully connected layers, with filters of size 3 × 3. Pooling is applied between convolutional layers over a 2 × 2 pixel window with a stride of 2. Subtracting the mean of the training set is used as a preprocessing step.
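The facial features can be sketched with the same pattern, again assuming torchvision's pretrained VGG-16 and face crops supplied by an external face detector:

```python
# A minimal sketch of facial-feature extraction with a pretrained VGG-16
# (assumed torchvision); face bounding boxes are detected upstream.
import torch
from torchvision import models

vgg16 = models.vgg16(pretrained=True).eval()
# drop the final classification layer to expose the 4096-d FC activation
vgg16.classifier = torch.nn.Sequential(
    *list(vgg16.classifier.children())[:-1])

def facial_features(face_batch):
    """face_batch: (N, 3, 224, 224) mean-subtracted face crops."""
    with torch.no_grad():
        return vgg16(face_batch)  # (N, 4096) facial descriptors
```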
And step S108, calculating the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of the pedestrian images shot by the second camera according to the cosine distance, and acquiring the pedestrian image with the highest similarity to the target object according to the similarity.
The similarity is calculated using the cosine distance: cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. Compared with distance measures, cosine similarity emphasizes the difference of two vectors in direction rather than in distance or length. The formula is as follows:

$$\cos(\theta) = \frac{X \cdot Y}{\|X\|\,\|Y\|}$$

where $X$ denotes the pedestrian feature vector extracted from the second image, and $Y$ denotes a pedestrian image feature vector shot by the second camera.
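A minimal sketch of this matching step is given below; the function and variable names are illustrative assumptions.

```python
# A minimal sketch of cosine-similarity matching between the query
# feature and the gallery features from the second camera.
import numpy as np

def cosine_similarity(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def best_match(query_feature, gallery_features):
    """Return the index and similarity of the gallery image most
    similar to the query pedestrian feature."""
    sims = [cosine_similarity(query_feature, g) for g in gallery_features]
    best = int(np.argmax(sims))
    return best, sims[best]
```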
In the embodiment of the invention, the first image acquired by the first camera is input into the trained PTGAN model, so that the migration of the background difference area is realized on the premise that the foreground of the pedestrian is not changed, and the second image with the same style as the image shot by the second camera can be obtained, thereby improving the accuracy of pedestrian re-identification under different cameras, and solving the problem that the image shot in one camera is difficult to search in the other camera due to the field difference or the different styles of the cameras in the prior art.
Referring to fig. 3, fig. 3 is a block diagram illustrating a pedestrian re-identification apparatus based on PTGAN according to an embodiment of the present invention. As shown in fig. 3, the PTGAN-based pedestrian re-identification apparatus 20 of the present embodiment includes an acquisition module 202, a PTGAN module 204, a feature extraction module 206, and an identification module 208, which are respectively configured to perform the specific methods of S102, S104, S106 and S108 in fig. 1; details can be found in the related description of fig. 1 and are only briefly summarized here:
the acquisition module 202 is configured to acquire a first image which includes a target object and is acquired by a first camera;
the PTGAN module 204 is configured to input the first image into a trained PTGAN model, and obtain a second image with the same style as an image shot by the second camera by implementing migration of a background difference region on the premise that a foreground of a pedestrian is not changed;
a feature extraction module 206, configured to extract pedestrian features of the second image;
the identification module 208 is configured to calculate, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and to obtain the pedestrian image with the highest similarity to the target object according to the similarity.
Further, the PTGAN-based pedestrian re-recognition apparatus further includes:
the PTGAN construction module is used for constructing a network model based on the PTGAN;
the PTGAN training module is used for taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and for converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
wherein the loss function of the PTGAN-based network model is expressed as:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss (domain difference loss), $L_{ID}$ represents the identity loss of the generated image, and $\lambda_1$ is the weight balancing $L_{Style}$ and $L_{ID}$.
Further, the PTGAN-based pedestrian re-recognition apparatus further includes:
a foreground segmentation module for performing foreground segmentation on the first video image sequence using PSPNet to obtain a mask layer region, wherein the identity loss $L_{ID}$ is expressed as:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of the video images acquired by the first camera, $p_{data}(b)$ is the data distribution of the video images acquired by the second camera, and $M(a)$ and $M(b)$ are the two segmented mask regions.
Further, as can be seen in fig. 4, the feature extraction module 206 includes:
an appearance feature module 2061, configured to extract an appearance feature of the second image based on the trained AlexNet model;
a facial feature module 2062, configured to extract facial features of the second image based on the trained VGG-16 model.
In the embodiment of the invention, the first image acquired by the first camera is input into the trained PTGAN model through the PTGAN module 204, the migration of the background difference region is realized on the premise that the foreground of the pedestrian is not changed, and the second image with the same style as the image shot by the second camera can be obtained, so that the accuracy of pedestrian re-identification under different cameras is improved, and the problem that the image shot in one camera is difficult to search in the other camera due to the field difference or the different styles of the cameras in the prior art is solved.
Fig. 5 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 5, the terminal device 10 of this embodiment includes: a processor 100, a memory 101 and a computer program 102 stored in the memory 101 and executable on the processor 100, for example a program for pedestrian re-identification based on PTGAN. The processor 100, when executing the computer program 102, implements the steps in the above-described method embodiments, e.g., steps S102, S104, S106 and S108 shown in fig. 1. Alternatively, the processor 100, when executing the computer program 102, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the obtaining module 202, the PTGAN module 204, the feature extraction module 206 and the identification module 208 shown in fig. 3.
Illustratively, the computer program 102 may be partitioned into one or more modules/units that are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 102 in the terminal device 10. For example, the computer program 102 may be partitioned into an acquisition module 202, a PTGAN module 204, a feature extraction module 206, and a recognition module 208 (modules in a virtual device), whose specific functions are as follows:
the acquisition module 202 is configured to acquire a first image which includes a target object and is acquired by a first camera;
the PTGAN module 204 is configured to input the first image into a trained PTGAN model, and obtain a second image with the same style as an image shot by the second camera by implementing migration of a background difference region on the premise that a foreground of a pedestrian is not changed;
a feature extraction module 206, configured to extract pedestrian features of the second image;
the identification module 208 is configured to calculate, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and to obtain the pedestrian image with the highest similarity to the target object according to the similarity.
The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. Terminal device 10 may include, but is not limited to, a processor 100, a memory 101. Those skilled in the art will appreciate that fig. 5 is merely an example of a terminal device 10 and does not constitute a limitation of terminal device 10 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 100 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 10. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device. The memory 101 is used for storing the computer program and other programs and data required by the terminal device 10. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A pedestrian re-identification method based on PTGAN is characterized by comprising the following steps:
acquiring a first image which is acquired by a first camera and contains a target object;
inputting the first image into a trained PTGAN model, and realizing the migration of a background difference region on the premise of keeping the foreground of the pedestrian unchanged to obtain a second image with the same style as the image shot by the second camera;
extracting pedestrian features of the second image;
and calculating, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and acquiring the pedestrian image with the highest similarity to the target object according to the similarity.
2. The PTGAN-based pedestrian re-recognition method of claim 1, wherein prior to inputting the first image into a trained PTGAN model, the method further comprises:
constructing a network model based on PTGAN;
taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
wherein the loss function of the PTGAN-based network model is expressed as:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss (domain difference loss), $L_{ID}$ represents the identity loss of the generated image, and $\lambda_1$ is the weight balancing $L_{Style}$ and $L_{ID}$.
3. The PTGAN-based pedestrian re-identification method according to claim 2, wherein after constructing the PTGAN-based network model, the method further comprises:
performing foreground segmentation on the first video image sequence by using PSPNet to obtain a mask layer region, wherein the identity loss $L_{ID}$ is expressed as:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of the video images acquired by the first camera, $p_{data}(b)$ is the data distribution of the video images acquired by the second camera, and $M(a)$ and $M(b)$ are the two segmented mask regions.
4. The PTGAN-based pedestrian re-identification method according to claim 3, wherein extracting the pedestrian features of the second image comprises:
extracting appearance features of the second image based on the trained AlexNet model; and
extracting facial features of the second image based on the trained VGG-16 model.
5. A pedestrian re-identification device based on PTGAN is characterized by comprising:
the acquisition module is used for acquiring a first image which is acquired by the first camera and contains a target object;
the PTGAN module is used for inputting the first image into a trained PTGAN model, and realizing the migration of a background difference region on the premise of keeping the foreground of a pedestrian unchanged to obtain a second image with the same style as the image shot by the second camera;
the feature extraction module is used for extracting the pedestrian features of the second image;
and the identification module is used for calculating, according to the cosine distance, the similarity between the pedestrian feature vector extracted from the second image and the feature vectors of pedestrian images shot by the second camera, and for acquiring the pedestrian image with the highest similarity to the target object according to the similarity.
6. The PTGAN-based pedestrian re-identification device according to claim 5, further comprising:
the PTGAN construction module is used for constructing a network model based on the PTGAN;
the PTGAN training module is used for taking the video images acquired by the first camera and the video images acquired by the second camera as the training set for training the PTGAN-based network model, and for converting, through training and iterative feedback, the video images acquired by the first camera into images with the same style as the video images acquired by the second camera;
wherein the loss function of the PTGAN-based network model is expressed as:

$$L_{PTGAN} = L_{Style} + \lambda_1 L_{ID}$$

where $L_{Style}$ represents the style loss (domain difference loss), $L_{ID}$ represents the identity loss of the generated image, and $\lambda_1$ is the weight balancing $L_{Style}$ and $L_{ID}$.
7. The PTGAN-based pedestrian re-identification device according to claim 6, further comprising:
a foreground segmentation module for performing foreground segmentation on the first video image sequence using PSPNet to obtain a mask layer region, wherein the identity loss $L_{ID}$ is expressed as:

$$L_{ID} = \mathbb{E}_{a \sim p_{data}(a)}\left[\left\|(G(a)-a) \odot M(a)\right\|_2\right] + \mathbb{E}_{b \sim p_{data}(b)}\left[\left\|(\bar{G}(b)-b) \odot M(b)\right\|_2\right]$$

where $G(a)$ is the transferred pedestrian image for image $a$, $\bar{G}(b)$ is the transferred pedestrian image for image $b$, $p_{data}(a)$ is the data distribution of the video images acquired by the first camera, $p_{data}(b)$ is the data distribution of the video images acquired by the second camera, and $M(a)$ and $M(b)$ are the two segmented mask regions.
8. The PTGAN-based pedestrian re-recognition device according to claim 7, wherein the feature extraction module comprises:
the appearance characteristic module is used for extracting appearance characteristics of the second image based on the trained AlexNet model;
and the facial feature module is used for extracting the facial features of the second image based on the trained VGG-16 model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the computer program.
10. A computer-readable medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911327963.1A 2019-12-20 2019-12-20 Pedestrian re-identification method and device based on PTGAN Pending CN111126250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911327963.1A CN111126250A (en) 2019-12-20 2019-12-20 Pedestrian re-identification method and device based on PTGAN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911327963.1A CN111126250A (en) 2019-12-20 2019-12-20 Pedestrian re-identification method and device based on PTGAN

Publications (1)

Publication Number Publication Date
CN111126250A true CN111126250A (en) 2020-05-08

Family

ID=70500742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911327963.1A Pending CN111126250A (en) 2019-12-20 2019-12-20 Pedestrian re-identification method and device based on PTGAN

Country Status (1)

Country Link
CN (1) CN111126250A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256439A (en) * 2017-12-26 2018-07-06 北京大学 A kind of pedestrian image generation method and system based on cycle production confrontation network
CN109886251A (en) * 2019-03-11 2019-06-14 南京邮电大学 A kind of recognition methods again of pedestrian end to end guiding confrontation study based on posture
CN110110755A (en) * 2019-04-04 2019-08-09 长沙千视通智能科技有限公司 Based on the pedestrian of PTGAN Regional disparity and multiple branches weight recognition detection algorithm and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016402A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Unsupervised learning-based pedestrian re-identification field self-adaption method and device
CN112016402B (en) * 2020-08-04 2024-05-17 杰创智能科技股份有限公司 Self-adaptive method and device for pedestrian re-recognition field based on unsupervised learning
CN113221807A (en) * 2021-05-26 2021-08-06 新疆爱华盈通信息技术有限公司 Pedestrian re-identification method and system with multiple cameras
CN114218423A (en) * 2022-02-21 2022-03-22 广东联邦家私集团有限公司 5G-based non-labeling solid wood board identity digitalization method, device and system

Similar Documents

Publication Publication Date Title
Zhao et al. Multi-focus image fusion with a natural enhancement via a joint multi-level deeply supervised convolutional neural network
US11830230B2 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN109543548A (en) A kind of face identification method, device and storage medium
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN112530019B (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN111079764A (en) Low-illumination license plate image recognition method and device based on deep learning
CN111291612A (en) Pedestrian re-identification method and device based on multi-person multi-camera tracking
CN110825900A (en) Training method of feature reconstruction layer, reconstruction method of image features and related device
CN112651380A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112528866A (en) Cross-modal face recognition method, device, equipment and storage medium
CN110222718A (en) The method and device of image procossing
CN111080670A (en) Image extraction method, device, equipment and storage medium
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN112101195A (en) Crowd density estimation method and device, computer equipment and storage medium
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN111814682A (en) Face living body detection method and device
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN
Liu et al. Iris recognition in visible spectrum based on multi-layer analogous convolution and collaborative representation
CN111626212B (en) Method and device for identifying object in picture, storage medium and electronic device
CN111353325A (en) Key point detection model training method and device
CN111104911A (en) Pedestrian re-identification method and device based on big data training
KR20180092453A (en) Face recognition method Using convolutional neural network and stereo image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination