CN110502975B

CN110502975B - Batch processing system for pedestrian re-identification

Info

Publication number: CN110502975B
Application number: CN201910616631.9A
Authority: CN
Inventors: Guo Lingling (郭玲玲)
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-07-09
Filing date: 2019-07-09
Publication date: 2023-06-23
Anticipated expiration: 2039-07-09
Also published as: CN110502975A

Abstract

The invention discloses a batch processing system for pedestrian re-identification. It relates to the technical field of data processing and aims to solve the low speed of batch pedestrian re-identification in the prior art. The system mainly comprises a central processing unit (CPU) and a graphics processing unit (GPU). The CPU reads the image data of the images to be processed into memory. The CPU stitches the image data into a matrix to generate batch image data and stores the batch image data in memory. The GPU reads the batch image data into video memory, inputs it into a residual neural network model, extracts the batch image features of the batch image data, and stores the batch image features in video memory. The GPU then reads the batch image features back into memory, and the CPU decomposes them into image feature vectors in one-to-one correspondence with the images to be processed, so that those images can be identified. The system is mainly applied to pedestrian re-identification.

Description

Batch processing system for pedestrian re-identification

Technical Field

The invention relates to the technical field of data processing, in particular to a batch processing system for pedestrian re-identification.

Background

Pedestrian re-identification is a technique for determining whether a particular pedestrian is present in an image or video sequence. The currently popular approach is to train a deep learning model that takes a pedestrian image as input and outputs a feature vector, and then to judge whether two images show the same person from the similarity of their feature vectors. When a terminal or a server performs pedestrian re-identification, the process generally runs as follows: the CPU reads the image data of each picture into memory; the GPU driver copies that image data from memory into video memory; the GPU computes on the image data in video memory and stores the resulting picture feature data in video memory; the GPU driver copies the picture feature data from video memory back into memory; and finally the CPU obtains the picture feature data from memory.

Before the GPU can compute, each image must be moved from memory to video memory. The time $T$ consumed by one data exchange between memory and video memory consists of two parts: the time $T_{\text{data}}$ of the data exchange itself and an additional preparation time $T_{\text{ext}}$, i.e. $T = T_{\text{data}} + T_{\text{ext}}$. For a data volume $N$, $T_{\text{data}} = O(N)$, whereas $T_{\text{ext}}$ does not depend on $N$ but only on the number of exchanges, so a single exchange costs $T_{\text{ext}} = O(1)$. If a data volume $N$ is transferred in $n$ separate exchanges, then $T_{\text{data}} = O(N)$ and $T_{\text{ext}} = O(n)$, so the total time is $T = O(N) + O(n)$. Therefore, for a fixed data volume $N$, reducing the number of transfers $n$ reduces the total time $T$.
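The scaling argument above can be made concrete with a toy cost model. This is only a sketch: the per-unit transfer cost and the fixed per-transfer overhead are hypothetical constants, not measurements, and the function name `transfer_time` is illustrative.

```python
# Toy cost model of memory-to-video-memory transfers: T = T_data + T_ext.
# T_data grows with the total data volume N; T_ext is a fixed overhead paid
# once per transfer, so it grows with the number of transfers n.
# The constants below are hypothetical, chosen only for illustration.

def transfer_time(total_units: int, num_transfers: int,
                  time_per_unit: float = 1.0,
                  overhead_per_transfer: float = 50.0) -> float:
    """Modeled time for moving `total_units` of data in `num_transfers` copies."""
    if num_transfers < 1:
        raise ValueError("at least one transfer is required")
    t_data = total_units * time_per_unit            # O(N): unaffected by how the data is split
    t_ext = num_transfers * overhead_per_transfer   # O(n): one fixed overhead per transfer
    return t_data + t_ext


if __name__ == "__main__":
    images, units_per_image = 1000, 10
    total = images * units_per_image
    one_by_one = transfer_time(total, num_transfers=images)  # one copy per image
    batched = transfer_time(total, num_transfers=1)          # one stitched copy
    print(one_by_one, batched)  # 60000.0 10050.0
```

With the data volume held fixed, only the overhead term changes, which is why stitching many images into one transfer pays off.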

Because image data must be moved from memory into video memory, which the GPU can access directly, before any computation, the move itself consumes extra time. When many pedestrian images are to be processed, this wasted time becomes large, the computing capability of the GPU cannot be fully utilized, and batch pedestrian re-identification is slow.

Disclosure of Invention

In view of the above, the present invention provides a batch processing system for pedestrian re-identification, mainly aimed at solving the slow speed of batch pedestrian re-identification in the prior art.

According to one aspect of the present invention, a batch processing system for pedestrian re-identification is provided, including a central processing unit (CPU) and a graphics processing unit (GPU);

the CPU reads the image data of the images to be processed into memory;

the CPU stitches the image data into a matrix to generate batch image data and stores the batch image data in memory;

the GPU reads the batch image data into video memory;

the GPU inputs the batch image data into a residual neural network model, extracts the batch image features of the batch image data, and stores the batch image features in video memory;

the GPU reads the batch image features back into memory;

the CPU decomposes the batch image features into image feature vectors in one-to-one correspondence with the images to be processed, so that the images to be processed can be identified.

According to another aspect of the present invention, a storage medium is provided, storing at least one executable instruction that causes a processor to perform the operations of the batch processing system for pedestrian re-identification described above.

According to still another aspect of the present invention, a computer device is provided, including a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory stores at least one executable instruction that causes the processor to perform the operations of the batch processing system for pedestrian re-identification described above.

The technical solution provided by the embodiments of the invention has at least the following advantages:

The invention provides a batch processing system for pedestrian re-identification comprising a central processing unit (CPU) and a graphics processing unit (GPU). The CPU first reads the image data of the images to be processed into memory and stitches it into a matrix to generate batch image data. The GPU then reads the batch image data into video memory, inputs it into a residual neural network model, extracts the batch image features and stores them in video memory, and afterwards reads the batch image features back into memory. Finally, the CPU decomposes the batch image features into image feature vectors in one-to-one correspondence with the images to be processed, so that those images can be identified. Compared with the prior art, stitching the data of multiple images into one matrix allows that data to be read into video memory in a single transfer and the image features of all those images to be extracted in a single pass. Batch processing of pedestrian re-identification images therefore saves transfer time and increases re-identification speed.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention become more readily apparent, specific embodiments of the invention are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:

FIG. 1 shows a flow chart of a batch processing system for pedestrian re-identification provided by an embodiment of the invention;

FIG. 2 illustrates another flow chart of a batch processing system for pedestrian re-identification provided by an embodiment of the invention;

FIG. 3 shows a schematic structural diagram of a computer device according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in FIG. 1, an embodiment of the invention provides a batch processing system for pedestrian re-identification comprising a central processing unit (CPU) and a graphics processing unit (GPU).

The CPU is a very-large-scale integrated circuit that serves as the computing and control core of an electronic device and is mainly used for data processing. The GPU is a microprocessor dedicated to image computation. Because pedestrian re-identification is essentially the recognition of image data, both the CPU and the GPU are used to process the image data.

101. The CPU reads the image data of the images to be processed into memory.

The images to be processed are all the images on which pedestrian re-identification is to be performed. Reading their image data into memory means that the CPU issues an instruction, the image data and its storage location on the hard disk are located according to that instruction, and the image data is then read from the hard disk and stored in memory.

102. The CPU stitches the image data into a matrix to generate batch image data and stores the batch image data in memory.

Before matrix stitching, the system checks whether all images to be processed have the same size. If they do, they are stitched directly; if not, they are first resized to a common size and then stitched. The maximum number of images that can be loaded at one time is calculated from the image size of the images to be processed and the video memory capacity. The images to be processed are then divided into one or more groups according to their number and this maximum, and the images in each group are stitched into a matrix to generate batch image data.
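The size check and grouping described above can be sketched in Python with NumPy. This is an illustration under assumptions: nearest-neighbour resizing stands in for whatever resizing routine a real system would use, the target size is taken to be the model's input size, and the helper names `resize_nearest`, `normalize_sizes` and `split_into_groups` are hypothetical. The maximum number of images per group is passed in as a given value.

```python
import numpy as np


def resize_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize of a c x h x w array to c x height x width.

    Hypothetical stand-in for a real image-resizing routine.
    """
    _, h, w = image.shape
    rows = np.arange(height) * h // height   # source row for each target row
    cols = np.arange(width) * w // width     # source column for each target column
    return image[:, rows[:, None], cols[None, :]]


def normalize_sizes(images, height, width):
    """Leave images that already match height x width; resize only the others."""
    return [img if img.shape[1:] == (height, width) else resize_nearest(img, height, width)
            for img in images]


def split_into_groups(images, max_per_batch):
    """Divide the images, in reading order, into groups of at most max_per_batch."""
    if max_per_batch < 1:
        raise ValueError("video memory cannot hold even one image")
    return [images[i:i + max_per_batch] for i in range(0, len(images), max_per_batch)]


if __name__ == "__main__":
    imgs = [np.zeros((3, 256, 128), np.float32), np.zeros((3, 200, 100), np.float32)] * 5
    imgs = normalize_sizes(imgs, 256, 128)
    groups = split_into_groups(imgs, 4)
    print([len(g) for g in groups])  # [4, 4, 2]
```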

Matrix stitching is implemented with a matrix stitching function provided by the programming language. Assuming that the image data of a single image to be processed is a c×h×w matrix, matrix stitching combines the n images to be processed that belong to the same group into one matrix by stitching them along the first axis.
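As an illustration, assuming Python with NumPy (the document names neither a language nor a library) and reading "first axis" as the leading dimension, `np.stack(..., axis=0)` is one such stitching function: it turns n matrices of shape c×h×w into a single n×c×h×w matrix. The wrapper name `stitch_group` is hypothetical.

```python
import numpy as np


def stitch_group(images):
    """Stitch n equally sized c x h x w arrays into one n x c x h x w batch matrix."""
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError(f"all images must share one shape before stitching, got {shapes}")
    # np.stack adds a new leading axis; this is equivalent to
    # np.concatenate([img[None] for img in images], axis=0).
    return np.stack(images, axis=0)


if __name__ == "__main__":
    c, h, w = 3, 256, 128   # hypothetical per-image shape
    group = [np.full((c, h, w), k, dtype=np.float32) for k in range(4)]
    batch = stitch_group(group)
    print(batch.shape)      # (4, 3, 256, 128)
```

Because the image index becomes the leading axis, `batch[i]` is exactly the i-th image of the group, which is what makes the later one-to-one decomposition straightforward.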

103. The GPU reads the batch image data into video memory.

That is, the batch image data generated by stitching one group of images to be processed is stored in video memory. If the images to be processed are divided into several groups, several reads are required.

At the hardware level, data is generally exchanged between memory and video memory over the PCI-E bus; at the software level, application software generally calls the GPU driver to perform the exchange. The GPU excels at single-instruction, multiple-data (SIMD) computation: for example, performing 1000 multiplications simultaneously may take about twice as long as performing 10, rather than 100 times as long. The more identical operations the GPU performs at once, the more computation time is saved. Extracting the image features of the images to be processed in pedestrian re-identification involves a large number of additions and multiplications, so the more pedestrian images are processed at once, the more of these operations can run simultaneously and the more time is saved.

104. The GPU inputs the batch image data into a residual neural network model, extracts the batch image features of the batch image data, and stores the batch image features in video memory.

The residual neural network model is a neural network model with a residual structure; the residual structure accelerates training of the network and improves the accuracy of the model. After the batch image features are extracted, the model parameters of the residual neural network model are corrected through a Softmax loss function to improve the accuracy of batch feature extraction. The batch image features are the image features corresponding to the batch image data and contain the features of all the images to be processed that were stitched into the matrix.

105. The GPU reads the batch image features back into memory.

After the batch image features in video memory have been read into memory, if unprocessed batch image data remains, execution returns to step 103; that is, steps 103, 104 and 105 are repeated until the batch image features of all images to be processed have been extracted.
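The loop over steps 102 to 106 can be sketched end to end as follows. This is a CPU-only simulation under stated assumptions: the "device" copies are plain NumPy copies (in a framework such as PyTorch they would correspond to moving a tensor with `tensor.to("cuda")` and back with `tensor.cpu()`), and a fixed random linear projection stands in for the residual neural network. `make_stand_in_model`, `extract_features` and `FEATURE_DIM` are hypothetical names.

```python
import numpy as np

FEATURE_DIM = 8  # hypothetical feature vector dimension f


def make_stand_in_model(input_size, feature_dim, seed=0):
    """Hypothetical stand-in for the residual network: a fixed linear projection."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((input_size, feature_dim)).astype(np.float32)

    def model(batch):                              # batch: n x c x h x w
        flat = batch.reshape(batch.shape[0], -1)   # n x (c*h*w)
        return flat @ weights                      # n x f: one feature row per image
    return model


def extract_features(images, max_per_batch, model):
    """Repeat steps 102-105 per group; split each result into rows (step 106)."""
    features, host_to_device_copies = [], 0
    for start in range(0, len(images), max_per_batch):
        group = images[start:start + max_per_batch]
        batch_host = np.stack(group, axis=0)   # 102: stitch in host memory
        batch_device = batch_host.copy()       # 103: one host-to-device copy (simulated)
        host_to_device_copies += 1
        feats_device = model(batch_device)     # 104: batched feature extraction
        feats_host = feats_device.copy()       # 105: one device-to-host copy (simulated)
        features.extend(list(feats_host))      # 106: one row per image, in stitching order
    return features, host_to_device_copies


if __name__ == "__main__":
    imgs = [np.full((3, 4, 4), k, dtype=np.float32) for k in range(5)]
    feats, copies = extract_features(imgs, 2, make_stand_in_model(48, FEATURE_DIM))
    print(len(feats), copies)  # 5 3
```

Five images in groups of two need three transfers instead of five; the saving grows with the group size that video memory allows.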

106. The CPU decomposes the batch image features into image feature vectors in one-to-one correspondence with the images to be processed, so that the images to be processed can be identified.

In this step, the inverse of the matrix stitching operation is applied to the batch image features, yielding the image feature vectors in one-to-one correspondence with the images to be processed.

As shown in FIG. 2, an embodiment of the present invention provides another batch processing system for pedestrian re-identification, which includes a central processing unit (CPU) and a graphics processing unit (GPU).

201. The CPU reads the image data of the images to be processed into memory.

The memory capacity available for storing the images to be processed is limited; when the images to be processed far exceed the memory capacity, they must be loaded into memory in batches. Because the focus of this solution is how the GPU processes data once it has been read into video memory, and memory capacity is generally larger than video memory capacity, batching caused by insufficient memory is not analyzed in detail here.

202. The CPU stitches the image data into a matrix to generate batch image data and stores the batch image data in memory.

Before matrix stitching, the system checks whether all images to be processed have the same size. If they do, they are stitched directly; if not, they are first resized to a common size and then stitched. The specific stitching process comprises: acquiring batch identification parameters, which include the video memory capacity, image size, number of nodes and feature vector dimension; selecting the images to be stitched from the images to be processed according to the batch identification parameters; and stitching the selected images along the first axis to generate the batch image data.

The video memory capacity is determined by the hardware configuration and is a fixed value, such as 128 MB, 256 MB, 512 MB or 1024 MB. The image size is the size of the input matrix of the neural network model used to extract image features. The number of nodes is the number of nodes of that neural network model: the computation parameters of the nodes are shared across images during feature extraction, but the temporary storage that holds intermediate computation records cannot be shared, so the number of nodes is also a parameter for determining the images to be stitched. The feature vector dimension is the size of the output matrix. With n images to be processed, n times the temporary storage is needed, and each temporary storage area is proportional to the number of network nodes. The feature vector dimension behaves similarly: n output feature vectors must be stored for n images. Image size, number of nodes and feature vector dimension therefore each contribute a separate, additive term to the per-image memory cost that limits n.

Selecting the images to be stitched according to the batch identification parameters specifically comprises: calculating the number of pictures to stitch from the batch identification parameters according to the preset calculation rule $n \approx \frac{M}{c \cdot h \cdot w \cdot s_1 + N \cdot s_2 + f \cdot s_3}$, where $M$ is the video memory capacity, $c \cdot h \cdot w$ is the image size, $s_1$ is the unit storage capacity of the image size, $N$ is the number of nodes, $s_2$ is the unit capacity of the number of nodes, $f$ is the feature vector dimension, and $s_3$ is the unit capacity of the feature vector dimension; judging whether the number of images to be processed is greater than the number of pictures to stitch; if not, determining all images to be processed as images to be stitched; if so, selecting the first $n$ images, in the order in which the images to be processed were read, as images to be stitched and treating the remaining images as the images still to be processed. The values $s_1$, $s_2$ and $s_3$ depend on the selected data format, for example 4 (bytes) for single-precision floating-point numbers and 8 for double-precision floating-point numbers. In practice, some extra space must be reserved in video memory for the system to run normally, but its effect on the calculated amount of stitched image data is small, so it is ignored when determining the number of stitched images.
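A minimal sketch of the preset calculation rule and the selection step, assuming all capacities are measured in bytes. The function names and the example figures (1024 MB of video memory, 3×256×128 inputs, 2,000,000 nodes, 2048-dimensional features, float32 throughout) are hypothetical; the "node" count in particular is only illustrative.

```python
def max_images_per_batch(vram_bytes, c, h, w, num_nodes, feature_dim,
                         s1=4, s2=4, s3=4):
    """n ≈ M / (c*h*w*s1 + N*s2 + f*s3), with s = 4 for float32 or 8 for float64.

    Like the text, this ignores the space reserved in video memory for the system.
    """
    per_image = c * h * w * s1 + num_nodes * s2 + feature_dim * s3
    n = vram_bytes // per_image
    if n < 1:
        raise ValueError("a single image does not fit in video memory")
    return n


def select_images_to_stitch(pending, n):
    """Return (images to stitch now, images still to be processed), in reading order."""
    if len(pending) > n:
        return pending[:n], pending[n:]
    return list(pending), []


if __name__ == "__main__":
    n = max_images_per_batch(1024 * 1024 * 1024, 3, 256, 128, 2_000_000, 2048)
    print(n)  # 127
    now, rest = select_images_to_stitch(list(range(300)), n)
    print(len(now), len(rest))  # 127 173
```

Calling `select_images_to_stitch` again on `rest` reproduces the "re-determine the remaining images" step until nothing is left.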

Generating the batch image data by stitching along the first axis specifically comprises: finding, among the functions provided by the programming language in use, a stitching function that can stitch the images to be stitched along the first axis; and inputting the images to be stitched into that function to generate the batch image data. The feature vector of a single image to be processed is a 1×f matrix. If the matrices were stitched along the zeroth axis, i.e. with the f-dimensional vectors joined end to end, the length of each vector would have to be known and the split positions changed continually during splitting, which hinders quickly separating the feature vector of each image. If they are stitched along the first axis, the feature vector of each image occupies exactly one row, so each image's feature vector can be taken out simply by traversing the first axis. Therefore, to make splitting easy, the image data of the images to be processed is stitched along the first axis.
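The difference between the two layouts can be seen directly in NumPy. This sketch assumes that the document's "first axis" corresponds to NumPy's leading axis (`axis=0`, one row per image) and that the end-to-end alternative corresponds to `axis=1`; the feature values are made up for illustration.

```python
import numpy as np

f = 4  # hypothetical feature vector dimension
# Three 1 x f feature matrices with made-up values.
vectors = [np.arange(f, dtype=np.float32).reshape(1, f) + 10 * k for k in range(3)]

rows = np.concatenate(vectors, axis=0)   # 3 x f: each image's vector is one row
flat = np.concatenate(vectors, axis=1)   # 1 x 3f: vectors joined end to end

# Row layout: simply traverse the leading axis; f is never needed.
from_rows = [rows[i] for i in range(rows.shape[0])]
# End-to-end layout: every split needs f to compute the slice offsets.
from_flat = [flat[0, i * f:(i + 1) * f] for i in range(flat.shape[1] // f)]

print(rows.shape, flat.shape)  # (3, 4) (1, 12)
```

Both layouts recover the same vectors, but only the row layout lets the splitting code ignore the feature dimension.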

203. The GPU reads the batch image data into video memory.

204. The GPU inputs the batch image data into a residual neural network model, extracts the batch image features of the batch image data, and stores the batch image features in video memory.

Before the batch image features are extracted with the residual neural network model, the following steps are also performed: the GPU inputs a training image into the residual neural network model and calculates the training feature vector of the training data; the GPU calculates the deviation between the training feature vector and the actual feature vector of the training image according to a preset loss function; the GPU calculates feedback tuning parameters from the deviation; and the GPU corrects the residual neural network model according to the feedback tuning parameters. Before the training data is input, the CPU also stitches the image data of the training images into a matrix to generate training image data.
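The training steps above can be illustrated with one possible reading, sketched below under explicit assumptions: a linear classifier over identity labels stands in for the residual network, the Softmax cross-entropy loss (named earlier in the description) plays the role of the "deviation value", and the weight gradient plays the role of the "feedback tuning parameters". The names `softmax` and `train_step` and the learning rate are hypothetical; this is not the patent's training procedure.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_step(weights, batch, labels, lr=0.1):
    """One gradient-descent step of Softmax cross-entropy on a linear stand-in model."""
    n = batch.shape[0]
    x = batch.reshape(n, -1)                 # stitched training batch, n x d
    logits = x @ weights                     # n x k, k = number of training identities
    probs = softmax(logits)
    loss = -np.log(probs[np.arange(n), labels] + 1e-12).mean()   # deviation value
    grad_logits = probs.copy()
    grad_logits[np.arange(n), labels] -= 1.0                     # d(loss)/d(logits) * n
    grad_w = x.T @ grad_logits / n           # feedback tuning parameters
    return weights - lr * grad_w, loss       # corrected model, loss before the update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.standard_normal((6, 1, 2, 2))   # six made-up 1 x 2 x 2 training images
    labels = np.array([0, 1, 2, 0, 1, 2])       # made-up identity labels
    w = np.zeros((4, 3))
    for step in range(50):
        w, loss = train_step(w, batch, labels)
    print(round(float(loss), 4))
```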

The residual neural network is a deep convolutional neural network. Like other artificial neural networks, it is equivalent to a mapping function from an input matrix to an output matrix: in the training stage, the parameters that minimize the deviation are found through supervised learning on a large number of samples, and these parameters are then used to extract the batch image features. The extracted batch image features are stored in video memory.

205. The GPU reads the batch image features back into memory.

206. The CPU decomposes the batch image features into image feature vectors in one-to-one correspondence with the images to be processed, so that the images to be processed can be identified.

The batch image features are decomposed in the manner corresponding to the matrix stitching method, specifically: traversing the first axis of the batch image features and extracting, in order, the row vector of each row; these row vectors are the image feature vectors in one-to-one correspondence with the images to be processed.

That is, the batch image features are extracted row by row, one row at a time starting from row 0, with each row serving as the image feature vector of one image to be processed. The extracted image feature vectors are in the same order in which the images to be processed were stitched into the matrix.
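Because the row order equals the stitching order, the decomposition only needs the list of images in the order they were stitched. A minimal NumPy sketch; the function name `decompose` and the file names in the usage example are hypothetical.

```python
import numpy as np


def decompose(batch_features, image_ids):
    """Traverse the first axis of an n x f feature matrix; row i belongs to the i-th stitched image."""
    if batch_features.shape[0] != len(image_ids):
        raise ValueError("expected exactly one feature row per stitched image")
    return {image_id: batch_features[row] for row, image_id in enumerate(image_ids)}


if __name__ == "__main__":
    feats = np.arange(12, dtype=np.float32).reshape(3, 4)   # 3 images, f = 4
    vectors = decompose(feats, ["cam1_0001.jpg", "cam1_0002.jpg", "cam2_0001.jpg"])
    print(vectors["cam1_0002.jpg"])   # [4. 5. 6. 7.]
```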

207. The CPU calculates the similarity score between the feature vector of the target pedestrian and the image feature vector.

The similarity score may be a cosine similarity score, which measures how close the feature vector of the target pedestrian is to the image feature vector.

208. If the similarity score is greater than a preset threshold, the CPU determines that the target pedestrian and the pedestrian in the image to be processed are the same pedestrian.

This determines whether the image to be processed shows the target pedestrian, which facilitates tracking a specific pedestrian across the processed images.
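Steps 207 and 208 can be sketched with cosine similarity computed against all decomposed feature vectors at once. The threshold value and the function names `similarity_scores` and `same_pedestrian` are hypothetical, and the sketch assumes no feature vector is all zeros (its norm would be zero).

```python
import numpy as np


def similarity_scores(target, gallery):
    """Cosine similarity between one target vector (f,) and each row of an n x f gallery."""
    t = target / np.linalg.norm(target)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ t


def same_pedestrian(target, gallery, threshold):
    """Step 208: an image matches when its score exceeds the preset threshold."""
    return similarity_scores(target, gallery) > threshold


if __name__ == "__main__":
    target = np.array([1.0, 0.0])
    gallery = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(same_pedestrian(target, gallery, threshold=0.8))  # [ True False False]
```

Normalizing once and using a matrix product scores every image in a batch together, in the same spirit as the batched feature extraction.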

According to an embodiment of the present invention, a storage medium is provided that stores at least one executable instruction for executing the batch processing system for pedestrian re-identification in any of the above embodiments.

FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computer device.

As shown in fig. 3, the computer device may include: a processor (processor) 302, a communication interface (Communications Interface) 304, a memory (memory) 306, and a communication bus 308.

The processor 302, the communication interface 304 and the memory 306 communicate with one another via the communication bus 308.

The communication interface 304 is used to communicate with network elements of other devices, such as clients or other servers.

The processor 302 is configured to execute a program 310 and may specifically perform the relevant steps of the batch processing system embodiments for pedestrian re-identification described above.

Specifically, the program 310 may include program code comprising computer operation instructions.

The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 306 is used to store the program 310. The memory 306 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.

The program 310 may specifically be used to cause the processor 302 to perform the following operations:

the system comprises a central processing unit (CPU) and a graphics processing unit (GPU);

the CPU reads the image data of the images to be processed into memory;

the GPU reads the batch image data into video memory;

the GPU reads the batch image features back into memory;

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented as program code executable by computing devices, so that they can be stored in a storage device and executed by computing devices, and in some cases the steps shown or described may be performed in an order different from that shown or described here. Alternatively, they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A batch processing system for pedestrian re-identification, characterized by comprising a central processing unit (CPU) and a graphics processing unit (GPU);

the CPU reads the image data of the images to be processed into memory;

the CPU acquires batch identification parameters, the batch identification parameters comprising video memory capacity, image size, number of nodes and feature vector dimension; selects images to be stitched from the images to be processed according to the batch identification parameters; and stitches the images to be stitched along a first axis to generate batch image data, which is stored in the memory;

the GPU reads the batch image data into video memory;

the GPU reads the batch image features into the memory;

the CPU decomposes the batch image features into image feature vectors in one-to-one correspondence with the images to be processed, so as to identify the images to be processed;

selecting the images to be stitched from the images to be processed according to the batch identification parameters comprises:

calculating the number of pictures to stitch according to the batch identification parameters and a preset calculation rule, the preset calculation rule being $n \approx \frac{M}{c \cdot h \cdot w \cdot s_1 + N \cdot s_2 + f \cdot s_3}$,

where $M$ is the video memory capacity, $c \cdot h \cdot w$ is the image size, $s_1$ is the unit storage capacity of the image size, $N$ is the number of nodes, $s_2$ is the unit capacity of the number of nodes, $f$ is the feature vector dimension, and $s_3$ is the unit capacity of the feature vector dimension;

judging whether the number of images to be processed is greater than the number of pictures to stitch;

if not, determining all the images to be processed as images to be stitched;

if so, selecting from the images to be processed, in their reading order, the first images to be stitched, equal in number to the number of pictures to stitch, and re-determining the remaining images as the images to be processed.

2. The system of claim 1, wherein stitching the images to be stitched along a first axis to generate batch image data comprises:

finding, among the functions provided by the programming language currently in use, a stitching function capable of stitching the images to be stitched along the first axis;

inputting the images to be stitched into the stitching function to generate the batch image data.

3. The system of claim 1, wherein, before the batch image data is input into the residual neural network model and the batch image features of the batch image data are extracted, the system further comprises:

the GPU inputs a training image into the residual neural network model and calculates a training feature vector of the training data;

the GPU calculates the deviation between the training feature vector and the actual feature vector of the training image according to a preset loss function;

the GPU calculates feedback tuning parameters according to the deviation;

the GPU corrects the residual neural network model according to the feedback tuning parameters.

4. The system of claim 3, wherein, before the training image is input into the residual neural network model and the training feature vector of the training data is calculated, the system further comprises:

the CPU stitches the image data of the training images into a matrix to generate training image data.

5. The system of claim 1, wherein decomposing the batch image features into image feature vectors in one-to-one correspondence with the images to be processed comprises:

traversing the first axis of the batch image features and extracting, in order, the row vector of each row of the batch image features, the row vectors being the image feature vectors in one-to-one correspondence with the images to be processed.

6. The system of claim 1, wherein, after the batch image features are decomposed into image feature vectors in one-to-one correspondence with the images to be processed, the system further comprises:

the CPU calculates a similarity score between the feature vector of a target pedestrian and the image feature vector;

if the similarity score is greater than a preset threshold, the CPU determines that the target pedestrian and the pedestrian in the image to be processed are the same pedestrian.

7. A storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the pedestrian re-identification batch processing system of any one of claims 1-6.

8. A computer device, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the pedestrian re-identification batch processing system of any one of claims 1-6.