CN109871821B - Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network - Google Patents

Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network

Info

Publication number
CN109871821B
CN109871821B CN201910159239.6A
Authority
CN
China
Prior art keywords
pedestrian
fine
grained
patch
network structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910159239.6A
Other languages
Chinese (zh)
Other versions
CN109871821A (en)
Inventor
陈文杰
刘鹏程
彭敏
徐华泽
石宇
周祥东
罗代建
程俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201910159239.6A priority Critical patent/CN109871821B/en
Publication of CN109871821A publication Critical patent/CN109871821A/en
Application granted granted Critical
Publication of CN109871821B publication Critical patent/CN109871821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a pedestrian re-identification method, device, equipment and storage medium of an adaptive network. The method includes: acquiring a pedestrian video, and extracting pedestrian images from the pedestrian video by using a pedestrian detection algorithm; adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene; extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network; and computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result. According to the invention, the patch branches in the redundant residual network structure, divided from fine to coarse, are automatically pruned to the optimal network structure for the current scene. This reduces the heavy workload of repeated manual design and experiment otherwise needed to obtain an optimal network architecture, avoids misjudgment caused by human factors, better supports free switching among multiple scenes, and facilitates subsequent scene expansion.

Description

Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a pedestrian re-identification method, apparatus, device, and storage medium for an adaptive network.
Background
Pedestrian re-identification (Re-ID), also called person re-identification, judges whether pedestrian targets captured by cameras at different positions and at different times are the same person, and can be used for video surveillance and similar applications. Conventionally, pedestrian re-identification is realized by extracting and comparing hand-crafted features of pedestrian images. However, because the pedestrian image to be identified and the original image are shot by different cameras, differences in imaging conditions and equipment introduce errors; inconsistent environments in different scenes produce different deviations in the acquired data; changes in illumination make the same color appear differently; and, more importantly, changes in pedestrian posture under the camera and occlusion make it very difficult to distinguish the same person.
In the existing pedestrian re-identification process, a pedestrian image can only be well expressed by combining global features and local features. Judging from the current state of pedestrian re-identification research in the industry, reaching a given recognition rate requires designing a network structure with a global branch and a series of fine-grained patch branches to reasonably fuse the global and local features of the pedestrian image. Because the environments in different scenes are inconsistent, the acquired data exhibit different deviations, and image recognition algorithms rely heavily on scene data, so data from different scenes may need to be divided into fine-grained features in different ways. Determining the fine-grained feature division by manual experiments is not only labor-intensive, but also makes it difficult to find the network structure best matched to the current application scene.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a pedestrian re-identification method, apparatus, device and storage medium of an adaptive network, which are used to solve the problem that pedestrian re-identification in the prior art cannot adaptively match a suitable network structure to the current scene.
To achieve the above and other related objects, in a first aspect of the present application, the present invention provides a pedestrian re-identification method for an adaptive network, including:
acquiring a pedestrian video, and extracting pedestrian images from the pedestrian video by using a pedestrian detection algorithm;
adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene;
extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network;
and computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
In a second aspect of the present application, there is provided a pedestrian re-identification device of an adaptive network, including:
the pedestrian detection module is used for acquiring a pedestrian video and extracting pedestrian images from the pedestrian video by using a pedestrian detection algorithm;
the adaptive network structure is used for adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene;
the feature extraction module is used for extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network;
and the pedestrian re-identification module is used for computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
In a third aspect of the present application, there is provided a pedestrian re-identification electronic device of an adaptive network, including:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors; when the one or more processors execute the instructions, the electronic device performs the steps of the adaptive network pedestrian re-identification method described above.
In a fourth aspect of the present application, a computer-readable storage medium is provided, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the adaptive network pedestrian re-identification method described above.
As described above, the pedestrian re-identification method, apparatus, device and storage medium of the adaptive network of the present invention have the following advantages:
the method and the device automatically divide the patch branches in the redundant residual error network structure from the top to the bottom until the optimal (proper) network structure under the current scene is matched, reduce the huge workload of repeated manual design and experiment for obtaining the optimal network architecture, and can obtain a more simplified network structure under the premise of meeting the index in a self-adaptive manner, thereby avoiding misjudgment caused by artificial factors, being more suitable for free switching among a plurality of scenes and facilitating the expansion of the subsequent scenes.
Drawings
FIG. 1 is a flow chart of a pedestrian re-identification method of an adaptive network according to the present invention;
FIG. 2 is a flowchart illustrating step S2 of the pedestrian re-identification method of the adaptive network according to the present invention;
FIG. 3 is a flowchart illustrating step S4 of the pedestrian re-identification method of the adaptive network according to the present invention;
FIG. 4 is a structural block diagram of a pedestrian re-identification apparatus of an adaptive network according to the present invention;
FIG. 5 is a structural block diagram of an electronic device for pedestrian re-identification of an adaptive network according to the present invention.
Element number description:
1 pedestrian detection module
2 adaptive network architecture
3 feature extraction module
4 pedestrian re-identification module
S1-S4 Steps 1-4
Detailed Description
The following description of the embodiments of the present application is provided for illustrative purposes, and other advantages and capabilities of the present application will become apparent to those skilled in the art from the present disclosure.
In the following description, reference is made to the accompanying drawings, which describe several embodiments of the application. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present application is defined only by the claims of the issued patent. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "above," and the like, may be used herein to facilitate describing the relationship of one element or feature to another element or feature as illustrated in the figures.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first preset threshold may be referred to as a second preset threshold, and similarly, a second preset threshold may be referred to as a first preset threshold, without departing from the scope of the various described embodiments. The first preset threshold and the second preset threshold are both preset thresholds, but they are not the same preset threshold unless the context clearly indicates otherwise. A similar situation applies to a first volume and a second volume.
Furthermore, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It should be further understood that the terms "comprises" and "comprising" indicate the presence of the stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups. The terms "or" and "and/or" as used herein are to be interpreted as inclusive; for example, "A, B or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some manner.
Referring to FIG. 1, the pedestrian re-identification method of an adaptive network provided by the present invention includes:
step S1, acquiring a pedestrian video, and extracting pedestrian images in the pedestrian video by using a pedestrian detection algorithm;
the pedestrian video refers to a continuous multi-frame image (original image) shot by a camera with a monitoring function. The pedestrian image is an image in which a pedestrian is included in the original image. It is understood that there may be no pedestrian in one frame of image (e.g., the original image is the background), or there may be multiple pedestrians. That is, the detected images may be plural, and if the detected images are plural, the technical solution of the present embodiment is executed for each pedestrian image.
In a specific embodiment, the original image is obtained from a video sequence. The client may obtain the original image from the video sequence and send it to the server, or may send the video sequence directly to the server, in which case the server obtains the original image from the video sequence. The original image is then detected by a pedestrian detection algorithm to obtain the image of the pedestrian to be detected.
The pedestrian detection algorithm is an algorithm that detects whether pedestrians exist in an image or a video sequence and gives their accurate locations. Specifically, the pedestrian detection algorithm may be based on hand-crafted features, on a neural network, or on deep learning; this embodiment adopts a pedestrian detection algorithm based on deep learning.
In a specific embodiment, after the original image is detected by the pedestrian detection algorithm, the pedestrian in the original image is further cropped according to pixel size, and the specifically shaped image obtained after cropping is the pedestrian image. For example, the figure is cropped according to the actual size of the pedestrian, and the cropped image is scaled to a fixed size, for example uniformly scaled to 256 × 128.
In addition, a pre-trained Faster R-CNN human body detection model can be adopted to extract pedestrian images directly from the pedestrian video. The Faster R-CNN algorithm combines an RPN with Fast R-CNN: the RPN extracts candidate boxes, and Fast R-CNN is responsible for detection, detecting and identifying the targets within the candidate boxes extracted by the RPN. A minimal detection-and-cropping sketch is given below.
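The sketch below illustrates this detection-and-cropping step with torchvision's pretrained Faster R-CNN; the library, the score threshold and the 128 × 256 crop size (width × height) are illustrative assumptions, since the patent only requires a deep-learning pedestrian detector and a fixed-size crop.

```python
# Minimal sketch of the detection-and-cropping step. torchvision's pretrained
# Faster R-CNN is an assumed choice; the patent only requires a deep-learning
# pedestrian detection algorithm and a crop scaled to a fixed size (256 x 128).
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def extract_pedestrians(frame: Image.Image, score_thresh: float = 0.8):
    """Return fixed-size pedestrian crops detected in one video frame."""
    with torch.no_grad():
        pred = detector([F.to_tensor(frame)])[0]
    crops = []
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if label.item() == 1 and score.item() >= score_thresh:  # COCO class 1 = person
            x1, y1, x2, y2 = (int(v) for v in box.tolist())
            crops.append(frame.crop((x1, y1, x2, y2)).resize((128, 256)))  # width x height
    return crops
```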
Step S2, adaptively adjusting the redundant residual network structure to the optimal network structure according to the current scene;
Referring to FIG. 2, which shows a flowchart of step S2 in the pedestrian re-identification method of the adaptive network according to the present invention, step S2 includes:
step S201, the redundant residual error network structure includes a global branch and a plurality of fine-grained patch branches, where the fine-grained patch branches include two fine-grained patch branches and a plurality of odd-numbered fine-grained patch branches greater than 2 according to their division manner;
step S202, adaptively adjusting the number of fine-grained patch branches in a redundant residual error network structure according to a current scene, and training each branch in the residual error network structure to converge by respectively connecting a softmax loss function and a triplet loss function in a plurality of odd fine-grained patch branches divided from fine to coarse in a descending manner;
the loss function is used for estimating the deviation degree of the predicted value and the true value of the model in the learning and training process. It is a non-negative real-valued function. The goal of optimizing the model during training is to minimize the loss function.
Step S203, judging whether the patch feature weight corresponding to each fine-grained patch branch is smaller than a preset threshold; if the patch feature weight is smaller than the preset threshold, deleting the fine-grained patch branch; if the patch feature weight is not smaller than the preset threshold, retaining the fine-grained patch branch;
specifically, the characteristic weight of the patch corresponding to each fine-grained branch is calculated, and the characteristic weights of the fine-grained branches corresponding to several partition modes may be smaller than a preset threshold, that is, the fine-grained branches of the multiple partition modes need to be deleted at the same time.
Step S204, calculating the feature weights of the fine-grained patch branches corresponding to the several odd-numbered division manners, and deleting the division manner corresponding to the fine-grained patch branch with the minimum feature weight;
specifically, in two embodiments of step S203 and step S204, any step may be optionally selected to be executed, where in step S204, only the fine-grained patch branch of one division manner is deleted at each time, and the accuracy is higher for the subsequent recognition rate compared to step S203, and the efficiency for obtaining the optimal network structure subsequently is higher by introducing the preset threshold in step S203.
Step S205, calculating the recognition rate of the network structure after the fine-grained patch branches are deleted. When deleting the fine-grained patch branch corresponding to a certain division manner causes the recognition rate of the current network structure to fall below the recognition rate requirement index of the application scene, the current network structure with that division manner retained is taken as the optimal network structure of the current scene; otherwise, the network structure formed by the remaining fine-grained patch branches and the global branch is retrained until convergence, and the corresponding fine-grained patch branches continue to be deleted until the optimal network structure of the current scene is matched.
Specifically, if the recognition rate of the network structure after a fine-grained patch branch is deleted is still greater than the recognition rate requirement index of the application scene, execution returns to step S202. In the above steps, each time the fine-grained patch branch of a division manner is deleted, the network must be retrained to convergence, the recognition rate of the resulting network structure is calculated, and it is compared with the recognition rate requirement index of the scene.
In a specific embodiment, the redundant residual network structure includes one global branch and 5 fine-grained patch branches, where the 5 fine-grained patch branches vertically divide the pedestrian image (feature map) into 2, 3, 5, 7 and 9 fine-grained feature maps respectively (if there are more fine-grained patch branches, the corresponding odd numbers of fine-grained feature maps are also larger). The branch that always divides the pedestrian image vertically into two fine-grained feature maps expresses the global features to some extent, compared with the finer subsequent divisions. In the process of dividing the patch branches, the pedestrian image is divided from fine to coarse according to the several odd numbers (3, 5, 7 and 9) to obtain the corresponding fine-grained feature maps. When the recognition rate of the redundant residual network structure with all branches trained to convergence is greater than the recognition rate requirement index of the application scene, the feature weights of the four patch division manners that vertically divide the pedestrian image into 3, 5, 7 and 9 fine-grained feature maps are calculated. If the feature weight corresponding to a patch branch is smaller than the preset threshold, that fine-grained branch (possibly one patch branch or several patch branches) is deleted, which indicates that this fine-grained feature is not important for the data of the scene and an attempt is made to remove it from the current network structure. Alternatively, the same effect can be achieved by directly deleting the patch branch of the division manner with the minimum feature weight. If the recognition rate of the current network structure after all deletions is still greater than the requirement index of the current application scene, the patch branches continue to be deleted in this manner, until the current network structure obtained after deleting the patch branch of some division manner no longer meets the requirement index of the current application scene; the network structure in which that patch branch is not deleted is then set as the optimal network structure. For example, if, among the three remaining division manners (3, 5, 7), the patch branch that divides the pedestrian image vertically into 7 parts has the minimum feature weight, and after this branch is deleted the recognition rate of the converged current network structure is not greater than the recognition rate requirement index of the application scene, then the last-used network structure (the fine-grained patch branches of 2, 3, 5 and 7 plus the global branch) is taken as the optimal network structure of the scene.
In this embodiment, the principle for dividing the fine-grained patch branches is that the recognition rate of the entire network structure after the branches are trained to convergence must be greater than the recognition rate requirement index of the application scene. On the premise of satisfying the recognition rate, if the feature weight of a patch branch is smaller than the threshold but the recognition rate of the network structure after deleting that patch branch does not meet the index, the patch branch cannot be deleted, and the network structure without that deletion is regarded as the optimal network structure. In this way, the optimal network structure can be quickly matched to the current application scene, the influence of traditional human factors is avoided, and the large number of manual experiments otherwise needed to find a reasonably effective division into fine-grained patch branches is reduced. At the same time, the adaptability of the network structure across scenes that differ greatly can be satisfied, the design of a network structure for an application scene becomes more convenient than with a generic network structure, and the network structure can be simplified as much as possible while still meeting the precision index. A schematic sketch of this adaptive pruning procedure is given below.
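The following schematic sketch summarizes steps S201-S205; train_to_convergence, feature_weight and recognition_rate are hypothetical helper names standing in for the training, weight-computation and evaluation routines the patent describes but does not implement.

```python
# Schematic sketch of steps S201-S205. train_to_convergence(), feature_weight()
# and recognition_rate() are hypothetical helpers standing in for routines the
# patent describes (training, patch feature-weight computation, evaluation)
# but does not implement here.
def adapt_network(divisions, global_branch, scene_data, rate_requirement, weight_thresh):
    # divisions: patch division manners, e.g. [2, 3, 5, 7, 9]; the 2-part branch is always kept.
    network = train_to_convergence(global_branch, divisions, scene_data)
    while True:
        candidates = [d for d in divisions if d > 2]
        if not candidates:
            return network
        # Step S203: drop every candidate whose patch feature weight is below the
        # preset threshold; if none qualifies, fall back to step S204 and drop
        # only the division with the minimum feature weight.
        to_drop = [d for d in candidates if feature_weight(network, d) < weight_thresh]
        if not to_drop:
            to_drop = [min(candidates, key=lambda d: feature_weight(network, d))]
        trial = [d for d in divisions if d not in to_drop]
        trial_network = train_to_convergence(global_branch, trial, scene_data)
        # Step S205: if the pruned structure falls below the scene's recognition-rate
        # requirement, keep the previous structure as the optimal one.
        if recognition_rate(trial_network, scene_data) < rate_requirement:
            return network
        divisions, network = trial, trial_network
```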
Step S3, extracting the feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network;
The optimal network structure obtained based on the residual network can meet the recognition rate requirement index of the current application scene while keeping the network structure as simple as possible, i.e. the network structure is simplified.
Specifically, a plurality of region features of the pedestrian image to be detected are obtained by using the network structure, together with the weight corresponding to each region feature; the feature vector of the pedestrian image to be detected is then obtained from the region features and their corresponding weights, combined in a specific manner. The classifier that operates on such features may be any of various neural networks, a support vector machine, a decision tree, a nearest-neighbor classifier, a random forest, a Boosting classifier, and the like. A sketch of one possible fusion of the global and region (patch) features is given below.
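The sketch below shows one possible fusion of the global feature with the weighted region (patch) features; weighted concatenation of L2-normalised parts is an assumption, since the patent only states that the features and weights are combined according to a specific combination mode.

```python
# Sketch of fusing the global feature with the fine-grained region (patch)
# features into one feature vector. Weighted concatenation of L2-normalised
# parts is an assumption; the patent only states that the region features and
# their weights are combined according to a specific combination mode.
import torch
import torch.nn.functional as F

def fuse_features(global_feat, patch_feats, patch_weights):
    """global_feat: (D,); patch_feats: list of (D,) tensors; patch_weights: list of floats."""
    parts = [F.normalize(global_feat, dim=0)]
    parts += [w * F.normalize(f, dim=0) for f, w in zip(patch_feats, patch_weights)]
    return torch.cat(parts)   # final descriptor of the pedestrian image to be detected
```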
Step S4, computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
Referring to FIG. 3, which shows a flowchart of step S4 in the pedestrian re-identification method of the adaptive network according to the present invention, step S4 includes:
step S401, presetting cosine similarity of characteristic vectors among pedestrians;
step S402, when the cosine similarity of the feature vector of the pedestrian to be detected and the preset pedestrian reaches a preset value, judging that the pedestrian to be detected and the preset pedestrian are the same pedestrian;
step S403, when the cosine similarity between the feature vector of the detected pedestrian and the preset pedestrian does not reach the preset value, determining that the detected pedestrian and the preset pedestrian are not the same pedestrian.
In this embodiment, cosine similarity, also called cosine distance, evaluates the similarity of two vectors by calculating the cosine of the angle between them: the dot product of the two vectors divided by the product of their norms. The larger the cosine similarity, the smaller the angle between the two vectors and the closer they are; conversely, the farther apart they are. The cosine similarity between the feature vector of the pedestrian image to be identified and the feature vector of each preset pedestrian is calculated, the pedestrian corresponding to the maximum cosine similarity determines the re-identification result of the pedestrian image to be identified, and the result is output: if the corresponding cosine similarity reaches the preset value, the two are judged to be the same pedestrian; if it does not reach the preset value, the two are judged not to be the same pedestrian. A minimal sketch of this decision is given below.
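The decision of steps S402 and S403 can be sketched as follows; the preset value of 0.7 is illustrative only.

```python
# Minimal sketch of the similarity decision of steps S401-S403. The preset
# value of 0.7 is illustrative only.
import torch.nn.functional as F

def is_same_pedestrian(probe_feat, preset_feat, preset_value: float = 0.7) -> bool:
    sim = F.cosine_similarity(probe_feat.unsqueeze(0), preset_feat.unsqueeze(0)).item()
    return sim >= preset_value   # True: same pedestrian; False: not the same pedestrian
```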
In the above embodiment, the pedestrian detection algorithm based on deep learning detects the original image to obtain the pedestrian image, which is specifically as follows:
and extracting the pedestrian convolution characteristics in the original image by adopting a convolution neural network.
The convolutional neural network is a supervised learning neural network composed of a plurality of convolutional layers and fully connected layers. It is used to extract effective features, which can then be used in different tasks such as scene classification, target detection and image retrieval.
The number of feature maps of a convolutional layer is specified at network initialization, while the size of each feature map is determined by the convolution kernel size and the input feature map of the previous layer: if the feature map of the previous layer has size n × n and the convolution kernel has size k × k, the feature map of this layer has size (n - k + 1) × (n - k + 1). A small numeric check of this formula is given below.
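A small numeric check of this formula, written in its general form with stride and padding, which reduces to (n - k + 1) for stride 1 and no padding:

```python
# Numeric check of the feature-map size formula above. The general form with
# stride and padding reduces to (n - k + 1) when stride = 1 and padding = 0.
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    return (n + 2 * padding - k) // stride + 1

assert conv_output_size(32, 3) == 30   # a 32 x 32 map with a 3 x 3 kernel gives 30 x 30
```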
Specifically, after an image containing a pedestrian in an original image is input to a convolutional neural network, a feature map of a convolutional layer, namely, a pedestrian convolutional feature, is obtained through convolutional calculation of the convolutional layer of the convolutional neural network.
The pedestrian convolution features are computed by the convolutional layers of the convolutional neural network, and the fully connected layer of the network outputs these features to obtain the target feature vector.
Specifically, the pedestrian convolution features include shallow feature information and deep feature information: shallow feature information refers to the feature maps obtained by the convolution of earlier convolutional layers of the convolutional neural network, and deep feature information refers to the feature maps obtained by the convolution of later convolutional layers.
In the convolutional neural network, the fully connected layer integrates the features of the image feature maps produced by the successive convolutional and pooling layers, captures the high-level meaning of the image features, and then classifies them. In one embodiment, the fully connected layer maps the pedestrian convolution features of the feature maps generated by the convolutional layers into a feature vector of fixed length; in this embodiment the length equals the number of classes of the original images in the pedestrian database, namely pedestrian and non-pedestrian (background), i.e. a fixed length of 2. The feature vector contains the combined information of all features of the input original image and retains the most representative image features, so as to complete the image classification task.
Specifically, pooling can be computed in many ways, the most common being max pooling and mean (average) pooling. Max pooling takes the maximum value of a region of the feature map as the pooled value of that region, i.e. the maximum element value of the target feature in each feature map is extracted as the pooling result. Mean pooling calculates the average value of a feature map region as the pooling result of that region; for example, the average value of a specific feature of the feature map may be calculated. A global pooling layer has no parameters to optimize, which avoids overfitting; its output is invariant to position, and outputs at different positions share parameters. The two pooling operations are illustrated below.
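The two pooling operations, and the parameter-free global pooling, can be illustrated on a small feature map:

```python
# Max pooling versus mean (average) pooling on a small 4 x 4 feature map, plus
# the parameter-free global average pooling mentioned above.
import torch
import torch.nn.functional as F

fmap = torch.tensor([[1., 3., 2., 0.],
                     [4., 6., 1., 1.],
                     [0., 2., 5., 7.],
                     [1., 1., 8., 2.]]).reshape(1, 1, 4, 4)   # (batch, channel, H, W)

max_pooled = F.max_pool2d(fmap, kernel_size=2)    # maximum of each 2 x 2 region
avg_pooled = F.avg_pool2d(fmap, kernel_size=2)    # mean of each 2 x 2 region
global_avg = F.adaptive_avg_pool2d(fmap, 1)       # global pooling, no parameters to optimize
```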
To capture the deep information of the image, a convolutional neural network is adopted to extract the convolution features of the image target; after the earlier and subsequent convolutional layers complete the feature extraction, the convolutional neural network directly takes the output of the fully connected layer as the target feature vector.
The target feature vectors are then classified by a support vector machine to obtain the pedestrian image to be detected.
Specifically, the support vector machine separates the data with the largest possible margin, so that the classification achieves the expected effect. Since the image classification task here is to detect pedestrian images, only the image content needs to be judged: the class value (class probability) of the input image is calculated, the most likely class is output, the classification task is completed, and the image to be detected is obtained. A sketch of such a classifier is given below.
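A sketch of such a classifier using scikit-learn's SVC, which is an assumed implementation choice rather than one named by the patent:

```python
# Sketch of the pedestrian / non-pedestrian decision on the target feature
# vectors. scikit-learn's SVC is an assumed implementation; the patent does not
# name a specific support vector machine library.
from sklearn.svm import SVC

def train_pedestrian_classifier(feature_vectors, labels):
    """labels: 1 for pedestrian, 0 for non-pedestrian (background)."""
    clf = SVC(kernel="linear", probability=True)   # maximum-margin separation
    clf.fit(feature_vectors, labels)
    return clf

# clf.predict_proba(x) then gives the class probabilities, and the most likely
# class is output to complete the classification task.
```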
In this embodiment, the pedestrian convolution features in the original image are extracted by a deep convolutional neural network. Because deep learning can learn automatically from the detection image data, the method adapts to various environments, improves the adaptability of obtaining detection images, and facilitates subsequent processing of the detection images.
Referring to FIG. 4, the pedestrian re-identification apparatus of an adaptive network provided by the present invention includes:
the pedestrian detection module 1, used for acquiring a pedestrian video and extracting pedestrian images from the pedestrian video by using a pedestrian detection algorithm;
the adaptive network structure 2, used for adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene.
The redundant residual network structure includes a global branch and a plurality of fine-grained patch branches, where, according to their division manners, the fine-grained patch branches include a branch with a two-part division and several branches with odd-numbered divisions greater than 2.
The number of fine-grained patch branches in the redundant residual network structure is adaptively adjusted according to the current scene, and each branch in the residual network structure is trained to convergence by connecting a softmax loss function and a triplet loss function to each of the odd-numbered fine-grained patch branches, which are divided from fine to coarse.
Whether the patch feature weight corresponding to each fine-grained branch is smaller than a preset threshold is judged; if the patch feature weight is smaller than the preset threshold, the fine-grained branch is deleted; if not, the fine-grained branch is retained; or
the feature weights of the fine-grained patch branches corresponding to the several odd-numbered division manners are calculated, and the division manner corresponding to the fine-grained patch branch with the minimum feature weight is deleted.
The recognition rate of the network structure after the fine-grained patch branches are deleted is calculated; when deleting the fine-grained patch branch of some division manner causes the recognition rate of the current network structure to fall below the recognition rate requirement index of the application scene, the current network structure with that division manner retained is taken as the optimal network structure of the current scene; otherwise, the network structure formed by the remaining fine-grained patch branches and the global branch is retrained until convergence, and the corresponding fine-grained patch branches continue to be deleted until the optimal network structure of the current scene is matched.
The feature extraction module 3 is used for extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network.
The global branch and the fine-grained patch branches in the optimal network structure obtain the global features and the fine-grained patch features respectively, and the global features and the fine-grained patch features are fused to obtain the feature vector of the pedestrian image to be detected.
The pedestrian re-identification module 4 is used for computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
The cosine similarity between the feature vectors of the pedestrian to be detected and the preset pedestrian is computed; when the cosine similarity reaches a preset value, the pedestrian to be detected and the preset pedestrian are judged to be the same pedestrian; when the cosine similarity does not reach the preset value, they are judged not to be the same pedestrian.
In this embodiment, since the pedestrian re-identification method of the adaptive network and the pedestrian re-identification device of the adaptive network are in a one-to-one correspondence relationship, the above method is referred to for the corresponding technical features, technical means and technical effects in the device, and details are not repeated herein to avoid repetition.
Referring to FIG. 5, the electronic device for pedestrian re-identification of an adaptive network according to the present invention includes:
one or more processors 51;
a memory 52; and
one or more programs, wherein the one or more programs are stored in the memory 52 and configured to be executed by the one or more processors 51; when the one or more processors execute the instructions, the electronic device performs the steps of the adaptive network pedestrian re-identification method described above.
The processor 51 is operatively coupled to the memory and/or the non-volatile storage. More specifically, the processor 51 may execute instructions stored in the memory and/or the non-volatile storage to perform operations in the computing device, such as generating image data and/or transmitting image data to an electronic display. As such, the processor may include one or more general-purpose microprocessors, one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), or any combination thereof.
The present application provides a computer storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to execute the pedestrian re-identification method of the adaptive network described above.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application.
In the embodiments provided herein, the computer-readable and writable storage medium may include Read-only memory (ROM), random-access memory (RAM), EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, a usb disk, a removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
In summary, the patch branches in the redundant residual network structure, divided from fine to coarse, are automatically pruned until the optimal (suitable) network structure for the current scene is matched. This reduces the heavy workload of repeated manual design and experiment otherwise needed to obtain an optimal network architecture, adaptively yields a more simplified network structure while still meeting the indexes, avoids misjudgment caused by human factors, better supports free switching among multiple scenes, and facilitates subsequent scene expansion. The invention therefore effectively overcomes various defects in the prior art and has high industrial value.
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (6)

1. A pedestrian re-identification method of an adaptive network, the method comprising:
acquiring a pedestrian video, and extracting pedestrian images in the pedestrian video by utilizing a pedestrian detection algorithm;
adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene, further comprising:
the redundant residual network structure includes a global branch and a plurality of fine-grained patch branches, where, according to their division manners, the fine-grained patch branches include a branch with a two-part division and several branches with odd-numbered divisions greater than 2;
adaptively adjusting the number of fine-grained patch branches in the redundant residual network structure according to the current scene, and training each branch in the residual network structure to convergence by connecting a softmax loss function and a triplet loss function to each of the odd-numbered fine-grained patch branches, which are divided from fine to coarse;
judging whether the patch feature weight corresponding to each fine-grained branch is smaller than a preset threshold; if the patch feature weight is smaller than the preset threshold, deleting the fine-grained branch; if the patch feature weight is not smaller than the preset threshold, retaining the fine-grained branch; or calculating the feature weights of the fine-grained patch branches corresponding to the several odd-numbered division manners, and deleting the division manner corresponding to the fine-grained patch branch with the minimum feature weight;
calculating the recognition rate of the network structure after the fine-grained patch branches are deleted; when deleting the fine-grained patch branch of some division manner causes the recognition rate of the current network structure to fall below the recognition rate requirement index of the application scene, taking the current network structure with that division manner retained as the optimal network structure of the current scene; otherwise, retraining the network structure formed by the remaining fine-grained patch branches and the global branch until convergence, and continuing to delete the corresponding fine-grained patch branches until the optimal network structure of the current scene is matched;
extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network, obtaining global features and fine-grained patch features from the global branch and the fine-grained patch branches in the optimal network structure respectively, and fusing the global features and the fine-grained patch features to obtain the feature vector of the pedestrian image to be detected;
and computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
2. The pedestrian re-identification method of the adaptive network according to claim 1, wherein the step of computing the cosine similarity between the feature vectors of the pedestrian to be detected and the preset pedestrian to obtain the pedestrian re-identification result comprises:
computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian; when the cosine similarity reaches a preset value, judging that the pedestrian to be detected and the preset pedestrian are the same pedestrian; and when the cosine similarity does not reach the preset value, judging that the pedestrian to be detected and the preset pedestrian are not the same pedestrian.
3. A pedestrian re-identification apparatus of an adaptive network, the apparatus comprising:
the pedestrian detection module, used for acquiring a pedestrian video and extracting pedestrian images from the pedestrian video by using a pedestrian detection algorithm;
the adaptive network structure, used for adaptively adjusting a redundant residual network structure to an optimal network structure according to the current scene, further comprising:
the redundant residual network structure includes a global branch and a plurality of fine-grained patch branches, where, according to their division manners, the fine-grained patch branches include a branch with a two-part division and several branches with odd-numbered divisions greater than 2;
adaptively adjusting the number of fine-grained patch branches in the redundant residual network structure according to the current scene, and training each branch in the residual network structure to convergence by connecting a softmax loss function and a triplet loss function to each of the odd-numbered fine-grained patch branches, which are divided from fine to coarse;
judging whether the patch feature weight corresponding to each fine-grained branch is smaller than a preset threshold; if the patch feature weight is smaller than the preset threshold, deleting the fine-grained branch; if the patch feature weight is not smaller than the preset threshold, retaining the fine-grained branch; or calculating the feature weights of the fine-grained patch branches corresponding to the several odd-numbered division manners, and deleting the division manner corresponding to the fine-grained patch branch with the minimum feature weight;
calculating the recognition rate of the network structure after the fine-grained patch branches are deleted; when deleting the fine-grained patch branch of some division manner causes the recognition rate of the current network structure to fall below the recognition rate requirement index of the application scene, taking the current network structure with that division manner retained as the optimal network structure of the current scene; otherwise, retraining the network structure formed by the remaining fine-grained patch branches and the global branch until convergence, and continuing to delete the corresponding fine-grained patch branches until the optimal network structure of the current scene is matched;
the feature extraction module, used for extracting a feature vector of the pedestrian image to be detected by using the optimal network structure based on the residual network, obtaining global features and fine-grained patch features from the global branch and the fine-grained patch branches in the optimal network structure respectively, and fusing the global features and the fine-grained patch features to obtain the feature vector of the pedestrian image to be detected;
and the pedestrian re-identification module, used for computing the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian to obtain a pedestrian re-identification result.
4. The pedestrian re-identification apparatus of the adaptive network according to claim 3, wherein the pedestrian re-identification module is further configured to:
compute the cosine similarity between the feature vectors of the pedestrian to be detected and a preset pedestrian; when the cosine similarity reaches a preset value, judge that the pedestrian to be detected and the preset pedestrian are the same pedestrian; and when the cosine similarity does not reach the preset value, judge that the pedestrian to be detected and the preset pedestrian are not the same pedestrian.
5. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors; when the one or more processors execute the instructions, the electronic device performs the pedestrian re-identification method of the adaptive network according to any one of claims 1 to 2.
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a pedestrian re-identification method of an adaptive network according to one of claims 1 to 2.
CN201910159239.6A 2019-03-04 2019-03-04 Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network Active CN109871821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910159239.6A CN109871821B (en) 2019-03-04 2019-03-04 Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910159239.6A CN109871821B (en) 2019-03-04 2019-03-04 Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network

Publications (2)

Publication Number Publication Date
CN109871821A (en) 2019-06-11
CN109871821B (en) 2020-10-09

Family

ID=66919605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910159239.6A Active CN109871821B (en) 2019-03-04 2019-03-04 Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network

Country Status (1)

Country Link
CN (1) CN109871821B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688897A (en) * 2019-08-23 2020-01-14 深圳久凌软件技术有限公司 Pedestrian re-identification method and device based on joint judgment and generation learning
CN110807434B (en) * 2019-11-06 2023-08-15 威海若维信息科技有限公司 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
CN111814857B (en) * 2020-06-29 2021-07-06 浙江大华技术股份有限公司 Target re-identification method, network training method thereof and related device
EP4136580A4 (en) * 2020-06-29 2024-01-17 Zhejiang Dahua Technology Co., Ltd Target re-identification method, network training method thereof, and related device
CN111814705B (en) * 2020-07-14 2022-08-02 广西师范大学 Pedestrian re-identification method based on batch blocking shielding network
CN112329835A (en) * 2020-10-30 2021-02-05 天河超级计算淮海分中心 Image processing method, electronic device, and storage medium
CN112817725B (en) * 2021-02-06 2023-08-11 成都飞机工业(集团)有限责任公司 Micro-service division and optimization method based on efficient global optimization algorithm
CN115631509B (en) * 2022-10-24 2023-05-26 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764308B (en) * 2018-05-16 2021-09-14 中国人民解放军陆军工程大学 Pedestrian re-identification method based on convolution cycle network
CN109101913A (en) * 2018-08-01 2018-12-28 北京飞搜科技有限公司 Pedestrian recognition methods and device again

Also Published As

Publication number Publication date
CN109871821A (en) 2019-06-11

Similar Documents

Publication Publication Date Title
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN108491794B (en) Face recognition method and device
WO2019100724A1 (en) Method and device for training multi-label classification model
CN108875487B (en) Training of pedestrian re-recognition network and pedestrian re-recognition based on training
CN111797983A (en) Neural network construction method and device
CN110033481A (en) Method and apparatus for carrying out image procossing
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN110942012A (en) Image feature extraction method, pedestrian re-identification method, device and computer equipment
CN110222718B (en) Image processing method and device
CN110135505B (en) Image classification method and device, computer equipment and computer readable storage medium
CN112115805B (en) Pedestrian re-recognition method and system with bimodal difficult-to-excavate ternary-center loss
CN109101913A (en) Pedestrian recognition methods and device again
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN111553182A (en) Ship retrieval method and device and electronic equipment
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN111931686A (en) Video satellite target tracking method based on background knowledge enhancement
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
CN111695572A (en) Ship retrieval method and device based on convolutional layer feature extraction
CN111291611A (en) Pedestrian re-identification method and device based on Bayesian query expansion
CN117765363A (en) Image anomaly detection method and system based on lightweight memory bank
CN114677611B (en) Data identification method, storage medium and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant