CN113780243B - Training method, device, equipment and storage medium for pedestrian image recognition model

Info

Publication number: CN113780243B
Application number: CN202111167837.1A
Authority: CN
Inventors: 司世景; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2023-10-17
Anticipated expiration: 2041-09-29
Also published as: CN113780243A

Abstract

The application discloses a training method for a pedestrian image recognition model, comprising: obtaining an unlabeled first pedestrian image; performing data enhancement on the first pedestrian image to obtain a data enhancement image; inputting the data enhancement image into a first pedestrian image recognition network and a second pedestrian image recognition network, respectively, to extract a first anti-occlusion high-level semantic feature vector and a second anti-occlusion high-level semantic feature vector; and updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function, so as to train the pedestrian image recognition model. Because the model learns to extract anti-occlusion high-level semantic features from the data enhancement image, the trained pedestrian image recognition model can recognize pedestrians in occluded pedestrian images more accurately.

Description

Training method, device, equipment and storage medium for pedestrian image recognition model

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a training method and apparatus for a pedestrian image recognition model, a computer device, and a storage medium.

Background

Pedestrian re-recognition (person re-identification) is a technique that uses computer vision to determine whether a specific pedestrian is present in a given image or video. In practical applications, the image acquisition scene is usually complex and changeable, and pedestrians in an image are easily blocked by obstacles (such as luggage, counters, crowds in public places, automobiles and trees), so that the image becomes an occluded pedestrian image (i.e., an image in which the pedestrian is blocked by other objects). Most current pedestrian re-recognition techniques focus on searching and matching over images of the whole pedestrian and neglect searching and matching over occluded pedestrian images, so they cannot accurately recognize pedestrians in such images. The recognition accuracy of current pedestrian re-recognition technology therefore still has room for improvement.

Disclosure of Invention

The application aims to solve the technical problem that the recognition accuracy of existing pedestrian re-recognition technology is low.

In order to solve the technical problem, a first aspect of the present application discloses a training method for a pedestrian image recognition model, which comprises the following steps:

acquiring a first pedestrian image which is not provided with a corresponding labeling label;

adding shielding noise into the first pedestrian image based on a preset data enhancement method to obtain a data enhancement image;

inputting the data enhanced image to a first pedestrian image recognition network for analysis so as to extract a first anti-shielding high-level semantic feature vector in the data enhanced image;

inputting the data enhanced image to a second pedestrian image recognition network for analysis so as to extract a second anti-shielding high-level semantic feature vector in the data enhanced image;

updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-shielding high-level semantic feature vector, the second anti-shielding high-level semantic feature vector and a preset loss function so as to train a pedestrian image recognition model;

the pedestrian image recognition model comprises a first pedestrian image recognition network and a second pedestrian image recognition network, and network parameters are shared between the first pedestrian image recognition network and the second pedestrian image recognition network.

The second aspect of the application discloses a training device for a pedestrian image recognition model, which comprises:

the acquisition module is used for acquiring a first pedestrian image which is not provided with a corresponding labeling label;

the data enhancement module is used for adding shielding noise to the first pedestrian image based on a preset data enhancement method so as to obtain a data enhancement image;

the analysis module is used for inputting the data enhanced image into a first pedestrian image recognition network for analysis so as to extract a first anti-shielding high-level semantic feature vector in the data enhanced image;

the analysis module is further used for inputting the data enhanced image into a second pedestrian image recognition network for analysis so as to extract a second anti-shielding high-level semantic feature vector in the data enhanced image;

the updating module is used for updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-shielding high-level semantic feature vector, the second anti-shielding high-level semantic feature vector and a preset loss function so as to train a pedestrian image recognition model.

A third aspect of the application discloses a computer device comprising:

a memory storing executable program code;

a processor coupled to the memory;

the processor invokes the executable program code stored in the memory to perform some or all of the steps in the training method for the pedestrian image recognition model disclosed in the first aspect of the present application.

A fourth aspect of the present application discloses a computer storage medium storing computer instructions which, when invoked, are used to perform part or all of the steps of the training method of the pedestrian image identification model disclosed in the first aspect of the present application.

In the embodiments of the application, a first pedestrian image recognition network and a second pedestrian image recognition network that share network parameters are arranged in the pedestrian image recognition model; this model structure is more favorable for recognizing pedestrians in occluded pedestrian images. The data enhancement image used for model training is obtained by adding occlusion noise to the first pedestrian image, which also helps the model learn, in a targeted manner, to recognize pedestrians in occluded pedestrian images. The two pedestrian image recognition networks then extract anti-occlusion high-level semantic features from the data enhancement image, and training of the model is completed using a preset loss function and these features. As a result, the trained pedestrian image recognition model can recognize pedestrians in occluded pedestrian images more accurately, which improves the recognition accuracy of pedestrian re-recognition technology.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a training method of a pedestrian image recognition model according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a pedestrian image recognition model according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a training device for a pedestrian image recognition model according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a computer storage medium according to an embodiment of the present application.

Detailed Description

In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.

The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used to distinguish between different objects and not necessarily to describe a sequential or chronological order. Furthermore, the terms "comprise" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product or device that comprises a list of steps or elements is not limited to those listed, but may optionally include other steps or elements not listed or inherent to such process, method, apparatus, product or device.

Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will appreciate, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The application discloses a training method and device for a pedestrian image recognition model, a computer device and a storage medium. A first pedestrian image recognition network and a second pedestrian image recognition network that share network parameters are arranged in the pedestrian image recognition model; this model structure is more favorable for recognizing pedestrians in occluded pedestrian images. The data enhancement image used for model training is obtained by adding occlusion noise to the first pedestrian image, which also helps the model learn, in a targeted manner, to recognize pedestrians in occluded pedestrian images. The two pedestrian image recognition networks then extract anti-occlusion high-level semantic features from the data enhancement image, and training of the model is completed using a preset loss function and these features. As a result, the trained pedestrian image recognition model can recognize pedestrians in occluded pedestrian images more accurately, which improves the recognition accuracy of pedestrian re-recognition technology. Details are described below.

Example 1

Referring to fig. 1, fig. 1 is a flowchart of a training method of a pedestrian image recognition model according to an embodiment of the present application. As shown in fig. 1, the training method of the pedestrian image recognition model may include the following operations:

101. Acquiring a first pedestrian image that is not provided with a corresponding annotation label.

In step 101, the first pedestrian image may be obtained from an unlabeled pedestrian image dataset (e.g., a pedestrian re-identification dataset such as CUHK01, CUHK02, CUHK03 or DukeMTMC-reID), which typically includes a large number of unlabeled pedestrian images.

102. Adding occlusion noise to the first pedestrian image based on a preset data enhancement method to obtain a data enhancement image.

In step 102, the preset data enhancement method may be Random Erasing Data Augmentation (REA). In short, a region is randomly selected in the pedestrian image and covered with an occlusion noise mask. The mask may be a black block, a gray block or random noise. The method has been shown to improve model performance and robustness to occlusion across multiple CNN architectures and different fields. Specifically, the basic flow of the random erasing data augmentation method may be as follows:

(1) Input the width $W$, the height $H$ and the area $S$ of the pedestrian image $I$, and set the area ratio range of the erased region, $S_e \in (S_l, S_h)$, and the aspect ratio range of the erased region, $r_l \in (r_1, r_2)$;

(2) Randomly select a point $(x_e, y_e)$ in the pedestrian image $I$, randomly generate an erased-area ratio $S_e$ within $(S_l, S_h)$ and an erased-region aspect ratio $r_l$ within $(r_1, r_2)$, and then calculate the width $W_e$ and the height $H_e$ of the occlusion noise mask from $S_e$ and $r_l$;

(3) Judge whether the occlusion noise mask exceeds the boundary of the pedestrian image $I$; if so, return to step (2), and if not, proceed to the next step;

(4) Assign a random value or an average value to the pixels inside the occlusion noise mask;

(5) Return the new pedestrian image.
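The five steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the patented implementation: the function name `random_erasing`, the default ranges for `s_l`, `s_h`, `r_1` and `r_2`, the `max_tries` cap, the reading of the aspect ratio as height divided by width, and the assumption of a `uint8` image in H x W x C layout are all choices made here for illustration.

```python
import numpy as np


def random_erasing(image, s_l=0.02, s_h=0.4, r_1=0.3, r_2=3.3,
                   fill="random", max_tries=100, rng=None):
    """Return a copy of `image` with one random rectangle overwritten.

    All default values are hypothetical; the source does not specify them.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]
    out = image.copy()
    for _ in range(max_tries):
        # Step (2): sample the erased area, the aspect ratio and a corner point.
        area_e = rng.uniform(s_l, s_h) * height * width
        ratio = rng.uniform(r_1, r_2)            # assumed to mean H_e / W_e
        h_e = int(round(np.sqrt(area_e * ratio)))
        w_e = int(round(np.sqrt(area_e / ratio)))
        x_e = int(rng.integers(0, width))
        y_e = int(rng.integers(0, height))
        # Step (3): resample if the mask is empty or leaves the image.
        if h_e < 1 or w_e < 1 or x_e + w_e > width or y_e + h_e > height:
            continue
        # Step (4): fill the mask with random values or the image mean.
        if fill == "random":
            patch = rng.integers(0, 256, size=(h_e, w_e) + image.shape[2:],
                                 dtype=np.uint8)
        else:
            patch = image.mean(axis=(0, 1))
        out[y_e:y_e + h_e, x_e:x_e + w_e] = patch
        # Step (5): return the new image.
        return out
    return out  # no valid mask found within max_tries; image is unchanged
```

Calling the function twice on the same input with different random states would give two differently occluded views of one pedestrian image.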

103. Inputting the data enhancement image into a first pedestrian image recognition network for analysis, so as to extract a first anti-occlusion high-level semantic feature vector from the data enhancement image.

In step 103, because the first pedestrian image recognition network and the second pedestrian image recognition network sharing network parameters are arranged in the pedestrian image recognition model, this network design enables the first pedestrian image recognition network, even under the influence of occlusion noise, to extract robust high-level features of the data enhancement image that are not disturbed by the occlusion noise (i.e., the first anti-occlusion high-level semantic feature vector).

104. Inputting the data enhancement image into a second pedestrian image recognition network for analysis, so as to extract a second anti-occlusion high-level semantic feature vector from the data enhancement image.

In step 104, the network design of the pedestrian image recognition model likewise enables the second pedestrian image recognition network, even under the influence of occlusion noise, to extract robust high-level features of the data enhancement image that are not disturbed by the occlusion noise (i.e., the second anti-occlusion high-level semantic feature vector).

105. Updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function, so as to train a pedestrian image recognition model.

In step 105, after the first anti-occlusion high-level semantic feature vector and the second anti-occlusion high-level semantic feature vector are extracted, the pedestrian image recognition model may be trained using these two vectors and a preset loss function. Specifically, the preset loss function constrains the two pedestrian image recognition networks to learn the information common to the two data enhancement images, and the network parameters of the two networks are continuously updated through gradient backpropagation during training. The parameter settings thus become progressively more reasonable, and the ability of the two networks to extract robust high-level features that are not disturbed by occlusion noise in the data enhancement image (i.e., anti-occlusion high-level semantic features) continuously improves.

It can be seen that, by implementing the training method of the pedestrian image recognition model described in FIG. 1, a first pedestrian image recognition network and a second pedestrian image recognition network that share network parameters are arranged in the pedestrian image recognition model, and this model structure is more favorable for recognizing pedestrians in occluded pedestrian images. The data enhancement image used for model training is obtained by adding occlusion noise to the first pedestrian image, which also helps the model learn, in a targeted manner, to recognize pedestrians in occluded pedestrian images. The two pedestrian image recognition networks then extract anti-occlusion high-level semantic features from the data enhancement image, and training of the model is completed using a preset loss function and these features. As a result, the trained pedestrian image recognition model can recognize pedestrians in occluded pedestrian images more accurately, which improves the recognition accuracy of pedestrian re-recognition technology.

In an optional embodiment, the updating the network parameters of the first pedestrian image identification network and the second pedestrian image identification network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function to implement training of the pedestrian image identification model includes:

inputting the first anti-shielding high-level semantic feature vector into a preset multi-layer perceptron for analysis to obtain a multi-layer perceptual feature vector;

and updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the multi-layer perception feature vector, the second anti-occlusion high-layer semantic feature vector and a preset loss function so as to train the pedestrian image recognition model.

In this alternative embodiment, it has been found in practice that the symmetric model structure formed by the first pedestrian image recognition network and the second pedestrian image recognition network sharing network parameters is prone to producing highly similar outputs from the two networks, resulting in a collapsed solution. To reduce the occurrence of collapsed solutions, a multi-layer perceptron can be added after the first pedestrian image recognition network so that the model structure becomes asymmetric. This reduces the collapsed solutions caused by the network parameters tending to become identical during model training, and further enhances the stability and adaptability of the pedestrian image recognition model.
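The asymmetric structure can be pictured with a toy forward pass. The sketch below is only illustrative: the layer sizes, the single linear-plus-ReLU layer standing in for each recognition network, and the names `encoder`, `predictor` and `forward` are assumptions, not details from the source. It shows one set of encoder weights serving both branches, with the multi-layer perceptron applied only on the first branch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
IN_DIM, FEAT_DIM, HID_DIM = 32, 16, 8

# A single weight matrix: both recognition networks share these parameters.
encoder_w = rng.normal(scale=0.1, size=(IN_DIM, FEAT_DIM))

# Multi-layer perceptron weights, used on the first branch only.
predictor_w1 = rng.normal(scale=0.1, size=(FEAT_DIM, HID_DIM))
predictor_w2 = rng.normal(scale=0.1, size=(HID_DIM, FEAT_DIM))


def encoder(x):
    """Toy stand-in for a pedestrian image recognition network."""
    return np.maximum(x @ encoder_w, 0.0)


def predictor(z):
    """Two-layer perceptron that makes the first branch asymmetric."""
    return np.maximum(z @ predictor_w1, 0.0) @ predictor_w2


def forward(view_1, view_2):
    """Return (p_1, p_2, z_1, z_2) for two data-enhanced sub-images."""
    z_1, z_2 = encoder(view_1), encoder(view_2)   # second-branch outputs
    p_1, p_2 = predictor(z_1), predictor(z_2)     # first branch, then MLP
    return p_1, p_2, z_1, z_2
```

Because the encoder weights are shared, the two branches differ only through the extra perceptron, which is what breaks the symmetry.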

In an alternative embodiment, the data-enhanced image comprises a first data-enhanced sub-image and a second data-enhanced sub-image;

and, the loss function is:

where $p_1$ is the feature vector corresponding to the first data-enhanced sub-image in the multi-layer perceptual feature vector, $p_2$ is the feature vector corresponding to the second data-enhanced sub-image in the multi-layer perceptual feature vector, $z_1$ is the feature vector corresponding to the first data-enhanced sub-image in the second anti-occlusion high-level semantic feature vector, $z_2$ is the feature vector corresponding to the second data-enhanced sub-image in the second anti-occlusion high-level semantic feature vector, $L$ is the loss value, and $D(a, b)$ is the negative cosine similarity between feature vector $a$ and feature vector $b$.
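As a sketch only, a loss consistent with the symbols defined here could be written as below. The crosswise pairing of the two sub-images and the equal weights of 1/2 are assumptions and are not taken from the source; the expression for $D$ is simply the definition of negative cosine similarity.

```latex
D(a, b) = -\,\frac{a}{\lVert a \rVert_2} \cdot \frac{b}{\lVert b \rVert_2},
\qquad
L = \frac{1}{2}\, D(p_1, z_2) + \frac{1}{2}\, D(p_2, z_1)
```

Under this form, $L$ reaches its minimum of $-1$ when each perceptron output points in the same direction as the feature vector of the other sub-image.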

In this alternative embodiment, after adding the multi-layer perceptron after the first pedestrian image recognition network to modify the model structure to an asymmetric structure, the loss function may also be adaptively modified to the loss function described above to accommodate the new asymmetric model structure.
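A loss built from negative cosine similarity can be evaluated numerically as in the following sketch. The function names, the small `eps` guard against zero-length vectors, and the equal-weight crosswise combination `0.5 * D(p_1, z_2) + 0.5 * D(p_2, z_1)` are assumptions made for illustration, since the source defines only the symbols.

```python
import numpy as np


def neg_cosine(a, b, eps=1e-12):
    """Negative cosine similarity D(a, b), computed along the last axis."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return -np.sum(a * b, axis=-1)


def symmetric_loss(p_1, p_2, z_1, z_2):
    """Assumed form: mean over the batch of D(p_1, z_2)/2 + D(p_2, z_1)/2."""
    per_sample = 0.5 * neg_cosine(p_1, z_2) + 0.5 * neg_cosine(p_2, z_1)
    return float(np.mean(per_sample))
```

The value lies between -1 (all pairs aligned) and 1 (all pairs opposed), so a lower loss means the two views of the same pedestrian agree more closely.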

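In an optional embodiment, the updating of the network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the multi-layer perception feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function so as to train the pedestrian image recognition model includes:

converting the second anti-occlusion high-level semantic feature vector into a gradient stop feature vector based on a preset gradient stop operator;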

and updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the multi-layer perception feature vector, the gradient stopping feature vector and a preset loss function so as to train the pedestrian image recognition model.

In this alternative embodiment, as shown in FIG. 2, in order to further increase the asymmetry of the model structure, a gradient stop operator may be added after the second pedestrian image recognition network while the multi-layer perceptron is added after the first pedestrian image recognition network. This further reduces the collapsed solutions caused by the network parameters tending to become identical during model training, and further enhances the stability and adaptability of the pedestrian image recognition model.

and, the loss function is:

where $p_1$ is the feature vector corresponding to the first data-enhanced sub-image in the multi-layer perceptual feature vector, $p_2$ is the feature vector corresponding to the second data-enhanced sub-image in the multi-layer perceptual feature vector, $z_1$ is the feature vector corresponding to the first data-enhanced sub-image in the second anti-occlusion high-level semantic feature vector, $z_2$ is the feature vector corresponding to the second data-enhanced sub-image in the second anti-occlusion high-level semantic feature vector, $\mathrm{stopgrad}(z_1)$ is the feature vector corresponding to the first data-enhanced sub-image in the gradient stop feature vector, $\mathrm{stopgrad}(z_2)$ is the feature vector corresponding to the second data-enhanced sub-image in the gradient stop feature vector, $L$ is the loss value, and $D(a, b)$ is the negative cosine similarity between feature vector $a$ and feature vector $b$.
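As a sketch only, a loss consistent with these symbols could take the following form. The crosswise pairing and the equal weights of 1/2 are assumptions and are not taken from the source; $\mathrm{stopgrad}(\cdot)$ leaves the value of its argument unchanged and only blocks gradients from flowing back through it.

```latex
L = \frac{1}{2}\, D\bigl(p_1, \mathrm{stopgrad}(z_2)\bigr)
  + \frac{1}{2}\, D\bigl(p_2, \mathrm{stopgrad}(z_1)\bigr)
```

With this form, each term updates the network only through the perceptron output $p$, while the other argument is treated as a constant target.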

In this alternative embodiment, after the multi-layer perceptron is added after the first pedestrian image recognition network and the gradient stop operator is added after the second pedestrian image recognition network, the loss function may also be adaptively modified to the loss function described above to accommodate the new asymmetric model structure.

In an optional embodiment, after the updating of the network parameters of the first pedestrian image identification network and the second pedestrian image identification network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector, and a preset loss function to implement training of the pedestrian image identification model, the method further includes:

acquiring a second pedestrian image preset with a corresponding labeling label;

updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the second pedestrian image and a preset triplet loss function so as to realize adjustment of the pedestrian image recognition model;

wherein the triplet loss function is:

$L_{triplet} = \max\bigl(d(a, p) - d(a, n) + \mathrm{margin},\ 0\bigr)$

where $a$ is the sample corresponding to the second pedestrian image in a preset labeled training dataset, $p$ is a sample randomly selected from the labeled training dataset that belongs to the same class as the second pedestrian image, $n$ is a sample randomly selected from the labeled training dataset that belongs to a different class from the second pedestrian image, $\mathrm{margin}$ is a preset small margin constant, $L_{triplet}$ is the triplet loss value, $d(a, p)$ is the Euclidean distance between $a$ and $p$, and $d(a, n)$ is the Euclidean distance between $a$ and $n$.

In this alternative embodiment, the labeled second pedestrian image may be obtained from a preset labeled training dataset (e.g., the MSMT17 dataset). A sample Anchor is randomly selected from the labeled training dataset, and then a sample Positive belonging to the same class as the Anchor and a sample Negative belonging to a different class from the Anchor are randomly selected. For a given triplet (Anchor, Positive, Negative), the triplet loss function tries to learn a feature space in which the reference sample (Anchor) is closer to the positive sample (Positive) and farther from the negative sample (Negative). Updating the network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network again based on the second pedestrian image and the preset triplet loss function means training the two networks with the second pedestrian image and the preset triplet loss function. Adjusting the pedestrian image recognition model through the triplet loss function can further improve the recognition accuracy of pedestrian re-recognition technology.
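The triplet loss described here can be computed directly from three feature vectors. The sketch below is a minimal NumPy illustration; the function name `triplet_loss` and the default margin of 0.3 are hypothetical, as the source only states that the margin is a preset constant.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.3):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d.

    Inputs are feature vectors (or batches of them along the first axis).
    The margin default is an illustrative value, not one from the source.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=-1)
    d_an = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_ap - d_an + margin, 0.0)
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least the margin, so only triplets that violate this condition contribute to the parameter update.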

It can be seen that, in implementing this alternative embodiment, after the training of the pedestrian image recognition model is completed, the labeled second pedestrian image and the triplet loss function are further used to adjust the pedestrian image recognition model, so that the recognition accuracy of pedestrian re-recognition technology can be further improved.

In an optional embodiment, after the updating of the network parameters of the first pedestrian image identification network and the second pedestrian image identification network based on the second pedestrian image and a preset triplet loss function to implement the adjustment of the pedestrian image identification model, the method further includes:

updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the second pedestrian image and a preset cross entropy loss function so as to realize adjustment of the pedestrian image recognition model;

wherein the cross entropy loss function is:

where $y_i$ is the annotation label corresponding to the $i$-th sample in the labeled training dataset, $p_i$ is the predicted probability value corresponding to the $i$-th sample in the labeled training dataset, $N$ is the number of samples in the labeled training dataset, and $L_{CE}$ is the cross entropy loss value.
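As a sketch only, one common cross entropy form that matches these symbols is shown below. The averaging over $N$ and the use of the natural logarithm are assumptions and are not taken from the source; for a multi-class identity label, the product $y_i \log p_i$ would be read as a sum over the classes of a one-hot label.

```latex
L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log p_i
```

Under this form the loss decreases as the predicted probability assigned to the correct label of each sample approaches 1.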

In this alternative embodiment, the process of updating the network parameters of the first pedestrian image identification network and the second pedestrian image identification network again based on the second pedestrian image and the preset cross entropy loss function is a process of training the first pedestrian image identification network and the second pedestrian image identification network using the second pedestrian image and the preset cross entropy loss function.

It can be seen that, in implementing this alternative embodiment, after the labeled second pedestrian image and the triplet loss function are used to adjust the pedestrian image recognition model, the second pedestrian image and the cross entropy loss function are used to adjust the model again, so that the recognition accuracy of pedestrian re-recognition technology can be further improved.

Optionally, the training information of the pedestrian image recognition model produced by the training method may also be uploaded to a blockchain.

Specifically, the training information of the pedestrian image recognition model is obtained by running the training method of the pedestrian image recognition model and records the training status of the model, such as the acquired first pedestrian image, the data enhancement image, the extracted first anti-occlusion high-level semantic feature vector, the extracted second anti-occlusion high-level semantic feature vector, and the trained pedestrian image recognition model. Uploading the training information to the blockchain ensures its security as well as fairness and transparency to users. A user can download the training information from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

Example 2

Referring to fig. 3, fig. 3 is a schematic structural diagram of a training device for a pedestrian image recognition model according to an embodiment of the application. As shown in fig. 3, the training device of the pedestrian image recognition model may include:

an acquiring module 201, configured to acquire a first pedestrian image that has no corresponding annotation label;

the data enhancement module 202 is configured to add occlusion noise to the first pedestrian image based on a preset data enhancement method, so as to obtain a data enhancement image;

the analysis module 203 is configured to input the data enhanced image to a first pedestrian image recognition network for analysis, so as to extract a first anti-occlusion high-level semantic feature vector in the data enhanced image;

the analysis module 203 is further configured to input the data enhanced image to a second pedestrian image recognition network for analysis, so as to extract a second anti-occlusion high-level semantic feature vector in the data enhanced image;

the updating module 204 is configured to update network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector, and a preset loss function, so as to train a pedestrian image recognition model.

For a specific description of the training device of the pedestrian image recognition model, reference may be made to the specific description of the training method of the pedestrian image recognition model; to avoid repetition, it is not repeated here.
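
As an illustration of what the data enhancement module 202 does, the following is a minimal sketch that assumes the "preset data enhancement method" overwrites one random rectangle of the image with a constant value. The text above does not fix the method, so this erase policy, the function name `add_occlusion_noise`, and the parameters `max_fraction` and `fill` are assumptions. The image is represented as a plain list of pixel rows to keep the example self-contained.

```python
import random


def add_occlusion_noise(image, rng, max_fraction=0.5, fill=0):
    """Return a copy of `image` with one random rectangle set to `fill`.

    `image` is a list of rows of pixel values. The rectangle's height and
    width are drawn at random, up to `max_fraction` of each side (an
    assumed, illustrative policy).
    """
    height, width = len(image), len(image[0])
    box_h = rng.randint(1, max(1, int(height * max_fraction)))
    box_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    occluded = [row[:] for row in image]  # leave the input image unchanged
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            occluded[r][c] = fill
    return occluded


# A toy 6x4 "pedestrian image" whose pixels are all 255.
first_image = [[255] * 4 for _ in range(6)]
rng = random.Random(0)  # seeded so the sketch is reproducible
# Two independently occluded views of the same unlabeled image.
view_a = add_occlusion_noise(first_image, rng)
view_b = add_occlusion_noise(first_image, rng)
```

Each call returns a new occluded image and leaves the original untouched, so the same unlabeled first pedestrian image can yield several data enhancement images for the analysis module 203.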

Example III

Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the application. As shown in fig. 4, the computer device may include:

a memory 301 storing executable program code;

a processor 302 connected to the memory 301;

the processor 302 invokes the executable program code stored in the memory 301 to perform the steps in the training method of the pedestrian image recognition model disclosed in the first embodiment of the present application.

Example IV

Referring to fig. 5, an embodiment of the present application discloses a computer storage medium 401. The computer storage medium 401 stores computer instructions that, when invoked, execute the steps in the training method of the pedestrian image recognition model disclosed in the embodiment of the present application.

The apparatus embodiments described above are merely illustrative. The modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without undue burden.

From the above detailed description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solutions may be embodied, in essence or in part, in the form of a software product stored in a computer-readable storage medium, including Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disc memory, tape memory, or any other computer-readable medium that can be used to carry or store data.

Finally, it should be noted that the training method, device, computer equipment and storage medium of the pedestrian image recognition model disclosed in the embodiments of the application are only preferred embodiments of the application. They are used only to illustrate the technical solutions of the application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the various embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A method for training a pedestrian image recognition model, the method comprising:

updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function so as to train a pedestrian image recognition model, wherein in the training process, the network parameters in the two pedestrian image recognition networks are continuously updated through gradient backpropagation;

the pedestrian image recognition model comprises a first pedestrian image recognition network and a second pedestrian image recognition network, and network parameters are shared between the first pedestrian image recognition network and the second pedestrian image recognition network;

acquiring a second pedestrian image that has a preset corresponding annotation label;

wherein the triplet loss function is:

$$L_{\mathrm{triplet}} = \max\bigl(d(a,p) - d(a,n) + \mathrm{margin},\ 0\bigr)$$

wherein a is the sample corresponding to the second pedestrian image in a preset labeled training dataset, p is a sample randomly selected from the labeled training dataset that belongs to the same class as the second pedestrian image, n is a sample randomly selected from the labeled training dataset that belongs to a different class from the second pedestrian image, margin is a preset small boundary constant, $L_{\mathrm{triplet}}$ is the triplet loss, d(a, p) is the Euclidean distance between a and p, and d(a, n) is the Euclidean distance between a and n;
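
For illustration only, the triplet loss defined above can be evaluated for a single (a, p, n) triplet as in the following minimal sketch. The function names `euclidean_distance` and `triplet_loss`, the 2-D feature vectors, and the margin value 1.0 are assumptions chosen so the distances are easy to check by hand; they are not part of the claim.

```python
import math


def euclidean_distance(u, v):
    """d(u, v): Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """max(d(a, p) - d(a, n) + margin, 0) for one (a, p, n) triplet."""
    return max(
        euclidean_distance(anchor, positive)
        - euclidean_distance(anchor, negative)
        + margin,
        0.0,
    )


# Hypothetical 2-D feature vectors.
a = [0.0, 0.0]
p = [3.0, 4.0]  # d(a, p) = 5
n = [6.0, 8.0]  # d(a, n) = 10

# The negative is already farther than the positive by more than the margin:
# 5 - 10 + 1 < 0, so the loss is clamped to 0.
loss_satisfied = triplet_loss(a, p, n, margin=1.0)

# Swapping the roles violates the constraint: 10 - 5 + 1 = 6.
loss_violated = triplet_loss(a, n, p, margin=1.0)
```

The loss is zero once the same-class sample is closer to the anchor than the different-class sample by at least the margin, and positive otherwise.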

wherein the cross entropy loss function is:

2. The training method of the pedestrian image recognition model according to claim 1, wherein updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector, and a preset loss function to achieve training of the pedestrian image recognition model includes:

3. The method of training a pedestrian image recognition model of claim 2 wherein the data-enhanced image comprises a first data-enhanced sub-image and a second data-enhanced sub-image;

and, the loss function is:

4. The training method of the pedestrian image recognition model according to claim 1, wherein updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector, and a preset loss function to achieve training of the pedestrian image recognition model includes:

5. The method of training a pedestrian image recognition model of claim 4 wherein the data-enhanced image comprises a first data-enhanced sub-image and a second data-enhanced sub-image;

and, the loss function is:

6. A training device for a pedestrian image recognition model, the device comprising:

the updating module is used for updating network parameters of the first pedestrian image recognition network and the second pedestrian image recognition network based on the first anti-occlusion high-level semantic feature vector, the second anti-occlusion high-level semantic feature vector and a preset loss function so as to train a pedestrian image recognition model, and in the training process, the network parameters in the two pedestrian image recognition networks are continuously updated through gradient backpropagation;

the updating module is also used for acquiring a second pedestrian image that has a preset corresponding annotation label;

wherein the triplet loss function is:

$$L_{\mathrm{triplet}} = \max\bigl(d(a,p) - d(a,n) + \mathrm{margin},\ 0\bigr)$$

wherein a is the sample corresponding to the second pedestrian image in a preset labeled training dataset, p is a sample randomly selected from the labeled training dataset that belongs to the same class as the second pedestrian image, n is a sample randomly selected from the labeled training dataset that belongs to a different class from the second pedestrian image, margin is a preset small boundary constant, $L_{\mathrm{triplet}}$ is the triplet loss, d(a, p) is the Euclidean distance between a and p, and d(a, n) is the Euclidean distance between a and n;

wherein the cross entropy loss function is:

7. A computer device, the computer device comprising:

a memory storing executable program code;

a processor coupled to the memory;

the processor invokes the executable program code stored in the memory to perform the training method of the pedestrian image recognition model of any one of claims 1-5.

8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the training method of the pedestrian image recognition model according to any one of claims 1-5.