CN111582107B - Training method and recognition method of target re-recognition model, electronic equipment and device - Google Patents


Info

Publication number
CN111582107B
CN111582107B (application CN202010351798.XA)
Authority
CN
China
Prior art keywords
features
anchor point
local
sample
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010351798.XA
Other languages
Chinese (zh)
Other versions
CN111582107A (en)
Inventor
杨希
李平生
朱树磊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010351798.XA priority Critical patent/CN111582107B/en
Publication of CN111582107A publication Critical patent/CN111582107A/en
Application granted granted Critical
Publication of CN111582107B publication Critical patent/CN111582107B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method and a recognition method for a target re-recognition model, together with electronic equipment and a device. Anchor sample features are extracted, and anchor attention features are obtained from them; an anchor comprehensive feature is obtained from the anchor sample features and the anchor attention features. Similar sample features and heterogeneous sample features are extracted, and similar attention features and heterogeneous attention features are obtained from them; a similar comprehensive feature is obtained from the similar sample features and the similar attention features, and a heterogeneous comprehensive feature is obtained from the heterogeneous sample features and the heterogeneous attention features. The similar difference degree between the similar comprehensive feature and the anchor comprehensive feature, and the heterogeneous difference degree between the heterogeneous comprehensive feature and the anchor comprehensive feature, are then calculated, and the re-recognition model is trained with the goal that the similar difference degree is smaller than the heterogeneous difference degree. The target re-recognition model obtained by this training method suppresses background interference and local-occlusion interference during re-recognition, and can re-recognize partially occluded pedestrians.

Description

Training method and recognition method of target re-recognition model, electronic equipment and device
Technical Field
The application belongs to the technical field of target tracking, and particularly relates to a training method, a recognition method, electronic equipment and a device for a target re-recognition model.
Background
With accelerating urbanization, demands on public safety keep growing. Many important public places are covered by extensive camera networks, automatic monitoring using computer vision technology has become a focus of attention, and pedestrian re-identification has gradually become a research focus. Pedestrian re-identification is used to retrieve a target pedestrian across camera views. Because monitoring scenes are complex and changeable, collected pedestrian images often suffer from illumination changes, viewpoint and posture changes, occlusion, and other difficulties, which pose great challenges to pedestrian re-identification. How to identify target pedestrians in complex and changeable monitoring scenes is therefore a problem to be solved.
Disclosure of Invention
The application provides a training method, a recognition method, electronic equipment and a device for a target re-recognition model, which are used for solving the problem of recognizing target pedestrians in complex and changeable monitoring scenes.
In order to solve the technical problems, the application adopts a technical scheme that: a method of training a target re-recognition model, the re-recognition model comprising: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module; the training method comprises the following steps: inputting an anchor training sample into an anchor feature extraction module to obtain anchor sample features, and inputting the anchor sample features into an anchor attention module to obtain anchor attention features; obtaining anchor point comprehensive characteristics according to the anchor point sample characteristics and the anchor point attention characteristics; inputting the similar training samples of the anchor point training samples into an identification feature extraction module to obtain similar sample features, and inputting the similar sample features into an identification attention module to obtain similar attention features; obtaining similar comprehensive characteristics according to similar sample characteristics and similar attention characteristics; inputting the heterogeneous training sample of the anchor point training sample into an identification feature extraction module to obtain heterogeneous sample features, and inputting the heterogeneous sample features into an identification attention module to obtain heterogeneous attention features; obtaining a heterogeneous comprehensive characteristic according to the heterogeneous sample characteristic and the heterogeneous attention characteristic; calculating the homogeneous degree of difference between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference between the heterogeneous comprehensive 
features and the anchor point comprehensive features; and training the re-recognition model with the goal that the similar difference degree is smaller than the heterogeneous difference degree.
According to an embodiment of the present application, the re-recognition model further includes a recognition local extraction module connected to the recognition feature extraction module; the training method further comprises the following steps: inputting the similar sample characteristics into a recognition local extraction module to obtain similar local characteristics; inputting the heterogeneous sample characteristics into an identification local extraction module to obtain heterogeneous local characteristics; obtaining the similar comprehensive characteristics according to the similar sample characteristics and the similar attention characteristics, wherein the method comprises the following steps: fusing the same kind of local features and the same kind of attention features to obtain the same kind of comprehensive features; the obtaining the heterogeneous comprehensive characteristics according to the heterogeneous sample characteristics and the heterogeneous attention characteristics comprises the following steps: and fusing the heterogeneous local features and the heterogeneous attention features to obtain the heterogeneous comprehensive features.
According to an embodiment of the present application, the inputting the similar sample features into the identifying local extraction module to obtain similar local features includes: pooling and aligning the similar sample features and anchor point sample features to obtain the similar local features; the step of inputting the heterogeneous sample features into the identifying local extraction module to obtain heterogeneous local features includes: pooling and aligning the heterogeneous sample features and the anchor point sample features to obtain the heterogeneous local features.
According to an embodiment of the present application, the anchor feature extraction module and the identification feature extraction module are the fourth layer of the ResNet50 network.
According to an embodiment of the present application, the anchor training sample is a sample image including a training target, and the training target includes a vehicle or a pedestrian; the similar training samples are sample images containing the training targets; the heterogeneous training sample is a sample image that does not include the training target.
In order to solve the technical problems, the application adopts another technical scheme that: an identification method based on a target re-identification model, the target re-identification model comprising: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module; the identification method comprises the following steps: inputting the target image into an anchor point feature extraction module to obtain a target image feature, and inputting the target image feature into an anchor point attention module to obtain a target image attention feature; obtaining a target image comprehensive feature according to the target image feature and the target image attention feature; inputting the predicted image into a recognition feature extraction module to obtain predicted image features, and inputting the predicted image features into a recognition attention module to obtain predicted image attention features; obtaining predicted image comprehensive characteristics according to the predicted image characteristics and the predicted image attention characteristics; calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics; and identifying the target in the predicted image based on the similarity.
According to an embodiment of the present application, the target re-recognition model further includes a local extraction module connected to the recognition feature extraction module; the identification method further comprises the following steps: inputting the predicted image features into a recognition local extraction module to obtain predicted image local features; the obtaining the predicted image comprehensive feature according to the predicted image feature and the predicted image attention feature comprises the following steps: and fusing the local features of the predicted image and the attention features of the predicted image to obtain the comprehensive features of the predicted image.
According to an embodiment of the present application, the inputting the predicted image feature into the identifying local extraction module to obtain the predicted image local feature includes: and pooling and aligning the predicted image features and the target image features to obtain the predicted image local features.
According to an embodiment of the present application, the target re-recognition model is trained by any one of the training methods described above.
In order to solve the technical problems, the application adopts another technical scheme that: an electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement any one of the training methods described above and any one of the identification methods described above.
In order to solve the technical problems, the application adopts another technical scheme that: a computer readable storage medium having stored thereon program data which when executed by a processor implements any of the training methods described above and any of the identification methods described above.
The beneficial effects of the application are as follows: the re-recognition model obtained by the training method has a simple structure, a reduced parameter count, and a lighter weight. By optimizing with a ternary loss function so that the similar difference degree is smaller than the heterogeneous difference degree, the features extracted by the trained re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged better. The trained re-recognition model can be used to recognize vehicles or pedestrians; it fuses attention features with aligned local features, attending to the global features of the target while also attending to its local features, thereby suppressing background interference, human posture changes, and local-occlusion interference during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Drawings
For a clearer description of the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the description below are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a flow chart of an embodiment of a training method of a target re-recognition model according to the present application;
FIG. 2 is a schematic diagram of alignment of homogeneous sample features and anchor sample features in a training method of a target re-recognition model of the present application;
FIG. 3 is a flow chart of an embodiment of a target re-recognition model based recognition method of the present application;
FIG. 4 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 5 is a schematic diagram of a training apparatus for a target re-recognition model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of an identification device based on a target re-identification model according to the present application;
fig. 7 is a schematic structural diagram of an embodiment of the device with memory function of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a training method of a target re-recognition model according to the present application.
The embodiment of the application provides a training method of a target re-identification model, wherein the target re-identification model comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module. The training method comprises the following steps:
s101: and inputting the anchor training sample into an anchor feature extraction module to obtain anchor sample features, and inputting the anchor sample features into an anchor attention module to obtain anchor attention features.
The anchor point sample is a sample image containing a training target, and the training target can be a vehicle or a pedestrian. The anchor training sample is input into the anchor feature extraction module to obtain the anchor sample features. The anchor feature extraction module can select a ResNet-50 network model and use its fourth layer, ResNet50-4layers, to extract the anchor sample features. The anchor sample features output by ResNet50-4layers have a suitable receptive field, so they carry the context information of the target while retaining its local information and discrimination information. Selecting ResNet50-4layers can enhance the ability of the attention module to extract attention features, reduce dependence on feature layers with little target spatial-semantic sharing, and improve operation speed.
The anchor sample features are input into the anchor attention module to obtain the anchor attention features. The attention module comprises spatial attention and channel attention, which weight the anchor sample features to obtain the anchor attention features; the higher the similarity between an anchor sample feature and the key information, the higher its weight. The anchor attention module starts from the global features of the anchor sample features, searches for the key attention area of the target on the anchor sample feature map, and suppresses background influence irrelevant to the target.
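The spatial and channel attention weighting described above can be sketched minimally as follows. This is an illustration only: the function names are hypothetical, and the plain sigmoid gates stand in for the learned attention layers a real module would contain.

```python
import numpy as np

def channel_attention(feat):
    """Per-channel weights from global average pooling, squashed to (0, 1).

    feat: feature map of shape (C, H, W).
    """
    pooled = feat.mean(axis=(1, 2))          # (C,) global channel descriptor
    weights = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate per channel
    return weights[:, None, None]            # broadcastable to (C, H, W)

def spatial_attention(feat):
    """Per-location weights from the channel-wise mean map."""
    pooled = feat.mean(axis=0)               # (H, W) spatial descriptor
    return 1.0 / (1.0 + np.exp(-pooled))     # sigmoid gate per location

def attention_features(feat):
    """Weight the sample features by both attention maps, so that locations
    and channels resembling the key information get higher weight."""
    return feat * channel_attention(feat) * spatial_attention(feat)
```

In use, `attention_features` plays the role of the attention module: it maps sample features of shape (C, H, W) to attention features of the same shape.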
S102: and obtaining an anchor point comprehensive characteristic according to the anchor point sample characteristic and the anchor point attention characteristic.
And combining the anchor point sample characteristics and the anchor point attention characteristics to obtain anchor point comprehensive characteristics.
S103: the similar training samples of the anchor point training samples are input into the recognition feature extraction module to obtain similar sample features, and the similar sample features are input into the recognition attention module to obtain similar attention features.
The same class training sample is a sample image having the same training target as the anchor point training sample, e.g., the training target of the anchor point sample is a vehicle or pedestrian, then the same class training sample is a sample image containing the same vehicle or pedestrian. And inputting the similar training samples of the anchor point training samples into the recognition feature extraction module to obtain similar sample features.
The recognition feature extraction module selects a ResNet-50 network model, and selects a fourth layer ResNet50-4layers of the ResNet-50 network model to extract similar sample features. The homogeneous sample characteristics output by ResNet50-4layers have proper receptive fields, so that the method not only has the context information of the target, but also keeps the local information and the discrimination information of the target.
And inputting the similar sample characteristics into the recognition attention module to obtain similar attention characteristics. The attention module comprises a spatial attention and a channel attention, and is used for respectively identifying similar sample characteristics and giving the similar sample characteristics weight to obtain the similar attention characteristics, wherein the higher the similarity between the similar sample characteristics and key information is, the higher the weight is. The similar attention module starts from global features of similar sample features, searches for a focus attention area of a target on a similar sample feature map, and suppresses background influence irrelevant to the target.
S104: and obtaining the similar comprehensive characteristics according to the similar sample characteristics and the similar attention characteristics.
In one embodiment, homogeneous comprehensive features are derived from homogeneous sample features and homogeneous attention features. The same kind of comprehensive characteristics can be directly used for calculating the same kind of difference degree.
S105: the heterogeneous training samples of the anchor point training samples are input into the recognition feature extraction module to obtain heterogeneous sample features, and the heterogeneous sample features are input into the recognition attention module to obtain heterogeneous attention features.
The heterogeneous training samples are sample images which do not contain training targets of the anchor point training samples, for example, the training targets of the anchor point training samples are vehicles or pedestrians, the heterogeneous training samples are sample images which do not contain the same vehicles or pedestrians, and the heterogeneous training samples of the anchor point training samples are input into the recognition feature extraction module to obtain heterogeneous sample features.
The recognition feature extraction module selects a ResNet-50 network model, and selects a fourth layer ResNet50-4layers of the ResNet-50 network model to extract heterogeneous sample features. The heterogeneous sample characteristics output by ResNet50-4layers have proper receptive fields, so that the heterogeneous sample characteristics not only have the context information of the target, but also keep the local information and the discrimination information of the target.
The heterogeneous sample features are input into the recognition attention module to obtain the heterogeneous attention features. The attention module comprises a spatial attention and a channel attention, and respectively identifies and gives weight to the heterogeneous sample characteristics to obtain the heterogeneous attention characteristics, wherein the higher the similarity between the heterogeneous sample characteristics and the key information is, the higher the weight is.
S106: and obtaining the heterogeneous comprehensive characteristics according to the heterogeneous sample characteristics and the heterogeneous attention characteristics.
In one embodiment, the heterogeneous comprehensive features are derived from the heterogeneous sample features and the heterogeneous attention features. The heterogeneous comprehensive features can be directly used for the subsequent calculation of the heterogeneous difference degree.
It should be noted that steps S101 to S102, S103 to S104, and S105 to S106 may be performed simultaneously to improve computational efficiency.
Further, the re-identification model further comprises an identification local extraction module, and the identification local extraction module is connected with the identification feature extraction module.
The training method further comprises the following steps:
the method for inputting the similar sample features into the recognition local extraction module to obtain the similar local features comprises the following steps: pooling and aligning the similar sample features and anchor point sample features to obtain similar local features.
Specifically, the similar sample features and the anchor sample features are each subjected to horizontal pooling, and local feature maps are output for each; the local feature map of the similar sample and that of the anchor sample are aligned, an association matrix is calculated, and the shortest path is obtained through a dynamic programming method.
referring to fig. 2, fig. 2 is a schematic diagram illustrating alignment between a similar sample feature and an anchor sample feature in the training method of the target re-recognition model according to the present application.
The cosine distance of each local feature on the minimum path is compared with a threshold value; if the threshold requirement is met, the local feature is considered an aligned local feature. For an aligned similar local feature, its alignment vector entry can be marked as 1; for a non-aligned similar local feature, it can be marked as 0. In this way, the features of aligned areas are preserved, while the features of non-aligned or occluded areas are masked.
The alignment vector of the set of eight region features in fig. 2 is [1, 1, 1, 1, 1, 0, 0, 0], indicating that the first five local features are aligned.
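The pooling-and-alignment step can be sketched as follows. This is a simplified illustration rather than the patent's exact formulation: the function names, stripe counts, and threshold are assumptions, and the recurrence D[i, j] = dist[i, j] + min(D[i-1, j], D[i, j-1]) is the standard shortest-path dynamic program over a distance matrix.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between two sets of horizontally pooled
    stripe features. a: (n, d) anchor stripes, b: (m, d) sample stripes."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_n @ b_n.T                       # (n, m) association matrix

def shortest_path_alignment(dist):
    """Monotone shortest path through the distance matrix via dynamic
    programming: D[i, j] = dist[i, j] + min(D[i-1, j], D[i, j-1])."""
    n, m = dist.shape
    D = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = D[i, j - 1]
            elif j == 0:
                prev = D[i - 1, j]
            else:
                prev = min(D[i - 1, j], D[i, j - 1])
            D[i, j] = dist[i, j] + prev
    # Backtrack to recover which stripe pairs the minimum path visits.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while i > 0 or j > 0:
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        elif D[i - 1, j] <= D[i, j - 1]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return path[::-1]

def alignment_vector(dist, path, threshold=0.5):
    """Mark a stripe as aligned (1) when a cell on the minimum path meets
    the distance threshold, and as non-aligned (0) otherwise."""
    vec = np.zeros(dist.shape[0], dtype=int)
    for i, j in path:
        if dist[i, j] <= threshold:
            vec[i] = 1
    return vec
```

For two identical stripe sets the path runs corner to corner and every stripe is marked aligned, matching the role of the alignment vector described above.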
Further, step S104 obtains homogeneous comprehensive features according to homogeneous sample features and homogeneous attention features, including: s1041: and fusing the same kind of local features and the same kind of attention features to obtain the same kind of comprehensive features.
And fusing the similar local features and the similar attention features, and carrying out weight multiplication on the similar local features and the corresponding similar attention features to obtain fused similar comprehensive features.
Inputting the heterogeneous sample characteristics into an identification local extraction module to obtain heterogeneous local characteristics, wherein the method comprises the following steps: and pooling and aligning the heterogeneous sample characteristics and the anchor point sample characteristics to obtain the heterogeneous local characteristics. The specific obtaining process of the heterogeneous local features is approximately the same as that of the homogeneous local features, and will not be described herein.
Further, step S106 obtains a heterogeneous integrated feature according to the heterogeneous sample feature and the heterogeneous attention feature, including: s1061: and fusing the heterogeneous local features and the heterogeneous attention features to obtain the heterogeneous comprehensive features.
And fusing the heterogeneous local features and the heterogeneous attention features, and carrying out weight multiplication on the heterogeneous local features and the corresponding heterogeneous attention features to obtain fused heterogeneous comprehensive features.
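The weight-multiplication fusion used for both the similar and the heterogeneous branches can be sketched as below. The shapes and the use of the alignment vector as a mask are assumptions drawn from the description above, not a verbatim implementation.

```python
import numpy as np

def fuse(local_feats, attention_weights, alignment_vec):
    """Fuse aligned local features with attention features: each local
    (stripe) feature is scaled by its attention weight, and non-aligned
    stripes are zeroed out by the alignment vector mask.

    local_feats: (n, d) stripe features; attention_weights: (n,) weights;
    alignment_vec: (n,) 0/1 mask from the alignment step.
    """
    return local_feats * attention_weights[:, None] * alignment_vec[:, None]
```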
S107: and calculating the homogeneous degree of difference of the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference of the heterogeneous comprehensive features and the anchor point comprehensive features.
The similar comprehensive features and the heterogeneous comprehensive features obtained in step S104 and step S106 are two-dimensional features; they need to be mapped into one-dimensional features through a fully connected layer before the following calculation.
In this embodiment, cosine similarity is used as the difference measure, and the calculation formula is as follows:

Cosin(A, B) = Σᵢ(A_i · B_i) / (√Σᵢ(A_i²) · √Σᵢ(B_i²))

where A and B represent the vectors of the two comprehensive features brought into the calculation, such as the similar and anchor comprehensive features, or the heterogeneous and anchor comprehensive features, and A_i and B_i represent the components of vectors A and B, respectively.
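A direct numpy implementation of this measure (the function name mirrors the document's Cosin notation) could be:

```python
import numpy as np

def cosin(a, b):
    """Cosine similarity between two one-dimensional feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```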
S108: and training the re-recognition model with the goal that the similar difference degree is smaller than the heterogeneous difference degree.
In an embodiment, the re-recognition model adopts a ternary loss function, takes cosine similarity as a measurement mode, and optimizes the re-recognition model through a random gradient descent algorithm so that the similar difference degree of the similar comprehensive features and the anchor point comprehensive features is smaller than the heterogeneous difference degree of the heterogeneous comprehensive features and the anchor point comprehensive features. When training is carried out for a certain number of times and the preset standard is reached, the re-identification model training is completed and can be used for identifying the target.
Wherein the ternary loss function is as follows:
Triplet_loss = ||Cosin(A, N) − Cosin(A, P) + margin||
where Cosin(A, N) and Cosin(A, P) represent the heterogeneous difference degree and the similar difference degree, i.e., the cosine similarity between the heterogeneous comprehensive feature and the anchor comprehensive feature, and between the similar comprehensive feature and the anchor comprehensive feature, respectively.
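A numpy sketch of this ternary loss, interpreting the ||·|| notation as the standard hinge (clipping at zero); the default margin value is an illustrative assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style ternary (triplet) loss on cosine similarity: push the
    similar similarity Cosin(A, P) above the heterogeneous similarity
    Cosin(A, N) by at least `margin`; zero loss once the gap is achieved."""
    def cosin(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cosin(anchor, negative) - cosin(anchor, positive) + margin, 0.0)
```

When the positive matches the anchor and the negative is orthogonal to it, the gap already exceeds the margin and the loss is zero, which is the training target of step S108.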
The re-recognition model obtained by the above training method has a simple structure, a reduced parameter count, and a lighter weight. By optimizing with the ternary loss function so that the similar difference degree is smaller than the heterogeneous difference degree, the features extracted by the trained re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged better. The trained re-recognition model fuses the attention features and the aligned local features, attending to the global features of the target while also attending to its local features, thereby suppressing background interference, human posture changes, and local-occlusion interference during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of a target re-recognition model-based recognition method according to the present application.
Still another embodiment of the present application provides an identification method based on a target re-identification model, where the target re-identification model includes: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module. The identification method comprises the following steps:
s201: and inputting the target image into an anchor point feature extraction module to obtain target image features, and inputting the target image features into an anchor point attention module to obtain target image attention features.
The target image is an image containing the target to be identified; the target may be a vehicle or a pedestrian. The target image is input into the anchor point feature extraction module to obtain the target image features. The anchor point feature extraction module may use a ResNet-50 network, with the fourth layer (ResNet50-4layers) selected for feature extraction. The features output by ResNet50-4layers have a suitable receptive field, so they retain both the contextual information of the target and its local and discriminative information. Selecting ResNet50-4layers enhances the attention module's ability to extract attention features, reduces dependence on feature layers that share little of the target's spatial semantics, and improves running speed.
The target image features are input into the anchor point attention module to obtain the target image attention features. The attention module includes spatial attention and channel attention. Starting from the global features of the target image features, the anchor point attention module searches the target image feature map for the key attention regions of the target and suppresses background influences irrelevant to the target.
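The combination of channel attention and spatial attention described above can be sketched as follows. This is a heavily simplified, untrained illustration: real attention modules use learned MLP and convolution weights, whereas this sketch derives the weights directly from pooled statistics just to show the data flow:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap):
    # fmap: (C, H, W). Squeeze the spatial dimensions, then re-weight each channel.
    weights = sigmoid(fmap.mean(axis=(1, 2)))       # shape (C,)
    return fmap * weights[:, None, None]

def spatial_attention(fmap):
    # Pool across channels to obtain a per-location weight map,
    # emphasizing key regions and suppressing background locations.
    weights = sigmoid(fmap.mean(axis=0))            # shape (H, W)
    return fmap * weights[None, :, :]

def attention_module(fmap):
    # Channel attention followed by spatial attention, CBAM-style.
    return spatial_attention(channel_attention(fmap))
```

The output keeps the shape of the input feature map, so it can be fused with the other feature branches downstream.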
S202: and obtaining the comprehensive characteristics of the target image according to the characteristics of the target image and the attention characteristics of the target image.
And combining the target image characteristics and the target image attention characteristics to obtain target image comprehensive characteristics.
S203: the predicted image is input into the recognition feature extraction module to obtain predicted image features, and the predicted image features are input into the recognition attention module to obtain predicted image attention features.
There are multiple predicted images. The predicted images are input into the identification feature extraction module to obtain the predicted image features; the identification feature extraction module uses a ResNet-50 network, with the fourth layer (ResNet50-4layers) selected to extract the predicted image features.
The predicted image features are input into the recognition attention module to obtain predicted image attention features. The attention module includes spatial attention and channel attention.
S204: and obtaining the comprehensive characteristics of the predicted image according to the characteristics of the predicted image and the attention characteristics of the predicted image.
In one embodiment, the predicted image comprehensive features are derived from the predicted image features and the predicted image attention features. The predicted image comprehensive features can be used directly in the subsequent similarity calculation, so that partially occluded targets can be identified.
Further, the re-identification model further comprises an identification local extraction module, and the identification local extraction module is connected with the identification feature extraction module.
The identification method further comprises the following steps:
inputting the predicted image features into a recognition local extraction module to obtain the predicted image local features, wherein the method comprises the following steps: and pooling and aligning the predicted image features and the target image features to obtain the local features of the predicted image.
Specifically, the predicted image features and the target image features are each horizontally pooled, and local feature maps are output for each. The local feature map of the predicted image and the local feature map of the target image are then aligned: an association matrix is calculated, the shortest path through it is found by dynamic programming, and the cosine distances of the local features are computed to find the aligned local features, which are output as the predicted image local features. For an aligned predicted image local feature, its alignment vector may be marked 1; for a non-aligned predicted image local feature, its alignment vector may be marked 0. In this way the features of aligned regions are preserved, while the features of non-aligned or occluded regions are masked.
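The pooling-and-alignment step above can be sketched as follows. This is a non-authoritative sketch in the spirit of stripe-alignment re-identification methods: the stripe count, the permitted path moves (right, down, diagonal), and the small epsilon are all assumptions, not values given in the patent:

```python
import numpy as np

def horizontal_pool(fmap):
    # fmap: (C, H, W) -> one stripe feature per row, shape (H, C)
    return fmap.mean(axis=2).T

def cosine_dist(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def aligned_distance(stripes_a, stripes_b):
    # Association matrix of pairwise cosine distances between stripes.
    h1, h2 = len(stripes_a), len(stripes_b)
    cost = np.array([[cosine_dist(stripes_a[i], stripes_b[j])
                      for j in range(h2)] for i in range(h1)])
    # Dynamic programming: cheapest monotone path from (0, 0) to (h1-1, h2-1),
    # which aligns stripes even when the two bodies are vertically shifted.
    dp = np.full((h1, h2), np.inf)
    dp[0, 0] = cost[0, 0]
    for i in range(h1):
        for j in range(h2):
            if i == 0 and j == 0:
                continue
            prev = min(dp[i - 1, j] if i > 0 else np.inf,
                       dp[i, j - 1] if j > 0 else np.inf,
                       dp[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            dp[i, j] = cost[i, j] + prev
    return float(dp[-1, -1])
```

Stripes lying on the shortest path are the aligned local features; in a full implementation their alignment vectors would be marked 1 and the rest masked to 0.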
Further, step S204 obtains a predicted image integrated feature according to the predicted image feature and the predicted image attention feature, including: s2041: and fusing the local features of the predicted image and the attention features of the predicted image to obtain the comprehensive features of the predicted image.
The predicted image local features and the predicted image attention features are fused by weighted multiplication to obtain the fused predicted image comprehensive features.
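As an illustration of this fusion step, the sketch below combines the two feature branches with a scalar weight. The patent does not specify the fusion weight or whether the combination is additive or multiplicative, so the weighted-sum form and the 0.5 default here are assumptions:

```python
import numpy as np

def fuse_features(local_feat, attention_feat, weight=0.5):
    # Weighted combination of the aligned local features and the attention
    # features, producing the comprehensive feature used for matching.
    local_feat = np.asarray(local_feat, dtype=float)
    attention_feat = np.asarray(attention_feat, dtype=float)
    return weight * local_feat + (1.0 - weight) * attention_feat
```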
S205: and calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics.
The similarity between the target image comprehensive features and the predicted image comprehensive features is calculated. The predicted image comprehensive features obtained in step S204 are two-dimensional; they are mapped into one-dimensional features through a fully connected layer, after which the cosine similarity can be calculated directly.
The cosine similarity calculation formula is as follows:

Cosin(A, B) = (Σᵢ Aᵢ·Bᵢ) / (√(Σᵢ Aᵢ²) · √(Σᵢ Bᵢ²))

wherein A and B respectively represent the vectors of the target image comprehensive features and the predicted image comprehensive features brought into the calculation, and Aᵢ and Bᵢ represent the components of vectors A and B, respectively.
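The formula above translates directly into code; a minimal sketch, assuming the comprehensive features have already been flattened to 1-D vectors by the fully connected layer:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosin(A, B) = sum_i(A_i * B_i) / (sqrt(sum_i A_i^2) * sqrt(sum_i B_i^2))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```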
S206: and identifying the target in the predicted image based on the similarity.
Since there are multiple predicted images, the similarity between each predicted image and the target image can be calculated, and the predicted image with the highest similarity is selected as the best match. That predicted image and the target image are considered to show the same pedestrian, completing re-recognition of a non-aligned or occluded pedestrian.
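The ranking step above can be sketched as follows. This is an illustrative sketch assuming the comprehensive features are 1-D vectors; the function and variable names are hypothetical:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(target_feature, gallery_features):
    # Score every predicted-image feature against the target image feature
    # and return the index and similarity of the closest one.
    sims = [cosine_similarity(target_feature, g) for g in gallery_features]
    idx = int(np.argmax(sims))
    return idx, sims[idx]
```

In practice a threshold on the returned similarity would decide whether the best match is actually the same pedestrian or merely the least dissimilar candidate.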
The target re-identification model of the embodiment is obtained by training by any one of the training methods.
The re-recognition model has a simple structure, a reduced parameter count, and a lighter weight. Because the triplet loss optimization makes the homogeneous difference degree smaller than the heterogeneous difference degree, the features extracted by the re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged more reliably. The recognition method disclosed in the present application fuses the attention features with the aligned local features, attending both to the global features and to the local features of the target, thereby suppressing background interference, human posture changes, and local occlusion during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Referring to fig. 4, fig. 4 is a schematic diagram of a frame of an electronic device according to an embodiment of the application.
A further embodiment of the present application provides an electronic device 30, including a memory 31 and a processor 32 coupled to each other, where the processor 32 is configured to execute program instructions stored in the memory 31 to implement the training method of any of the embodiments and the identification method of any of the embodiments. In one particular implementation scenario, the electronic device 30 may include, but is not limited to, a microcomputer and a server; the electronic device 30 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 32 is configured to control itself and the memory 31 to implement the steps of any of the training method embodiments described above, or the steps of any of the identification method embodiments described above. The processor 32 may also be referred to as a CPU (Central Processing Unit). The processor 32 may be an integrated circuit chip having signal processing capabilities. The processor 32 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 32 may be jointly implemented by multiple integrated circuit chips.
Referring to fig. 5, fig. 5 is a schematic diagram of a training apparatus for a target re-recognition model according to an embodiment of the application.
Yet another embodiment of the present application provides a training apparatus 40 for a target re-recognition model, which includes an anchor feature extraction module 41, an anchor attention module 42, a recognition feature extraction module 45, a recognition attention module 46, and a processing module 43 and a calculation module 44.
The processing module 43 inputs the anchor point training samples into the anchor point feature extraction module 41 to obtain anchor point sample features, and inputs the anchor point sample features into the anchor point attention module 42 to obtain anchor point attention features. The processing module 43 obtains the anchor point comprehensive features from the anchor point sample features and the anchor point attention features. The processing module 43 inputs homogeneous training samples of the anchor point training samples into the recognition feature extraction module 45 to obtain homogeneous sample features, and inputs the homogeneous sample features into the recognition attention module 46 to obtain homogeneous attention features. The processing module 43 obtains the homogeneous comprehensive features from the homogeneous sample features and the homogeneous attention features. The processing module 43 inputs heterogeneous training samples of the anchor point training samples into the recognition feature extraction module 45 to obtain heterogeneous sample features, and inputs the heterogeneous sample features into the recognition attention module 46 to obtain heterogeneous attention features. The processing module 43 obtains the heterogeneous comprehensive features from the heterogeneous sample features and the heterogeneous attention features. The calculation module 44 calculates the homogeneous difference degree between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous difference degree between the heterogeneous comprehensive features and the anchor point comprehensive features, and trains the re-recognition model with the goal that the homogeneous difference degree is smaller than the heterogeneous difference degree.
The re-recognition model trained by the training device of the embodiment can inhibit the background interference, the human posture change and the interference of local shielding in the re-recognition process, and can re-recognize partially shielded and unaligned pedestrians.
Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of an identification device based on a target re-identification model according to the present application.
Yet another embodiment of the present application provides an identification device 50 based on a target re-identification model, which includes an anchor feature extraction module 51, an anchor attention module 52, an identification feature extraction module 55, an identification attention module 56, and a processing module 53 and a calculation module 54.
The processing module 53 inputs the target image to the anchor feature extraction module 51 to obtain target image features and inputs the target image features to the anchor attention module 52 to obtain target image attention features. The processing module 53 obtains a target image composite feature from the target image feature and the target image attention feature. The processing module 53 inputs the predicted image to the recognition feature extraction module 55 to obtain the predicted image features, and inputs the predicted image features to the recognition attention module 56 to obtain the predicted image attention features. The processing module 53 obtains predicted image composite features from the predicted image features and the predicted image attention features. The calculation module 54 calculates the similarity of the target image integrated feature and the predicted image integrated feature, and performs recognition of the target in the predicted image based on the similarity.
The recognition device of the embodiment can inhibit the background interference, the human posture change and the local shielding interference in the re-recognition process, and can re-recognize the partially shielded and unaligned pedestrians.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a device with memory function according to the present application.
Yet another embodiment of the present application provides a computer readable storage medium 60 having program data 61 stored thereon which, when executed by a processor, implements the training method of any of the above embodiments or the identification method of any of the above embodiments. With this scheme, background interference, human posture changes and local occlusion can be suppressed during re-recognition, and partially occluded and non-aligned pedestrians can be re-recognized.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium 60. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium 60, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium 60 includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing description is only illustrative of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (6)

1. A method of training a target re-recognition model, the re-recognition model comprising: an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module, an identification attention module connected with the identification feature extraction module, and an identification local extraction module connected with the identification feature extraction module; the training method comprises the following steps:
inputting an anchor point training sample image into the anchor point feature extraction module to obtain anchor point sample features, and inputting the anchor point sample features into the anchor point attention module to obtain anchor point attention features;
obtaining anchor point comprehensive characteristics according to the anchor point sample characteristics and the anchor point attention characteristics;
inputting the similar training sample images of the anchor point training sample images into the identification feature extraction module to obtain similar sample features, and inputting the similar sample features into the identification attention module to obtain similar attention features;
inputting the similar sample features into the identification local extraction module to obtain similar local features;
fusing the similar local features and the similar attention features to obtain similar comprehensive features;
inputting the heterogeneous training sample images of the anchor point training sample images into the recognition feature extraction module to obtain heterogeneous sample features, and inputting the heterogeneous sample features into the recognition attention module to obtain heterogeneous attention features;
inputting the heterogeneous sample characteristics into the identification local extraction module to obtain heterogeneous local characteristics;
fusing the heterogeneous local features and the heterogeneous attention features to obtain heterogeneous comprehensive features;
calculating the homogeneous degree of difference between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference between the heterogeneous comprehensive features and the anchor point comprehensive features;
training the re-recognition model by taking the similar difference degree smaller than the heterogeneous difference degree as a target;
the step of inputting the similar sample features into the identification local extraction module to obtain similar local features comprises the following steps: carrying out horizontal pooling on the similar sample characteristics and the anchor point sample characteristics, and respectively outputting local characteristic diagrams; aligning the local feature images of the similar sample features with the local feature images of the anchor point sample features, calculating an association matrix, calculating the cosine distance of the local feature images, which are passed by the minimum path, through a dynamic programming method, comparing the cosine distance with a threshold value, and obtaining the similar local features if the cosine distance meets the threshold value requirement;
the step of inputting the heterogeneous sample features into the identification local extraction module to obtain heterogeneous local features comprises the following steps: carrying out horizontal pooling on the heterogeneous sample characteristics and the anchor point sample characteristics, and respectively outputting local characteristic diagrams; and aligning the local feature map of the heterogeneous sample feature with the local feature map of the anchor point sample feature, calculating an association matrix, calculating the cosine distance of the local feature map passed by the minimum path through a dynamic programming method, comparing the cosine distance with a threshold value, and obtaining the heterogeneous local feature if the cosine distance meets the threshold value requirement.
2. The training method of claim 1, wherein the anchor feature extraction module and the identification feature extraction module are a fourth layer of a res net50 network.
3. The training method of any of claims 1-2, wherein the anchor training sample image is a sample image comprising a training target, the training target comprising a vehicle or a pedestrian; the similar training sample image is a sample image containing the training target; the heterogeneous training sample image is a sample image that does not include the training target.
4. An identification method based on a target re-identification model, wherein the target re-identification model comprises: an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module, an identification attention module connected with the identification feature extraction module, and an identification local extraction module connected with the identification feature extraction module; the target re-identification model is trained by the training method according to any one of claims 1 to 3; the identification method comprises the following steps:
inputting a target image into the anchor point feature extraction module to obtain a target image feature, and inputting the target image feature into the anchor point attention module to obtain a target image attention feature;
obtaining a target image comprehensive feature according to the target image feature and the target image attention feature;
inputting a predicted image into the recognition feature extraction module to obtain predicted image features, and inputting the predicted image features into the recognition attention module to obtain predicted image attention features;
inputting the predicted image features into the recognition local extraction module to obtain predicted image local features;
fusing the local features of the predicted image and the attention features of the predicted image to obtain comprehensive features of the predicted image;
calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics;
identifying targets in the predicted image based on the similarity;
the step of inputting the predicted image features into the recognition local extraction module to obtain the predicted image local features includes:
and respectively pooling and aligning the predicted image features and the target image features, respectively outputting local feature images, aligning the local feature images of the predicted image and the target image, calculating an association matrix, solving a shortest path by a dynamic programming method, and calculating cosine distances of the local features to find aligned local features to obtain the predicted image local features.
5. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the training method of any one of claims 1 to 3 or the identification method of claim 4.
6. A computer readable storage medium having stored thereon program data, which when executed by a processor implements the training method of any one of claims 1 to 3 or the identification method of claim 4.
CN202010351798.XA 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device Active CN111582107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010351798.XA CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010351798.XA CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Publications (2)

Publication Number Publication Date
CN111582107A CN111582107A (en) 2020-08-25
CN111582107B true CN111582107B (en) 2023-09-29

Family

ID=72124558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010351798.XA Active CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Country Status (1)

Country Link
CN (1) CN111582107B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197584A (en) * 2018-01-12 2018-06-22 武汉大学 A kind of recognition methods again of the pedestrian based on triple deep neural network
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN109145766A (en) * 2018-07-27 2019-01-04 北京旷视科技有限公司 Model training method, device, recognition methods, electronic equipment and storage medium
CN109214366A (en) * 2018-10-24 2019-01-15 北京旷视科技有限公司 Localized target recognition methods, apparatus and system again
CN109271878A (en) * 2018-08-24 2019-01-25 北京地平线机器人技术研发有限公司 Image-recognizing method, pattern recognition device and electronic equipment
CN109472248A (en) * 2018-11-22 2019-03-15 广东工业大学 A kind of pedestrian recognition methods, system and electronic equipment and storage medium again
CN109784166A (en) * 2018-12-13 2019-05-21 北京飞搜科技有限公司 The method and device that pedestrian identifies again
WO2019114726A1 (en) * 2017-12-14 2019-06-20 腾讯科技(深圳)有限公司 Image recognition method and device, electronic apparatus, and readable storage medium
WO2019128367A1 (en) * 2017-12-26 2019-07-04 广州广电运通金融电子股份有限公司 Face verification method and apparatus based on triplet loss, and computer device and storage medium
CN109977798A (en) * 2019-03-06 2019-07-05 中山大学 The exposure mask pond model training identified again for pedestrian and pedestrian's recognition methods again
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110363193A (en) * 2019-06-12 2019-10-22 北京百度网讯科技有限公司 Vehicle recognition methods, device, equipment and computer storage medium again
WO2019218826A1 (en) * 2018-05-17 2019-11-21 腾讯科技(深圳)有限公司 Image processing method and device, computer apparatus, and storage medium
WO2019231105A1 (en) * 2018-05-31 2019-12-05 한국과학기술원 Method and apparatus for learning deep learning model for ordinal classification problem by using triplet loss function
CN110717411A (en) * 2019-09-23 2020-01-21 湖北工业大学 Pedestrian re-identification method based on deep layer feature fusion
CN110765903A (en) * 2019-10-10 2020-02-07 浙江大华技术股份有限公司 Pedestrian re-identification method and device and storage medium
WO2020052513A1 (en) * 2018-09-14 2020-03-19 阿里巴巴集团控股有限公司 Image identification and pedestrian re-identification method and apparatus, and electronic and storage device
CN111027442A (en) * 2019-12-03 2020-04-17 腾讯科技(深圳)有限公司 Model training method, recognition method, device and medium for pedestrian re-recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9228865B2 (en) * 2012-04-23 2016-01-05 Xenon, Inc. Multi-analyzer and multi-reference sample validation system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Object detection and re-identification based on feature fusion; Zhai Yao; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; full text *
Fan Xing. Research on person re-identification methods in intelligent video surveillance. China Master's Theses Full-text Database, Information Science and Technology Series. 2020, pp. 1-13, 19-20, 25-28, 109-113. *

Also Published As

Publication number Publication date
CN111582107A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
Pal et al. Deep learning in multi-object detection and tracking: state of the art
Tian et al. A dual neural network for object detection in UAV images
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
Chen et al. Accurate and efficient traffic sign detection using discriminative adaboost and support vector regression
KR101528081B1 (en) Object recognition using incremental feature extraction
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN111414888A (en) Low-resolution face recognition method, system, device and storage medium
CN112580480B (en) Hyperspectral remote sensing image classification method and device
Li et al. 3D-DETNet: a single stage video-based vehicle detector
CN107315984B (en) Pedestrian retrieval method and device
Liu et al. Related HOG features for human detection using cascaded adaboost and SVM classifiers
He et al. Aggregating local context for accurate scene text detection
CN112668532A (en) Crowd counting method based on multi-stage mixed attention network
Wang et al. Multi‐scale pedestrian detection based on self‐attention and adaptively spatial feature fusion
Zhang et al. Multi-level and multi-scale horizontal pooling network for person re-identification
CN116630630B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
Mallick et al. Video retrieval using salient foreground region of motion vector based extracted keyframes and spatial pyramid matching
CN116958873A (en) Pedestrian tracking method, device, electronic equipment and readable storage medium
CN111582107B (en) Training method and recognition method of target re-recognition model, electronic equipment and device
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
Gawande et al. Scale invariant mask r-cnn for pedestrian detection
CN115063831A (en) High-performance pedestrian retrieval and re-identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant