CN111582107B - Training method and recognition method of target re-recognition model, electronic equipment and device - Google Patents


Info

Publication number
CN111582107B
CN111582107B (application CN202010351798.XA)
Authority
CN
China
Prior art keywords
features
anchor point
local
sample
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010351798.XA
Other languages
Chinese (zh)
Other versions
CN111582107A (en)
Inventor
杨希
李平生
朱树磊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010351798.XA priority Critical patent/CN111582107B/en
Publication of CN111582107A publication Critical patent/CN111582107A/en
Application granted granted Critical
Publication of CN111582107B publication Critical patent/CN111582107B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method and a recognition method for a target re-recognition model, together with electronic equipment and a device. Anchor sample features are extracted, and anchor attention features are obtained from them; an anchor comprehensive feature is obtained from the anchor sample features and the anchor attention features. Similar sample features and heterogeneous sample features are extracted, and similar attention features and heterogeneous attention features are obtained from them; a similar comprehensive feature is obtained from the similar sample features and the similar attention features, and a heterogeneous comprehensive feature is obtained from the heterogeneous sample features and the heterogeneous attention features. The similar difference degree between the similar comprehensive feature and the anchor comprehensive feature, and the heterogeneous difference degree between the heterogeneous comprehensive feature and the anchor comprehensive feature, are then calculated, and the re-recognition model is trained with the goal that the similar difference degree is smaller than the heterogeneous difference degree. The target re-recognition model obtained by this training method suppresses background interference and local-occlusion interference during re-recognition, and can re-recognize partially occluded pedestrians.

Description

Training method and recognition method of target re-recognition model, electronic equipment and device
Technical Field
The application belongs to the technical field of target tracking, and particularly relates to a training method, a recognition method, electronic equipment and a device for a target re-recognition model.
Background
With accelerating urbanization, demands on public safety keep growing. Many important public places are covered by extensive camera networks, automatic monitoring using computer vision technology has become a focus of attention, and pedestrian re-identification has gradually become a research focus. Pedestrian re-identification is used to retrieve a target pedestrian across camera views. Because monitoring scenes are complex and changeable, collected pedestrian images often suffer from illumination changes, viewpoint and posture changes, occlusion, and other difficulties, which pose great challenges to pedestrian re-identification. How to identify target pedestrians in complex and changeable monitoring scenes is therefore a problem to be solved.
Disclosure of Invention
The application provides a training method, a recognition method, electronic equipment and a device for a target re-recognition model, which are used for solving the problem of recognizing target pedestrians in complex and changeable monitoring scenes.
In order to solve the technical problems, the application adopts a technical scheme that: a method of training a target re-recognition model, the re-recognition model comprising: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module; the training method comprises the following steps: inputting an anchor training sample into an anchor feature extraction module to obtain anchor sample features, and inputting the anchor sample features into an anchor attention module to obtain anchor attention features; obtaining anchor point comprehensive characteristics according to the anchor point sample characteristics and the anchor point attention characteristics; inputting the similar training samples of the anchor point training samples into an identification feature extraction module to obtain similar sample features, and inputting the similar sample features into an identification attention module to obtain similar attention features; obtaining similar comprehensive characteristics according to similar sample characteristics and similar attention characteristics; inputting the heterogeneous training sample of the anchor point training sample into an identification feature extraction module to obtain heterogeneous sample features, and inputting the heterogeneous sample features into an identification attention module to obtain heterogeneous attention features; obtaining a heterogeneous comprehensive characteristic according to the heterogeneous sample characteristic and the heterogeneous attention characteristic; calculating the homogeneous degree of difference between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference between the heterogeneous comprehensive 
features and the anchor point comprehensive features; and training the re-recognition model with the goal that the similar difference degree is smaller than the heterogeneous difference degree.
According to an embodiment of the present application, the re-recognition model further includes a recognition local extraction module connected to the recognition feature extraction module; the training method further comprises the following steps: inputting the similar sample characteristics into a recognition local extraction module to obtain similar local characteristics; inputting the heterogeneous sample characteristics into an identification local extraction module to obtain heterogeneous local characteristics; obtaining the similar comprehensive characteristics according to the similar sample characteristics and the similar attention characteristics, wherein the method comprises the following steps: fusing the same kind of local features and the same kind of attention features to obtain the same kind of comprehensive features; the obtaining the heterogeneous comprehensive characteristics according to the heterogeneous sample characteristics and the heterogeneous attention characteristics comprises the following steps: and fusing the heterogeneous local features and the heterogeneous attention features to obtain the heterogeneous comprehensive features.
According to an embodiment of the present application, the inputting the similar sample features into the identifying local extraction module to obtain similar local features includes: pooling and aligning the similar sample features and anchor point sample features to obtain the similar local features; the step of inputting the heterogeneous sample features into the identifying local extraction module to obtain heterogeneous local features includes: pooling and aligning the heterogeneous sample features and the anchor point sample features to obtain the heterogeneous local features.
According to an embodiment of the present application, the anchor feature extraction module and the identification feature extraction module are the fourth layer of the ResNet50 network.
According to an embodiment of the present application, the anchor training sample is a sample image including a training target, and the training target includes a vehicle or a pedestrian; the similar training samples are sample images containing the training targets; the heterogeneous training sample is a sample image that does not include the training target.
In order to solve the technical problems, the application adopts another technical scheme that: an identification method based on a target re-identification model, the target re-identification model comprising: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module; the identification method comprises the following steps: inputting the target image into an anchor point feature extraction module to obtain a target image feature, and inputting the target image feature into an anchor point attention module to obtain a target image attention feature; obtaining a target image comprehensive feature according to the target image feature and the target image attention feature; inputting the predicted image into a recognition feature extraction module to obtain predicted image features, and inputting the predicted image features into a recognition attention module to obtain predicted image attention features; obtaining predicted image comprehensive characteristics according to the predicted image characteristics and the predicted image attention characteristics; calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics; and identifying the target in the predicted image based on the similarity.
According to an embodiment of the present application, the target re-recognition model further includes a local extraction module connected to the recognition feature extraction module; the identification method further comprises the following steps: inputting the predicted image features into a recognition local extraction module to obtain predicted image local features; the obtaining the predicted image comprehensive feature according to the predicted image feature and the predicted image attention feature comprises the following steps: and fusing the local features of the predicted image and the attention features of the predicted image to obtain the comprehensive features of the predicted image.
According to an embodiment of the present application, the inputting the predicted image feature into the identifying local extraction module to obtain the predicted image local feature includes: and pooling and aligning the predicted image features and the target image features to obtain the predicted image local features.
According to an embodiment of the present application, the target re-recognition model is trained by any one of the training methods described above.
In order to solve the technical problems, the application adopts another technical scheme that: an electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement any one of the training methods described above and any one of the identification methods described above.
In order to solve the technical problems, the application adopts another technical scheme that: a computer readable storage medium having stored thereon program data which when executed by a processor implements any of the training methods described above and any of the identification methods described above.
The beneficial effects of the application are as follows: the re-recognition model obtained by the training method has a simple structure, a reduced parameter count, and a lighter weight. By optimizing with a ternary loss function so that the similar difference degree is smaller than the heterogeneous difference degree, the features extracted by the trained re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged better. The trained re-recognition model can be used to recognize vehicles or pedestrians; it fuses attention features with aligned local features, attending to the global features of the target while also attending to its local features, thereby suppressing background interference, human posture changes, and local-occlusion interference during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Drawings
For a clearer description of the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the description below are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a flow chart of an embodiment of a training method of a target re-recognition model according to the present application;
FIG. 2 is a schematic diagram of alignment of homogeneous sample features and anchor sample features in a training method of a target re-recognition model of the present application;
FIG. 3 is a flow chart of an embodiment of a target re-recognition model based recognition method of the present application;
FIG. 4 is a schematic diagram of a frame of an embodiment of an electronic device of the present application;
FIG. 5 is a schematic diagram of a training apparatus for a target re-recognition model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an embodiment of an identification device based on a target re-identification model according to the present application;
fig. 7 is a schematic structural diagram of an embodiment of the device with memory function of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of a training method of a target re-recognition model according to the present application.
The embodiment of the application provides a training method of a target re-identification model, wherein the target re-identification model comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module. The training method comprises the following steps:
s101: and inputting the anchor training sample into an anchor feature extraction module to obtain anchor sample features, and inputting the anchor sample features into an anchor attention module to obtain anchor attention features.
The anchor point sample is a sample image containing a training target, and the training target can be a vehicle or a pedestrian. The anchor training sample is input into the anchor feature extraction module to obtain the anchor sample features. The anchor feature extraction module can select a ResNet-50 network model and use its fourth layer, ResNet50-4layers, to extract the anchor sample features. The anchor sample features output by ResNet50-4layers have a suitable receptive field, so they carry the context information of the target while retaining its local information and discrimination information. Selecting ResNet50-4layers can enhance the ability of the attention module to extract attention features, reduce dependence on feature layers with little target spatial-semantic sharing, and improve operation speed.
The anchor sample features are input into the anchor attention module to obtain the anchor attention features. The attention module comprises spatial attention and channel attention, which weight the anchor sample features to obtain the anchor attention features; the higher the similarity between an anchor sample feature and the key information, the higher its weight. The anchor attention module starts from the global features of the anchor sample features, searches for the key attention area of the target on the anchor sample feature map, and suppresses background influence irrelevant to the target.
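The spatial and channel attention weighting described above can be sketched minimally as follows. This is an illustration only: the function names are hypothetical, and the plain sigmoid gates stand in for the learned attention layers a real module would contain.

```python
import numpy as np

def channel_attention(feat):
    """Per-channel weights from global average pooling, squashed to (0, 1).

    feat: feature map of shape (C, H, W).
    """
    pooled = feat.mean(axis=(1, 2))          # (C,) global channel descriptor
    weights = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate per channel
    return weights[:, None, None]            # broadcastable to (C, H, W)

def spatial_attention(feat):
    """Per-location weights from the channel-wise mean map."""
    pooled = feat.mean(axis=0)               # (H, W) spatial descriptor
    return 1.0 / (1.0 + np.exp(-pooled))     # sigmoid gate per location

def attention_features(feat):
    """Weight the sample features by both attention maps, so that locations
    and channels resembling the key information get higher weight."""
    return feat * channel_attention(feat) * spatial_attention(feat)
```

In use, `attention_features` plays the role of the attention module: it maps sample features of shape (C, H, W) to attention features of the same shape.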
S102: and obtaining an anchor point comprehensive characteristic according to the anchor point sample characteristic and the anchor point attention characteristic.
And combining the anchor point sample characteristics and the anchor point attention characteristics to obtain anchor point comprehensive characteristics.
S103: the similar training samples of the anchor point training samples are input into the recognition feature extraction module to obtain similar sample features, and the similar sample features are input into the recognition attention module to obtain similar attention features.
The same class training sample is a sample image having the same training target as the anchor point training sample, e.g., the training target of the anchor point sample is a vehicle or pedestrian, then the same class training sample is a sample image containing the same vehicle or pedestrian. And inputting the similar training samples of the anchor point training samples into the recognition feature extraction module to obtain similar sample features.
The recognition feature extraction module selects a ResNet-50 network model, and selects a fourth layer ResNet50-4layers of the ResNet-50 network model to extract similar sample features. The homogeneous sample characteristics output by ResNet50-4layers have proper receptive fields, so that the method not only has the context information of the target, but also keeps the local information and the discrimination information of the target.
And inputting the similar sample characteristics into the recognition attention module to obtain similar attention characteristics. The attention module comprises a spatial attention and a channel attention, and is used for respectively identifying similar sample characteristics and giving the similar sample characteristics weight to obtain the similar attention characteristics, wherein the higher the similarity between the similar sample characteristics and key information is, the higher the weight is. The similar attention module starts from global features of similar sample features, searches for a focus attention area of a target on a similar sample feature map, and suppresses background influence irrelevant to the target.
S104: and obtaining the similar comprehensive characteristics according to the similar sample characteristics and the similar attention characteristics.
In one embodiment, homogeneous comprehensive features are derived from homogeneous sample features and homogeneous attention features. The same kind of comprehensive characteristics can be directly used for calculating the same kind of difference degree.
S105: the heterogeneous training samples of the anchor point training samples are input into the recognition feature extraction module to obtain heterogeneous sample features, and the heterogeneous sample features are input into the recognition attention module to obtain heterogeneous attention features.
The heterogeneous training samples are sample images which do not contain training targets of the anchor point training samples, for example, the training targets of the anchor point training samples are vehicles or pedestrians, the heterogeneous training samples are sample images which do not contain the same vehicles or pedestrians, and the heterogeneous training samples of the anchor point training samples are input into the recognition feature extraction module to obtain heterogeneous sample features.
The recognition feature extraction module selects a ResNet-50 network model, and selects a fourth layer ResNet50-4layers of the ResNet-50 network model to extract heterogeneous sample features. The heterogeneous sample characteristics output by ResNet50-4layers have proper receptive fields, so that the heterogeneous sample characteristics not only have the context information of the target, but also keep the local information and the discrimination information of the target.
The heterogeneous sample features are input into the recognition attention module to obtain the heterogeneous attention features. The attention module comprises a spatial attention and a channel attention, and respectively identifies and gives weight to the heterogeneous sample characteristics to obtain the heterogeneous attention characteristics, wherein the higher the similarity between the heterogeneous sample characteristics and the key information is, the higher the weight is.
S106: and obtaining the heterogeneous comprehensive characteristics according to the heterogeneous sample characteristics and the heterogeneous attention characteristics.
In one embodiment, the heterogeneous comprehensive features are derived from the heterogeneous sample features and the heterogeneous attention features. The heterogeneous comprehensive features can be directly used for the subsequent calculation of the heterogeneous difference degree.
It should be noted that steps S101 to S102, S103 to S104, and S105 to S106 may be performed simultaneously to improve computational efficiency.
Further, the re-identification model further comprises an identification local extraction module, and the identification local extraction module is connected with the identification feature extraction module.
The training method further comprises the following steps:
the method for inputting the similar sample features into the recognition local extraction module to obtain the similar local features comprises the following steps: pooling and aligning the similar sample features and anchor point sample features to obtain similar local features.
Specifically, the similar sample features and the anchor sample features are each subjected to horizontal pooling, and local feature maps are output for each; the local feature map of the similar sample and that of the anchor sample are aligned, an association matrix is calculated, and the shortest path is obtained through a dynamic programming method.
referring to fig. 2, fig. 2 is a schematic diagram illustrating alignment between a similar sample feature and an anchor sample feature in the training method of the target re-recognition model according to the present application.
The cosine distance of each local feature on the minimum path is compared with a threshold value; if the threshold requirement is met, the local feature is considered an aligned local feature. For an aligned similar local feature, its alignment vector entry can be marked as 1; for a non-aligned similar local feature, it can be marked as 0. In this way, the features of aligned areas are preserved, while the features of non-aligned or occluded areas are masked.
The alignment vector of the set of eight region features in fig. 2 is [1, 1, 1, 1, 1, 0, 0, 0], indicating that the first five local features are aligned.
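The pooling-and-alignment step can be sketched as follows. This is a simplified illustration rather than the patent's exact formulation: the function names, stripe counts, and threshold are assumptions, and the recurrence D[i, j] = dist[i, j] + min(D[i-1, j], D[i, j-1]) is the standard shortest-path dynamic program over a distance matrix.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Pairwise cosine distance between two sets of horizontally pooled
    stripe features. a: (n, d) anchor stripes, b: (m, d) sample stripes."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_n @ b_n.T                       # (n, m) association matrix

def shortest_path_alignment(dist):
    """Monotone shortest path through the distance matrix via dynamic
    programming: D[i, j] = dist[i, j] + min(D[i-1, j], D[i, j-1])."""
    n, m = dist.shape
    D = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = D[i, j - 1]
            elif j == 0:
                prev = D[i - 1, j]
            else:
                prev = min(D[i - 1, j], D[i, j - 1])
            D[i, j] = dist[i, j] + prev
    # Backtrack to recover which stripe pairs the minimum path visits.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while i > 0 or j > 0:
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        elif D[i - 1, j] <= D[i, j - 1]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return path[::-1]

def alignment_vector(dist, path, threshold=0.5):
    """Mark a stripe as aligned (1) when a cell on the minimum path meets
    the distance threshold, and as non-aligned (0) otherwise."""
    vec = np.zeros(dist.shape[0], dtype=int)
    for i, j in path:
        if dist[i, j] <= threshold:
            vec[i] = 1
    return vec
```

For two identical stripe sets the path runs corner to corner and every stripe is marked aligned, matching the role of the alignment vector described above.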
Further, step S104 obtains homogeneous comprehensive features according to homogeneous sample features and homogeneous attention features, including: s1041: and fusing the same kind of local features and the same kind of attention features to obtain the same kind of comprehensive features.
And fusing the similar local features and the similar attention features, and carrying out weight multiplication on the similar local features and the corresponding similar attention features to obtain fused similar comprehensive features.
Inputting the heterogeneous sample characteristics into an identification local extraction module to obtain heterogeneous local characteristics, wherein the method comprises the following steps: and pooling and aligning the heterogeneous sample characteristics and the anchor point sample characteristics to obtain the heterogeneous local characteristics. The specific obtaining process of the heterogeneous local features is approximately the same as that of the homogeneous local features, and will not be described herein.
Further, step S106 obtains a heterogeneous integrated feature according to the heterogeneous sample feature and the heterogeneous attention feature, including: s1061: and fusing the heterogeneous local features and the heterogeneous attention features to obtain the heterogeneous comprehensive features.
And fusing the heterogeneous local features and the heterogeneous attention features, and carrying out weight multiplication on the heterogeneous local features and the corresponding heterogeneous attention features to obtain fused heterogeneous comprehensive features.
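The weight-multiplication fusion used for both the similar and the heterogeneous branches can be sketched as below. The shapes and the use of the alignment vector as a mask are assumptions drawn from the description above, not a verbatim implementation.

```python
import numpy as np

def fuse(local_feats, attention_weights, alignment_vec):
    """Fuse aligned local features with attention features: each local
    (stripe) feature is scaled by its attention weight, and non-aligned
    stripes are zeroed out by the alignment vector mask.

    local_feats: (n, d) stripe features; attention_weights: (n,) weights;
    alignment_vec: (n,) 0/1 mask from the alignment step.
    """
    return local_feats * attention_weights[:, None] * alignment_vec[:, None]
```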
S107: and calculating the homogeneous degree of difference of the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference of the heterogeneous comprehensive features and the anchor point comprehensive features.
The similar comprehensive features and the heterogeneous comprehensive features obtained in step S104 and step S106 are two-dimensional features; they need to be mapped into one-dimensional features through a fully connected layer before the following calculation.
In this embodiment, cosine similarity is used as the difference measure, and the calculation formula is as follows:

Cosin(A, B) = Σᵢ(A_i · B_i) / (√Σᵢ(A_i²) · √Σᵢ(B_i²))

where A and B represent the vectors of the two comprehensive features brought into the calculation, such as the similar and anchor comprehensive features, or the heterogeneous and anchor comprehensive features, and A_i and B_i represent the components of vectors A and B, respectively.
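A direct numpy implementation of this measure (the function name mirrors the document's Cosin notation) could be:

```python
import numpy as np

def cosin(a, b):
    """Cosine similarity between two one-dimensional feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```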
S108: and training the re-recognition model with the goal that the similar difference degree is smaller than the heterogeneous difference degree.
In an embodiment, the re-recognition model adopts a ternary loss function, takes cosine similarity as a measurement mode, and optimizes the re-recognition model through a random gradient descent algorithm so that the similar difference degree of the similar comprehensive features and the anchor point comprehensive features is smaller than the heterogeneous difference degree of the heterogeneous comprehensive features and the anchor point comprehensive features. When training is carried out for a certain number of times and the preset standard is reached, the re-identification model training is completed and can be used for identifying the target.
Wherein the ternary loss function is as follows:
Triplet_loss = ||Cosin(A, N) − Cosin(A, P) + margin||
where Cosin(A, N) and Cosin(A, P) represent the heterogeneous difference degree and the similar difference degree, i.e., the cosine similarity between the heterogeneous comprehensive feature and the anchor comprehensive feature, and between the similar comprehensive feature and the anchor comprehensive feature, respectively.
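A numpy sketch of this ternary loss, interpreting the ||·|| notation as the standard hinge (clipping at zero); the default margin value is an illustrative assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style ternary (triplet) loss on cosine similarity: push the
    similar similarity Cosin(A, P) above the heterogeneous similarity
    Cosin(A, N) by at least `margin`; zero loss once the gap is achieved."""
    def cosin(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cosin(anchor, negative) - cosin(anchor, positive) + margin, 0.0)
```

When the positive matches the anchor and the negative is orthogonal to it, the gap already exceeds the margin and the loss is zero, which is the training target of step S108.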
The re-recognition model obtained by the above training method has a simple structure, a reduced parameter count, and a lighter weight. By optimizing with the ternary loss function so that the similar difference degree is smaller than the heterogeneous difference degree, the features extracted by the trained re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged better. The trained re-recognition model fuses the attention features and the aligned local features, attending to the global features of the target while also attending to its local features, thereby suppressing background interference, human posture changes, and local-occlusion interference during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of a target re-recognition model-based recognition method according to the present application.
Still another embodiment of the present application provides an identification method based on a target re-identification model, where the target re-identification model includes: the system comprises an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module and an identification attention module connected with the identification feature extraction module. The identification method comprises the following steps:
s201: and inputting the target image into an anchor point feature extraction module to obtain target image features, and inputting the target image features into an anchor point attention module to obtain target image attention features.
The target image is an image containing the target to be identified; the target may be a vehicle or a pedestrian. The target image is input into the anchor point feature extraction module to obtain the target image features. The anchor point feature extraction module may use a ResNet-50 network, with the fourth layer (ResNet50-4layers) selected for feature extraction. The features output by ResNet50-4layers have a suitable receptive field, so they retain both the contextual information of the target and its local and discriminative information. Selecting ResNet50-4layers enhances the attention module's ability to extract attention features, reduces dependence on feature layers that share little of the target's spatial semantics, and improves running speed.
The target image features are input into the anchor point attention module to obtain the target image attention features. The attention module includes spatial attention and channel attention. Starting from the global features of the target image features, the anchor point attention module searches the target image feature map for the key attention regions of the target and suppresses background influences irrelevant to the target.
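The combination of channel attention and spatial attention described above can be sketched as follows. This is a heavily simplified, untrained illustration: real attention modules use learned MLP and convolution weights, whereas this sketch derives the weights directly from pooled statistics just to show the data flow:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap):
    # fmap: (C, H, W). Squeeze the spatial dimensions, then re-weight each channel.
    weights = sigmoid(fmap.mean(axis=(1, 2)))       # shape (C,)
    return fmap * weights[:, None, None]

def spatial_attention(fmap):
    # Pool across channels to obtain a per-location weight map,
    # emphasizing key regions and suppressing background locations.
    weights = sigmoid(fmap.mean(axis=0))            # shape (H, W)
    return fmap * weights[None, :, :]

def attention_module(fmap):
    # Channel attention followed by spatial attention, CBAM-style.
    return spatial_attention(channel_attention(fmap))
```

The output keeps the shape of the input feature map, so it can be fused with the other feature branches downstream.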
S202: and obtaining the comprehensive characteristics of the target image according to the characteristics of the target image and the attention characteristics of the target image.
And combining the target image characteristics and the target image attention characteristics to obtain target image comprehensive characteristics.
S203: the predicted image is input into the recognition feature extraction module to obtain predicted image features, and the predicted image features are input into the recognition attention module to obtain predicted image attention features.
There are multiple predicted images. The predicted images are input into the identification feature extraction module to obtain the predicted image features; the identification feature extraction module uses a ResNet-50 network, with the fourth layer (ResNet50-4layers) selected to extract the predicted image features.
The predicted image features are input into the recognition attention module to obtain predicted image attention features. The attention module includes spatial attention and channel attention.
S204: and obtaining the comprehensive characteristics of the predicted image according to the characteristics of the predicted image and the attention characteristics of the predicted image.
In one embodiment, the predicted image comprehensive features are derived from the predicted image features and the predicted image attention features. The predicted image comprehensive features can be used directly in the subsequent similarity calculation, so that partially occluded targets can be identified.
Further, the re-identification model further comprises an identification local extraction module, and the identification local extraction module is connected with the identification feature extraction module.
The identification method further comprises the following steps:
inputting the predicted image features into a recognition local extraction module to obtain the predicted image local features, wherein the method comprises the following steps: and pooling and aligning the predicted image features and the target image features to obtain the local features of the predicted image.
Specifically, the predicted image features and the target image features are each horizontally pooled, and local feature maps are output for each. The local feature map of the predicted image and the local feature map of the target image are then aligned: an association matrix is calculated, the shortest path through it is found by dynamic programming, and the cosine distances of the local features are computed to find the aligned local features, which are output as the predicted image local features. For an aligned predicted image local feature, its alignment vector may be marked 1; for a non-aligned predicted image local feature, its alignment vector may be marked 0. In this way the features of aligned regions are preserved, while the features of non-aligned or occluded regions are masked.
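The pooling-and-alignment step above can be sketched as follows. This is a non-authoritative sketch in the spirit of stripe-alignment re-identification methods: the stripe count, the permitted path moves (right, down, diagonal), and the small epsilon are all assumptions, not values given in the patent:

```python
import numpy as np

def horizontal_pool(fmap):
    # fmap: (C, H, W) -> one stripe feature per row, shape (H, C)
    return fmap.mean(axis=2).T

def cosine_dist(a, b):
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def aligned_distance(stripes_a, stripes_b):
    # Association matrix of pairwise cosine distances between stripes.
    h1, h2 = len(stripes_a), len(stripes_b)
    cost = np.array([[cosine_dist(stripes_a[i], stripes_b[j])
                      for j in range(h2)] for i in range(h1)])
    # Dynamic programming: cheapest monotone path from (0, 0) to (h1-1, h2-1),
    # which aligns stripes even when the two bodies are vertically shifted.
    dp = np.full((h1, h2), np.inf)
    dp[0, 0] = cost[0, 0]
    for i in range(h1):
        for j in range(h2):
            if i == 0 and j == 0:
                continue
            prev = min(dp[i - 1, j] if i > 0 else np.inf,
                       dp[i, j - 1] if j > 0 else np.inf,
                       dp[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            dp[i, j] = cost[i, j] + prev
    return float(dp[-1, -1])
```

Stripes lying on the shortest path are the aligned local features; in a full implementation their alignment vectors would be marked 1 and the rest masked to 0.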
Further, step S204 obtains a predicted image integrated feature according to the predicted image feature and the predicted image attention feature, including: s2041: and fusing the local features of the predicted image and the attention features of the predicted image to obtain the comprehensive features of the predicted image.
The predicted image local features and the predicted image attention features are fused by weighted multiplication to obtain the fused predicted image comprehensive features.
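As an illustration of this fusion step, the sketch below combines the two feature branches with a scalar weight. The patent does not specify the fusion weight or whether the combination is additive or multiplicative, so the weighted-sum form and the 0.5 default here are assumptions:

```python
import numpy as np

def fuse_features(local_feat, attention_feat, weight=0.5):
    # Weighted combination of the aligned local features and the attention
    # features, producing the comprehensive feature used for matching.
    local_feat = np.asarray(local_feat, dtype=float)
    attention_feat = np.asarray(attention_feat, dtype=float)
    return weight * local_feat + (1.0 - weight) * attention_feat
```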
S205: and calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics.
The similarity between the target image comprehensive features and the predicted image comprehensive features is calculated. The predicted image comprehensive features obtained in step S204 are two-dimensional; they are mapped into one-dimensional features through a fully connected layer, after which the cosine similarity can be calculated directly.
The cosine similarity calculation formula is as follows:

Cosin(A, B) = (Σᵢ Aᵢ·Bᵢ) / (√(Σᵢ Aᵢ²) · √(Σᵢ Bᵢ²))

wherein A and B respectively represent the vectors of the target image comprehensive features and the predicted image comprehensive features brought into the calculation, and Aᵢ and Bᵢ represent the components of vectors A and B, respectively.
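The formula above translates directly into code; a minimal sketch, assuming the comprehensive features have already been flattened to 1-D vectors by the fully connected layer:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosin(A, B) = sum_i(A_i * B_i) / (sqrt(sum_i A_i^2) * sqrt(sum_i B_i^2))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```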
S206: and identifying the target in the predicted image based on the similarity.
Since there are multiple predicted images, the similarity between each predicted image and the target image can be calculated, and the predicted image with the highest similarity is selected as the best match. That predicted image and the target image are considered to show the same pedestrian, completing re-recognition of a non-aligned or occluded pedestrian.
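The ranking step above can be sketched as follows. This is an illustrative sketch assuming the comprehensive features are 1-D vectors; the function and variable names are hypothetical:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(target_feature, gallery_features):
    # Score every predicted-image feature against the target image feature
    # and return the index and similarity of the closest one.
    sims = [cosine_similarity(target_feature, g) for g in gallery_features]
    idx = int(np.argmax(sims))
    return idx, sims[idx]
```

In practice a threshold on the returned similarity would decide whether the best match is actually the same pedestrian or merely the least dissimilar candidate.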
The target re-identification model of the embodiment is obtained by training by any one of the training methods.
The re-recognition model has a simple structure, a reduced parameter count, and a lighter weight. Because the triplet loss optimization makes the homogeneous difference degree smaller than the heterogeneous difference degree, the features extracted by the re-recognition model have low inter-class coupling and high intra-class aggregation, so whether features are similar can be judged more reliably. The recognition method disclosed in the present application fuses the attention features with the aligned local features, attending both to the global features and to the local features of the target, thereby suppressing background interference, human posture changes, and local occlusion during re-recognition, and enabling re-recognition of partially occluded and non-aligned pedestrians.
Referring to fig. 4, fig. 4 is a schematic diagram of a frame of an electronic device according to an embodiment of the application.
A further embodiment of the present application provides an electronic device 30, including a memory 31 and a processor 32 coupled to each other, where the processor 32 is configured to execute program instructions stored in the memory 31 to implement the training method of any of the embodiments and the identification method of any of the embodiments. In one particular implementation scenario, the electronic device 30 may include, but is not limited to, a microcomputer and a server; the electronic device 30 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 32 is configured to control itself and the memory 31 to implement the steps of any of the training method embodiments described above, or the steps of any of the identification method embodiments described above. The processor 32 may also be referred to as a CPU (Central Processing Unit). The processor 32 may be an integrated circuit chip having signal processing capabilities. The processor 32 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 32 may be jointly implemented by multiple integrated circuit chips.
Referring to fig. 5, fig. 5 is a schematic diagram of a training apparatus for a target re-recognition model according to an embodiment of the application.
Yet another embodiment of the present application provides a training apparatus 40 for a target re-recognition model, which includes an anchor feature extraction module 41, an anchor attention module 42, a recognition feature extraction module 45, a recognition attention module 46, and a processing module 43 and a calculation module 44.
The processing module 43 inputs the anchor point training samples into the anchor point feature extraction module 41 to obtain anchor point sample features, and inputs the anchor point sample features into the anchor point attention module 42 to obtain anchor point attention features. The processing module 43 obtains the anchor point comprehensive features from the anchor point sample features and the anchor point attention features. The processing module 43 inputs homogeneous training samples of the anchor point training samples into the recognition feature extraction module 45 to obtain homogeneous sample features, and inputs the homogeneous sample features into the recognition attention module 46 to obtain homogeneous attention features. The processing module 43 obtains the homogeneous comprehensive features from the homogeneous sample features and the homogeneous attention features. The processing module 43 inputs heterogeneous training samples of the anchor point training samples into the recognition feature extraction module 45 to obtain heterogeneous sample features, and inputs the heterogeneous sample features into the recognition attention module 46 to obtain heterogeneous attention features. The processing module 43 obtains the heterogeneous comprehensive features from the heterogeneous sample features and the heterogeneous attention features. The calculation module 44 calculates the homogeneous difference degree between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous difference degree between the heterogeneous comprehensive features and the anchor point comprehensive features, and trains the re-recognition model with the goal that the homogeneous difference degree is smaller than the heterogeneous difference degree.
The re-recognition model trained by the training device of the embodiment can inhibit the background interference, the human posture change and the interference of local shielding in the re-recognition process, and can re-recognize partially shielded and unaligned pedestrians.
Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of an identification device based on a target re-identification model according to the present application.
Yet another embodiment of the present application provides an identification device 50 based on a target re-identification model, which includes an anchor feature extraction module 51, an anchor attention module 52, an identification feature extraction module 55, an identification attention module 56, and a processing module 53 and a calculation module 54.
The processing module 53 inputs the target image to the anchor feature extraction module 51 to obtain target image features and inputs the target image features to the anchor attention module 52 to obtain target image attention features. The processing module 53 obtains a target image composite feature from the target image feature and the target image attention feature. The processing module 53 inputs the predicted image to the recognition feature extraction module 55 to obtain the predicted image features, and inputs the predicted image features to the recognition attention module 56 to obtain the predicted image attention features. The processing module 53 obtains predicted image composite features from the predicted image features and the predicted image attention features. The calculation module 54 calculates the similarity of the target image integrated feature and the predicted image integrated feature, and performs recognition of the target in the predicted image based on the similarity.
The recognition device of the embodiment can inhibit the background interference, the human posture change and the local shielding interference in the re-recognition process, and can re-recognize the partially shielded and unaligned pedestrians.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a device with memory function according to the present application.
Yet another embodiment of the present application provides a computer readable storage medium 60 having program data 61 stored thereon which, when executed by a processor, implements the training method of any of the above embodiments or the identification method of any of the above embodiments. With this scheme, background interference, human posture changes and local occlusion can be suppressed during re-recognition, and partially occluded and non-aligned pedestrians can be re-recognized.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium 60. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium 60, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium 60 includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing description is only illustrative of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (6)

1. A method of training a target re-recognition model, the re-recognition model comprising: an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module, an identification attention module connected with the identification feature extraction module, and an identification local extraction module connected with the identification feature extraction module; the training method comprises the following steps:
inputting an anchor point training sample image into the anchor point feature extraction module to obtain anchor point sample features, and inputting the anchor point sample features into the anchor point attention module to obtain anchor point attention features;
obtaining anchor point comprehensive characteristics according to the anchor point sample characteristics and the anchor point attention characteristics;
inputting the similar training sample images of the anchor point training sample images into the identification feature extraction module to obtain similar sample features, and inputting the similar sample features into the identification attention module to obtain similar attention features;
inputting the similar sample features into the identification local extraction module to obtain similar local features;
fusing the similar local features and the similar attention features to obtain similar comprehensive features;
inputting the heterogeneous training sample images of the anchor point training sample images into the recognition feature extraction module to obtain heterogeneous sample features, and inputting the heterogeneous sample features into the recognition attention module to obtain heterogeneous attention features;
inputting the heterogeneous sample characteristics into the identification local extraction module to obtain heterogeneous local characteristics;
fusing the heterogeneous local features and the heterogeneous attention features to obtain heterogeneous comprehensive features;
calculating the homogeneous degree of difference between the homogeneous comprehensive features and the anchor point comprehensive features and the heterogeneous degree of difference between the heterogeneous comprehensive features and the anchor point comprehensive features;
training the re-recognition model by taking the similar difference degree smaller than the heterogeneous difference degree as a target;
the step of inputting the similar sample features into the identification local extraction module to obtain similar local features comprises the following steps: carrying out horizontal pooling on the similar sample characteristics and the anchor point sample characteristics, and respectively outputting local characteristic diagrams; aligning the local feature images of the similar sample features with the local feature images of the anchor point sample features, calculating an association matrix, calculating the cosine distance of the local feature images, which are passed by the minimum path, through a dynamic programming method, comparing the cosine distance with a threshold value, and obtaining the similar local features if the cosine distance meets the threshold value requirement;
the step of inputting the heterogeneous sample features into the identification local extraction module to obtain heterogeneous local features comprises the following steps: carrying out horizontal pooling on the heterogeneous sample characteristics and the anchor point sample characteristics, and respectively outputting local characteristic diagrams; and aligning the local feature map of the heterogeneous sample feature with the local feature map of the anchor point sample feature, calculating an association matrix, calculating the cosine distance of the local feature map passed by the minimum path through a dynamic programming method, comparing the cosine distance with a threshold value, and obtaining the heterogeneous local feature if the cosine distance meets the threshold value requirement.
2. The training method of claim 1, wherein the anchor feature extraction module and the identification feature extraction module are a fourth layer of a res net50 network.
3. The training method of any of claims 1-2, wherein the anchor training sample image is a sample image comprising a training target, the training target comprising a vehicle or a pedestrian; the similar training sample image is a sample image containing the training target; the heterogeneous training sample image is a sample image that does not include the training target.
4. An identification method based on a target re-identification model, wherein the target re-identification model comprises: an anchor point feature extraction module, an anchor point attention module connected with the anchor point feature extraction module, an identification feature extraction module, an identification attention module connected with the identification feature extraction module, and an identification local extraction module connected with the identification feature extraction module; the target re-identification model is trained by the training method according to any one of claims 1 to 3; the identification method comprises the following steps:
inputting a target image into the anchor point feature extraction module to obtain a target image feature, and inputting the target image feature into the anchor point attention module to obtain a target image attention feature;
obtaining a target image comprehensive feature according to the target image feature and the target image attention feature;
inputting a predicted image into the recognition feature extraction module to obtain predicted image features, and inputting the predicted image features into the recognition attention module to obtain predicted image attention features;
inputting the predicted image features into the recognition local extraction module to obtain predicted image local features;
fusing the local features of the predicted image and the attention features of the predicted image to obtain comprehensive features of the predicted image;
calculating the similarity of the target image comprehensive characteristics and the predicted image comprehensive characteristics;
identifying targets in the predicted image based on the similarity;
the step of inputting the predicted image features into the recognition local extraction module to obtain the predicted image local features includes:
and respectively pooling and aligning the predicted image features and the target image features, respectively outputting local feature images, aligning the local feature images of the predicted image and the target image, calculating an association matrix, solving a shortest path by a dynamic programming method, and calculating cosine distances of the local features to find aligned local features to obtain the predicted image local features.
5. An electronic device comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the training method of any one of claims 1 to 3 or the identification method of claim 4.
6. A computer readable storage medium having stored thereon program data, which when executed by a processor implements the training method of any one of claims 1 to 3 or the identification method of claim 4.
CN202010351798.XA 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device Active CN111582107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010351798.XA CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010351798.XA CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Publications (2)

Publication Number Publication Date
CN111582107A CN111582107A (en) 2020-08-25
CN111582107B true CN111582107B (en) 2023-09-29

Family

ID=72124558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010351798.XA Active CN111582107B (en) 2020-04-28 2020-04-28 Training method and recognition method of target re-recognition model, electronic equipment and device

Country Status (1)

Country Link
CN (1) CN111582107B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197584A (en) * 2018-01-12 2018-06-22 武汉大学 A kind of recognition methods again of the pedestrian based on triple deep neural network
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN109145766A (en) * 2018-07-27 2019-01-04 北京旷视科技有限公司 Model training method, device, recognition methods, electronic equipment and storage medium
CN109214366A (en) * 2018-10-24 2019-01-15 北京旷视科技有限公司 Localized target recognition methods, apparatus and system again
CN109271878A (en) * 2018-08-24 2019-01-25 北京地平线机器人技术研发有限公司 Image-recognizing method, pattern recognition device and electronic equipment
CN109472248A (en) * 2018-11-22 2019-03-15 广东工业大学 A kind of pedestrian recognition methods, system and electronic equipment and storage medium again
CN109784166A (en) * 2018-12-13 2019-05-21 北京飞搜科技有限公司 The method and device that pedestrian identifies again
WO2019114726A1 (en) * 2017-12-14 2019-06-20 腾讯科技(深圳)有限公司 Image recognition method and device, electronic apparatus, and readable storage medium
WO2019128367A1 (en) * 2017-12-26 2019-07-04 广州广电运通金融电子股份有限公司 Face verification method and apparatus based on triplet loss, and computer device and storage medium
CN109977798A (en) * 2019-03-06 2019-07-05 中山大学 The exposure mask pond model training identified again for pedestrian and pedestrian's recognition methods again
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110363193A (en) * 2019-06-12 2019-10-22 北京百度网讯科技有限公司 Vehicle recognition methods, device, equipment and computer storage medium again
WO2019218826A1 (en) * 2018-05-17 2019-11-21 腾讯科技(深圳)有限公司 Image processing method and device, computer apparatus, and storage medium
WO2019231105A1 (en) * 2018-05-31 2019-12-05 한국과학기술원 Method and apparatus for learning deep learning model for ordinal classification problem by using triplet loss function
CN110717411A (en) * 2019-09-23 2020-01-21 湖北工业大学 Pedestrian re-identification method based on deep layer feature fusion
CN110765903A (en) * 2019-10-10 2020-02-07 浙江大华技术股份有限公司 Pedestrian re-identification method and device and storage medium
WO2020052513A1 (en) * 2018-09-14 2020-03-19 阿里巴巴集团控股有限公司 Image identification and pedestrian re-identification method and apparatus, and electronic and storage device
CN111027442A (en) * 2019-12-03 2020-04-17 腾讯科技(深圳)有限公司 Model training method, recognition method, device and medium for pedestrian re-recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9228865B2 (en) * 2012-04-23 2016-01-05 Xenon, Inc. Multi-analyzer and multi-reference sample validation system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Object detection and re-identification based on feature fusion; Zhai Yao; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; full text *
Fan Xing. Research on person re-identification methods in intelligent video surveillance. China Master's Theses Full-text Database, Information Science and Technology Series. 2020, pp. 1-13, 19-20, 25-28, 109-113. *

Also Published As

Publication number Publication date
CN111582107A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
Pal et al. Deep learning in multi-object detection and tracking: state of the art
Tian et al. A dual neural network for object detection in UAV images
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
Chen et al. Accurate and efficient traffic sign detection using discriminative adaboost and support vector regression
KR101528081B1 (en) Object recognition using incremental feature extraction
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN111414888A (en) Low-resolution face recognition method, system, device and storage medium
CN112580480B (en) Hyperspectral remote sensing image classification method and device
Li et al. 3D-DETNet: a single stage video-based vehicle detector
CN107315984B (en) Pedestrian retrieval method and device
Liu et al. Related HOG features for human detection using cascaded adaboost and SVM classifiers
He et al. Aggregating local context for accurate scene text detection
CN112668532A (en) Crowd counting method based on multi-stage mixed attention network
Wang et al. Multi‐scale pedestrian detection based on self‐attention and adaptively spatial feature fusion
Zhang et al. Multi-level and multi-scale horizontal pooling network for person re-identification
CN116630630B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN112257628A (en) Method, device and equipment for identifying identities of outdoor competition athletes
Mallick et al. Video retrieval using salient foreground region of motion vector based extracted keyframes and spatial pyramid matching
CN116958873A (en) Pedestrian tracking method, device, electronic equipment and readable storage medium
CN111582107B (en) Training method and recognition method of target re-recognition model, electronic equipment and device
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
Gawande et al. Scale invariant mask r-cnn for pedestrian detection
CN115063831A (en) High-performance pedestrian retrieval and re-identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant