CN111667001A - Target re-identification method and device, computer equipment and storage medium - Google Patents

Target re-identification method and device, computer equipment and storage medium

Info

Publication number
CN111667001A
CN111667001A
Authority
CN
China
Prior art keywords
picture
target object
same
angles
sample
Prior art date
Legal status
Granted
Application number
CN202010504139.5A
Other languages
Chinese (zh)
Other versions
CN111667001B (en)
Inventor
林春伟
刘莉红
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority: CN202010504139.5A (granted as CN111667001B)
Priority: PCT/CN2020/098759 (published as WO2021114612A1)
Publication of CN111667001A
Application granted
Publication of CN111667001B
Legal status: Active

Classifications

    • G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/24 — Pattern recognition; Analysing; Classification techniques
    • G06N3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08 — Neural networks; Learning methods
    • Y02T10/40 — Climate change mitigation technologies related to transportation; Internal combustion engine [ICE] based vehicles; Engine management systems

Abstract

The application relates to a deep-learning-based target re-identification method and apparatus, a computer device, and a storage medium. The method comprises: identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, within the set, the pictures that share the target object picture's view angle and the pictures whose view angle differs from it; determining a first characteristic distance between the target object picture and each same-view picture and a second characteristic distance between the target object picture and each different-view picture; determining, from the first characteristic distance, the pictures containing the target object under the same view angle; and determining, from the second characteristic distance, the pictures containing the target object under different view angles. With this method, pictures containing the same target object are identified separately under the same and under different view angles, which effectively resolves the failure to accurately match the same target object when shooting view angles differ greatly and improves the accuracy of target re-identification.

Description

Target re-identification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for re-identifying a target, a computer device, and a storage medium.
Background
With the development of computer technology, target re-identification technology has emerged. Target re-identification determines, for a given target, whether a target with the same identity exists at other times or from other view angles. The target may be a vehicle or a pedestrian, and target re-identification is used to match vehicles or pedestrians across different view angles or time periods in video surveillance.
In conventional techniques, whether the same target exists is generally determined from the target's appearance information. However, because appearance is easily affected by external conditions and by factors of the target itself, the appearance information of the same target may be inconsistent across environments. For example, images of the same pedestrian captured under different lighting conditions, or in different postures, can differ so much that they cannot be matched to the same target. For vehicles, the appearance information of the same vehicle differs across video images with different view angles or resolutions, and vehicle re-identification cannot be effectively realized. Conventional re-identification methods are therefore easily misled by appearance information, and the accuracy of target re-identification is low in many practical application scenarios.
Disclosure of Invention
In view of the above, it is necessary to provide a target re-identification method, apparatus, computer device, and storage medium capable of improving the accuracy of target re-identification.
A method of object re-identification, the method comprising:
identifying the view angles of a target object picture and each picture to be detected in a picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the picture with the same view angle and a second characteristic distance between the target object picture and the picture with the different view angle;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
In one embodiment, the determining the first characteristic distance between the target object picture and the picture with the same view angle and the second characteristic distance between the target object picture and the picture with the different view angle includes:
inputting the target object picture and the picture with the same view angle into a trained re-recognition model, and outputting a first characteristic distance between the picture with the same view angle and the target object picture through a first convolution sub-branch network of the re-recognition model;
and inputting the target object picture and the pictures with different view angles into a trained re-recognition model, and outputting second characteristic distances of the pictures with different view angles and the target object picture through a second convolution sub-branch network of the re-recognition model.
In one embodiment, the method for training the re-recognition model includes:
acquiring labeled sample picture sets; the sample picture set comprises ternary picture groups whose anchor sample, positive sample, and negative sample share the same view angle, and ternary picture groups whose samples span different view angles; wherein the positive sample contains the same target object as the anchor sample, and the negative sample contains a different target object;
inputting the ternary picture groups with the same view angle in the sample picture set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix;
inputting the ternary picture groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix;
calculating a first loss function according to the first feature matrix and calculating a second loss function according to the second feature matrix; wherein the first loss function aims to maximize the distance between the positive and negative samples of the same-view ternary picture groups, and the second loss function aims to maximize the distance between the positive and negative samples of the different-view ternary picture groups;
updating parameters of the initial re-identification model according to the first loss function and the second loss function;
and returning to the step of obtaining the marked sample picture sets until an iteration stop condition is reached, and generating a trained re-recognition model.
In one embodiment, the method further comprises:
selecting characteristic elements of negative samples of the same view angle from the first characteristic matrix, and selecting characteristic elements of positive samples of different view angles from the second characteristic matrix;
calculating a third loss function according to the characteristic elements of the positive samples at different view angles and the characteristic elements of the negative samples at the same view angle; the goal of the third loss function is to maximize the characteristic distance between negative samples of the same view angle and positive samples of different view angles;
updating parameters of the initial re-identification model according to the first loss function and the second loss function, including: and updating parameters of the initial re-identification model according to the first loss function, the second loss function and the third loss function.
In one embodiment, after the acquiring of the labeled sample picture sets, the method further includes:
inputting the marked sample picture sets into a trained image feature extraction model;
extracting high-dimensional picture features of all pictures in the sample picture set by using the trained image feature extraction model; the high-dimensional picture features correspond to feature elements of each picture;
and inputting the high-dimensional picture features of the ternary picture groups with the same view angle in the sample picture set into a first convolution sub-branch network, and inputting the high-dimensional picture features of the ternary picture groups with different view angles in the sample picture set into a second convolution sub-branch network.
In one embodiment, identifying the view angle of a target object picture and each picture to be detected in a picture set to be detected, and determining a picture with the same view angle as the target object picture and a picture with a different view angle from the target object picture in the picture set to be detected, includes:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
and according to the view angle of the target object picture, determining pictures which are the same view angle pictures as the target object picture and pictures which are different view angles from the target object picture from the picture set to be detected.
In one embodiment, the manner of training the perspective classification model includes:
acquiring a picture training set with marked visual angles;
and training the initial classification model according to the picture training set with the marked visual angles to obtain a trained visual angle classification model.
An object re-identification apparatus, the apparatus comprising:
the visual angle identification module is used for identifying a target object picture and the visual angle of each picture to be detected in the picture set to be detected, and determining the picture with the same visual angle as the target object picture and the picture with the different visual angle as the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
the characteristic distance determining module is used for determining a first characteristic distance between the target object picture and the picture with the same visual angle and a second characteristic distance between the target object picture and the picture with the different visual angles;
the picture determining module is used for determining pictures including the target object under the same visual angle according to the first characteristic distance; and determining pictures including the target object under different view angles according to the second characteristic distance.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
identifying the view angles of a target object picture and each picture to be detected in a picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the picture with the same view angle and a second characteristic distance between the target object picture and the picture with the different view angle;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
identifying the view angles of a target object picture and each picture to be detected in a picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the picture with the same view angle and a second characteristic distance between the target object picture and the picture with the different view angle;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
With the target re-identification method and apparatus, computer device, and storage medium above, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the pictures in the set that share the target object picture's view angle and those whose view angle differs from it are determined. A first characteristic distance between the target object picture and each same-view picture and a second characteristic distance between the target object picture and each different-view picture are then determined; the pictures containing the target object under the same view angle are determined from the first characteristic distance, and the pictures containing it under different view angles from the second characteristic distance. In this way, once each picture to be detected has been determined to share the target object picture's view angle or not, the pictures containing the same target object are identified separately under the same and under different view angles. This avoids judging whether the same target object exists from single-view appearance information across different view angles or scenes, effectively resolves the failure to accurately match the same target object when shooting view angles differ greatly, and further improves the accuracy of target re-identification.
Drawings
FIG. 1 is a diagram of an application environment of a target re-identification method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for re-identifying an object in one embodiment;
FIG. 3 is a schematic flow chart illustrating training of a re-recognition model according to one embodiment;
FIG. 4 is a schematic flow chart of training a re-recognition model according to another embodiment;
FIG. 5 is a block diagram illustrating an overall architecture for training a re-recognition model according to an embodiment;
FIG. 6 is a block diagram of an object re-identification apparatus in one embodiment;
FIG. 7 is a block diagram showing the structure of an object re-recognition apparatus in another embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target re-identification method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 and the server 104 communicate via a network. The terminal 102 identifies the target object picture and the view angle of each picture to be detected in the picture set to be detected, and determines the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected. The picture set to be detected may be stored locally at the terminal 102, or may be obtained from cloud storage of the server 104, and the target object picture carries the target object to be identified. And then determining a first characteristic distance between the target object picture and the picture with the same visual angle and a second characteristic distance between the target object picture and the picture with different visual angles. And further determining pictures including the target object under the same view angle according to the first characteristic distance, and further determining pictures including the target object under different view angles according to the second characteristic distance. The terminal 102 may be, but is not limited to, various personal computers, notebook computers and tablet computers, and the server 104 may be implemented by an independent server or a server cluster composed of a plurality of servers.
In an embodiment, as shown in fig. 2, a method for re-identifying an object is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step S202, identifying the view angles of the target object picture and each picture to be detected in the picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected.
Specifically, the target object picture and each picture to be detected in the picture set to be detected are input into the trained view angle classification model, which outputs the view angle of the target object picture and of each picture to be detected. According to the target object picture's view angle, the pictures with the same view angle and the pictures with a different view angle are then determined from the picture set to be detected. The target object may be a vehicle or a pedestrian, and the target object picture is a picture carrying such a target object.
Further, taking the target object as a vehicle as an example, the corresponding target object picture carries the target vehicle, and each picture to be detected in the picture set to be detected carries some vehicle, which may or may not be the target vehicle. The view angles of the target object picture and of the pictures to be detected comprise a vehicle-head view angle and a vehicle-tail view angle, and these view angles are identified by the trained view angle classification model. Pictures with the same view angle as the target object picture are then determined from the picture set to be detected according to the target object picture's view angle: when the target vehicle in the target object picture is seen from the vehicle-head view angle, the pictures likewise taken from the vehicle-head view angle are determined from the set, and when it is seen from the vehicle-tail view angle, the pictures likewise taken from the vehicle-tail view angle are determined.
Similarly, pictures with a different view angle from the target object picture are determined from the picture set to be detected according to the target object picture's view angle: when the target vehicle in the target object picture is seen from the vehicle-head view angle, the pictures taken from the vehicle-tail view angle are determined from the set, and when it is seen from the vehicle-tail view angle, the pictures taken from the vehicle-head view angle are determined.
In one embodiment, the step of training the perspective classification model comprises:
acquiring a picture training set with marked visual angles;
and training the initial classification model according to the image training set with the marked visual angles to obtain a trained visual angle classification model.
Specifically, taking the target object as a vehicle as an example, a picture training set composed of vehicle pictures with labeled view angles is acquired, where the labeled view angle marks each vehicle picture as taken from the vehicle-head view angle or the vehicle-tail view angle. The initial classification model is then trained on this labeled picture training set to obtain the trained view angle classification model. The initial classification model may adopt a deep-learning classification model, such as ResNet or GoogLeNet.
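As a concrete illustration, such a classifier can be sketched as a standard backbone with a two-class head (vehicle-head view vs. vehicle-tail view). The PyTorch code below is a minimal sketch under that assumption; build_view_classifier and train_step are illustrative names, not details fixed by this application:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_view_classifier(num_views: int = 2) -> nn.Module:
    # Start from a standard backbone, as the description suggests
    # (ResNet, GoogLeNet, etc.), and replace the final layer with a
    # two-class head: vehicle-head view vs. vehicle-tail view.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_views)
    return model

def train_step(model, images, view_labels, optimizer):
    # images: (B, 3, H, W); view_labels: (B,) with 0 = head view, 1 = tail view.
    logits = model(images)
    loss = nn.functional.cross_entropy(logits, view_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```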
Step S204, determining a first characteristic distance between the target object picture and the picture with the same view angle, and a second characteristic distance between the target object picture and the picture with different view angles.
Specifically, a target object picture and a picture with the same view angle are input into a trained re-recognition model, and then a first characteristic distance between the picture with the same view angle and the target object picture is output through a first convolution sub-branch network of the re-recognition model. Similarly, the target object picture and the different view angle pictures are input into the trained re-recognition model, and then the second characteristic distances of the different view angle pictures and the target object picture are output through a second convolution sub-branch network of the re-recognition model.
The trained re-recognition model comprises a first convolution sub-branch network and a second convolution sub-branch network which are identical in structure, and the first convolution sub-branch network is used for calculating to obtain first characteristic distances of the target object picture and the multiple pictures with the same visual angle. Similarly, the second convolution sub-branch network is used for calculating and obtaining second characteristic distances of the target object picture and the multiple pictures with different view angles.
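One way to realize the two structurally identical sub-branch networks is a pair of convolutional branches that share an architecture but not weights, each mapping a picture pair to a characteristic distance. The sketch below is illustrative only; the ReIDModel class name, the ResNet-18 branch backbone, and the 256-dimensional feature size are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

class ReIDModel(nn.Module):
    """Re-identification model with two structurally identical
    convolutional sub-branch networks: the first handles same-view
    picture pairs, the second different-view pairs."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.same_view_branch = self._make_branch(feat_dim)   # first conv sub-branch
        self.cross_view_branch = self._make_branch(feat_dim)  # second conv sub-branch

    @staticmethod
    def _make_branch(feat_dim: int) -> nn.Module:
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        return backbone

    def feature_distance(self, query, candidate, same_view: bool) -> torch.Tensor:
        # Route the pair through the branch matching its view relation,
        # then take the Euclidean distance between the two feature vectors.
        branch = self.same_view_branch if same_view else self.cross_view_branch
        f_q, f_c = branch(query), branch(candidate)
        return torch.norm(f_q - f_c, p=2, dim=1)
```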
Step S206, determining the picture including the target object under the same visual angle according to the first characteristic distance.
Specifically, the first convolution sub-branch network of the trained re-recognition model computes a first characteristic distance between the target object picture and each same-view picture. Sorting the computed first characteristic distances yields a first re-identification sequence of the same-view pictures relative to the target object picture, from which the pictures containing the target object under the same view angle can be determined.
Further, taking the target object as a target vehicle as an example, the first convolution sub-branch network of the trained re-recognition model yields first characteristic distances both between the target object picture and same-view pictures containing the same target vehicle and between the target object picture and same-view pictures containing other vehicles. Sorting these first characteristic distances by magnitude produces the first re-identification sequence; the higher a picture ranks in this sequence, the better it matches the target object picture, so the pictures containing the same target vehicle under the same view angle can be determined from the sequence.
Step S208, determining pictures including the target object under different view angles according to the second characteristic distance.
Specifically, the second convolution sub-branch network of the trained re-recognition model computes a second characteristic distance between the target object picture and each different-view picture. Sorting the computed second characteristic distances yields a second re-identification sequence of the different-view pictures relative to the target object picture, from which the pictures containing the target object under different view angles can be determined.
Further, taking the target object as the target vehicle as an example, the second convolution sub-branch network of the trained re-recognition model yields second characteristic distances both between the target object picture and different-view pictures containing the same target vehicle and between the target object picture and different-view pictures containing other vehicles. Sorting these second characteristic distances produces the second re-identification sequence; the higher a picture ranks, the better it matches the target object picture, and the pictures containing the same target vehicle under different view angles are determined from the sequence.
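Ranking candidates by characteristic distance, as in steps S206 and S208, can then be sketched as follows (reusing the illustrative ReIDModel above; rank_candidates is an assumed name, and the query is assumed to be a single (3, H, W) image tensor):

```python
import torch

def rank_candidates(model, query, candidates, same_view: bool):
    """Sort candidate pictures by characteristic distance to the query
    picture; in the resulting re-identification sequence the closest
    (best-matching) candidates come first."""
    with torch.no_grad():
        batch = query.unsqueeze(0).expand(len(candidates), -1, -1, -1)
        dists = model.feature_distance(batch, candidates, same_view)
    order = torch.argsort(dists)  # ascending: smallest distance ranks highest
    return order, dists[order]
```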
In the target re-identification method above, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the pictures in the set that share the target object picture's view angle and those whose view angle differs from it are determined. A first characteristic distance between the target object picture and each same-view picture and a second characteristic distance between the target object picture and each different-view picture are then determined; the pictures containing the target object under the same view angle are determined from the first characteristic distance, and the pictures containing it under different view angles from the second characteristic distance. In this way, once each picture to be detected has been determined to share the target object picture's view angle or not, the pictures containing the same target object are identified separately under the same and under different view angles. This avoids judging whether the same target object exists from single-view appearance information across different view angles or scenes, effectively resolves the failure to accurately match the same target object when shooting view angles differ greatly, and further improves the accuracy of target re-identification.
In one embodiment, as shown in FIG. 3, the way to train the re-recognition model includes the following steps:
step S302, obtaining each labeled sample picture set.
The sample picture set comprises ternary picture groups whose anchor sample, positive sample, and negative sample share the same view angle, and ternary picture groups whose samples span different view angles. The positive sample contains the same target object as the anchor sample, while the negative sample contains a different target object. Every picture in a labeled sample picture set thus carries both a view label (head view or tail view) and a sample label (positive or negative).
Specifically, taking the target object as the target vehicle as an example, the anchor sample is a picture containing the target vehicle, taken either from the vehicle-head view angle or from the vehicle-tail view angle. Correspondingly, the sample picture set comprises ternary picture groups whose anchor, positive, and negative samples all share the vehicle-head view angle or all share the vehicle-tail view angle, and ternary picture groups whose samples span different view angles.
Further, the pictures in the sample picture set cover the following cases (a construction sketch follows the list):
1) The anchor sample is a picture of the target vehicle taken from the vehicle-head view angle, a picture of the same target vehicle taken from the vehicle-head view angle is the positive sample, and a picture of a different vehicle taken from the vehicle-head view angle is the negative sample; this yields a ternary picture group whose samples all share the same view angle.
2) The anchor sample is a picture of the target vehicle taken from the vehicle-tail view angle, a picture of the same target vehicle taken from the vehicle-tail view angle is the positive sample, and a picture of a different vehicle taken from the vehicle-tail view angle is the negative sample; this likewise yields a same-view ternary picture group.
3) The anchor sample is a picture of the target vehicle taken from the vehicle-head view angle, a picture of the same target vehicle taken from the vehicle-tail view angle is the positive sample, and a picture of a different vehicle taken from the vehicle-head view angle is the negative sample; this yields a ternary picture group whose samples span different view angles.
4) The anchor sample is a picture of the target vehicle taken from the vehicle-tail view angle, a picture of the same target vehicle taken from the vehicle-head view angle is the positive sample, and a picture of a different vehicle taken from the vehicle-tail view angle is the negative sample; this likewise yields a cross-view ternary picture group.
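Assembling such triplets from a labeled picture pool might look like the following sketch; the Sample record and its vehicle_id and view fields are assumed bookkeeping for illustration, not part of this application:

```python
import random
from dataclasses import dataclass

@dataclass
class Sample:
    path: str        # picture file
    vehicle_id: int  # identity label
    view: str        # "head" or "tail"

def make_triplet(pool, anchor, same_view: bool):
    """Build one (anchor, positive, negative) picture triplet.

    same_view=True  -> positive and negative share the anchor's view
                       (cases 1 and 2 above).
    same_view=False -> positive comes from the opposite view while the
                       negative keeps the anchor's view (cases 3 and 4).
    """
    pos_view = anchor.view if same_view else ("tail" if anchor.view == "head" else "head")
    positives = [s for s in pool
                 if s.vehicle_id == anchor.vehicle_id and s.view == pos_view and s is not anchor]
    negatives = [s for s in pool
                 if s.vehicle_id != anchor.vehicle_id and s.view == anchor.view]
    return anchor, random.choice(positives), random.choice(negatives)
```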
Step S304, inputting the ternary picture group with the same view angle in the sample picture set into a first convolution sub-branch network of the initial re-identification model to obtain a first feature matrix.
Specifically, a ternary picture group with the same view angle in a sample picture set is input into an initial re-identification model, and a corresponding first feature matrix is obtained through a first convolution sub-branch network output of the initial re-identification model.
Further, a same-view ternary picture group is obtained either from an anchor picture of the target vehicle taken from the vehicle-head view angle, together with a positive sample of the same target vehicle at the vehicle-head view angle and a negative sample of a different vehicle at the vehicle-head view angle, or from an anchor picture taken from the vehicle-tail view angle, together with a positive sample of the same target vehicle at the vehicle-tail view angle and a negative sample of a different vehicle at the vehicle-tail view angle. The triplet is input into the first convolution sub-branch network of the initial re-identification model, which outputs a first feature matrix composed of the first features of the pictures under the same view angle.
Step S306, inputting the ternary picture groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix.
Specifically, the ternary picture groups with different view angles in the sample picture set are input into the initial re-identification model, and the corresponding second feature matrix is output by the second convolution sub-branch network of the initial re-identification model.
Further, a cross-view ternary picture group is obtained either from an anchor picture of the target vehicle taken from the vehicle-head view angle, together with a positive sample of the same target vehicle at the vehicle-tail view angle and a negative sample of a different vehicle at the vehicle-head view angle, or from an anchor picture taken from the vehicle-tail view angle, together with a positive sample of the same target vehicle at the vehicle-head view angle and a negative sample of a different vehicle at the vehicle-tail view angle. The triplet is input into the second convolution sub-branch network of the initial re-identification model, which outputs the corresponding second feature matrix.
In one embodiment, after acquiring each labeled sample picture set, the method further includes:
and inputting each marked sample picture set into a trained image feature extraction model, and extracting the high-dimensional picture features of each picture in the sample picture set by using the trained image feature extraction model.
The high-dimensional picture features correspond to feature elements of each picture, and the corresponding first feature matrix can be obtained by inputting the high-dimensional picture features of the ternary picture group with the same view angle in the sample picture set into the first convolution sub-branch network. Similarly, the corresponding second feature matrix can be obtained by inputting the high-dimensional picture features of the ternary picture group of different view angles in the sample picture set into the second convolution sub-branch network.
Further, taking the target object as a vehicle as an example, the method for training the image feature extraction model includes:
acquiring a marked training picture set; wherein the labeled attributes comprise two classification labels of a vehicle and a non-vehicle; and training the initial convolution network model by using the marked training picture set to generate a trained image feature extraction model.
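A sketch of how such a feature extraction model might be built and reused is given below; the ResNet-34 backbone and the use of the global-pooled penultimate-layer output as the "high-dimensional picture features" are assumptions for illustration:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_feature_extractor() -> nn.Module:
    # A convolutional network trained on the vehicle / non-vehicle
    # binary labels described above.
    net = models.resnet34(weights=None)
    net.fc = nn.Linear(net.fc.in_features, 2)  # vehicle vs. non-vehicle
    return net

def extract_high_dim_features(trained_net: nn.Module, images: torch.Tensor) -> torch.Tensor:
    # After training, drop the classification head and keep the
    # high-dimensional picture features (here: global-pooled backbone output).
    backbone = nn.Sequential(*list(trained_net.children())[:-1])
    with torch.no_grad():
        feats = backbone(images)  # (B, C, 1, 1)
    return feats.flatten(1)       # (B, C) high-dimensional features
```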
In step S308, a first loss function is calculated according to the first feature matrix, and a second loss function is calculated according to the second feature matrix.
The first loss function aims to maximize the distance between the positive and negative samples of the same-view ternary picture groups, and the second loss function aims to maximize the distance between the positive and negative samples of the different-view ternary picture groups.
For a triplet $(x, x^+, x^-)$, where $x$ is the anchor sample, $x^+$ is a positive sample (a sample from the same class as $x$) and $x^-$ is a negative sample (a sample from a different class than $x$), a positive pair $P^+ = (x, x^+)$ and a negative pair $P^- = (x, x^-)$ can be defined. With $\alpha$ denoting the minimum distance separation artificially imposed between positive and negative pairs, the triplet loss can be defined as shown in equation (1) below:

$$L_{tri}(x, x^+, x^-) = \max\big(D(P^+) - D(P^-) + \alpha,\ 0\big) \tag{1}$$
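Equation (1) translates directly into code. The sketch below assumes the anchor, positive, and negative features are row-aligned tensors, and the margin value 0.3 is an arbitrary example, not a value given by this application:

```python
import torch

def triplet_loss(anchor: torch.Tensor,
                 positive: torch.Tensor,
                 negative: torch.Tensor,
                 alpha: float = 0.3) -> torch.Tensor:
    # D(P+) and D(P-): Euclidean distances between the anchor features
    # and the positive / negative features, as in equation (3) below.
    d_pos = torch.norm(anchor - positive, p=2, dim=1)
    d_neg = torch.norm(anchor - negative, p=2, dim=1)
    # Equation (1): hinge on the margin alpha between the two pairings.
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()
```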
Specifically, the Euclidean distances between the feature elements of the same-view samples and those of their positive samples, and between the feature elements of the same-view samples and those of their negative samples, are computed from the first feature matrix, and the first loss function under the same view angle is calculated using the following formula (2):

$$L_s = \max\big(D_s(P^+) - D_s(P^-) + \alpha,\ 0\big) \tag{2}$$

where $L_s$ denotes the first loss function, $D_s(P^+)$ is the Euclidean distance between the feature elements of a same-view sample and those of its positive sample, $D_s(P^-)$ is the Euclidean distance between the feature elements of a same-view sample and those of its negative sample, and $\alpha$ is the minimum distance separation imposed between positive and negative pairs.
Further, the Euclidean distance is calculated using the following formula (3):

$$D(P) = D(x_i, x_j) = \lVert f(x_i) - f(x_j) \rVert_2 \tag{3}$$

where $X$ denotes the data set, $P = (x_i, x_j)$ with $x_i, x_j \in X$ denotes a picture pair, $f$ denotes the model that extracts features from the original picture, and $D$ denotes the Euclidean distance between features.
Likewise, the Euclidean distances between the feature elements of the different-view samples and those of their positive samples, and between the feature elements of the different-view samples and those of their negative samples, are computed from the second feature matrix, and the second loss function under different view angles is calculated using the following formula (4):

$$L_d = \max\big(D_d(P^+) - D_d(P^-) + \alpha,\ 0\big) \tag{4}$$

where $L_d$ denotes the second loss function, $D_d(P^+)$ is the Euclidean distance between the feature elements of a different-view sample and those of its positive sample, $D_d(P^-)$ is the Euclidean distance between the feature elements of a different-view sample and those of its negative sample, and $\alpha$ is the minimum distance separation imposed between positive and negative pairs.
In one embodiment, the method further includes a manner of calculating a third loss function, specifically including:
selecting characteristic elements of negative samples of the same view angle from the first characteristic matrix, and selecting characteristic elements of positive samples of different view angles from the second characteristic matrix; and calculating a third loss function according to the characteristic elements of the positive samples of different visual angles and the characteristic elements of the negative samples of the same visual angle.
Wherein the objective of the third penalty function is to maximize the characteristic distance between negative samples of the same view angle and positive samples of different view angles.
Specifically, the Euclidean distances between the feature elements of the same-view samples and those of their negative samples are taken from the first feature matrix, the Euclidean distances between the feature elements of the different-view samples and those of their positive samples are taken from the second feature matrix, and the third loss function is calculated using the following formula (5):

$$L_{cross} = \max\big(D_d(P^+) - D_s(P^-) + \alpha,\ 0\big) \tag{5}$$

where $L_{cross}$ denotes the third loss function, $D_d(P^+)$ is the Euclidean distance between the feature elements of a different-view sample and those of its positive sample, $D_s(P^-)$ is the Euclidean distance between the feature elements of a same-view sample and those of its negative sample, and $\alpha$ is the minimum distance separation imposed between positive and negative pairs.
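The three loss terms of equations (2), (4), and (5) can then be computed together, as in the following sketch (reusing the triplet_loss sketch under equation (1); the feature layout — anchor, positive, and negative feature tensors per branch — is an assumption):

```python
import torch

def reid_losses(fs_a, fs_p, fs_n,   # same-view branch: anchor, positive, negative features
                fd_a, fd_p, fd_n,   # cross-view branch: anchor, positive, negative features
                alpha: float = 0.3):
    L_s = triplet_loss(fs_a, fs_p, fs_n, alpha)  # equation (2)
    L_d = triplet_loss(fd_a, fd_p, fd_n, alpha)  # equation (4)
    # Equation (5): cross term pairing different-view positives, D_d(P+),
    # with same-view negatives, D_s(P-).
    d_pos = torch.norm(fd_a - fd_p, p=2, dim=1)
    d_neg = torch.norm(fs_a - fs_n, p=2, dim=1)
    L_cross = torch.clamp(d_pos - d_neg + alpha, min=0).mean()
    return L_s, L_d, L_cross
```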
In step S310, parameters of the initial re-recognition model are updated according to the first loss function and the second loss function.
Specifically, parameters of the initial re-recognition model are updated according to the first loss function and the second loss function, and updating of the initial re-recognition model is achieved.
Further, updating parameters of the initial re-recognition model according to the first loss function and the second loss function includes:
and updating the parameters of the initial re-recognition model according to the first loss function, the second loss function and the third loss function, so as to realize the updating of the initial re-recognition model.
And step S312, returning to the step of obtaining the labeled sample picture sets until an iteration stop condition is reached, and generating a trained re-recognition model.
Specifically, the following steps are repeatedly performed: acquiring each labeled sample picture set; inputting the same-view ternary picture groups in the sample picture set into the first convolution sub-branch network of the initial re-identification model to obtain the first feature matrix; inputting the different-view ternary picture groups into the second convolution sub-branch network to obtain the second feature matrix; calculating the first loss function from the first feature matrix and the second loss function from the second feature matrix; and updating the parameters of the initial re-identification model according to the first and second loss functions, until the iteration stop condition is reached and the trained re-recognition model is generated.
The iteration stop condition may be that the successively computed first and second loss functions settle to stable values and no longer drop appreciably, at which point the initial re-recognition model has converged and the trained re-recognition model is generated.
Further, when the third loss function is included, the loss function of the initial re-recognition model is the sum of the first, second, and third loss functions, and whether the iteration stop condition is reached is determined by judging whether the successively computed values of this loss function have stabilized. When the value of the loss function stabilizes, the initial re-recognition model has converged, and the trained re-recognition model is generated.
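The iterative updating of steps S310-S312 then amounts to an ordinary training loop whose stopping test watches the summed loss settle. In the sketch below the Adam optimizer, learning rate, and plateau threshold are assumptions, and the loader is assumed to yield one same-view and one cross-view triplet batch per step (reusing the reid_losses sketch above):

```python
import torch

def train_reid(model, loader, epochs: int = 100, tol: float = 1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    prev_loss = float("inf")
    for epoch in range(epochs):
        epoch_loss = 0.0
        for same_view_triplet, cross_view_triplet in loader:
            # First branch on same-view triplets, second on cross-view ones.
            fs = [model.same_view_branch(x) for x in same_view_triplet]
            fd = [model.cross_view_branch(x) for x in cross_view_triplet]
            L_s, L_d, L_cross = reid_losses(*fs, *fd)
            loss = L_s + L_d + L_cross  # sum of the three loss terms
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # Iteration-stop condition: the loss no longer drops appreciably.
        if abs(prev_loss - epoch_loss) < tol:
            break
        prev_loss = epoch_loss
    return model
```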
In this embodiment, each labeled sample picture set is acquired; the same-view ternary picture groups in the sample picture set are input into the first convolution sub-branch network of the initial re-identification model to obtain the first feature matrix, and the different-view ternary picture groups are input into the second convolution sub-branch network to obtain the second feature matrix. The first loss function is calculated from the first feature matrix and the second loss function from the second feature matrix; the parameters of the initial re-identification model are updated accordingly until the iteration stop condition is reached, whereupon the updating ends and the trained re-recognition model is generated. Because the same-view and different-view triplet pictures train the two branch networks of the initial re-recognition model separately, the resulting model can re-identify pictures to be detected under both the same and different view angles, avoiding the failure of a single recognition pipeline to accurately recognize the same target object when view angles differ greatly, and further improving the accuracy of target re-identification.
In one embodiment, as shown in fig. 4, another training method for re-recognition model is provided, which includes the following steps:
1) and acquiring each labeled sample picture set.
The sample picture set comprises a ternary picture group with positive and negative samples and samples in the same view angle, and a ternary picture group with positive and negative samples and samples in different view angles. Wherein the positive sample and the sample have the same target object, and the negative sample and the sample have different target object.
2) And inputting the ternary picture group with the same visual angle in the sample picture set into a first convolution sub-branch network of the initial re-identification model to obtain a first feature matrix.
3) And inputting the ternary picture groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second characteristic matrix.
4) A first loss function is calculated from the first feature matrix.
Wherein the objective of the first penalty function is to maximize the distance between the positive and negative samples of the triplet set for the same view.
5) A second loss function is calculated from the second feature matrix.
Wherein the second penalty function is targeted to maximize the distance between the positive and negative examples of the triplet set for different views.
6) And selecting the characteristic elements of the negative samples of the same view angle from the first characteristic matrix, and selecting the characteristic elements of the positive samples of different view angles from the second characteristic matrix.
7) And calculating a third loss function according to the characteristic elements of the positive samples of different visual angles and the characteristic elements of the negative samples of the same visual angle.
Wherein the objective of the third penalty function is to maximize the characteristic distance between negative samples of the same view angle and positive samples of different view angles.
8) Parameters of the initial re-identification model are updated according to the first loss function, the second loss function and the third loss function.
9) And returning to the step of obtaining the marked sample picture sets until an iteration stop condition is reached, and generating a trained re-recognition model.
In the above training method for the re-recognition model, the first loss function is calculated from the first feature matrix, the second loss function from the second feature matrix, and the third loss function from the feature elements of the same-view negative samples in the first feature matrix and the feature elements of the different-view positive samples in the second feature matrix. The parameters of the initial re-recognition model are then updated according to the first, second, and third loss functions until the iteration stop condition is reached, whereupon the updating ends and the trained re-recognition model is generated. Training the two branch networks separately on same-view and different-view triplet pictures lets the resulting model re-identify pictures to be detected under both different and identical view angles, while the third loss function further improves the matching of the same target object across view angles; the problem that the same target object cannot be accurately recognized under large angle differences is thus avoided, and the accuracy of target re-identification is further improved.
In one embodiment, as shown in fig. 5, there is provided an overall architecture for training the re-recognition model, and referring to fig. 5, the overall architecture for training the re-recognition model includes:
1) the view angle classification module 502 is configured to recognize a view angle of the target object picture and each picture to be detected in the picture set to be detected by using a view angle classifier, that is, a view angle classification model, and determine a picture with the same view angle as the target object picture and a picture with a different view angle from the target object picture in the picture set to be detected.
The image feature extraction module 504 is configured to extract high-dimensional picture features of each picture in the sample picture set by using a trained image feature extraction model, that is, a trained common (shared) convolutional neural network.
3) And the loss function calculation module 506 is configured to input the high-dimensional picture features of the ternary picture groups with the same view angle in the sample picture set into the first convolution sub-branch network to obtain a first feature matrix, and input the high-dimensional picture features of the ternary picture groups with different view angles in the sample picture set into the second convolution sub-branch network to obtain a second feature matrix.
Further, the Euclidean distances between the feature elements of the same-view samples and those of their positive samples, and between the feature elements of the same-view samples and those of their negative samples, are calculated to obtain a first distance matrix corresponding to the first convolution sub-branch network, from which the first loss function is obtained.
Similarly, the Euclidean distances between the feature elements of the different-view samples and those of their positive samples, and between the feature elements of the different-view samples and those of their negative samples, are calculated to obtain a second distance matrix corresponding to the second convolution sub-branch network, from which the second loss function is obtained. By calculating the Euclidean distances between the feature elements of the same-view samples and those of their negative samples, and between the feature elements of the different-view samples and those of their positive samples, a third distance matrix and thus the third loss function are obtained.
The sum of the first, second, and third loss functions is calculated, and whether the iteration stop condition is reached is determined from this sum: when the sum tends to a stable value, the iteration stop condition is reached, the updating of the initial re-recognition model stops, and the trained re-recognition model is generated.
It should be understood that although the various steps in the flowcharts of FIGS. 2-3 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may comprise multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and need not proceed sequentially but may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an object re-recognition apparatus including: a view identification module 602, a feature distance determination module 604, and a picture determination module 606, wherein:
the view angle identification module 602 is configured to identify a view angle of the target object picture and each picture to be detected in the picture set to be detected, and determine a picture with the same view angle as the target object picture and a picture with a different view angle from the target object picture in the picture set to be detected.
The feature distance determining module 604 is configured to determine a first feature distance between the target object picture and the picture with the same view angle, and a second feature distance between the target object picture and the picture with a different view angle.
A picture determining module 606, configured to determine, according to the first characteristic distance, a picture including the target object under the same view angle; and determining pictures including the target object under different view angles according to the second characteristic distance.
In the target re-identification apparatus above, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the pictures in the set that share the target object picture's view angle and those whose view angle differs from it are determined. A first characteristic distance between the target object picture and each same-view picture and a second characteristic distance between the target object picture and each different-view picture are then determined; the pictures containing the target object under the same view angle are determined from the first characteristic distance, and the pictures containing it under different view angles from the second characteristic distance. In this way, once each picture to be detected has been determined to share the target object picture's view angle or not, the pictures containing the same target object are identified separately under the same and under different view angles. This avoids judging whether the same target object exists from single-view appearance information across different view angles or scenes, effectively resolves the failure to accurately match the same target object when shooting view angles differ greatly, and further improves the accuracy of target re-identification.
In an embodiment, as shown in fig. 7, there is provided an object re-identification apparatus, specifically including:
A picture sample set obtaining module 702, configured to obtain each labeled sample picture set; the sample picture set includes ternary picture groups each consisting of a sample, a positive sample, and a negative sample at the same view angle, and ternary picture groups each consisting of a sample, a positive sample, and a negative sample at different view angles, where the positive sample has the same target object as the sample and the negative sample has a different target object from the sample.
A first feature matrix generation module 704, configured to input the same-view-angle ternary picture groups in the sample picture set into a first convolution sub-branch network of the initial re-identification model to obtain a first feature matrix.
A second feature matrix generation module 706, configured to input the different-view-angle ternary picture groups into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix.
A first loss function calculating module 708, configured to calculate a first loss function from the first feature matrix, where the goal of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture group.
A second loss function calculating module 710, configured to calculate a second loss function from the second feature matrix, where the goal of the second loss function is to maximize the distance between the positive sample and the negative sample of the different-view-angle ternary picture group.
A feature element extraction module 712, configured to select the feature elements of the same-view-angle negative samples from the first feature matrix, and to select the feature elements of the different-view-angle positive samples from the second feature matrix.
A third loss function calculating module 714, configured to calculate a third loss function according to the feature elements of the positive samples at different view angles and the feature elements of the negative samples at the same view angle; the goal of the third loss function is to maximize the characteristic distance between negative samples of the same view and positive samples of different views.
An initial re-identification model updating module 716, configured to update the parameters of the initial re-identification model according to the first loss function, the second loss function, and the third loss function.
A re-identification model generation module 718, configured to return to the step of obtaining each labeled sample picture set until an iteration stop condition is reached, and to generate a trained re-identification model.
In the target re-identification apparatus above, a first loss function is calculated from the first feature matrix, a second loss function is calculated from the second feature matrix, and a third loss function is calculated from the feature elements of the same-view-angle negative samples in the first feature matrix and the feature elements of the different-view-angle positive samples in the second feature matrix. The parameters of the initial re-identification model are then updated according to the first, second, and third loss functions until an iteration stop condition is reached, at which point the updating ends and a trained re-identification model is generated. Because the same-view-angle and different-view-angle ternary picture groups train the two branch networks of the initial model in a targeted manner, the resulting model can re-identify pictures to be detected both under the same view angle and under different view angles; the third loss function further improves the matching of the same target object across view angles, avoiding the failure to accurately identify the same target object caused by large angle differences and further improving target re-identification accuracy.
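Read as metric-learning objectives, the three losses described by these modules admit a conventional hinge formulation, sketched below. The margins m and m', the distance d, and the exact hinge form are assumptions consistent with the stated goals rather than formulas given in the specification; f_1 and f_2 denote the first and second convolution sub-branch networks, (a, p, n) a same-view-angle ternary picture group, and (a', p', n') a different-view-angle one.

```latex
% Hinge-form reading of the three objectives (margins m, m' assumed):
\mathcal{L}_1 = \max\bigl(0,\ d(f_1(a), f_1(p)) - d(f_1(a), f_1(n)) + m\bigr)
\mathcal{L}_2 = \max\bigl(0,\ d(f_2(a'), f_2(p')) - d(f_2(a'), f_2(n')) + m\bigr)
\mathcal{L}_3 = \max\bigl(0,\ m' - d(f_1(n), f_2(p'))\bigr)
```

Minimizing the first two losses widens the positive-negative gap within each view condition, while minimizing the third pushes the features of same-view-angle negatives away from those of different-view-angle positives, matching the three stated goals.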
For the specific definition of the target re-identification apparatus, reference may be made to the definition of the target re-identification method above, which is not repeated here. Each module in the target re-identification apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor of the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for their operation. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless communication can be realized through WIFI, an operator network, NFC (Near Field Communication), or other technologies. The computer program, when executed by the processor, implements a target re-identification method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
identifying the view angles of the target object picture and each picture to be detected in the picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between a target object picture and pictures with the same visual angle and a second characteristic distance between the target object picture and pictures with different visual angles;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the target object picture and the same-view-angle pictures into a trained re-identification model, and outputting the first characteristic distance between each same-view-angle picture and the target object picture through a first convolution sub-branch network of the re-identification model;
and inputting the target object picture and the different-view-angle pictures into the trained re-identification model, and outputting the second characteristic distance between each different-view-angle picture and the target object picture through a second convolution sub-branch network of the re-identification model.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring each labeled sample picture set; the sample picture set comprises ternary picture groups each consisting of a sample, a positive sample, and a negative sample at the same view angle, and ternary picture groups each consisting of a sample, a positive sample, and a negative sample at different view angles; the positive sample has the same target object as the sample, and the negative sample has a different target object from the sample;
inputting the same-view-angle ternary picture groups in the sample picture set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix;
inputting the different-view-angle ternary picture groups into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix;
calculating a first loss function according to the first feature matrix and a second loss function according to the second feature matrix, wherein the goal of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture group, and the goal of the second loss function is to maximize the distance between the positive sample and the negative sample of the different-view-angle ternary picture group;
updating parameters of the initial re-identification model according to the first loss function and the second loss function;
and returning to the step of acquiring each labeled sample picture set until an iteration stop condition is reached, and generating a trained re-identification model.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and selecting the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function according to the feature elements of the different-view-angle positive samples and the feature elements of the same-view-angle negative samples, where the goal of the third loss function is to maximize the characteristic distance between the same-view-angle negative samples and the different-view-angle positive samples;
in this case, updating the parameters of the initial re-identification model according to the first loss function and the second loss function comprises: updating the parameters of the initial re-identification model according to the first loss function, the second loss function, and the third loss function.
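The parameter-update steps just listed can be prototyped as a single training iteration. The sketch below follows the hinge reading given earlier; the batch layout, margin values, and the branch_same/branch_diff attribute names are illustrative assumptions, not details prescribed by the specification.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch, optimizer, margin=0.3):
    """One update of the initial re-identification model from one
    same-view-angle triplet and one different-view-angle triplet.
    Batch layout and margins are assumptions."""
    a1, p1, n1 = batch["same_view"]  # sample, positive, negative (same view angle)
    a2, p2, n2 = batch["diff_view"]  # sample, positive, negative (different view angles)

    # First/second feature matrices from the two convolution sub-branches.
    fa1, fp1, fn1 = map(model.branch_same, (a1, p1, n1))
    fa2, fp2, fn2 = map(model.branch_diff, (a2, p2, n2))

    # First and second losses: standard triplet margin losses, widening
    # the positive-negative gap within each view condition.
    loss1 = F.triplet_margin_loss(fa1, fp1, fn1, margin=margin)
    loss2 = F.triplet_margin_loss(fa2, fp2, fn2, margin=margin)

    # Third loss: push the same-view-angle negative's features away from
    # the different-view-angle positive's features (assumed hinge form).
    loss3 = F.relu(margin - F.pairwise_distance(fn1, fp2)).mean()

    total = loss1 + loss2 + loss3
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```

Iterating this step over the labeled sample picture sets until the stop condition holds yields the trained re-identification model described above.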
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of all pictures in the sample picture set by using the trained image feature extraction model, where the high-dimensional picture features correspond to the feature elements of each picture;
and inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and inputting the high-dimensional picture features of the different-view-angle ternary picture groups into the second convolution sub-branch network.
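As a stand-in for the trained image feature extraction model, any pretrained CNN backbone can supply the high-dimensional picture features; the sketch below uses a torchvision ResNet-50 with its classification head removed, which is an assumption rather than the model the specification trained.

```python
import torch
import torchvision.models as models

def build_feature_extractor():
    """Assumed backbone (torchvision >= 0.13 API): ResNet-50 with the final
    fc layer replaced by identity, yielding 2048-d picture features."""
    net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    net.fc = torch.nn.Identity()
    return net.eval()

@torch.no_grad()
def extract_features(extractor, pictures):
    # pictures: (N, 3, H, W) normalized tensor of sample pictures; returns
    # (N, 2048) high-dimensional picture features for the two sub-branches.
    return extractor(pictures)
```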
In one embodiment, the processor, when executing the computer program, further performs the steps of:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
and according to the view angle of the target object picture, determining, from the picture set to be detected, the same-view-angle pictures having the same view angle as the target object picture and the different-view-angle pictures having a different view angle from the target object picture.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring a picture training set with labeled view angles;
and training the initial classification model according to the picture training set with labeled view angles to obtain a trained view angle classification model.
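Training the view angle classification model is ordinary supervised classification; one minimal sketch follows. The label set (e.g. front/side/rear), batch size, optimizer, and epoch count are assumptions not fixed by the specification.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_view_classifier(model, labeled_pictures, epochs=10, lr=1e-3):
    """Fits the initial classification model on a picture training set
    whose view angles are labeled; hyperparameters are illustrative."""
    loader = DataLoader(labeled_pictures, batch_size=64, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for pics, view_labels in loader:
            loss = criterion(model(pics), view_labels)
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```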
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
identifying the view angles of the target object picture and each picture to be detected in the picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between a target object picture and pictures with the same visual angle and a second characteristic distance between the target object picture and pictures with different visual angles;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target object picture and the same-view-angle pictures into a trained re-identification model, and outputting the first characteristic distance between each same-view-angle picture and the target object picture through a first convolution sub-branch network of the re-identification model;
and inputting the target object picture and the different-view-angle pictures into the trained re-identification model, and outputting the second characteristic distance between each different-view-angle picture and the target object picture through a second convolution sub-branch network of the re-identification model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring each labeled sample picture set; the sample picture set comprises ternary picture groups each consisting of a sample, a positive sample, and a negative sample at the same view angle, and ternary picture groups each consisting of a sample, a positive sample, and a negative sample at different view angles; the positive sample has the same target object as the sample, and the negative sample has a different target object from the sample;
inputting the same-view-angle ternary picture groups in the sample picture set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix;
inputting the different-view-angle ternary picture groups into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix;
calculating a first loss function according to the first feature matrix and a second loss function according to the second feature matrix, wherein the goal of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture group, and the goal of the second loss function is to maximize the distance between the positive sample and the negative sample of the different-view-angle ternary picture group;
updating parameters of the initial re-identification model according to the first loss function and the second loss function;
and returning to the step of acquiring each labeled sample picture set until an iteration stop condition is reached, and generating a trained re-identification model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and selecting the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function according to the feature elements of the different-view-angle positive samples and the feature elements of the same-view-angle negative samples, where the goal of the third loss function is to maximize the characteristic distance between the same-view-angle negative samples and the different-view-angle positive samples;
in this case, updating the parameters of the initial re-identification model according to the first loss function and the second loss function comprises: updating the parameters of the initial re-identification model according to the first loss function, the second loss function, and the third loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of all pictures in the sample picture set by using the trained image feature extraction model, where the high-dimensional picture features correspond to the feature elements of each picture;
and inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and inputting the high-dimensional picture features of the different-view-angle ternary picture groups into the second convolution sub-branch network.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
and according to the view angle of the target object picture, determining, from the picture set to be detected, the same-view-angle pictures having the same view angle as the target object picture and the different-view-angle pictures having a different view angle from the target object picture.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a picture training set with labeled view angles;
and training the initial classification model according to the picture training set with labeled view angles to obtain a trained view angle classification model.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, and the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method for re-identifying an object, the method comprising:
identifying the view angles of a target object picture and each picture to be detected in a picture set to be detected, and determining the picture with the same view angle as the target object picture and the picture with different view angles from the target object picture in the picture set to be detected; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the picture with the same view angle and a second characteristic distance between the target object picture and the picture with the different view angle;
determining pictures including the target object under the same visual angle according to the first characteristic distance;
and determining pictures including the target object under different visual angles according to the second characteristic distance.
2. The method according to claim 1, wherein the determining a first characteristic distance between the target object picture and the same view picture and a second characteristic distance between the target object picture and the different view picture comprises:
inputting the target object picture and the same-view-angle picture into a trained re-identification model, and outputting a first characteristic distance between the same-view-angle picture and the target object picture through a first convolution sub-branch network of the re-identification model;
and inputting the target object picture and the different-view-angle pictures into the trained re-identification model, and outputting second characteristic distances between the different-view-angle pictures and the target object picture through a second convolution sub-branch network of the re-identification model.
3. The method according to claim 2, wherein the training of the re-identification model comprises:
acquiring each labeled sample picture set; the sample picture set comprises ternary picture groups each consisting of a sample, a positive sample, and a negative sample at the same view angle, and ternary picture groups each consisting of a sample, a positive sample, and a negative sample at different view angles; the positive sample has the same target object as the sample, and the negative sample has a different target object from the sample;
inputting the same-view-angle ternary picture groups in the sample picture set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix;
inputting the different-view-angle ternary picture groups into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix;
calculating a first loss function according to the first feature matrix and a second loss function according to the second feature matrix, wherein the goal of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture group, and the goal of the second loss function is to maximize the distance between the positive sample and the negative sample of the different-view-angle ternary picture group;
updating parameters of the initial re-identification model according to the first loss function and the second loss function;
and returning to the step of acquiring each labeled sample picture set until an iteration stop condition is reached, and generating a trained re-identification model.
4. The method of claim 3, further comprising:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and selecting the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function according to the feature elements of the different-view-angle positive samples and the feature elements of the same-view-angle negative samples, where the goal of the third loss function is to maximize the characteristic distance between the same-view-angle negative samples and the different-view-angle positive samples;
wherein the updating of the parameters of the initial re-identification model according to the first loss function and the second loss function comprises: updating the parameters of the initial re-identification model according to the first loss function, the second loss function, and the third loss function.
5. The method according to claim 3, further comprising, after the acquiring of each labeled sample picture set:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of all pictures in the sample picture set by using the trained image feature extraction model, where the high-dimensional picture features correspond to the feature elements of each picture;
and inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and inputting the high-dimensional picture features of the different-view-angle ternary picture groups in the sample picture set into the second convolution sub-branch network.
6. The method according to claim 1, wherein the identifying of the view angles of the target object picture and of each picture to be detected in the picture set to be detected, and the determining, in the picture set to be detected, of the same-view-angle pictures having the same view angle as the target object picture and the different-view-angle pictures having a different view angle from the target object picture, comprises:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting a view angle of the target object picture and a view angle of each picture to be detected;
and according to the view angle of the target object picture, determining, from the picture set to be detected, the same-view-angle pictures having the same view angle as the target object picture and the different-view-angle pictures having a different view angle from the target object picture.
7. The method of claim 6, wherein the manner of training the perspective classification model comprises:
acquiring a picture training set with labeled view angles;
and training the initial classification model according to the picture training set with labeled view angles to obtain a trained view angle classification model.
8. An object re-identification apparatus, the apparatus comprising:
the view angle identification module is used for identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the same-view-angle pictures having the same view angle as the target object picture and the different-view-angle pictures having a different view angle from the target object picture; the target object picture carries a target object to be identified;
the characteristic distance determining module is used for determining a first characteristic distance between the target object picture and the same-view-angle pictures, and a second characteristic distance between the target object picture and the different-view-angle pictures;
the picture determining module is used for determining pictures including the target object under the same visual angle according to the first characteristic distance; and determining pictures including the target object under different view angles according to the second characteristic distance.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010504139.5A 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium Active CN111667001B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010504139.5A CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium
PCT/CN2020/098759 WO2021114612A1 (en) 2020-06-05 2020-06-29 Target re-identification method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010504139.5A CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111667001A (en) 2020-09-15
CN111667001B CN111667001B (en) 2023-08-04

Family

ID=72386376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010504139.5A Active CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111667001B (en)
WO (1) WO2021114612A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464922A (en) * 2021-02-02 2021-03-09 长沙海信智能系统研究院有限公司 Human-vehicle weight recognition and model training method, device, equipment and storage medium thereof
CN113743359A (en) * 2021-09-16 2021-12-03 重庆紫光华山智安科技有限公司 Vehicle weight recognition method, model training method and related device
CN113920306A (en) * 2021-09-30 2022-01-11 北京百度网讯科技有限公司 Target re-identification method and device and electronic equipment
CN114372538A (en) * 2022-03-22 2022-04-19 中国海洋大学 Method for convolution classification of scale vortex time series in towed sensor array

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743499B (en) * 2021-09-02 2023-09-05 广东工业大学 View angle irrelevant feature dissociation method and system based on contrast learning
CN114140826A (en) * 2021-12-03 2022-03-04 北京交通大学 Target re-identification method based on camera feature separation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138998A (en) * 2015-09-07 2015-12-09 上海交通大学 Method and system for re-identifying pedestrian based on view angle self-adaptive subspace learning algorithm
WO2018137358A1 (en) * 2017-01-24 2018-08-02 北京大学 Deep metric learning-based accurate target retrieval method
US20180276499A1 (en) * 2017-03-24 2018-09-27 Disney Enterprises, Inc. One shot color calibrated metric learning for object re-identification
CN109800710A (en) * 2019-01-18 2019-05-24 北京交通大学 Pedestrian's weight identifying system and method
CN110765903A (en) * 2019-10-10 2020-02-07 浙江大华技术股份有限公司 Pedestrian re-identification method and device and storage medium
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107153817B (en) * 2017-04-29 2021-04-27 深圳市深网视界科技有限公司 Pedestrian re-identification data labeling method and device
CN108509854B (en) * 2018-03-05 2020-11-17 昆明理工大学 Pedestrian re-identification method based on projection matrix constraint and discriminative dictionary learning
CN109543602B (en) * 2018-11-21 2020-08-14 太原理工大学 Pedestrian re-identification method based on multi-view image feature decomposition


Also Published As

Publication number Publication date
WO2021114612A1 (en) 2021-06-17
CN111667001B (en) 2023-08-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant