CN111667001B - Target re-identification method, device, computer equipment and storage medium

Target re-identification method, device, computer equipment and storage medium

Info

Publication number
CN111667001B
CN111667001B (application CN202010504139.5A)
Authority
CN
China
Prior art keywords
picture
target object
loss function
same
sample
Prior art date
Legal status
Active
Application number
CN202010504139.5A
Other languages
Chinese (zh)
Other versions
CN111667001A (en)
Inventor
林春伟
刘莉红
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010504139.5A
Priority to PCT/CN2020/098759
Publication of CN111667001A
Application granted
Publication of CN111667001B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The application relates to a target re-identification method, device, computer equipment and storage medium based on deep learning. The method comprises the following steps: identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, within the picture set to be detected, the same view angle pictures that share the view angle of the target object picture and the different view angle pictures that do not; determining a first feature distance between the target object picture and the same view angle pictures, and a second feature distance between the target object picture and the different view angle pictures; and determining the pictures that include the target object under the same view angle according to the first feature distance, and the pictures that include the target object under different view angles according to the second feature distance. With this method, the pictures of the same target object can be determined separately under the same view angle and under different view angles, which effectively solves the problem that the same target object cannot be accurately determined when the shooting view angles differ greatly, and improves the accuracy of target re-identification.

Description

Target re-identification method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a target re-identification method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, target re-identification technology has emerged. Target re-identification determines whether, for a given target, targets with the same identity appear at other times or from other view angles. The target may be a vehicle or a pedestrian, and target re-identification is used to match vehicles or pedestrians across different view angles or time periods in video surveillance.
In the conventional art, whether the same object exists is generally determined using the appearance information of the object. However, because appearance is easily affected by external conditions and by factors of the object itself, the appearance information of the same object may be inconsistent across environments. For example, images of the same pedestrian captured under different lighting conditions or in different postures differ greatly and may not be matched as the same target. For vehicles, the appearance information of the same vehicle differs across video images with different view angles or resolutions, so vehicle re-identification cannot be performed effectively. The conventional re-identification method is therefore easily influenced by the target's appearance information, and its accuracy is low in many practical application scenes.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target re-recognition method, apparatus, computer device, and storage medium capable of improving the accuracy of target re-recognition.
A method of target re-identification, the method comprising:
identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the same view angle pictures that share the view angle of the target object picture and the different view angle pictures that do not; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the same view angle picture, and a second characteristic distance between the target object picture and the different view angle picture;
determining a picture comprising the target object under the same view angle according to the first characteristic distance;
and determining pictures comprising the target object under different view angles according to the second characteristic distance.
In one embodiment, the determining the first feature distance of the target object picture and the same view picture, and the second feature distance of the target object picture and the different view picture includes:
Inputting the target object picture and the same visual angle picture into a trained re-recognition model, and outputting a first characteristic distance between the same visual angle picture and the target object picture through a first convolution sub-branch network of the re-recognition model;
and inputting the target object picture and the different view angle pictures into a trained re-recognition model, and outputting second characteristic distances of the different view angle pictures and the target object picture through a second convolution sub-branch network of the re-recognition model.
In one embodiment, the means for training the re-recognition model comprises:
acquiring labeled sample picture sets; the sample picture set comprises ternary picture groups in which the positive and negative samples and the sample are at the same view angle, and ternary picture groups in which they are at different view angles; wherein the positive sample and the sample contain the same target object, and the negative sample and the sample contain different target objects;
inputting a ternary image group with the same visual angle in the sample image set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix;
inputting the ternary image groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix;
calculating a first loss function according to the first feature matrix, and calculating a second loss function according to the second feature matrix; the first loss function aims to maximize the distance between the positive and negative samples of the ternary picture groups at the same view angle; the second loss function aims to maximize the distance between the positive and negative samples of the ternary picture groups at different view angles;
updating parameters of the initial re-identification model according to the first loss function and the second loss function;
and returning to the step of obtaining the marked sample picture sets until the iteration stop condition is reached, and generating a trained re-recognition model.
In one embodiment, the method further comprises:
selecting characteristic elements of negative samples with the same visual angle from the first characteristic matrix, and selecting characteristic elements of positive samples with different visual angles from the second characteristic matrix;
calculating a third loss function according to the characteristic elements of the positive samples at different angles and the characteristic elements of the negative samples at the same angle; the objective of the third loss function is to maximize the feature distance of negative samples at the same view and positive samples at different views;
the updating of the parameters of the initial re-identification model according to the first loss function and the second loss function includes: updating the parameters of the initial re-identification model according to the first loss function, the second loss function and the third loss function.
In one embodiment, after acquiring the labeled training sample pictures, the method further includes:
inputting the marked sample picture sets into a trained image feature extraction model;
extracting high-dimensional picture features of each picture in the sample picture set by using the trained image feature extraction model; the high-dimensional picture features correspond to feature elements of each picture;
and inputting the high-dimensional picture features of the ternary picture groups at the same view angle in the sample picture set into a first convolution sub-branch network, and inputting the high-dimensional picture features of the ternary picture groups at different view angles in the sample picture set into a second convolution sub-branch network.
In one embodiment, identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the pictures at the same view angle as the target object picture and the pictures at different view angles from the target object picture, includes:
Inputting the target object picture and each picture to be detected in the picture set to be detected into a trained visual angle classification model, and outputting the visual angle of the target object picture and the visual angle of each picture to be detected;
and determining, from the picture set to be detected and according to the view angle of the target object picture, the same view angle pictures that share its view angle and the different view angle pictures that do not.
In one embodiment, the manner of training the view classification model includes:
acquiring a picture training set marked with a visual angle;
and training the initial classification model according to the picture training set marked with the visual angles to obtain a trained visual angle classification model.
A target re-identification apparatus, the apparatus comprising:
the view angle identification module is used for identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the same view angle pictures that share the view angle of the target object picture and the different view angle pictures that do not; the target object picture carries a target object to be identified;
the feature distance determining module is used for determining a first feature distance between the target object picture and the same view angle pictures, and a second feature distance between the target object picture and the different view angle pictures;
the picture determining module is used for determining the pictures including the target object under the same view angle according to the first feature distance, and is further used for determining the pictures including the target object under different view angles according to the second feature distance.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the same view angle pictures that share the view angle of the target object picture and the different view angle pictures that do not; the target object picture carries a target object to be identified;
determining a first feature distance between the target object picture and the same view angle pictures, and a second feature distance between the target object picture and the different view angle pictures;
determining the pictures including the target object under the same view angle according to the first feature distance;
and determining the pictures including the target object under different view angles according to the second feature distance.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, in the picture set to be detected, the same view angle pictures that share the view angle of the target object picture and the different view angle pictures that do not; the target object picture carries a target object to be identified;
determining a first characteristic distance between the target object picture and the same view angle picture, and a second characteristic distance between the target object picture and the different view angle picture;
determining a picture comprising the target object under the same view angle according to the first characteristic distance;
and determining pictures comprising the target object under different view angles according to the second characteristic distance.
In the target re-identification method, apparatus, computer equipment and storage medium, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the pictures in the picture set to be detected at the same view angle as the target object picture and those at different view angles are determined. By then determining a first feature distance between the target object picture and the same view angle pictures and a second feature distance between the target object picture and the different view angle pictures, the pictures including the target object under the same view angle are determined according to the first feature distance, and the pictures including the target object under different view angles are determined according to the second feature distance. On the basis of establishing whether each picture to be detected shares the view angle of the target object picture, the pictures that include the same target object are determined separately under the same view angle and under different view angles. This avoids judging whether the same target object exists from the appearance information of a single view angle across different view angles or scenes, effectively solves the problem that large differences in shooting view angle prevent the same target object from being accurately determined during re-identification, and thereby improves the accuracy of target re-identification.
Drawings
FIG. 1 is an application environment diagram of a target re-recognition method in one embodiment;
FIG. 2 is a flow chart of a target re-identification method in one embodiment;
FIG. 3 is a flow diagram of training a re-recognition model in one embodiment;
FIG. 4 is a flow chart of training a re-recognition model in another embodiment;
FIG. 5 is a schematic diagram of an overall architecture of training a re-recognition model in one embodiment;
FIG. 6 is a block diagram of an apparatus for re-identifying an object in one embodiment;
FIG. 7 is a block diagram of an apparatus for re-identifying an object in another embodiment;
fig. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target re-identification method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 identifies the view angles of the target object picture and of each picture to be detected in the picture set to be detected, and determines the same view angle pictures in the set that share the view angle of the target object picture and the different view angle pictures that do not. The picture set to be detected may be stored locally in the terminal 102 or obtained from the cloud storage of the server 104, and the target object picture carries the target object to be identified. The terminal then determines a first feature distance between the target object picture and the same view angle pictures and a second feature distance between the target object picture and the different view angle pictures, determines the pictures including the target object under the same view angle according to the first feature distance, and determines the pictures including the target object under different view angles according to the second feature distance. The terminal 102 may be, but is not limited to, a personal computer, notebook computer or tablet computer, and the server 104 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a target re-identification method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
step S202, identifying the view angles of the target object picture and each picture to be detected in the picture set to be detected, and determining the pictures with the same view angles as the target object picture and the pictures with different view angles as the target object picture in the picture set to be detected.
Specifically, the visual angle of the target object picture and the visual angle of each picture to be detected are output by inputting the target object picture and each picture to be detected in the picture set to be detected into a trained visual angle classification model. And then according to the view angle of the target object picture, determining the picture with the same view angle as the target object picture from the picture set to be detected, and determining the picture with different view angles from the picture set to be detected, wherein the picture with different view angles is different from the target object picture. The target object may be a vehicle or a pedestrian, and the target object picture may be a picture carrying the target object such as the vehicle or the pedestrian.
Further, taking the target object as a vehicle as an example, the corresponding target object picture is a picture carrying the target vehicle, and each picture to be detected in the picture set to be detected is a picture carrying the vehicle, which can be a picture carrying the target vehicle or a picture carrying different vehicles. The visual angles of the target object picture and each picture to be detected comprise a head visual angle and a tail visual angle, and the visual angles of the target object picture and each picture to be detected are obtained through recognition by a trained visual angle classification model. And further determining the same view angle picture with the same view angle as the target object picture from the to-be-detected picture set according to the view angle of the target object picture, namely determining the same view angle picture with the same view angle as the vehicle head view angle from the to-be-detected picture set when the view angle of the target vehicle in the target object picture is the vehicle head view angle, or determining the same view angle picture with the same view angle as the vehicle tail view angle from the to-be-detected picture set when the target object picture is the vehicle tail view angle.
Similarly, according to the view angle of the target object picture, further determining the pictures with different view angles from the to-be-detected picture set, namely determining the pictures with different view angles from the to-be-detected picture set to the vehicle tail view angle when the view angle of the target vehicle in the target object picture is the vehicle head view angle, or determining the pictures with different view angles from the to-be-detected picture set to the vehicle head view angle when the target object picture is the vehicle tail view angle.
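As a concrete illustration, the view-based split of the picture set to be detected can be sketched as follows. This is a minimal sketch only: the trained view classifier, the preprocessing pipeline, and the label convention (0 for the head view, 1 for the tail view) are illustrative assumptions, not details fixed by this embodiment.

```python
# Minimal sketch: split a gallery into same-view and different-view pictures
# using a trained view classifier. Label convention (0 = head, 1 = tail),
# preprocessing and model architecture are illustrative assumptions.
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def predict_view(model: torch.nn.Module, path: str) -> int:
    """Return the predicted view label (0 = head view, 1 = tail view)."""
    model.eval()
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return int(model(x).argmax(dim=1).item())

def split_by_view(model, query_path, gallery_paths):
    """Partition the picture set to be detected by the query picture's view."""
    query_view = predict_view(model, query_path)
    views = {p: predict_view(model, p) for p in gallery_paths}
    same_view = [p for p in gallery_paths if views[p] == query_view]
    diff_view = [p for p in gallery_paths if views[p] != query_view]
    return same_view, diff_view
```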
In one embodiment, the step of training a perspective classification model includes:
acquiring a picture training set marked with a visual angle;
and training the initial classification model according to the picture training set marked with the visual angles to obtain a trained visual angle classification model.
Specifically, taking the target object as a vehicle as an example, a picture training set composed of vehicle pictures with labeled view angles is acquired, where the labeled view angle marks each vehicle picture as a head view or a tail view. The initial classification model is trained on this labeled picture training set to obtain the trained view angle classification model. The initial classification model may be a deep learning classification model such as ResNet or GoogLeNet.
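A minimal sketch of such training, assuming a fine-tuned ResNet-18 backbone and a standard cross-entropy objective (the backbone choice, hyperparameters and data loader are illustrative, not specified by the patent):

```python
# Sketch: fine-tune a ResNet-18 as a two-class view classifier (head vs. tail).
import torch
import torch.nn as nn
from torchvision import models

def build_view_classifier(num_views: int = 2) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_views)  # head / tail logits
    return model

def train_view_classifier(model, loader, epochs: int = 10):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.train()
    for _ in range(epochs):
        for images, view_labels in loader:  # labels from the annotated set
            optimizer.zero_grad()
            loss = criterion(model(images), view_labels)
            loss.backward()
            optimizer.step()
    return model
```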
Step S204, determining a first feature distance between the target object picture and the same view picture, and a second feature distance between the target object picture and a different view picture.
Specifically, the target object picture and the same view angle picture are input into a trained re-recognition model, and then the first characteristic distance between the same view angle picture and the target object picture is output through a first convolution sub-branch network of the re-recognition model. Similarly, the target object picture and the different view angle pictures are input into the trained re-recognition model, and then the second characteristic distances of the different view angle pictures and the target object picture are output through a second convolution sub-branch network of the re-recognition model.
The trained re-recognition model comprises a first convolution sub-branch network and a second convolution sub-branch network which are identical in structure, wherein the first convolution sub-branch network is used for calculating and obtaining first characteristic distances of a target object picture and a plurality of pictures with the same visual angles. Likewise, the second convolution sub-branch network is used for calculating a second characteristic distance between the target object picture and the plurality of different view angle pictures.
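The two-branch structure can be sketched as below; the layer sizes and embedding dimension are assumptions, and only the overall shape (two identically structured sub-branches, each scoring pairs by the distance between embeddings) follows the description above.

```python
# Sketch of the re-recognition model's two identically structured branches.
import torch
import torch.nn as nn

class BranchNet(nn.Module):
    """One convolution sub-branch mapping a picture to an embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class ReIDModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.same_view_branch = BranchNet()   # first convolution sub-branch
        self.cross_view_branch = BranchNet()  # second convolution sub-branch

    def feature_distance(self, query, gallery, same_view: bool):
        """First feature distance if same_view, otherwise second feature distance."""
        branch = self.same_view_branch if same_view else self.cross_view_branch
        return torch.norm(branch(query) - branch(gallery), p=2, dim=1)
```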
Step S206, determining the pictures including the target object under the same view angle according to the first characteristic distance.
Specifically, the first convolution sub-branch network of the trained re-recognition model is used to obtain the first feature distances between the target object picture and each of the same view angle pictures. Sorting the calculated first feature distances yields a first re-identification sequence of the same view angle pictures relative to the target object picture, and the pictures including the target object under the same view angle can be determined based on this sequence.
Further, taking the target object as the target vehicle as an example, the first convolution sub-branch network of the trained re-recognition model yields first feature distances both for same-view pictures containing the same target vehicle and for same-view pictures containing different vehicles. The obtained first feature distances are sorted in ascending order to obtain the first re-identification sequence; the earlier a picture appears in this sequence (i.e., the smaller its feature distance), the higher its degree of match with the target object picture, and the pictures containing the same target vehicle under the same view angle can be determined on this basis.
Step S208, determining the pictures comprising the target object under different view angles according to the second characteristic distance.
Specifically, the second convolution sub-branch network of the trained re-recognition model is used to obtain the second feature distances between the target object picture and each of the different view angle pictures. Sorting the calculated second feature distances yields a second re-identification sequence of the different view angle pictures relative to the target object picture, and the pictures including the target object under different view angles can be determined based on this sequence.
Further, taking the target object as the target vehicle as an example, the second convolution sub-branch network of the trained re-recognition model yields second feature distances both for different-view pictures containing the same target vehicle and for different-view pictures containing different vehicles. The obtained second feature distances are sorted in ascending order to obtain the second re-identification sequence; the earlier a picture appears in this sequence, the higher its degree of match with the target object picture, and the pictures of the same target vehicle under different view angles can be determined on this basis.
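Producing either re-identification sequence then reduces to an ascending sort by feature distance, as in the following sketch (the list-of-paths interface is an assumption):

```python
# Sketch: rank gallery pictures by feature distance to the query, ascending,
# so the strongest matches come first in the re-identification sequence.
import torch

def rank_gallery(gallery_paths, distances: torch.Tensor):
    order = torch.argsort(distances).tolist()  # smallest distance first
    return [(gallery_paths[i], float(distances[i])) for i in order]
```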
In the target re-identification method, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the same view angle pictures and the different view angle pictures are determined. By then determining the first feature distance between the target object picture and the same view angle pictures and the second feature distance between the target object picture and the different view angle pictures, the pictures including the target object under the same view angle are determined according to the first feature distance, and the pictures including the target object under different view angles are determined according to the second feature distance. On the basis of establishing whether each picture to be detected shares the view angle of the target object picture, the pictures that include the same target object are determined separately under the same view angle and under different view angles. This avoids judging whether the same target object exists from the appearance information of a single view angle across different view angles or scenes, effectively solves the problem that large differences in shooting view angle prevent the same target object from being accurately determined during re-identification, and thereby improves the accuracy of target re-identification.
In one embodiment, as shown in FIG. 3, the manner in which the re-recognition model is trained includes the steps of:
step S302, each marked sample picture set is obtained.
The sample picture set comprises ternary picture groups in which the positive and negative samples and the sample are at the same view angle, and ternary picture groups in which they are at different view angles. The positive sample and the sample contain the same target object, while the negative sample and the sample contain different target objects. Each picture in a labeled sample picture set is annotated with its view angle, i.e., marked as a head view or a tail view, and marked as a positive or negative sample.
Specifically, taking a target object as an example of a target vehicle, the sample is a picture including the target vehicle, that is, a target vehicle picture in which the target vehicle is at a head view angle or a target vehicle picture in which the target vehicle is at a tail view angle. Correspondingly, the sample picture set comprises a ternary picture group with positive and negative samples and samples being head view angles or tail view angles, and a ternary picture group with positive and negative samples and samples being different view angles.
Further, each picture included in the sample picture set includes the following cases:
1) The sample is a picture of the target vehicle at the head view angle, a picture of the same target vehicle at the head view angle is the positive sample, and pictures of different vehicles at the head view angle are negative samples, yielding a ternary picture group whose positive and negative samples share the sample's view angle.
2) The sample is a picture of the target vehicle at the tail view angle, a picture of the same target vehicle at the tail view angle is the positive sample, and pictures of different vehicles at the tail view angle are negative samples, likewise yielding a ternary picture group whose positive and negative samples share the sample's view angle.
3) The sample is a picture of the target vehicle at the head view angle, a picture of the same target vehicle at the tail view angle is the positive sample, and pictures of different vehicles at the head view angle are negative samples, yielding a ternary picture group whose positive and negative samples are at different view angles.
4) The sample is a picture of the target vehicle at the tail view angle, a picture of the same target vehicle at the head view angle is the positive sample, and pictures of different vehicles at the tail view angle are negative samples, yielding a ternary picture group whose positive and negative samples are at different view angles.
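The four cases above can be realized with a sampling routine along the following lines; the (path, vehicle_id, view) record layout and the random choice of one positive and one negative per anchor are illustrative assumptions.

```python
# Sketch: build same-view and different-view ternary picture groups from
# annotations of the form (path, vehicle_id, view), view in {"head", "tail"}.
import random

def build_triplets(records):
    same_view, cross_view = [], []
    for anchor_path, vid, view in records:
        other_view = "tail" if view == "head" else "head"
        pos_same  = [p for p, v, w in records if v == vid and w == view and p != anchor_path]
        pos_cross = [p for p, v, w in records if v == vid and w == other_view]
        neg_same  = [p for p, v, w in records if v != vid and w == view]
        if pos_same and neg_same:   # cases 1) and 2): all three share one view
            same_view.append((anchor_path, random.choice(pos_same), random.choice(neg_same)))
        if pos_cross and neg_same:  # cases 3) and 4): positive from the other view
            cross_view.append((anchor_path, random.choice(pos_cross), random.choice(neg_same)))
    return same_view, cross_view
```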
Step S304, inputting the ternary image groups with the same view angle in the sample image set into a first convolution sub-branch network of the initial re-identification model to obtain a first feature matrix.
Specifically, a ternary image group with the same visual angle in a sample image set is input into an initial re-identification model, and a corresponding first feature matrix is obtained through the output of a first convolution sub-branch network of the initial re-identification model.
Further, the ternary picture group formed by a sample that is a head-view picture of the target vehicle, a positive sample that is a head-view picture of the same target vehicle, and a negative sample that is a head-view picture of a different vehicle can be input into the first convolution sub-branch network of the initial re-identification model, which outputs a first feature matrix composed of the first features of each picture under the same view angle.
And step S306, inputting the ternary image groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix.
Specifically, a ternary picture group with different view angles in the sample picture set is input into the initial re-identification model, and the corresponding second feature matrix is output by the second convolution sub-branch network of the initial re-identification model.
Further, the ternary picture group of different view angles may be formed by a sample that is a head-view picture of the target vehicle, a positive sample that is a tail-view picture of the same target vehicle, and a negative sample that is a head-view picture of a different vehicle; or by a sample that is a tail-view picture of the target vehicle, a positive sample that is a head-view picture of the same target vehicle, and a negative sample that is a tail-view picture of a different vehicle. This group is input into the second convolution sub-branch network of the initial re-identification model to obtain the corresponding second feature matrix.
In one embodiment, after obtaining the labeled training sample pictures, the method further comprises:
and inputting the marked sample picture sets into a trained image feature extraction model, and extracting high-dimensional picture features of each picture in the sample picture sets by using the trained image feature extraction model.
The high-dimensional picture features correspond to the feature elements of each picture. By inputting the high-dimensional picture features of the ternary picture groups at the same view angle in the sample picture set into the first convolution sub-branch network, the corresponding first feature matrix can be obtained. Similarly, by inputting the high-dimensional picture features of the ternary picture groups at different view angles into the second convolution sub-branch network, the corresponding second feature matrix can be obtained.
Further, taking a target object as a vehicle as an example, the method for training the image feature extraction model includes:
acquiring a labeled training picture set, in which each picture is marked with one of two attributes, vehicle or non-vehicle; and training the initial convolutional network model with the labeled training picture set to generate the trained image feature extraction model.
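One plausible realization, assuming a ResNet-18 backbone (the patent does not fix the architecture): train the network on the vehicle / non-vehicle labels, then drop the classification head and use the remaining network to emit the high-dimensional picture features.

```python
# Sketch: shared image feature extractor, trained as a vehicle / non-vehicle
# classifier and then used headless to produce high-dimensional features.
import torch
import torch.nn as nn
from torchvision import models

def build_feature_extractor() -> nn.Module:
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # vehicle / non-vehicle
    # ... train on the labeled vehicle / non-vehicle training picture set ...
    backbone.fc = nn.Identity()  # drop the head; output 512-dim features
    return backbone

def extract_features(extractor: nn.Module, images: torch.Tensor) -> torch.Tensor:
    extractor.eval()
    with torch.no_grad():
        return extractor(images)  # high-dimensional picture features
```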
Step S308, a first loss function is calculated according to the first feature matrix, and a second loss function is calculated according to the second feature matrix.
The first loss function aims to maximize the distance between the positive and negative samples of the ternary picture groups at the same view angle, and the second loss function aims to maximize the distance between the positive and negative samples of the ternary picture groups at different view angles.
For a triplet $(x, x^{+}, x^{-})$, where $x$ is the anchor sample, $x^{+}$ is a positive sample drawn from the same class as $x$, and $x^{-}$ is a negative sample drawn from a different class, a positive pair $P^{+} = (x, x^{+})$ and a negative pair $P^{-} = (x, x^{-})$ can be defined. With $\alpha$ denoting the manually set minimum distance margin between positive and negative pairs, the triplet loss is defined as shown in equation (1) below:
$L_{tri}(x, x^{+}, x^{-}) = \max(D(P^{+}) - D(P^{-}) + \alpha, 0)$; (1)
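As a concrete reading of equation (1), a minimal PyTorch sketch follows; the margin value and the batched embedding interface are illustrative assumptions.

```python
# Sketch of the triplet loss in equation (1) over batches of embeddings.
import torch

def triplet_loss(anchor, positive, negative, margin: float = 0.3):
    d_pos = torch.norm(anchor - positive, p=2, dim=1)  # D(P+)
    d_neg = torch.norm(anchor - negative, p=2, dim=1)  # D(P-)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```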
Specifically, the Euclidean distances between the feature elements of the same-view samples and those of the positive samples in the first feature matrix, and between the feature elements of the same-view samples and those of the negative samples, are calculated, and the first loss function for the same view angle is computed with equation (2):
$L_{s} = \max(D_{s}(P^{+}) - D_{s}(P^{-}) + \alpha, 0)$; (2)
where $L_{s}$ denotes the first loss function, $D_{s}(P^{+})$ denotes the Euclidean distance between the feature elements of a sample and of a positive sample at the same view angle, $D_{s}(P^{-})$ denotes the Euclidean distance between the feature elements of a sample and of a negative sample at the same view angle, and $\alpha$ denotes the manually set minimum distance margin between positive and negative pairs.
Further, the Euclidean distance is calculated with equation (3):
$D(P) = D(x_i, x_j) = \lVert f(x_i) - f(x_j) \rVert_{2}$; (3)
where $X$ denotes the dataset, $P = (x_i, x_j)$ with $x_i, x_j \in X$ denotes a picture pair, $f$ denotes the model that extracts features from the original picture, and $D$ denotes the Euclidean distance between features.
Likewise, the Euclidean distances between the feature elements of the different-view samples and those of the positive samples in the second feature matrix, and between the feature elements of the different-view samples and those of the negative samples, are calculated, and the second loss function for different view angles is computed with equation (4):
$L_{d} = \max(D_{d}(P^{+}) - D_{d}(P^{-}) + \alpha, 0)$; (4)
where $L_{d}$ denotes the second loss function, $D_{d}(P^{+})$ denotes the Euclidean distance between the feature elements of a sample and of a positive sample at different view angles, $D_{d}(P^{-})$ denotes the Euclidean distance between the feature elements of a sample and of a negative sample at different view angles, and $\alpha$ denotes the manually set minimum distance margin between positive and negative pairs.
In one embodiment, the method further comprises a mode of calculating a third loss function, specifically comprising:
selecting characteristic elements of negative samples with the same visual angle from the first characteristic matrix, and selecting characteristic elements of positive samples with different visual angles from the second characteristic matrix; a third loss function is calculated from the characteristic elements of the positive samples at different perspectives and the characteristic elements of the negative samples at the same perspectives.
Wherein the third loss function aims at maximizing the feature distance of negative samples at the same view and positive samples at different views.
Specifically, the Euclidean distances between the feature elements of the same-view samples and those of the negative samples in the first feature matrix, and between the feature elements of the different-view samples and those of the positive samples in the second feature matrix, are calculated, and the third loss function is computed with equation (5):
$L_{cross} = \max(D_{d}(P^{+}) - D_{s}(P^{-}) + \alpha, 0)$; (5)
where $L_{cross}$ denotes the third loss function, $D_{d}(P^{+})$ denotes the Euclidean distance between the feature elements of a sample and of a positive sample at different view angles, $D_{s}(P^{-})$ denotes the Euclidean distance between the feature elements of a sample and of a negative sample at the same view angle, and $\alpha$ denotes the manually set minimum distance margin between positive and negative pairs.
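Equations (2), (4) and (5) can be sketched together as follows, assuming the same-view triplet has been embedded by the first sub-branch network and the different-view triplet by the second; all variable names and the margin are illustrative.

```python
# Sketch of the first, second and third loss functions of this embodiment.
import torch

def view_aware_losses(a_s, p_s, n_s, a_d, p_d, n_d, margin: float = 0.3):
    """(a, p, n) = anchor/positive/negative embeddings; _s same view, _d different views."""
    def dist(x, y):
        return torch.norm(x - y, p=2, dim=1)  # Euclidean distance, equation (3)
    l_s = torch.clamp(dist(a_s, p_s) - dist(a_s, n_s) + margin, min=0).mean()      # eq. (2)
    l_d = torch.clamp(dist(a_d, p_d) - dist(a_d, n_d) + margin, min=0).mean()      # eq. (4)
    l_cross = torch.clamp(dist(a_d, p_d) - dist(a_s, n_s) + margin, min=0).mean()  # eq. (5)
    return l_s, l_d, l_cross
```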
Step S310, updating parameters of the initial re-identification model according to the first loss function and the second loss function.
Specifically, the parameters of the initial re-identification model are updated according to the first loss function and the second loss function, thereby updating the initial re-identification model.
Further, updating parameters of the initial re-recognition model according to the first loss function and the second loss function includes:
updating the parameters of the initial re-identification model according to the first loss function, the second loss function and the third loss function, thereby updating the initial re-identification model.
Step S312, returning to the step of obtaining the marked sample picture sets until the iteration stop condition is reached, and generating a trained re-recognition model.
Specifically, the following steps are repeatedly performed: acquiring marked sample picture sets; inputting a ternary image group with the same visual angle in a sample image set into a first convolution sub-branch network of an initial re-identification model to obtain a first feature matrix; inputting the ternary image groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix; calculating a first loss function according to the first feature matrix, and calculating a second loss function according to the second feature matrix; and updating parameters of the initial re-recognition model according to the first loss function and the second loss function until the iteration stop condition is reached, and generating a trained re-recognition model.
The iteration stop condition may be that the successively computed first and second loss functions settle to stable values and no longer decrease significantly, at which point the initial re-recognition model has converged and the trained re-recognition model is generated.
Further, when the method also uses the third loss function, the loss function of the initial re-identification model is computed as the sum of the first, second and third loss functions, and whether the iteration stop condition is reached is determined by judging whether the successively computed values of this loss function settle to a stable value. When the value of the loss function stabilizes, the initial re-recognition model has converged, and the trained re-recognition model is generated.
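A minimal training-loop sketch under these conditions, reusing view_aware_losses from the sketch above; the data pipeline, optimizer and convergence tolerance are assumptions rather than details from this embodiment.

```python
# Sketch: update the model from the sum of the three losses and stop once
# the epoch loss stabilizes (the iteration stop condition).
def train_reid(model, sample_loader, optimizer, max_epochs: int = 100, tol: float = 1e-4):
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for same_triplet, cross_triplet in sample_loader:
            a_s, p_s, n_s = (model.same_view_branch(t) for t in same_triplet)
            a_d, p_d, n_d = (model.cross_view_branch(t) for t in cross_triplet)
            l_s, l_d, l_cross = view_aware_losses(a_s, p_s, n_s, a_d, p_d, n_d)
            loss = l_s + l_d + l_cross
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if abs(prev_loss - total) < tol:  # loss value has stabilized
            break
        prev_loss = total
    return model
```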
In this embodiment, each labeled sample picture set is acquired; the ternary picture groups at the same view angle in the sample picture set are input into the first convolution sub-branch network of the initial re-recognition model to obtain the first feature matrix, and the ternary picture groups at different view angles are input into the second convolution sub-branch network to obtain the second feature matrix. The first loss function is calculated from the first feature matrix and the second loss function from the second feature matrix, and the parameters of the initial re-recognition model are updated according to the two loss functions until the iteration stop condition is reached, at which point updating ends and the trained re-recognition model is generated. Training the different branch networks of the initial re-recognition model on same-view triplets and different-view triplets respectively allows the resulting re-recognition model to re-identify pictures to be detected both under the same view angle and under different view angles. This avoids the problem that a single recognition mode cannot accurately identify the same target object when the view angle difference is large, and thereby further improves target re-identification accuracy.
In one embodiment, as shown in fig. 4, another training method of the re-recognition model is provided, which includes the following steps:
1) And obtaining each marked sample picture set.
The sample picture set comprises a ternary picture group with positive and negative samples and samples in the same visual angle, and a ternary picture group with positive and negative samples and samples in different visual angles. Wherein the positive sample and the sample have the same target object, and the negative sample and the sample have different target objects.
2) And inputting the ternary image groups with the same view angle in the sample image set into a first convolution sub-branch network of the initial re-identification model to obtain a first feature matrix.
3) And inputting the ternary image groups with different visual angles into a second convolution sub-branch network of the initial re-identification model to obtain a second feature matrix.
4) A first loss function is calculated from the first feature matrix.
Wherein the objective of the first loss function maximizes the distance between positive and negative samples of the triplet at the same view angle.
5) And calculating a second loss function according to the second feature matrix.
Wherein the objective of the second loss function maximizes the distance between positive and negative samples of the triplet at different perspectives.
6) The characteristic elements of the negative samples of the same view angle are selected from the first characteristic matrix, and the characteristic elements of the positive samples of different view angles are selected from the second characteristic matrix.
7) A third loss function is calculated from the characteristic elements of the positive samples at different perspectives and the characteristic elements of the negative samples at the same perspectives.
Wherein the third loss function aims at maximizing the feature distance of negative samples at the same view and positive samples at different views.
8) And updating the parameters of the initial re-identification model according to the first loss function, the second loss function and the third loss function.
9) And returning to the step of obtaining the marked sample picture sets until the iteration stop condition is reached, and generating a trained re-recognition model.
In this training mode of the re-recognition model, the first loss function is calculated from the first feature matrix, the second loss function from the second feature matrix, and the third loss function from the feature elements of the same-view negative samples in the first feature matrix and the feature elements of the different-view positive samples in the second feature matrix. The parameters of the initial re-recognition model are then updated according to the first, second and third loss functions until the iteration stop condition is reached, at which point updating ends and the trained re-recognition model is generated. Training the different branch networks of the initial re-recognition model on same-view triplets and different-view triplets respectively allows the resulting re-recognition model to re-identify pictures to be detected both under the same view angle and under different view angles, while the third loss function further improves the matching of the same target object across different view angles. This avoids the problem that the same target object cannot be accurately identified when the view angle difference is large, and thereby further improves target re-identification accuracy.
In one embodiment, as shown in fig. 5, an overall architecture for training a re-recognition model is provided, and referring to fig. 5, the overall architecture for training a re-recognition model includes:
1) The view classifying module 502 is configured to implement view recognition of each to-be-detected picture in the target object picture and the to-be-detected picture set by using a view classifier, that is, a view classifying model, and determine the same view picture in the to-be-detected picture set, which is the same view as the target object picture, and different view pictures, which are different views from the target object picture.
2) The image feature extraction module 504 is configured to extract the high-dimensional picture features of each picture in the sample picture set by using a trained image feature extraction model, i.e., a trained shared convolutional neural network.
3) The loss function calculation module 506 is configured to input the high-dimensional picture features of the ternary picture groups at the same view angle in the sample picture set into the first convolution sub-branch network to obtain the first feature matrix, and to input the high-dimensional picture features of the ternary picture groups at different view angles into the second convolution sub-branch network to obtain the second feature matrix.
The Euclidean distances between the feature elements of the same-view samples and those of the positive samples in the first feature matrix, and between the feature elements of the same-view samples and those of the negative samples, are then calculated to obtain the first distance matrix corresponding to the first convolution sub-branch network, from which the first loss function is obtained.
Similarly, the Euclidean distances between the feature elements of the different-view samples and those of the positive samples in the second feature matrix, and between the feature elements of the different-view samples and those of the negative samples, are calculated to obtain the second distance matrix corresponding to the second convolution sub-branch network, from which the second loss function is obtained. The Euclidean distances between the feature elements of the same-view samples and those of the negative samples in the first feature matrix, and between the feature elements of the different-view samples and those of the positive samples, are calculated to obtain the third distance matrix, from which the third loss function is obtained.
And determining whether an iteration stopping condition is reached or not according to the calculated sum of the first loss function, the second loss function and the third loss function, namely, when the sum of the first loss function, the second loss function and the third loss function tends to a stable value, the iteration stopping condition is reached, updating of the initial re-recognition model is stopped, and a trained re-recognition model is generated.
It should be understood that, although the steps in the flowcharts of figs. 2-3 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 2-3 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily executed in sequence but may be executed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a target re-recognition apparatus including: a view identification module 602, a feature distance determination module 604, and a picture determination module 606, wherein:
the view angle identifying module 602 is configured to identify a view angle of each picture to be detected in the target object picture and the picture set to be detected, and determine a picture with the same view angle as the target object picture in the picture set to be detected, and a picture with a different view angle from the target object picture.
The feature distance determining module 604 is configured to determine a first feature distance between the target object picture and the same view picture, and a second feature distance between the target object picture and a different view picture.
A picture determining module 606, configured to determine a picture including the target object under the same viewing angle according to the first feature distance; and the method is also used for determining pictures comprising the target object under different view angles according to the second characteristic distance.
In the target re-identification device, the view angles of the target object picture and of each picture to be detected in the picture set to be detected are identified, and the same view angle pictures and the different view angle pictures are determined. By then determining the first feature distance between the target object picture and the same view angle pictures and the second feature distance between the target object picture and the different view angle pictures, the pictures including the target object under the same view angle are determined according to the first feature distance, and the pictures including the target object under different view angles are determined according to the second feature distance. On the basis of establishing whether each picture to be detected shares the view angle of the target object picture, the pictures that include the same target object are determined separately under the same view angle and under different view angles. This avoids judging whether the same target object exists from the appearance information of a single view angle across different view angles or scenes, effectively solves the problem that large differences in shooting view angle prevent the same target object from being accurately determined during re-identification, and thereby improves the accuracy of target re-identification.
In one embodiment, as shown in fig. 7, there is provided a target re-recognition apparatus, specifically including:
a picture sample set obtaining module 702, configured to obtain each labeled sample picture set; the sample picture set comprises ternary picture groups (triplets) in which the sample and its positive and negative samples share the same view angle, and ternary picture groups in which they have different view angles; the positive sample contains the same target object as the sample, while the negative sample contains a different target object.
The first feature matrix generating module 704 is configured to input the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network of the initial re-recognition model to obtain a first feature matrix.
The second feature matrix generating module 706 is configured to input the different-view-angle ternary picture groups into the second convolution sub-branch network of the initial re-recognition model to obtain a second feature matrix.
A first loss function calculation module 708, configured to calculate a first loss function from the first feature matrix; the objective of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture groups.
The second loss function calculation module 710 is configured to calculate a second loss function from the second feature matrix; the objective of the second loss function is to maximize the distance between the positive samples and the negative samples of the different-view-angle ternary picture groups.
The feature element extraction module 712 is configured to select feature elements of the negative samples of the same view angle from the first feature matrix, and select feature elements of the positive samples of different view angles from the second feature matrix.
A third loss function calculation module 714, configured to calculate a third loss function from the feature elements of the different-view-angle positive samples and the feature elements of the same-view-angle negative samples; the objective of the third loss function is to maximize the feature distance between negative samples at the same view angle and positive samples at different view angles.
An initial re-recognition model update module 716 is configured to update parameters of the initial re-recognition model according to the first loss function, the second loss function, and the third loss function.
The re-recognition model generating module 718 is configured to return to the step of obtaining each labeled sample picture set until the iteration stop condition is reached, and to generate the trained re-recognition model.
In the above target re-identification apparatus, the first loss function is calculated from the first feature matrix, the second loss function from the second feature matrix, and the third loss function from the feature elements of the same-view-angle negative samples in the first feature matrix together with the feature elements of the different-view-angle positive samples in the second feature matrix. The parameters of the initial re-recognition model are then updated according to the first, second and third loss functions until the iteration stop condition is reached, at which point updating ends and the trained re-recognition model is generated. Training different branch networks of the initial re-recognition model on same-view-angle triplets and different-view-angle triplets respectively allows the resulting model to re-identify pictures to be detected under either the same or different view angles. The third loss function further improves the matching of the same target object across view angles, avoiding the failure to accurately identify the same target object when the angle difference is large, and thereby improving re-identification accuracy.
For specific limitations on the target re-identification apparatus, reference may be made to the limitations on the target re-identification method above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program. The communication interface of the computer device is used for wired or wireless communication with an external terminal, where the wireless mode may be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a target re-identification method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
identifying the view angles of the target object picture and of each picture to be detected in the picture set to be detected, and determining, within the picture set to be detected, the pictures with the same view angle as the target object picture and the pictures with a different view angle; the target object picture carries the target object to be identified;
determining a first feature distance between the target object picture and each same-view-angle picture, and a second feature distance between the target object picture and each different-view-angle picture;
determining the pictures including the target object under the same view angle according to the first feature distance;
determining the pictures including the target object under different view angles according to the second feature distance (a sketch of this flow follows).
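By way of illustration and not limitation, the following Python sketch traces the four steps above; `view_classifier`, `reid_model`, its `branch1_distance`/`branch2_distance` helpers and the threshold `thresh` are hypothetical names introduced only for this sketch.

```python
# Hypothetical helper names throughout; only the control flow mirrors the
# steps described above.
def re_identify(target_pic, candidates, view_classifier, reid_model, thresh=0.5):
    target_view = view_classifier(target_pic)

    # Step 1: split the picture set by view angle relative to the target.
    same_view = [p for p in candidates if view_classifier(p) == target_view]
    diff_view = [p for p in candidates if view_classifier(p) != target_view]

    # Steps 2-4: first/second feature distances, thresholded per branch.
    same_hits = [p for p in same_view
                 if reid_model.branch1_distance(target_pic, p) < thresh]
    diff_hits = [p for p in diff_view
                 if reid_model.branch2_distance(target_pic, p) < thresh]
    return same_hits, diff_hits
```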
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting the target object picture and the same-view-angle pictures into the trained re-recognition model, and outputting the first feature distance between each same-view-angle picture and the target object picture through the first convolution sub-branch network of the re-recognition model;
inputting the target object picture and the different-view-angle pictures into the trained re-recognition model, and outputting the second feature distance between each different-view-angle picture and the target object picture through the second convolution sub-branch network of the re-recognition model (a two-branch sketch follows).
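By way of illustration and not limitation, a minimal PyTorch sketch of the two convolution sub-branch networks follows, assuming (the patent does not specify) that each sub-branch is a small CNN embedding pictures into a vector space and that the feature distance is the Euclidean distance between embeddings; it also supplies the hypothetical `branch1_distance`/`branch2_distance` helpers used in the earlier sketch.

```python
import torch
import torch.nn as nn

class TwoBranchReID(nn.Module):
    """Toy two-sub-branch network; layer sizes are illustrative assumptions."""
    def __init__(self, feat_dim=128):
        super().__init__()
        def make_branch():
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
        self.branch1 = make_branch()  # handles same-view-angle pairs
        self.branch2 = make_branch()  # handles different-view-angle pairs

    def branch1_distance(self, a, b):
        # Feature distance = L2 distance between the two branch-1 embeddings.
        return torch.norm(self.branch1(a) - self.branch1(b), dim=1)

    def branch2_distance(self, a, b):
        return torch.norm(self.branch2(a) - self.branch2(b), dim=1)
```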
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining each labeled sample picture set; the sample picture set comprises ternary picture groups in which the sample and its positive and negative samples share the same view angle, and ternary picture groups in which they have different view angles; the positive sample contains the same target object as the sample, and the negative sample contains a different target object;
inputting the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network of the initial re-recognition model to obtain a first feature matrix;
inputting the different-view-angle ternary picture groups into the second convolution sub-branch network of the initial re-recognition model to obtain a second feature matrix;
calculating a first loss function from the first feature matrix and a second loss function from the second feature matrix; the objective of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture groups, and the objective of the second loss function is to maximize that distance for the different-view-angle ternary picture groups;
updating parameters of the initial re-recognition model according to the first loss function and the second loss function;
returning to the step of obtaining each labeled sample picture set until the iteration stop condition is reached, and generating the trained re-recognition model (a sketch of one training step follows).
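By way of illustration and not limitation, one training step might be realized with the standard triplet margin loss; the patent states only the objectives, so the loss form and margin below are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(model, same_view_triplet, diff_view_triplet, optimizer, margin=0.3):
    a1, p1, n1 = same_view_triplet   # anchor, positive, negative (same view)
    a2, p2, n2 = diff_view_triplet   # anchor, positive, negative (cross view)

    # Same-view triplet through the first sub-branch -> first loss.
    loss1 = F.triplet_margin_loss(
        model.branch1(a1), model.branch1(p1), model.branch1(n1), margin=margin)
    # Different-view triplet through the second sub-branch -> second loss.
    loss2 = F.triplet_margin_loss(
        model.branch2(a2), model.branch2(p2), model.branch2(n2), margin=margin)

    optimizer.zero_grad()
    (loss1 + loss2).backward()
    optimizer.step()
    return loss1.item(), loss2.item()
```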
In one embodiment, the processor when executing the computer program further performs the steps of:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function from the feature elements of the different-view-angle positive samples and of the same-view-angle negative samples; the objective of the third loss function is to maximize the feature distance between negative samples at the same view angle and positive samples at different view angles;
updating the parameters of the initial re-recognition model according to the first loss function and the second loss function then comprises: updating the parameters of the initial re-recognition model according to the first, second and third loss functions (a sketch of such a third loss follows).
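By way of illustration and not limitation, a bounded hinge is one plausible realization of the third loss, which pushes apart the same-view negative's features and the different-view positive's features; the margin is an assumption.

```python
import torch.nn.functional as F

def third_loss(neg_same_view_feats, pos_diff_view_feats, margin=0.3):
    # Encourage a large feature distance between same-view negatives and
    # different-view positives; the hinge keeps the loss bounded below by 0.
    d = F.pairwise_distance(neg_same_view_feats, pos_diff_view_feats)
    return F.relu(margin - d).mean()
```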
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of each picture in the sample picture set with the trained image feature extraction model; the high-dimensional picture features correspond to the feature elements of each picture;
inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and the high-dimensional picture features of the different-view-angle ternary picture groups into the second convolution sub-branch network (a feature-extraction sketch follows).
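By way of illustration and not limitation, the sketch below assumes a torchvision ResNet-50 with its classifier removed stands in for the trained image feature extraction model (the patent does not name a backbone), so the 2048-dimensional pooled vector plays the role of the high-dimensional picture features; a recent torchvision with the weights API is assumed.

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier, keep pooled features
backbone.eval()

@torch.no_grad()
def extract_high_dim_features(pictures):  # pictures: (N, 3, H, W) tensor
    return backbone(pictures)             # (N, 2048) feature vectors
```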
In one embodiment, the processor when executing the computer program further performs the steps of:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
determining, from the picture set to be detected and according to the view angle of the target object picture, the pictures with the same view angle as the target object picture and the pictures with a different view angle (a routing sketch follows).
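By way of illustration and not limitation, the routing step might look as follows, assuming the view angle classification model returns logits over a discrete set of view-angle classes; batching is omitted for clarity.

```python
import torch

@torch.no_grad()
def split_by_view(target_pic, candidates, view_classifier):
    target_view = view_classifier(target_pic.unsqueeze(0)).argmax(dim=1).item()
    same_view, diff_view = [], []
    for pic in candidates:
        view = view_classifier(pic.unsqueeze(0)).argmax(dim=1).item()
        (same_view if view == target_view else diff_view).append(pic)
    return same_view, diff_view
```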
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining a picture training set annotated with view angles;
training an initial classification model on the view-angle-annotated picture training set to obtain the trained view angle classification model (a training sketch follows).
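By way of illustration and not limitation, these two steps amount to ordinary supervised classification; the sketch assumes a cross-entropy objective over view-angle labels, with the data loader, epoch count and learning rate as placeholders.

```python
import torch
import torch.nn as nn

def train_view_classifier(model, loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for pictures, view_labels in loader:  # pictures annotated with views
            optimizer.zero_grad()
            criterion(model(pictures), view_labels).backward()
            optimizer.step()
    return model
```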
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
identifying the view angles of the target object picture and of each picture to be detected in the picture set to be detected, and determining, within the picture set to be detected, the pictures with the same view angle as the target object picture and the pictures with a different view angle; the target object picture carries the target object to be identified;
determining a first feature distance between the target object picture and each same-view-angle picture, and a second feature distance between the target object picture and each different-view-angle picture;
determining the pictures including the target object under the same view angle according to the first feature distance;
determining the pictures including the target object under different view angles according to the second feature distance.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target object picture and the same-view-angle pictures into the trained re-recognition model, and outputting the first feature distance between each same-view-angle picture and the target object picture through the first convolution sub-branch network of the re-recognition model;
inputting the target object picture and the different-view-angle pictures into the trained re-recognition model, and outputting the second feature distance between each different-view-angle picture and the target object picture through the second convolution sub-branch network of the re-recognition model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining each labeled sample picture set; the sample picture set comprises ternary picture groups in which the sample and its positive and negative samples share the same view angle, and ternary picture groups in which they have different view angles; the positive sample contains the same target object as the sample, and the negative sample contains a different target object;
inputting the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network of the initial re-recognition model to obtain a first feature matrix;
inputting the different-view-angle ternary picture groups into the second convolution sub-branch network of the initial re-recognition model to obtain a second feature matrix;
calculating a first loss function from the first feature matrix and a second loss function from the second feature matrix; the objective of the first loss function is to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture groups, and the objective of the second loss function is to maximize that distance for the different-view-angle ternary picture groups;
updating parameters of the initial re-recognition model according to the first loss function and the second loss function;
returning to the step of obtaining each labeled sample picture set until the iteration stop condition is reached, and generating the trained re-recognition model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function from the feature elements of the different-view-angle positive samples and of the same-view-angle negative samples; the objective of the third loss function is to maximize the feature distance between negative samples at the same view angle and positive samples at different view angles;
updating the parameters of the initial re-recognition model according to the first loss function and the second loss function then comprises: updating the parameters of the initial re-recognition model according to the first, second and third loss functions.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of each picture in the sample picture set with the trained image feature extraction model; the high-dimensional picture features correspond to the feature elements of each picture;
inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and the high-dimensional picture features of the different-view-angle ternary picture groups into the second convolution sub-branch network.
In one embodiment, the computer program when executed by the processor further performs the steps of:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
determining, from the picture set to be detected and according to the view angle of the target object picture, the pictures with the same view angle as the target object picture and the pictures with a different view angle.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a picture training set annotated with view angles;
training an initial classification model on the view-angle-annotated picture training set to obtain the trained view angle classification model.
Those skilled in the art will appreciate that all or part of the processes in the above-described method embodiments may be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments merely represent several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within its protection scope. The protection scope of the present application is therefore to be determined by the appended claims.

Claims (10)

1. A method of target re-identification, the method comprising:
identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, within the picture set to be detected, the pictures with the same view angle as the target object picture and the pictures with a different view angle from the target object picture; the target object picture carries a target object to be identified;
inputting the target object picture and the same-view-angle pictures into a trained re-recognition model, and outputting a first feature distance between each same-view-angle picture and the target object picture through a first convolution sub-branch network of the re-recognition model; inputting the target object picture and the different-view-angle pictures into the trained re-recognition model, and outputting a second feature distance between each different-view-angle picture and the target object picture through a second convolution sub-branch network of the re-recognition model;
determining the pictures including the target object under the same view angle according to the first feature distance;
determining the pictures including the target object under different view angles according to the second feature distance;
the method for training the re-identification model comprises the following steps:
obtaining each labeled sample picture set; the sample picture set comprises ternary picture groups in which the sample and its positive and negative samples share the same view angle, and ternary picture groups in which they have different view angles; the positive sample contains the same target object as the sample, and the negative sample contains a different target object; inputting the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network of an initial re-recognition model to obtain a first feature matrix; inputting the different-view-angle ternary picture groups into the second convolution sub-branch network of the initial re-recognition model to obtain a second feature matrix; calculating a first loss function from the first feature matrix and a second loss function from the second feature matrix, the objective of the first loss function being to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture groups, and the objective of the second loss function being to maximize that distance for the different-view-angle ternary picture groups; updating parameters of the initial re-recognition model according to the first loss function and the second loss function; and returning to the step of obtaining each labeled sample picture set until an iteration stop condition is reached, generating the trained re-recognition model.
2. The method according to claim 1, wherein the method further comprises:
selecting the feature elements of the same-view-angle negative samples from the first feature matrix, and the feature elements of the different-view-angle positive samples from the second feature matrix;
calculating a third loss function from the feature elements of the different-view-angle positive samples and of the same-view-angle negative samples; the objective of the third loss function is to maximize the feature distance between negative samples at the same view angle and positive samples at different view angles;
wherein updating parameters of the initial re-recognition model according to the first loss function and the second loss function comprises: updating the parameters of the initial re-recognition model according to the first loss function, the second loss function and the third loss function.
3. The method of claim 1, further comprising, after the obtaining of each labeled sample picture set:
inputting each labeled sample picture set into a trained image feature extraction model;
extracting the high-dimensional picture features of each picture in the sample picture set with the trained image feature extraction model; the high-dimensional picture features correspond to the feature elements of each picture;
inputting the high-dimensional picture features of the same-view-angle ternary picture groups in the sample picture set into the first convolution sub-branch network, and the high-dimensional picture features of the different-view-angle ternary picture groups into the second convolution sub-branch network.
4. The method of claim 1, wherein identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining the same-view-angle pictures in the picture set to be detected that have the same view angle as the target object picture and the different-view-angle pictures that have a different view angle, comprises:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected;
determining, from the picture set to be detected and according to the view angle of the target object picture, the same-view-angle pictures that have the same view angle as the target object picture and the different-view-angle pictures that have a different view angle.
5. The method of claim 4, wherein the manner in which the view classification model is trained comprises:
obtaining a picture training set annotated with view angles;
training an initial classification model on the view-angle-annotated picture training set to obtain the trained view angle classification model.
6. A target re-identification apparatus, characterized in that the apparatus comprises:
the view angle identification module is used for identifying the view angles of a target object picture and of each picture to be detected in a picture set to be detected, and determining, within the picture set to be detected, the same-view-angle pictures that have the same view angle as the target object picture and the different-view-angle pictures that have a different view angle; the target object picture carries a target object to be identified;
the feature distance determining module is used for inputting the target object picture and the same-view-angle pictures into a trained re-recognition model and outputting a first feature distance between each same-view-angle picture and the target object picture through a first convolution sub-branch network of the re-recognition model, and for inputting the target object picture and the different-view-angle pictures into the trained re-recognition model and outputting a second feature distance between each different-view-angle picture and the target object picture through a second convolution sub-branch network of the re-recognition model;
the picture determining module is used for determining the pictures including the target object under the same view angle according to the first feature distance, and further for determining the pictures including the target object under different view angles according to the second feature distance;
further comprises:
the picture sample set acquisition module is used for obtaining each labeled sample picture set; the sample picture set comprises ternary picture groups in which the sample and its positive and negative samples share the same view angle, and ternary picture groups in which they have different view angles; the positive sample contains the same target object as the sample, and the negative sample contains a different target object;
the first feature matrix generation module is used for inputting the same-view-angle ternary picture groups in the sample picture set into a first convolution sub-branch network of an initial re-recognition model to obtain a first feature matrix;
the second feature matrix generation module is used for inputting the different-view-angle ternary picture groups into a second convolution sub-branch network of the initial re-recognition model to obtain a second feature matrix;
the first loss function calculation module is used for calculating a first loss function from the first feature matrix, the objective of the first loss function being to maximize the distance between the positive sample and the negative sample of the same-view-angle ternary picture groups;
the second loss function calculation module is used for calculating a second loss function from the second feature matrix, the objective of the second loss function being to maximize the distance between the positive samples and the negative samples of the different-view-angle ternary picture groups;
the initial re-recognition model updating module is used for updating parameters of the initial re-recognition model according to the first loss function and the second loss function;
the re-recognition model generation module is used for returning to the step of obtaining each labeled sample picture set until the iteration stop condition is reached, to generate the trained re-recognition model.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the feature element extraction module is used for selecting the feature elements of the same-view-angle negative samples from the first feature matrix and the feature elements of the different-view-angle positive samples from the second feature matrix;
the third loss function calculation module is used for calculating a third loss function from the feature elements of the different-view-angle positive samples and the feature elements of the same-view-angle negative samples; the objective of the third loss function is to maximize the feature distance between negative samples at the same view angle and positive samples at different view angles;
the initial re-recognition model updating module is further configured to update the parameters of the initial re-recognition model according to the first loss function, the second loss function and the third loss function.
8. The apparatus of claim 6, further comprising a view classification module to:
inputting the target object picture and each picture to be detected in the picture set to be detected into a trained view angle classification model, and outputting the view angle of the target object picture and the view angle of each picture to be detected; and determining, from the picture set to be detected and according to the view angle of the target object picture, the same-view-angle pictures that have the same view angle as the target object picture and the different-view-angle pictures that have a different view angle.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202010504139.5A 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium Active CN111667001B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010504139.5A CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium
PCT/CN2020/098759 WO2021114612A1 (en) 2020-06-05 2020-06-29 Target re-identification method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010504139.5A CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111667001A CN111667001A (en) 2020-09-15
CN111667001B true CN111667001B (en) 2023-08-04

Family

ID=72386376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010504139.5A Active CN111667001B (en) 2020-06-05 2020-06-05 Target re-identification method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111667001B (en)
WO (1) WO2021114612A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464922B (en) * 2021-02-02 2021-05-28 长沙海信智能系统研究院有限公司 Human and vehicle re-identification and model training method, device, equipment and storage medium thereof
CN113743499B (en) * 2021-09-02 2023-09-05 广东工业大学 View angle irrelevant feature dissociation method and system based on contrast learning
CN113743359B (en) * 2021-09-16 2024-02-02 重庆紫光华山智安科技有限公司 Vehicle re-identification method, model training method and related devices
CN113920306B (en) * 2021-09-30 2022-10-25 北京百度网讯科技有限公司 Target re-identification method and device and electronic equipment
CN114140826A (en) * 2021-12-03 2022-03-04 北京交通大学 Target re-identification method based on camera feature separation
CN114372538B (en) * 2022-03-22 2023-04-18 中国海洋大学 Method for convolution classification of scale vortex time series in towed sensor array

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138998A (en) * 2015-09-07 2015-12-09 上海交通大学 Method and system for re-identifying pedestrian based on view angle self-adaptive subspace learning algorithm
WO2018137358A1 (en) * 2017-01-24 2018-08-02 北京大学 Deep metric learning-based accurate target retrieval method
CN109800710A (en) * 2019-01-18 2019-05-24 北京交通大学 Pedestrian re-identification system and method
CN110765903A (en) * 2019-10-10 2020-02-07 浙江大华技术股份有限公司 Pedestrian re-identification method and device and storage medium
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10331968B2 (en) * 2017-03-24 2019-06-25 Disney Enterprises, Inc. One shot color calibrated metric learning for object re-identification
CN107153817B (en) * 2017-04-29 2021-04-27 深圳市深网视界科技有限公司 Pedestrian re-identification data labeling method and device
CN108509854B (en) * 2018-03-05 2020-11-17 昆明理工大学 Pedestrian re-identification method based on projection matrix constraint and discriminative dictionary learning
CN109543602B (en) * 2018-11-21 2020-08-14 太原理工大学 Pedestrian re-identification method based on multi-view image feature decomposition

Also Published As

Publication number Publication date
WO2021114612A1 (en) 2021-06-17
CN111667001A (en) 2020-09-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant