CN108154099B - Person identification method and device and electronic equipment - Google Patents

Person identification method and device and electronic equipment

Info

Publication number
CN108154099B
CN108154099B (application CN201711386394.9A)
Authority
CN
China
Prior art keywords
retrieval
target
distance
objects
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711386394.9A
Other languages
Chinese (zh)
Other versions
CN108154099A (en)
Inventor
史培培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711386394.9A priority Critical patent/CN108154099B/en
Publication of CN108154099A publication Critical patent/CN108154099A/en
Application granted granted Critical
Publication of CN108154099B publication Critical patent/CN108154099B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a person identification method, a person identification device and electronic equipment. The method includes: performing face detection on a video to be processed to obtain each face position, and recognizing each detected face to obtain a face number and a confidence for each face that can be recognized; performing head detection and clothes detection to obtain each head position and each clothes position, and extracting each head feature and each clothes feature; combining the face positions, head positions and clothes positions under preset conditions to obtain combined objects; determining each combined object that has a face number as a target object, and the remaining combined objects as retrieval objects; calculating, in a preset calculation manner, the distance between each feature of a retrieval object and the corresponding feature of each target object to determine the distance between the retrieval object and each target object; and determining a target object whose distance to the retrieval object is smaller than a threshold as the target person corresponding to that retrieval object.

Description

Person identification method and device and electronic equipment
Technical Field
The invention relates to the technical field of computer information processing, in particular to a person identification method, a person identification device and electronic equipment.
Background
Video is an important medium for information transmission, and intelligent understanding of the elements in a video, especially of the people in it, plays an increasingly important role. Known person identification methods are mainly face recognition methods.
Face recognition in the prior art works as follows: features are extracted from an image containing a face, the extracted feature data is matched against feature templates stored in a database, and, with a preset threshold, a match result is output when the similarity exceeds that threshold. In other words, the face to be recognized is compared with known faces, and its identity is judged according to the degree of similarity.
However, the inventor finds that the prior art has at least the following problems in the process of implementing the invention:
face recognition can determine the identity of a person from facial features, but it is ineffective on low-resolution frontal images and side-face images, and it cannot handle back-view images at all. How to identify a person from images such as a low-resolution frontal face, a side face or a back view is therefore an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a person identification method, a person identification device and electronic equipment, so as to identify the persons corresponding to images in which the face cannot be recognized, such as low-resolution frontal faces, side faces and back views, and thereby increase person recall. The specific technical scheme is as follows:
in order to achieve the above object, a first aspect of the present invention discloses a person identification method, including:
respectively carrying out face detection on each person of each frame of image of a video to be processed to obtain each face position, and identifying the detected face to obtain a face number and confidence corresponding to the face which can be identified;
respectively performing head detection and clothes detection on each person of each frame of image of the video to be processed to obtain each head position and each clothes position, and extracting each head characteristic and each clothes characteristic;
combining the face positions, the head positions and the clothes positions through preset conditions to obtain combined objects;
determining each combined object with the face number as each target object in each combined object; determining the residual combined objects as retrieval objects;
for each retrieval object, carrying out distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation mode, and determining the distance between the retrieval object and each target object; wherein the features include a head feature and a clothing feature;
and for each retrieval object, determining the target object with the distance between each target object and the retrieval object smaller than a threshold value as the target person corresponding to the retrieval object.
Optionally, after determining, among the combined objects, each combined object with a face number as a target object and the remaining combined objects as retrieval objects, the method further includes:
respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
for each target object, connecting the first result and the second result in series to obtain a reference feature of each target object;
respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of each retrieval object;
for each retrieval object, the third result and the fourth result are connected in series to obtain the current characteristics of each retrieval object;
correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation manner to determine the distance between the retrieval object and each target object, includes:
and for each retrieval object, performing distance calculation on the current characteristic of the retrieval object and the reference characteristic of each target object through a first preset formula to obtain the distance between the retrieval object and each target object.
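The "first preset formula" is not given in this excerpt. A minimal sketch of the claimed flow (normalize each feature, connect the results in series, then compute one distance) could look like the following; the use of L2 normalization and Euclidean distance is an assumption of the sketch, not a detail fixed by the claims:

```python
import math

def l2_normalize(v):
    # Scale the feature vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def serial_feature(clothes_vec, head_vec):
    # Normalize each feature separately, then concatenate ("connect in series").
    return l2_normalize(clothes_vec) + l2_normalize(head_vec)

def distance(query_feature, reference_feature):
    # Stand-in for the unspecified "first preset formula": Euclidean distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(query_feature, reference_feature)))
```

The current feature of a retrieval object and the reference feature of a target object would both be built with `serial_feature` before calling `distance`.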
Optionally, after determining, among the combined objects, each combined object with a face number as a target object and the remaining combined objects as retrieval objects, the method further includes:
respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of each retrieval object;
correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation manner to determine the distance between the retrieval object and each target object, includes:
for each retrieval object, calculating a third result of the retrieval object and the first result of each target object through a second preset formula to obtain a first distance;
for each retrieval object, calculating a fourth result of the retrieval object and the second result of each target object through the second preset formula to obtain a second distance;
and aiming at each retrieval object, calculating the first distance and each second distance corresponding to the retrieval object through a third preset formula to obtain the distance between the retrieval object and each target object.
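The "second preset formula" and "third preset formula" are likewise unspecified in this excerpt. One plausible reading, sketched below, computes a distance per feature type (clothes, head) and then merges the two distances with a weighted sum; both the Euclidean distance and the weight `w` are assumptions for illustration:

```python
import math

def euclid(a, b):
    # Stand-in for the "second preset formula": Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def combined_distance(query_clothes, query_head, ref_clothes, ref_head, w=0.5):
    d1 = euclid(query_clothes, ref_clothes)  # first distance (clothes features)
    d2 = euclid(query_head, ref_head)        # second distance (head features)
    # Stand-in for the "third preset formula": weighted sum of the two distances.
    return w * d1 + (1 - w) * d2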
Optionally, the determining, for each search object, a target object whose distance between each target object and the search object is smaller than a threshold as a target person corresponding to the search object includes:
aiming at each retrieval object, searching for a target distance smaller than a threshold value in the distances corresponding to the retrieval objects;
and searching a target object corresponding to the target distance, and determining the searched target object as a target person corresponding to the retrieval object.
Optionally, combining the face positions, the head positions, and the clothes positions according to preset conditions to obtain combined objects, including:
combining the head positions and the clothes positions according to preset combination conditions of the head positions and the clothes positions to obtain a first combination result;
and combining the positions of the human faces and the positions of the heads according to a preset combination condition of the positions of the human faces and the positions of the heads on the basis of the first combination result to obtain each combined object.
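Taken together, the matching core of the claimed method (partition the combined objects by the presence of a face number, then assign each retrieval object to the nearest target under a threshold) can be sketched as follows; the dictionary layout, feature tuples and Euclidean distance are assumptions of the sketch, not details fixed by the claims:

```python
import math

def identify_persons(combined_objects, threshold):
    """Assign a face number to each combined object lacking one, by nearest
    target object under the distance threshold (None if no target qualifies)."""
    targets = [o for o in combined_objects if o.get("face_number") is not None]
    queries = [o for o in combined_objects if o.get("face_number") is None]
    assignments = []
    for q in queries:
        best_d, best_id = threshold, None
        for t in targets:
            d = math.dist(q["feature"], t["feature"])  # preset distance calculation
            if d < best_d:
                best_d, best_id = d, t["face_number"]
        assignments.append(best_id)
    return assignments
```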
In order to achieve the above object, in a second aspect of the present invention, there is disclosed a person identification device comprising:
the face recognition module is used for respectively carrying out face detection on each person of each frame of image of the video to be processed to obtain each face position, and recognizing the detected face to obtain a face number and confidence corresponding to the face which can be recognized;
the feature extraction module is used for respectively carrying out head detection and clothes detection on each person of each frame of image of the video to be processed to obtain each head position and each clothes position, and extracting each head feature and each clothes feature;
the characteristic combination module is used for combining the positions of the human faces, the positions of the heads and the positions of the clothes according to preset conditions to obtain combined objects;
the object determining module is used for determining each combined object with the face number as each target object in each combined object; determining the residual combined objects as retrieval objects;
the distance determining module is used for calculating the distance between each normalized feature of each retrieval object and the normalized feature corresponding to each target object in a preset calculation mode aiming at each retrieval object, and determining the distance between each retrieval object and each target object; wherein the features include a head feature and a clothing feature;
and the person determining module is used for determining the target objects with the distances between the target objects and the retrieval object smaller than a threshold value as the target persons corresponding to the retrieval object.
Optionally, the apparatus further comprises:
the first object normalization module is used for respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
a reference feature determination module, configured to concatenate the first result and the second result to obtain a reference feature of each target object;
the second object normalization module is used for respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of each retrieval object;
a current feature determining module, configured to concatenate the third result and the fourth result to obtain a current feature of each search object;
correspondingly, the distance determining module is specifically configured to, for each search object, perform distance calculation on the current feature of the search object and the reference feature of each target object through a first preset formula to obtain a distance between the search object and each target object.
Optionally, the apparatus further comprises:
the third object normalization module is used for respectively normalizing the clothes characteristics of the target objects to obtain first results of the target objects; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
the fourth object normalization module is used for respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of each retrieval object;
accordingly, the distance determination module comprises:
the first distance determining submodule is used for calculating a third result of each retrieval object and the first result of each target object through a second preset formula to obtain a first distance according to each retrieval object;
the second distance determining submodule is used for calculating a fourth result of each retrieval object and the second result of each target object through the second preset formula to obtain a second distance according to each retrieval object;
and the target distance determining submodule is used for calculating the first distance and each second distance corresponding to each retrieval object through a third preset formula aiming at each retrieval object to obtain the distance between each retrieval object and each target object.
Optionally, the person determination module includes:
the distance searching submodule is used for searching a target distance smaller than a threshold value in the distance corresponding to each retrieval object;
and the character determining submodule is used for searching the target object corresponding to the target distance and determining the searched target object as the target character corresponding to the retrieval object.
Optionally, the feature combination module includes:
the first characteristic combination submodule is used for combining the positions of the heads and the positions of the clothes according to a preset combination condition of the positions of the heads and the positions of the clothes to obtain a first combination result;
and the second feature combination submodule is used for combining the positions of the human faces and the positions of the heads according to a preset combination condition of the positions of the human faces and the positions of the heads on the basis of the first combination result to obtain the combined objects.
In order to achieve the above object, in another aspect of the present invention, an electronic device is disclosed, which includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement any one of the above-described person identification methods when executing the program stored in the memory.
In order to achieve the above object, in yet another aspect of the present invention, a computer-readable storage medium is disclosed, having instructions stored therein, which, when executed on a computer, cause the computer to perform any of the above-described person identification methods.
In another aspect of the present invention, the present invention also provides a computer program product including instructions, which when run on a computer, causes the computer to execute any of the above-mentioned person identification methods.
According to the person identification method, the person identification device and the electronic equipment, the face is detected, the identifiable face is identified, and the face number of each identifiable face is obtained. And then, the extraction of multiple features is realized by detecting the head position and the clothes position and extracting the head features and the clothes features. And then, performing multi-feature combination on the extracted face position, head position and clothes position through preset conditions to form various combined objects including those capable of recognizing the face and those incapable of recognizing the face. Determining each combined object with the face number as each target object; the remaining combination objects are determined as the respective retrieval objects.
For each retrieval object, the distance between each normalized feature of the retrieval object and the corresponding normalized feature of each target object is calculated in a preset calculation manner, and the distance between the retrieval object and each target object is thereby determined; a target object whose distance to the retrieval object is smaller than the threshold is determined as the target person corresponding to the retrieval object, so that, by approximate analysis over several types of features, an identity is determined for each combined object whose face cannot be recognized. The embodiment of the invention thus determines a person identity for images that cannot confirm it directly, such as the low-resolution frontal faces, side faces and back views corresponding to unrecognizable faces in video images, and increases person recall.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a person identification method implemented by the present invention;
fig. 2 is a flowchart of a combined object determination method of a person identification method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for identifying a person according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a person identification device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device implemented in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
With the rapid development of video services, intelligently identifying the people in videos is more and more important. Although only a small number of persons appear in each video, their angle, expression and posture vary greatly; video frames are affected by external factors such as illumination and special effects, and the persons themselves are affected by factors such as occlusion, which makes person identification in video complex. At present, the mainstream approach is to recognize the faces in the video with a face recognition method and to track persons by their faces. However, this approach uses only face-related information: it is ineffective on low-resolution frontal images and side-face images, and it cannot handle back-view images.
In view of the above problems, embodiments of the present invention provide a person identification method, an apparatus, and an electronic device, which combine a face identification result, a head feature, a clothing feature, and the like to comprehensively determine the identity of a person, so as to implement person identity identification for person information corresponding to an unrecognizable face, such as a low-resolution front face, a side face, and a back shadow, and increase the recall of the person. The specific implementation mode is as follows:
in order to achieve the above object, a person identification method is disclosed in a first aspect of the present invention, as shown in fig. 1. Fig. 1 is a flowchart of a person identification method implemented by the present invention, including:
s101, respectively carrying out face detection on each person of each frame of image of the video to be processed to obtain each face position, and identifying the detected face to obtain a face number and confidence corresponding to the face which can be identified.
In the embodiment of the invention, persons whose faces cannot be recognized in the video to be processed need to be identified; that is, for the low-resolution frontal images, side-face images and back-view images contained in the video, the specific person with a recognizable face who corresponds to each such image must be determined.
In the embodiment of the invention, a network model trained by a convolutional neural network algorithm can be adopted to respectively detect the faces of people appearing in each frame of image of a video to be processed, so as to obtain the position of each face. In addition, the face recognition model can be adopted for face recognition of the faces capable of being recognized, and the face number and the confidence coefficient corresponding to each face capable of being recognized are obtained.
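The per-face output described in this step (position, face number, confidence) could be carried in a small record such as the following; the field names are illustrative, and the trained detection and recognition networks themselves are not reproduced here:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FaceResult:
    box: Tuple[int, int, int, int]     # (x, y, width, height) face position
    face_number: Optional[str] = None  # identity label; None if unrecognizable
    confidence: float = 0.0            # recognition confidence in [0, 1]

    @property
    def recognized(self) -> bool:
        # A face is usable as a target only when a face number was assigned.
        return self.face_number is not None
```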
S102, respectively carrying out head detection and clothes detection on each person in each frame of image of the video to be processed to obtain each head position and each clothes position, and extracting each head characteristic and each clothes characteristic.
After the human face is identified, the head and clothes of a person appearing in each frame of image of the video to be processed can be respectively detected by adopting a network model trained by a convolutional neural network, and then the position of each head and the position of each clothes are obtained.
In addition, the convolutional neural network model is used in this step to extract the feature vectors corresponding to the detected heads of the persons and the feature vectors corresponding to the clothes of the persons.
In this step, the head position and the clothing position may be detected in parallel, and the feature vector corresponding to each head feature and the feature vector corresponding to each clothing feature may be extracted. Or, the head position is detected first, then the feature vector corresponding to each head feature is extracted, and then the clothes position is detected, and then the feature vector corresponding to each clothes feature is extracted. In addition, the positions of the clothes can be detected first, so that the feature vector corresponding to each clothes feature is extracted, and then the positions of the heads are detected, so that the feature vector corresponding to each head feature is extracted. The specific detection sequence is not limited.
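Because head detection and clothes detection are independent, the step permits any order, including running both in parallel. A sketch with stubbed detectors (the stub boxes are placeholders, not real model output):

```python
from concurrent.futures import ThreadPoolExecutor

def detect_heads(frame):
    # Stand-in for the trained CNN head detector.
    return [(12, 5, 40, 40)]

def detect_clothes(frame):
    # Stand-in for the trained CNN clothes detector.
    return [(10, 45, 44, 80)]

def detect_parallel(frame):
    # The two detections share no state, so any schedule (parallel,
    # heads-first, or clothes-first) yields the same result.
    with ThreadPoolExecutor(max_workers=2) as pool:
        heads = pool.submit(detect_heads, frame)
        clothes = pool.submit(detect_clothes, frame)
        return heads.result(), clothes.result()
```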
And S103, combining the face positions, the head positions and the clothes positions through preset conditions to obtain combined objects.
After the face position, the head position and the clothes position in each frame of video image are obtained, the head position and the face position can be combined according to the position relation between the head position and the face position in the preset condition. And combining the clothes on the obtained combination result according to the position relation between the head position and the clothes position in the preset condition on the basis of the combination.
The combination condition for the face and the head in the preset conditions may be that the face position and the head position are the same, and the combination condition for the head and the clothes may be that the clothes position lies at a preset distance from the head position. Alternatively, the face-head condition may require that the face position and the head position are the same and that the face direction and the head direction are the same, and the head-clothes condition may additionally require that the clothes position lies at a preset distance below the head position and that the clothes direction and the head direction are the same.
Specifically, according to a preset combination condition of the face position and the head position, the face and the head of a person with the same face position and head position capable of recognizing the face are combined. And then according to the relative position relation between the head position and the clothes position, clothes with the clothes position at the preset position below the head are combined with the combination result to obtain each combination object containing the recognizable human face. And the combined object is provided with a face number obtained by the face recognition model. That is, the partial combined objects are combined objects recognizable by the human face.
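A minimal sketch of this combination step, assuming a rectangle-overlap test for "same position" of face and head and a fixed pixel gap for "clothes at a preset distance below the head" (both concrete conditions are assumptions for illustration):

```python
def boxes_overlap(a, b):
    # a, b: (x, y, w, h); True if the two rectangles intersect.
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

def clothes_below_head(head, clothes, max_gap=20):
    # Assumed preset condition: the clothes box starts within max_gap
    # pixels below the bottom edge of the head box.
    head_bottom = head[1] + head[3]
    return 0 <= clothes[1] - head_bottom <= max_gap

def combine(faces, heads, clothes_boxes, max_gap=20):
    # Build one combined object per detected head; face and clothes
    # slots stay None when no detection satisfies the conditions.
    combined = []
    for head in heads:
        face = next((f for f in faces if boxes_overlap(f, head)), None)
        cloth = next((c for c in clothes_boxes
                      if clothes_below_head(head, c, max_gap)), None)
        combined.append({"face": face, "head": head, "clothes": cloth})
    return combined
```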
In addition, if a detected head corresponds to a back view, or to a side view in which the face cannot be recognized, the combined object obtained under the preset conditions is a combined object whose face cannot be recognized.
S104, determining each combined object with the face number as each target object in each combined object; the remaining combination objects are determined as the respective retrieval objects.
In the embodiment of the present invention, each combination object with a face number, of which a face can be recognized, in the obtained combination objects is defined as each target object in the embodiment of the present invention, the remaining combination objects are each combination object whose face cannot be recognized, and the combination object corresponding to the part of the face that cannot be recognized is defined as each retrieval object in the embodiment of the present invention. And further calculating the distance between the retrieval object and the characteristic vector of each target object, and determining the target object with the distance smaller than a threshold value as the person identity of the retrieval object. This step is a method step of determining each target object and each search object in each combined object.
Specifically, the face information of each combined object may be detected by a face recognition technology, and each combined object whose face can be clearly recognized is determined as a target object of the embodiment of the present invention. The remaining combined objects are then determined as the retrieval objects of the embodiment of the present invention.
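The partition described in this step reduces to filtering the combined objects on the presence of a face number:

```python
def split_objects(combined_objects):
    # Combined objects carrying a face number become target objects; the
    # rest (back views, side faces, low-resolution faces) become
    # retrieval objects.
    targets = [o for o in combined_objects if o.get("face_number") is not None]
    retrieval = [o for o in combined_objects if o.get("face_number") is None]
    return targets, retrieval
```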
And S105, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the corresponding normalized feature of each target object in a preset calculation manner, and determining the distance between the retrieval object and each target object, wherein the features include a head feature and a clothes feature.
After the combined objects obtained by combining are divided into the target objects and the retrieval objects, the target object corresponding to the retrieval object can be determined for each retrieval object, namely the character identity of the retrieval object can be determined. The preset calculation mode of the embodiment of the invention can be a set distance calculation mode, and the distance between each retrieval object and each target object is obtained. The distance calculation mode can adopt an Euclidean distance formula, a cosine distance formula, a correlation coefficient and the like.
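Each of the three distance measures mentioned (Euclidean distance, cosine distance, correlation coefficient) can serve as the preset calculation mode; straightforward implementations for plain Python lists:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def correlation_distance(a, b):
    # 1 - Pearson correlation coefficient between the two vectors.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    return 1.0 - num / den
```

All three shrink as the features grow more similar, so any of them fits the "smaller distance, higher similarity" rule used in this step.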
Specifically, feature vectors corresponding to clothing features of each target object are respectively normalized, and feature vectors corresponding to head features of each target object are respectively normalized, so that each feature vector after normalization of each target object is obtained.
For each retrieval object, respectively normalizing the feature vectors corresponding to the clothing features of the retrieval object, and respectively normalizing the feature vectors corresponding to the head features of the retrieval object. And calculating the distance between the comprehensive characteristics of the retrieval object and the comprehensive characteristics of each target object by the preset calculation mode of the embodiment of the invention. The distance indicates the similarity degree of the comprehensive features of the retrieval object and the comprehensive features of each target object, and the smaller the distance, the higher the similarity degree of the comprehensive features of the retrieval object and the comprehensive features of the target object.
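As an illustration of this step, the following is a minimal sketch (not part of the patent embodiment itself) of normalizing feature vectors and computing the Euclidean and cosine distances with NumPy; the function names and the choice of L2 normalization are assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a feature vector to unit length (one common normalization choice)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance: smaller means the features are more similar."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: 1 - cos(angle between a and b)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two proportional feature vectors coincide after normalization,
# so their Euclidean distance drops to zero.
query = l2_normalize(np.array([1.0, 2.0, 2.0]))
target = l2_normalize(np.array([2.0, 4.0, 4.0]))
print(euclidean_distance(query, target))  # 0.0
```

Either distance function could serve as the "preset calculation mode"; the correlation-coefficient variant is discussed later in the embodiment.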
And S106, aiming at each retrieval object, determining the target object with the distance between each target object and the retrieval object smaller than a threshold value as the target person corresponding to the retrieval object.
In the embodiment of the present invention, in order to ensure accurate identification of the target person corresponding to each retrieval object, a threshold may be set on the distance between the features of the retrieval object and the features of a target object. The threshold indicates the maximum distance at which the retrieval object can still be considered to match the target object, and it may be set manually based on feature comparison experience.
In this step, a detection program may be set for each search object, a distance smaller than a threshold value in the distance of the search object is detected, a target object corresponding to the distance smaller than the threshold value is determined, and the target object is determined as a target person corresponding to the search object.
When, for a retrieval object, a plurality of target objects are detected whose distances to the retrieval object are smaller than the threshold, all of these target objects may be identified as target persons corresponding to the retrieval object. Alternatively, only the target object corresponding to the smallest of the distances below the threshold may be determined as the target person corresponding to the retrieval object.
In the person identification method provided by the embodiment of the invention, firstly, the face is detected, so that the identifiable face is identified, and the face number of each identifiable face is obtained. And then, the extraction of multiple features is realized by detecting the head position and the clothes position and extracting the head features and the clothes features. And then, performing multi-feature combination on the extracted face position, head position and clothes position through preset conditions to form various combined objects including those capable of recognizing the face and those incapable of recognizing the face. Determining each combined object with the face number as each target object; the remaining combination objects are determined as the respective retrieval objects.
For each retrieval object, distance calculation is performed on each normalized feature of the retrieval object and each corresponding normalized feature of each target object in a preset calculation mode, and the distance between the retrieval object and each target object is determined; the target object whose distance from the retrieval object is smaller than the threshold is determined as the target person corresponding to the retrieval object. In this way, through similarity analysis over multiple types of features, an accurately identifiable target object is determined for each combined object whose face cannot be recognized. The embodiment of the invention thus determines a person identity for images that cannot confirm it on their own, such as low-resolution frontal faces, side faces, and back views corresponding to unrecognizable faces in video images, and increases person recall.
Optionally, in an embodiment of the person identification method in the embodiment of the present invention, in each combined object, each combined object with a face number is determined as each target object; after determining the remaining combined objects as the retrieval objects, the method further comprises:
firstly, respectively normalizing the clothes characteristics of each target object to obtain a first result of each target object; and respectively normalizing the head characteristics of each target object to obtain a second result of each target object.
The embodiment of the invention normalizes the feature vectors corresponding to the features extracted from each retrieval object and from each target object, concatenates the normalized features of each retrieval object and the normalized features of each target object, calculates the distance between the concatenated features of each retrieval object and the concatenated features of each target object in a preset calculation mode, and thereby determines the distance between each retrieval object and each target object.
This step is an embodiment of normalizing the feature vectors corresponding to the features of the target objects. Normalization is a dimensionless processing means that converts an absolute value into a relative value, which simplifies calculation and reduces magnitudes. Normalization takes two common forms: one maps a number into a decimal in (0, 1); the other converts a dimensional expression into a dimensionless one.
After obtaining each target object, normalizing the feature vector corresponding to the clothing feature of each target object according to a normalization method to obtain a normalization result corresponding to the clothing feature of each target object, which is the first result of the embodiment of the invention. And normalizing the feature vector corresponding to the head feature of each target object according to a normalization method to obtain a normalization result corresponding to the head feature of each target object, namely the second result of the embodiment of the invention.
For example, for each target object, the feature vector f1c of the target object's clothing feature is normalized to obtain f1nc; the feature vector f1h of the target object's head feature is normalized to obtain f1nh.
And step two, aiming at each target object, connecting the first result and the second result in series to obtain the reference characteristic of each target object.
After the normalization result of each target object is obtained, for each target object, the normalization result corresponding to the clothing feature of the target object is concatenated with the normalization result corresponding to the head feature of the target object, that is, the two normalized vectors are joined end to end, and the concatenated result is used as the reference feature of the target object.
For example, for each target object, the normalized clothing feature vector f1nc and the normalized head feature vector f1nh of the target object are concatenated to obtain the reference feature of the target object, f1con = (f1nc, f1nh).
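The normalize-then-concatenate operation above can be sketched as follows; this is a hypothetical illustration assuming NumPy and L2 normalization, since the patent does not prescribe a specific normalization method.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def build_reference_feature(f_c: np.ndarray, f_h: np.ndarray) -> np.ndarray:
    """Normalize the clothing vector and the head vector separately,
    then join them end to end: f1con = (f1nc, f1nh)."""
    return np.concatenate([l2_normalize(f_c), l2_normalize(f_h)])

f1c = np.array([3.0, 4.0])  # clothing feature of one target object
f1h = np.array([0.0, 5.0])  # head feature of the same target object
print(build_reference_feature(f1c, f1h))  # the vector (0.6, 0.8, 0.0, 1.0)
```

The same helper serves for the retrieval objects' current features f2con, since the construction is identical.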
And step three, respectively normalizing the clothes features of each retrieval object to obtain a third result of each retrieval object; and respectively normalizing the head features of each retrieval object to obtain a fourth result of each retrieval object.
According to the method for normalizing each target object, clothes characteristics of each retrieval object are respectively normalized to obtain a third result of each retrieval object. And respectively normalizing the head characteristics of each retrieval object to obtain a fourth result of each retrieval object. Details are not repeated.
For example, for each retrieval object, the feature vector f2c of the retrieval object's clothing feature is normalized to obtain f2nc; the feature vector f2h of the retrieval object's head feature is normalized to obtain f2nh.
And fourthly, connecting the third result and the fourth result in series aiming at each retrieval object to obtain the current characteristics of each retrieval object.
And according to the mode of connecting the first result and the second result in series for each target object, connecting the third result and the fourth result in series for each retrieval object to obtain the current characteristics of each retrieval object.
For example, for each retrieval object, the normalized clothing feature vector f2nc and the normalized head feature vector f2nh of the retrieval object are concatenated to obtain the current feature of the retrieval object, f2con = (f2nc, f2nh).
Correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation mode, and determining the distance between the retrieval object and each target object, wherein the distance calculation method comprises the following steps:
and step five, aiming at each retrieval object, performing distance calculation on the current characteristics of the retrieval object and the reference characteristics of each target object through a first preset formula to obtain the distance between the retrieval object and each target object.
After the reference feature of each target object and the current feature of each retrieval object are obtained, the distance between each retrieval object and each target object can be calculated through a first preset formula.
In the embodiment of the present invention, the first preset formula may adopt any one of the calculation formulas of the euclidean distance and the cosine distance. And aiming at each retrieval object, calculating the current characteristic of the retrieval object and the reference characteristic of each target object by adopting the first preset formula to obtain the distance between the retrieval object and each target object.
In addition, in the embodiment of the present invention, the correlation coefficient may be further used to calculate the correlation between the current feature of the search object and the reference feature of each target object. The formula of the correlation coefficient ρ is as follows:
ρ(X, Y) = E[(X − μX)(Y − μY)] / (σX · σY)
wherein X represents the current feature of the retrieval object; Y represents the reference feature of the target object; σX represents the square root of the variance (i.e. the standard deviation) of the normal distribution obeyed by the current feature of the retrieval object; σY represents the square root of the variance of the normal distribution obeyed by the reference feature of the target object; μX represents the mean of the normal distribution obeyed by the current feature of the retrieval object; μY represents the mean of the normal distribution obeyed by the reference feature of the target object.
And aiming at each retrieval object, calculating the current characteristic of the retrieval object and the reference characteristic of each target object by adopting the correlation coefficient formula to obtain the similarity between the retrieval object and each target object.
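The correlation coefficient ρ above is the Pearson correlation, which can be computed from sample means and standard deviations as sketched below (an illustrative implementation, not taken from the patent; larger ρ indicates greater similarity, in contrast to the distances, where smaller is better).

```python
import numpy as np

def correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: rho = E[(X - mu_X)(Y - mu_Y)] / (sigma_X * sigma_Y)."""
    mx, my = x.mean(), y.mean()
    return float(((x - mx) * (y - my)).mean() / (x.std() * y.std()))

current = np.array([1.0, 2.0, 3.0, 4.0])    # current feature of a retrieval object
reference = np.array([2.0, 4.0, 6.0, 8.0])  # reference feature of a target object
print(correlation(current, reference))  # close to 1.0: perfectly correlated
```

When ρ is used instead of a distance, the matching rule is inverted: a retrieval object matches a target object whose ρ exceeds a correlation threshold.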
Therefore, the embodiment of the invention can calculate the distance between the combined features of each retrieval object and each target object, and further know the comprehensive similarity degree between each retrieval object and each target object.
Optionally, in an embodiment of the person identification method in the embodiment of the present invention, in each combined object, each combined object with a face number is determined as each target object; after determining the remaining combined objects as the retrieval objects, the method further comprises:
firstly, respectively normalizing the clothes characteristics of each target object to obtain a first result of each target object; and respectively normalizing the head characteristics of each target object to obtain a second result of each target object.
The embodiment of the invention normalizes the feature vectors corresponding to the features extracted from each retrieval object and from each target object, then calculates, in a preset calculation mode, the distance between each normalized feature of each retrieval object and the corresponding normalized feature of each target object.
This step is an embodiment of normalizing the feature vectors corresponding to the features of the target objects.
After obtaining each target object, normalizing the feature vector corresponding to the clothing feature of each target object according to a normalization method to obtain a normalization result corresponding to the clothing feature of each target object, which is the first result of the embodiment of the invention. And normalizing the feature vector corresponding to the head feature of each target object according to a normalization method to obtain a normalization result corresponding to the head feature of each target object, namely the second result of the embodiment of the invention.
For example, for each target object, the feature vector f of the clothing feature of the target object is set1cNormalizing to obtain f1nc(ii) a The feature vector f of the head feature head of the target object is calculated1hNormalizing to obtain f1nh
Step two, respectively normalizing the clothes characteristics of each retrieval object to obtain a third result of each retrieval object; and respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of the retrieval objects.
According to the method for normalizing each target object, the clothes features of each retrieval object are respectively normalized to obtain a third result of each retrieval object, and the head features of each retrieval object are respectively normalized to obtain a fourth result of each retrieval object. Details are not repeated.
For example, for each retrieval object, the feature vector f2c of the retrieval object's clothing feature is normalized to obtain f2nc; the feature vector f2h of the retrieval object's head feature is normalized to obtain f2nh.
Correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation mode, and determining the distance between the retrieval object and each target object, wherein the distance calculation method comprises the following steps:
and step three, aiming at each retrieval object, calculating a third result of the retrieval object and the first result of each target object through a second preset formula to obtain a first distance.
After normalizing each feature of each target object and each retrieval object, for each retrieval object, the third result of the retrieval object and the first result of each target object are calculated by a second preset formula to obtain a distance value; in this embodiment this distance is defined as the first distance, denoted dc.
It should be noted that the second predetermined formula may adopt any one of the calculation formulas of the euclidean distance and the cosine distance, and the second predetermined formula may be the same as the first predetermined formula.
And step four, aiming at each retrieval object, calculating a fourth result of the retrieval object and second results of all target objects through a second preset formula to obtain a second distance.
After the first distance of each retrieval object is determined, the same calculation method may be adopted: the fourth result of the retrieval object and the second result of each target object are calculated through the second preset formula to obtain a distance value; in this embodiment this distance is defined as the second distance, denoted ds.
In addition, in the embodiment of the present invention, for each retrieval object, the second distance may be calculated before the first distance, or the two distances may be calculated simultaneously; the specific calculation order is not limited.
And fifthly, for each retrieval object, calculating the first distance and the second distance corresponding to each target object through a third preset formula to obtain the distance between the retrieval object and each target object.
After the first distance and the second distance are obtained for each retrieval object, the distance between the retrieval object and each target object can be calculated through a third preset formula, whose function is to represent a weighted average of the first distance and the second distance. For example, let the distance between the retrieval object and a target object be denoted dw, the weight of the first distance dc be ω1, and the weight of the second distance ds be ω2; then the third preset formula can be expressed as:

dw = (ω1 · dc + ω2 · ds) / (ω1 + ω2)
then, the distance between the search object and each target object is obtained for each search object through the third preset formula.
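The weighted-average fusion of the clothing distance and the head distance can be sketched as follows; the weight values shown are illustrative assumptions, as the embodiment does not fix them.

```python
def fused_distance(d_c: float, d_s: float, w1: float = 0.5, w2: float = 0.5) -> float:
    """Third preset formula: weighted average of the clothing distance d_c
    and the head distance d_s, dw = (w1*d_c + w2*d_s) / (w1 + w2).
    The default weights are illustrative; the embodiment does not fix them."""
    return (w1 * d_c + w2 * d_s) / (w1 + w2)

print(fused_distance(0.2, 0.4))              # ~0.3 with equal weights
print(fused_distance(0.2, 0.4, w1=3, w2=1))  # ~0.25, clothing weighted more heavily
```

Raising w1 relative to w2 makes the clothing similarity dominate the fused distance, which may be useful when head features are unreliable (e.g. back views).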
Therefore, the embodiment of the invention can calculate the distance between each feature of each retrieval object and the corresponding feature of each target object, obtain the weighted average of the per-feature distances, and thereby obtain the degree of similarity between each retrieval object and each target object.
Optionally, in an embodiment of the person identification method in the embodiment of the present invention, for each search object, determining a target object whose distance between each target object and the search object is smaller than a threshold as a target person corresponding to the search object, includes:
step one, for each retrieval object, searching for a target distance smaller than a threshold among the distances corresponding to the retrieval object.
The embodiment of the invention is an embodiment for determining a target object corresponding to each retrieval object. The specific mode is as follows:
after the distance between the retrieval object and each target object is obtained, a threshold may be set on the distance computed over the combined features of the retrieval object and the target object. The threshold indicates the maximum distance at which the retrieval object can still be considered to match the target object, and it may be set manually based on feature comparison experience. A smaller distance value indicates a closer similarity between the retrieval object and the target object.
In this step, a detection program is provided to detect each distance having a distance smaller than a threshold value among distances between the search target and each target object, and to determine each distance having a distance smaller than the threshold value as a target distance of the search target.
And step two, searching a target object corresponding to the target distance, and determining the searched target object as a target person corresponding to the retrieval object.
After the target distance of each retrieval object is determined, the target object corresponding to the target distance is found, and that target object is determined as the target person of the retrieval object.
When there are a plurality of target distances specified by any search target, each target object corresponding to the plurality of target distances may be a target person of the search target, or a target object corresponding to a target distance having the smallest value among the target distances may be specified as a target person of the search target. Alternatively, when one target distance determined by any search object corresponds to a plurality of target objects, all of the plurality of target objects may be regarded as target persons of the search object.
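The threshold-based matching step described above, including the smallest-distance strategy for resolving multiple candidates, can be sketched as follows (function and identifier names are illustrative, not from the patent):

```python
from typing import Optional

def match_target(distances: dict, threshold: float) -> Optional[str]:
    """Among the target objects whose distance to the retrieval object is
    below the threshold, keep the one with the smallest distance -- one of
    the strategies described above. Returns None when no target qualifies."""
    below = {tid: d for tid, d in distances.items() if d < threshold}
    if not below:
        return None
    return min(below, key=below.get)

dists = {"person_A": 0.8, "person_B": 0.3, "person_C": 0.45}
print(match_target(dists, threshold=0.5))  # person_B
print(match_target(dists, threshold=0.1))  # None
```

Under the alternative strategy, all entries of `below` would be reported as target persons instead of only the minimum.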
In addition, in the embodiment of the present invention, when the similarity between any retrieval object and each target object is calculated using the correlation coefficient, a correlation threshold may be set. When a calculated similarity is larger than the threshold, the target person corresponding to the retrieval object is determined, in the manner described above, from the target objects whose similarity exceeds the threshold.
Therefore, the embodiment of the invention can determine, for each retrieval object whose identity cannot be recognized, a corresponding target object with a recognizable identity, thereby increasing person recall.
Alternatively, in an embodiment of the person identification method according to the embodiment of the present invention, the face positions, the head positions, and the clothes positions are combined by preset conditions to obtain the combined objects, which may be as shown in fig. 2. Fig. 2 is a flowchart of a combined object determining method of a person identification method according to an embodiment of the present invention, including:
s201, combining the head positions and the clothes positions according to preset combination conditions of the head positions and the clothes positions to obtain a first combination result.
The embodiment of the invention is an implementation method for combining the faces, the heads and clothes of the persons in each frame of image in the extracted video to be processed according to preset conditions so as to obtain each combined object.
According to the preset combination condition of the head position and the clothes position, the head position and the clothes position are combined to obtain a combination result containing the characteristics of the head and the clothes, namely a first combination result of the embodiment of the invention.
Specifically, the preset combination conditions of the head position and the clothes position may be set as follows:
(1) wc > wh and hc > hh, where wc and hc are respectively the width and height of the clothing region, and wh and hh are respectively the width and height of the head region.

(2) syc > cyh, where syc is the y-coordinate of the top-left corner of the clothing region and cyh is the y-coordinate of the midpoint of the head region.

(3) sxc < cxh < exc, where sxc is the x-coordinate of the top-left corner of the clothing region, exc is the x-coordinate of the bottom-right corner of the clothing region, and cxh is the x-coordinate of the midpoint of the head region.
And combining the head positions and the clothes positions according to the preset combination conditions of the head positions and the clothes positions to obtain a first combination result.
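The three preset conditions can be checked for one head box and one clothing box as sketched below; this assumes boxes represented as dicts with a top-left corner (x, y) plus width w and height h, with the y-axis growing downward as is usual in image coordinates.

```python
def head_matches_clothes(head: dict, clothes: dict) -> bool:
    """Check the three preset conditions: the clothing box is larger than the
    head box (1), sits below the head midpoint (2), and horizontally
    contains it (3). Box format is an illustrative assumption."""
    cx_h = head["x"] + head["w"] / 2.0       # x-coordinate of the head midpoint
    cy_h = head["y"] + head["h"] / 2.0       # y-coordinate of the head midpoint
    sx_c, sy_c = clothes["x"], clothes["y"]  # top-left corner of the clothing box
    ex_c = clothes["x"] + clothes["w"]       # right edge of the clothing box
    return (clothes["w"] > head["w"] and clothes["h"] > head["h"]  # condition (1)
            and sy_c > cy_h                                        # condition (2)
            and sx_c < cx_h < ex_c)                                # condition (3)

head = {"x": 40, "y": 10, "w": 20, "h": 20}
clothes = {"x": 30, "y": 35, "w": 40, "h": 60}
print(head_matches_clothes(head, clothes))  # True
```

Each head position would be paired with each clothing position satisfying this predicate to form the first combination results.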
And S202, combining the positions of the human faces and the positions of the heads according to a preset combination condition of the positions of the human faces and the positions of the heads on the basis of the first combination result to obtain the combined objects.
The extracted head features and clothes features are combined according to the preset combination conditions of the head position and the clothes position, and the face positions are then combined on the basis of the first combination result.
Specifically, each combined object whose head position matches a face position is found on the basis of the first combination result, yielding each combined object whose face can be recognized together with the corresponding face number. For example, the preset combination condition between a face position and a head position in the first combination result is: the face position rf lies inside the head position rh, and the head position rh is the one closest to the face position rf.
Furthermore, after each combination capable of recognizing a face is determined in the first combination result, the remaining combination results are the combined objects whose identities cannot be recognized according to the embodiment of the present invention.
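The face-head matching rule (face position rf inside head position rh, with rh the head closest to rf) can be sketched as follows; the box representation and function names are illustrative assumptions.

```python
from typing import Optional

def face_in_head(face: dict, head: dict) -> bool:
    """Face box fully contained in the head box (the condition rf inside rh)."""
    return (head["x"] <= face["x"]
            and head["y"] <= face["y"]
            and face["x"] + face["w"] <= head["x"] + head["w"]
            and face["y"] + face["h"] <= head["y"] + head["h"])

def nearest_head_for_face(face: dict, heads: list) -> Optional[dict]:
    """Among the heads containing the face, pick the one whose center is
    closest to the face center; None if no head contains the face."""
    def center(b):
        return (b["x"] + b["w"] / 2.0, b["y"] + b["h"] / 2.0)
    fx, fy = center(face)
    candidates = [h for h in heads if face_in_head(face, h)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda h: (center(h)[0] - fx) ** 2 + (center(h)[1] - fy) ** 2)

face = {"x": 5, "y": 5, "w": 4, "h": 4}
heads = [{"x": 0, "y": 0, "w": 20, "h": 20}, {"x": 4, "y": 4, "w": 8, "h": 8}]
print(nearest_head_for_face(face, heads))  # the smaller, closer head box
```

A face matched to a head in this way carries its face number into the combined object; unmatched combinations remain retrieval objects.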
Therefore, the embodiment of the invention can realize the extraction of various characteristics of the person, and further obtain each combined object by combining the extracted various characteristics.
To better describe a person identification method according to an embodiment of the present invention, there may be another flow chart of the person identification method according to the embodiment of the present invention as shown in fig. 3, where the flow chart includes:
s301, extracting frames from a video to be processed, where fps = 2;
s302, carrying out face recognition on each frame of image by using a classifier trained with a neural network, to obtain each face position rf, a confidence score, and a face number (entity id) corresponding to each recognizable face;
s303, detecting the head region and the clothes region of each frame of image by using a target detector trained with a neural network, to obtain each head position rh and each clothes position rc. The detector can simultaneously detect the two categories head and clothes, and outputs the bounding rectangle and the confidence of each target. The detector can adopt detection models such as Faster R-CNN or SSD;
s304, respectively extracting head features and clothes features aiming at the head region and the clothes region;
s305, combining the face positions rf, the head positions rh, and the clothes positions rc according to their sizes and relative position relations, to form the combined objects;
s306, selecting the combined objects with non-empty face numbers in the combined objects, respectively determining the combined objects as target objects, and respectively determining the rest combined objects as retrieval objects;
s307, for each retrieval object, normalizing the feature vector fc of the retrieval object's clothing feature to obtain fnc; normalizing the feature vector fh of the retrieval object's head feature to obtain fnh; and concatenating the normalized clothing feature vector fnc with the normalized head feature vector fnh to obtain fcon = (fnc, fnh);
S308, normalizing the feature vector of the clothes feature of each target object, normalizing the feature vector of the head feature of each target object, and connecting the feature vector of the clothes feature normalized by each target object and the feature vector of the head feature normalized by each target object in series;
s309, for each retrieval object, calculating, using the Euclidean distance, the distance d between the concatenated feature vector fcon of the retrieval object and the concatenated feature vector of each target object;
s310, for each search object, searching for a target distance whose value is smaller than the threshold value a among the distances d, and determining a target object corresponding to the target distance as a target person of the search object.
Therefore, the person identification method provided by the embodiment of the invention combines the face recognition result, the head features, and the clothes features to comprehensively judge the person identity of each combined object, such as low-resolution faces, side faces, and back views, whose face information cannot be recognized in the video. In addition, the embodiment of the invention also solves the problem of low recall caused by identifying persons using the face alone.
In order to achieve the above object, a person identification apparatus is disclosed in a second aspect of the embodiment of the present invention, as shown in fig. 4. Fig. 4 is a schematic structural diagram of a person identification apparatus according to an embodiment of the present invention, including:
the face recognition module 401 is configured to perform face detection on each person in each frame of image of the video to be processed to obtain each face position, and recognize a detected face to obtain a face number and a confidence corresponding to the face that can be recognized;
the feature extraction module 402 is configured to perform head detection and clothing detection on each person in each frame of image of the video to be processed, obtain positions of each head and positions of each clothing, and extract features of each head and each clothing;
the feature combination module 403 is configured to combine each face position, each head position, and each clothes position according to preset conditions to obtain each combination object;
an object determining module 404, configured to determine, as each target object, each combined object with a face number; and determine the remaining combined objects as the retrieval objects;
a distance determining module 405, configured to perform distance calculation on each normalized feature of the search object and the normalized feature corresponding to each target object in a preset calculation manner for each search object, and determine a distance between the search object and each target object; wherein each feature comprises a head feature and a clothing feature;
and a person determining module 406, configured to determine, for each search object, a target person corresponding to the search object, where a distance between each target object and the search object is smaller than a threshold.
In the person identification device provided by the embodiment of the invention, firstly, the face is detected, so that the identifiable face is identified, and the face number of each identifiable face is obtained. And then, the extraction of multiple features is realized by detecting the head position and the clothes position and extracting the head features and the clothes features. And then, performing multi-feature combination on the extracted face position, head position and clothes position through preset conditions to form various combined objects including those capable of recognizing the face and those incapable of recognizing the face. Determining each combined object with the face number as each target object; the remaining combination objects are determined as the respective retrieval objects.
For each retrieval object, distance calculation is performed on each normalized feature of the retrieval object and each corresponding normalized feature of each target object in a preset calculation mode, and the distance between the retrieval object and each target object is determined; the target object whose distance from the retrieval object is smaller than the threshold is determined as the target person corresponding to the retrieval object. In this way, through similarity analysis over multiple types of features, an accurately identifiable target object is determined for each combined object whose face cannot be recognized. The embodiment of the invention thus determines a person identity for images that cannot confirm it on their own, such as low-resolution frontal faces, side faces, and back views corresponding to unrecognizable faces in video images, and increases person recall.
Optionally, in an embodiment of the person identification apparatus in the embodiment of the present invention, the apparatus further includes:
the first object normalization module is used for respectively normalizing the clothes features of the target objects to obtain a first result of each target object, and respectively normalizing the head features of the target objects to obtain a second result of each target object;
the reference feature determining module is used for concatenating, for each target object, the first result and the second result to obtain the reference feature of the target object;
the second object normalization module is used for respectively normalizing the clothes features of the retrieval objects to obtain a third result of each retrieval object, and respectively normalizing the head features of the retrieval objects to obtain a fourth result of each retrieval object;
and the current feature determining module is used for concatenating, for each retrieval object, the third result and the fourth result to obtain the current feature of the retrieval object.
Accordingly, the distance determining module 405 is specifically configured to, for each retrieval object, perform distance calculation on the current feature of the retrieval object and the reference feature of each target object through a first preset formula to obtain the distance between the retrieval object and each target object.
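The normalize-then-concatenate step can be sketched as below. The L2 normalization and the Euclidean "first preset formula" are assumed stand-ins, since the patent does not specify either; the function names are illustrative only:

```python
import numpy as np

def l2_normalize(v):
    # Scale a feature vector to unit L2 norm (assumed normalization scheme),
    # guarding against the all-zero vector.
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def fused_feature(clothes_feature, head_feature):
    # Normalize the clothes feature (first/third result) and the head
    # feature (second/fourth result) separately, then concatenate them
    # into a single reference/current feature.
    return np.concatenate([l2_normalize(clothes_feature), l2_normalize(head_feature)])

def reference_distance(current, reference):
    # Hypothetical "first preset formula": Euclidean distance between
    # the two concatenated features.
    return float(np.linalg.norm(current - reference))
```

Normalizing each feature before concatenation keeps the head and clothes components on a comparable scale, so neither dominates the fused distance.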
Optionally, in an embodiment of the person identification apparatus in the embodiment of the present invention, the apparatus further includes:
the third object normalization module is used for respectively normalizing the clothes features of the target objects to obtain a first result of each target object, and respectively normalizing the head features of the target objects to obtain a second result of each target object;
the fourth object normalization module is used for respectively normalizing the clothes features of the retrieval objects to obtain a third result of each retrieval object, and respectively normalizing the head features of the retrieval objects to obtain a fourth result of each retrieval object;
accordingly, the distance determination module 405 includes:
the first distance determining submodule is used for calculating, for each retrieval object, the third result of the retrieval object and the first result of each target object through a second preset formula to obtain each first distance;
the second distance determining submodule is used for calculating, for each retrieval object, the fourth result of the retrieval object and the second result of each target object through the second preset formula to obtain each second distance;
and the target distance determining submodule is used for calculating the first distance and each second distance corresponding to each retrieval object through a third preset formula aiming at each retrieval object to obtain the distance between each retrieval object and each target object.
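The two-stage distance computation in this embodiment can be sketched as follows. The "second preset formula" is assumed here to be the Euclidean distance and the "third preset formula" a weighted sum; both choices, and the weight `w`, are assumptions for illustration, not specified by the patent:

```python
import numpy as np

def euclidean(a, b):
    # Assumed "second preset formula": Euclidean distance.
    return float(np.linalg.norm(a - b))

def combined_distance(clothes_query, head_query, clothes_target, head_target, w=0.5):
    # Hypothetical "third preset formula": a weighted sum of the
    # clothes-feature distance (first distance) and the head-feature
    # distance (second distance); the weight w is an assumption.
    first_distance = euclidean(clothes_query, clothes_target)
    second_distance = euclidean(head_query, head_target)
    return w * first_distance + (1 - w) * second_distance
```

Keeping the two distances separate before combining them allows the weight to favor whichever cue (clothes or head) is more reliable in a given deployment.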
Optionally, in an embodiment of the person identification apparatus in the embodiment of the present invention, the person determining module 406 includes:
the distance searching submodule is used for searching, among the distances corresponding to each retrieval object, for a target distance smaller than a threshold;
and the person determining submodule is used for searching for the target object corresponding to the target distance and determining the found target object as the target person corresponding to the retrieval object.
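The distance search and person determination described by these two submodules amount to a thresholded nearest-neighbor lookup, sketched below; the dictionary representation and function name are illustrative assumptions:

```python
def match_target_person(distances, threshold):
    # distances: {target_object_id: distance to the retrieval object}.
    # Return the target object with the smallest distance, but only if
    # that distance is below the threshold; otherwise return None.
    best = min(distances, key=distances.get, default=None)
    if best is not None and distances[best] < threshold:
        return best
    return None
```

Returning `None` when no distance clears the threshold prevents a retrieval object from being forced onto the wrong person.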
Optionally, in an embodiment of the person identification apparatus in the embodiment of the present invention, the feature combination module 403 includes:
the first characteristic combination submodule is used for combining each head position and each clothes position according to a preset combination condition of the head position and the clothes position to obtain a first combination result;
and the second characteristic combination submodule is used for combining each face position and each head position according to a preset combination condition of the face position and the head position on the basis of the first combination result to obtain each combination object.
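One plausible form of the head/clothes combination condition is a geometric check on the detected bounding boxes. The specific rule below (head above the clothes box, with sufficient horizontal overlap) and its tolerances are purely hypothetical, since the patent does not disclose the preset combination conditions:

```python
def head_matches_clothes(head, clothes, min_overlap=0.5):
    # Boxes as (x1, y1, x2, y2) with y increasing downward.
    # Hypothetical combination condition: the head box must sit above
    # the upper part of the clothes box, and the two boxes must overlap
    # horizontally by at least `min_overlap` of the head width.
    hx1, hy1, hx2, hy2 = head
    cx1, cy1, cx2, cy2 = clothes
    horizontal_overlap = min(hx2, cx2) - max(hx1, cx1)
    head_above = hy2 <= cy1 + 0.25 * (cy2 - cy1)
    return head_above and horizontal_overlap >= min_overlap * (hx2 - hx1)
```

The same pattern would apply to the second stage (pairing face boxes with already-combined head boxes), with tolerances tuned to face geometry.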
In order to achieve the above object, an electronic device is disclosed in another aspect of the present invention, as shown in fig. 5. Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, which includes a processor 501, a communication interface 502, a memory 503, and a communication bus 504, where the processor 501, the communication interface 502, and the memory 503 complete communication with each other through the communication bus 504;
a memory 503 for storing a computer program;
the processor 501 is configured to implement the following method steps when executing the program stored in the memory:
respectively carrying out face detection on each person of each frame of image of a video to be processed to obtain each face position, and identifying the detected face to obtain a face number and confidence corresponding to the face which can be identified;
respectively carrying out head detection and clothes detection on each person in each frame of image of a video to be processed to obtain each head position and each clothes position, and extracting each head characteristic and each clothes characteristic;
combining each face position, each head position and each clothes position through preset conditions to obtain each combined object;
determining each combined object with the face number as each target object in each combined object; determining the residual combined objects as retrieval objects;
for each retrieval object, carrying out distance calculation on each normalized feature of the retrieval object and each normalized feature corresponding to each target object in a preset calculation mode, and determining the distance between the retrieval object and each target object; wherein each feature comprises a head feature and a clothing feature;
and for each retrieval object, determining the target object with the distance between each target object and the retrieval object smaller than a threshold value as the target person corresponding to the retrieval object.
The communication bus 504 mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 504 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 502 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory 503 may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory 503 may also be at least one storage device located remotely from the processor 501.
The processor 501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In the electronic device provided by the embodiment of the present invention, face detection is performed first, recognizable faces are recognized, and a face number is obtained for each recognizable face. Multiple features are then extracted by detecting the head positions and clothes positions and extracting the head features and clothes features. The extracted face positions, head positions and clothes positions are then combined according to preset conditions to form combined objects, including those in which the face can be recognized and those in which it cannot. Each combined object having a face number is determined as a target object, and the remaining combined objects are determined as retrieval objects.
For each retrieval object, distance calculation is performed, in a preset calculation mode, between each normalized feature of the retrieval object and the corresponding normalized feature of each target object, so as to determine the distance between the retrieval object and each target object. The target object whose distance to the retrieval object is smaller than a threshold is then determined as the target person corresponding to the retrieval object. In this way, similarity analysis over multiple feature types is realized, and an accurately recognized target object is determined for each combined object whose face cannot be recognized. The embodiment of the present invention thus determines person identities for images in which the identity cannot be confirmed directly, such as the low-resolution frontal faces, side faces and back views in video images that correspond to unrecognizable faces, thereby increasing person recall.
In order to achieve the above object, in yet another aspect of the present invention, a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to execute any of the above-described person identification methods is disclosed.
With the computer-readable storage medium provided by the embodiment of the present invention, face detection is performed first, recognizable faces are recognized, and a face number is obtained for each recognizable face. Multiple features are then extracted by detecting the head positions and clothes positions and extracting the head features and clothes features. The extracted face positions, head positions and clothes positions are then combined according to preset conditions to form combined objects, including those in which the face can be recognized and those in which it cannot. Each combined object having a face number is determined as a target object, and the remaining combined objects are determined as retrieval objects.
For each retrieval object, distance calculation is performed, in a preset calculation mode, between each normalized feature of the retrieval object and the corresponding normalized feature of each target object, so as to determine the distance between the retrieval object and each target object. The target object whose distance to the retrieval object is smaller than a threshold is then determined as the target person corresponding to the retrieval object. In this way, similarity analysis over multiple feature types is realized, and an accurately recognized target object is determined for each combined object whose face cannot be recognized. The embodiment of the present invention thus determines person identities for images in which the identity cannot be confirmed directly, such as the low-resolution frontal faces, side faces and back views in video images that correspond to unrecognizable faces, thereby increasing person recall.
In another aspect of the present invention, the present invention further provides a computer program product including instructions, which when run on a computer, causes the computer to execute any of the above-mentioned person identification methods.
With the computer program product including instructions provided by the embodiment of the present invention, face detection is performed first, recognizable faces are recognized, and a face number is obtained for each recognizable face. Multiple features are then extracted by detecting the head positions and clothes positions and extracting the head features and clothes features. The extracted face positions, head positions and clothes positions are then combined according to preset conditions to form combined objects, including those in which the face can be recognized and those in which it cannot. Each combined object having a face number is determined as a target object, and the remaining combined objects are determined as retrieval objects.
For each retrieval object, distance calculation is performed, in a preset calculation mode, between each normalized feature of the retrieval object and the corresponding normalized feature of each target object, so as to determine the distance between the retrieval object and each target object. The target object whose distance to the retrieval object is smaller than a threshold is then determined as the target person corresponding to the retrieval object. In this way, similarity analysis over multiple feature types is realized, and an accurately recognized target object is determined for each combined object whose face cannot be recognized. The embodiment of the present invention thus determines person identities for images in which the identity cannot be confirmed directly, such as the low-resolution frontal faces, side faces and back views in video images that correspond to unrecognizable faces, thereby increasing person recall.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are brought about in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)).
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the device and electronic apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A person identification method, comprising:
respectively carrying out face detection on each person of each frame of image of a video to be processed to obtain each face position, and identifying the detected face to obtain a face number and confidence corresponding to the face which can be identified;
respectively performing head detection and clothes detection on each person of each frame of image of the video to be processed to obtain each head position and each clothes position, and extracting each head characteristic and each clothes characteristic;
combining the face positions, the head positions and the clothes positions through preset conditions to obtain combined objects;
determining each combined object with the face number as each target object in each combined object; determining the residual combined objects as retrieval objects;
for each retrieval object, carrying out distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation mode, and determining the distance between the retrieval object and each target object; wherein the features include a head feature and a clothing feature;
and for each retrieval object, determining the target object with the distance between each target object and the retrieval object smaller than a threshold value as the target person corresponding to the retrieval object.
2. The method according to claim 1, wherein after determining, among the combined objects, each combined object having a face number as a target object and determining the remaining combined objects as retrieval objects, the method further comprises:
respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
for each target object, concatenating the first result and the second result to obtain the reference feature of the target object;
respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of the retrieval objects;
for each retrieval object, concatenating the third result and the fourth result to obtain the current feature of the retrieval object;
correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation manner to determine the distance between the retrieval object and each target object, includes:
and for each retrieval object, performing distance calculation on the current characteristic of the retrieval object and the reference characteristic of each target object through a first preset formula to obtain the distance between the retrieval object and each target object.
3. The method according to claim 1, wherein after determining, among the combined objects, each combined object having a face number as a target object and determining the remaining combined objects as retrieval objects, the method further comprises:
respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of the retrieval objects;
correspondingly, for each retrieval object, performing distance calculation on each normalized feature of the retrieval object and the normalized feature corresponding to each target object in a preset calculation manner to determine the distance between the retrieval object and each target object, includes:
for each retrieval object, calculating a third result of the retrieval object and the first result of each target object through a second preset formula to obtain a first distance;
for each retrieval object, calculating a fourth result of the retrieval object and the second result of each target object through the second preset formula to obtain a second distance;
and aiming at each retrieval object, calculating the first distance and each second distance corresponding to the retrieval object through a third preset formula to obtain the distance between the retrieval object and each target object.
4. The method according to claim 1, wherein the determining, for each search object, a target object whose distance between the target object and the search object is smaller than a threshold as the target person corresponding to the search object comprises:
aiming at each retrieval object, searching for a target distance smaller than a threshold value in the distances corresponding to the retrieval objects;
and searching a target object corresponding to the target distance, and determining the searched target object as a target person corresponding to the retrieval object.
5. The method according to any one of claims 1 to 4, wherein combining the face positions, the head positions and the clothes positions by preset conditions to obtain combined objects comprises:
combining the head positions and the clothes positions according to preset combination conditions of the head positions and the clothes positions to obtain a first combination result;
and combining the positions of the human faces and the positions of the heads according to a preset combination condition of the positions of the human faces and the positions of the heads on the basis of the first combination result to obtain each combined object.
6. A person recognition apparatus, comprising:
the face recognition module is used for respectively carrying out face detection on each person of each frame of image of the video to be processed to obtain each face position, and recognizing the detected face to obtain a face number and confidence corresponding to the face which can be recognized;
the feature extraction module is used for respectively carrying out head detection and clothes detection on each person of each frame of image of the video to be processed to obtain each head position and each clothes position, and extracting each head feature and each clothes feature;
the characteristic combination module is used for combining the positions of the human faces, the positions of the heads and the positions of the clothes according to preset conditions to obtain combined objects;
the object determining module is used for determining each combined object with the face number as each target object in each combined object; determining the residual combined objects as retrieval objects;
the distance determining module is used for calculating the distance between each normalized feature of each retrieval object and the normalized feature corresponding to each target object in a preset calculation mode aiming at each retrieval object, and determining the distance between each retrieval object and each target object; wherein the features include a head feature and a clothing feature;
and the person determining module is used for determining the target objects with the distances between the target objects and the retrieval object smaller than a threshold value as the target persons corresponding to the retrieval object.
7. The apparatus of claim 6, further comprising:
the first object normalization module is used for respectively normalizing the clothes characteristics of the target objects to obtain a first result of each target object; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
a reference feature determination module, configured to concatenate the first result and the second result to obtain a reference feature of each target object;
the second object normalization module is used for respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of each retrieval object; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of the retrieval objects;
a current feature determining module, configured to concatenate the third result and the fourth result to obtain a current feature of each search object;
correspondingly, the distance determining module is specifically configured to, for each search object, perform distance calculation on the current feature of the search object and the reference feature of each target object through a first preset formula to obtain a distance between the search object and each target object.
8. The apparatus of claim 6, further comprising:
the third object normalization module is used for respectively normalizing the clothes characteristics of the target objects to obtain first results of the target objects; respectively normalizing the head characteristics of the target objects to obtain a second result of each target object;
the fourth object normalization module is used for respectively normalizing the clothes characteristics of the retrieval objects to obtain a third result of the retrieval objects; respectively normalizing the head characteristics of the retrieval objects to obtain a fourth result of the retrieval objects;
accordingly, the distance determination module comprises:
the first distance determining submodule is used for calculating, for each retrieval object, the third result of the retrieval object and the first result of each target object through a second preset formula to obtain each first distance;
the second distance determining submodule is used for calculating, for each retrieval object, the fourth result of the retrieval object and the second result of each target object through the second preset formula to obtain each second distance;
and the target distance determining submodule is used for calculating the first distance and each second distance corresponding to each retrieval object through a third preset formula aiming at each retrieval object to obtain the distance between each retrieval object and each target object.
9. The apparatus of claim 6, wherein the person determining module comprises:
the distance searching submodule, used for searching, among the distances corresponding to each retrieval object, for a target distance smaller than a threshold;
and the person determining submodule, used for searching for the target object corresponding to the target distance and determining the found target object as the target person corresponding to the retrieval object.
10. The apparatus of any one of claims 6-9, wherein the feature combination module comprises:
the first characteristic combination submodule is used for combining the positions of the heads and the positions of the clothes according to a preset combination condition of the positions of the heads and the positions of the clothes to obtain a first combination result;
and the second feature combination submodule is used for combining the positions of the human faces and the positions of the heads according to a preset combination condition of the positions of the human faces and the positions of the heads on the basis of the first combination result to obtain the combined objects.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-5.
CN201711386394.9A 2017-12-20 2017-12-20 Figure identification method and device and electronic equipment Active CN108154099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711386394.9A CN108154099B (en) 2017-12-20 2017-12-20 Figure identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711386394.9A CN108154099B (en) 2017-12-20 2017-12-20 Figure identification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN108154099A CN108154099A (en) 2018-06-12
CN108154099B true CN108154099B (en) 2021-04-30

Family

ID=62464055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711386394.9A Active CN108154099B (en) 2017-12-20 2017-12-20 Figure identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN108154099B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034150B (en) * 2018-06-15 2021-09-21 北京小米移动软件有限公司 Image processing method and device
CN109740516B (en) * 2018-12-29 2021-05-14 深圳市商汤科技有限公司 User identification method and device, electronic equipment and storage medium
KR20220004628A (en) * 2019-03-12 2022-01-11 엘리먼트, 인크. Detection of facial recognition spoofing using mobile devices
CN110163096B (en) * 2019-04-16 2021-11-02 北京奇艺世纪科技有限公司 Person identification method, person identification device, electronic equipment and computer readable medium
CN111339979B (en) * 2020-03-04 2023-09-19 平安科技(深圳)有限公司 Image recognition method and image recognition device based on feature extraction

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021899A (en) * 2007-03-16 2007-08-22 南京搜拍信息技术有限公司 Interactive human face identificiating system and method of comprehensive utilizing human face and humanbody auxiliary information
CN103617426A (en) * 2013-12-04 2014-03-05 东北大学 Pedestrian target detection method under interference by natural environment and shelter
CN104361327A (en) * 2014-11-20 2015-02-18 苏州科达科技股份有限公司 Pedestrian detection method and system
CN104408402A (en) * 2014-10-29 2015-03-11 小米科技有限责任公司 Face identification method and apparatus
CN104850828A (en) * 2015-04-29 2015-08-19 小米科技有限责任公司 Person identification method and person identification device
CN105184280A (en) * 2015-10-10 2015-12-23 东方网力科技股份有限公司 Human body identity identification method and apparatus
CN105631403A (en) * 2015-12-17 2016-06-01 小米科技有限责任公司 Method and device for human face recognition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Le An, et al.; "Face recognition in multi-camera surveillance videos"; Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012); 2013-02-14; full text *

Also Published As

Publication number Publication date
CN108154099A (en) 2018-06-12

Similar Documents

Publication Publication Date Title
CN108154099B (en) Figure identification method and device and electronic equipment
CN108154171B (en) Figure identification method and device and electronic equipment
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
CN108932456B (en) Face recognition method, device and system and storage medium
WO2020125216A1 (en) Pedestrian re-identification method, device, electronic device and computer-readable storage medium
US8750573B2 (en) Hand gesture detection
CN108875493B (en) Method and device for determining similarity threshold in face recognition
CN110135246A (en) Human action recognition method and device
CN108038176B (en) Method and device for establishing passerby library, electronic equipment and medium
CN108875534B (en) Face recognition method, device, system and computer storage medium
US10650234B2 (en) Eyeball movement capturing method and device, and storage medium
US20120027252A1 (en) Hand gesture detection
CN107832700A (en) Face recognition method and system
US8090151B2 (en) Face feature point detection apparatus and method of the same
CN109426785B (en) Human body target identity recognition method and device
US11126827B2 (en) Method and system for image identification
CN111079785A (en) Image identification method and device and terminal equipment
CN110909784B (en) Training method and device of image recognition model and electronic equipment
CN111738120B (en) Character recognition method, character recognition device, electronic equipment and storage medium
US10095931B2 (en) Store-entering person attribute extraction apparatus, store-entering person attribute extraction method, and non-transitory computer readable medium
US9218540B2 (en) Apparatus and computer readable medium for signal classification using spectrogram and templates
JP5489340B2 (en) Face masking apparatus and method
TW202042113A (en) Face recognition system, establishing data method for face recognition, and face recognizing method thereof
CN108960246B (en) Binarization processing device and method for image recognition
CN111126102A (en) Personnel searching method and device and image processing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant