CN109359544B - Portrait retrieval method and device - Google Patents

Portrait retrieval method and device

Info

Publication number
CN109359544B
CN109359544B (application CN201811091048.2A)
Authority
CN
China
Prior art keywords
result
pedestrian
preset
probability
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811091048.2A
Other languages
Chinese (zh)
Other versions
CN109359544A (en)
Inventor
姜黎
张仁辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Fiberhome Digtal Technology Co Ltd
Original Assignee
Wuhan Fiberhome Digtal Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Fiberhome Digtal Technology Co Ltd filed Critical Wuhan Fiberhome Digtal Technology Co Ltd
Priority to CN201811091048.2A priority Critical patent/CN109359544B/en
Publication of CN109359544A publication Critical patent/CN109359544A/en
Application granted granted Critical
Publication of CN109359544B publication Critical patent/CN109359544B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The invention provides a portrait retrieval method and a portrait retrieval device. The portrait retrieval method comprises the following steps: obtaining a video to be detected; performing motion detection on the video to obtain a gait feature sequence of a pedestrian; acquiring a gait sequence network model, inputting the gait feature sequence into the gait sequence network model, and obtaining a recognition result and a recognition probability for the pedestrian; judging whether the recognition probability is greater than a preset threshold; if so, judging that the obtained recognition result is correct and taking the recognition result as the retrieval result; if not, judging that the obtained recognition result is incorrect and returning to the step of obtaining a video to be detected. Applying the embodiments of the invention improves both the efficiency and the accuracy of portrait retrieval.

Description

Portrait retrieval method and device
Technical Field
The invention relates to the field of data retrieval, in particular to a portrait retrieval method and a portrait retrieval device.
Background
With the development of internet technology, various video data are growing explosively, and various portrait retrieval methods are applied to quickly retrieve relevant information of a person from massive video data.
At present, portrait retrieval mainly relies on manual identification or face recognition technology to retrieve a person in a video and obtain a retrieval result. However, these methods suffer from low efficiency or low accuracy and still struggle to meet users' actual needs. For example, the video to be retrieved may amount to several hundred terabytes, and identifying a person manually in such data may take one or two months, which is a huge and time-consuming task. Face recognition, although fast, is easily disturbed by the scene and has strict requirements on the size of the face; since scenes in a video usually change greatly and the face size of the same person may vary, the accuracy of retrieving a portrait in video with this technology is not high.
Therefore, it is necessary to design a new portrait retrieval method to overcome the above problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a portrait retrieval method and a portrait retrieval device so as to improve the efficiency and accuracy of portrait retrieval.
The invention is realized by the following steps:
in a first aspect, the present invention provides a method for retrieving a portrait, the method comprising:
obtaining a video to be detected; carrying out motion detection on the video to obtain a gait feature sequence of the pedestrian;
acquiring a gait sequence network model, inputting the gait feature sequence into the gait sequence network model, and acquiring an identification result and an identification probability for the pedestrian;
judging whether the recognition probability is greater than a preset threshold value or not;
if so, judging that the obtained identification result is correct, and taking the identification result as a retrieval result;
and if not, judging that the obtained identification result is incorrect, and returning to execute the step of obtaining the video to be detected.
Optionally, the performing motion detection on the video to obtain a gait feature sequence of the pedestrian includes:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from each image frame containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
Optionally, the gait sequence network model is a target neural network model, and obtaining the gait sequence network model includes:
and training a preset initial neural network model by using a training sample set to obtain the target neural network model.
Optionally, the identification result includes an identity identification result and a motion identification result, the preset threshold includes a preset first threshold and a preset second threshold, and the identification probability includes an identity identification probability and a motion identification probability; the step of judging whether the recognition probability is greater than a preset threshold, and if so, judging that the obtained recognition result is correct and taking the recognition result as a retrieval result, includes:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
Optionally, if the recognition probability is not greater than the preset threshold, the judging that the obtained recognition result is incorrect and returning to the step of obtaining the video to be detected includes:
and when the identity recognition probability is not greater than a preset first threshold or the motion recognition probability is not greater than a preset second threshold, judging that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected.
Optionally, the initial neural network model is an LSTM temporal recurrent neural network model.
Optionally, when there are a plurality of search results, the method further includes:
and according to the identification probability of each retrieval result, performing ascending/descending arrangement on each retrieval result.
In a second aspect, the present invention provides a portrait retrieval apparatus, the apparatus comprising:
the first obtaining module is used for obtaining a video to be detected; carrying out motion detection on the video to obtain a gait feature sequence of the pedestrian;
the second obtaining module is used for obtaining a gait sequence network model, inputting the gait feature sequence into the gait sequence network model, and obtaining an identification result and an identification probability for the pedestrian;
the judging module is used for judging whether the recognition probability is greater than a preset threshold; if so, judging that the obtained identification result is correct, and taking the identification result as a retrieval result; and if not, judging that the obtained identification result is incorrect, and returning to the step of obtaining the video to be detected.
Optionally, the first obtaining module performs motion detection on the video to obtain a gait feature sequence of the pedestrian, specifically:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from each image frame containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
Optionally, the gait sequence network model is a target neural network model, and the second obtaining module obtains the gait sequence network model, specifically:
and training a preset initial neural network model by using a training sample set to obtain the target neural network model.
Optionally, the identification result includes an identity identification result and a motion identification result, the preset threshold includes a preset first threshold and a preset second threshold, and the identification probability includes an identity identification probability and a motion identification probability; the judging module judges whether the recognition probability is greater than a preset threshold value, if so, the obtained recognition result is judged to be correct, and the recognition result is taken as a retrieval result, specifically:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
Optionally, when the recognition probability is not greater than the preset threshold, the judging module judges that the obtained recognition result is incorrect and returns to the step of obtaining the video to be detected, specifically:
and when the identity recognition probability is not greater than a preset first threshold or the motion recognition probability is not greater than a preset second threshold, judging that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected.
Optionally, the initial neural network model is an LSTM temporal recurrent neural network model.
Optionally, the apparatus further comprises:
and the sorting module is used for performing ascending/descending sorting on each retrieval result according to the identification probability of each retrieval result when a plurality of retrieval results exist.
The invention has the following beneficial effects: by applying the embodiments of the invention, a video to be detected is obtained first; motion detection is performed on the video to obtain a gait feature sequence of the pedestrian; a gait sequence network model is then acquired, the gait feature sequence is input into the gait sequence network model, and a recognition result and a recognition probability for the pedestrian are obtained; whether the recognition probability is greater than a preset threshold is judged; if so, the obtained recognition result is judged to be correct and taken as the retrieval result; and if not, the obtained recognition result is judged to be incorrect and the process returns to the step of obtaining the video to be detected.
Therefore, by applying the embodiments of the invention, the gait feature sequence is input into the gait sequence network model to obtain a recognition result and a recognition probability for the pedestrian, which improves retrieval efficiency compared with the existing manual identification approach. When the recognition probability is greater than the preset threshold, the recognition result is taken as the retrieval result; when the recognition probability is not greater than the preset threshold, the process returns to the step of obtaining the video to be detected so as to retrieve the pedestrian again, which improves the accuracy of the retrieval result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a human image retrieval method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a portrait retrieval apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the portrait retrieval method provided by the present invention can be applied to electronic devices; in specific applications, the electronic device may be a computer, a personal computer, a tablet, a mobile phone, or the like.
Referring to fig. 1, an embodiment of the present invention provides a portrait retrieval method, including the following steps:
s101, obtaining a video to be detected; processing the video to obtain a gait feature sequence of the pedestrian;
the video to be detected can be a video acquired by a video acquisition device in real time, can also be a video pre-stored in an execution main body (such as an electronic device) of the invention, and can also be a video provided by a third-party device. The video acquisition equipment can be a camera, a video recorder and the like, and the invention does not limit the specific model of the video acquisition equipment, for example, the video acquisition equipment can be a monocular camera or a binocular camera. The video acquisition equipment can be fixedly arranged at each touch position; and the device can also be installed on a moving object, such as an unmanned aerial vehicle and an automobile.
The video acquisition equipment can acquire data to obtain a video and can send the video to the electronic equipment, so that the electronic equipment can acquire the video acquired by the video acquisition equipment and can process the video to obtain a gait feature sequence of the pedestrian.
Specifically, the method for processing the video to obtain the gait feature sequence of the pedestrian may be:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from the image frames containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
A video is composed of consecutive image frames and can be regarded as a continuous image frame sequence; each image frame containing the pedestrian can be detected from the video by using a motion detection algorithm, and the gait features of the pedestrian can then be extracted. The specific motion detection algorithm may be one or a combination of a background image difference method, an inter-frame difference method, an optical flow method, and the like.
The gait characteristics are used for reflecting characteristics of the pedestrian during movement and can comprise characteristics of step length, stride length, step frequency and the like. Gait features can be extracted from image frames containing pedestrians, and the extracted gait features are combined to obtain a gait feature sequence. The gait characteristic sequence has uniqueness, and can uniquely distinguish the pedestrian and the motion state thereof. The states of motion of the pedestrian include loitering, jogging, normal walking, running, falling, and the like.
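For example, a minimal sketch of this step in Python (assuming OpenCV 4; the silhouette statistics used as per-frame gait features here are illustrative stand-ins for the step length, stride and cadence features, and the function name is hypothetical) may be:
```python
import cv2
import numpy as np

def extract_gait_sequence(video_path, min_area=500):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2()  # background-difference based motion detector
    features = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blobs = [c for c in contours if cv2.contourArea(c) > min_area]
        if not blobs:
            continue  # no moving pedestrian detected in this frame
        # keep the largest moving blob as the pedestrian silhouette
        x, y, w, h = cv2.boundingRect(max(blobs, key=cv2.contourArea))
        # per-frame gait descriptor: normalized silhouette width (a crude stride proxy)
        # and the width/height aspect ratio
        features.append([w / frame.shape[1], w / max(h, 1)])
    cap.release()
    return np.asarray(features, dtype=np.float32)  # gait feature sequence, shape (T, D)
```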
In addition, in order to eliminate noise in the video and improve the identification accuracy, before the video is subjected to motion detection to obtain a gait feature sequence of a pedestrian, the method may further include:
and carrying out filtering processing on the video.
Correspondingly, the step S101 of performing motion detection on the video to obtain a gait feature sequence of the pedestrian may be:
and carrying out motion detection on the filtered video to obtain a gait feature sequence of the pedestrian.
Because the video collected by the video collecting device may have noise interference, the video is filtered before the video is subjected to motion detection, so that unnecessary noise interference in the video can be reduced, and the definition of the video is improved.
The embodiment of the present invention does not limit the implementation of the filtering process; for example, one or a combination of filtering algorithms such as median filtering, linear filtering, and Kalman filtering may be used to filter the video.
Therefore, by applying the embodiment of the invention, after the video is filtered, the noise in the video can be removed, and the filtered video is subjected to motion detection, which is beneficial to improving the accuracy of the gait feature sequence.
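For example, a minimal sketch of the optional filtering step in Python (assuming OpenCV and per-frame median filtering; the 5x5 kernel size is an illustrative choice) may be:
```python
import cv2

def denoise_frame(frame, ksize=5):
    # median filtering suppresses salt-and-pepper noise while preserving edges
    return cv2.medianBlur(frame, ksize)
```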
In still another implementation, to increase the speed of feature extraction, after detecting each image frame containing a pedestrian from a video, the method may further include:
performing image skeletonization processing on each image frame containing pedestrians to obtain a skeletonized image sequence;
correspondingly, the gait features of the pedestrian are extracted from the image frame containing the pedestrian, and the method specifically comprises the following steps:
and extracting gait features of the pedestrian from the skeletonized image sequence.
Image skeletonization thins the image frame: unimportant points are removed from the original image so that the skeleton of the object in the image is obtained. The removed points do not affect the overall shape of the object, and the skeleton can be understood as the central axis of the object. For example, the skeleton of a rectangle is its central axis along the long direction; the skeleton of a square is its center point; the skeleton of a circle is its center; the skeleton of a straight line is the line itself; and the skeleton of an isolated point is the point itself. The resulting image skeleton highlights the main structure and shape information of the object, and the gait features of the pedestrian can be extracted from this information.
By applying the embodiment of the invention, the redundant information is removed, so that the characteristic extraction speed is improved.
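For example, a minimal sketch of the skeletonization step in Python (assuming scikit-image; the binary silhouette mask is assumed to come from the motion-detection stage) may be:
```python
import numpy as np
from skimage.morphology import skeletonize

def skeletonize_silhouette(mask):
    # thin the binary pedestrian silhouette down to its one-pixel-wide skeleton (central axis)
    return skeletonize(mask > 0).astype(np.uint8)
```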
S102, acquiring a gait sequence network model, inputting the gait feature sequence into the gait sequence network model, and obtaining an identification result and an identification probability for the pedestrian;
the gait sequence network model can be one or a combination of a neural network model, a Support Vector Machine (SVM) model, a genetic network model and other Machine learning models. The gait sequence network model is a machine learning model which is trained to be convergent by using a training sample set in advance, so that after a gait feature sequence is obtained, a recognition result and recognition probability can be output. The recognition probability is used to evaluate the probability that the correct result was recognized.
Specifically, the gait sequence network model may be a target neural network model, and the acquiring of the gait sequence network model includes:
and training a preset initial neural network model by using the training sample set to obtain a target neural network model.
The training sample set is a set of samples used for training the initial neural network model, and each sample includes a gait feature sequence and a corresponding recognition result. The gait feature sequences in the training sample set may be derived from videos acquired by the video acquisition device, videos pre-stored on the electronic device, or videos provided by other third-party devices; the recognition results in the training sample set may be labeled in advance by experts or learned in advance by another machine learning model.
All parameters in the initial neural network model are initial default parameters; after training, a target neural network model composed of the trained, mature model parameters is obtained, and these model parameters determine the recognition accuracy of the target neural network model. The initial neural network model is an LSTM (Long Short-Term Memory) temporal recurrent neural network. The LSTM is an improved recurrent neural network that can remember long-term information and thus alleviates the long-term dependency problem, and it learns well on data that involve a large amount of information to process.
Of course, in other implementations, the initial neural network model may also be a convolutional neural network model, a cyclic neural network model, or the like.
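For example, a minimal sketch of such a gait sequence network in Python (assuming PyTorch; the hidden size, the number of identities and the number of motion classes are illustrative assumptions, not values from this disclosure) may be:
```python
import torch
import torch.nn as nn

class GaitSequenceNet(nn.Module):
    def __init__(self, feat_dim=2, hidden=64, num_ids=100, num_motions=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.id_head = nn.Linear(hidden, num_ids)          # identity recognition head
        self.motion_head = nn.Linear(hidden, num_motions)  # motion recognition head

    def forward(self, seq):
        # seq: (batch, T, feat_dim) gait feature sequence
        _, (h_n, _) = self.lstm(seq)
        h = h_n[-1]  # last hidden state summarizes the whole sequence
        id_prob = torch.softmax(self.id_head(h), dim=-1)
        motion_prob = torch.softmax(self.motion_head(h), dim=-1)
        return id_prob, motion_prob
```
In this sketch, the two softmax heads provide the identity recognition probability and the motion recognition probability used in the subsequent threshold judgment.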
S103, judging whether the recognition probability is larger than a preset threshold value or not; if yes, executing S104; if not, executing S105;
s104, judging that the obtained identification result is correct, and taking the identification result as a retrieval result;
and S105, judging that the obtained identification result is incorrect, and returning to the step of obtaining the video to be detected.
The recognition result may include an identity recognition result and a motion recognition result; correspondingly, the preset threshold may include a preset first threshold and a preset second threshold, and the recognition probability includes an identity recognition probability and a motion recognition probability. In this case, the step of judging whether the recognition probability is greater than the preset threshold, and if so, judging that the obtained recognition result is correct and taking the recognition result as the retrieval result, may be:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
The preset first threshold and the preset second threshold may be set in advance according to requirements and may be the same or different; for example, they may be 0.75 and 0.65, respectively. That is, when the identity recognition probability is greater than 0.75 and the motion recognition probability is greater than 0.65, it is determined that the obtained recognition result is correct, and the identity recognition result and the motion recognition result are taken as the retrieval result.
Of course, in other implementation manners, when the identity recognition probability is greater than the preset first threshold or the motion recognition probability is greater than the preset second threshold, it may be determined that the identity recognition is successful or the motion recognition is correct, and the identity recognition result or the motion recognition result is taken as the retrieval result.
For example, with the preset first threshold and the preset second threshold being 0.75 and 0.65 respectively, when the identity recognition probability is greater than 0.75, the identity recognition is determined to be successful and the identity recognition result is taken as a retrieval result; when the motion recognition probability is greater than 0.65, the motion recognition is determined to be correct and the motion recognition result is taken as a retrieval result.
Alternatively, in another implementation manner where the purpose of portrait retrieval is only to recognize the identity of the pedestrian, the recognition result may include only the identity recognition result; when the identity recognition probability is greater than the preset threshold, it is determined that the obtained recognition result is correct, and the identity recognition result is taken as the retrieval result.
Alternatively, in another implementation manner where the purpose of portrait retrieval is only to recognize the motion of the pedestrian and the identity of the pedestrian is of no concern, the recognition result may include only the motion recognition result; when the motion recognition probability is greater than the preset threshold, it is determined that the obtained recognition result is correct, and the motion recognition result is taken as the retrieval result.
The preset threshold may be set in advance, and may be 0.6, 0.65, 0.7, 0.75, and the like.
The identity recognition result can be used for uniquely identifying the identity of the pedestrian, and the specific content of the identity recognition result is not limited, for example, the identity recognition result can be a combination of information such as the name of the pedestrian, an identity card number, an address and the like, and can also only comprise the identity card number and the name. The motion recognition result may include one of loitering, jogging, normal walking, running, falling, and the like.
In addition, the step of, if the recognition probability is not greater than the preset threshold, judging that the obtained recognition result is incorrect and returning to the step of obtaining the video to be detected, may be:
and when the identity recognition probability is not greater than a preset first threshold or the motion recognition probability is not greater than a preset second threshold, judging that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected.
Or, in other implementation manners, when the identity recognition probability is not greater than the preset first threshold and the motion recognition probability is not greater than the preset second threshold, it may be determined that the obtained recognition result is incorrect, and the process returns to the step of obtaining the video to be detected.
For example, the preset first threshold and the preset second threshold may be 0.75 and 0.65, respectively; that is, when the identity recognition probability is not greater than 0.75 and the motion recognition probability is not greater than 0.65, it is determined that the obtained recognition result is incorrect.
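For example, a minimal sketch of the threshold judgment in Python (using the example thresholds 0.75 and 0.65 and the "and" acceptance rule of the main embodiment; the function name is hypothetical) may be:
```python
def judge_recognition(id_prob, motion_prob, id_result, motion_result,
                      id_threshold=0.75, motion_threshold=0.65):
    # accept only when both recognition probabilities exceed their preset thresholds
    if id_prob > id_threshold and motion_prob > motion_threshold:
        return (id_result, motion_result)  # taken together as the retrieval result
    return None  # judged incorrect: reacquire the video to be detected and retry
```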
Therefore, by applying the embodiments of the invention, the gait feature sequence is input into the gait sequence network model to obtain a recognition result and a recognition probability for the pedestrian, which improves retrieval efficiency compared with the existing manual identification approach. When the recognition probability is greater than the preset threshold, the recognition result is taken as the retrieval result; when the recognition probability is not greater than the preset threshold, the process returns to the step of obtaining the video to be detected so as to retrieve the pedestrian again, which improves the accuracy of the retrieval result.
In addition, in order to improve the user experience, after the obtained recognition result is judged to be incorrect, prompt information may be given. The prompt information is used to prompt the user that the obtained recognition result is incorrect, or to ask the user whether to accept the recognition result. If the user chooses to accept it, the electronic device may take the recognition result as the retrieval result; if the user does not accept it, the electronic device may return to the step of obtaining the video to be detected.
By applying the embodiment of the invention, the user can independently select whether to accept the identification result, so that the retrieval result is obtained according to the selection of the user, and the user experience is improved.
When there are a plurality of retrieval results, the method further includes the following step:
and according to the identification probability of each retrieval result, performing ascending/descending arrangement on each retrieval result.
When the video includes a plurality of pedestrians, a recognition result and a recognition probability can be obtained for each pedestrian, and each recognition result whose recognition probability is greater than the preset threshold can be taken as a retrieval result, so that there are a plurality of retrieval results. The recognition probability of a retrieval result refers to the recognition probability of the recognition result that was taken as that retrieval result.
Or, when only one pedestrian is included in the video, the pedestrian may be in different motion states under different conditions, so that the number of the recognition results may be multiple, and each recognition result with the recognition probability greater than the preset threshold may be used as the search result, so that the number of the search results is multiple.
By applying the embodiment of the invention, the retrieval results can be arranged in an ascending/descending order, thereby facilitating subsequent checking and analysis.
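For example, a minimal sketch of the sorting step in Python (assuming each retrieval result is stored as a (result, recognition probability) pair, which is an illustrative representation) may be:
```python
def sort_retrieval_results(results, descending=True):
    # results: list of (recognition_result, recognition_probability) pairs
    return sorted(results, key=lambda r: r[1], reverse=descending)
```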
Corresponding to the above method embodiment, the embodiment of the present invention further provides a portrait retrieval apparatus.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a portrait retrieval apparatus according to an embodiment of the present invention, the apparatus includes:
a first obtaining module 201, configured to obtain a video to be detected; carrying out motion detection on the video to obtain a gait feature sequence of the pedestrian;
a second obtaining module 202, configured to obtain a gait sequence network model, input the gait feature sequence into the gait sequence network model, and obtain an identification result and an identification probability for the pedestrian;
a judging module 203, configured to judge whether the recognition probability is greater than a preset threshold; if so, judge that the obtained identification result is correct, and take the identification result as a retrieval result; and if not, judge that the obtained identification result is incorrect, and return to the step of obtaining the video to be detected.
Therefore, by applying the embodiments of the invention, the gait feature sequence is input into the gait sequence network model to obtain a recognition result and a recognition probability for the pedestrian, which improves retrieval efficiency compared with the existing manual identification approach. When the recognition probability is greater than the preset threshold, the recognition result is taken as the retrieval result; when the recognition probability is not greater than the preset threshold, the process returns to the step of obtaining the video to be detected so as to retrieve the pedestrian again, which improves the accuracy of the retrieval result.
Optionally, the first obtaining module performs motion detection on the video to obtain a gait feature sequence of the pedestrian, specifically:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from each image frame containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
Optionally, the gait sequence network model is a target neural network model, and the second obtaining module obtains the gait sequence network model, specifically:
and training a preset initial neural network model by using a training sample set to obtain the target neural network model.
Optionally, the identification result includes an identity identification result and a motion identification result, the preset threshold includes a preset first threshold and a preset second threshold, and the identification probability includes an identity identification probability and a motion identification probability; the judging module judges whether the recognition probability is greater than a preset threshold value, if so, the obtained recognition result is judged to be correct, and the recognition result is taken as a retrieval result, specifically:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
Optionally, when the recognition probability is not greater than the preset threshold, the judging module judges that the obtained recognition result is incorrect and returns to the step of obtaining the video to be detected, specifically:
and when the identity recognition probability is not greater than a preset first threshold or the motion recognition probability is not greater than a preset second threshold, judging that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected.
Optionally, the initial neural network model is an LSTM temporal recurrent neural network model.
Optionally, the apparatus further comprises:
and the sorting module is used for performing ascending/descending sorting on each retrieval result according to the identification probability of each retrieval result when a plurality of retrieval results exist.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A method for retrieving a portrait, the method comprising:
obtaining a video to be detected; carrying out motion detection on the video to obtain a gait feature sequence of the pedestrian;
acquiring a gait sequence network model, inputting the gait feature sequence into the gait sequence network model, and acquiring an identification result and an identification probability for the pedestrian;
judging whether the recognition probability is greater than a preset threshold value or not;
if so, judging that the obtained identification result is correct, and taking the identification result as a retrieval result;
if not, judging that the obtained identification result is incorrect, and returning to the step of obtaining the video to be detected;
the identification result comprises an identity identification result and a motion identification result, the preset threshold comprises a preset first threshold and a preset second threshold, and the identification probability comprises an identity identification probability and a motion identification probability; and the judging whether the recognition probability is greater than a preset threshold value, if so, judging that the obtained recognition result is correct, and taking the recognition result as a retrieval result, comprises:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
2. The method according to claim 1, wherein the performing motion detection on the video to obtain a gait feature sequence of the pedestrian comprises:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from each image frame containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
3. The method of claim 1, wherein the gait sequence network model is a target neural network model, and obtaining the gait sequence network model comprises:
and training a preset initial neural network model by using a training sample set to obtain the target neural network model.
4. The method according to claim 1, wherein the judging, if the recognition probability is not greater than the preset threshold, that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected, comprises:
and when the identity identification probability is not greater than a preset first threshold or the motion recognition probability is not greater than a preset second threshold, judging that the obtained recognition result is incorrect, and returning to the step of obtaining the video to be detected.
5. The method of claim 3, wherein the initial neural network model is an LSTM temporal recurrent neural network model.
6. The method according to claim 1, wherein when there are a plurality of search results, the method further comprises:
and according to the identification probability of each retrieval result, performing ascending/descending arrangement on each retrieval result.
7. A portrait retrieval apparatus, characterized in that the apparatus comprises:
the first obtaining module is used for obtaining a video to be detected; carrying out motion detection on the video to obtain a gait feature sequence of the pedestrian;
the second obtaining module is used for obtaining a gait sequence network model, inputting the gait feature sequence into the gait sequence network model and obtaining an identification result and an identification probability aiming at the pedestrian;
the judging module is used for judging whether the recognition probability is greater than a preset threshold value or not; if so, judging that the obtained identification result is correct, and taking the identification result as a retrieval result; if not, judging that the obtained identification result is incorrect, and returning to the step of obtaining the video to be detected;
the identification result comprises an identity identification result and a motion identification result, the preset threshold comprises a preset first threshold and a preset second threshold, and the identification probability comprises an identity identification probability and a motion identification probability; and the judging whether the recognition probability is greater than a preset threshold value, if so, judging that the obtained recognition result is correct, and taking the recognition result as a retrieval result, comprises:
and when the identity recognition probability is greater than a preset first threshold and the motion recognition probability is greater than a preset second threshold, judging that the obtained recognition result is correct, and taking the identity recognition result and the motion recognition result as a retrieval result.
8. The apparatus according to claim 7, wherein the first obtaining module performs motion detection on the video to obtain a gait feature sequence of the pedestrian, specifically:
detecting each image frame containing the pedestrian from the video by using a preset motion detection algorithm, and extracting gait features of the pedestrian from each image frame containing the pedestrian; and combining the extracted gait features to obtain a gait feature sequence of the pedestrian.
9. The apparatus according to claim 7, wherein the gait sequence network model is a target neural network model, and the second obtaining module obtains the gait sequence network model by:
and training a preset initial neural network model by using a training sample set to obtain the target neural network model.
CN201811091048.2A 2018-09-19 2018-09-19 Portrait retrieval method and device Active CN109359544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811091048.2A CN109359544B (en) 2018-09-19 2018-09-19 Portrait retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811091048.2A CN109359544B (en) 2018-09-19 2018-09-19 Portrait retrieval method and device

Publications (2)

Publication Number Publication Date
CN109359544A CN109359544A (en) 2019-02-19
CN109359544B true CN109359544B (en) 2022-01-21

Family

ID=65351305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811091048.2A Active CN109359544B (en) 2018-09-19 2018-09-19 Portrait retrieval method and device

Country Status (1)

Country Link
CN (1) CN109359544B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821689A (en) * 2021-09-22 2021-12-21 沈春华 Pedestrian retrieval method and device based on video sequence and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354548A (en) * 2015-10-30 2016-02-24 武汉大学 Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN107590452A (en) * 2017-09-04 2018-01-16 武汉神目信息技术有限公司 A kind of personal identification method and device based on gait and face fusion
CN108520216A (en) * 2018-03-28 2018-09-11 电子科技大学 A kind of personal identification method based on gait image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2589979A1 (en) * 2011-11-03 2013-05-08 Thales Nederland B.V. System for characterizing motion of an individual, notably a human individual


Also Published As

Publication number Publication date
CN109359544A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN110751022B (en) Urban pet activity track monitoring method based on image recognition and related equipment
CN110070029B (en) Gait recognition method and device
CN110807385A (en) Target detection method and device, electronic equipment and storage medium
CN113095124A (en) Face living body detection method and device and electronic equipment
CN110826476A (en) Image detection method and device for identifying target object, electronic equipment and storage medium
CN109993102B (en) Similar face retrieval method, device and storage medium
CN108009466B (en) Pedestrian detection method and device
JP2011059810A (en) Image recognition system
CN104615986A (en) Method for utilizing multiple detectors to conduct pedestrian detection on video images of scene change
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN115713715B (en) Human behavior recognition method and recognition system based on deep learning
CN110991397B (en) Travel direction determining method and related equipment
CN110610123A (en) Multi-target vehicle detection method and device, electronic equipment and storage medium
CN110991278A (en) Human body action recognition method and device in video of computer vision system
CN113516113A (en) Image content identification method, device, equipment and storage medium
CN112668438A (en) Infrared video time sequence behavior positioning method, device, equipment and storage medium
CN113688804B (en) Multi-angle video-based action identification method and related equipment
CN109359544B (en) Portrait retrieval method and device
CN116758590B (en) Palm feature processing method, device, equipment and medium for identity authentication
CN112613496A (en) Pedestrian re-identification method and device, electronic equipment and storage medium
CN116597361A (en) Image recognition tracking method, device and equipment of cleaning machine and readable storage medium
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN114445787A (en) Non-motor vehicle weight recognition method and related equipment
WO2023197887A1 (en) Intelligent control method for starting washing machine and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant