CN112699846A - Specific character and specific behavior combined retrieval method and device with identity consistency check function

Specific character and specific behavior combined retrieval method and device with identity consistency check function

Info

Publication number
CN112699846A
CN112699846A
Authority
CN
China
Prior art keywords
retrieval
specific
character
behavior
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110051588.3A
Other languages
Chinese (zh)
Other versions
CN112699846B (en)
Inventor
梁超
杨晶垚
牛艳蕊
王中元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110051588.3A priority Critical patent/CN112699846B/en
Publication of CN112699846A publication Critical patent/CN112699846A/en
Application granted granted Critical
Publication of CN112699846B publication Critical patent/CN112699846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Abstract

The invention relates to a specific character and specific behavior combined retrieval method and device with identity consistency check. A key frame retrieval database is obtained by performing shot segmentation and key frame extraction on an original video sequence; specific character instance retrieval is carried out; specific behavior instance retrieval is carried out; an identity consistency check between the specific character and specific behavior retrieval results is performed, and the check score is used as the confidence score for judging that the specific character performs the specific behavior; the specific character and specific behavior retrieval results are fused; the key frame retrieval scores are aggregated into shot retrieval scores, and the shots are ranked by retrieval score to obtain the combined retrieval result of the specific character and specific behavior with identity consistency check. By judging the relevance between the character and behavior retrieval results, the proposed identity consistency check effectively solves the problem of identity inconsistency between character retrieval and behavior retrieval, thereby effectively improving the accuracy of combined retrieval of specific character and specific behavior video instances.

Description

Specific character and specific behavior combined retrieval method and device with identity consistency check function
Technical Field
The invention relates to the field of video retrieval, belongs to a video instance retrieval scheme combining a specific character and a specific behavior, and particularly belongs to a specific character and specific behavior video instance combined retrieval method and device with identity consistency check.
Background
Instance retrieval is an important and active research problem in real life, covering the retrieval of specific people, objects, scenes, actions, and so on. Video instance retrieval means that, given a group of query examples, videos containing the query examples are retrieved from a massive video database and a ranked list of retrieval results is returned according to the similarity between the videos and the query examples. Joint retrieval of a specific character and a specific behavior is a special case of video instance retrieval, whose purpose is to retrieve all videos in which the specific character performs the specific behavior. The technology plays an important role in fields such as security and content analysis and auditing. In the security field in particular, when a public security organ investigates suspicious persons in a video surveillance scene, it needs to focus on videos of suspicious persons performing suspicious behaviors; joint retrieval of specific characters and specific behaviors enables criminals to be quickly searched and located in massive surveillance videos, can effectively improve the utilization efficiency of a security system, and is of great significance for improving the emergency response capability of public security organs, accelerating case investigation and pursuit, and protecting people's lives and property.
Joint retrieval of people and behaviors in videos currently faces the following main challenges: the volume of video data is large, and strong video noise caused by factors such as ambient lighting, face angle, body pose variation, and occlusion makes retrieval of characters or behaviors difficult, so character retrieval and behavior retrieval are not always effective at the same time. The current mainstream approach to joint retrieval of a specific character and a specific behavior retrieves the specific character and the specific behavior separately and then fuses the two retrieval results with some fusion strategy to obtain the joint retrieval result. However, this approach has an inherent problem: since the retrieval of the specific character and of the specific behavior are independent of each other, the correlation between them cannot be guaranteed. As shown in fig. 1, in a picture where a specific character and a specific action appear simultaneously, even though the specific character and the action are each determined to appear in the image, there is no guarantee that the current action is performed by the specific character. This directly causes retrieval errors and degrades retrieval accuracy.
Chinese patent document No. CN103714181A, published 2014.04.09, discloses a hierarchical specific character retrieval method: it first computes the feature similarity of global color histograms between the query object and the candidate object as a coarse retrieval result, then extracts local comprehensive salient features after superpixel segmentation, and refines the retrieval result by nearest-neighbor matching between the local salient feature sets of the query example and of the coarse retrieval results, thereby improving the robustness of character feature extraction. Compared with joint retrieval of a specific character and a specific behavior, this method only studies the feature extraction problem of specific character retrieval in surveillance videos, does not involve combining a specific character with a specific behavior, and differs in research angle from a joint retrieval method of a specific character and a specific behavior with identity consistency check.
Chinese patent document No. CN107315795A, published 2017.11.03, discloses a video instance retrieval method and system combining specific characters and scenes: it first performs specific character retrieval and specific scene retrieval separately, proposes strategies based on high-score preservation and on neighbor expansion to preprocess the retrieval results, and then fuses the results to obtain the video instance retrieval result combining specific characters and scenes, thereby improving the reliability and extensibility of the ranking results. Compared with joint retrieval of a specific character and a specific behavior, this method studies joint retrieval of a specific character and a specific scene, a problem different from that of the present patent; moreover, the technique aims to optimize the rankings of the two retrieval results separately and does not guarantee their relevance, so its research angle differs from that of a joint retrieval method of a specific character and a specific behavior with identity consistency check.
Chinese patent document No. CN110516112A, published (announced) No. 2019.11.29, discloses a human body motion retrieval method and apparatus based on a hierarchical model. The human body action retrieval method based on the hierarchical model reserves the main geometric characteristics of the action data by using the coding based on the hierarchical model, converts the retrieval of complex action data into the retrieval of simple digital coding, and improves the retrieval efficiency. Compared with the joint retrieval of the specific character and the specific behavior, the method only researches the retrieval problem of the specific behavior in the video, does not relate to the joint of the specific character and the specific behavior, and has a different research angle from a joint retrieval method of the specific character and the specific behavior with identity consistency check.
Chinese patent document No. CN110674350A, published (announced) No. 2020.01.10, discloses a video character retrieval method, medium, apparatus, and computing device. According to the video character retrieval method based on multi-modal fusion, different modal characteristics in the retrieved video are extracted and fused, the target characters in the video are classified, and the robustness of characteristic information is improved. Compared with the joint retrieval of the specific character and the specific behavior, the method only researches the characteristic extraction problem of the specific character retrieval in the video, does not relate to the joint of the specific character and the specific behavior, and has a different research angle from a joint retrieval method of the specific character and the specific behavior with identity consistency check.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a device that guarantee the relevance of the joint retrieval of a specific character and a specific behavior, so as to effectively improve the accuracy of the joint retrieval of the specific character and the specific behavior.
Based on the above purpose, the invention provides a specific character and specific behavior combined retrieval method with identity consistency check. Unlike traditional behavior retrieval methods that adopt picture-level behavior recognition, the invention performs behavior retrieval with an instance-level character interaction behavior detection method and obtains the character frame corresponding to the specific behavior, which makes an identity consistency check between the character frame from behavior retrieval and the face frame from character retrieval possible. The flow of the method is shown in fig. 2 and its data processing flow in fig. 3; the specific flow comprises the following steps:
step 1, performing shot segmentation on an original video sequence to obtain a shot retrieval database, and then performing key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
step 2, searching the specific character example of the key frame level, which specifically comprises the following substeps:
step 2.1, detecting a retrieval database face frame and a face frame of an object to be inquired;
step 2.2, extracting the face features of the retrieval database and the face features of the object to be inquired;
step 2.3, calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
step 2.4, saving a specific character instance retrieval result file;
step 3, searching the specific behavior example of the key frame level, specifically comprising the following substeps:
step 3.1, detecting character interaction behaviors of the retrieval database;
step 3.2, saving a retrieval result file of the specific behavior instance;
step 4, identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is used as a confidence score for judging whether the specific behavior is performed by the specific character;
step 5, fusing the specific character and the specific behavior retrieval result of the key frame level, wherein the fused score of the specific character and the specific behavior retrieval is calculated according to the specific character retrieval score obtained in the step 2, the specific behavior retrieval score obtained in the step 3 and the consistency check score of the specific character and the specific behavior retrieval result obtained in the step 4;
step 6, converging the key frame retrieval scores to shot retrieval scores, and sorting the retrieved shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Moreover, the method for saving the specific character instance search result file and the specific behavior instance search result file comprises the following steps:
Let the original video retrieval library contain L shots, and let the l-th (l ∈ [1, L]) shot be divided into K key frames. For the k-th (k ∈ [1, K]) key frame, the number of specific character instance retrieval results is m and the number of specific behavior instance retrieval results is n (for convenience of discussion, where no confusion arises, the indices k and l are omitted from all variables).
The i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the position information of the face frame;
similarly, the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the position information of the character frame.
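By way of illustration only, the two record types can be written down as a minimal Python sketch (the field names and types are assumptions for readability; only the six-tuple order is fixed by the description above):

```python
from typing import NamedTuple

class FaceRecord(NamedTuple):
    """One specific character instance result: <f_i, fs_i, fx_i, fy_i, fw_i, fh_i>."""
    f: int      # character category
    fs: float   # category (similarity) score
    fx: float   # face frame top-left x
    fy: float   # face frame top-left y
    fw: float   # face frame width
    fh: float   # face frame height

class BehaviorRecord(NamedTuple):
    """One specific behavior instance result: <a_j, as_j, ax_j, ay_j, aw_j, ah_j>."""
    a: int      # behavior category
    as_: float  # category score ("as" is a Python keyword, hence the underscore)
    ax: float   # character frame top-left x
    ay: float   # character frame top-left y
    aw: float   # character frame width
    ah: float   # character frame height
```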
Moreover, the identity consistency check for the specific person and the specific behavior is performed in the following manner:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame for searching the face frame position information in the result file according to the specific character and the specific behavior example
Figure BDA0002894567350000042
With character frame position information
Figure BDA0002894567350000043
Calculating identity consistency score matrix C epsilon R by pairwise matching principlem×nWherein C ═ Cij],cijThe calculation method of (2) is, but not limited to, the following method:
(1) calculating the overlapping degree of the human face frame and the human figure frame;
(2) calculating the overlapping degree of the human face frame and the upper half part of the character frame;
(3) predicting the face position in the character frame based on the character skeleton model, and calculating the region overlapping degree of the face frame and the predicted face frame;
(4) and predicting the face position in the character frame based on a human body segmentation technology, and calculating the region overlapping degree of the face frame and the predicted face frame.
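As an illustration of method (2), a minimal sketch assuming ⟨x, y, w, h⟩ boxes with a top-left origin (normalizing by the face frame area is an assumption; the description does not fix the overlap measure):

```python
def upper_half_overlap(face, person):
    """c_ij variant (2): overlap of the face frame with the upper half of the character frame.

    Both boxes are (x, y, w, h); the score is the intersection area normalized
    by the face frame area, so it lies in [0, 1].
    """
    fx, fy, fw, fh = face
    px, py, pw, ph = person
    ph /= 2.0                                            # keep only the upper half
    iw = max(0.0, min(fx + fw, px + pw) - max(fx, px))   # intersection width
    ih = max(0.0, min(fy + fh, py + ph) - max(fy, py))   # intersection height
    return (iw * ih) / (fw * fh) if fw * fh > 0 else 0.0
```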
Moreover, the specific character and the specific behavior retrieval result at the key frame level are fused, and the implementation mode is as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]The retrieval score vector as of the specific behavior obtained in step 3j,j∈[1,n]And step 4, the consistency score matrix C of the specific character obtained in the step 4 for performing the specific behavior is [ C ═ Cij]Calculating a fusion score matrix S ∈ Rm×nWherein S ═ Sij],sijThere are but not limited to the following ways:
sij=cij×(α·fsi+β·asj)
wherein α and β are fusion coefficients respectively assigned to the specific character retrieval score and the specific behavior retrieval score at the time of fusion.
Moreover, the gathering of the key frame retrieval score to the shot retrieval score is performed in the following manner:
for the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to converge a plurality of frame-level retrieval results of different numbers of faces and behaviors to obtain a shot-level retrieval result, firstly preprocessing a frame-level fusion score matrix: setting the total face number of a retrieval database as M and the total behavior number as N, setting undetected character or behavior retrieval scores in a frame as zero, and setting a frame-level fusion score matrix S belonging to Rm×nExpansion to Sec RM×NAnd is recorded as
Figure BDA0002894567350000044
Then converging the expanded K frame-level fusion score matrixes to form a lens-level fusion score matrix Sl∈RM×NWherein
Figure BDA0002894567350000045
The convergence method includes, but is not limited to, the following methods:
(1) and (3) carrying out a specific behavior j aiming at a specific character i, taking the maximum value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure BDA0002894567350000051
(2) and (3) carrying out a specific behavior j aiming at a specific character i, taking the average value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure BDA0002894567350000052
based on the same inventive concept, the invention also designs a specific character and specific behavior combined retrieval device with identity consistency check, which is characterized by comprising the following steps:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs a computer readable medium, on which a computer program is stored, wherein the program is executed to implement the above method for jointly searching the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs computer equipment which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and is characterized in that the processor executes the program to realize the joint retrieval method of the specific character and the specific behavior with identity consistency check.
Compared with the existing joint retrieval technology of the specific character and the specific behavior video instance, the method mainly has the following advantages:
(1) compared with the prior art, the method has the advantages that instead of adopting a picture-level behavior recognition method for behavior retrieval, the method adopts an example-level character interaction behavior detection method for behavior retrieval to obtain the character frame corresponding to the specific behavior, so that identity consistency check between the character frame in the behavior retrieval and the face frame in the character retrieval is possible;
(2) compared with the prior art, the method has the advantages that the method replaces the direct fusion of the specific character and the specific behavior retrieval result, the identity consistency check method is provided for judging the relevance between the specific character and the specific behavior retrieval result, the problem of inconsistency between the character and the behavior retrieval identity is effectively solved, and the combined retrieval accuracy of the specific character and the specific behavior video instance is effectively improved.
Drawings
FIG. 1 is a diagram illustrating the problem under investigation according to the present invention.
FIG. 2 is a flow chart of the method of the present invention.
FIG. 3 is a data processing flow diagram of the method of the present invention.
FIG. 4 is a flow chart of an embodiment of the present invention.
FIG. 5 is the file structure of the specific character instance retrieval result file of the present invention.
FIG. 6 is the file structure of the specific behavior instance retrieval result file of the present invention.
FIG. 7 is a schematic diagram of the identity consistency check method according to an embodiment of the present invention.
Detailed Description
For the purpose of promoting a better understanding of the invention and for enabling those skilled in the art to practice the same, reference will now be made in detail to the embodiments of the invention as illustrated in the accompanying drawings.
The invention provides a specific character and specific behavior combined retrieval method with identity consistency check, aiming at the problem that the existing approach in the joint retrieval task — retrieving the specific character and the specific behavior separately and then fusing the results — cannot guarantee the relevance between the specific character and the specific behavior. The method first performs specific character retrieval to obtain the face frame corresponding to the specific character; it then performs behavior retrieval with an instance-level character interaction behavior detection method to obtain the character frame corresponding to the specific behavior; next, an identity consistency check is performed on the positions of the character frame from the behavior retrieval and the face frame from the character retrieval; finally, the joint retrieval score is calculated from the character retrieval score, the behavior retrieval score, and the consistency check score, and the combined retrieval result of the specific character and specific behavior video instances is obtained from the joint retrieval score. In specific implementation, referring to fig. 4, the process includes the following steps:
Step 1, shot segmentation is performed on the original video sequence to obtain a shot retrieval database, and key frames are then extracted from each shot to obtain the preprocessed key frame retrieval database. The quantity mapping between the original shot database and the key frame retrieval database is as follows: each shot can be divided into several key frames, whose number is determined by the shot length; the original video retrieval library contains L shots, and the l-th (l ∈ [1, L]) shot can be divided into K key frames.
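The patent does not prescribe a particular segmentation tool; one possible preprocessing sketch uses PySceneDetect for shot segmentation and a fixed sampling stride for key frames (both the library choice and the stride are assumptions):

```python
from scenedetect import detect, ContentDetector  # pip install scenedetect[opencv]

def build_keyframe_index(video_path, stride=25):
    """Split the video into shots, then sample key frames every `stride` frames."""
    shots = detect(video_path, ContentDetector())        # [(start, end), ...] timecodes
    index = []
    for l, (start, end) in enumerate(shots):
        first, last = start.get_frames(), end.get_frames()
        index.append({"shot": l,
                      "keyframes": list(range(first, last, stride))})  # K grows with shot length
    return index
```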
step 2, searching the specific character example of the key frame level, which specifically comprises the following substeps:
and 2.1, detecting a face frame of the retrieval database and a face frame of an object to be inquired, wherein the face frame can be detected by adopting a currently commonly applied multi-task convolutional neural network face detection Model (MTCNN), and the model carries out coarse-to-fine processing on a face detection task through a three-order cascaded convolutional neural network to obtain the filtered face frame and the position of a face key point. Firstly, in order to adapt to the face detection of different sizes, the images are transformed in different scales to construct an image pyramid; secondly, performing primary feature extraction and face frame calibration on the image pyramid through 3 convolution layers and 1 face classifier, and performing face frame filtering through frame regression and a face key point positioner to form a primary face region; then, further screening the preliminary face area by adding a network of a full connection layer, and outputting a more credible face area; and finally, inputting the obtained face region into a network added with a convolution layer to identify the face region, and obtaining the positions of a face frame and a key point through frame regression and feature positioning. The MTCNN model, pre-trained on a large scale face detection data set WIDER FACE with high variability in scale, pose, occlusion, and illumination, is used for face detection with greater robustness. In the embodiment, the face candidate region with a larger error probability and a height smaller than 60 pixels is filtered, and the filtered face is subjected to similarity transformation to obtain an aligned face. The detection model returns the face frame position in each image, and a plurality of face frame positions can be detected from one image containing a plurality of faces.
Step 2.2, extract the face features of the retrieval database and of the object to be queried: face recognition feature extraction is performed on each detected face frame. Face recognition may adopt ArcFace, a high-performing deep face recognition model based on additive angular margin loss, which normalizes the feature vectors and adds an angular margin so as to improve intra-class compactness and inter-class separation. In the embodiment, a model pre-trained on the face recognition data set MS1MV2 with a ResNet100 backbone network is selected, keeping only the feature embedding layer without the fully connected layer; 512-dimensional features are extracted for each normalized face image and for its horizontally flipped counterpart, and the two are concatenated into a 1024-dimensional face feature representing the face.
Step 2.3, calculate similarity scores between the face features of the retrieval database and those of the object to be queried, and rank them to obtain the specific character retrieval list. Based on the extracted face features, feature similarity is computed with a cosine-distance similarity measure, and the similarity scores are then sorted into a retrieval list.
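On L2-normalized features, cosine similarity reduces to a dot product; a sketch of the ranking step:

```python
import numpy as np

def rank_by_cosine(db_feats, query_feat):
    """Rank database faces by cosine similarity to the query (e.g. 1024-d features)."""
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    scores = db @ q                  # one cosine similarity per database face
    order = np.argsort(-scores)      # descending scores form the retrieval list
    return order, scores[order]
```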
Step 2.4, based on the above processing, a plurality of character retrieval results are obtained for each key frame. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, let the number of specific character instance retrieval results be m; the i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the face frame position information. The results are saved as the specific character instance retrieval result file, whose structure is shown in fig. 5.
Step 3, searching the specific behavior example of the key frame level, specifically comprising the following substeps:
and 3.1, detecting character interaction behaviors of each picture in the key frame retrieval database, wherein the behavior detection model can adopt a real-time character interaction detection model (PPDM) with better current performance based on parallel point detection and matching. The model is a single-stage detection framework based on an anchor-frame-free detection idea, and mainly comprises point detection and point matching two branches: firstly, the point detection branch predicts a character central point, an object central point and an interaction central point through a heat map prediction network, and predicts the sizes of a character frame and an object frame through a regression method; and secondly, the point matching branch matches the detected character points, object points and interaction points by predicting the offset from the interaction center point to the character center point and the object center point and combining a matching algorithm to complete character interaction triple detection. In the embodiment, a model pre-trained on a common human interactive behavior detection data set HICO-DET is selected, a universal heat map prediction network DLA-34 is used as a feature extractor, and classification of specific behaviors and human detection frames corresponding to the specific behaviors are obtained.
Step 3.2, based on the above processing, a plurality of behavior retrieval results are obtained for each key frame. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, let the number of specific behavior instance retrieval results be n; the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the character frame position information. The results are saved as the specific behavior instance retrieval result file, whose structure is shown in fig. 6.
Step 4, the identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is taken as the confidence score for judging whether the specific character performs the specific behavior. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, according to the face frame position information ⟨fx_i, fy_i, fw_i, fh_i⟩ and the character frame position information ⟨ax_j, ay_j, aw_j, ah_j⟩ in the specific character and specific behavior instance retrieval result files, a consistency score matrix C ∈ R^{m×n} with C = [c_ij] is calculated by the pairwise matching principle. The calculation is illustrated in fig. 7 and specifically includes the following substeps:
step 4.1, predicting the position of the face in the character frame by using the human body posture estimation model
Figure BDA0002894567350000085
The human body posture model can adopt a human body posture estimation model (HRNet) with better current performance and based on deep high-resolution expression learning, the model gradually increases a low-resolution feature map sub-network in a high-resolution feature map main network, the low-resolution feature map sub-network is connected into a parallel multi-resolution sub-network, and the extraction from high resolution to low resolution features and the multi-scale sub-network are adoptedAnd the degree fusion obtains abundant high-resolution characteristics, and is used for carrying out heat map prediction on key points of human body postures. In an embodiment, a HRNet-W32 network model pre-trained on the COCO data set is selected to predict the position of the face box within the character box for a particular behavior.
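A simplified sketch of step 4.1, deriving ROI_j from the head keypoints of a COCO-trained pose model such as HRNet-W32 (the confidence threshold and padding factor are assumptions):

```python
import numpy as np

HEAD_KPTS = [0, 1, 2, 3, 4]   # COCO keypoint order: nose, eyes, ears

def face_box_from_pose(keypoints, pad=0.25, min_conf=0.3):
    """Predict the face frame (x, y, w, h) inside a character frame from one pose.

    `keypoints` is a (17, 3) array of (x, y, confidence) for a single person.
    """
    head = keypoints[HEAD_KPTS]
    head = head[head[:, 2] > min_conf]            # keep confidently detected points
    if len(head) == 0:
        return None                               # no usable head keypoints
    x0, y0 = head[:, 0].min(), head[:, 1].min()
    x1, y1 = head[:, 0].max(), head[:, 1].max()
    w, h = x1 - x0, y1 - y0
    return (x0 - pad * w, y0 - pad * h, (1 + 2 * pad) * w, (1 + 2 * pad) * h)
```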
Step 4.2, calculate the intersection-over-union (IoU) of the face frame from the specific character retrieval and the predicted face frame from the specific behavior retrieval, and take it as the confidence score that the specific character performs the specific behavior. Let the i-th (i ∈ [1, m]) face frame position ⟨fx_i, fy_i, fw_i, fh_i⟩ be ROI_i, and the face position predicted within the j-th (j ∈ [1, n]) character frame be ROI_j; the intersection-over-union formula (sketched below) is:
c_ij = Area(ROI_i ∩ ROI_j) / Area(ROI_i ∪ ROI_j)
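A direct sketch of this intersection-over-union for (x, y, w, h) boxes:

```python
def iou(roi_i, roi_j):
    """c_ij = Area(ROI_i ∩ ROI_j) / Area(ROI_i ∪ ROI_j)."""
    xi, yi, wi, hi = roi_i
    xj, yj, wj, hj = roi_j
    iw = max(0.0, min(xi + wi, xj + wj) - max(xi, xj))
    ih = max(0.0, min(yi + hi, yj + hj) - max(yi, yj))
    inter = iw * ih
    union = wi * hi + wj * hj - inter
    return inter / union if union > 0 else 0.0
```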
step 5, fusing the search results of the specific character and the specific behavior, aiming at the L-th element [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]Step 3, the retrieval score vector as of the specific behaviorj,j∈[1,n]And step 4, obtaining a consistency check fraction matrix C ═ C of the specific character and the specific behavior retrieval resultij]Respectively distributing alpha and beta as fusion coefficients for the retrieval scores of the specific character and the retrieval scores of the specific behavior during fusion, and calculating a fusion score matrix S belonging to R for the retrieval of the specific character and the specific behaviorm×nWherein S ═ Sij],sijThe calculation formula is as follows:
sij=cij×(α·fsi+β·asj)
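In matrix form, step 5 is a single broadcast expression; a sketch with the fusion coefficients left as free parameters:

```python
import numpy as np

def fuse(fs, as_, C, alpha=0.5, beta=0.5):
    """s_ij = c_ij * (alpha * fs_i + beta * as_j), computed for all (i, j) at once.

    fs: (m,) character scores; as_: (n,) behavior scores; C: (m, n) consistency matrix.
    """
    return C * (alpha * fs[:, None] + beta * as_[None, :])
```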
step 6, gathering the key frame retrieval scores to shot retrieval scores, aiming at the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to aggregate a plurality of frame-level retrieval results for retrieving different numbers of faces and behaviorsObtaining a shot level retrieval result, firstly preprocessing a frame level fusion score matrix: setting the total number of people in the retrieval database as M and the total number of behaviors as N, and expanding the frame-level fusion score matrix to S e.g. R by setting the undetected character or behavior retrieval score in the frame as zeroM×NAnd is recorded as
Figure BDA0002894567350000093
Then, specific behavior j is carried out on a specific character i, the maximum score value of all key frames is taken as the retrieval score of the shot, and the expanded K frame-level fusion score matrixes are converged into a shot-level fusion score matrix Sl∈RM×NWherein
Figure BDA0002894567350000094
The calculation formula is as follows:
Figure BDA0002894567350000095
and finally, sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific characters and the specific behaviors with identity consistency check.
Based on the same inventive concept, the invention also designs a specific character and specific behavior combined retrieval device with identity consistency check, which comprises the following modules:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs a computer readable medium, on which a computer program is stored, wherein the program is executed to implement the above method for jointly searching the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs computer equipment which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and is characterized in that the processor executes the program to realize the joint retrieval method of the specific character and the specific behavior with identity consistency check.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description is for illustrative purposes only and is not intended to limit the scope of the present invention.

Claims (9)

1. A specific character and specific behavior combined retrieval method with identity consistency check is characterized by comprising the following steps:
step 1, performing shot segmentation on an original video sequence to obtain a shot retrieval database, and then performing key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
step 2, searching the specific character example of the key frame level, comprising the following substeps:
step 2.1, detecting a retrieval database face frame and a face frame of an object to be inquired;
step 2.2, extracting the face features of the retrieval database and the face features of the object to be inquired;
step 2.3, calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
step 2.4, saving a specific character instance retrieval result file;
step 3, searching the specific behavior example of the key frame level, comprising the following substeps:
step 3.1, detecting character interaction behaviors of the retrieval database;
step 3.2, saving a retrieval result file of the specific behavior instance;
step 4, identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is used as a confidence score for judging whether the specific behavior is performed by the specific character;
step 5, fusing the specific character and the specific behavior retrieval result of the key frame level, wherein the fused score of the specific character and the specific behavior retrieval is calculated according to the specific character retrieval score obtained in the step 2, the specific behavior retrieval score obtained in the step 3 and the consistency check score of the specific character and the specific behavior retrieval result obtained in the step 4;
step 6, converging the key frame retrieval scores to shot retrieval scores, and sorting the retrieved shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
2. The method for jointly searching the specific character and the specific behavior with the identity consistency check as claimed in claim 1, wherein the step of saving the specific character instance search result file comprises the following steps:
the original video retrieval library contains L shots, and the l-th (l ∈ [1, L]) shot can be divided into K key frames; for the k-th (k ∈ [1, K]) key frame, the number of specific character instance retrieval results is m, and the i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the position information of the face frame.
3. The method for jointly searching the specific character and the specific behavior with identity consistency check according to claim 1, wherein the step of saving the specific behavior instance search result file comprises the following steps:
for the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, the number of specific behavior instance retrieval results is n, and the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the position information of the character frame.
4. The method for jointly searching the specific character and the specific behavior with the identity consistency check according to claim 2 or 3, characterized in that the identity consistency check for the specific character and the specific behavior is performed as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame for searching the face frame position information in the result file according to the specific character and the specific behavior example
Figure FDA0002894567340000022
With character frame position information
Figure FDA0002894567340000023
Calculating identity consistency score matrix C epsilon R by pairwise matching principlem×nWherein C ═ Cij],cijThe calculation method of (2) is, but not limited to, the following method:
1) calculating the overlapping degree of the human face frame and the human figure frame;
2) calculating the overlapping degree of the human face frame and the upper half part of the character frame;
3) predicting the face position in the character frame based on the character skeleton model, and calculating the region overlapping degree of the face frame and the predicted face frame;
4) and predicting the face position in the character frame based on a human body segmentation technology, and calculating the region overlapping degree of the face frame and the predicted face frame.
5. The method for jointly retrieving the specific character and the specific behavior with the identity consistency check according to claim 4, wherein the fusion of the specific character and the specific behavior retrieval result at the key frame level is implemented as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]The retrieval score vector as of the specific behavior obtained in step 3j,j∈[1,n]Step (b)The consistency score matrix C ═ C of the specific character obtained in the step 4 for performing the specific behaviorij]Calculating a fusion score matrix S ∈ Rm×nWherein S ═ Sij],sijThere are but not limited to the following ways:
sij=cij×(α·fsi+β·asj)
wherein α and β are fusion coefficients respectively assigned to the specific character retrieval score and the specific behavior retrieval score at the time of fusion.
6. The method of claim 5, wherein the aggregation of the key frame retrieval score to the shot retrieval score is performed by:
for the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to converge a plurality of frame-level retrieval results of different numbers of faces and behaviors to obtain a shot-level retrieval result, firstly preprocessing a frame-level fusion score matrix: setting the total face number of a retrieval database as M and the total behavior number as N, setting undetected character or behavior retrieval scores in a frame as zero, and setting a frame-level fusion score matrix S belonging to Rm×nExpansion to Sec RM×NAnd is recorded as
Figure FDA0002894567340000031
Then converging the expanded K frame-level fusion score matrixes to a shot-level fusion score matrix Sl∈RM×NWherein
Figure FDA0002894567340000032
The convergence method includes, but is not limited to, the following methods:
1) and (3) carrying out a specific behavior j aiming at a specific character i, taking the maximum value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure FDA0002894567340000033
2) and (3) carrying out a specific behavior j aiming at a specific character i, taking the average value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure FDA0002894567340000034
7. a specific character and specific behavior combined retrieval device with identity consistency check is characterized by comprising:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
8. A computer-readable medium, on which a computer program is stored, characterized in that the program, when executed, implements the method according to any one of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
CN202110051588.3A 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function Active CN112699846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110051588.3A CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110051588.3A CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Publications (2)

Publication Number Publication Date
CN112699846A true CN112699846A (en) 2021-04-23
CN112699846B CN112699846B (en) 2022-06-07

Family

ID=75515155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110051588.3A Active CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Country Status (1)

Country Link
CN (1) CN112699846B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000009221A (en) * 1998-07-22 2000-02-15 정선종 Motion picture searching method using motion information based on joint points
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
US20170201562A1 (en) * 2016-01-12 2017-07-13 Electronics And Telecommunications Research Institute System and method for automatically recreating personal media through fusion of multimodal features
CN107315795A (en) * 2017-06-15 2017-11-03 武汉大学 The instance of video search method and system of joint particular persons and scene
CN109635539A (en) * 2018-10-30 2019-04-16 华为技术有限公司 A kind of face identification method and electronic equipment
CN110781350A (en) * 2019-09-26 2020-02-11 武汉大学 Pedestrian retrieval method and system oriented to full-picture monitoring scene
CN111177436A (en) * 2018-11-09 2020-05-19 浙江宇视科技有限公司 Face feature retrieval method, device and equipment
CN111914622A (en) * 2020-06-16 2020-11-10 北京工业大学 Character interaction detection method based on deep learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LONGXIANG JIANG, JINGYAO YANG: "WHU-NERCMS at TRECVID2019: Instance Search Task", WHU-NERCMS at TRECVID2019 *
Yang Yang, Lan Jiamei et al.: "Joint video instance retrieval of specific persons and scenes: problem and methods", China Sciencepaper *
Wang Bin: "Research on video retrieval based on human pose and behavior", Master's Electronic Journals, Information Science and Technology Series *

Also Published As

Publication number Publication date
CN112699846B (en) 2022-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant