CN112699846A - Specific character and specific behavior combined retrieval method and device with identity consistency check function

Specific character and specific behavior combined retrieval method and device with identity consistency check function

Info

Publication number
CN112699846A
CN112699846A
Authority
CN
China
Prior art keywords
retrieval
specific
character
behavior
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110051588.3A
Other languages
Chinese (zh)
Other versions
CN112699846B (en)
Inventor
梁超
杨晶垚
牛艳蕊
王中元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110051588.3A priority Critical patent/CN112699846B/en
Publication of CN112699846A publication Critical patent/CN112699846A/en
Application granted granted Critical
Publication of CN112699846B publication Critical patent/CN112699846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Abstract

The invention relates to a specific character and specific behavior combined retrieval method and device with identity consistency check. A key frame retrieval database is obtained by performing shot segmentation and key frame extraction on an original video sequence; specific character instance retrieval is carried out; specific behavior instance retrieval is carried out; an identity consistency check between the specific character and specific behavior retrieval results is performed, and the check score is used as the confidence score for judging that the specific character performs the specific behavior; the specific character and specific behavior retrieval results are fused; the key frame retrieval scores are aggregated into shot retrieval scores, and the shots are ranked by retrieval score to obtain the combined retrieval result of the specific character and specific behavior with identity consistency check. By judging the relevance between the character and behavior retrieval results, the proposed identity consistency check effectively solves the problem of identity inconsistency between character retrieval and behavior retrieval, thereby effectively improving the accuracy of combined retrieval of specific character and specific behavior video instances.

Description

Specific character and specific behavior combined retrieval method and device with identity consistency check function
Technical Field
The invention relates to the field of video retrieval, belongs to a video instance retrieval scheme combining a specific character and a specific behavior, and particularly belongs to a specific character and specific behavior video instance combined retrieval method and device with identity consistency check.
Background
Instance retrieval is an important and active research problem in real life, covering the retrieval of specific people, objects, scenes, actions, and so on. Video instance retrieval means that, given a group of query examples, videos containing the query examples are retrieved from a massive video database and a ranked list of retrieval results is returned according to the similarity between the videos and the query examples. Joint retrieval of a specific character and a specific behavior is a special case of video instance retrieval, whose purpose is to retrieve all videos in which the specific character performs the specific behavior. The technology plays an important role in fields such as security and content analysis and auditing. In the security field in particular, when a public security organ investigates suspicious persons in a video surveillance scene, it needs to focus on videos of suspicious persons performing suspicious behaviors; joint retrieval of specific characters and specific behaviors enables criminals to be quickly searched and located in massive surveillance videos, can effectively improve the utilization efficiency of a security system, and is of great significance for improving the emergency response capability of public security organs, accelerating case investigation and pursuit, and protecting people's lives and property.
Joint retrieval of people and behaviors in videos currently faces the following main challenges: the volume of video data is large, and strong video noise caused by factors such as ambient lighting, face angle, body pose variation, and occlusion makes retrieval of characters or behaviors difficult, so character retrieval and behavior retrieval are not always effective at the same time. The current mainstream approach to joint retrieval of a specific character and a specific behavior retrieves the specific character and the specific behavior separately and then fuses the two retrieval results with some fusion strategy to obtain the joint retrieval result. However, this approach has an inherent problem: since the retrieval of the specific character and of the specific behavior are independent of each other, the correlation between them cannot be guaranteed. As shown in fig. 1, in a picture where a specific character and a specific action appear simultaneously, even though the specific character and the action are each determined to appear in the image, there is no guarantee that the current action is performed by the specific character. This directly causes retrieval errors and degrades retrieval accuracy.
Chinese patent document No. CN103714181A, published 2014.04.09, discloses a hierarchical specific character retrieval method: it first computes the feature similarity of global color histograms between the query object and the candidate object as a coarse retrieval result, then extracts local comprehensive salient features after superpixel segmentation, and refines the retrieval result by nearest-neighbor matching between the local salient feature sets of the query example and of the coarse retrieval results, thereby improving the robustness of character feature extraction. Compared with joint retrieval of a specific character and a specific behavior, this method only studies the feature extraction problem of specific character retrieval in surveillance videos, does not involve combining a specific character with a specific behavior, and differs in research angle from a joint retrieval method of a specific character and a specific behavior with identity consistency check.
Chinese patent document No. CN107315795A, published 2017.11.03, discloses a video instance retrieval method and system combining specific characters and scenes: it first performs specific character retrieval and specific scene retrieval separately, proposes strategies based on high-score preservation and on neighbor expansion to preprocess the retrieval results, and then fuses the results to obtain the video instance retrieval result combining specific characters and scenes, thereby improving the reliability and extensibility of the ranking results. Compared with joint retrieval of a specific character and a specific behavior, this method studies joint retrieval of a specific character and a specific scene, a problem different from that of the present patent; moreover, the technique aims to optimize the rankings of the two retrieval results separately and does not guarantee their relevance, so its research angle differs from that of a joint retrieval method of a specific character and a specific behavior with identity consistency check.
Chinese patent document No. CN110516112A, published (announced) No. 2019.11.29, discloses a human body motion retrieval method and apparatus based on a hierarchical model. The human body action retrieval method based on the hierarchical model reserves the main geometric characteristics of the action data by using the coding based on the hierarchical model, converts the retrieval of complex action data into the retrieval of simple digital coding, and improves the retrieval efficiency. Compared with the joint retrieval of the specific character and the specific behavior, the method only researches the retrieval problem of the specific behavior in the video, does not relate to the joint of the specific character and the specific behavior, and has a different research angle from a joint retrieval method of the specific character and the specific behavior with identity consistency check.
Chinese patent document No. CN110674350A, published (announced) No. 2020.01.10, discloses a video character retrieval method, medium, apparatus, and computing device. According to the video character retrieval method based on multi-modal fusion, different modal characteristics in the retrieved video are extracted and fused, the target characters in the video are classified, and the robustness of characteristic information is improved. Compared with the joint retrieval of the specific character and the specific behavior, the method only researches the characteristic extraction problem of the specific character retrieval in the video, does not relate to the joint of the specific character and the specific behavior, and has a different research angle from a joint retrieval method of the specific character and the specific behavior with identity consistency check.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a device that guarantee the relevance of the joint retrieval of a specific character and a specific behavior, so as to effectively improve the accuracy of the joint retrieval of the specific character and the specific behavior.
Based on the above purpose, the invention provides a specific character and specific behavior combined retrieval method with identity consistency check. Unlike traditional behavior retrieval methods that adopt picture-level behavior recognition, the invention performs behavior retrieval with an instance-level character interaction behavior detection method and obtains the character frame corresponding to the specific behavior, which makes an identity consistency check between the character frame from behavior retrieval and the face frame from character retrieval possible. The flow of the method is shown in fig. 2 and its data processing flow in fig. 3; the specific flow comprises the following steps:
step 1, performing shot segmentation on an original video sequence to obtain a shot retrieval database, and then performing key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
step 2, searching the specific character example of the key frame level, which specifically comprises the following substeps:
step 2.1, detecting a retrieval database face frame and a face frame of an object to be inquired;
step 2.2, extracting the face features of the retrieval database and the face features of the object to be inquired;
step 2.3, calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
step 2.4, saving a specific character instance retrieval result file;
step 3, searching the specific behavior example of the key frame level, specifically comprising the following substeps:
step 3.1, detecting character interaction behaviors of the retrieval database;
step 3.2, saving a retrieval result file of the specific behavior instance;
step 4, identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is used as a confidence score for judging whether the specific behavior is performed by the specific character;
step 5, fusing the specific character and the specific behavior retrieval result of the key frame level, wherein the fused score of the specific character and the specific behavior retrieval is calculated according to the specific character retrieval score obtained in the step 2, the specific behavior retrieval score obtained in the step 3 and the consistency check score of the specific character and the specific behavior retrieval result obtained in the step 4;
step 6, converging the key frame retrieval scores to shot retrieval scores, and sorting the retrieved shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Moreover, the method for saving the specific character instance search result file and the specific behavior instance search result file comprises the following steps:
Let the original video retrieval library contain L shots, and let the l-th (l ∈ [1, L]) shot be divided into K key frames. For the k-th (k ∈ [1, K]) key frame, the number of specific character instance retrieval results is m and the number of specific behavior instance retrieval results is n (for convenience of discussion, where no confusion arises, the indices k and l are omitted from all variables).
The i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the position information of the face frame;
similarly, the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the position information of the character frame.
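By way of illustration only, the two record types can be written down as a minimal Python sketch (the field names and types are assumptions for readability; only the six-tuple order is fixed by the description above):

```python
from typing import NamedTuple

class FaceRecord(NamedTuple):
    """One specific character instance result: <f_i, fs_i, fx_i, fy_i, fw_i, fh_i>."""
    f: int      # character category
    fs: float   # category (similarity) score
    fx: float   # face frame top-left x
    fy: float   # face frame top-left y
    fw: float   # face frame width
    fh: float   # face frame height

class BehaviorRecord(NamedTuple):
    """One specific behavior instance result: <a_j, as_j, ax_j, ay_j, aw_j, ah_j>."""
    a: int      # behavior category
    as_: float  # category score ("as" is a Python keyword, hence the underscore)
    ax: float   # character frame top-left x
    ay: float   # character frame top-left y
    aw: float   # character frame width
    ah: float   # character frame height
```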
Moreover, the identity consistency check for the specific person and the specific behavior is performed in the following manner:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame for searching the face frame position information in the result file according to the specific character and the specific behavior example
Figure BDA0002894567350000042
With character frame position information
Figure BDA0002894567350000043
Calculating identity consistency score matrix C epsilon R by pairwise matching principlem×nWherein C ═ Cij],cijThe calculation method of (2) is, but not limited to, the following method:
(1) calculating the overlapping degree of the human face frame and the human figure frame;
(2) calculating the overlapping degree of the human face frame and the upper half part of the character frame;
(3) predicting the face position in the character frame based on the character skeleton model, and calculating the region overlapping degree of the face frame and the predicted face frame;
(4) and predicting the face position in the character frame based on a human body segmentation technology, and calculating the region overlapping degree of the face frame and the predicted face frame.
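As an illustration of method (2), a minimal sketch assuming ⟨x, y, w, h⟩ boxes with a top-left origin (normalizing by the face frame area is an assumption; the description does not fix the overlap measure):

```python
def upper_half_overlap(face, person):
    """c_ij variant (2): overlap of the face frame with the upper half of the character frame.

    Both boxes are (x, y, w, h); the score is the intersection area normalized
    by the face frame area, so it lies in [0, 1].
    """
    fx, fy, fw, fh = face
    px, py, pw, ph = person
    ph /= 2.0                                            # keep only the upper half
    iw = max(0.0, min(fx + fw, px + pw) - max(fx, px))   # intersection width
    ih = max(0.0, min(fy + fh, py + ph) - max(fy, py))   # intersection height
    return (iw * ih) / (fw * fh) if fw * fh > 0 else 0.0
```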
Moreover, the specific character and the specific behavior retrieval result at the key frame level are fused, and the implementation mode is as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]The retrieval score vector as of the specific behavior obtained in step 3j,j∈[1,n]And step 4, the consistency score matrix C of the specific character obtained in the step 4 for performing the specific behavior is [ C ═ Cij]Calculating a fusion score matrix S ∈ Rm×nWherein S ═ Sij],sijThere are but not limited to the following ways:
sij=cij×(α·fsi+β·asj)
wherein α and β are fusion coefficients respectively assigned to the specific character retrieval score and the specific behavior retrieval score at the time of fusion.
Moreover, the gathering of the key frame retrieval score to the shot retrieval score is performed in the following manner:
for the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to converge a plurality of frame-level retrieval results of different numbers of faces and behaviors to obtain a shot-level retrieval result, firstly preprocessing a frame-level fusion score matrix: setting the total face number of a retrieval database as M and the total behavior number as N, setting undetected character or behavior retrieval scores in a frame as zero, and setting a frame-level fusion score matrix S belonging to Rm×nExpansion to Sec RM×NAnd is recorded as
Figure BDA0002894567350000044
Then converging the expanded K frame-level fusion score matrixes to form a lens-level fusion score matrix Sl∈RM×NWherein
Figure BDA0002894567350000045
The convergence method includes, but is not limited to, the following methods:
(1) and (3) carrying out a specific behavior j aiming at a specific character i, taking the maximum value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure BDA0002894567350000051
(2) and (3) carrying out a specific behavior j aiming at a specific character i, taking the average value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure BDA0002894567350000052
based on the same inventive concept, the invention also designs a specific character and specific behavior combined retrieval device with identity consistency check, which is characterized by comprising the following steps:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs a computer readable medium, on which a computer program is stored, wherein the program is executed to implement the above method for jointly searching the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs computer equipment which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and is characterized in that the processor executes the program to realize the joint retrieval method of the specific character and the specific behavior with identity consistency check.
Compared with the existing joint retrieval technology of the specific character and the specific behavior video instance, the method mainly has the following advantages:
(1) compared with the prior art, the method has the advantages that instead of adopting a picture-level behavior recognition method for behavior retrieval, the method adopts an example-level character interaction behavior detection method for behavior retrieval to obtain the character frame corresponding to the specific behavior, so that identity consistency check between the character frame in the behavior retrieval and the face frame in the character retrieval is possible;
(2) compared with the prior art, the method has the advantages that the method replaces the direct fusion of the specific character and the specific behavior retrieval result, the identity consistency check method is provided for judging the relevance between the specific character and the specific behavior retrieval result, the problem of inconsistency between the character and the behavior retrieval identity is effectively solved, and the combined retrieval accuracy of the specific character and the specific behavior video instance is effectively improved.
Drawings
FIG. 1 is a diagram illustrating the problem under investigation according to the present invention.
FIG. 2 is a flow chart of the method of the present invention.
FIG. 3 is a data processing flow diagram of the method of the present invention.
FIG. 4 is a flow chart of an embodiment of the present invention.
FIG. 5 is the file structure of the specific character instance retrieval result file of the present invention.
FIG. 6 is the file structure of the specific behavior instance retrieval result file of the present invention.
FIG. 7 is a schematic diagram of the identity consistency check method according to an embodiment of the present invention.
Detailed Description
For the purpose of promoting a better understanding of the invention and for enabling those skilled in the art to practice the same, reference will now be made in detail to the embodiments of the invention as illustrated in the accompanying drawings.
The invention provides a specific character and specific behavior combined retrieval method with identity consistency check, aiming at the problem that the existing approach in the joint retrieval task — retrieving the specific character and the specific behavior separately and then fusing the results — cannot guarantee the relevance between the specific character and the specific behavior. The method first performs specific character retrieval to obtain the face frame corresponding to the specific character; it then performs behavior retrieval with an instance-level character interaction behavior detection method to obtain the character frame corresponding to the specific behavior; next, an identity consistency check is performed on the positions of the character frame from the behavior retrieval and the face frame from the character retrieval; finally, the joint retrieval score is calculated from the character retrieval score, the behavior retrieval score, and the consistency check score, and the combined retrieval result of the specific character and specific behavior video instances is obtained from the joint retrieval score. In specific implementation, referring to fig. 4, the process includes the following steps:
Step 1, shot segmentation is performed on the original video sequence to obtain a shot retrieval database, and key frames are then extracted from each shot to obtain the preprocessed key frame retrieval database. The quantity mapping between the original shot database and the key frame retrieval database is as follows: each shot can be divided into several key frames, whose number is determined by the shot length; the original video retrieval library contains L shots, and the l-th (l ∈ [1, L]) shot can be divided into K key frames.
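The patent does not prescribe a particular segmentation tool; one possible preprocessing sketch uses PySceneDetect for shot segmentation and a fixed sampling stride for key frames (both the library choice and the stride are assumptions):

```python
from scenedetect import detect, ContentDetector  # pip install scenedetect[opencv]

def build_keyframe_index(video_path, stride=25):
    """Split the video into shots, then sample key frames every `stride` frames."""
    shots = detect(video_path, ContentDetector())        # [(start, end), ...] timecodes
    index = []
    for l, (start, end) in enumerate(shots):
        first, last = start.get_frames(), end.get_frames()
        index.append({"shot": l,
                      "keyframes": list(range(first, last, stride))})  # K grows with shot length
    return index
```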
step 2, searching the specific character example of the key frame level, which specifically comprises the following substeps:
and 2.1, detecting a face frame of the retrieval database and a face frame of an object to be inquired, wherein the face frame can be detected by adopting a currently commonly applied multi-task convolutional neural network face detection Model (MTCNN), and the model carries out coarse-to-fine processing on a face detection task through a three-order cascaded convolutional neural network to obtain the filtered face frame and the position of a face key point. Firstly, in order to adapt to the face detection of different sizes, the images are transformed in different scales to construct an image pyramid; secondly, performing primary feature extraction and face frame calibration on the image pyramid through 3 convolution layers and 1 face classifier, and performing face frame filtering through frame regression and a face key point positioner to form a primary face region; then, further screening the preliminary face area by adding a network of a full connection layer, and outputting a more credible face area; and finally, inputting the obtained face region into a network added with a convolution layer to identify the face region, and obtaining the positions of a face frame and a key point through frame regression and feature positioning. The MTCNN model, pre-trained on a large scale face detection data set WIDER FACE with high variability in scale, pose, occlusion, and illumination, is used for face detection with greater robustness. In the embodiment, the face candidate region with a larger error probability and a height smaller than 60 pixels is filtered, and the filtered face is subjected to similarity transformation to obtain an aligned face. The detection model returns the face frame position in each image, and a plurality of face frame positions can be detected from one image containing a plurality of faces.
Step 2.2, extract the face features of the retrieval database and of the object to be queried: face recognition feature extraction is performed on each detected face frame. Face recognition may adopt ArcFace, a high-performing deep face recognition model based on additive angular margin loss, which normalizes the feature vectors and adds an angular margin so as to improve intra-class compactness and inter-class separation. In the embodiment, a model pre-trained on the face recognition data set MS1MV2 with a ResNet100 backbone network is selected, keeping only the feature embedding layer without the fully connected layer; 512-dimensional features are extracted for each normalized face image and for its horizontally flipped counterpart, and the two are concatenated into a 1024-dimensional face feature representing the face.
Step 2.3, calculate similarity scores between the face features of the retrieval database and those of the object to be queried, and rank them to obtain the specific character retrieval list. Based on the extracted face features, feature similarity is computed with a cosine-distance similarity measure, and the similarity scores are then sorted into a retrieval list.
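On L2-normalized features, cosine similarity reduces to a dot product; a sketch of the ranking step:

```python
import numpy as np

def rank_by_cosine(db_feats, query_feat):
    """Rank database faces by cosine similarity to the query (e.g. 1024-d features)."""
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    q = query_feat / np.linalg.norm(query_feat)
    scores = db @ q                  # one cosine similarity per database face
    order = np.argsort(-scores)      # descending scores form the retrieval list
    return order, scores[order]
```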
Step 2.4, based on the above processing, a plurality of character retrieval results are obtained for each key frame. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, let the number of specific character instance retrieval results be m; the i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the face frame position information. The results are saved as the specific character instance retrieval result file, whose structure is shown in fig. 5.
Step 3, searching the specific behavior example of the key frame level, specifically comprising the following substeps:
and 3.1, detecting character interaction behaviors of each picture in the key frame retrieval database, wherein the behavior detection model can adopt a real-time character interaction detection model (PPDM) with better current performance based on parallel point detection and matching. The model is a single-stage detection framework based on an anchor-frame-free detection idea, and mainly comprises point detection and point matching two branches: firstly, the point detection branch predicts a character central point, an object central point and an interaction central point through a heat map prediction network, and predicts the sizes of a character frame and an object frame through a regression method; and secondly, the point matching branch matches the detected character points, object points and interaction points by predicting the offset from the interaction center point to the character center point and the object center point and combining a matching algorithm to complete character interaction triple detection. In the embodiment, a model pre-trained on a common human interactive behavior detection data set HICO-DET is selected, a universal heat map prediction network DLA-34 is used as a feature extractor, and classification of specific behaviors and human detection frames corresponding to the specific behaviors are obtained.
Step 3.2, based on the above processing, a plurality of behavior retrieval results are obtained for each key frame. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, let the number of specific behavior instance retrieval results be n; the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the character frame position information. The results are saved as the specific behavior instance retrieval result file, whose structure is shown in fig. 6.
Step 4, the identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is taken as the confidence score for judging whether the specific character performs the specific behavior. For the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, according to the face frame position information ⟨fx_i, fy_i, fw_i, fh_i⟩ and the character frame position information ⟨ax_j, ay_j, aw_j, ah_j⟩ in the specific character and specific behavior instance retrieval result files, a consistency score matrix C ∈ R^{m×n} with C = [c_ij] is calculated by the pairwise matching principle. The calculation is illustrated in fig. 7 and specifically includes the following substeps:
step 4.1, predicting the position of the face in the character frame by using the human body posture estimation model
Figure BDA0002894567350000085
The human body posture model can adopt a human body posture estimation model (HRNet) with better current performance and based on deep high-resolution expression learning, the model gradually increases a low-resolution feature map sub-network in a high-resolution feature map main network, the low-resolution feature map sub-network is connected into a parallel multi-resolution sub-network, and the extraction from high resolution to low resolution features and the multi-scale sub-network are adoptedAnd the degree fusion obtains abundant high-resolution characteristics, and is used for carrying out heat map prediction on key points of human body postures. In an embodiment, a HRNet-W32 network model pre-trained on the COCO data set is selected to predict the position of the face box within the character box for a particular behavior.
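A simplified sketch of step 4.1, deriving ROI_j from the head keypoints of a COCO-trained pose model such as HRNet-W32 (the confidence threshold and padding factor are assumptions):

```python
import numpy as np

HEAD_KPTS = [0, 1, 2, 3, 4]   # COCO keypoint order: nose, eyes, ears

def face_box_from_pose(keypoints, pad=0.25, min_conf=0.3):
    """Predict the face frame (x, y, w, h) inside a character frame from one pose.

    `keypoints` is a (17, 3) array of (x, y, confidence) for a single person.
    """
    head = keypoints[HEAD_KPTS]
    head = head[head[:, 2] > min_conf]            # keep confidently detected points
    if len(head) == 0:
        return None                               # no usable head keypoints
    x0, y0 = head[:, 0].min(), head[:, 1].min()
    x1, y1 = head[:, 0].max(), head[:, 1].max()
    w, h = x1 - x0, y1 - y0
    return (x0 - pad * w, y0 - pad * h, (1 + 2 * pad) * w, (1 + 2 * pad) * h)
```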
Step 4.2, calculate the intersection-over-union (IoU) of the face frame from the specific character retrieval and the predicted face frame from the specific behavior retrieval, and take it as the confidence score that the specific character performs the specific behavior. Let the i-th (i ∈ [1, m]) face frame position ⟨fx_i, fy_i, fw_i, fh_i⟩ be ROI_i, and the face position predicted within the j-th (j ∈ [1, n]) character frame be ROI_j; the intersection-over-union formula (sketched below) is:
c_ij = Area(ROI_i ∩ ROI_j) / Area(ROI_i ∪ ROI_j)
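A direct sketch of this intersection-over-union for (x, y, w, h) boxes:

```python
def iou(roi_i, roi_j):
    """c_ij = Area(ROI_i ∩ ROI_j) / Area(ROI_i ∪ ROI_j)."""
    xi, yi, wi, hi = roi_i
    xj, yj, wj, hj = roi_j
    iw = max(0.0, min(xi + wi, xj + wj) - max(xi, xj))
    ih = max(0.0, min(yi + hi, yj + hj) - max(yi, yj))
    inter = iw * ih
    union = wi * hi + wj * hj - inter
    return inter / union if union > 0 else 0.0
```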
step 5, fusing the search results of the specific character and the specific behavior, aiming at the L-th element [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]Step 3, the retrieval score vector as of the specific behaviorj,j∈[1,n]And step 4, obtaining a consistency check fraction matrix C ═ C of the specific character and the specific behavior retrieval resultij]Respectively distributing alpha and beta as fusion coefficients for the retrieval scores of the specific character and the retrieval scores of the specific behavior during fusion, and calculating a fusion score matrix S belonging to R for the retrieval of the specific character and the specific behaviorm×nWherein S ═ Sij],sijThe calculation formula is as follows:
sij=cij×(α·fsi+β·asj)
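In matrix form, step 5 is a single broadcast expression; a sketch with the fusion coefficients left as free parameters:

```python
import numpy as np

def fuse(fs, as_, C, alpha=0.5, beta=0.5):
    """s_ij = c_ij * (alpha * fs_i + beta * as_j), computed for all (i, j) at once.

    fs: (m,) character scores; as_: (n,) behavior scores; C: (m, n) consistency matrix.
    """
    return C * (alpha * fs[:, None] + beta * as_[None, :])
```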
step 6, gathering the key frame retrieval scores to shot retrieval scores, aiming at the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to aggregate a plurality of frame-level retrieval results for retrieving different numbers of faces and behaviorsObtaining a shot level retrieval result, firstly preprocessing a frame level fusion score matrix: setting the total number of people in the retrieval database as M and the total number of behaviors as N, and expanding the frame-level fusion score matrix to S e.g. R by setting the undetected character or behavior retrieval score in the frame as zeroM×NAnd is recorded as
Figure BDA0002894567350000093
Then, specific behavior j is carried out on a specific character i, the maximum score value of all key frames is taken as the retrieval score of the shot, and the expanded K frame-level fusion score matrixes are converged into a shot-level fusion score matrix Sl∈RM×NWherein
Figure BDA0002894567350000094
The calculation formula is as follows:
Figure BDA0002894567350000095
and finally, sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific characters and the specific behaviors with identity consistency check.
Based on the same inventive concept, the invention also designs a specific character and specific behavior combined retrieval device with identity consistency check, which comprises the following modules:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs a computer readable medium, on which a computer program is stored, wherein the program is executed to implement the above method for jointly searching the specific character and the specific behavior with identity consistency check.
Based on the same inventive concept, the invention also designs computer equipment which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and is characterized in that the processor executes the program to realize the joint retrieval method of the specific character and the specific behavior with identity consistency check.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description is for illustrative purposes only and is not intended to limit the scope of the present invention.

Claims (9)

1. A specific character and specific behavior combined retrieval method with identity consistency check is characterized by comprising the following steps:
step 1, performing shot segmentation on an original video sequence to obtain a shot retrieval database, and then performing key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
step 2, searching the specific character example of the key frame level, comprising the following substeps:
step 2.1, detecting a retrieval database face frame and a face frame of an object to be inquired;
step 2.2, extracting the face features of the retrieval database and the face features of the object to be inquired;
step 2.3, calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
step 2.4, saving a specific character instance retrieval result file;
step 3, searching the specific behavior example of the key frame level, comprising the following substeps:
step 3.1, detecting character interaction behaviors of the retrieval database;
step 3.2, saving a retrieval result file of the specific behavior instance;
step 4, identity consistency check of the specific character and specific behavior retrieval is carried out, and the consistency check score is used as a confidence score for judging whether the specific behavior is performed by the specific character;
step 5, fusing the specific character and the specific behavior retrieval result of the key frame level, wherein the fused score of the specific character and the specific behavior retrieval is calculated according to the specific character retrieval score obtained in the step 2, the specific behavior retrieval score obtained in the step 3 and the consistency check score of the specific character and the specific behavior retrieval result obtained in the step 4;
step 6, converging the key frame retrieval scores to shot retrieval scores, and sorting the retrieved shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
2. The method for jointly searching the specific character and the specific behavior with the identity consistency check as claimed in claim 1, wherein the step of saving the specific character instance search result file comprises the following steps:
the original video retrieval library contains L shots, and the l-th (l ∈ [1, L]) shot can be divided into K key frames; for the k-th (k ∈ [1, K]) key frame, the number of specific character instance retrieval results is m, and the i-th (i ∈ [1, m]) specific character instance retrieval result is sorted and stored in the following six-tuple form:
⟨f_i, fs_i, fx_i, fy_i, fw_i, fh_i⟩
where f_i denotes the character category, fs_i denotes the category score, and ⟨fx_i, fy_i, fw_i, fh_i⟩ denotes the position information of the face frame.
3. The method for jointly searching the specific character and the specific behavior with identity consistency check according to claim 1, wherein the step of saving the specific behavior instance search result file comprises the following steps:
for the k-th (k ∈ [1, K]) key frame of the l-th (l ∈ [1, L]) shot, the number of specific behavior instance retrieval results is n, and the j-th (j ∈ [1, n]) specific behavior instance retrieval result is sorted and stored in the following six-tuple form:
⟨a_j, as_j, ax_j, ay_j, aw_j, ah_j⟩
where a_j denotes the behavior category, as_j denotes the category score, and ⟨ax_j, ay_j, aw_j, ah_j⟩ denotes the position information of the character frame.
4. The method for jointly searching the specific character and the specific behavior with the identity consistency check according to claim 2 or 3, characterized in that the identity consistency check for the specific character and the specific behavior is performed as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame for searching the face frame position information in the result file according to the specific character and the specific behavior example
Figure FDA0002894567340000022
With character frame position information
Figure FDA0002894567340000023
Calculating identity consistency score matrix C epsilon R by pairwise matching principlem×nWherein C ═ Cij],cijThe calculation method of (2) is, but not limited to, the following method:
1) calculating the overlapping degree of the human face frame and the human figure frame;
2) calculating the overlapping degree of the human face frame and the upper half part of the character frame;
3) predicting the face position in the character frame based on the character skeleton model, and calculating the region overlapping degree of the face frame and the predicted face frame;
4) and predicting the face position in the character frame based on a human body segmentation technology, and calculating the region overlapping degree of the face frame and the predicted face frame.
5. The method for jointly retrieving the specific character and the specific behavior with the identity consistency check according to claim 4, wherein the fusion of the specific character and the specific behavior retrieval result at the key frame level is implemented as follows:
for the L ∈ [1, L ]]The K ∈ [1, K ] of lens]A key frame, searching the fractional vector fs according to the specific character obtained in step 2i,i∈[1,m]The retrieval score vector as of the specific behavior obtained in step 3j,j∈[1,n]Step (b)The consistency score matrix C ═ C of the specific character obtained in the step 4 for performing the specific behaviorij]Calculating a fusion score matrix S ∈ Rm×nWherein S ═ Sij],sijThere are but not limited to the following ways:
sij=cij×(α·fsi+β·asj)
wherein α and β are fusion coefficients respectively assigned to the specific character retrieval score and the specific behavior retrieval score at the time of fusion.
6. The method of claim 5, wherein the aggregation of the key frame retrieval score to the shot retrieval score is performed by:
for the L ∈ [1, L ]]Shot, according to step 5, whose K ∈ [1, K ]]The retrieval score of each key frame is a fusion score matrix S belonging to Rm×nIn order to converge a plurality of frame-level retrieval results of different numbers of faces and behaviors to obtain a shot-level retrieval result, firstly preprocessing a frame-level fusion score matrix: setting the total face number of a retrieval database as M and the total behavior number as N, setting undetected character or behavior retrieval scores in a frame as zero, and setting a frame-level fusion score matrix S belonging to Rm×nExpansion to Sec RM×NAnd is recorded as
Figure FDA0002894567340000031
Then converging the expanded K frame-level fusion score matrixes to a shot-level fusion score matrix Sl∈RM×NWherein
Figure FDA0002894567340000032
The convergence method includes, but is not limited to, the following methods:
1) and (3) carrying out a specific behavior j aiming at a specific character i, taking the maximum value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure FDA0002894567340000033
2) and (3) carrying out a specific behavior j aiming at a specific character i, taking the average value of the scores of all key frames as the retrieval score of the shot, wherein the formula is as follows:
Figure FDA0002894567340000034
7. a specific character and specific behavior combined retrieval device with identity consistency check is characterized by comprising:
the data preprocessing module is used for carrying out shot segmentation on the original video sequence to obtain a shot retrieval database, and then carrying out key frame extraction on the shot to obtain a preprocessed key frame retrieval database;
the specific character instance retrieval module is used for retrieving the specific character instance at the key frame level and comprises the following sub-steps:
detecting a retrieval database face frame and a face frame of an object to be inquired;
extracting face features of a retrieval database and face features of an object to be inquired;
calculating similarity scores of the face features of the retrieval database and the face features of the object to be queried;
saving a specific character instance retrieval result file;
the specific behavior instance retrieval module is used for retrieving the specific behavior instance of the key frame level and comprises the following sub-steps:
detecting the character interaction behavior of the retrieval database;
saving a specific behavior instance retrieval result file;
the identity consistency verification module is used for verifying identity consistency of the specific character and the specific behavior and taking a consistency verification score as a confidence score for judging whether the specific behavior is performed on the specific character or not; performing specific figure and specific behavior retrieval result fusion of a key frame level, wherein the fusion score of the specific figure and the specific behavior retrieval is calculated according to the obtained specific figure retrieval score, the specific behavior retrieval score and the consistency check score of the specific figure and the specific behavior retrieval result; and converging the key frame retrieval scores to shot retrieval scores, and sequencing the retrieval shots according to the shot retrieval scores to obtain a combined retrieval result of the specific character and the specific behavior with identity consistency check.
8. A computer-readable medium, on which a computer program is stored, characterized in that the program, when executed, implements the method according to any one of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
CN202110051588.3A 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function Active CN112699846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110051588.3A CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110051588.3A CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Publications (2)

Publication Number Publication Date
CN112699846A true CN112699846A (en) 2021-04-23
CN112699846B CN112699846B (en) 2022-06-07

Family

ID=75515155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110051588.3A Active CN112699846B (en) 2021-01-12 2021-01-12 Specific character and specific behavior combined retrieval method and device with identity consistency check function

Country Status (1)

Country Link
CN (1) CN112699846B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000009221A (en) * 1998-07-22 2000-02-15 정선종 Motion picture searching method using motion information based on joint points
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
US20170201562A1 (en) * 2016-01-12 2017-07-13 Electronics And Telecommunications Research Institute System and method for automatically recreating personal media through fusion of multimodal features
CN107315795A (en) * 2017-06-15 2017-11-03 武汉大学 The instance of video search method and system of joint particular persons and scene
CN109635539A (en) * 2018-10-30 2019-04-16 华为技术有限公司 A kind of face identification method and electronic equipment
CN110781350A (en) * 2019-09-26 2020-02-11 武汉大学 Pedestrian retrieval method and system oriented to full-picture monitoring scene
CN111177436A (en) * 2018-11-09 2020-05-19 浙江宇视科技有限公司 Face feature retrieval method, device and equipment
CN111914622A (en) * 2020-06-16 2020-11-10 北京工业大学 Character interaction detection method based on deep learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LONGXIANG JIANG, JINGYAO YANG: "WHU-NERCMS at TRECVID2019: Instance Search Task", WHU-NERCMS at TRECVID2019 *
Yang Yang, Lan Jiamei et al.: "Joint video instance retrieval of specific persons and scenes: problem and methods", China Sciencepaper *
Wang Bin: "Research on video retrieval based on human pose and behavior", Master's Electronic Journals, Information Science and Technology Series *

Also Published As

Publication number Publication date
CN112699846B (en) 2022-06-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant