CN107315795B - Video instance retrieval method and system combining a specific person and a scene - Google Patents
Video instance retrieval method and system combining a specific person and a scene Download PDF Info
- Publication number
- CN107315795B (application CN201710454025.2A)
- Authority
- CN
- China
- Prior art keywords
- shot
- scene
- retrieval
- search result
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a video instance retrieval method and system combining a specific person and a scene. The method comprises: performing instance retrieval of a specific person in video; performing specific-scene retrieval based on the joint optimization of local and global features; realizing video instance retrieval based on high-score retention; realizing video instance retrieval based on neighbour extension; and fusing the person and scene retrieval results: for each shot, the initial scene result is fused with the neighbour-extended person result, the initial person result is fused with the neighbour-extended scene result, and the maximum of the two fused scores gives the final shot ranking of the video instance retrieval. The ranking produced by the invention is more reliable, and the scheme is highly extensible and widely applicable.
Description
Technical field
The invention belongs to the technical field of video retrieval, and relates to a video instance retrieval scheme, in particular to a video instance retrieval method and system combining a specific person and a scene.
Background art
In video analysis and retrieval evaluations, video instance retrieval means the following: given query samples (a set of video clips or images) and a video library, retrieve from the library all video clips (shots) in which the query sample appears, ranked by their similarity to the query. The query sample may consist of several images of a specific target — a particular person, vehicle or object — taken in different scenes, and sometimes a video clip containing the target is also provided. Video instance retrieval combining a specific person and a scene means retrieving, from massive video data, the clips in which a particular person appears in a particular scene. This technique helps police officers exclude irrelevant targets in massive surveillance video and concentrate their observation and analysis on key targets and suspects, greatly improving browsing efficiency of massive surveillance video; it is therefore of great significance for improving the emergency-response and integrated crime-prevention capabilities of public-security departments and for protecting people's lives and property.
The challenges currently facing this technique come mainly from three aspects: first, the volume of video is huge and noisy, so finding the target to be retrieved is far from easy; second, the person being retrieved may wear different clothes, change posture, or appear from different viewing angles; third, scenes suffer from large illumination changes and severe occlusion. Existing methods of this kind generally first retrieve the person and the scene separately, and then fuse the two result lists (late fusion) to obtain the joint person-and-scene result. Person and scene results are usually expressed as scores: the higher the score, the more likely the corresponding shot contains the query sample. Under such fusion, the person score and scene score of the same shot may be added or multiplied. However, even for a correct target shot, the corresponding person score or scene score is not necessarily high.
Chinese patent document CN105678250A, published 2016.06.15, discloses a method and device for face recognition in video. It uses a dynamic recognition method, exploiting the temporal correlation between the frames of a video so that the information of individual frames complements each other, thereby improving face-recognition accuracy. Although it belongs to the field of video retrieval, it only performs person retrieval, not scene retrieval, so its research angle differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN106022313A, published 2016.10.12, discloses a face-recognition method that adapts automatically to the scene. It uses a convolutional neural network model for compensation and, compared with traditional manual operation, is more automatic; it does not involve scene retrieval, so its research angle likewise differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN104794219A, published 2015.07.22, discloses a scene retrieval method based on geographic location information. It indexes scene images by their geographic information and global descriptors, filtering out a large number of irrelevant images and improving the efficiency of visual-vocabulary spatial verification and the accuracy of image matching. It only retrieves scenes, not persons, so its research angle differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN104820711A, published 2015.08.05, discloses a method for retrieving human-shaped targets in video under complex scenes. It searches image similarity through continuous online adjustment and online model updating, generating a new round of results after each update, and allows human-computer interaction to update the machine-vision recognition model library until a satisfactory result is obtained. Its results are not generated automatically, so its retrieval performance still has room for improvement; it also differs from our video instance retrieval method combining a specific person and a scene, which obtains the final result by fusing the person and scene retrieval results.
Chinese patent document CN104517104A, published 2015.04.15, discloses a face-recognition method and system for surveillance scenes. By fusing Gabor features and multi-scale RILPQ features at the score level, it reduces the influence of uneven facial illumination, rotation and image blur on face recognition, effectively improving the recognition rate under surveillance scenes. However, it is not applied to scenes other than surveillance, whereas a video instance retrieval method combining a specific person and a scene is applicable to many scenes; this method therefore still has room for optimization.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a video instance retrieval scheme combining a specific person and a scene: the initial retrieval results are refined by the ranking optimizations of high-score retention and neighbour extension, and the refined lists are then fused to obtain the final ranking, thereby raising the accuracy of retrieving a specific person appearing in a specific scene.
The technical solution adopted by the invention is a video instance retrieval method combining a specific person and a scene, comprising the following steps:
Step 1, instance retrieval of a specific person in video: for a query person p, output the similarity score between p and every shot in the query video library, giving the ranking of the specific-person retrieval as the initial person retrieval result.
Step 2, instance retrieval of a specific scene in video: for a query scene s, perform retrieval through the following sub-steps:
Step 2.1, perform specific-target retrieval based on local features;
Step 2.2, perform specific-scene retrieval based on global features;
Step 2.3, realize specific-scene retrieval with joint local and global optimization: re-rank the shots by interleaving the local-feature target retrieval results with the global-feature scene retrieval results, giving the final scene ranking as the initial scene retrieval result.
Step 3, realize video instance retrieval based on high-score retention: remove the low-ranked results from the person ranking of step 1 and the scene ranking of step 2, giving the denoised person retrieval result and the denoised scene retrieval result.
Step 4, realize video instance retrieval based on neighbour extension: optimize the results of step 3 by neighbour extension, giving the neighbour-extended person retrieval result and the neighbour-extended scene retrieval result.
Step 5, fuse the person and scene retrieval results: for each shot, fuse the initial scene result with the neighbour-extended person result, fuse the initial person result with the neighbour-extended scene result, and take the maximum of the two fused scores as the shot's final score, giving the shot ranking of the video instance retrieval.
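The fusion of step 5 can be sketched as follows; this is a minimal illustration in Python/NumPy, where the additive `fuse` is one of the simple choices mentioned in the background (addition or multiplication), and all function and variable names are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def fuse(a, b):
    # Late fusion of two per-shot score vectors; addition is one simple choice.
    return a + b

def final_ranking(person_init, scene_init, person_nn, scene_nn):
    """Step 5: fuse the initial scene scores with the neighbour-extended
    person scores, fuse the initial person scores with the neighbour-extended
    scene scores, then take the element-wise maximum of the two fusions."""
    fused1 = fuse(scene_init, person_nn)
    fused2 = fuse(person_init, scene_nn)
    final = np.maximum(fused1, fused2)
    order = np.argsort(-final)   # shots ranked by descending final score
    return final, order
```

With toy scores for two shots, `final_ranking(np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.7, 0.2]), np.array([0.3, 0.9]))` yields final scores `[1.2, 1.0]`, so shot 0 ranks first.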
Moreover, the specific-target retrieval based on local features proceeds as follows: for a query scene s with several corresponding query pictures, extract the BOW feature of every target region in every query picture, and extract the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot; for each shot, take the maximum similarity over all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
Moreover, the specific-scene retrieval based on global features proceeds as follows: for a query scene s with several corresponding query pictures, extract the CNN feature of every query picture and of every keyframe in all shots of the query video library; according to the CNN features, for each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot; for each shot, take the maximum similarity over all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Moreover, the optimization based on neighbour extension is realized as follows.
Let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbour adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved. e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(·) is a Gaussian sequence and R(·) is a rectangular window sequence.
After the score adjustment based on the Gauss model, each shot n obtains the scores e(n + τ, n), …, e(n + 1, n), e(n, n), …, e(n − τ, n), and the highest adjusted score is selected to represent the shot's adjusted score.
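The Gauss-model adjustment above can be sketched as below. This is a minimal illustration assuming a truncated Gaussian window of half-width τ (the rectangular window R is realised by the loop bounds); the parameter values and the function name are illustrative assumptions, not from the patent:

```python
import numpy as np

def gaussian_neighbour_adjust(f, tau=2, sigma=1.0):
    """Each shot n collects the scores e(i, n) = f(i) * g(n - i) from its
    neighbours i in [n - tau, n + tau] (rectangular window), where g is a
    Gaussian, and keeps the best one. A low-scoring shot flanked by
    high-scoring neighbours is thus pulled upward."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    adjusted = np.empty(N)
    for n in range(N):
        scores = []
        for i in range(max(0, n - tau), min(N, n + tau + 1)):
            g = np.exp(-((n - i) ** 2) / (2.0 * sigma ** 2))
            scores.append(f[i] * g)
        adjusted[n] = max(scores)   # best adjusted score represents the shot
    return adjusted
```

Since the i = n term contributes f(n)·g(0) = f(n), no shot's score can decrease; e.g. with `f = [1.0, 0.0, 1.0]` and `tau = 1`, the middle shot rises from 0 to exp(−0.5) ≈ 0.61.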
The present invention correspondingly provides a video instance retrieval system combining a specific person and a scene, comprising the following modules:
a person retrieval module, for instance retrieval of a specific person in video: for a query person p, it outputs the similarity score between p and every shot in the query video library, giving the ranking of the specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for instance retrieval of a specific scene in video: for a query scene s, it performs retrieval through the following units:
a local retrieval unit, for specific-target retrieval based on local features;
a global retrieval unit, for specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval with joint local and global optimization: it re-ranks the shots by interleaving the local-feature target retrieval results with the global-feature scene retrieval results, giving the final scene ranking as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention: it removes the low-ranked results from the person ranking produced by the person retrieval module and the scene ranking produced by the scene retrieval module, giving the denoised person retrieval result and the denoised scene retrieval result;
a neighbour optimization module, for realizing video instance retrieval based on neighbour extension: it optimizes the results of the preliminary optimization module by neighbour extension, giving the neighbour-extended person retrieval result and the neighbour-extended scene retrieval result;
a fusion optimization module, for fusing the person and scene retrieval results: for each shot, it fuses the initial scene result with the neighbour-extended person result, fuses the initial person result with the neighbour-extended scene result, and takes the maximum of the two fused scores, giving the shot ranking of the video instance retrieval.
Moreover, the specific-target retrieval based on local features proceeds as follows: for a query scene s with several corresponding query pictures, extract the BOW feature of every target region in every query picture, and extract the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot; for each shot, take the maximum similarity over all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
Moreover, the specific-scene retrieval based on global features proceeds as follows: for a query scene s with several corresponding query pictures, extract the CNN feature of every query picture and of every keyframe in all shots of the query video library; according to the CNN features, for each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot; for each shot, take the maximum similarity over all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Moreover, the optimization based on neighbour extension is realized as follows.
Let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbour adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved. e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(·) is a Gaussian sequence and R(·) is a rectangular window sequence.
After the score adjustment based on the Gauss model, each shot n obtains the scores e(n + τ, n), …, e(n + 1, n), e(n, n), …, e(n − τ, n), and the highest adjusted score is selected to represent the shot's adjusted score.
Compared with existing video instance retrieval techniques combining a specific person and a scene, the present invention mainly has the following advantages and beneficial effects:
1) compared with the prior art, the invention removes the low-ranked entries of the initial rankings, so the retained high-ranked results are more reliable;
2) compared with the prior art, the invention adjusts low-scoring shots using their high-scoring neighbours, so that many wrongly removed shots are moved back toward the front of the ranking, making the final video instance retrieval ranking more reliable;
3) the introduced rank-and-fuse strategy improves the performance of joint person-and-scene video instance retrieval, and because the optimization operates at the ranking level, the scheme is highly extensible and widely applicable.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of an embodiment of the present invention.
Fig. 2 is a flowchart of an embodiment of the present invention.
Detailed description of the embodiments
To make the present invention easy for those of ordinary skill in the art to understand and implement, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Referring to Fig. 1, the technical solution adopted by the invention is a video instance retrieval method combining a specific person and a scene. When realized, the retrieval starts from the person and the scene separately: specific-person retrieval results are first obtained by face-recognition technology, and specific-scene retrieval results by retrieval with joint local and global optimization; both result lists are then refined by the ranking optimizations of high-score retention and neighbour extension; finally the optimized person and scene results are fused to obtain the video instance retrieval result combining the specific person and the scene.
The embodiment uses MATLAB R2015b and VS2013 as the simulation platform and is tested on the Instance Search (INS) task dataset of the international video analysis and retrieval evaluation TRECVID. The INS dataset contains 464 hours of the BBC television series "EastEnders", comprising 244 video clips divided into 471,526 shots, each shot containing multiple frames. Many persons and scenes appear in these videos and frames, and owing to factors such as shooting angle and time they change constantly.
Referring to Fig. 2, the flow of the embodiment of the present invention comprises:
Step 1, instance retrieval of a specific person in video: for a specific query person p, person retrieval is realized by face-recognition technology; the similarity score between p and every shot of the query video library is output, giving the ranking of the specific-person retrieval as the initial person retrieval result.
Existing face-recognition techniques may be used in the specific implementation. For example, face detection may be performed with a scale-adaptive deconvolutional regression network based on Faster-RCNN (a deep-learning network model), mainly comprising the two steps of face-candidate generation and face/background classification; face features may then be learned with a deep convolutional neural network for recognition. The network is trained on the pre-established large-scale CASIA-WebFace face database, which contains 80,000 persons with 500–800 face images each. For the specific implementation, reference may be made to the following documents:
Y. Zhu, J. Wang, C. Zhao, H. Guo and H. Lu. Scale-adaptive Deconvolutional Regression Network for Pedestrian Detection, ACCV, 2016.
Haiyun Guo, et al. Multi-View 3D Object Retrieval with Deep Embedding Network, ICIP, 2016.
Those skilled in the art may select a specific face-recognition technique by themselves, which is not detailed further here.
Step 2, instance retrieval of a specific scene in video: for a specific query scene s, scene retrieval is realized from the local and global features of the given scene pictures. Each video in the query video library has multiple shots, and each shot has multiple keyframes; the invention aims to find the shots containing the query scene s, the final result of each shot being represented by the result of one of its keyframes.
In the embodiment, the specific implementation of step 2 comprises the following sub-steps:
Step 2.1, specific-target retrieval based on local features. For a query scene s, several query pictures are provided, and the different rigid objects in each query picture serve as the specific targets to be retrieved. The specific implementation comprises the following sub-steps:
Step 2.1.1, extract the BOW feature (BOW denotes bag of words) of each target region of a query picture. Features are first extracted with the SIFT algorithm; the SIFT features are weighted with the TF-IDF (term frequency-inverse document frequency) strategy, square-rooted (the "root" operation) and normalized. Each SIFT point in the target region is then compared in turn with every visual word of the pre-trained codebook, and the 3 visual words with the smallest Euclidean distance are found; the feature point is represented by these 3 visual words (a soft-assignment process). After all SIFT points have been processed, the histogram distribution of visual words within the target region is computed, giving the target region's BOW feature.
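The soft assignment of step 2.1.1 can be sketched as below; as a simplification, the TF-IDF weighting and root-normalization described in the text are omitted, the codebook is assumed already trained, and the function name is an assumption for illustration:

```python
import numpy as np

def soft_bow(descriptors, codebook, k=3):
    """Soft-assignment bag of words: each local descriptor votes for its
    k nearest visual words (k = 3 in the text); the vote histogram,
    L2-normalized, is the target region's BOW feature."""
    hist = np.zeros(len(codebook))
    for d in descriptors:
        dists = np.linalg.norm(codebook - d, axis=1)  # distance to every visual word
        for w in np.argsort(dists)[:k]:               # the 3 closest visual words
            hist[w] += 1.0
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

In practice the descriptors would come from a SIFT extractor and the codebook from k-means over a training set; here both are plain arrays so the quantization step itself is easy to inspect.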
Step 2.1.2, extract the BOW features of the query video library: the BOW features of all keyframes of all video shots in the query library are extracted here, by the same procedure as for the target regions of the query pictures.
Step 2.1.3, according to the BOW features, compute for each target region of each query picture its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot.
In the embodiment, using the BOW features obtained in the two steps above, the similarity between each target region of the query pictures and every keyframe of every shot is computed; similarity is the reciprocal of Euclidean distance. The minimum Euclidean distance between a target region and all keyframes of a shot represents the distance between that region and the shot, as follows:
D(I_i, J) = MIN{ d(I_i, J_1), d(I_i, J_2), …, d(I_i, J_n) }   (1)
where I_i is a target region of one of the query pictures, J is a shot with n keyframes J_1, J_2, …, J_n, and d(I_i, J_j), j = 1, 2, …, n, is the distance between target region I_i and keyframe J_j, i.e. the distance between the two images. The method uses the minimum distance over all keyframes of a shot (min-pooling) to measure the similarity between an image and a shot, where d(I_i, J_j) is obtained by a query-adaptive distance metric; see Cai-Zhi Zhu, Herve Jegou, Shinichi Satoh. Query-adaptive asymmetrical dissimilarities for visual object retrieval. In ICCV (2013).
Step 2.1.4, for each shot, take the maximum similarity between the shot and all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
For the multiple targets in the multiple query pictures of one scene, each target's query result represents a retrieval result for that scene, expressed as a similarity score; the higher the score, the more likely the result contains the wanted scene. The invention applies a max-pooling over the query scores of all targets of all query pictures (taking the maximum of all target scores) to represent the scene's local-feature retrieval result.
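Steps 2.1.3 and 2.1.4 together amount to a minimum over keyframes (formula (1)) followed by a maximum over query target regions. A toy sketch, where the reciprocal-distance similarity follows the text but plain Euclidean distance stands in for the query-adaptive metric, and the function name is an assumption:

```python
import numpy as np

def shot_score_local(region_feats, keyframe_feats):
    """Score one shot against one query scene under the local-feature scheme:
    a region's similarity to the shot is the reciprocal of its minimum
    Euclidean distance to any keyframe (min-pooling, formula (1)); the
    shot's score is the maximum over all query regions (max-pooling)."""
    scores = []
    for r in region_feats:
        dists = np.linalg.norm(keyframe_feats - r, axis=1)
        d_min = dists.min()                 # D(I_i, J) = min_j d(I_i, J_j)
        scores.append(1.0 / d_min if d_min > 0 else np.inf)
    return max(scores)
```

With two keyframes at distances 1.0 and 0.5 from the best-matching regions, the shot score is 1/0.5 = 2.0: the single best region-to-keyframe match decides the shot.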
Step 2.2, specific-scene retrieval based on global features: the scene is retrieved through the several corresponding query pictures of a query scene s, mainly by means of a convolutional neural network model. The specific implementation comprises the following steps:
Step 2.2.1, global feature extraction based on a residual network. The embodiment uses the pre-trained residual network (ResNet) model published by Facebook on Torch for image feature extraction: the several query pictures of scene s and the keyframes of the whole query video library are fed in as input images. Two outputs of the network are used: the output feature of the input picture after the last convolutional layer, of dimension 2048×1, and the probabilities of the input picture belonging to each of the 1000 predefined categories, of dimension 1000×1.
Step 2.2.2, according to the CNN features, compute for each query picture its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot.
In the embodiment, the step above yields the CNN features of every query picture and of the query video library; each picture is represented here by its 2048×1 feature. The ranking procedure is similar to that of the local-feature specific-target retrieval: after computing distances between a query picture and the frames of a shot, the minimum distance between the query picture and all frames of the shot represents their similarity, i.e. the method of formula (1) may be used.
The invention further proposes using the output probabilities of the input picture belonging to the 1000 predefined categories: a threshold may be preset, and if the probability of some category exceeds the threshold, the shot is judged to contain that category. When the wanted scenes are all indoor, shots judged to contain categories that occur only outdoors, such as cars, may have their score set to 0, which helps improve precision.
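The category-probability gating just described might look like the following sketch; the threshold value, the category indices and the function name are assumptions for illustration:

```python
import numpy as np

def gate_outdoor_shots(shot_scores, class_probs, outdoor_classes, thresh=0.5):
    """If the query scenes are all indoor, zero the score of any shot whose
    representative keyframe is predicted (probability above `thresh`) to
    contain a category that occurs only outdoors, e.g. 'car'.
    `outdoor_classes` holds indices into the 1000-way classifier output."""
    scores = np.asarray(shot_scores, dtype=float).copy()
    for n, probs in enumerate(class_probs):
        if any(probs[c] > thresh for c in outdoor_classes):
            scores[n] = 0.0
    return scores
```

The gate only ever lowers scores, so it trades recall for precision; the threshold would be tuned on held-out queries.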
Step 2.2.3, for each shot, take the maximum similarity between the shot and all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Overall this is similar to step 2.1: first, for the several query pictures of a query scene s, each picture's global target retrieval result (the reciprocal of the distance D(I_i, J)) is normalized; then the best score over all the different query pictures represents the shot, and re-ranking gives the final global-feature specific-scene retrieval result.
In the invention, each query picture of each scene is matched against the query video library by feature distance; the reciprocal of the distance gives the similarity score between the query picture and each keyframe of the library. The highest-scoring keyframe within each shot indicates the probability that the shot contains the wanted scene, and finally a max-pooling over the retrieval results of all query pictures of the scene (taking the maximum of all scores) represents the scene's global-feature retrieval result.
Step 2.3: specific-scene retrieval based on the combined optimization of local and global features. Considering the global and the local cues simultaneously, the specific-scene retrieval results based on global features and on local features are interleaved, and the shots are re-ranked to obtain the final specific-scene ranking result, which serves as the initial scene retrieval result.
In a specific implementation, the interleaving can follow a preset rule. For example, with the 3000 top-ranked results per query (ranked by similarity score, a higher score giving a higher rank), the top 1500 results of the global ranking and the top 1500 results of the local ranking are alternately merged, with the global result placed first and the local result second.
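The interleaving rule above can be sketched as follows; skipping shots already placed is an assumption made here to keep the merged list free of duplicates:

```python
def interleave(global_ranked, local_ranked, k=1500):
    """Alternately merge the top-k global and top-k local shot ids,
    global first, skipping shots that have already been placed."""
    merged, seen = [], set()
    for g, l in zip(global_ranked[:k], local_ranked[:k]):
        for shot in (g, l):  # global result first, local result second
            if shot not in seen:
                seen.add(shot)
                merged.append(shot)
    return merged

print(interleave(["a", "b", "c"], ["b", "d", "e"], k=3))
# ['a', 'b', 'd', 'c', 'e']
```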
Step 3: video instance retrieval based on high-score retention. To remove the noise introduced by the many videos in a massive collection that contain no query instance, the lowest-ranked results are removed from the specific-person retrieval result of step 1 and the specific-scene retrieval result of step 2, yielding the denoised person retrieval result and the denoised scene retrieval result. In a specific implementation, the lowest-ranked results can be removed by proportion; for example, of the 3000 top-ranked results per query, the last 1/3 (i.e. the results ranked 2001st to 3000th) are removed.
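The proportional high-score retention can be sketched as follows, a minimal illustration of dropping the bottom third of a 3000-result ranking:

```python
def high_score_retain(ranked, keep_fraction=2 / 3):
    """Keep only the top fraction of a ranked result list
    (e.g. ranks 1-2000 out of 3000 when keep_fraction is 2/3)."""
    cut = round(len(ranked) * keep_fraction)
    return ranked[:cut]

ranked = list(range(1, 3001))  # ranks 1..3000
kept = high_score_retain(ranked)
print(len(kept), kept[-1])  # 2000 2000
```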
Step 4: video instance retrieval based on neighbor extension. Because a person or scene may be occluded, certain shots receive low similarity scores and are wrongly deleted. The present invention therefore further proposes optimizing the specific-person and specific-scene retrieval results by neighbor extension. The specific implementation steps are as follows.
Neighbor-extension optimization is applied to low-scoring shots within the same real camera sequence. The method proposes a score-adjustment scheme based on a Gaussian model: the scores of low-scoring shots are raised using neighboring high-scoring shots, so that the adjusted low-scoring shots rise in the ranking and many wrongly deleted shots are restored to forward positions, making the ranking results more reliable. The specific implementation includes the following sub-steps:
Step 4.1: let f(n) be the initial score of the shot n corresponding to any face or scene, and let e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved (N = 471,526 in the present embodiment). e(i, n) is defined as follows:
e(i, n) = f(i) g(n - i) R(n) (2)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence, the latter defined as follows:
R(k) = u(k + τ) - u(k - 1 - τ) (4)
where the parameter k ∈ {0, ±1, ±2, ...}, τ is the number of shots extended before and after (τ = 8 in the experiments), and u(z) is the unit step sequence, whose value depends on z as follows:
u(z) = 1 for z ≥ 0, and u(z) = 0 for z < 0 (5)
Step 4.2: in principle, after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n).
The present invention selects the best adjusted score to represent the adjusted score of the low-scoring shot, as follows:
f*(n) = Max[e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n)] (6)
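Steps 4.1 and 4.2 can be sketched as follows; the Gaussian width sigma is an assumed parameter (the text reproduced here does not give it), and the rectangular window is applied as R(n - i) so that it confines the adjustment to the 2τ + 1 shots around shot n:

```python
import math

TAU = 8  # number of shots extended before and after (value used in the experiments)

def gaussian(n, sigma=3.0):
    # Gaussian sequence g(n); sigma is an assumed width parameter.
    return math.exp(-n * n / (2.0 * sigma * sigma))

def unit_step(z):
    # u(z) = 1 for z >= 0, else 0
    return 1 if z >= 0 else 0

def rect(k, tau=TAU):
    # R(k) = u(k + tau) - u(k - 1 - tau): 1 inside [-tau, tau], 0 outside
    return unit_step(k + tau) - unit_step(k - 1 - tau)

def adjust(scores, n, tau=TAU):
    """f*(n): best of e(i, n) = f(i) * g(n - i) * R(n - i) over the
    neighbouring shots i of shot n (clipped at the ends of the list)."""
    N = len(scores)
    candidates = [scores[i] * gaussian(n - i) * rect(n - i)
                  for i in range(max(0, n - tau), min(N, n + tau + 1))]
    return max(candidates)

scores = [0.1, 0.9, 0.2, 0.05]
# The high-scoring neighbour (shot 1, score 0.9) lifts shot 2 above its own 0.2.
print(adjust(scores, 2) > scores[2])  # True
```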
Step 5: fusing the specific-person and specific-scene retrieval results yields the final video instance retrieval result jointly for the particular person and scene.
The implementation is as follows: first, the neighbor-extended person retrieval result is fused with the initial specific-scene retrieval result; then, the initial specific-person retrieval result is fused with the neighbor-extended scene retrieval result; finally, for each shot, the maximum of the two fused scores is taken as the final fusion result F(n), where fp(n) and fs(n) are the initial specific-person and specific-scene retrieval scores, and fp*(n) and fs*(n) are the neighbor-extended person and scene scores. The larger the final F(n), the higher the probability that the query person p appears in the query scene s in that shot. Each shot thus obtains a similarity score and the shots are ranked by it: the higher the score, the higher the rank. The shot ranking results are output to the user.
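The final fusion can be sketched as follows; the patent takes, per shot, the maximum of two fused scores, and the elementwise product used here as the pairwise fusion operator is an assumption for illustration only:

```python
def fuse(fp, fs, fp_star, fs_star):
    """fp, fs: initial person/scene scores per shot id;
    fp_star, fs_star: neighbour-extended scores.
    F(n) = max(combine(fp*(n), fs(n)), combine(fp(n), fs*(n))),
    with an assumed product as the combine operator."""
    return {n: max(fp_star[n] * fs[n], fp[n] * fs_star[n]) for n in fp}

fp = {"s1": 0.2}
fs = {"s1": 0.5}
fp_star = {"s1": 0.6}
fs_star = {"s1": 0.4}
print(fuse(fp, fs, fp_star, fs_star))  # {'s1': 0.3}
```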
In a specific implementation, the method provided by the present invention can be run as an automatic process based on software technology, or the corresponding system can be realized in modular form.
The embodiment of the present invention provides a video instance retrieval system jointly for a particular person and scene, comprising the following modules:
a person retrieval module, for the instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for the instance retrieval of a specific scene in video, including retrieving for a query scene s, and comprising the following units:
a local retrieval unit, for carrying out specific-target retrieval based on local features;
a global retrieval unit, for carrying out specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking produced by the person retrieval module and the specific-scene ranking produced by the scene retrieval module, to obtain the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of the preliminary optimization module, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
For the specific implementation of each module, see the corresponding method steps; the present invention does not elaborate further.
For convenience in assessing the effect of the technical solution of the present embodiment, the mean average precision MAP (Mean Average Precision), widely used in the field of image retrieval, is adopted as the evaluation index. This measure takes both precision and recall into account, and is calculated as follows:
AP = (1/R) Σ_{j=1..M} P(j) · rel(j), and MAP is the mean of AP over all queries,
where R is the number of relevant results, P(j) is the precision of the top j results, rel(j) equals 1 if the result at rank j is relevant and 0 otherwise, M indicates the total number of results in the ranked list, and j ∈ {1, ..., M} is an integer. Under the same conditions, a larger MAP value indicates a better retrieval result;
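The MAP computation can be sketched as follows; this sketch normalizes AP by the number of relevant items found in the list, which assumes all relevant items appear in the ranked results:

```python
def average_precision(relevance):
    """relevance: list of 0/1 flags down a ranked result list.
    AP = sum over ranks j of P(j) * rel(j), divided by the number of
    relevant items found."""
    hits, precision_sum = 0, 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / j  # P(j): precision of the top j results
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    # Mean of AP over all queries (one relevance list per query).
    return sum(average_precision(r) for r in runs) / len(runs)

print(average_precision([1, 0, 1]))  # (1/1 + 2/3) / 2 ≈ 0.8333
```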
In the above process, MAP values were computed separately for the initial joint person-and-scene video instance retrieval result and for the joint person-and-scene video instance retrieval result after high-score retention and neighbor-extension ranking optimization; see Table 1. From Table 1 it can be found that the retrieval performance of the joint person-and-scene video instance retrieval method of the present invention, with high-score retention and neighbor extension, improves significantly.
Table 1. MAP values on the INS dataset
Fusion results | MAP |
---|---|
Initial results | 0.1420 |
Optimized person results and initial scene results | 0.1539 |
Initial person results and optimized scene results | 0.2134 |
Optimized person results and optimized scene results | 0.2241 |
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and therefore should not be considered a limitation on the scope of patent protection of the present invention. Those skilled in the art, under the inspiration of the present invention and without departing from the scope protected by the claims of the present invention, may make replacements or variations, which all fall within the protection scope of the present invention. The claimed scope of the present invention shall be determined by the appended claims.
Claims (6)
1. A video instance retrieval method jointly for a particular person and scene, characterized in that it comprises the following steps:
step 1, instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
step 2, instance retrieval of a specific scene in video, including retrieving for a query scene s, comprising the following sub-steps:
step 2.1, carrying out specific-target retrieval based on local features;
step 2.2, carrying out specific-scene retrieval based on global features;
step 2.3, realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
step 3, realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking obtained in step 1 and the specific-scene ranking obtained in step 2, to obtain the denoised person retrieval result and the denoised scene retrieval result;
step 4, realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of step 3, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
the neighbor-extension optimization is implemented as follows:
let f(n) be the initial score of the shot n corresponding to any face or scene, and e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) g(n - i) R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n),
where τ is the number of shots extended before and after;
the best adjusted score is selected to represent the adjusted score of the shot;
step 5, fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
2. The video instance retrieval method jointly for a particular person and scene according to claim 1, characterized in that: the specific-target retrieval based on local features includes, for the multiple query pictures corresponding to a query scene s, extracting the BOW feature of each target region in every query picture; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, computing, for each target region of each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between all target regions and that shot as the similarity score of the shot, to obtain the specific-target retrieval result based on local features.
3. The video instance retrieval method jointly for a particular person and scene according to claim 1, characterized in that: the specific-scene retrieval based on global features includes, for the multiple query pictures corresponding to a query scene s, extracting the CNN feature of every query picture; extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, computing, for each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between all query pictures and that shot as the similarity score of the shot, to obtain the specific-scene retrieval result based on global features.
4. A video instance retrieval system jointly for a particular person and scene, characterized in that it comprises the following modules:
a person retrieval module, for the instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for the instance retrieval of a specific scene in video, including retrieving for a query scene s, and comprising the following units:
a local retrieval unit, for carrying out specific-target retrieval based on local features;
a global retrieval unit, for carrying out specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking produced by the person retrieval module and the specific-scene ranking produced by the scene retrieval module, to obtain the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of the preliminary optimization module, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
the neighbor-extension optimization is implemented as follows:
let f(n) be the initial score of the shot n corresponding to any face or scene, and e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) g(n - i) R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n),
where τ is the number of shots extended before and after;
the best adjusted score is selected to represent the adjusted score of the shot;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
5. The video instance retrieval system jointly for a particular person and scene according to claim 4, characterized in that: the specific-target retrieval based on local features includes, for the multiple query pictures corresponding to a query scene s, extracting the BOW feature of each target region in every query picture; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, computing, for each target region of each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between all target regions and that shot as the similarity score of the shot, to obtain the specific-target retrieval result based on local features.
6. The video instance retrieval system jointly for a particular person and scene according to claim 4, characterized in that: the specific-scene retrieval based on global features includes, for the multiple query pictures corresponding to a query scene s, extracting the CNN feature of every query picture; extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, computing, for each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between all query pictures and that shot as the similarity score of the shot, to obtain the specific-scene retrieval result based on global features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710454025.2A CN107315795B (en) | 2017-06-15 | 2017-06-15 | The instance of video search method and system of joint particular persons and scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107315795A CN107315795A (en) | 2017-11-03 |
CN107315795B true CN107315795B (en) | 2019-08-02 |
Family
ID=60184038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710454025.2A Active CN107315795B (en) | 2017-06-15 | 2017-06-15 | The instance of video search method and system of joint particular persons and scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315795B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858308B (en) * | 2017-11-30 | 2023-03-24 | 株式会社日立制作所 | Video retrieval device, video retrieval method, and storage medium |
CN108491794B (en) * | 2018-03-22 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Face recognition method and device |
CN111325245B (en) * | 2020-02-05 | 2023-10-17 | 腾讯科技(深圳)有限公司 | Repeated image recognition method, device, electronic equipment and computer readable storage medium |
CN111538858B (en) * | 2020-05-06 | 2023-06-23 | 英华达(上海)科技有限公司 | Method, device, electronic equipment and storage medium for establishing video map |
JP7225194B2 (en) * | 2020-12-28 | 2023-02-20 | 楽天グループ株式会社 | Image frame extraction device, image frame extraction method and program |
CN112699846B (en) * | 2021-01-12 | 2022-06-07 | 武汉大学 | Specific character and specific behavior combined retrieval method and device with identity consistency check function |
CN112836600B (en) * | 2021-01-19 | 2023-12-22 | 新华智云科技有限公司 | Video similarity calculation method and system |
CN116127133B (en) * | 2023-04-17 | 2023-08-08 | 湖南柚子树文化传媒有限公司 | File searching method, system, equipment and medium based on artificial intelligence |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102595103A (en) * | 2012-03-07 | 2012-07-18 | 深圳市信义科技有限公司 | Method based on geographic information system (GIS) map deduction intelligent video |
CN103702134A (en) * | 2012-09-27 | 2014-04-02 | 索尼公司 | Image processing device, image processing method and program |
CN104200206A (en) * | 2014-09-09 | 2014-12-10 | 武汉大学 | Double-angle sequencing optimization based pedestrian re-identification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8315965B2 (en) * | 2008-04-22 | 2012-11-20 | Siemens Corporation | Method for object detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107315795B (en) | The instance of video search method and system of joint particular persons and scene | |
CN109740541B (en) | Pedestrian re-identification system and method | |
JP4553650B2 (en) | Image group representation method, descriptor derived by representation method, search method, apparatus, computer program, and storage medium | |
CN109325471B (en) | Double-current network pedestrian re-identification method combining apparent characteristics and space-time distribution | |
CN112818931A (en) | Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion | |
CN105808732A (en) | Integration target attribute identification and precise retrieval method based on depth measurement learning | |
CN111696128A (en) | High-speed multi-target detection tracking and target image optimization method and storage medium | |
CN109472191A (en) | A kind of pedestrian based on space-time context identifies again and method for tracing | |
CN111046821B (en) | Video behavior recognition method and system and electronic equipment | |
CN103714181B (en) | A kind of hierarchical particular persons search method | |
CN104200206B (en) | Double-angle sequencing optimization based pedestrian re-identification method | |
GB2493580A (en) | Method of searching for a target within video data | |
CN104281572B (en) | A kind of target matching method and its system based on mutual information | |
CN111709331B (en) | Pedestrian re-recognition method based on multi-granularity information interaction model | |
CN110598543A (en) | Model training method based on attribute mining and reasoning and pedestrian re-identification method | |
CN108764018A (en) | A kind of multitask vehicle based on convolutional neural networks recognition methods and device again | |
CN112818790A (en) | Pedestrian re-identification method based on attention mechanism and space geometric constraint | |
CN109635647B (en) | Multi-picture multi-face clustering method based on constraint condition | |
CN111814690A (en) | Target re-identification method and device and computer readable storage medium | |
CN112668557A (en) | Method for defending image noise attack in pedestrian re-identification system | |
CN111597978B (en) | Method for automatically generating pedestrian re-identification picture based on StarGAN network model | |
CN111159475B (en) | Pedestrian re-identification path generation method based on multi-camera video image | |
Varini et al. | Egocentric video summarization of cultural tour based on user preferences | |
CN109711232A (en) | Deep learning pedestrian recognition methods again based on multiple objective function | |
Khare et al. | Keyframe extraction using binary robust invariant scalable keypoint features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||