CN107315795A - Video instance search method and system combining a specific person and scene - Google Patents

Video instance search method and system combining a specific person and scene

Info

Publication number
CN107315795A
Authority
CN
China
Prior art keywords
retrieval
shot
scene
video
result
Prior art date
Legal status
Granted
Application number
CN201710454025.2A
Other languages
Chinese (zh)
Other versions
CN107315795B (en)
Inventor
胡瑞敏
兰佳梅
王正
徐东曙
梁超
陈军
陈祎玥
杨洋
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN201710454025.2A
Publication of CN107315795A
Application granted
Publication of CN107315795B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 — Retrieval using metadata automatically derived from the content
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/10 — Terrestrial scenes
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 — Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a video instance search method and system that jointly retrieve a specific person and a specific scene. The method comprises: performing instance search for a specific person in video; performing specific-scene retrieval with combined optimization of local and global features; performing video instance search with high-score retention; performing video instance search with nearest-neighbor expansion; and fusing the person and scene retrieval results. For each shot, the initial scene result is fused with the neighbor-expanded person result, the initial person result is fused with the neighbor-expanded scene result, and the maximum of the two fused scores is taken to obtain the final shot ranking of the video instance search. The ranking produced by the invention is more reliable, and the scheme is highly extensible and widely applicable.

Description

Video instance search method and system combining a specific person and scene
Technical field
The invention belongs to the field of video search technology and relates to a video instance search technique, in particular to a video instance search method and system that combine a specific person and a specific scene.
Background technology
In video analysis and retrieval evaluations, video instance search refers to the task of, given a query sample (video clips or images) and a video gallery, retrieving from the gallery all video segments (shots) in which the query sample appears, ranked by similarity to the query. The query sample may be several images of a specific target—a particular person, car, or object—in different scenes; sometimes a video segment containing the target is also provided. Video instance search that jointly considers a specific person and a specific scene retrieves, from massive video data, the segments in which a given person appears in a given scene. The technique helps public-security personnel exclude irrelevant targets in massive surveillance video, focus on and observe key targets, and analyze suspects; it markedly improves the efficiency of browsing massive surveillance video and is therefore significant for strengthening emergency response capability and comprehensive public-security prevention and control, and for protecting people's lives and property.
The challenges currently facing joint person-and-scene video instance search come mainly from three aspects. First, the volume of video is huge and contains a great deal of noise, so finding a small query target in massive video is very difficult. Second, the person being retrieved may change clothing, pose, and viewing angle across scenes. Third, scenes undergo large illumination changes and severe occlusion. Existing methods typically first retrieve the person and the scene separately and then fuse the two result lists to obtain the joint result. Person and scene results are usually expressed as scores: the higher the score, the more likely the corresponding shot contains the query sample, and fusion may add or multiply the person and scene scores of the same shot. However, even for a truly relevant shot, the person score or the scene score is not necessarily high.
Chinese patent CN105678250A, published 2016-06-15, discloses a face recognition method and device for video. It uses a dynamic recognition approach in which temporally related features of successive frames complement one another, improving recognition accuracy. Although it belongs to the field of video search, it performs only person retrieval without scene retrieval, and thus differs in research angle from a video instance search that combines a specific person and scene.
Chinese patent CN106022313A, published 2016-10-12, discloses a face recognition method that automatically adapts to the scene. It employs a convolutional-neural-network model for compensation and is more automatic than traditional manual operation, but it does not involve scene retrieval, and thus differs in research angle from a video instance search that combines a specific person and scene.
Chinese patent CN104794219A, published 2015-07-22, discloses a scene search method based on geographic position information. It indexes scene images by geographic information and global descriptors, filtering out a large number of irrelevant images and improving the efficiency of visual-vocabulary spatial verification and the accuracy of image matching. It performs only scene retrieval without person retrieval, and thus differs in research angle from a video instance search that combines a specific person and scene.
Chinese patent CN104820711A, published 2015-08-05, discloses a video retrieval method for human-shaped targets in complex scenes with continuous online tuning of search-image similarity. Each model update produces a new round of retrieval results, and human interaction updates the machine-vision recognition model library until a satisfactory result is reached. Because results are not generated automatically, its retrieval performance still needs improvement, and it differs from our joint person-and-scene method, which obtains the final result by fusing the person and scene retrieval results.
Chinese patent CN104517104A, published 2015-04-15, discloses a face recognition method and system for surveillance scenes. By score-level fusion of Gabor features and multi-scale RILPQ features, it reduces the influence of uneven facial illumination, rotation angles, and occlusion, effectively improving the face recognition rate under surveillance. However, it is not applicable to scenes other than surveillance; to apply joint person-and-scene video instance search to multiple scenes, that method still leaves room for optimization.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a video instance search scheme that combines a specific person and a specific scene. Initial retrieval results are optimized by ranking optimization with high-score retention and nearest-neighbor expansion, and then fused into the final ranking, thereby improving the accuracy of retrieving a specific person appearing in a specific scene.
The technical solution adopted by the invention is a video instance search method combining a specific person and scene, comprising the following steps:
Step 1, instance search for a specific person in video: for a query person p, output the similarity score between p and each shot of the query video gallery, obtaining a person ranking as the initial person retrieval result;
Step 2, instance search for a specific scene in video: for a query scene s, perform retrieval through the following sub-steps,
Step 2.1, perform specific-target retrieval based on local features;
Step 2.2, perform specific-scene retrieval based on global features;
Step 2.3, perform specific-scene retrieval with combined local and global optimization: interleave and re-rank the shots according to the local-feature target retrieval result and the global-feature scene retrieval result, obtaining the final scene ranking as the initial scene retrieval result;
Step 3, video instance search with high-score retention: remove the low-ranked results from the person ranking of step 1 and the scene ranking of step 2, obtaining denoised person and scene retrieval results;
Step 4, video instance search with nearest-neighbor expansion: optimize the results of step 3 by neighbor expansion, obtaining neighbor-expanded person and scene retrieval results;
Step 5, fuse the person and scene retrieval results: for each shot, fuse the initial scene result with the neighbor-expanded person result, fuse the initial person result with the neighbor-expanded scene result, and take the maximum of the two fused scores, obtaining the shot ranking of the video instance search.
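Under the assumption that each retrieval result is a mapping from shot identifiers to similarity scores, steps 3–5 can be sketched as follows; the helper names, the choice of additive fusion (the patent also mentions multiplication as an option), and the top-k retention parameter are illustrative, not part of the patent:

```python
# Hypothetical sketch of steps 3-5: high-score retention, then cross-fusion
# of person and scene rankings; all names and parameters are assumptions.

def keep_top(scores, k):
    """High-score retention: zero out every shot ranked below the top k."""
    top = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return {shot: (s if shot in top else 0.0) for shot, s in scores.items()}

def fuse(a, b):
    """Fuse two score maps over the same shots (addition is one option the patent names)."""
    return {shot: a[shot] + b[shot] for shot in a}

def joint_rank(person_init, scene_init, person_exp, scene_exp):
    """Step 5: per shot, take the max of (initial scene + expanded person)
    and (initial person + expanded scene), then rank by the merged score."""
    f1 = fuse(scene_init, person_exp)
    f2 = fuse(person_init, scene_exp)
    merged = {shot: max(f1[shot], f2[shot]) for shot in f1}
    return sorted(merged, key=merged.get, reverse=True)
```

In this sketch the two cross-fusions deliberately pair an *initial* list with an *expanded* list, mirroring the asymmetry described in step 5.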
Moreover, the specific-target retrieval based on local features comprises: for the several query images corresponding to a query scene s, extracting the BOW features of each target region in every query image; extracting the BOW features of all keyframes of all shots in the query video gallery; for each target region of each query image, computing by BOW features the Euclidean distance to every keyframe of each shot and taking the minimum as the similarity between the region and the shot; and, for each shot, taking the maximum similarity over all target regions as the shot's similarity score, obtaining the local-feature target retrieval result.
Moreover, the specific-scene retrieval based on global features comprises: for the several query images corresponding to a query scene s, extracting the CNN features of every query image and of all keyframes of all shots in the query video gallery; for each query image, computing by CNN features the Euclidean distance to every keyframe of each shot and taking the minimum as the similarity between the image and the shot; and, for each shot, taking the maximum similarity over all query images as the shot's similarity score, obtaining the global-feature scene retrieval result.
Moreover, the neighbor-expansion optimization is implemented as follows. Let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbor adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved:
e(i, n) = f(i) · g(n − i) · R(n)
where g(·) is a Gaussian window and R(·) is a rectangular window.
After the Gaussian score adjustment, each shot n obtains the scores e(n + τ, n), …, e(n + 1, n), e(n, n), …, e(n − τ, n),
and the highest of them is selected as the shot's adjusted score.
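A minimal sketch of the neighbor-expansion formula above, assuming 1-D shot indices, a symmetric Gaussian window g, and a rectangular window R that merely clips to valid indices; the values of τ and σ are assumed parameters:

```python
import math

def gaussian_neighbor_expand(f, tau=1, sigma=1.0):
    """For each shot n, let neighbors i within +/- tau propagate their scores
    through a Gaussian window, e(i, n) = f(i) * g(n - i), and keep the best.
    The rectangular window R(n) is realized here as clipping to [0, N)."""
    N = len(f)
    g = lambda d: math.exp(-(d * d) / (2 * sigma * sigma))
    out = []
    for n in range(N):
        best = max(f[i] * g(n - i)
                   for i in range(max(0, n - tau), min(N, n + tau + 1)))
        out.append(best)
    return out
```

With a single high-scoring shot, the sketch shows how neighboring shots inherit a Gaussian-attenuated copy of its score, which is the mechanism that re-ranks wrongly low-scored shots near a confident hit.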
The invention correspondingly provides a video instance search system combining a specific person and scene, comprising the following modules:
a person retrieval module, for instance search of a specific person in video: for a query person p, it outputs the similarity score between p and each shot of the query video gallery, obtaining a person ranking as the initial person retrieval result;
a scene retrieval module, for instance search of a specific scene in video, which retrieves for a query scene s and comprises the following units:
a local retrieval unit, for specific-target retrieval based on local features;
a global retrieval unit, for specific-scene retrieval based on global features;
a combined retrieval unit, for specific-scene retrieval with combined local and global optimization: it interleaves and re-ranks the shots according to the local-feature target retrieval result and the global-feature scene retrieval result, obtaining the final scene ranking as the initial scene retrieval result;
a preliminary optimization module, for video instance search with high-score retention: it removes the low-ranked results from the person ranking produced by the person retrieval module and from the scene ranking produced by the scene retrieval module, obtaining denoised person and scene retrieval results;
a neighbor optimization module, for video instance search with nearest-neighbor expansion: it optimizes the results of the preliminary optimization module by neighbor expansion, obtaining neighbor-expanded person and scene retrieval results;
a fusion optimization module, for fusing the person and scene retrieval results: for each shot, it fuses the initial scene result with the neighbor-expanded person result, fuses the initial person result with the neighbor-expanded scene result, and takes the maximum of the two fused scores, obtaining the shot ranking of the video instance search.
Moreover, in the system, the specific-target retrieval based on local features, the specific-scene retrieval based on global features, and the neighbor-expansion optimization with e(i, n) = f(i) · g(n − i) · R(n) are implemented as defined above for the method.
Compared with existing joint person-and-scene video instance search techniques, the invention mainly has the following advantages and beneficial effects:
1) Compared with the prior art, the invention removes the low-ranked results of the initial rankings, making the top-ranked retrieval results more reliable;
2) Compared with the prior art, the invention adjusts low-scoring shots using high-scoring neighboring shots, so that many wrongly removed shots are re-ranked toward the front, making the final ranking more reliable;
3) The invention introduces rank-level fusion, improving the performance of joint person-and-scene video instance search; because the optimization operates on rankings, the scheme is highly extensible and widely applicable.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the embodiment of the invention.
Fig. 2 is a flow chart of the embodiment of the invention.
Detailed description
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the drawings and an embodiment. It should be understood that the embodiment described here serves only to illustrate and explain the invention and is not intended to limit it.
Referring to Fig. 1, the technical solution is a video instance search method combining a specific person and scene. In implementation, the person and the scene are retrieved separately: the person retrieval result is obtained by face recognition, and the scene retrieval result by scene retrieval with combined local and global optimization. Both results are then optimized by high-score retention and neighbor expansion of the rankings, and finally the optimized person and scene results are fused to obtain the joint person-and-scene video instance search result.
The embodiment uses MATLAB R2015b and VS2013 as the simulation platform and is tested on the Instance Search (INS) dataset of the international TRECVID video analysis and retrieval evaluation. The INS dataset contains 244 video segments from 464 hours of the BBC television series EastEnders, divided into 471,526 shots; each shot contains many frames, and numerous persons and scenes appear in these videos and frames. Owing to changes in shooting angle, time, and other factors, these persons and scenes vary continuously.
Referring to Fig. 2, the flow of the embodiment comprises:
Step 1, instance search for a specific person in video: for a specific query person p, person retrieval is realized by face recognition; the similarity score between p and each shot of the query video gallery is output, giving the person ranking as the initial person retrieval result.
Face recognition can use existing techniques. For example, face detection may use a scale-adaptive deep deconvolutional regression network based on Faster-RCNN (a deep-learning detection model), consisting of two steps: face candidate generation and face/background classification; face recognition then uses face features learned by a deep convolutional neural network. The network is pre-trained on the large CASIA-WebFace face database, which contains 80,000 identities with 500–800 faces each. For implementation details, see:
Y.Zhu,J.Wang,C.Zhao,H.Guo and H.Lu.Scale-adaptive Deconvolutional Regression Network for Pedestrian Detection,ACCV,2016.
Haiyun Guo,et al.Multi-View 3D Object Retrieval with Deep Embedding Network,ICIP,2016.
Those skilled in the art can select the specific face recognition technique themselves; the details are not repeated here.
Step 2, instance search for a specific scene in video: for a specific query scene s, scene retrieval is realized from the local and global features of the given scene images. Each video in the query gallery contains multiple shots and each shot multiple keyframes; the task is to find the shots that contain query scene s, and the final result of each shot is represented by one of its keyframes.
In the embodiment, step 2 comprises the following sub-steps:
Step 2.1, specific-target retrieval based on local features: several query images are provided for a query scene s, and the distinct rigid objects in each query image serve as the specific targets to be retrieved. This comprises the following sub-steps:
Step 2.1.1, extract the BOW (bag-of-words) features of each target region of a query image. SIFT features are extracted and given TF-IDF (term frequency–inverse document frequency) weighting, followed by root normalization (RootSIFT) and L2 normalization. Finally, each SIFT point of the target region is compared in turn with each visual word of a pre-trained codebook, the 3 visual words with the smallest Euclidean distance are found, and the point is represented by these 3 words (soft assignment). After all SIFT points are processed, the histogram of visual words over the target region is computed, giving the region's BOW feature.
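A toy sketch of the soft-assignment BOW histogram described above, omitting SIFT extraction and TF-IDF weighting and using plain float lists for descriptors and codebook entries; all names are hypothetical:

```python
def bow_histogram(descriptors, codebook, k=3):
    """Soft-assignment BOW: each local descriptor votes for its k nearest
    visual words (k = 3 in the embodiment), and the histogram of votes is
    L1-normalized to give the region's BOW feature."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = sorted(range(len(codebook)),
                         key=lambda w: dist2(d, codebook[w]))[:k]
        for w in nearest:
            hist[w] += 1.0 / k      # one vote split across the k nearest words
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```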
Step 2.1.2, extract the BOW features of the query video gallery: the BOW features of all keyframes of all shots are extracted, using the same procedure as for the target regions of the query images.
Step 2.1.3, according to the BOW features, for each target region of each query image, compute the Euclidean distance to every keyframe of each shot, and take the minimum as the similarity between the region and the shot.
In the embodiment, using the BOW features obtained in the two steps above, the similarity between each target region of each query image and each keyframe of each shot is computed as the reciprocal of the Euclidean distance, and the minimum distance over all keyframes of a shot represents the distance between the region and that shot:
D(I_i, J) = min{ d(I_i, J_1), d(I_i, J_2), …, d(I_i, J_n) }   (1)
where I_i is one target region of a query image, J is a shot with n keyframes J_1, J_2, …, J_n, and d(I_i, J_j) is the distance between region I_i and keyframe J_j, i.e., between two images. The method uses the minimum distance (min-pooling) between the region and a shot's keyframes as the image-to-shot measure, where d(I_i, J_j) is obtained with a query-adaptive distance metric; see Cai-Zhi Zhu, Herve Jegou, Shin'ichi Satoh, Query-adaptive asymmetrical dissimilarities for visual object retrieval, ICCV 2013.
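Formula (1) can be sketched as min-pooling over a shot's keyframes; plain Euclidean distance stands in here for the query-adaptive metric of Zhu et al., and the feature vectors are assumed to be plain float lists:

```python
def shot_distance(region_feat, keyframe_feats):
    """Formula (1): the distance between a query region I_i and a shot J is
    the minimum distance over the shot's keyframes J_1..J_n; the similarity
    is taken as its reciprocal elsewhere in the pipeline."""
    def euclid(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(euclid(region_feat, kf) for kf in keyframe_feats)
```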
Step 2.1.4, for each shot, take the maximum similarity over all target regions as the shot's similarity score, obtaining the specific-target retrieval result based on local features.
For the multiple targets inside the several query images of a scene, the retrieval result of each target contributes to the scene's result and is expressed as a similarity score: the higher the score, the more likely the shot contains the sought scene. The invention applies max-pooling over the query scores of all targets of all query images (taking the maximum of all target scores) to represent the scene's local-feature retrieval result.
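The max-pooling over targets can be sketched as follows, assuming one score dictionary per target region; the names are illustrative:

```python
def scene_score(per_target_scores):
    """Max-pooling over all query targets of a scene: a shot's final
    local-feature score is the best score any single target achieved on it.
    per_target_scores: list of {shot: score} dicts, one per target region."""
    shots = per_target_scores[0].keys()
    return {s: max(t[s] for t in per_target_scores) for s in shots}
```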
Step 2.2, specific-scene retrieval based on global features: the scene is retrieved through the several query images corresponding to query scene s, realized mainly by a convolutional-neural-network model. This comprises the following steps:
Step 2.2.1, extract global CNN features. The embodiment uses a pre-trained residual network (ResNet) model released by Facebook on Torch for image feature extraction. The several query images of scene s are fed to the network as input images to extract features, as are the keyframes of the query video gallery. Two kinds of network output are used: the output feature of the last convolutional layer, of dimension 2048×1, and the probabilities of the input image belonging to each of 1000 predefined classes, of dimension 1000×1.
Step 2.2.2, according to the CNN features, for each query image, compute the Euclidean distance to every keyframe of each shot, and take the minimum as the similarity between the image and the shot.
In the embodiment, the step above yields the CNN features of each query image and of the query video gallery, each image represented by a 2048×1 feature. The ranking procedure is analogous to the local-feature target retrieval: after computing the distances between a query image and the keyframes of a shot, the minimum distance among all keyframes represents the similarity between the shot and the query image, i.e., the method of formula (1).
The invention further proposes that, using the output probabilities of the input image over the 1000 predefined classes, a threshold can be preset: if the probability of some class exceeds the threshold, the shot is judged to contain that class. When the sought scene is an indoor scene, a shot judged to contain a class that occurs only outdoors, such as a car, can have its score set to 0, which helps improve precision.
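A hedged sketch of this class-probability gating, assuming per-shot class-probability dictionaries; the class names and the threshold value are assumptions, not values fixed by the patent:

```python
def gate_outdoor_shots(shot_scores, shot_class_probs, outdoor_classes, thresh=0.5):
    """When the query scene is indoor, zero the score of any shot whose
    class probability for an outdoor-only category (e.g. 'car') exceeds
    a preset threshold; all other scores pass through unchanged."""
    gated = {}
    for shot, score in shot_scores.items():
        probs = shot_class_probs[shot]
        outdoor = any(probs.get(c, 0.0) > thresh for c in outdoor_classes)
        gated[shot] = 0.0 if outdoor else score
    return gated
```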
Step 2.2.3, for each shot, take the maximum similarity between the shot and all query pictures as the similarity score of the shot, obtaining the specific-target retrieval result based on global features.
Proceeding from the whole, similar to step 2.1: first, for the multiple query pictures of a query scene s, the per-picture retrieval results based on global features (taking the inverse of the distance D(I_i, J)) are normalized; then the shot is represented by the highest score among all query pictures; finally the results are re-ranked to obtain the final specific-scene retrieval result based on global features.
In the present invention, every query picture of each scene is matched against the query video library by a feature distance metric; taking the inverse of the distance yields a similarity score between each query picture and every keyframe in the query video library. Within each shot, the highest-scoring keyframe represents the probability that the shot contains the scene being sought. Finally, a max-pooling (taking the maximum over all target scores) is applied to the retrieval results of all query pictures of each scene, representing the retrieval result of that scene based on global features.
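The per-scene max-pooling described above can be sketched as follows (shot scores are assumed to be already normalized per query picture):

```python
def scene_shot_scores(per_picture_scores):
    """Max-pool shot scores over all query pictures of one scene:
    each shot keeps the best score it achieved with any query picture."""
    fused = {}
    for scores in per_picture_scores:  # one score dict per query picture
        for shot, s in scores.items():
            fused[shot] = max(fused.get(shot, 0.0), s)
    return fused
```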
Step 2.3, specific-scene retrieval based on the combined optimization of local and global features. Considering the global and the local simultaneously, the specific-scene retrieval results based on global features and on local features are interleaved, and the shots are re-ranked to obtain the final specific-scene ranking, used as the initial scene retrieval result;
In a specific implementation, the interleaving can follow a preset rule, e.g. alternately re-ranking the top 3000 results of the two query rankings (ranking is by similarity score: the higher the score, the higher the rank), alternating the top 1500 results of the global ranking and of the local ranking in turn, with the global result placed first and the local result second.
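The interleaving rule can be sketched as below; keeping a duplicate shot at its first (earlier) position is an assumption, as the patent does not specify how repeats are handled.

```python
def interleave_rankings(global_rank, local_rank, depth=1500):
    """Alternately merge the top `depth` shots of the global and local
    rankings, global first, skipping shots already placed."""
    merged, seen = [], set()
    for g_shot, l_shot in zip(global_rank[:depth], local_rank[:depth]):
        for shot in (g_shot, l_shot):  # global before local, as in the embodiment
            if shot not in seen:
                seen.add(shot)
                merged.append(shot)
    return merged
```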
Step 3, video instance retrieval based on high-score retention: to counter the noise from the large number of non-query-instance videos in massive video data, the low-ranked results are removed from the specific-person retrieval result of step 1 and from the specific-scene retrieval result of step 2, yielding the denoised person retrieval result and the denoised scene retrieval result. In a specific implementation, a proportion of the lowest-ranked results can be removed; e.g., from the top 3000 interleaved results, the last 1/3 is dropped, i.e. the results ranked 2001st to 3000th.
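A sketch of this proportional truncation; the pool size of 3000 and the drop fraction of 1/3 follow the embodiment's example.

```python
def retain_high_scores(ranked_shots, pool=3000, drop_fraction=1 / 3):
    """Keep only the front portion of the top-`pool` ranked shots,
    discarding the trailing `drop_fraction` (e.g. ranks 2001-3000)."""
    pooled = ranked_shots[:pool]
    keep = round(len(pooled) * (1 - drop_fraction))
    return pooled[:keep]
```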
Step 4, video instance retrieval based on neighbor extension: because a person or scene may be occluded, some shots end up with low similarity scores and are deleted by mistake. The invention therefore further proposes optimizing the specific-person and specific-scene retrieval results by neighbor extension, implemented in the following steps:
Neighbor-extension optimization is applied to low-scoring shots within the same real camera. The method proposes a score adjustment scheme based on a Gaussian model: using neighboring high-scoring shots, the scores of low-scoring shots are raised, adjusting their scores and improving their rank, so that many mistakenly deleted shots are moved back toward the front of the ranking and the ranking becomes more reliable. The implementation comprises the following sub-steps:
Step 4.1, let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbor adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved (471,526 in this embodiment). e(i, n) is defined as follows:
e(i, n) = f(i) · g(n − i) · R(n)    (2)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence, defined as follows:
g(k) = e^(−k²/(2σ²))    (3)
R(k) = u(k + τ) − u(k − 1 − τ)    (4)
where the parameter k ∈ {0, ±1, ±2, …}, τ is the number of shots extended before and after the current shot (its value in the experiments is 8), and σ is the width parameter of the Gaussian window.
u(z) is the unit step sequence, whose value depends on the parameter z as follows:
u(z) = 1 for z ≥ 0, u(z) = 0 for z < 0    (5)
Step 4.2, in theory, after the score adjustment based on the Gaussian model, each shot obtains the scores e(n+τ, n), …, e(n+1, n), e(n, n), …, e(n−τ, n).
The invention selects the highest adjusted score to represent the score of the low-scoring shot after adjustment, with the following formula:
f*(n) = Max[e(n+τ, n), …, e(n+1, n), e(n, n), …, e(n−τ, n)]    (6)
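A sketch of the neighbor adjustment of formulas (2) and (6); here the rectangular window is taken to restrict the neighbor offset to |n − i| ≤ τ, and the Gaussian width σ = 3.0 is an assumed parameter.

```python
import math

def gaussian_neighbor_adjust(scores, tau=8, sigma=3.0):
    """For each shot n, f*(n) = max over neighbors i in [n - tau, n + tau]
    of f(i) * g(n - i), with g a Gaussian window; a high-scoring neighbor
    can therefore raise a low shot score."""
    n_shots = len(scores)
    adjusted = []
    for n in range(n_shots):
        best = 0.0
        for i in range(max(0, n - tau), min(n_shots, n + tau + 1)):
            g = math.exp(-((n - i) ** 2) / (2.0 * sigma ** 2))
            best = max(best, scores[i] * g)
        adjusted.append(best)
    return adjusted
```

Since the i = n term contributes f(n) · g(0) = f(n), an adjusted score is never lower than the original.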
Step 5, fuse the specific-person and specific-scene retrieval results to obtain the final video instance retrieval result jointly covering the specific person and scene.
The implementation is: first fuse the person retrieval result after neighbor extension with the specific-scene retrieval result, then fuse the specific-person retrieval result with the scene retrieval result after neighbor extension, and finally, for each shot, take the maximum of the two fusion results to represent the final fusion result, with the following formula:
where f_p(n), f_s(n) are the initial specific-person and specific-scene retrieval results, and f_p*(n), f_s*(n) are the person and scene retrieval results after neighbor extension. The larger the final F(n), the higher the probability that the query person p appears in the query scene s in that shot. Each shot thus obtains a similarity score and is ranked accordingly: the higher the score, the higher the probability that the query person p appears in the query scene s in the shot, and the higher its rank. The shot ranking is output to the user.
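The cross fusion of step 5 can be sketched as below. Because the image of the fusion formula is not reproduced in the text, the product used here to combine a person score with a scene score is an assumption; only the outer maximum over the two cross combinations is stated in the patent.

```python
def fuse_person_scene(f_p, f_s, f_p_star, f_s_star):
    """Per shot: max of (neighbor-extended person x initial scene) and
    (initial person x neighbor-extended scene)."""
    return {
        shot: max(f_p_star[shot] * f_s[shot], f_p[shot] * f_s_star[shot])
        for shot in f_p
    }
```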
In a specific implementation, the method provided by the invention can be run as an automatic pipeline based on software technology, or the corresponding system can be realized in a modular fashion.
An embodiment of the invention provides a video instance retrieval system jointly targeting a specific person and scene, comprising the following modules:
a person retrieval module, for instance retrieval of a specific person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining a ranking of the specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for instance retrieval of a specific scene in video, including retrieving for a query scene s, comprising the following units:
a local retrieval unit, for performing specific-target retrieval based on local features;
a global retrieval unit, for performing specific-scene retrieval based on global features;
a combined retrieval unit, for performing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, obtaining the final specific-scene ranking as the initial scene retrieval result;
a preliminary optimization module, for performing video instance retrieval based on high-score retention, removing the low-ranked results from the specific-person ranking produced by the person retrieval module and from the specific-scene ranking produced by the scene retrieval module, obtaining the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for performing video instance retrieval based on neighbor extension, including applying neighbor-extension optimization to the results of the preliminary optimization module, obtaining the person retrieval result after neighbor extension and the scene retrieval result after neighbor extension;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the person retrieval result after neighbor extension, then fusing the initial person retrieval result with the scene retrieval result after neighbor extension, and taking the maximum of the two fusion results to obtain the shot ranking of the video instance retrieval.
The specific implementation of each module can be found in the corresponding steps above and is not detailed again here.
To facilitate understanding of the effect of the technical scheme of this embodiment, the Mean Average Precision (MAP), widely used in the field of image retrieval, is adopted as the effect evaluation index; this measure considers precision and recall simultaneously. Its calculation formula is as follows:
AP = (1/R) Σ_{j=1}^{M} P(j) · rel(j), and MAP is the mean of AP over all queries,
where P(j) is the precision of the top-j results, rel(j) equals 1 if the result at rank j is relevant and 0 otherwise, R is the number of relevant results, M is the total number of results in the ranked list, and j ∈ {1, …, M} is an integer. Under the same conditions, a larger MAP value indicates better retrieval performance;
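A sketch of this evaluation, assuming the standard average-precision definition (the patent's formula image is not reproduced in the text):

```python
def average_precision(ranked, relevant):
    """AP = (1/R) * sum over ranks j of P(j) * rel(j)."""
    hits, precision_sum = 0, 0.0
    for j, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / j
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevant_sets):
    """MAP: mean of AP over all queries."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)
```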
In the above process, MAP values were computed both for the initial joint person-and-scene video instance retrieval result and for the result after ranking optimization by high-score retention and neighbor extension; see Table 1. It can be seen from Table 1 that the retrieval performance of the proposed joint person-and-scene video instance retrieval method with high-score retention and neighbor extension is significantly improved.
Table 1. MAP values on the INS dataset

Fusion result                                         MAP
Initial results                                       0.1420
Optimized person results + initial scene results      0.1539
Initial person results + optimized scene results      0.2134
Optimized person results + optimized scene results    0.2241
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and therefore should not be regarded as limiting the scope of patent protection of the invention; under the inspiration of the invention, one of ordinary skill in the art can make substitutions or variations without departing from the scope protected by the claims of the invention, all of which fall within its protection scope. The scope claimed by the invention shall be determined by the appended claims.

Claims (8)

1. A video instance retrieval method jointly targeting a specific person and a specific scene, characterized by comprising the following steps:
step 1, instance retrieval of a specific person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining a ranking of the specific-person retrieval as the initial person retrieval result;
step 2, instance retrieval of a specific scene in video, including retrieving for a query scene s, including the following sub-steps:
step 2.1, performing specific-target retrieval based on local features;
step 2.2, performing specific-scene retrieval based on global features;
step 2.3, performing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, obtaining the final specific-scene ranking as the initial scene retrieval result;
step 3, performing video instance retrieval based on high-score retention, removing the low-ranked results from the specific-person ranking obtained in step 1 and from the specific-scene ranking obtained in step 2, obtaining the denoised person retrieval result and the denoised scene retrieval result;
step 4, performing video instance retrieval based on neighbor extension, including applying neighbor-extension optimization to the results of step 3, obtaining the person retrieval result after neighbor extension and the scene retrieval result after neighbor extension;
step 5, fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the person retrieval result after neighbor extension, then fusing the initial person retrieval result with the scene retrieval result after neighbor extension, and taking the maximum of the two fusion results to obtain the shot ranking of the video instance retrieval.
2. The video instance retrieval method jointly targeting a specific person and scene according to claim 1, characterized in that: the specific-target retrieval based on local features includes extracting the BOW feature of each target region in each of the multiple query pictures corresponding to the query scene s; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, computing the Euclidean distances to all keyframes in each shot respectively and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between the shot and all target regions as the similarity score of the shot, obtaining the specific-target retrieval result based on local features.
3. The video instance retrieval method jointly targeting a specific person and scene according to claim 1, characterized in that: the specific-scene retrieval based on global features includes extracting the CNN feature of each of the multiple query pictures corresponding to the query scene s, and extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, for each query picture, computing the Euclidean distances to all keyframes in each shot respectively and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between the shot and all query pictures as the similarity score of the shot, obtaining the specific-target retrieval result based on global features.
4. The video instance retrieval method jointly targeting a specific person and scene according to claim 1, 2 or 3, characterized in that the optimization based on neighbor extension is implemented as follows:
let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbor adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the score adjustment based on the Gaussian model, each shot obtains the scores e(n+τ, n), …, e(n+1, n), e(n, n), …, e(n−τ, n),
and the highest adjusted score is selected to represent the score of the shot after adjustment.
5. A video instance retrieval system jointly targeting a specific person and a specific scene, characterized by comprising the following modules:
a person retrieval module, for instance retrieval of a specific person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining a ranking of the specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for instance retrieval of a specific scene in video, including retrieving for a query scene s, comprising the following units:
a local retrieval unit, for performing specific-target retrieval based on local features;
a global retrieval unit, for performing specific-scene retrieval based on global features;
a combined retrieval unit, for performing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, obtaining the final specific-scene ranking as the initial scene retrieval result;
a preliminary optimization module, for performing video instance retrieval based on high-score retention, removing the low-ranked results from the specific-person ranking produced by the person retrieval module and from the specific-scene ranking produced by the scene retrieval module, obtaining the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for performing video instance retrieval based on neighbor extension, including applying neighbor-extension optimization to the results of the preliminary optimization module, obtaining the person retrieval result after neighbor extension and the scene retrieval result after neighbor extension;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the person retrieval result after neighbor extension, then fusing the initial person retrieval result with the scene retrieval result after neighbor extension, and taking the maximum of the two fusion results to obtain the shot ranking of the video instance retrieval.
6. The video instance retrieval system jointly targeting a specific person and scene according to claim 5, characterized in that: the specific-target retrieval based on local features includes extracting the BOW feature of each target region in each of the multiple query pictures corresponding to the query scene s; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, computing the Euclidean distances to all keyframes in each shot respectively and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between the shot and all target regions as the similarity score of the shot, obtaining the specific-target retrieval result based on local features.
7. The video instance retrieval system jointly targeting a specific person and scene according to claim 5, characterized in that: the specific-scene retrieval based on global features includes extracting the CNN feature of each of the multiple query pictures corresponding to the query scene s, and extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, for each query picture, computing the Euclidean distances to all keyframes in each shot respectively and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between the shot and all query pictures as the similarity score of the shot, obtaining the specific-target retrieval result based on global features.
8. The video instance retrieval system jointly targeting a specific person and scene according to claim 5, 6 or 7, characterized in that the optimization based on neighbor extension is implemented as follows:
let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbor adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the score adjustment based on the Gaussian model, each shot obtains the scores e(n+τ, n), …, e(n+1, n), e(n, n), …, e(n−τ, n),
and the highest adjusted score is selected to represent the score of the shot after adjustment.
CN201710454025.2A 2017-06-15 2017-06-15 The instance of video search method and system of joint particular persons and scene Active CN107315795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710454025.2A CN107315795B (en) 2017-06-15 2017-06-15 The instance of video search method and system of joint particular persons and scene


Publications (2)

Publication Number Publication Date
CN107315795A true CN107315795A (en) 2017-11-03
CN107315795B CN107315795B (en) 2019-08-02

Family

ID=60184038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710454025.2A Active CN107315795B (en) 2017-06-15 2017-06-15 The instance of video search method and system of joint particular persons and scene

Country Status (1)

Country Link
CN (1) CN107315795B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491794A (en) * 2018-03-22 2018-09-04 腾讯科技(深圳)有限公司 The method and apparatus of face recognition
CN109858308A (en) * 2017-11-30 2019-06-07 株式会社日立制作所 Video frequency searching device, video retrieval method and storage medium
CN111325245A (en) * 2020-02-05 2020-06-23 腾讯科技(深圳)有限公司 Duplicate image recognition method and device, electronic equipment and computer-readable storage medium
CN112699846A (en) * 2021-01-12 2021-04-23 武汉大学 Specific character and specific behavior combined retrieval method and device with identity consistency check function
CN112836600A (en) * 2021-01-19 2021-05-25 新华智云科技有限公司 Method and system for calculating video similarity
JP2022104178A (en) * 2020-12-28 2022-07-08 楽天グループ株式会社 Image frame extraction device, image frame extraction method and program
CN116127133A (en) * 2023-04-17 2023-05-16 成都苏扶软件开发有限公司 File searching method, system, equipment and medium based on artificial intelligence
TWI823018B (en) * 2020-05-06 2023-11-21 英華達股份有限公司 Method of video graph developing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100008540A1 (en) * 2008-04-22 2010-01-14 Siemens Corporate Research, Inc. Method for Object Detection
CN102595103A (en) * 2012-03-07 2012-07-18 深圳市信义科技有限公司 Method based on geographic information system (GIS) map deduction intelligent video
CN103702134A (en) * 2012-09-27 2014-04-02 索尼公司 Image processing device, image processing method and program
CN104200206A (en) * 2014-09-09 2014-12-10 武汉大学 Double-angle sequencing optimization based pedestrian re-identification method



Also Published As

Publication number Publication date
CN107315795B (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN107315795B (en) The instance of video search method and system of joint particular persons and scene
CN111325115B (en) Cross-modal countervailing pedestrian re-identification method and system with triple constraint loss
JP4553650B2 (en) Image group representation method, descriptor derived by representation method, search method, apparatus, computer program, and storage medium
CN109325471B (en) Double-current network pedestrian re-identification method combining apparent characteristics and space-time distribution
US10025854B2 (en) Video searching
CN106204646A (en) Multiple mobile object tracking based on BP neutral net
CN110598543B (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
CN111696128A (en) High-speed multi-target detection tracking and target image optimization method and storage medium
CN103714181B (en) A kind of hierarchical particular persons search method
CN109472191A (en) A kind of pedestrian based on space-time context identifies again and method for tracing
CN109801292A (en) A kind of bituminous highway crack image partition method based on generation confrontation network
CN104200206B (en) Double-angle sequencing optimization based pedestrian re-identification method
CN109684913A (en) A kind of video human face mask method and system based on community discovery cluster
CN108416314B (en) Picture important face detection method
US10262209B2 (en) Method for analyzing video data
CN104281572B (en) A kind of target matching method and its system based on mutual information
CN111709331B (en) Pedestrian re-recognition method based on multi-granularity information interaction model
CN105023025B (en) A kind of opener mark image sorting technique and system
CN106355154A (en) Method for detecting frequent pedestrian passing in surveillance video
Hong et al. End-to-end soccer video scene and event classification with deep transfer learning
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
US20190171899A1 (en) Automatic extraction of attributes of an object within a set of digital images
CN112818790A (en) Pedestrian re-identification method based on attention mechanism and space geometric constraint
CN111159475B (en) Pedestrian re-identification path generation method based on multi-camera video image
Mansourifar et al. One-shot gan generated fake face detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant