CN107315795B - Video instance retrieval method and system combining a specific person and a scene - Google Patents
Video instance retrieval method and system combining a specific person and a scene Download PDF Info
- Publication number
- CN107315795B (application CN201710454025.2A)
- Authority
- CN
- China
- Prior art keywords
- shot
- scene
- retrieval
- search result
- video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a video instance retrieval method and system combining a specific person and a scene. The method comprises: performing instance retrieval of a specific person in video; performing specific-scene retrieval based on the joint optimization of local and global features; realizing video instance retrieval based on high-score retention; realizing video instance retrieval based on neighbour extension; and fusing the person and scene retrieval results: for each shot, the initial scene result is fused with the neighbour-extended person result, the initial person result is fused with the neighbour-extended scene result, and the maximum of the two fused scores gives the final shot ranking of the video instance retrieval. The ranking produced by the invention is more reliable, and the scheme is highly extensible and widely applicable.
Description
Technical field
The invention belongs to the technical field of video retrieval, and relates to a video instance retrieval scheme, in particular to a video instance retrieval method and system combining a specific person and a scene.
Background art
In video analysis and retrieval evaluations, video instance retrieval means the following: given query samples (a set of video clips or images) and a video library, retrieve from the library all video clips (shots) in which the query sample appears, ranked by their similarity to the query. The query sample may consist of several images of a specific target — a particular person, vehicle or object — taken in different scenes, and sometimes a video clip containing the target is also provided. Video instance retrieval combining a specific person and a scene means retrieving, from massive video data, the clips in which a particular person appears in a particular scene. This technique helps police officers exclude irrelevant targets in massive surveillance video and concentrate their observation and analysis on key targets and suspects, greatly improving browsing efficiency of massive surveillance video; it is therefore of great significance for improving the emergency-response and integrated crime-prevention capabilities of public-security departments and for protecting people's lives and property.
The challenges currently facing this technique come mainly from three aspects: first, the volume of video is huge and noisy, so finding the target to be retrieved is far from easy; second, the person being retrieved may wear different clothes, change posture, or appear from different viewing angles; third, scenes suffer from large illumination changes and severe occlusion. Existing methods of this kind generally first retrieve the person and the scene separately, and then fuse the two result lists (late fusion) to obtain the joint person-and-scene result. Person and scene results are usually expressed as scores: the higher the score, the more likely the corresponding shot contains the query sample. Under such fusion, the person score and scene score of the same shot may be added or multiplied. However, even for a correct target shot, the corresponding person score or scene score is not necessarily high.
Chinese patent document CN105678250A, published 2016.06.15, discloses a method and device for face recognition in video. It uses a dynamic recognition method, exploiting the temporal correlation between the frames of a video so that the information of individual frames complements each other, thereby improving face-recognition accuracy. Although it belongs to the field of video retrieval, it only performs person retrieval, not scene retrieval, so its research angle differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN106022313A, published 2016.10.12, discloses a face-recognition method that adapts automatically to the scene. It uses a convolutional neural network model for compensation and, compared with traditional manual operation, is more automatic; it does not involve scene retrieval, so its research angle likewise differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN104794219A, published 2015.07.22, discloses a scene retrieval method based on geographic location information. It indexes scene images by their geographic information and global descriptors, filtering out a large number of irrelevant images and improving the efficiency of visual-vocabulary spatial verification and the accuracy of image matching. It only retrieves scenes, not persons, so its research angle differs from that of a video instance retrieval method combining a specific person and a scene.
Chinese patent document CN104820711A, published 2015.08.05, discloses a method for retrieving human-shaped targets in video under complex scenes. It searches image similarity through continuous online adjustment and online model updating, generating a new round of results after each update, and allows human-computer interaction to update the machine-vision recognition model library until a satisfactory result is obtained. Its results are not generated automatically, so its retrieval performance still has room for improvement; it also differs from our video instance retrieval method combining a specific person and a scene, which obtains the final result by fusing the person and scene retrieval results.
Chinese patent document CN104517104A, published 2015.04.15, discloses a face-recognition method and system for surveillance scenes. By fusing Gabor features and multi-scale RILPQ features at the score level, it reduces the influence of uneven facial illumination, rotation and image blur on face recognition, effectively improving the recognition rate under surveillance scenes. However, it is not applied to scenes other than surveillance, whereas a video instance retrieval method combining a specific person and a scene is applicable to many scenes; this method therefore still has room for optimization.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a video instance retrieval scheme combining a specific person and a scene: the initial retrieval results are refined by the ranking optimizations of high-score retention and neighbour extension, and the refined lists are then fused to obtain the final ranking, thereby raising the accuracy of retrieving a specific person appearing in a specific scene.
The technical solution adopted by the invention is a video instance retrieval method combining a specific person and a scene, comprising the following steps:
Step 1, instance retrieval of a specific person in video: for a query person p, output the similarity score between p and every shot in the query video library, giving the ranking of the specific-person retrieval as the initial person retrieval result.
Step 2, instance retrieval of a specific scene in video: for a query scene s, perform retrieval through the following sub-steps:
Step 2.1, perform specific-target retrieval based on local features;
Step 2.2, perform specific-scene retrieval based on global features;
Step 2.3, realize specific-scene retrieval with joint local and global optimization: re-rank the shots by interleaving the local-feature target retrieval results with the global-feature scene retrieval results, giving the final scene ranking as the initial scene retrieval result.
Step 3, realize video instance retrieval based on high-score retention: remove the low-ranked results from the person ranking of step 1 and the scene ranking of step 2, giving the denoised person retrieval result and the denoised scene retrieval result.
Step 4, realize video instance retrieval based on neighbour extension: optimize the results of step 3 by neighbour extension, giving the neighbour-extended person retrieval result and the neighbour-extended scene retrieval result.
Step 5, fuse the person and scene retrieval results: for each shot, fuse the initial scene result with the neighbour-extended person result, fuse the initial person result with the neighbour-extended scene result, and take the maximum of the two fused scores as the shot's final score, giving the shot ranking of the video instance retrieval.
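The fusion of step 5 can be sketched as follows; this is a minimal illustration in Python/NumPy, where the additive `fuse` is one of the simple choices mentioned in the background (addition or multiplication), and all function and variable names are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def fuse(a, b):
    # Late fusion of two per-shot score vectors; addition is one simple choice.
    return a + b

def final_ranking(person_init, scene_init, person_nn, scene_nn):
    """Step 5: fuse the initial scene scores with the neighbour-extended
    person scores, fuse the initial person scores with the neighbour-extended
    scene scores, then take the element-wise maximum of the two fusions."""
    fused1 = fuse(scene_init, person_nn)
    fused2 = fuse(person_init, scene_nn)
    final = np.maximum(fused1, fused2)
    order = np.argsort(-final)   # shots ranked by descending final score
    return final, order
```

With toy scores for two shots, `final_ranking(np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.7, 0.2]), np.array([0.3, 0.9]))` yields final scores `[1.2, 1.0]`, so shot 0 ranks first.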
Moreover, the specific-target retrieval based on local features proceeds as follows: for a query scene s with several corresponding query pictures, extract the BOW feature of every target region in every query picture, and extract the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot; for each shot, take the maximum similarity over all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
Moreover, the specific-scene retrieval based on global features proceeds as follows: for a query scene s with several corresponding query pictures, extract the CNN feature of every query picture and of every keyframe in all shots of the query video library; according to the CNN features, for each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot; for each shot, take the maximum similarity over all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Moreover, the optimization based on neighbour extension is realized as follows.
Let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbour adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved. e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(·) is a Gaussian sequence and R(·) is a rectangular window sequence.
After the score adjustment based on the Gauss model, each shot n obtains the scores e(n + τ, n), …, e(n + 1, n), e(n, n), …, e(n − τ, n), and the highest adjusted score is selected to represent the shot's adjusted score.
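The Gauss-model adjustment above can be sketched as below. This is a minimal illustration assuming a truncated Gaussian window of half-width τ (the rectangular window R is realised by the loop bounds); the parameter values and the function name are illustrative assumptions, not from the patent:

```python
import numpy as np

def gaussian_neighbour_adjust(f, tau=2, sigma=1.0):
    """Each shot n collects the scores e(i, n) = f(i) * g(n - i) from its
    neighbours i in [n - tau, n + tau] (rectangular window), where g is a
    Gaussian, and keeps the best one. A low-scoring shot flanked by
    high-scoring neighbours is thus pulled upward."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    adjusted = np.empty(N)
    for n in range(N):
        scores = []
        for i in range(max(0, n - tau), min(N, n + tau + 1)):
            g = np.exp(-((n - i) ** 2) / (2.0 * sigma ** 2))
            scores.append(f[i] * g)
        adjusted[n] = max(scores)   # best adjusted score represents the shot
    return adjusted
```

Since the i = n term contributes f(n)·g(0) = f(n), no shot's score can decrease; e.g. with `f = [1.0, 0.0, 1.0]` and `tau = 1`, the middle shot rises from 0 to exp(−0.5) ≈ 0.61.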
The present invention correspondingly provides a video instance retrieval system combining a specific person and a scene, comprising the following modules:
a person retrieval module, for instance retrieval of a specific person in video: for a query person p, it outputs the similarity score between p and every shot in the query video library, giving the ranking of the specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for instance retrieval of a specific scene in video: for a query scene s, it performs retrieval through the following units:
a local retrieval unit, for specific-target retrieval based on local features;
a global retrieval unit, for specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval with joint local and global optimization: it re-ranks the shots by interleaving the local-feature target retrieval results with the global-feature scene retrieval results, giving the final scene ranking as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention: it removes the low-ranked results from the person ranking produced by the person retrieval module and the scene ranking produced by the scene retrieval module, giving the denoised person retrieval result and the denoised scene retrieval result;
a neighbour optimization module, for realizing video instance retrieval based on neighbour extension: it optimizes the results of the preliminary optimization module by neighbour extension, giving the neighbour-extended person retrieval result and the neighbour-extended scene retrieval result;
a fusion optimization module, for fusing the person and scene retrieval results: for each shot, it fuses the initial scene result with the neighbour-extended person result, fuses the initial person result with the neighbour-extended scene result, and takes the maximum of the two fused scores, giving the shot ranking of the video instance retrieval.
Moreover, the specific-target retrieval based on local features proceeds as follows: for a query scene s with several corresponding query pictures, extract the BOW feature of every target region in every query picture, and extract the BOW features of all keyframes in all shots of the query video library; according to the BOW features, for each target region of each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot; for each shot, take the maximum similarity over all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
Moreover, the specific-scene retrieval based on global features proceeds as follows: for a query scene s with several corresponding query pictures, extract the CNN feature of every query picture and of every keyframe in all shots of the query video library; according to the CNN features, for each query picture, compute its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot; for each shot, take the maximum similarity over all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Moreover, the optimization based on neighbour extension is realized as follows.
Let f(n) be the initial score of shot n for any face or scene, and let e(i, n) be the score of shot n after Gaussian neighbour adjustment by shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved. e(i, n) is defined as
e(i, n) = f(i) · g(n − i) · R(n)
where g(·) is a Gaussian sequence and R(·) is a rectangular window sequence.
After the score adjustment based on the Gauss model, each shot n obtains the scores e(n + τ, n), …, e(n + 1, n), e(n, n), …, e(n − τ, n), and the highest adjusted score is selected to represent the shot's adjusted score.
Compared with existing video instance retrieval techniques combining a specific person and a scene, the present invention mainly has the following advantages and beneficial effects:
1) compared with the prior art, the invention removes the low-ranked entries of the initial rankings, so the retained high-ranked results are more reliable;
2) compared with the prior art, the invention adjusts low-scoring shots using their high-scoring neighbours, so that many wrongly removed shots are moved back toward the front of the ranking, making the final video instance retrieval ranking more reliable;
3) the introduced rank-and-fuse strategy improves the performance of joint person-and-scene video instance retrieval, and because the optimization operates at the ranking level, the scheme is highly extensible and widely applicable.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of an embodiment of the present invention.
Fig. 2 is a flowchart of an embodiment of the present invention.
Detailed description of the embodiments
To make the present invention easy for those of ordinary skill in the art to understand and implement, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Referring to Fig. 1, the technical solution adopted by the invention is a video instance retrieval method combining a specific person and a scene. When realized, the retrieval starts from the person and the scene separately: specific-person retrieval results are first obtained by face-recognition technology, and specific-scene retrieval results by retrieval with joint local and global optimization; both result lists are then refined by the ranking optimizations of high-score retention and neighbour extension; finally the optimized person and scene results are fused to obtain the video instance retrieval result combining the specific person and the scene.
The embodiment uses MATLAB R2015b and VS2013 as the simulation platform and is tested on the Instance Search (INS) task dataset of the international video analysis and retrieval evaluation TRECVID. The INS dataset contains 464 hours of the BBC television series "EastEnders", comprising 244 video clips divided into 471,526 shots, each shot containing multiple frames. Many persons and scenes appear in these videos and frames, and owing to factors such as shooting angle and time they change constantly.
Referring to Fig. 2, the flow of the embodiment of the present invention comprises:
Step 1, instance retrieval of a specific person in video: for a specific query person p, person retrieval is realized by face-recognition technology; the similarity score between p and every shot of the query video library is output, giving the ranking of the specific-person retrieval as the initial person retrieval result.
Existing face-recognition techniques may be used in the specific implementation. For example, face detection may be performed with a scale-adaptive deconvolutional regression network based on Faster-RCNN (a deep-learning network model), mainly comprising the two steps of face-candidate generation and face/background classification; face features may then be learned with a deep convolutional neural network for recognition. The network is trained on the pre-established large-scale CASIA-WebFace face database, which contains 80,000 persons with 500–800 face images each. For the specific implementation, reference may be made to the following documents:
Y. Zhu, J. Wang, C. Zhao, H. Guo and H. Lu. Scale-adaptive Deconvolutional Regression Network for Pedestrian Detection, ACCV, 2016.
Haiyun Guo, et al. Multi-View 3D Object Retrieval with Deep Embedding Network, ICIP, 2016.
Those skilled in the art may select a specific face-recognition technique by themselves, which is not detailed further here.
Step 2, instance retrieval of a specific scene in video: for a specific query scene s, scene retrieval is realized from the local and global features of the given scene pictures. Each video in the query video library has multiple shots, and each shot has multiple keyframes; the invention aims to find the shots containing the query scene s, the final result of each shot being represented by the result of one of its keyframes.
In the embodiment, the specific implementation of step 2 comprises the following sub-steps:
Step 2.1, specific-target retrieval based on local features. For a query scene s, several query pictures are provided, and the different rigid objects in each query picture serve as the specific targets to be retrieved. The specific implementation comprises the following sub-steps:
Step 2.1.1, extract the BOW feature (BOW denotes bag of words) of each target region of a query picture. Features are first extracted with the SIFT algorithm; the SIFT features are weighted with the TF-IDF (term frequency-inverse document frequency) strategy, square-rooted (the "root" operation) and normalized. Each SIFT point in the target region is then compared in turn with every visual word of the pre-trained codebook, and the 3 visual words with the smallest Euclidean distance are found; the feature point is represented by these 3 visual words (a soft-assignment process). After all SIFT points have been processed, the histogram distribution of visual words within the target region is computed, giving the target region's BOW feature.
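The soft assignment of step 2.1.1 can be sketched as below; as a simplification, the TF-IDF weighting and root-normalization described in the text are omitted, the codebook is assumed already trained, and the function name is an assumption for illustration:

```python
import numpy as np

def soft_bow(descriptors, codebook, k=3):
    """Soft-assignment bag of words: each local descriptor votes for its
    k nearest visual words (k = 3 in the text); the vote histogram,
    L2-normalized, is the target region's BOW feature."""
    hist = np.zeros(len(codebook))
    for d in descriptors:
        dists = np.linalg.norm(codebook - d, axis=1)  # distance to every visual word
        for w in np.argsort(dists)[:k]:               # the 3 closest visual words
            hist[w] += 1.0
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

In practice the descriptors would come from a SIFT extractor and the codebook from k-means over a training set; here both are plain arrays so the quantization step itself is easy to inspect.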
Step 2.1.2, extract the BOW features of the query video library: the BOW features of all keyframes of all video shots in the query library are extracted here, by the same procedure as for the target regions of the query pictures.
Step 2.1.3, according to the BOW features, compute for each target region of each query picture its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the target region and the shot.
In the embodiment, using the BOW features obtained in the two steps above, the similarity between each target region of the query pictures and every keyframe of every shot is computed; similarity is the reciprocal of Euclidean distance. The minimum Euclidean distance between a target region and all keyframes of a shot represents the distance between that region and the shot, as follows:
D(I_i, J) = MIN{ d(I_i, J_1), d(I_i, J_2), …, d(I_i, J_n) }   (1)
where I_i is a target region of one of the query pictures, J is a shot with n keyframes J_1, J_2, …, J_n, and d(I_i, J_j), j = 1, 2, …, n, is the distance between target region I_i and keyframe J_j, i.e. the distance between the two images. The method uses the minimum distance over all keyframes of a shot (min-pooling) to measure the similarity between an image and a shot, where d(I_i, J_j) is obtained by a query-adaptive distance metric; see Cai-Zhi Zhu, Herve Jegou, Shinichi Satoh. Query-adaptive asymmetrical dissimilarities for visual object retrieval. In ICCV (2013).
Step 2.1.4, for each shot, take the maximum similarity between the shot and all target regions as the shot's similarity score, giving the local-feature specific-target retrieval result.
For the multiple targets in the multiple query pictures of one scene, each target's query result represents a retrieval result for that scene, expressed as a similarity score; the higher the score, the more likely the result contains the wanted scene. The invention applies a max-pooling over the query scores of all targets of all query pictures (taking the maximum of all target scores) to represent the scene's local-feature retrieval result.
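Steps 2.1.3 and 2.1.4 together amount to a minimum over keyframes (formula (1)) followed by a maximum over query target regions. A toy sketch, where the reciprocal-distance similarity follows the text but plain Euclidean distance stands in for the query-adaptive metric, and the function name is an assumption:

```python
import numpy as np

def shot_score_local(region_feats, keyframe_feats):
    """Score one shot against one query scene under the local-feature scheme:
    a region's similarity to the shot is the reciprocal of its minimum
    Euclidean distance to any keyframe (min-pooling, formula (1)); the
    shot's score is the maximum over all query regions (max-pooling)."""
    scores = []
    for r in region_feats:
        dists = np.linalg.norm(keyframe_feats - r, axis=1)
        d_min = dists.min()                 # D(I_i, J) = min_j d(I_i, J_j)
        scores.append(1.0 / d_min if d_min > 0 else np.inf)
    return max(scores)
```

With two keyframes at distances 1.0 and 0.5 from the best-matching regions, the shot score is 1/0.5 = 2.0: the single best region-to-keyframe match decides the shot.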
Step 2.2, specific-scene retrieval based on global features: the scene is retrieved through the several corresponding query pictures of a query scene s, mainly by means of a convolutional neural network model. The specific implementation comprises the following steps:
Step 2.2.1, global feature extraction based on a residual network. The embodiment uses the pre-trained residual network (ResNet) model published by Facebook on Torch for image feature extraction: the several query pictures of scene s and the keyframes of the whole query video library are fed in as input images. Two outputs of the network are used: the output feature of the input picture after the last convolutional layer, of dimension 2048×1, and the probabilities of the input picture belonging to each of the 1000 predefined categories, of dimension 1000×1.
Step 2.2.2, according to the CNN features, compute for each query picture its Euclidean distance to every keyframe of each shot, taking the minimum Euclidean distance as the similarity between the query picture and the shot.
In the embodiment, the step above yields the CNN features of every query picture and of the query video library; each picture is represented here by its 2048×1 feature. The ranking procedure is similar to that of the local-feature specific-target retrieval: after computing distances between a query picture and the frames of a shot, the minimum distance between the query picture and all frames of the shot represents their similarity, i.e. the method of formula (1) may be used.
The invention further proposes using the output probabilities of the input picture belonging to the 1000 predefined categories: a threshold may be preset, and if the probability of some category exceeds the threshold, the shot is judged to contain that category. When the wanted scenes are all indoor, shots judged to contain categories that occur only outdoors, such as cars, may have their score set to 0, which helps improve precision.
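The category-probability gating just described might look like the following sketch; the threshold value, the category indices and the function name are assumptions for illustration:

```python
import numpy as np

def gate_outdoor_shots(shot_scores, class_probs, outdoor_classes, thresh=0.5):
    """If the query scenes are all indoor, zero the score of any shot whose
    representative keyframe is predicted (probability above `thresh`) to
    contain a category that occurs only outdoors, e.g. 'car'.
    `outdoor_classes` holds indices into the 1000-way classifier output."""
    scores = np.asarray(shot_scores, dtype=float).copy()
    for n, probs in enumerate(class_probs):
        if any(probs[c] > thresh for c in outdoor_classes):
            scores[n] = 0.0
    return scores
```

The gate only ever lowers scores, so it trades recall for precision; the threshold would be tuned on held-out queries.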
Step 2.2.3, for each shot, take the maximum similarity between the shot and all query pictures as the shot's similarity score, giving the global-feature retrieval result.
Overall this is similar to step 2.1: first, for the several query pictures of a query scene s, each picture's global target retrieval result (the reciprocal of the distance D(I_i, J)) is normalized; then the best score over all the different query pictures represents the shot, and re-ranking gives the final global-feature specific-scene retrieval result.
In the invention, each query picture of each scene is matched against the query video library by feature distance; the reciprocal of the distance gives the similarity score between the query picture and each keyframe of the library. The highest-scoring keyframe within each shot indicates the probability that the shot contains the wanted scene, and finally a max-pooling over the retrieval results of all query pictures of the scene (taking the maximum of all scores) represents the scene's global-feature retrieval result.
Step 2.3: specific-scene retrieval based on the combined optimization of local and global features. Considering the global and the local cues simultaneously, the specific-scene retrieval results based on global features and on local features are interleaved, and the shots are re-ranked to obtain the final specific-scene ranking result, which serves as the initial scene retrieval result.
In a specific implementation, the interleaving can follow a preset rule. For example, with the 3000 top-ranked results per query (ranked by similarity score, a higher score giving a higher rank), the top 1500 results of the global ranking and the top 1500 results of the local ranking are alternately merged, with the global result placed first and the local result second.
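The interleaving rule above can be sketched as follows; skipping shots already placed is an assumption made here to keep the merged list free of duplicates:

```python
def interleave(global_ranked, local_ranked, k=1500):
    """Alternately merge the top-k global and top-k local shot ids,
    global first, skipping shots that have already been placed."""
    merged, seen = [], set()
    for g, l in zip(global_ranked[:k], local_ranked[:k]):
        for shot in (g, l):  # global result first, local result second
            if shot not in seen:
                seen.add(shot)
                merged.append(shot)
    return merged

print(interleave(["a", "b", "c"], ["b", "d", "e"], k=3))
# ['a', 'b', 'd', 'c', 'e']
```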
Step 3: video instance retrieval based on high-score retention. To remove the noise introduced by the many videos in a massive collection that contain no query instance, the lowest-ranked results are removed from the specific-person retrieval result of step 1 and the specific-scene retrieval result of step 2, yielding the denoised person retrieval result and the denoised scene retrieval result. In a specific implementation, the lowest-ranked results can be removed by proportion; for example, of the 3000 top-ranked results per query, the last 1/3 (i.e. the results ranked 2001st to 3000th) are removed.
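The proportional high-score retention can be sketched as follows, a minimal illustration of dropping the bottom third of a 3000-result ranking:

```python
def high_score_retain(ranked, keep_fraction=2 / 3):
    """Keep only the top fraction of a ranked result list
    (e.g. ranks 1-2000 out of 3000 when keep_fraction is 2/3)."""
    cut = round(len(ranked) * keep_fraction)
    return ranked[:cut]

ranked = list(range(1, 3001))  # ranks 1..3000
kept = high_score_retain(ranked)
print(len(kept), kept[-1])  # 2000 2000
```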
Step 4: video instance retrieval based on neighbor extension. Because a person or scene may be occluded, certain shots receive low similarity scores and are wrongly deleted. The present invention therefore further proposes optimizing the specific-person and specific-scene retrieval results by neighbor extension. The specific implementation steps are as follows.
Neighbor-extension optimization is applied to low-scoring shots within the same real camera sequence. The method proposes a score-adjustment scheme based on a Gaussian model: the scores of low-scoring shots are raised using neighboring high-scoring shots, so that the adjusted low-scoring shots rise in the ranking and many wrongly deleted shots are restored to forward positions, making the ranking results more reliable. The specific implementation includes the following sub-steps:
Step 4.1: let f(n) be the initial score of the shot n corresponding to any face or scene, and let e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved (N = 471,526 in the present embodiment). e(i, n) is defined as follows:
e(i, n) = f(i) g(n - i) R(n) (2)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence, the latter defined as follows:
R(k) = u(k + τ) - u(k - 1 - τ) (4)
where the parameter k ∈ {0, ±1, ±2, ...}, τ is the number of shots extended before and after (τ = 8 in the experiments), and u(z) is the unit step sequence, whose value depends on z as follows:
u(z) = 1 for z ≥ 0, and u(z) = 0 for z < 0 (5)
Step 4.2: in principle, after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n).
The present invention selects the best adjusted score to represent the adjusted score of the low-scoring shot, as follows:
f*(n) = Max[e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n)] (6)
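Steps 4.1 and 4.2 can be sketched as follows; the Gaussian width sigma is an assumed parameter (the text reproduced here does not give it), and the rectangular window is applied as R(n - i) so that it confines the adjustment to the 2τ + 1 shots around shot n:

```python
import math

TAU = 8  # number of shots extended before and after (value used in the experiments)

def gaussian(n, sigma=3.0):
    # Gaussian sequence g(n); sigma is an assumed width parameter.
    return math.exp(-n * n / (2.0 * sigma * sigma))

def unit_step(z):
    # u(z) = 1 for z >= 0, else 0
    return 1 if z >= 0 else 0

def rect(k, tau=TAU):
    # R(k) = u(k + tau) - u(k - 1 - tau): 1 inside [-tau, tau], 0 outside
    return unit_step(k + tau) - unit_step(k - 1 - tau)

def adjust(scores, n, tau=TAU):
    """f*(n): best of e(i, n) = f(i) * g(n - i) * R(n - i) over the
    neighbouring shots i of shot n (clipped at the ends of the list)."""
    N = len(scores)
    candidates = [scores[i] * gaussian(n - i) * rect(n - i)
                  for i in range(max(0, n - tau), min(N, n + tau + 1))]
    return max(candidates)

scores = [0.1, 0.9, 0.2, 0.05]
# The high-scoring neighbour (shot 1, score 0.9) lifts shot 2 above its own 0.2.
print(adjust(scores, 2) > scores[2])  # True
```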
Step 5: fusing the specific-person and specific-scene retrieval results yields the final video instance retrieval result jointly for the particular person and scene.
The implementation is as follows: first, the neighbor-extended person retrieval result is fused with the initial specific-scene retrieval result; then, the initial specific-person retrieval result is fused with the neighbor-extended scene retrieval result; finally, for each shot, the maximum of the two fused scores is taken as the final fusion result F(n), where fp(n) and fs(n) are the initial specific-person and specific-scene retrieval scores, and fp*(n) and fs*(n) are the neighbor-extended person and scene scores. The larger the final F(n), the higher the probability that the query person p appears in the query scene s in that shot. Each shot thus obtains a similarity score and the shots are ranked by it: the higher the score, the higher the rank. The shot ranking results are output to the user.
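The final fusion can be sketched as follows; the patent takes, per shot, the maximum of two fused scores, and the elementwise product used here as the pairwise fusion operator is an assumption for illustration only:

```python
def fuse(fp, fs, fp_star, fs_star):
    """fp, fs: initial person/scene scores per shot id;
    fp_star, fs_star: neighbour-extended scores.
    F(n) = max(combine(fp*(n), fs(n)), combine(fp(n), fs*(n))),
    with an assumed product as the combine operator."""
    return {n: max(fp_star[n] * fs[n], fp[n] * fs_star[n]) for n in fp}

fp = {"s1": 0.2}
fs = {"s1": 0.5}
fp_star = {"s1": 0.6}
fs_star = {"s1": 0.4}
print(fuse(fp, fs, fp_star, fs_star))  # {'s1': 0.3}
```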
In a specific implementation, the method provided by the present invention can be run as an automatic process based on software technology, or the corresponding system can be realized in modular form.
The embodiment of the present invention provides a video instance retrieval system jointly for a particular person and scene, comprising the following modules:
a person retrieval module, for the instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for the instance retrieval of a specific scene in video, including retrieving for a query scene s, and comprising the following units:
a local retrieval unit, for carrying out specific-target retrieval based on local features;
a global retrieval unit, for carrying out specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking produced by the person retrieval module and the specific-scene ranking produced by the scene retrieval module, to obtain the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of the preliminary optimization module, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
For the specific implementation of each module, see the corresponding method steps; the present invention does not elaborate further.
For convenience in assessing the effect of the technical solution of the present embodiment, the mean average precision MAP (Mean Average Precision), widely used in the field of image retrieval, is adopted as the evaluation index. This measure takes both precision and recall into account, and is calculated as follows:
AP = (1/R) Σ_{j=1..M} P(j) · rel(j), and MAP is the mean of AP over all queries,
where R is the number of relevant results, P(j) is the precision of the top j results, rel(j) equals 1 if the result at rank j is relevant and 0 otherwise, M indicates the total number of results in the ranked list, and j ∈ {1, ..., M} is an integer. Under the same conditions, a larger MAP value indicates a better retrieval result;
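The MAP computation can be sketched as follows; this sketch normalizes AP by the number of relevant items found in the list, which assumes all relevant items appear in the ranked results:

```python
def average_precision(relevance):
    """relevance: list of 0/1 flags down a ranked result list.
    AP = sum over ranks j of P(j) * rel(j), divided by the number of
    relevant items found."""
    hits, precision_sum = 0, 0.0
    for j, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / j  # P(j): precision of the top j results
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    # Mean of AP over all queries (one relevance list per query).
    return sum(average_precision(r) for r in runs) / len(runs)

print(average_precision([1, 0, 1]))  # (1/1 + 2/3) / 2 ≈ 0.8333
```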
In the above process, MAP values were computed separately for the initial joint person-and-scene video instance retrieval result and for the joint person-and-scene video instance retrieval result after high-score retention and neighbor-extension ranking optimization; see Table 1. From Table 1 it can be found that the retrieval performance of the joint person-and-scene video instance retrieval method of the present invention, with high-score retention and neighbor extension, improves significantly.
Table 1. MAP values on the INS dataset
Fusion results | MAP |
---|---|
Initial results | 0.1420 |
Optimized person results and initial scene results | 0.1539 |
Initial person results and optimized scene results | 0.2134 |
Optimized person results and optimized scene results | 0.2241 |
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and therefore should not be considered a limitation on the scope of patent protection of the present invention. Those skilled in the art, under the inspiration of the present invention and without departing from the scope protected by the claims of the present invention, may make replacements or variations, which all fall within the protection scope of the present invention. The claimed scope of the present invention shall be determined by the appended claims.
Claims (6)
1. A video instance retrieval method jointly for a particular person and scene, characterized in that it comprises the following steps:
step 1, instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
step 2, instance retrieval of a specific scene in video, including retrieving for a query scene s, comprising the following sub-steps:
step 2.1, carrying out specific-target retrieval based on local features;
step 2.2, carrying out specific-scene retrieval based on global features;
step 2.3, realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
step 3, realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking obtained in step 1 and the specific-scene ranking obtained in step 2, to obtain the denoised person retrieval result and the denoised scene retrieval result;
step 4, realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of step 3, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
the neighbor-extension optimization is implemented as follows:
let f(n) be the initial score of the shot n corresponding to any face or scene, and e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) g(n - i) R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n),
where τ is the number of shots extended before and after;
the best adjusted score is selected to represent the adjusted score of the shot;
step 5, fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
2. The video instance retrieval method jointly for a particular person and scene according to claim 1, characterized in that: the specific-target retrieval based on local features includes, for the multiple query pictures corresponding to a query scene s, extracting the BOW feature of each target region in every query picture; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, computing, for each target region of each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between all target regions and that shot as the similarity score of the shot, to obtain the specific-target retrieval result based on local features.
3. The video instance retrieval method jointly for a particular person and scene according to claim 1, characterized in that: the specific-scene retrieval based on global features includes, for the multiple query pictures corresponding to a query scene s, extracting the CNN feature of every query picture; extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, computing, for each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between all query pictures and that shot as the similarity score of the shot, to obtain the specific-scene retrieval result based on global features.
4. A video instance retrieval system jointly for a particular person and scene, characterized in that it comprises the following modules:
a person retrieval module, for the instance retrieval of a particular person in video, including retrieving for a query person p, outputting the similarity score between the query person p and each shot in the query video library, and obtaining the ranking result of specific-person retrieval as the initial person retrieval result;
a scene retrieval module, for the instance retrieval of a specific scene in video, including retrieving for a query scene s, and comprising the following units:
a local retrieval unit, for carrying out specific-target retrieval based on local features;
a global retrieval unit, for carrying out specific-scene retrieval based on global features;
a combined retrieval unit, for realizing specific-scene retrieval based on the combined optimization of local and global features, including interleaving and re-ranking the shots according to the specific-target retrieval result based on local features and the specific-scene retrieval result based on global features, to obtain the final specific-scene ranking result as the initial scene retrieval result;
a preliminary optimization module, for realizing video instance retrieval based on high-score retention, removing the lowest-ranked results from the specific-person ranking produced by the person retrieval module and the specific-scene ranking produced by the scene retrieval module, to obtain the denoised person retrieval result and the denoised scene retrieval result;
a neighbor optimization module, for realizing video instance retrieval based on neighbor extension, including carrying out neighbor-extension optimization on the results of the preliminary optimization module, to obtain the neighbor-extended person retrieval result and the neighbor-extended scene retrieval result;
the neighbor-extension optimization is implemented as follows:
let f(n) be the initial score of the shot n corresponding to any face or scene, and e(i, n) be the adjusted score of shot n contributed by its Gaussian neighbor shot i, where i, n ∈ [1, N] and N is the total number of shots to be retrieved; e(i, n) is defined as
e(i, n) = f(i) g(n - i) R(n)
where g(n) is a Gaussian sequence and R(n) is a rectangular window sequence;
after the Gaussian-model score adjustment, each shot obtains the scores e(n + τ, n), ..., e(n + 1, n), e(n, n), ..., e(n - τ, n),
where τ is the number of shots extended before and after;
the best adjusted score is selected to represent the adjusted score of the shot;
a fusion optimization module, for fusing the specific-person and specific-scene retrieval results, including, for each shot, fusing the initial scene retrieval result with the neighbor-extended person retrieval result, then fusing the initial person retrieval result with the neighbor-extended scene retrieval result, and taking the maximum of the two fused scores, to obtain the shot ranking result of the video instance retrieval.
5. The video instance retrieval system jointly for a particular person and scene according to claim 4, characterized in that: the specific-target retrieval based on local features includes, for the multiple query pictures corresponding to a query scene s, extracting the BOW feature of each target region in every query picture; extracting the BOW features of all keyframes in all shots of the query video library; according to the BOW features, computing, for each target region of each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the target region and the shot; and, for each shot, taking the maximum similarity between all target regions and that shot as the similarity score of the shot, to obtain the specific-target retrieval result based on local features.
6. The video instance retrieval system jointly for a particular person and scene according to claim 4, characterized in that: the specific-scene retrieval based on global features includes, for the multiple query pictures corresponding to a query scene s, extracting the CNN feature of every query picture; extracting the CNN features of all keyframes in all shots of the query video library; according to the CNN features, computing, for each query picture, the Euclidean distances to all keyframes in each shot, and taking the minimum Euclidean distance as the similarity between the query picture and the shot; and, for each shot, taking the maximum similarity between all query pictures and that shot as the similarity score of the shot, to obtain the specific-scene retrieval result based on global features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710454025.2A CN107315795B (en) | 2017-06-15 | 2017-06-15 | The instance of video search method and system of joint particular persons and scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107315795A CN107315795A (en) | 2017-11-03 |
CN107315795B true CN107315795B (en) | 2019-08-02 |
Family
ID=60184038
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710454025.2A Active CN107315795B (en) | 2017-06-15 | 2017-06-15 | The instance of video search method and system of joint particular persons and scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107315795B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858308B (en) * | 2017-11-30 | 2023-03-24 | 株式会社日立制作所 | Video retrieval device, video retrieval method, and storage medium |
CN108491794B (en) * | 2018-03-22 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Face recognition method and device |
CN111325245B (en) * | 2020-02-05 | 2023-10-17 | 腾讯科技(深圳)有限公司 | Repeated image recognition method, device, electronic equipment and computer readable storage medium |
CN111538858B (en) * | 2020-05-06 | 2023-06-23 | 英华达(上海)科技有限公司 | Method, device, electronic equipment and storage medium for establishing video map |
JP7225194B2 (en) * | 2020-12-28 | 2023-02-20 | 楽天グループ株式会社 | Image frame extraction device, image frame extraction method and program |
CN112699846B (en) * | 2021-01-12 | 2022-06-07 | 武汉大学 | Specific character and specific behavior combined retrieval method and device with identity consistency check function |
CN112836600B (en) * | 2021-01-19 | 2023-12-22 | 新华智云科技有限公司 | Video similarity calculation method and system |
CN116127133B (en) * | 2023-04-17 | 2023-08-08 | 湖南柚子树文化传媒有限公司 | File searching method, system, equipment and medium based on artificial intelligence |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102595103A (en) * | 2012-03-07 | 2012-07-18 | 深圳市信义科技有限公司 | Method based on geographic information system (GIS) map deduction intelligent video |
CN103702134A (en) * | 2012-09-27 | 2014-04-02 | 索尼公司 | Image processing device, image processing method and program |
CN104200206A (en) * | 2014-09-09 | 2014-12-10 | 武汉大学 | Double-angle sequencing optimization based pedestrian re-identification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8315965B2 (en) * | 2008-04-22 | 2012-11-20 | Siemens Corporation | Method for object detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107315795B (en) | The instance of video search method and system of joint particular persons and scene | |
CN109740541B (en) | Pedestrian re-identification system and method | |
JP4553650B2 (en) | Image group representation method, descriptor derived by representation method, search method, apparatus, computer program, and storage medium | |
CN109325471B (en) | Double-current network pedestrian re-identification method combining apparent characteristics and space-time distribution | |
CN112818931A (en) | Multi-scale pedestrian re-identification method based on multi-granularity depth feature fusion | |
CN105808732A (en) | Integration target attribute identification and precise retrieval method based on depth measurement learning | |
CN111696128A (en) | High-speed multi-target detection tracking and target image optimization method and storage medium | |
CN109472191A (en) | A kind of pedestrian based on space-time context identifies again and method for tracing | |
CN111046821B (en) | Video behavior recognition method and system and electronic equipment | |
CN103714181B (en) | A kind of hierarchical particular persons search method | |
CN104200206B (en) | Double-angle sequencing optimization based pedestrian re-identification method | |
GB2493580A (en) | Method of searching for a target within video data | |
CN104281572B (en) | A kind of target matching method and its system based on mutual information | |
CN111709331B (en) | Pedestrian re-recognition method based on multi-granularity information interaction model | |
CN110598543A (en) | Model training method based on attribute mining and reasoning and pedestrian re-identification method | |
CN108764018A (en) | A kind of multitask vehicle based on convolutional neural networks recognition methods and device again | |
CN112818790A (en) | Pedestrian re-identification method based on attention mechanism and space geometric constraint | |
CN109635647B (en) | Multi-picture multi-face clustering method based on constraint condition | |
CN111814690A (en) | Target re-identification method and device and computer readable storage medium | |
CN112668557A (en) | Method for defending image noise attack in pedestrian re-identification system | |
CN111597978B (en) | Method for automatically generating pedestrian re-identification picture based on StarGAN network model | |
CN111159475B (en) | Pedestrian re-identification path generation method based on multi-camera video image | |
Varini et al. | Egocentric video summarization of cultural tour based on user preferences | |
CN109711232A (en) | Deep learning pedestrian recognition methods again based on multiple objective function | |
Khare et al. | Keyframe extraction using binary robust invariant scalable keypoint features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||