CN104090885A - Multi-view video object retrieval system based on local copy detection and method - Google Patents

Multi-view video object retrieval system based on local copy detection and method

Info

Publication number
CN104090885A
CN104090885A (Application CN201310657435.9A)
Authority
CN
China
Prior art keywords
hash
module
feature
local
local feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310657435.9A
Other languages
Chinese (zh)
Inventor
凌贺飞
严灵毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN FEELING VIDEO TECHNOLOGY Co Ltd
Original Assignee
WUHAN FEELING VIDEO TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN FEELING VIDEO TECHNOLOGY Co Ltd filed Critical WUHAN FEELING VIDEO TECHNOLOGY Co Ltd
Priority to CN201310657435.9A priority Critical patent/CN104090885A/en
Publication of CN104090885A publication Critical patent/CN104090885A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content

Abstract

The invention discloses a multi-view video object retrieval system and method based on local copy detection. The method extracts moving objects from multi-view video and searches for and matches them by means of local copy detection: a hashing algorithm is applied to the extracted local features, converting the video into multiple binary hash codes that preserve semantic information, and objects in the video are retrieved quickly by searching and matching these hash codes. Because the method extracts the objects themselves and searches for them across multi-view video through local copy detection, the system and method effectively guarantee the accuracy and completeness of object search while keeping retrieval efficient. The system and method have broad application scenarios, low implementation cost, and considerable prospects.

Description

A multi-view video object retrieval system and method based on local copy detection
Technical field
The invention belongs to the technical field of video surveillance, and specifically relates to a multi-view video object retrieval system based on local copy detection.
Background art
The rapid development of science and technology has made video surveillance ubiquitous in daily life. With the advance of "safe city" programs, surveillance cameras are being installed throughout streets, lanes and buildings, and as camera quality improves and coverage widens, these devices generate massive amounts of video information every day. Finding an important object within such massive data has therefore become extremely difficult.
Although computer processing speed keeps rising, disk capacity keeps growing and bandwidth keeps increasing, the particularity of video still makes processing it time-consuming and labor-intensive. Quickly finding a particular video object or event clue in massive video data is currently an expensive, time-consuming and laborious challenge. It often requires a large team to review and analyze the captured video recordings for investigation and evidence collection. This not only raises monitoring costs; relying on human eyes to watch surveillance video for long periods easily causes fatigue and oversights, which is detrimental to security monitoring and prevents the surveillance system from playing its intended real-time, active supervisory role. For example, in the Zhou Kehua case, the Changsha police searched nearly 4 TB of video data covering Zhou Kehua's four crimes, dispatching more than 200 officers at the time to review it day and night; many officers stared at screens for so long that they suffered headaches and exhaustion. The market therefore urgently needs an automated, intelligent video search method and system to replace or assist the manual monitoring model.
Intelligent video surveillance is a research topic of great interest in the field of computer vision. Its goal is to use video processing and artificial intelligence technology to process, understand and analyze video signals so as to detect and identify events occurring in the monitored scene. If the camera is regarded as the human eye, an intelligent video system or device can be regarded as the human brain; its ultimate purpose is to let the computer analyze, describe and understand the content of video pictures, analyze the massive data in those pictures at high speed, filter out the information the operator does not care about, and provide only the key useful information, enabling video retrieval of suspect objects and events.
Most video object retrieval in current intelligent video surveillance is based on object attributes such as color, shape, speed and direction, and performs systematic retrieval of objects and their attributes. Compared with manually watching surveillance recordings, such retrieval does improve efficiency and shorten viewing time. However, as the scale of surveillance video grows rapidly, this kind of fuzzy, coarse-grained matching means that the returned results often contain a large amount of noise and the correct results may not be found at all; even when they are found, they are usually buried among numerous false matches, forcing the user to spend considerable time determining the correct results. How to realize intelligent search for surveillance video objects or event clues and quickly locate the relevant video segments has therefore become a problem urgently to be solved in video surveillance, and its solution has huge market potential.
Current video object retrieval systems can be divided into two classes: (1) retrieval based on attribute combinations: video objects are extracted, the features of an object (such as color, shape, speed and direction) are treated as its attributes, and queries and searches are performed on combinations of these attributes; (2) video search based on visual phrases: features are extracted from video images, a clustering method converts the features into visual phrases, and indexing techniques related to text retrieval are then used to retrieve the video. Although both classes of products greatly improve retrieval efficiency compared with the manual approach, they still have shortcomings:
(1) Retrieval based on attribute combinations: these methods are coarse-grained and cannot fully describe the structure of an object, so the retrieval results contain a large amount of noise, the correct results may not be found at all, and even when found they are usually buried among numerous false matches, forcing the user to spend considerable time determining the correct results;
(2) Video search based on visual phrases: first, it retrieves video images rather than operating at the level of video objects; second, because the number of cluster centers cannot be set too large, the resulting visual phrases are not precise enough and lack discriminative power, so a large number of noisy query results are returned and query accuracy and efficiency are low.
In view of the above, owing to the particularity of video, current video object search methods struggle to meet practical requirements in precision and timeliness at the same time. An object-level precise search technique for surveillance video is therefore urgently needed, one that describes objects accurately while guaranteeing efficient object search and matching.
Summary of the invention
To address the above defects or improvement needs of the prior art, the present invention provides a multi-view video object retrieval system and method based on local copy detection, which can retrieve moving target objects efficiently and reliably.
A multi-view video object retrieval system based on local features comprises an object extraction module, a local feature extraction module, a feature hashing module, an index construction and search module and a query module.
The object extraction module extracts moving targets from each frame of the offline-learning or online-query video and sends the local region images representing the moving targets to the local feature extraction module.
The local feature extraction module, in the offline learning stage, extracts multiple local features for subsequent learning from the local region images provided by the object extraction module and sends the extracted learning local features to the hashing module; in the online query stage, it extracts multiple local features for subsequent querying from the local region images provided by the object extraction module and sends the extracted query local features to the query module.
The hashing module, in the offline learning stage, applies hash computation to the multiple high-dimensional learning local features from the local feature extraction module to generate a low-dimensional binary hash table and, at the same time, generates a hash function; it sends the binary hash table to the index construction and search module and sends the hash function to the query module.
The index construction and search module, in the offline learning stage, builds an index database from the binary hash table provided by the hashing module; in the online query stage, it receives the query request of the query module, retrieves from the index database, for each query hash code carried in the query request, the local region images that may match it, and feeds them back to the query module as the query result.
The query module, in the online query stage, uses the hash function from the hashing module to convert the multiple query local features from the local feature extraction module into multiple query hash codes and then sends a query request containing the query hash information to the index construction and search module; it receives the query result fed back by the index construction and search module and, for each query hash code, counts the candidate frames that may match it; the more often a local region image recurs, the higher the probability that its corresponding moving target is the recognition result.
Further, the hashing module comprises a hash code computation module and a hash function construction module.
The hash code computation module applies hash computation to the multiple high-dimensional learning local features from the local feature extraction module to generate the low-dimensional binary hash table, as follows:
let the local feature set be X = [x_1, x_2, ..., x_n] and the corresponding set of hash codes be H = [h_1, h_2, ..., h_n], where x_i is the i-th local visual feature, h_i is the hash code corresponding to x_i, i = 1, 2, ..., n, and n is the total number of local features;
solve for the hash code set H and the parameters W and b that minimize
min  Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2 + φ( ||X^T W + 1b - H||_F^2 + γ||W||_F^2 )
where γ and φ are tuning parameters, ||·||_F denotes the Frobenius norm, the superscript T denotes transposition, and S_ij = 1 if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i) and S_ij = 0 otherwise, N_k(x_j) denoting the set of the k features nearest to x_j and N_k(x_i) the set of the k features nearest to x_i, with k ranging from 5 to 25.
The hash function construction module builds the hash function h(x) = W^T x + b, where x denotes the input feature variable.
Further, the local features are SIFT features, SURF features, HoG features or CHoG features.
A multi-view video object search method based on local features comprises an offline learning step and an online query step.
The offline learning step proceeds as follows:
extract moving targets from each frame of the offline-learning video to obtain local region images representing the moving targets;
extract multiple local features for learning from the local region images;
apply hash computation to the multiple high-dimensional learning local features to generate a low-dimensional binary hash table and, at the same time, generate a hash function;
build an index database from the binary hash table.
The online query step proceeds as follows:
extract moving targets from each frame of the online-query video to obtain local region images representing the moving targets to be identified;
extract multiple local features for querying from the local region images of the moving targets to be identified;
convert the multiple query local features into multiple query hash codes using the hash function obtained in the offline learning step;
retrieve from the index database, for each query hash code, the local region images that may match it;
count the candidate frames that may match each query hash code; the more often a local region image recurs, the higher the probability that its corresponding moving target is the recognition result.
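The sketch below only illustrates how the offline learning and online query steps above fit together; the five helper callables (extract_regions, extract_features, learn_hash, build_index, search_index) are hypothetical stand-ins for the modules described in this document and are not defined by the patent.

# Minimal pipeline sketch; the five helper callables are hypothetical stand-ins
# for the object extraction, feature extraction, hashing, indexing and search
# steps described above, injected as arguments so the skeleton stays generic.

def offline_learning(videos, extract_regions, extract_features, learn_hash, build_index):
    features = []
    for video in videos:
        for region in extract_regions(video):            # local region images of moving targets
            features.extend(extract_features(region))    # high-dimensional local features for learning
    hash_table, hash_fn = learn_hash(features)           # low-dimensional binary hash table + hash function
    return build_index(hash_table), hash_fn              # index database used later for matching

def online_query(clip, index_db, hash_fn, extract_regions, extract_features, search_index):
    votes = {}
    for region in extract_regions(clip):
        for feature in extract_features(region):
            code = hash_fn(feature)                               # query hash code
            for candidate in search_index(index_db, code):        # possibly matching region images
                votes[candidate] = votes.get(candidate, 0) + 1
    # candidates that recur across many query codes are the most probable recognition results
    return sorted(votes, key=votes.get, reverse=True)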
Further, in the offline learning step, applying hash computation to the multiple high-dimensional learning local features to generate the low-dimensional binary hash table while generating the hash function proceeds as follows:
let the local feature set be X = [x_1, x_2, ..., x_n] and the corresponding set of hash codes be H = [h_1, h_2, ..., h_n], where x_i is the i-th local visual feature, h_i is the hash code corresponding to x_i, i = 1, 2, ..., n, and n is the total number of local features;
solve for the hash code set H and the parameters W and b that minimize
min  Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2 + φ( ||X^T W + 1b - H||_F^2 + γ||W||_F^2 )
where γ and φ are tuning parameters, ||·||_F denotes the Frobenius norm, the superscript T denotes transposition, and S_ij = 1 if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i) and S_ij = 0 otherwise, N_k(x_j) denoting the set of the k features nearest to x_j and N_k(x_i) the set of the k features nearest to x_i, with k ranging from 5 to 25;
build the hash function h(x) = W^T x + b, where x denotes the input feature variable.
In general, compared with the prior art, the technical scheme conceived above introduces intelligent retrieval into the video surveillance process and frees it from the large amount of manpower that monitoring used to require. The system is highly reliable: by using robust local features, it can effectively find where a query object appears in the videos captured by cameras at different viewing angles and positions, even when the query object is partially occluded or changes size because of the camera distance, thereby saving a great deal of manpower and time for analyzing and reviewing the object's behaviour. Further, by introducing optimization theory and machine learning, the high-dimensional feature data are converted into low-dimensional hash codes that retain a large amount of semantic information: on the one hand, the retained semantic information guarantees the accuracy of search; on the other hand, the low-dimensional hash code structure means that searching and matching only require computing the Hamming distance between binary codes, which takes far less time than computing Euclidean distances between ordinary features, so the object search process is highly efficient. In addition, the system is cheap to implement: the storage space required by the whole search process is much smaller than that occupied by the original video, and an ordinary existing video surveillance server is sufficient for the system, with no additional hardware to configure.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the video object retrieval system of the present invention;
Fig. 2 is the learning flow chart of the video object retrieval system of the present invention;
Fig. 3 is the query flow chart of the video object retrieval system of the present invention;
Fig. 4 is an example flow diagram of the video object retrieval system of the present invention;
Fig. 5 is the system schematic diagram of the video object retrieval system of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention described below can be combined with each other as long as they do not conflict.
As shown in Figure 1, the system of the present invention comprises a client side and a server side. The client side comprises the object extraction module 100, the local feature extraction module 200 and the query module 500, and the server side comprises the feature hashing module 300 and the index construction and search module 400.
When surveillance video is processed, the client passes the video through the object extraction module 100 and the local feature extraction module 200 to obtain high-dimensional local features, which are transferred to the server for feature hashing and output as an index database; this is called the learning process. When a query video clip is queried, the client passes the clip through the object extraction module 100 and the local feature extraction module 200 to obtain high-dimensional local features, converts them into hash codes with the hash function generated by the server-side feature hashing module 300, and then transfers the codes to the server, where they are searched and matched in the index database obtained during learning; this is called the query process.
Each part is described in further detail below.
The object extraction module 100 processes the video and extracts moving targets from a given video.
Whether a moving target exists in each video frame is judged against an estimated video background. If a moving target exists, it is extracted and the local region image representing it is sent to the local feature extraction module 200; if no moving target exists, the frame is skipped and the next frame is processed. Common moving-object detection methods include: (1) background subtraction: the video background is first modeled and a threshold is obtained by training; each frame of the video is then subtracted from the modeled background, and a pixel is judged to belong to a moving target if the absolute value of the difference exceeds the threshold, and to the background otherwise; (2) frame differencing: a threshold is set first, pixel-wise differences are computed between two or three adjacent frames to obtain a difference map, and the difference map is binarized with the preset threshold to obtain the outline of the moving target.
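As a rough illustration of the two detection approaches just listed, the following sketch uses OpenCV: the MOG2 background model stands in for background modeling plus thresholding, and cv2.absdiff for the frame-difference variant; the threshold value and minimum contour area are arbitrary assumptions, not values taken from the patent.

import cv2

def moving_regions_background_subtraction(video_path, min_area=500):
    """Background-subtraction variant: model the background, then threshold the foreground mask."""
    cap = cv2.VideoCapture(video_path)
    bg_model = cv2.createBackgroundSubtractorMOG2()               # learned background model
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg_model.apply(frame)                              # |frame - background| thresholded internally
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            if cv2.contourArea(contour) > min_area:               # skip frames without a plausible moving target
                x, y, w, h = cv2.boundingRect(contour)
                yield frame[y:y + h, x:x + w]                     # local region image of the moving target
    cap.release()

def moving_mask_frame_difference(prev_gray, curr_gray, thresh=25):
    """Frame-difference variant: pixel-wise difference of adjacent frames, then binarization."""
    diff = cv2.absdiff(prev_gray, curr_gray)
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return binary                                                 # binary outline of the moving target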
The local feature extraction module 200 extracts local features from the single-frame local region images of the moving targets. The local copy detection technique adopted by this module focuses mainly on the structural composition of the object: the object is expressed through multiple invariant local feature points with strong expressive power together with their geometric relationships, so the module better captures the user's actual needs, can effectively search out the target object, and guarantees more accurate query performance. The high-dimensional local features corresponding to the surveillance video set are submitted to the server-side feature hashing module 300, while the high-dimensional local features corresponding to the query video are submitted to the query module 500. To match two images that contain the same object, "stable points" must be found in the images: special points that do not disappear under changes of viewpoint or illumination or under noise, such as corner points, edge points, bright points in dark regions and dark points in bright regions. If the same object appears in two images, these stable points appear simultaneously on that object in both images, which makes matching possible. Mature local features currently include SIFT, SURF, HoG and CHoG, together with improved variants such as PCA-SIFT. Their basic pipeline is: (1) build a scale space; (2) extract feature points; (3) localize the feature points precisely; (4) generate a feature descriptor from the information in the neighborhood around each feature point. Such local features are very close to the human visual mechanism: by combining localized features they form an overall impression of the target object and can cope with viewpoint changes, occlusion, distortion and similar problems.
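A minimal sketch of extracting one of the local features named above (SIFT) from a moving-target region image, assuming an OpenCV build in which cv2.SIFT_create is available (opencv-python 4.4 or later):

import cv2
import numpy as np

def extract_sift_features(region_bgr):
    """Return the high-dimensional local descriptors (n x 128 for SIFT) of one region image."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()                                  # scale space + keypoint detection + descriptors
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:                                   # no stable points found in this region
        return np.empty((0, 128), dtype=np.float32)
    return descriptors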
The hashing module 300 hashes the high-dimensional features submitted by the local feature extraction module 200 to obtain binary hash codes whose mutual distances are cheap to compute; these form a hash table from which the index construction module 401 builds the index database. At the same time, the hashing module learns the hash function and supplies it to the query module 500 for hash function updates. Early feature-hashing methods include LSH; more recent data-aware hashing methods based on optimization theory and machine learning include SpH and STH, as well as our own NSLPH and LS_SPH.
The steps of the new data-aware hashing method based on optimization theory and machine learning proposed by the present invention are as follows:
(1) Define the weight matrix S between high-dimensional features. Let the set of high-dimensional local feature vectors be X = [x_1, x_2, ..., x_n] and the corresponding hash codes be H = [h_1, h_2, ..., h_n], where x_i is the i-th local visual feature, h_i is the hash code corresponding to x_i, i = 1, 2, ..., n, and n is the total number of local features. The k nearest neighbours of a feature are the k features closest to it (distance is usually the Euclidean distance, and k is usually 5-25). In the present invention, if feature x_i is a k-nearest neighbour of feature x_j (i.e. x_i ∈ N_k(x_j)), the element S_ij in row i and column j of the weight matrix S (i.e. the weight) is set to 1, otherwise to 0. This ensures that similar high-dimensional data are projected to very similar (ideally identical) low-dimensional data, while different high-dimensional data are projected to distinguishable low-dimensional data; it also simplifies the definition of the weights and thus reduces computation. The elements S_ij of the weight matrix S are defined by formula (1):
S_ij = 1, if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i);  S_ij = 0, otherwise    (1)
(2) Define the semantic loss function. The semantic loss function is the sum, over all pairs, of the squared low-dimensional distance ||h_i - h_j||^2 weighted by S_ij, as in formula (2):
Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2    (2)
To meet the needs of large-scale data, we wish to learn a hash function h(x) while computing the hash codes, as in formula (3), where W and b are the parameters of the hash function and are solved for in the subsequent steps:
h(x) = W^T x + b    (3)
To minimize the hashing error while minimizing the semantic loss, the hashing error ||X^T W + 1b - H||_F^2 is added to the semantic loss function of formula (2); for convenience of solution, a regularization term is also added, giving the final objective of formula (4). The hash codes that minimize this objective are the optimal result:
min  Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2 + φ( ||X^T W + 1b - H||_F^2 + γ||W||_F^2 )    (4)
γ and φ are tuning parameters; they are empirical values adjusted according to experimental results and generally lie in [0, 1]. ||·||_F denotes the Frobenius norm and the superscript T denotes transposition.
(3) Solve the semantic loss function. The present invention solves it by setting the partial derivative with respect to each parameter to zero in turn.
Setting the partial derivative of formula (4) with respect to b to 0 gives
b = (1/n)(1^T H - 1^T X^T W)    (5)
Setting the partial derivative of formula (4) with respect to W to 0 gives
X(X^T W + 1b - H) + γW = 0    (6)
Substituting formula (5) gives
W = (X L_c X^T + γI)^{-1} X L_c H    (7)
where L_c = I - (1/n) 1 1^T is the centering matrix, I is the identity matrix, and 1 is a vector whose elements are all 1.
Substituting formulas (5) and (7) into formula (4) and applying trace optimization theory finally converts formula (4) into
min_{H^T H = I}  tr( H^T (L + φB) H ),  with  L = N - S  and  B = L_c - L_c X^T (X L_c X^T + γI)^{-1} X L_c
where N is an n×n diagonal matrix whose diagonal elements are N_ii = Σ_{j=1}^{n} S_ij, all other elements being 0.
The eigenvectors of the matrix (L + φB) corresponding to its m smallest eigenvalues form the optimal hash codes H, where m is the code length of the hash codes. Once H is obtained, W and b are computed from formulas (7) and (5) respectively, which yields the hash function of formula (3).
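The following numpy sketch walks through steps (1)-(3) above under a few assumptions not fixed by the patent: the features are stored as the columns of a d×n matrix X, the real-valued eigenvector solution is binarized with a sign function (the text only states that the final codes are binary), and the values of m, k, γ and φ are arbitrary placeholders.

import numpy as np

def learn_hash(X, m=32, k=10, gamma=0.5, phi=0.5):
    """X: d x n matrix whose columns x_i are the high-dimensional local features.
    m: hash code length.  Returns binary codes H (n x m) and (W, b) for h(x) = W^T x + b."""
    d, n = X.shape

    # (1) symmetric k-NN weight matrix S: S_ij = 1 when x_i and x_j are k-nearest neighbours
    dists = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)   # n x n Euclidean distances
    S = np.zeros((n, n))
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]              # k nearest neighbours of each feature
    for i in range(n):
        S[i, neighbours[i]] = 1
    S = np.maximum(S, S.T)                                          # x_i in N_k(x_j) or x_j in N_k(x_i)

    # (2) graph Laplacian L = N - S and centering matrix L_c = I - (1/n) 1 1^T
    N = np.diag(S.sum(axis=1))
    L = N - S
    Lc = np.eye(n) - np.ones((n, n)) / n

    # (3) B = L_c - L_c X^T (X L_c X^T + gamma I)^(-1) X L_c  (W and b eliminated)
    M = np.linalg.inv(X @ Lc @ X.T + gamma * np.eye(d))
    B = Lc - Lc @ X.T @ M @ X @ Lc

    # H = eigenvectors of (L + phi B) for the m smallest eigenvalues (eigh returns ascending order)
    eigvals, eigvecs = np.linalg.eigh(L + phi * B)
    H = eigvecs[:, :m]                                              # n x m real-valued solution

    # recover the linear hash function: W from formula (7), b from formula (5)
    W = M @ X @ Lc @ H                                              # d x m
    b = (np.ones((1, n)) @ H - np.ones((1, n)) @ X.T @ W) / n       # 1 x m

    return np.sign(H), W, b                                         # sign binarization is an assumption

def hash_code(x, W, b):
    """Binary hash code of a single feature x via formula (3) followed by sign binarization."""
    return (np.sign(W.T @ x + b.ravel()) > 0).astype(np.uint8)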
The index construction and search module 400 comprises the index construction module 401 and the index search module 402.
The index construction module 401 builds an index database from the hash table generated by the hashing module 300. Each index entry comprises two parts: a hash code and the video information corresponding to that hash code. The index entries are then sorted in lexicographic order, entries with identical hash codes are merged, and the inverted index list of the videos, i.e. the index database, is generated.
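A minimal sketch of the inverted-index construction performed by module 401, assuming each binary hash code is packed into a bytes key and that the attached video information is a (camera_id, frame_timestamp) tuple; both representational choices are illustrative and not specified by the patent.

import numpy as np
from collections import defaultdict

def build_inverted_index(codes, video_infos):
    """codes: iterable of binary hash codes (uint8 arrays of 0/1);
    video_infos: the video information attached to each code, e.g. (camera_id, frame_timestamp)."""
    index = defaultdict(list)
    for code, info in zip(codes, video_infos):
        key = np.packbits(code).tobytes()          # hashable, sortable key for the binary code
        index[key].append(info)                    # entries with identical codes are merged here
    return dict(sorted(index.items()))             # sorted keys reproduce the sorted-then-merged inverted list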
The index search module 402 receives the search requests sent by the query module 500, searches and matches them in the index database obtained during learning, and returns the results to the query module 500. According to a similarity formula, the similarity between the query hash codes and the hash codes in the index database is computed, and the several videos with the highest similarity are finally found.
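A minimal sketch of the matching step in module 402, using the Hamming distance between the packed binary codes as the similarity measure; the patent speaks only of a similarity formula, so equating it with Hamming distance (and the 2-bit search radius) is an assumption consistent with the advantages described earlier.

import numpy as np

def hamming_distance(packed_a, packed_b):
    """Number of differing bits between two packed binary hash codes."""
    a = np.frombuffer(packed_a, dtype=np.uint8)
    b = np.frombuffer(packed_b, dtype=np.uint8)
    return int(np.unpackbits(a ^ b).sum())

def search_index(index, query_code, radius=2):
    """Return the postings of every stored code within `radius` bits of the query code."""
    query_key = np.packbits(query_code).tobytes()
    hits = []
    for key, postings in index.items():
        if hamming_distance(query_key, key) <= radius:
            hits.extend(postings)
    return hits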
The query module 500 comprises the hash function update module 501 and the matching result output module 502; it updates the client-side hash function, hashes the high-dimensional local features of the query object, sends the query request to the server, and outputs the query result returned by the server.
The hash function update module 501 requests an update from the server when the user needs one and records the updated hash function it receives. It also uses the recorded hash function to hash the high-dimensional local features from the local feature extraction module 200, obtaining the corresponding hash codes, which are sent to the server-side index search module 402 for searching.
The matching result output module 502 outputs the search result of the query object according to the result of the index search module 402: for each query hash code it counts the candidate frames that may match it, and the more often a local region image recurs, the higher the probability that its corresponding moving target is the recognition result. It outputs the recognition result, the camera information of the relevant videos, and the position of the query object in each video (i.e. the start time and end time).
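A minimal sketch of the vote counting performed by module 502: each query hash code contributes the candidates retrieved for it, and candidates that recur across many query codes are ranked first; the Counter-based tally and the candidate representation are illustrative choices, not taken from the patent.

from collections import Counter

def rank_candidates(per_code_results):
    """per_code_results: one candidate list per query hash code, where each candidate
    identifies a local region image, e.g. a (camera_id, frame_timestamp) tuple.
    Candidates are returned ordered by how often they recur across the query codes;
    the most frequent candidate is the most probable recognition result."""
    votes = Counter()
    for candidates in per_code_results:
        votes.update(set(candidates))              # each query code votes at most once per candidate
    return [candidate for candidate, _ in votes.most_common()]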
As shown in Figure 2, the offline learning process of the system of the present invention is as follows: after each client records surveillance video, it uses the object extraction module 100 to extract moving targets and obtain the local region video frames of the moving targets; the local feature extraction module 200 then extracts features from these local region frames, yielding high-dimensional feature vectors; the clients transfer the high-dimensional feature vectors to the server, where the feature hashing module 300 learns them to generate the hash table and, at the same time, generates the hash function, which is transmitted to each client; from the generated hash table, the index construction module 401 builds an inverted index and generates the index database used for searching and matching in the query process.
As shown in Figure 3, the online query process of the system of the present invention is as follows: after a query video clip is chosen on a client, the object extraction module 100 extracts the moving targets and obtains their local region video frames; the local feature extraction module 200 then extracts features from these frames, yielding high-dimensional feature vectors; the hash function update module 501 of the query module 500 first converts this group of high-dimensional local features into hash codes using the hash function transmitted by the server-side hashing module, and the hash codes are then transferred to the server for querying; on receiving the query request, the server computes, in the index search module 402, the similarity between the query hash codes and the hash codes in the index database, and returns the video information corresponding to the several hash codes with the highest similarity to the client, where it is output by the matching result output module 502.
Fig. 4 gives an example of the learning process and the query process.
Fig. 5 is the system schematic diagram of the present invention. After multiple user sides each process their own surveillance video, the feature vectors are uploaded to the intelligent video surveillance server; after learning, the index database and the hash function are obtained, and the hash function is returned to the clients for hashing query videos. The hash codes obtained after a client processes its query video are transferred to the server for querying, and the matching video information in the index database is returned to the client.
To illustrate further, a simplified example of an implementation of the present invention is given below.
A web site is set up on the server; user login is authenticated by user name and password, and sensitive data on the site are kept confidential with encryption. Users register and log in on the web site of this object retrieval system. After object extraction and feature extraction are performed on the surveillance video and the feature vectors are uploaded to the server, the learning process runs automatically and generates the index database. When a user needs to search for an object in the video of some period, the query object can be specified by selecting a video clip or a video frame; the corresponding object is then searched, and the server returns the information of the videos that contain the queried object.
The multi-view video object retrieval system based on local copy detection adopts a hybrid architecture combining C/S (client/server) and B/S (browser/server): when a user needs to perform large-scale video object search, the C/S architecture is adopted because the algorithms demand high computing performance; when the user performs only a small amount of video object search, the B/S architecture is adopted. The system is structured as follows: the object extraction module 100, the local feature extraction module 200 and the query module 500 are placed on each client, while the feature hashing module 300 and the index construction and search module 400 reside on the same server.
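As a rough illustration of the client/server split described above, the sketch below sends packed query hash codes from a client to a hypothetical /query endpoint on the surveillance server; the server address, payload format and response fields are all assumptions made for illustration and are not part of the patent.

import base64
import requests

SERVER = "http://surveillance-server.example:8080"    # hypothetical server address

def query_server(query_codes):
    """Send the client-side query hash codes to the server-side index search module."""
    payload = {"codes": [base64.b64encode(code.tobytes()).decode("ascii") for code in query_codes]}
    resp = requests.post(f"{SERVER}/query", json=payload, timeout=10)
    resp.raise_for_status()
    # assumed response format: a ranked list of {camera_id, start_time, end_time} records
    return resp.json()["matches"]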
Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall all be included within the scope of protection of the present invention.

Claims (6)

1. A multi-view video object retrieval system based on local features, comprising an object extraction module (100), a local feature extraction module (200), a feature hashing module (300), an index construction and search module (400) and a query module (500), wherein:
the object extraction module (100) extracts moving targets from each frame of the offline-learning or online-query video and sends the local region images representing the moving targets to the local feature extraction module (200);
the local feature extraction module (200), in the offline learning stage, extracts multiple local features for subsequent learning from the local region images provided by the object extraction module (100) and sends the extracted learning local features to the hashing module (300); and, in the online query stage, extracts multiple local features for subsequent querying from the local region images provided by the object extraction module (100) and sends the extracted query local features to the query module (500);
the hashing module (300), in the offline learning stage, applies hash computation to the multiple high-dimensional learning local features from the local feature extraction module (200) to generate a low-dimensional binary hash table and, at the same time, generates a hash function, sends the binary hash table to the index construction and search module (400) and sends the hash function to the query module (500);
the index construction and search module (400), in the offline learning stage, builds an index database from the binary hash table provided by the hashing module (300); and, in the online query stage, receives the query request of the query module (500), retrieves from the index database, for each query hash code carried in the query request, the local region images that may match it, and feeds them back to the query module (500) as the query result;
the query module (500), in the online query stage, uses the hash function from the hashing module (300) to convert the multiple query local features from the local feature extraction module (200) into multiple query hash codes, then sends a query request containing the query hash information to the index construction and search module (400); and receives the query result fed back by the index construction and search module (400) and, for each query hash code, counts the candidate frames that may match it, wherein the more often a local region image recurs, the higher the probability that its corresponding moving target is the recognition result.
2. The multi-view video object retrieval system based on local features according to claim 1, characterized in that the hashing module (300) comprises a hash code computation module and a hash function construction module,
the hash code computation module applies hash computation to the multiple high-dimensional learning local features from the local feature extraction module (200) to generate the low-dimensional binary hash table, as follows:
let the local feature set be X = [x_1, x_2, ..., x_n] and the corresponding set of hash codes be H = [h_1, h_2, ..., h_n], where x_i is the i-th local visual feature, h_i is the hash code corresponding to x_i, i = 1, 2, ..., n, and n is the total number of local features;
solve for the hash code set H and the parameters W and b that minimize
min  Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2 + φ( ||X^T W + 1b - H||_F^2 + γ||W||_F^2 )
where γ and φ are tuning parameters, ||·||_F denotes the Frobenius norm, the superscript T denotes transposition, and S_ij = 1 if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i) and S_ij = 0 otherwise, N_k(x_j) denoting the set of the k features nearest to x_j and N_k(x_i) the set of the k features nearest to x_i, with k ranging from 5 to 25;
the hash function construction module builds the hash function h(x) = W^T x + b, where x denotes the input feature variable.
3. The multi-view video object retrieval system based on local features according to claim 1 or 2, characterized in that the local features are SIFT features, SURF features, HoG features or CHoG features.
4. A multi-view video object search method based on local features, comprising an offline learning step and an online query step, wherein
the offline learning step comprises:
extracting moving targets from each frame of the offline-learning video to obtain local region images representing the moving targets;
extracting multiple local features for learning from the local region images;
applying hash computation to the multiple high-dimensional learning local features to generate a low-dimensional binary hash table and, at the same time, generating a hash function;
building an index database from the binary hash table;
and the online query step comprises:
extracting moving targets from each frame of the online-query video to obtain local region images representing the moving targets to be identified;
extracting multiple local features for querying from the local region images of the moving targets to be identified;
converting the multiple query local features into multiple query hash codes using the hash function obtained in the offline learning step;
retrieving from the index database, for each query hash code, the local region images that may match it;
counting the candidate frames that may match each query hash code, wherein the more often a local region image recurs, the higher the probability that its corresponding moving target is the recognition result.
5. The multi-view video object search method based on local features according to claim 4, characterized in that, in the offline learning step, applying hash computation to the multiple high-dimensional learning local features to generate the low-dimensional binary hash table while generating the hash function comprises:
letting the local feature set be X = [x_1, x_2, ..., x_n] and the corresponding set of hash codes be H = [h_1, h_2, ..., h_n], where x_i is the i-th local visual feature, h_i is the hash code corresponding to x_i, i = 1, 2, ..., n, and n is the total number of local features;
solving for the hash code set H and the parameters W and b that minimize
min  Σ_{i=1}^{n} Σ_{j=1}^{n} S_ij ||h_i - h_j||^2 + φ( ||X^T W + 1b - H||_F^2 + γ||W||_F^2 )
where γ and φ are tuning parameters, ||·||_F denotes the Frobenius norm, the superscript T denotes transposition, and S_ij = 1 if x_i ∈ N_k(x_j) or x_j ∈ N_k(x_i) and S_ij = 0 otherwise, N_k(x_j) denoting the set of the k features nearest to x_j and N_k(x_i) the set of the k features nearest to x_i, with k ranging from 5 to 25;
building the hash function h(x) = W^T x + b, where x denotes the input feature variable.
6. The multi-view video object search method based on local features according to claim 4 or 5, characterized in that the local features are SIFT features, SURF features, HoG features or CHoG features.
CN201310657435.9A 2013-12-09 2013-12-09 Multi-view video object retrieval system based on local copy detection and method Pending CN104090885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310657435.9A CN104090885A (en) 2013-12-09 2013-12-09 Multi-view video object retrieval system based on local copy detection and method

Publications (1)

Publication Number Publication Date
CN104090885A true CN104090885A (en) 2014-10-08

Family

ID=51638601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310657435.9A Pending CN104090885A (en) 2013-12-09 2013-12-09 Multi-view video object retrieval system based on local copy detection and method

Country Status (1)

Country Link
CN (1) CN104090885A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691126B1 (en) * 2000-06-14 2004-02-10 International Business Machines Corporation Method and apparatus for locating multi-region objects in an image or video database
CN102880854A (en) * 2012-08-16 2013-01-16 北京理工大学 Distributed processing and Hash mapping-based outdoor massive object identification method and system
CN103353875A (en) * 2013-06-09 2013-10-16 华中科技大学 Method and system for media interaction based on visible search
CN103336957A (en) * 2013-07-18 2013-10-02 中国科学院自动化研究所 Network coderivative video detection method based on spatial-temporal characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘大伟 et al.: "A fast detection algorithm for duplicate videos" (一种重复视频的快速检测算法), Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017032245A1 (en) * 2015-08-27 2017-03-02 阿里巴巴集团控股有限公司 Method and device for generating video file index information
CN105893947A (en) * 2016-03-29 2016-08-24 江南大学 Bi-visual-angle face identification method based on multi-local correlation characteristic learning
CN105893947B (en) * 2016-03-29 2019-12-03 江南大学 The two visual angle face identification methods based on more local correlation feature learnings
CN106156284A (en) * 2016-06-24 2016-11-23 合肥工业大学 Video retrieval method is closely repeated based on random the extensive of various visual angles Hash
CN108121806A (en) * 2017-12-26 2018-06-05 湖北工业大学 One kind is based on the matched image search method of local feature and system
CN112528048A (en) * 2021-02-18 2021-03-19 腾讯科技(深圳)有限公司 Cross-modal retrieval method, device, equipment and medium
CN112528048B (en) * 2021-02-18 2021-05-14 腾讯科技(深圳)有限公司 Cross-modal retrieval method, device, equipment and medium

Similar Documents

Publication Publication Date Title
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN111324774B (en) Video duplicate removal method and device
CN108229539A (en) For training the method for neural network, computer program product and device
US10216778B2 (en) Indexing and searching heterogenous data entities
Pasandi et al. Convince: Collaborative cross-camera video analytics at the edge
CN112651262B (en) Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment
CN104090885A (en) Multi-view video object retrieval system based on local copy detection and method
CN114329109B (en) Multimodal retrieval method and system based on weakly supervised Hash learning
CN113313170A (en) Full-time global training big data platform based on artificial intelligence
CN103649955A (en) Image topological coding for visual search
Abdul-Rashid et al. Shrec’18 track: 2d image-based 3d scene retrieval
Chen et al. A hybrid mobile visual search system with compact global signatures
Yang et al. A multimedia semantic retrieval mobile system based on HCFGs
Jain et al. Channel graph regularized correlation filters for visual object tracking
CN115686868A (en) Cross-node-oriented multi-mode retrieval method based on federated hash learning
Liao et al. A scalable approach for content based image retrieval in cloud datacenter
Zhang et al. An edge based federated learning framework for person re-identification in UAV delivery service
Zhang et al. Dataset-driven unsupervised object discovery for region-based instance image retrieval
Ou et al. Communication-efficient multi-view keyframe extraction in distributed video sensors
CN105117735A (en) Image detection method in big data environment
WO2012077818A1 (en) Method for determining conversion matrix for hash function, hash-type approximation nearest neighbour search method using said hash function, and device and computer program therefor
Jayarajah et al. Comai: Enabling lightweight, collaborative intelligence by retrofitting vision dnns
Zhang et al. A scalable approach for content-based image retrieval in peer-to-peer networks
Gao et al. Data-driven lightweight interest point selection for large-scale visual search
Cheng et al. Sparse representations based distributed attribute learning for person re-identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141008