CN107291910A - Structured video segment query method, apparatus, and electronic device

Info

Publication number: CN107291910A
Application number: CN201710495744.9A
Authority: CN
Inventors: 朱才志; 周晓
Original assignee: Graphic Information Technology (shenzhen) Co Ltd
Current assignee: Graphic Information Technology (shenzhen) Co Ltd
Priority date: 2017-06-26
Filing date: 2017-06-26
Publication date: 2017-10-24

Abstract

Embodiments of the invention provide a structured video segment query method, apparatus, and electronic device. The method includes: obtaining input data containing a target to be queried; extracting structured features of the target to be queried from the input data, where the structured features include the category of the target, the sub-attributes within the category of the target, and the category identity feature of the target; and, according to the structured features of the target to be queried, querying the target feature database corresponding to a video database to determine the video segments in the video database that correspond to those structured features, where the target feature database is composed of structured features extracted from the video data in the video database. Applying the embodiments of the invention improves the efficiency of searching for the video segments corresponding to a target.

Description

Structured video segment query method, apparatus, and electronic device

Technical field

The present invention relates to the technical field of video image processing, and in particular to a structured video segment query method, apparatus, and electronic device.

Background art

A security system (surveillance system) is an intrusion alarm system, a video security monitoring system, an access control system, an explosion-proof security inspection system, or a similar system built from security products and other related products, or an electronic system or network that combines or integrates such systems as subsystems.

A video security monitoring system stores massive amounts of surveillance video, and it is often necessary to search that video for the video segments containing a target such as a pedestrian or a vehicle, that is, the video segments corresponding to a target to be queried. An existing query method first detects the features of the target to be queried in an image containing that target. Then, for each video frame of each target video in the video database (that is, the database made up of the massive surveillance video), it selects regions to be matched that may contain the target, extracts the features of those regions, and matches the extracted features against the features of the target to be queried by similarity. Regions to be matched whose similarity exceeds a preset threshold are retained, and the video frames containing them are obtained. All video frames obtained in this way are taken as the video frames containing the target to be queried, from which the video segments containing the target are obtained.

However, with the existing method, every time a target to be queried is detected, region selection, feature extraction, and feature similarity matching must be performed on each video frame of the target videos in the video database in order to obtain all video frames containing the target and thus the corresponding video segments. The existing search for the video segments corresponding to a target is therefore cumbersome and time-consuming, which makes search efficiency low.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a structured video segment query method, apparatus, and electronic device, so as to improve the efficiency of searching for the video segments corresponding to a target. The specific technical solutions are as follows.

To achieve the above purpose, an embodiment of the present invention discloses a structured video segment query method, the method including:

obtaining input data containing a target to be queried, where the input data includes an image and/or text;

extracting structured features of the target to be queried from the input data, where the structured features include the category of the target, the sub-attributes within the category of the target, and the category identity feature of the target;

according to the structured features of the target to be queried, querying the target feature database corresponding to a video database, and determining the video segments in the video database that correspond to the structured features of the target to be queried, where the target feature database is composed of structured features extracted from the video data in the video database.

Optionally, before querying the target feature database corresponding to the video database according to the structured features of the target to be queried, the method further includes:

decoding the videos in the video database, and determining the targets contained in each sampled video frame of each video in the video database;

for each target determined, extracting the structured features of the target, where the structured features are determined based on a deep learning algorithm;

establishing the target feature database according to the extracted structured features of each target.

Optionally, querying the target feature database corresponding to the video database according to the structured features of the target to be queried, and determining the video segments in the video database that correspond to the structured features of the target to be queried, includes:

after the targets contained in all video frames of the videos in the video database have been determined, determining the video frames containing each target, and determining the video segment corresponding to each target based on the determined video frames containing that target.

Optionally, extracting the structured features of the target to be queried from the input data includes:

when the input data is text containing the category of the target to be queried and its sub-attributes, taking the category and sub-attributes of the target to be queried contained in the text as the structured features of the target to be queried.

Optionally, extracting the structured features of the target to be queried from the input data includes:

when the input data is an image, detecting the target to be queried contained in the image;

extracting the structured features of the target to be queried.

In this case, querying the target feature database corresponding to the video database according to the structured features of the target to be queried, and determining the video segments in the video database to be queried that correspond to the structured features of the target to be queried, includes:

screening the target feature database corresponding to the video database using the category of the target and the sub-attributes within that category contained in the structured features of the target to be queried, and determining the targets whose category and sub-attributes are all identical to those of the target to be queried;

searching the determined targets using the category identity feature contained in the structured features of the target to be queried, and obtaining the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold;

determining, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.
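The screening step followed by the similarity search can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the claimed implementation: the `TargetRecord` layout, the use of cosine similarity as the similarity measure, the default threshold of 0.8, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class TargetRecord:
    """One entry of a hypothetical target feature database."""
    target_id: str
    category: str
    sub_attributes: dict
    identity_feature: tuple  # numeric vector (assumed representation)
    segment: tuple           # (video file, start second, end second)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_segments(database, category, sub_attributes, identity_feature, threshold=0.8):
    # Stage 1: keep only targets whose category and sub-attributes are all identical.
    screened = [r for r in database
                if r.category == category and r.sub_attributes == sub_attributes]
    # Stage 2: among those, keep targets whose identity similarity reaches the threshold.
    matched = [r for r in screened
               if cosine_similarity(r.identity_feature, identity_feature) >= threshold]
    # Stage 3: return the video segments of the matched targets.
    return [r.segment for r in matched]


database = [
    TargetRecord("t1", "pedestrian", {"clothing": "red"}, (1.0, 0.1), ("cam1.mp4", 12.0, 18.5)),
    TargetRecord("t2", "pedestrian", {"clothing": "red"}, (0.0, 1.0), ("cam1.mp4", 40.0, 44.0)),
    TargetRecord("t3", "car", {"color": "red"}, (1.0, 0.0), ("cam2.mp4", 3.0, 9.0)),
]
segments = query_segments(database, "pedestrian", {"clothing": "red"}, (1.0, 0.0))
```

Screening first means the comparatively expensive similarity computation runs only on the targets that already agree on category and sub-attributes.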

Optionally, when the input data includes both an image and text, the category and sub-attributes of the target to be queried contained in the text, together with the category identity feature extracted for the target to be queried contained in the image, are taken as the structured features of the target to be queried;

the target feature database corresponding to the video database is filtered using the category and sub-attributes of the target to be queried contained in the text, and the targets whose category and sub-attributes are all identical to those of the target to be queried are determined;

the determined targets are searched using the category identity feature of the target to be queried contained in the image, and the targets whose similarity to that category identity feature reaches a preset threshold are obtained.

To achieve the above purpose, an embodiment of the present invention provides a structured video segment query apparatus, the apparatus including:

an acquisition module, configured to obtain input data containing a target to be queried, where the input data includes an image and/or text;

a first extraction module, configured to extract structured features of the target to be queried from the input data, where the structured features include the category of the target, the sub-attributes within the category of the target, and the category identity feature of the target;

a query module, configured to query, according to the structured features of the target to be queried, the target feature database corresponding to a video database, and determine the video segments in the video database that correspond to the structured features of the target to be queried, where the target feature database is composed of structured features extracted from the video data in the video database.

Optionally, the apparatus further includes:

a detection module, configured to decode the videos in the video database and determine the targets contained in each sampled video frame of each video in the video database;

a second extraction module, configured to extract, for each target determined, the structured features of the target, where the structured features are determined based on a deep learning algorithm;

an establishing module, configured to establish the target feature database according to the extracted structured features of each target.

Optionally, the query module is a determining module;

the determining module is configured to, after the targets contained in all video frames of the videos in the video database have been detected, determine the video frames containing each target, and determine the video segment corresponding to each target based on the determined video frames containing that target.

Optionally, the extraction module is a first determining module;

the first determining module is configured to, when the input data is text containing the category of the target to be queried and its sub-attributes, take the category and sub-attributes of the target to be queried contained in the text as the structured features of the target to be queried.

Optionally, the extraction module includes:

a detection unit, configured to detect, when the input data is an image, the target to be queried contained in the image;

an extraction unit, configured to extract the structured features of the target to be queried;

the query module includes:

a first determining unit, configured to screen the target feature database corresponding to the video database using the category of the target and the sub-attributes within that category contained in the structured features of the target to be queried, and determine the targets whose category and sub-attributes are all identical to those of the target to be queried;

a first search unit, configured to search the determined targets using the category identity feature contained in the structured features of the target to be queried, and obtain the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold;

a second determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

Optionally, the extraction module is a second determining module;

the second determining module is configured to, when the input data includes both an image and text, take the category and sub-attributes of the target to be queried contained in the text, together with the category identity feature extracted for the target to be queried contained in the image, as the structured features of the target to be queried;

the query module includes:

a filtering unit, configured to filter the target feature database corresponding to the video database using the category and sub-attributes of the target to be queried contained in the text, and determine the targets whose category and sub-attributes are all identical to those of the target to be queried;

a second search unit, configured to search the determined targets using the category identity feature of the target to be queried contained in the image, and obtain the targets whose similarity to that category identity feature reaches a preset threshold;

a third determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

In another aspect, an embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform any of the structured video segment query methods described above.

In another aspect, an embodiment of the present invention further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the structured video segment query methods described above.

It can be seen that, once the structured features of the target to be queried have been extracted from the input data, only two steps are needed to complete the search: querying the target feature database corresponding to the video database with those structured features, and determining the video segments in the video database that correspond to them. It is no longer necessary, every time a target to be queried is detected, to perform region selection, feature extraction, and feature similarity matching on each video frame of the target videos in the video database in order to obtain all video frames containing the target before the corresponding video segments can be determined. Applying the embodiments of the present invention therefore simplifies the search for the video segments corresponding to a target to be queried, shortens the time the search takes, and thus improves search efficiency.

Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below.

Fig. 1 is a schematic flowchart of a structured video segment query method provided by an embodiment of the present invention;

Fig. 2 is another schematic flowchart of the structured video segment query method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a structured video segment query apparatus provided by an embodiment of the present invention;

Fig. 4 is another schematic structural diagram of the structured video segment query apparatus provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described below with reference to the drawings in the embodiments of the present invention.

In the prior art, the search for the video segments corresponding to a target is cumbersome and time-consuming, which makes search efficiency low. To solve this problem, the embodiments of the present invention provide a structured video segment query method, apparatus, and electronic device.

Specifically, the structured video segment query method first obtains input data containing a target to be queried. It then extracts the structured features of the target to be queried from the input data and, according to those structured features, queries the target feature database corresponding to the video database. The video segments in the video database that correspond to the structured features of the target to be queried can thus be found directly, which simplifies the search, shortens the time it takes, and improves the efficiency of searching for the video segments corresponding to a target.

It should be noted that the structured video segment query method provided by the embodiments of the present invention is preferably applied to a server, but is not limited to a server. For example, the method may also be applied to user equipment, which is also reasonable.

Specifically, when applied to a server, the method provided by the embodiments of the present invention may be deployed on a cloud server or on a local server. When the amount of data the server needs to process is small, a single-machine deployment may be chosen; when the amount of data is large, a distributed architecture may be used and the method deployed on a server cluster.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a structured video segment query method provided by an embodiment of the present invention, which may include the following steps:

S101: obtain input data containing a target to be queried, where the input data includes an image and/or text.

Specifically, the input data may include an image and/or text containing the target to be queried, where the target to be queried may be a person, a face, a vehicle, a license plate, or the like. For example, the image may contain a pedestrian, a person riding a two-wheeled vehicle, a person riding a tricycle, a truck, a car, a bus, or a minibus, and the text may contain keywords such as the category of the target to be queried and the sub-attributes within that category. The image may also be an image of a person's face, an image of a person's body, or the like.

The sub-attributes within a target's category are a further subdivision and description of that category. For example, if the category of the target to be queried is person, the sub-attributes within the category may be the person's clothing, build, age, sex, and so on; if the category of the target to be queried is vehicle, the sub-attributes within the category may be the vehicle model, logo, license plate color, license plate number, and so on.

S102: extract the structured features of the target to be queried from the input data, where the structured features include the category of the target, the sub-attributes within the category of the target, and the category identity feature of the target.

Specifically, the structured features are the collective name for the category of the target, the sub-attributes within the category of the target, and the category identity feature of the target, and are used to characterize and describe the content of the target to be queried.
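As an illustration only, the three components of the structured features can be held in a single record. The sketch below is one possible layout; the class name, field names, and sample values are hypothetical and are not prescribed by the embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StructuredFeatures:
    """Hypothetical container for the three components of the structured features."""
    category: str                                                 # e.g. "person" or "car"
    sub_attributes: Dict[str, str] = field(default_factory=dict)  # attributes within the category
    identity_feature: List[float] = field(default_factory=list)   # numeric identity vector


# A text-only query carries a category and sub-attributes but no identity vector.
person = StructuredFeatures("person", {"clothing": "red coat", "sex": "female"})

# An image query can additionally carry an identity feature extracted from the image.
vehicle = StructuredFeatures("car", {"model": "sedan", "plate_color": "blue"}, [0.12, 0.53, 0.08])
```

Leaving the identity vector empty for text-only input mirrors the case in which the text supplies only the category and its sub-attributes.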

The category of a target is the type into which the target is classified, for example the two classes person and vehicle. Alternatively, the person and vehicle classes may be subdivided: person into subclasses such as pedestrian, person riding a two-wheeled vehicle, and person riding a tricycle, and vehicle into subclasses such as truck, car, bus, and minibus. Each resulting subclass may then serve as a category of target.

The sub-attributes within a target's category are a further subdivision and description of that category. For example, if the category of the target to be queried is person, the sub-attributes within the category may be the person's clothing, build, age, sex, and so on; if the category of the target to be queried is vehicle, the sub-attributes within the category may be the vehicle model, logo, license plate color, license plate number, and so on.

The category identity feature of a target may be a local feature of the image region where the target is located (a traditional image feature), including color, texture, and shape features such as a color histogram, color moments, GLOH (Gradient Location and Orientation Histogram), HOG (Histogram of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), or shape context; it may be a deep learning feature determined by an existing deep learning algorithm; or it may be a combination of a local feature of the image region and a deep learning feature.

Traditional image features cannot yet accurately describe the category features of a target in an image, while deep learning techniques distinguish well between classes of targets but are weaker at distinguishing targets within a class. For this reason, the category identity feature of a target is preferably a combination of a local feature of the image region and a deep learning feature.

By way of example, the combination of a local feature of the image region and a deep learning feature may include, but is not limited to, any of the following:

a feature determined by a CNN (Convolutional Neural Network) combined with a color histogram, color moments, GLOH, HOG, SIFT, or shape context;

a feature determined by a recurrent neural network combined with a color histogram, color moments, GLOH, HOG, SIFT, or shape context;

a feature determined by a DNN (Deep Neural Network) combined with a color histogram, color moments, GLOH, HOG, SIFT, or shape context;

a feature determined by an LSTM (Long Short-Term Memory) network combined with a color histogram, color moments, GLOH, HOG, SIFT, or shape context, and so on.
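The embodiments do not specify how a local feature and a deep learning feature are combined. The sketch below assumes one simple possibility, namely L2-normalizing each part and concatenating them. The coarse RGB histogram stands in for the local feature, and the deep feature is a placeholder vector supplied by the caller; the function names and the bin count are hypothetical.

```python
from math import sqrt


def color_histogram(pixels, bins_per_channel=2):
    """Normalized RGB histogram with bins_per_channel ** 3 bins."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel, clamping to the last bin for safety.
        ri = min(int(r / step), bins_per_channel - 1)
        gi = min(int(g / step), bins_per_channel - 1)
        bi = min(int(b / step), bins_per_channel - 1)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def combine_features(local_feature, deep_feature):
    """Assumed combination rule: normalize each part, then concatenate."""
    return l2_normalize(local_feature) + l2_normalize(deep_feature)


region_pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]  # hypothetical target region
deep_feature = [3.0, 4.0]                                  # placeholder for a network output
identity_feature = combine_features(color_histogram(region_pixels), deep_feature)
```

Normalizing each part before concatenation keeps either part from dominating the similarity score purely because of its scale; other fusion schemes, such as weighting, are equally compatible with the text.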

It should be noted that the specific categories, sub-attributes, and category identity features given above are only examples and should not be construed as limiting the embodiments of the present invention.

S103: according to the structured features of the target to be queried, query the target feature database corresponding to the video database, and determine the video segments in the video database that correspond to the structured features of the target to be queried, where the target feature database is composed of structured features extracted from the video data in the video database.

Specifically, the historical video data in the video database may be processed offline to extract its structured features and form the target feature database. Alternatively, live video streams may be processed online, extracting the structured features of the streams in real time to form the target feature database. The video database is a database for storing video data, and the stored videos may be security videos, surveillance videos, and the like.

Specifically, the structured features of the target to be queried may be compared with the structured features in the target feature database to find, among the video data stored in the video database, the targets whose structured features are identical or similar to those of the target to be queried. The video segments corresponding to the targets found are then determined as the video segments corresponding to the structured features of the target to be queried.

For example, a comparison algorithm may be used to find the targets whose similarity to the structured features of the target to be queried exceeds a certain threshold (80%, 90%, or another value determined according to the actual situation). The comparison algorithm may be a nearest-neighbor or approximate nearest-neighbor search algorithm, such as brute-force search, an inverted file index, a kd-tree (k-dimensional tree), ANN (Approximate Nearest Neighbor) search, or hashing. Alternatively, the results may be sorted by similarity from high to low to obtain a sorted list of video targets, and the top N results returned to the user.
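The thresholding and top-N ranking described above can be sketched with a brute-force search, the simplest of the listed options. Cosine similarity, the dictionary layout of the database, and the sample vectors are assumptions made for illustration; an index structure such as a kd-tree or hashing would replace the linear scan at scale.

```python
import heapq
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_matches(query_feature, feature_database, n=3, threshold=0.0):
    """Brute-force search: score every target, drop scores below the threshold,
    and return the n best as (target_id, similarity), highest first."""
    scored = [(cosine_similarity(query_feature, feature), target_id)
              for target_id, feature in feature_database.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return [(target_id, score) for score, target_id in heapq.nlargest(n, kept)]


feature_database = {          # hypothetical identity features keyed by target id
    "a": (1.0, 0.0),
    "b": (0.9, 0.1),
    "c": (0.0, 1.0),
}
results = top_n_matches((1.0, 0.0), feature_database, n=2, threshold=0.8)
```

Setting `threshold` to 0.0 gives pure top-N ranking, while a large `n` gives pure thresholding, so the two alternatives in the text are both covered by the same function.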

Referring to Fig. 2, Fig. 2 is another schematic flowchart of the structured video segment query method provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 1, the embodiment shown in Fig. 2 adds step S104 before the target feature database corresponding to the video database is queried according to the structured features of the target to be queried: decode the videos in the video database, and determine the targets contained in each sampled video frame of each video in the video database; for each target determined, extract the structured features of the target, where the structured features are determined based on a deep learning algorithm; and establish the target feature database according to the extracted structured features of each target.

Here, deep learning techniques may be used in place of traditional feature extraction techniques to structure the original video.

In practical applications, detection and/or tracking may be used to determine the targets contained in each sampled video frame of each video in the video database. Because detection is often time-consuming, detection may be performed only at intervals, with tracking used to lock onto the targets in the video in between. The advantage of tracking is that each target can be quickly associated across adjacent frames.

It should be noted that, in practical applications, the execution timing of S104 is not limited to that shown in Fig. 2; S104 may be performed at any stage before S103, and the present invention does not limit this.

Specifically, an object detection method such as DPM (Deformable Parts Model); a deep-learning-based object detection algorithm such as Faster R-CNN, R-CNN, SSD (Single Shot MultiBox Detector), or YOLO (You Only Look Once); or a traditional moving-target detection algorithm such as a Gaussian mixture model (GMM), ViBe (Visual Background Extractor), background subtraction, or dynamic background updating may be used to detect or track the targets contained in each sampled video frame of each video in the video database and to obtain the category of each target. In addition, when the detected targets are moving targets such as pedestrians or vehicles, tracking and association algorithms such as Kalman filtering, particle filtering, mean shift, template matching, or KCF (Kernelized Correlation Filter) may be used to track the moving targets.

Each video frame may be used as a sampled video frame. However, for reasons of video processing efficiency and real-time performance, in practical applications it is preferable to extract a subset of the frames in the video at a certain sampling interval and use the extracted frames as the sampled video frames.
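Sampling at a fixed interval can be sketched as follows. The function name and the choice of expressing the interval in seconds are assumptions made for illustration; the text only requires "a certain sampling interval".

```python
def sample_frame_indices(total_frames, fps, interval_seconds=1.0):
    """Return the indices of the frames to process, one every interval_seconds.

    The step is never smaller than one frame, so a very short interval
    degenerates to using every frame as a sampled video frame.
    """
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


# A 4-second clip at 25 FPS sampled once per second.
sampled = sample_frame_indices(total_frames=100, fps=25, interval_seconds=1.0)
```

A longer interval lowers the processing cost per video at the price of possibly missing targets that appear only briefly.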

After each target has been detected and its category obtained, the category identity feature of the target may be extracted based on the time at which the target appears in the video and its position in the video frames in which it appears, for use in re-identifying the target. Then, for targets of different categories, deep learning models are used to extract the sub-attributes within the target's category. In this way the structured features of each determined target are extracted. Finally, the extracted structured features of all targets are brought together and stored in a target-based feature database, which establishes the target feature database.
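The final integration step can be pictured as grouping per-frame results by target. In the sketch below the detector, tracker, and attribute models are assumed to have already run; every dictionary key and sample value is hypothetical, and an in-memory dictionary stands in for whatever database is actually used.

```python
def build_target_feature_database(detections):
    """Group per-frame detection results into one record per target."""
    database = {}
    for det in detections:
        record = database.setdefault(det["target_id"], {
            "category": det["category"],
            "sub_attributes": {},
            "identity_feature": det["identity_feature"],
            "appearances": [],  # (video id, time in seconds, bounding box)
        })
        # Sub-attributes observed in later frames are merged into the record.
        record["sub_attributes"].update(det["sub_attributes"])
        record["appearances"].append((det["video_id"], det["time_sec"], det["bbox"]))
    return database


detections = [  # hypothetical output of detection, tracking, and attribute models
    {"target_id": "p1", "category": "pedestrian", "sub_attributes": {"clothing": "red"},
     "identity_feature": [0.2, 0.9], "video_id": "cam1", "time_sec": 3.0, "bbox": (10, 20, 50, 120)},
    {"target_id": "p1", "category": "pedestrian", "sub_attributes": {"hat": "yes"},
     "identity_feature": [0.2, 0.9], "video_id": "cam1", "time_sec": 4.0, "bbox": (14, 20, 54, 120)},
    {"target_id": "v1", "category": "car", "sub_attributes": {"color": "blue"},
     "identity_feature": [0.7, 0.1], "video_id": "cam2", "time_sec": 9.0, "bbox": (0, 0, 200, 90)},
]
target_db = build_target_feature_database(detections)
```

Recording the time and position of each appearance alongside the features is what later allows the video segment for a target to be derived from the database alone.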

It should be emphasized that, for the comparison between the structured features of the target to be queried and the structured features in the target feature database to be valid, the structured features used in the target feature database must be consistent with those of the target to be queried described above.

Because deep-learning-based object detection algorithms offer high accuracy, when such an algorithm is used for target detection and combined with a moving-target tracking algorithm, the overall moving-target detection and tracking can reach 100 FPS (frames per second), and detection accuracy can reach 80% or higher.

In another embodiment provided by the present invention, after the targets contained in all video frames of the videos in the video database have been detected, the video frames containing each target are determined, and the video segment corresponding to each target is obtained based on the determined video frames containing that target.
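One way to turn the frames containing a target into video segments is to merge frames that lie close together. The merging rule and the `max_gap` parameter below are assumptions made for illustration; the embodiment does not state how frames are grouped.

```python
def frames_to_segments(frame_indices, max_gap=1):
    """Merge the frame indices in which a target appears into (start, end) segments.

    Two consecutive appearances belong to the same segment when they are at
    most max_gap frames apart; a larger gap starts a new segment.
    """
    segments = []
    for index in sorted(frame_indices):
        if segments and index - segments[-1][1] <= max_gap:
            segments[-1][1] = index          # extend the current segment
        else:
            segments.append([index, index])  # start a new segment
    return [tuple(segment) for segment in segments]


# Frames sampled every 25 frames; the target leaves the scene between 50 and 200.
segments = frames_to_segments([0, 25, 50, 200, 225], max_gap=25)
```

When frames are sampled at an interval, `max_gap` would normally be at least that interval, otherwise every sampled frame becomes its own segment.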

It should be noted that the mode that video segment is determined can be predefined after target characteristic data storehouse is set up Or determine in real time when inquiring about the corresponding target characteristic data storehouse of video database.Generally, Ke Yishi When determine the corresponding video segment of target, advantage of this is that the corresponding video segment of each predetermined target can be saved Shared memory space, saves the storage resource of server.

In the another embodiment that the present invention is provided, when input data is the classification comprising target to be checked and its sub- attribute Text when, the structured features of the classification and its sub- attribute of the target to be checked that text is included as target to be checked.

It is then possible to utilize the classification and its sub- attribute of the target to be checked, target property corresponding to video database Database is filtered, it is determined that classification and its target of sub- attribute all same with the target to be checked.

For example, if the input data is the text "pedestrian cap", the target to be queried contained in the text may be: a pedestrian carrying a cap. "Pedestrian" is taken as the category of the target to be queried and "cap" as a sub-attribute within the category "pedestrian", so the structured features of the text are "pedestrian" and "cap". The target characteristic database is filtered using "pedestrian" and "cap", and the candidate targets satisfying these structured features are listed, such as pedestrian a holding a cap in hand, pedestrian b wearing a cap, and so on. Finally, the video segment corresponding to each candidate target is determined and taken as the video segment corresponding to the target to be queried contained in the text. The user is supported in clicking the video segment corresponding to a target to play it back, or in searching, based on the clicked video segment, for videos similar to that video segment.
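The text-query filtering described above can be sketched as follows. This is a hedged illustration only: the record layout `TargetRecord`, the convention that the first word of the text is the category and the remaining words are sub-attributes, and the subset-matching rule are assumptions made for the example and are not specified by the embodiment.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class TargetRecord:
    """Hypothetical row of the target characteristic database."""
    target_id: str
    category: str
    sub_attributes: FrozenSet[str]


def parse_text_query(text: str) -> Tuple[str, FrozenSet[str]]:
    """Assumed convention: first word is the category, the rest are sub-attributes."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty query text")
    return tokens[0], frozenset(tokens[1:])


def filter_targets(db: List[TargetRecord], category: str,
                   sub_attributes: FrozenSet[str]) -> List[TargetRecord]:
    """Keep records of the queried category that carry every queried sub-attribute."""
    return [r for r in db
            if r.category == category and sub_attributes <= r.sub_attributes]
```

For the query "pedestrian cap", both a pedestrian holding a cap and a pedestrian wearing a cap would be returned as candidates, provided each record carries the sub-attribute "cap".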

In addition, when the detected target is a pedestrian or a vehicle, the clothing region of the pedestrian or the bounding-box region of the vehicle can be accurately extracted based on a deep learning algorithm. The color distribution in the region is counted and mapped to a color space, thereby supporting text search by pedestrian clothing color or vehicle color. For example, if the input text is "pedestrian red", the video segments corresponding to pedestrians wearing red clothes can be found; if the input text is "vehicle red", the video segments corresponding to vehicles with red exterior paint can be found.
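One way to map the color distribution of a region to a searchable color name is sketched below. The embodiment does not specify the color space or the bins; the HSV conversion, the thresholds, and the coarse hue ranges here are assumptions chosen only to make the idea concrete.

```python
import colorsys
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def color_name(r: int, g: int, b: int) -> str:
    """Map one RGB pixel to a coarse color name (thresholds are illustrative)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < 0.2:
        return "black"
    if s < 0.2:
        return "white" if v > 0.8 else "gray"
    deg = h * 360
    if deg < 30 or deg >= 330:
        return "red"
    if deg < 90:
        return "yellow"
    if deg < 150:
        return "green"
    if deg < 270:
        return "blue"
    return "purple"


def dominant_color(pixels: Iterable[Pixel]) -> str:
    """Count the color names over a region and return the most frequent one."""
    counts = Counter(color_name(*p) for p in pixels)
    if not counts:
        raise ValueError("region contains no pixels")
    return counts.most_common(1)[0][0]
```

The resulting name could then be stored as a sub-attribute of the pedestrian or vehicle, so that a text such as "pedestrian red" is handled by the same filtering as any other sub-attribute.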

Also, when the detected target is a license plate, the license plate number can be recognized and the recognized number stored in the target characteristic database, to support license plate number search based on regular expressions. This makes the search methods more diverse and searching more convenient for the user.
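Regular-expression search over stored plate numbers might look like the following sketch. The index layout (plate number mapped to a list of target identifiers), the function name `search_plates`, and the sample plate strings are hypothetical; only Python's standard `re` module is used.

```python
import re
from typing import Dict, List


def search_plates(plate_index: Dict[str, List[str]], pattern: str) -> Dict[str, List[str]]:
    """Return the entries whose whole plate number matches the regular expression.

    plate_index maps a recognized plate number to the ids of the targets
    (and hence video segments) in which that plate was seen.
    """
    regex = re.compile(pattern)
    return {plate: ids for plate, ids in plate_index.items()
            if regex.fullmatch(plate)}
```

A pattern such as `B1\d{4}` would then find every stored plate that starts with "B1" followed by four digits, which is useful when a witness remembers only part of the number.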

In another embodiment provided by the present invention, when the input data is an image, the target to be queried contained in the image is detected, and the structured features of the target to be queried are extracted;

Using the category of the target and the sub-attributes within the target category contained in the structured features of the target to be queried, the target characteristic database corresponding to the video database is screened to determine the targets whose category and sub-attributes within the target category are all identical to those of the target to be queried;

Using the category identity feature of the target contained in the structured features of the target to be queried, the determined targets are searched to obtain the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold. The preset threshold may be 80%, 90%, or another value, determined according to actual conditions. After the targets whose similarity reaches the preset threshold are found, they can be sorted for the user's convenience, yielding a list of targets ordered by similarity from high to low;

From the video database to be queried, the video segments corresponding to the targets obtained by the search are determined and taken as the video segments, in the video database to be queried, that correspond to the structured features of the target to be queried. The user may be supported in clicking the video segment corresponding to a target to play it back, or in searching, based on the clicked video segment, for videos similar to that video segment.
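The identity-feature search with a preset threshold and high-to-low ranking can be sketched as below. The embodiment does not name a similarity measure; cosine similarity over plain float vectors is an assumption here, as are the function names and the dictionary layout of the candidates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_identity(query: Sequence[float],
                       candidates: Dict[str, Sequence[float]],
                       threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Score the pre-filtered candidates, keep those at or above the threshold,
    and return them sorted by similarity from high to low."""
    scored = [(tid, cosine_similarity(query, feat)) for tid, feat in candidates.items()]
    hits = [(tid, score) for tid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Here `candidates` is meant to hold only the targets that already passed the category and sub-attribute screening, so the more expensive similarity comparison runs on a reduced set.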

It should be noted that the target detection algorithm used to detect the target to be queried contained in the image may be DPM, Faster-RCNN, RCNN, SSD, or YOLO, or a traditional moving-target detection algorithm such as Gaussian mixture models, VIBE, background subtraction, or dynamic background updating, used either alone or as a combination of several algorithms. The algorithm actually selected must be consistent with the target detection algorithm used when the target characteristic database was established. Likewise, the method used to extract the structured features of the target to be queried must be consistent with the method used when the target characteristic database was established.

Moreover, the target category of the target to be queried and the sub-attributes within that category can be displayed on the interface, and manual modification of the sub-attributes is supported.

In addition, the screening of the target characteristic database corresponding to the video database and the searching of the determined targets can each be implemented using the prior art, and are not described in detail here.

In addition, when the input data is a face image, three parts, namely fast face region detection, facial landmark detection with face alignment, and face feature extraction, can be merged in a unified deep learning framework to extract highly discriminative face features based on deep learning. This supports searching for a person by image in massive video using a suspect's face: according to the face image, the target characteristic database corresponding to the video database is queried to determine the video segments in the video database that correspond to the structured features of the face, that is, the segments in which the person with that face appears.

Moreover, when the input data is a human body image, pedestrian target detection and pedestrian feature extraction based on deep learning can be performed. The extracted highly discriminative pedestrian features support searching for a person by image in massive video based on a suspect's body-shape features: according to the body image, the target characteristic database corresponding to the video database is queried to determine the video segments in the video database that correspond to the structured features of the body, that is, the segments of the person with that body. This enhances the practicality of the search.

In another embodiment provided by the present invention, when the input data includes an image and text, the category and sub-attributes of the target to be queried contained in the text, together with the category identity feature of the target to be queried extracted from the image, are used as the structured features of the target to be queried;

Using the category and sub-attributes of the target to be queried contained in the text, the target characteristic database corresponding to the video database is filtered to determine the targets whose category and sub-attributes are all identical to those of the target to be queried;

Using the category identity feature of the target to be queried contained in the image, the determined targets are searched to obtain the targets whose similarity to the category identity feature reaches a preset threshold. The preset threshold may be 80%, 90%, or another value, determined according to actual conditions. After the targets whose similarity reaches the preset threshold are found, they can likewise be sorted for the user's convenience, yielding a list of targets ordered by similarity from high to low;

From the video database to be queried, the video segments corresponding to the targets obtained by the search are determined and taken as the video segments, in the video database to be queried, that correspond to the structured features of the target to be queried.

Here, the filtering of the target characteristic database corresponding to the video database and the searching of the determined targets can each be implemented using the prior art, and are not described in detail here.

For example, if the input data is the text "pedestrian cap" together with an image containing a pedestrian wearing a cap on the head, the target to be queried contained in the input data is: the pedestrian wearing a cap.

The target to be queried contained in the image is detected and its category identity feature is extracted. The extracted category identity feature, together with the category "pedestrian" and the sub-attribute "cap" of the target to be queried contained in the text, is used as the structured features of the target to be queried.

It should be emphasized that the target detection algorithm used to detect the target to be queried contained in the image may be DPM, Faster-RCNN, RCNN, SSD, or YOLO, or a traditional moving-target detection algorithm such as Gaussian mixture models, VIBE, background subtraction, or dynamic background updating, used either alone or as a combination of several algorithms. The algorithm actually selected must still be consistent with the target detection algorithm used when the target characteristic database was established. Likewise, the method used to extract the structured features of the target to be queried must still be consistent with the method used when the target characteristic database was established.

Besides an image or text alone, the input data can also be a combination of image and text, which gives the user more ways to search and enhances practicality. Moreover, combining image and text characterizes the structured features of the target to be queried more precisely, thereby improving the accuracy of the search for the video segment corresponding to the target to be queried.

In addition, those skilled in the art will understand that the establishment of the target characteristic database can be deployed in a video processing server group, whose hardware configuration may be: CPU (Central Processing Unit): Xeon 3.0G*N; GPU (Graphics Processing Unit): NVIDIA GPU*M; memory: 16G or more; hard disk: 300G*2, RAID0+1.

The execution of S101-S103 can be deployed in a retrieval server group, whose hardware configuration may be: CPU: Xeon 2.4G; memory: 64G or more; hard disk: 300G*2, RAID0+1.

The video database and the target characteristic database can be stored in a disk array, whose hardware configuration may be: storage capacity (GB): 10000 or more; average transfer rate (MB/s): 200 or more; RAID (Redundant Arrays of Independent Disks) support: 0, 0+1, 1, 5, 6, 10, 50, JBOD (Just a Bunch Of Disks).

It should be emphasized that the deployment described above can be in the cloud or local. When the data volume is small, it can be deployed on a single machine; when it is large, it can be deployed with a distributed architecture. A typical hardware environment may be as described above or any other feasible hardware configuration, and the present invention does not limit this.

Specifically, depending on the volume of video data in the video database, distributed multi-node storage can be used, and GPUs and CPUs can be selected as the high-speed computing units. In terms of operation and maintenance, the system can be divided into two major parts: offline processing, whose load corresponds to the data volume, and online search, whose concurrency corresponds to the number of accesses. This simplifies the processing procedure and improves efficiency; both time complexity and space complexity are greatly reduced, which meets the processing requirements of massive data and offers strong practicality.

Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a video segment structured query device provided by an embodiment of the present invention. Corresponding to the flow shown in Fig. 1, the query device may include: an acquisition module 301, a first extraction module 302, and a query module 303;

The acquisition module 301 is configured to obtain input data containing a target to be queried, where the input data includes an image and/or text;

The first extraction module 302 is configured to extract the structured features of the target to be queried in the input data, where the structured features include the category of the target, the sub-attributes within the target category, and the category identity feature of the target;

The query module 303 is configured to query, according to the structured features of the target to be queried, the target characteristic database corresponding to the video database, and to determine the video segments in the video database that correspond to the structured features of the target to be queried, where the target characteristic database is composed of the structured features extracted from the video data in the video database.

Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a video segment structured query device provided by an embodiment of the present invention. Corresponding to the flow shown in Fig. 2, the embodiment shown in Fig. 4 adds, on the basis of the embodiment shown in Fig. 3, a detection module 304, a second extraction module 305, and an establishing module 306:

The detection module 304 is configured to decode the videos in the video database and determine the targets contained in each sampled video frame of each video in the video database;

The second extraction module 305 is configured to extract, for each determined target, the structured features of the target, where the structured features are determined based on a deep learning algorithm;

The establishing module 306 is configured to establish the target characteristic database according to the extracted structured features of each target.
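The offline pipeline formed by modules 304 to 306 (sample frames, detect targets, extract structured features, store them) can be sketched as follows. This is an assumption-laden outline, not the device itself: the detector and feature extractor are passed in as placeholder callables, frames are treated as opaque objects that have already been decoded, and `sample_every` and the record layout are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence


def build_feature_database(
    videos: Dict[str, Sequence[Any]],
    detect: Callable[[Any], List[Any]],
    extract: Callable[[Any], Dict[str, Any]],
    sample_every: int = 5,
) -> List[Dict[str, Any]]:
    """Build a list of records, one per detected target in each sampled frame.

    videos maps a video id to its decoded frames; detect returns the targets
    found in a frame; extract returns the structured features of one target.
    """
    database: List[Dict[str, Any]] = []
    for video_id, frames in videos.items():
        # Only every sample_every-th frame is examined (frame sampling).
        for frame_index in range(0, len(frames), sample_every):
            for target in detect(frames[frame_index]):
                database.append({
                    "video_id": video_id,
                    "frame_index": frame_index,
                    "features": extract(target),
                })
    return database
```

Keeping the video id and frame index alongside the features is what later allows a matched target to be traced back to the frames, and hence the video segment, in which it appears.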

Specifically, the query module is a determining module;

The determining module is configured to, after the targets contained in all video frames of the videos in the video database have been detected, determine the video frames containing each target and, based on the determined video frames containing each target, determine the video segment corresponding to each target.

Specifically, the extraction module is a first determining module;

The first determining module is configured to, when the input data is text containing the category of the target to be queried and its sub-attributes, use the category and sub-attributes of the target to be queried contained in the text as the structured features of the target to be queried.

Specifically, the extraction module includes:

A detection unit, configured to detect, when the input data is an image, the target to be queried contained in the image;

An extraction unit, configured to extract the structured features of the target to be queried;

The query module includes:

A first determining unit, configured to screen the target characteristic database corresponding to the video database using the category of the target and the sub-attributes within the target category contained in the structured features of the target to be queried, and to determine the targets whose category and sub-attributes within the target category are all identical to those of the target to be queried;

A first search unit, configured to search the determined targets using the category identity feature of the target contained in the structured features of the target to be queried, and to obtain the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold;

A second determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

Specifically, the extraction module is a second determining module;

The second determining module is configured to, when the input data includes an image and text, use the category and sub-attributes of the target to be queried contained in the text, together with the category identity feature of the target to be queried extracted from the image, as the structured features of the target to be queried;

The query module includes:

A filter unit, configured to filter the target characteristic database corresponding to the video database using the category and sub-attributes of the target to be queried contained in the text, and to determine the targets whose category and sub-attributes are all identical to those of the target to be queried;

A second search unit, configured to search the determined targets using the category identity feature of the target to be queried contained in the image, and to obtain the targets whose similarity to the category identity feature reaches a preset threshold;

A third determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

An embodiment of the present invention further provides an electronic device which, as shown in Fig. 5, includes a processor 51, a communication interface 52, a memory 53, and a communication bus 54, where the processor 51, the communication interface 52, and the memory 53 communicate with one another through the communication bus 54,

The memory 53 is configured to store a computer program;

The processor 51 is configured to implement the following steps when executing the program stored in the memory 53:

Obtaining input data containing a target to be queried;

Extracting the structured features of the target to be queried in the input data, where the structured features include the category of the target, the sub-attributes within the target category, and the category identity feature of the target;

Querying, according to the structured features of the target to be queried, the target characteristic database corresponding to the video database, and determining the video segments in the video database that correspond to the structured features of the target to be queried, where the target characteristic database is composed of the structured features extracted from the video data in the video database.

The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, it is shown in the figure as a single thick line, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the above electronic device and other devices.

The memory may include random access memory (RAM) and may also include non-volatile memory, for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a GPU, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

It should be noted that the electronic device provided by the present invention is only one implementation and does not limit the deployment of the video segment structured query method provided by the present invention. Cloud deployment, local deployment, single-machine deployment, distributed-architecture deployment, and any other deployment capable of implementing the method all fall within the scope of protection of the present invention.

It should be emphasized that the above video segment structured query method is preferably deployed and implemented on a cloud server.

A cloud server (also called a cloud computing server or cloud host) provides a simple, efficient, safe, and reliable computing service whose processing capacity can scale elastically, and its management is simpler and more efficient than that of a physical server. Users can quickly create or release any number of cloud servers without purchasing hardware in advance.

As an important component of cloud computing services, the cloud server is a service platform that provides comprehensive service capabilities to all kinds of Internet users. The platform integrates the three core elements of traditional Internet applications, namely computing, storage, and network, and provides users with public Internet infrastructure services.

Moreover, in practical applications the cloud environment can be configured according to the user's own needs, provided that the video segment structured query method provided by the present invention can be implemented.

In another embodiment provided by the present invention, a computer-readable storage medium is further provided. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform the video segment structured query method described in any of the above embodiments.

In another embodiment provided by the present invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to perform the video segment structured query method described in any of the above embodiments.

The above embodiments can be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they can be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts may be found in the description of the method embodiment.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. A video segment structured query method, characterized in that the method comprises:

Extracting the structured features of the target to be queried in the input data, wherein the structured features include the category of the target, the sub-attributes within the target category, and the category identity feature of the target;

Querying, according to the structured features of the target to be queried, the target characteristic database corresponding to the video database, and determining the video segments in the video database that correspond to the structured features of the target to be queried, wherein the target characteristic database is composed of the structured features extracted from the video data in the video database.

2. The method according to claim 1, characterized in that, before the querying, according to the structured features of the target to be queried, of the target characteristic database corresponding to the video database, the method further comprises:

Decoding the videos in the video database, and determining the targets contained in each sampled video frame of each video in the video database;

Extracting, for each determined target, the structured features of the target, wherein the structured features are determined based on a deep learning algorithm;

3. The method according to claim 2, characterized in that the querying, according to the structured features of the target to be queried, of the target characteristic database corresponding to the video database, and the determining of the video segments in the video database that correspond to the structured features of the target to be queried, comprise:

After the targets contained in all video frames of the videos in the video database have been detected, determining the video frames containing each target and, based on the determined video frames containing each target, determining the video segment corresponding to each target.

4. The method according to claim 1 or 2, characterized in that the extracting of the structured features of the target to be queried in the input data comprises:

When the input data is text containing the category of the target to be queried and its sub-attributes, using the category and sub-attributes of the target to be queried contained in the text as the structured features of the target to be queried.

5. The method according to claim 1 or 2, characterized in that the extracting of the structured features of the target to be queried in the input data comprises:

Extracting the structured features of the target to be queried;

The querying, according to the structured features of the target to be queried, of the target characteristic database corresponding to the video database, and the determining of the video segments in the video database to be queried that correspond to the structured features of the target to be queried, comprise:

Screening the target characteristic database corresponding to the video database using the category of the target and the sub-attributes within the target category contained in the structured features of the target to be queried, and determining the targets whose category and sub-attributes within the target category are all identical to those of the target to be queried;

Searching the determined targets using the category identity feature of the target contained in the structured features of the target to be queried, and obtaining the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold;

Determining, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

6. The method according to claim 1 or 2, characterized in that the extracting of the structured features of the target to be queried in the input data comprises:

When the input data includes an image and text, using the category and sub-attributes of the target to be queried contained in the text, together with the category identity feature of the target to be queried extracted from the image, as the structured features of the target to be queried;

Filtering the target characteristic database corresponding to the video database using the category and sub-attributes of the target to be queried contained in the text, and determining the targets whose category and sub-attributes are all identical to those of the target to be queried;

Searching the determined targets using the category identity feature of the target to be queried contained in the image, and obtaining the targets whose similarity to the category identity feature reaches a preset threshold;

7. A video segment structured query device, characterized in that the device comprises:

A first extraction module, configured to extract the structured features of the target to be queried in the input data, wherein the structured features include the category of the target, the sub-attributes within the target category, and the category identity feature of the target;

A query module, configured to query, according to the structured features of the target to be queried, the target characteristic database corresponding to the video database, and to determine the video segments in the video database that correspond to the structured features of the target to be queried, wherein the target characteristic database is composed of the structured features extracted from the video data in the video database.

8. The device according to claim 7, characterized in that the device further comprises:

A detection module, configured to decode the videos in the video database and determine the targets contained in each sampled video frame of each video in the video database;

A second extraction module, configured to extract, for each determined target, the structured features of the target, wherein the structured features are determined based on a deep learning algorithm;

9. The device according to claim 8, characterized in that the query module is a determining module;

The determining module is configured to, after the targets contained in all video frames of the videos in the video database have been detected, determine the video frames containing each target and, based on the determined video frames containing each target, determine the video segment corresponding to each target.

10. The device according to claim 7 or 8, characterized in that the extraction module is a first determining module;

the first determining module is configured to, when the input data is text containing the category of the target to be queried and its sub-attributes, use the category and sub-attributes of the target to be queried contained in the text as the structured features of the target to be queried.

11. The device according to claim 7 or 8, characterized in that the extraction module comprises:

the query module comprises:

a first determining unit, configured to screen the target feature database corresponding to the video database using the category of the target and the sub-attributes within the target category included in the structured features of the target to be queried, and to determine the targets whose category and sub-attributes within the target category are all the same as those of the target to be queried;

a first search unit, configured to search the determined targets using the category identity feature of the target included in the structured features of the target to be queried, to obtain the targets whose similarity to the category identity feature of the target to be queried reaches a preset threshold;

a second determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.
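The three units of the query module in claim 11 (screen by category and sub-attributes, search by identity feature against a threshold, then return the matching video segments) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claim does not name a similarity measure, so cosine similarity is used here as a stand-in, and the record layout (`category`, `attributes`, `identity`, `video_id`, `segments`), the function names, and the default `threshold` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_segments(feature_db, query, threshold=0.8):
    """Return (video_id, segment) pairs matching the structured query."""
    # First determining unit: keep targets whose category and every
    # queried sub-attribute are identical to those of the query.
    candidates = [
        rec for rec in feature_db
        if rec["category"] == query["category"]
        and all(rec["attributes"].get(k) == v
                for k, v in query["attributes"].items())
    ]
    # First search unit: keep candidates whose identity-feature
    # similarity to the query reaches the preset threshold.
    matched = [
        rec for rec in candidates
        if cosine_similarity(rec["identity"], query["identity"]) >= threshold
    ]
    # Second determining unit: collect the video segments of the matches.
    return [(rec["video_id"], seg) for rec in matched for seg in rec["segments"]]
```

Screening on the discrete category and sub-attributes first means the similarity comparison is computed only for the remaining candidates rather than for every target in the database.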

12. The device according to claim 7 or 8, characterized in that the extraction module is a second determining module;

the second determining module is configured to, when the input data includes an image and text, use the category and sub-attributes of the target to be queried contained in the text, together with the extracted category identity feature of the target to be queried contained in the image, as the structured features of the target to be queried;

the query module comprises:

a filtering unit, configured to filter the target feature database corresponding to the video database using the category and sub-attributes of the target to be queried contained in the text, and to determine the targets whose category and sub-attributes are all the same as those of the target to be queried;

a second search unit, configured to search the determined targets using the category identity feature of the target to be queried contained in the image, to obtain the targets whose similarity to the category identity feature reaches a preset threshold;

a third determining unit, configured to determine, from the video database to be queried, the video segments corresponding to the targets obtained by the search, as the video segments in the video database to be queried that correspond to the structured features of the target to be queried.

13. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of claims 1-6 when executing the program stored in the memory.