CN107544978A

CN107544978A - A content-based video retrieval method

Info

Publication number: CN107544978A
Application number: CN201610473737.4A
Authority: CN
Inventors: 白永强; 罗旻; 鲍东山
Original assignee: BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Current assignee: BEIJING NUFRONT SOFTWARE TECHNOLOGY Co Ltd
Priority date: 2016-06-24
Filing date: 2016-06-24
Publication date: 2018-01-05

Abstract

The invention provides a content-based video retrieval method, comprising: receiving a video search request through a human-computer interaction interface; triggering multiple retrieval processes according to the video search request, including: triggering a metadata retrieval server to search the metadata of video programs; triggering a subtitle retrieval server to search the XML files that store program subtitle text; triggering a video retrieval cluster to search the feature data of video key frames; and triggering a speech retrieval cluster to search the speech information of video programs, including pinyin strings and syllable graphs; and integrating the search results and returning them to the user through the human-computer interaction interface. The method provides an effective solution for content-based video retrieval.

Description

A content-based video retrieval method

Technical field

The present invention relates to the field of content-based video retrieval (CBR), including technologies for the organization and storage of video feature data, the indexing and retrieval of high-dimensional feature vectors, and distributed search.

Background art

An information retrieval system generally includes a core search database, a search dispatch server and a server group, and externally provides interfaces for search and data entry, as shown in Fig. 1.

Data entry is mostly done by manual keying. The provider of the content to be retrieved enters the content information that is offered to users for searching into the database through the data entry interface of the retrieval system.

The core database system is mainly responsible for storing the information data used for user searches.

The search dispatch server is responsible for receiving and parsing user requests and for distributing each search request to the retrieval servers, which perform the actual retrieval. After the retrieval results are returned to the search dispatch server, it processes them, for example by sorting, merging and filtering, and then returns them to the user. This completes one search.

Unlike an ordinary information retrieval system, a video retrieval system is more complex and contains more modules.

A video retrieval system consists of several major modules: video feature analysis, feature data storage, search dispatch and content-based video retrieval. Among them, feature data storage, search dispatch and video retrieval are the core modules of such a search engine.

A traditional video retrieval system obtains information about a video program through manual annotation and stores it in a database for later queries. In other words, the analysis module is in fact a module whose work is done manually.

This approach has significant limitations. Manual annotation not only consumes a great deal of labor and time but is also often highly subjective, so it cannot describe the content of a video program accurately and impartially. Physical features of a video program such as color and texture, in particular, cannot be described accurately. Even for features that are not subject to subjective factors, such as speech and subtitles, the sheer data volume often makes manual processing infeasible.

Therefore, image analysis, speech analysis and subtitle analysis techniques have been applied to video program processing, using the computer as the main tool to automatically obtain feature information related to the program content and thereby support content-based search.

In such a system, the features of a video program are analyzed and processed from several aspects.

On the image side, the video program is divided into scenes and shots and representative key frames are extracted. The key frames then undergo image processing, and their color, texture and shape are represented in a mathematical form such as a vector.

Further, high-level semantic information, such as the faces appearing in a key frame or the motion trend of objects, is extracted from the low-level features of the key frames (the feature data described above) and is likewise represented in textual or mathematical form.

On the audio side, the speech of people appearing in the video program, the background music and so on are processed by computer and converted into character strings or mathematical forms that carry a definite meaning.

For example, the speech of people appearing in a video program can be converted into a syllable graph or a word graph by means of speech recognition.

For music appearing in a video program, analysis of its waveform characteristics can yield the melody and tonal features of the music, or the variation of its pitch.

On the subtitle side, the Chinese characters appearing in the video images need to be recognized, extracted and converted into characters.

The data obtained by the above means is called the feature data of the video program. The amount of feature data is often enormous. For example, a video program of about 30 minutes may contain more than 500 key frame pictures, and the features of each picture often require vectors of tens or even hundreds of dimensions to describe; the speech of the same program, once converted into a graph-structured feature representation, often requires several megabytes of storage.

Therefore, content-based video retrieval often faces the problems of huge data volume and low retrieval efficiency. These must be solved either by reducing the amount of feature data or by adopting special methods that narrow the search range and so improve retrieval speed.

At the same time, content-based video retrieval cannot rely on exact matching. The search condition and the feature data stored in the database system rarely match one hundred percent. For example, even if the query image and a key frame picture in the database show the same person, the feature vector obtained by analyzing the key frame cannot be expected to be identical to the feature data of the query image. For video retrieval purposes, however, such images do "meet" the search condition. Retrieval over feature vectors should therefore follow a fuzzy matching strategy: an appropriate retrieval and search strategy is needed to find the results that can satisfy the condition and to obtain their degree of fuzzy match.

At present, exciting research results have been achieved in video analysis, image analysis, speech analysis and subtitle recognition, and the accuracy of analysis has reached a certain level. Domestically, however, these results are still rarely applied in actual products, and combining them to serve content-based video retrieval is even more unprecedented.

Combining the results of video analysis, image analysis, speech analysis and subtitle recognition, supplemented by other current technologies, to jointly serve content-based video retrieval also faces great difficulties and challenges. Both in design and in actual development, a considerable number of technical difficulties need to be solved.

Summary of the invention

In view of this, it is an object of the invention to .... To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simple form as a prelude to the detailed description that follows.

The invention provides a content-based video retrieval method, comprising:

Receiving a video search request through a human-computer interaction interface;

Triggering multiple retrieval processes according to the video search request, including:

Triggering a metadata retrieval server to search the metadata of video programs;

Triggering a subtitle retrieval server to search the XML files that store program subtitle text;

Triggering a video retrieval cluster to search the feature data of video key frames;

Triggering a speech retrieval cluster to search the speech information of video programs, including pinyin strings and syllable graphs;

Integrating the search results and returning them to the user through the human-computer interaction interface.

Preferably, a metadata table, a metadata retrieval module and a metadata entry module are provided in the metadata retrieval server, wherein:

The metadata table includes one or more of the following fields: program ID, program name, director, actors, language, place of origin, type 1, type 2, file format, file size, length, screen width, screen height, program address, program file name, upload time, upper set address, upload state, whether review is required, program level, review flag, program price, program summary, and ban flag;

The metadata retrieval module includes a stored procedure that retrieves a program by program ID and a stored procedure that retrieves programs by combined conditions;

The metadata entry module includes a stored procedure that inserts specified metadata into the database table.
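
The retrieval and entry routines described above can be illustrated with a minimal sketch. The text describes routines stored in a database; this example instead uses Python's built-in sqlite3 module purely for illustration. The table name, the reduced column set and all function names are assumptions, not part of the original disclosure.

```python
import sqlite3

# Hypothetical subset of the metadata fields listed above.
COLUMNS = ("program_id", "name", "director", "language")

def create_db():
    """Create an in-memory database holding the (assumed) metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE program_metadata ("
        "program_id INTEGER PRIMARY KEY, name TEXT, director TEXT, language TEXT)"
    )
    return conn

def insert_program(conn, program_id, name, director, language):
    """Entry routine: insert one metadata record."""
    conn.execute(
        "INSERT INTO program_metadata VALUES (?, ?, ?, ?)",
        (program_id, name, director, language),
    )

def find_by_id(conn, program_id):
    """Retrieval by program ID; returns one row or None."""
    return conn.execute(
        "SELECT program_id, name, director, language "
        "FROM program_metadata WHERE program_id = ?",
        (program_id,),
    ).fetchone()

def find_by_conditions(conn, **conditions):
    """Retrieval by combined conditions, joined with AND (an assumption)."""
    for column in conditions:
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
    where = " AND ".join(f"{column} = ?" for column in conditions) or "1 = 1"
    sql = (
        "SELECT program_id, name, director, language "
        f"FROM program_metadata WHERE {where} ORDER BY program_id"
    )
    return conn.execute(sql, tuple(conditions.values())).fetchall()
```

Column names are checked against a fixed list before being placed in the SQL text, while values are always passed as bound parameters.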

Preferably, the subtitle retrieval server is provided with a database table for storing subtitle XML files, a table for storing server configuration information, a stored procedure for reading the configuration information, a stored procedure for XML retrieval, a stored procedure for entering XML files into the database, and a segmented index on the XML.

Preferably, a program ID field, an XML file name field and an XML file field are provided in the subtitle XML file database table; and/or

A parameter ID field, a parameter name field and a parameter value field are provided in the server configuration information table; and/or

A keyword logical expression generation block and a search block are provided in the stored procedure for XML retrieval.

Preferably, a video scene retrieval server and a video retrieval server are provided in the video retrieval cluster, wherein:

The video scene retrieval server is provided with a database table storing the scene key frame index, a stored procedure for index entry, and a package for retrieving scene key frames; and/or

The video retrieval server is provided with a database table storing video key frame XML files, a database table storing the key frame index, a database table storing server configuration information, a stored procedure for entering XML files, a package for generating the index, a package for searching the index table, and a remote link for calling programs inside the video scene retrieval server.

Preferably, the scene key frame index database table is provided with an index ID field, three index cluster lower-bound vector fields, three index cluster upper-bound vector fields, an index content nested table, a field for the total number of key frames in the index cluster, and an index cluster maximum distance field;

Wherein the index content nested table includes an entry ID field, a field for the ID of the program to which the key frame belongs, a key frame number field, a key frame type field, a scene start time field, a scene end time field, a shot start time field, a shot end time field, a key frame time point field, and three key frame feature vector fields.

Preferably, the video key frame XML file database table is given the same structure as the scene key frame index database table, including an index content nested table; and/or

The package for generating the index is provided with a package for adding a new image feature value to a cluster, a package for expanding a cluster by a specified threshold, and a stored procedure for creating a new cluster; and/or

The package for searching the index table is provided with a main retrieval stored procedure, a stored procedure that computes the minimum distance between the query image and a cluster, a stored procedure that computes the minimum of the maximum distance between the query image and a cluster, and a program segment for judging whether a cluster is valid.

Preferably, the package for expanding a cluster by a specified threshold is provided with a main expansion stored procedure, a stored procedure for computing the main diagonal length of the cluster hyper-rectangle after expansion, and a stored procedure for determining whether the expanded cluster overlaps any other existing cluster; wherein the maximum permitted diagonal length of the cluster hyper-rectangle is set to 2.0; and/or

The package for adding a new image feature value to a cluster is provided with a main stored procedure for the addition and a stored procedure for judging whether the feature value of an image belongs to a given cluster.

Preferably, in the main retrieval stored procedure, the maximum permitted distance between the query image and a cluster is set to 2.0.

Preferably, a speech cache retrieval server, a speech optimized retrieval server and a speech syllable graph retrieval server are provided in the speech retrieval cluster.

To accomplish the foregoing and related ends, one or more embodiments include the features described in detail below and particularly pointed out in the claims. The following description and the drawings set forth certain illustrative aspects in detail, and these indicate only some of the various ways in which the principles of the embodiments may be employed. Other benefits and novel features will become apparent from the following detailed description when considered in conjunction with the drawings, and the disclosed embodiments are intended to include all such aspects and their equivalents.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of a common retrieval system in the prior art;

Fig. 2 is a flow chart of a content-based video retrieval method according to an embodiment of the present invention;

Fig. 3 is a block diagram of a content-based video retrieval system according to an embodiment of the present invention.

Detailed description of the embodiments

The following description and drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process and other changes. The embodiments represent only possible variations. Unless explicitly required, individual components and functions are optional, and the order of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of other embodiments. The scope of the embodiments of the invention encompasses the full scope of the claims and all available equivalents of the claims. Herein, these embodiments may be referred to, individually or collectively, by the term "invention" merely for convenience; if more than one invention is in fact disclosed, this is not intended to automatically limit the scope of this application to any single invention or inventive concept.

An embodiment of the invention provides a content-based video retrieval method which, as shown in Fig. 2, comprises the following steps:

Step S201: receiving a video search request through a human-computer interaction interface;

Step S202: triggering multiple retrieval processes according to the video search request;

Step S203: integrating the information retrieved by each retrieval process;

Step S204: returning the integrated information to the user through the human-computer interaction interface.

When step S202 is performed, the following retrieval processes 1 to 4 are triggered simultaneously, which improves retrieval efficiency:

Process 1: triggering the metadata retrieval server to search the metadata of video programs;

Process 2: triggering the subtitle retrieval server to search the XML files that store program subtitle text;

Process 3: triggering the video retrieval cluster to search the feature data of video key frames;

Process 4: triggering the speech retrieval cluster to search the speech information of video programs, including pinyin strings and syllable graphs.
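
As an illustration only, the simultaneous triggering in step S202 and the integration in step S203 might be sketched as follows. The backend callables, the (program ID, score) result shape and the "keep the best score per program" merge rule are all assumptions; the text does not specify how results are scored or merged.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(request, backends):
    """Trigger every backend with the same request at once; collect results by name."""
    if not backends:
        return {}
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        futures = {name: pool.submit(fn, request) for name, fn in backends.items()}
        # result() blocks until the corresponding backend has finished.
        return {name: future.result() for name, future in futures.items()}

def integrate(results):
    """Merge per-backend hit lists, keeping the best score for each program.

    Each hit is an assumed (program_id, score) pair. The output is a list of
    (program_id, score, source) tuples, highest score first.
    """
    best = {}
    for source, hits in results.items():
        for program_id, score in hits:
            if program_id not in best or score > best[program_id][0]:
                best[program_id] = (score, source)
    merged = [(pid, score, source) for pid, (score, source) in best.items()]
    return sorted(merged, key=lambda row: (-row[1], row[0]))
```

In this sketch the four retrieval processes would be passed as four entries of `backends`, for example under the hypothetical names "metadata", "subtitle", "video" and "speech".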

Preferably, in this embodiment, a metadata table, a metadata retrieval module and a metadata entry module are provided in the metadata retrieval server.

Preferably, in this embodiment, the subtitle retrieval server is provided with a database table for storing subtitle XML files, a table for storing server configuration information, a stored procedure for reading the configuration information, a stored procedure for XML retrieval, a stored procedure for entering XML files into the database, and a segmented index on the XML.

Preferably, in this embodiment, a program ID field, an XML file name field and an XML file field are provided in the subtitle XML file database table; and/or a parameter ID field, a parameter name field and a parameter value field are provided in the server configuration information table; and/or a keyword logical expression generation block and a search block are provided in the stored procedure for XML retrieval.

Preferably, in this embodiment, a video scene retrieval server and a video retrieval server are provided in the video retrieval cluster.

Preferably, in this embodiment, the scene key frame index database table is provided with an index ID field, three index cluster lower-bound vector fields, three index cluster upper-bound vector fields, an index content nested table, a field for the total number of key frames in the index cluster, and an index cluster maximum distance field. The index content nested table includes an entry ID field, a field for the ID of the program to which the key frame belongs, a key frame number field, a key frame type field, a scene start time field, a scene end time field, a shot start time field, a shot end time field, a key frame time point field, and three key frame feature vector fields.

Preferably, in this embodiment, the video key frame XML file database table is given the same structure as the scene key frame index database table, including an index content nested table; and/or the package for generating the index is provided with a package for adding a new image feature value to a cluster, a package for expanding a cluster by a specified threshold, and a stored procedure for creating a new cluster; and/or the package for searching the index table is provided with a main retrieval stored procedure, a stored procedure that computes the minimum distance between the query image and a cluster, a stored procedure that computes the minimum of the maximum distance between the query image and a cluster, and a program segment for judging whether a cluster is valid.

Preferably, in this embodiment, the package for expanding a cluster by a specified threshold is provided with a main expansion stored procedure, a stored procedure for computing the main diagonal length of the cluster hyper-rectangle after expansion, and a stored procedure for determining whether the expanded cluster overlaps any other existing cluster, wherein the maximum permitted diagonal length of the cluster hyper-rectangle is set to 2.0; and/or the package for adding a new image feature value to a cluster is provided with a main stored procedure for the addition and a stored procedure for judging whether the feature value of an image belongs to a given cluster.
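
The cluster expansion checks described above can be sketched as follows, treating a cluster as a bounding box given by its lower-bound and upper-bound vectors. This is a minimal sketch under stated assumptions: the Euclidean diagonal, the closed-interval overlap test and all function names are illustrative choices; only the limit of 2.0 comes from the text.

```python
import math

MAX_DIAGONAL = 2.0  # maximum permitted diagonal length stated in the text

def expand(lower, upper, point):
    """Grow the box just enough to contain the new feature vector."""
    new_lower = [min(l, p) for l, p in zip(lower, point)]
    new_upper = [max(u, p) for u, p in zip(upper, point)]
    return new_lower, new_upper

def diagonal(lower, upper):
    """Length of the main diagonal of the box (Euclidean, an assumption)."""
    return math.sqrt(sum((u - l) ** 2 for l, u in zip(lower, upper)))

def overlaps(box_a, box_b):
    """True if the two boxes intersect in every dimension."""
    (lower_a, upper_a), (lower_b, upper_b) = box_a, box_b
    return all(
        la <= ub and lb <= ua
        for la, ua, lb, ub in zip(lower_a, upper_a, lower_b, upper_b)
    )

def try_expand(cluster, point, others):
    """Return the expanded box, or None if expansion is not permitted.

    Expansion is rejected when the diagonal would exceed MAX_DIAGONAL or
    when the expanded box would overlap any other existing cluster.
    """
    candidate = expand(cluster[0], cluster[1], point)
    if diagonal(*candidate) > MAX_DIAGONAL:
        return None
    if any(overlaps(candidate, other) for other in others):
        return None
    return candidate
```

When `try_expand` returns None, a caller following the scheme above would presumably fall back to creating a new cluster for the feature vector.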

Preferably, in this embodiment, the maximum permitted distance between the query image and a cluster is set to 2.0 in the main retrieval stored procedure.

Preferably, in this embodiment, a speech cache retrieval server, a speech optimized retrieval server and a speech syllable graph retrieval server are provided in the speech retrieval cluster.

Referring to Fig. 3, which shows a content-based video retrieval system according to an embodiment of the present invention, the system includes a human-computer interaction interface 301, a controller 302, a metadata retrieval server 303, a subtitle retrieval server 304, a video retrieval cluster 305, a speech retrieval cluster 306 and a data processor 307, wherein:

The human-computer interaction interface 301 is used to receive video search requests and to return retrieval results to the user;

The controller 302 is used to simultaneously trigger the metadata retrieval server 303, the subtitle retrieval server 304, the video retrieval cluster 305 and the speech retrieval cluster 306 to perform the corresponding retrieval according to the video search request;

The data processor 307 is used to integrate the retrieval results of the metadata retrieval server 303, the subtitle retrieval server 304, the video retrieval cluster 305 and the speech retrieval cluster 306 and to output them to the human-computer interaction interface 301;

The metadata retrieval server 303, the subtitle retrieval server 304, the video retrieval cluster 305 and the speech retrieval cluster 306, which are triggered in parallel, are described in detail below:

1. Metadata retrieval server 303:

Metadata is the text information filled in manually when a program is produced. It describes content information of the video program such as title, director, actors, place of origin and synopsis, as well as properties such as frame rate, resolution, on-demand fee and whether DRM verification is required.

This is the only module in the whole system that requires manual participation.

After this data has been filled in manually, it is entered into the metadata database.

When a simple metadata query is executed, or when related information about the search results is needed after a content-based retrieval has been executed, a retrieval request is sent to the metadata retrieval server and the metadata database is queried.

2. Subtitle retrieval server 304:

The subtitle feature data is the subtitle text that appears in the video program. During subtitle analysis, this text, together with the start and end times of the scene and shot in which it appears, is saved as an XML file in a specified format and entered into the subtitle database.

The search condition sent by the search dispatch server is a character string that contains several search terms separated by a specified separator.

First, the individual search terms are extracted and, as required by the subsequent search program, joined into a logical expression of a specified pattern. Then the video program subtitle XML files in the subtitle database are filtered with this expression, and the programs whose files contain the search terms are selected. Finally, each search term is looked up in the selected files to find the scene and shot in which it appears and the corresponding time information.
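
The three steps above (split the condition string, filter the subtitle files, locate each term) can be sketched as follows. The XML layout with `line` elements carrying `start` and `end` attributes, the `|` separator and the use of AND to combine terms are all assumptions made for illustration; the text specifies neither the file format nor the pattern of the logical expression.

```python
import xml.etree.ElementTree as ET

def parse_conditions(raw, sep="|"):
    """Split the condition string into individual search terms."""
    return [term.strip() for term in raw.split(sep) if term.strip()]

def matches_all(text, terms):
    """Assumed logical expression: every term must occur (logical AND)."""
    return all(term in text for term in terms)

def search_subtitles(files, raw, sep="|"):
    """Search subtitle XML documents for the given condition string.

    files maps a program ID to an XML string in the assumed layout.
    Returns a list of (program_id, term, start, end) tuples.
    """
    terms = parse_conditions(raw, sep)
    if not terms:
        return []
    hits = []
    for program_id, xml_text in files.items():
        lines = ET.fromstring(xml_text).findall("line")
        full_text = " ".join(line.text or "" for line in lines)
        # Step 2: keep only programs whose subtitles satisfy the expression.
        if not matches_all(full_text, terms):
            continue
        # Step 3: locate each term and report the time span it appears in.
        for line in lines:
            for term in terms:
                if term in (line.text or ""):
                    hits.append((program_id, term, line.get("start"), line.get("end")))
    return hits
```

A real subtitle file as described in the text would carry both scene and shot times; the sketch collapses these into a single start and end per line to stay short.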

3. Video retrieval cluster 305:

Because the amount of feature data for video key frames is huge, the video retrieval module is designed as a retrieval cluster in order to guarantee response time. The cluster includes two servers: a video scene retrieval server and a video retrieval server.

Although video feature data is also entered into the database in the form of XML files, all video key frames are indexed in order to improve retrieval efficiency.

The index uses an R-tree-based high-dimensional vector indexing technique. Its basic idea is to define a distance between two sets of image feature data and to group images whose mutual distance lies within a specified range into one cluster, so that "similar" images fall into the same class.

During retrieval, the index is searched first: the "minimum distance" and the "minimum of the maximum distance" between the search condition and each cluster are computed, and clusters that differ greatly from the query image are eliminated on the basis of these two values. Finally, distances are computed only between the query image and the images in the clusters that were not eliminated, and the results are sorted and returned.

In this way, the number of images taking part in the comparison and the number of calculations are greatly reduced, and retrieval speed is improved.
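
One possible reading of this pruning step is sketched below, using the MINDIST and MINMAXDIST measures commonly used for nearest-neighbor search over R-tree bounding boxes. This is an assumption about what the "minimum distance" and the "minimum of the maximum distance" refer to; the cluster layout (a dictionary with hypothetical `lower`, `upper` and `items` keys), the Euclidean metric and the reuse of 2.0 as the per-image limit are also illustrative. The pruning rule is only sound when each bounding box is tight around its members.

```python
import math

def min_dist(q, lower, upper):
    """Smallest possible distance from q to any point inside the box."""
    return math.sqrt(sum(
        max(l - x, 0.0, x - u) ** 2 for x, l, u in zip(q, lower, upper)
    ))

def min_max_dist(q, lower, upper):
    """Upper bound on the distance from q to the nearest member of a tight box."""
    # Per dimension: squared distance to the nearer and to the farther edge.
    near = [(x - (l if x <= (l + u) / 2 else u)) ** 2 for x, l, u in zip(q, lower, upper)]
    far = [(x - (l if x >= (l + u) / 2 else u)) ** 2 for x, l, u in zip(q, lower, upper)]
    total_far = sum(far)
    return math.sqrt(min(total_far - f + n for f, n in zip(far, near)))

def search(q, clusters, max_allowed=2.0):
    """Prune clusters, then compare q only with images in the survivors."""
    if not clusters:
        return []
    bound = min(min_max_dist(q, c["lower"], c["upper"]) for c in clusters)
    results = []
    for c in clusters:
        d = min_dist(q, c["lower"], c["upper"])
        if d > bound or d > max_allowed:
            continue  # this cluster cannot hold a close enough image
        for key, vec in c["items"].items():
            dist = math.dist(q, vec)
            if dist <= max_allowed:
                results.append((key, dist))
    return sorted(results, key=lambda r: r[1])
```

Only the surviving clusters incur per-image distance computations, which is where the reduction in comparisons described above comes from.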

On both servers of the video retrieval cluster, the key frame images of video programs are represented by the index described above.

Video scene retrieval server:

All video scene key frame clusters are stored here. Because the number of scene key frames in a video program is an order of magnitude smaller than the total number of key frames, and scene key frames are themselves highly representative, scene key frames are searched first, which improves retrieval speed.

Video retrieval server:

The clusters of all scene and shot key frames are stored here. When searching only the scene key frames does not yield results that meet the requirements, all key frames are searched in order to obtain truly matching results.

The video key frame matching algorithm is a fuzzy matching algorithm: a key frame in the database is accepted as long as its degree of match with the query image reaches a certain threshold.
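
The two-stage search with threshold-based fuzzy matching can be sketched as follows. This is a minimal sketch: the Euclidean distance, the threshold of 2.0, the `min_hits` parameter used to decide whether the scene-level results are sufficient, and the flat dictionaries standing in for the two servers are all assumptions.

```python
import math

def within(query, frames, threshold):
    """Fuzzy match: keep frames whose distance to the query is within threshold."""
    hits = [(key, math.dist(query, vec)) for key, vec in frames.items()]
    return sorted((h for h in hits if h[1] <= threshold), key=lambda h: h[1])

def two_tier_search(query, scene_frames, all_frames, threshold=2.0, min_hits=1):
    """Search scene key frames first; fall back to all key frames if needed.

    Returns a (tier, hits) pair, where tier is "scene" or "all".
    """
    hits = within(query, scene_frames, threshold)
    if len(hits) >= min_hits:
        return "scene", hits
    return "all", within(query, all_frames, threshold)
```

Because the scene-level set is much smaller, most queries that have a representative scene key frame never reach the second, more expensive stage.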

4. Speech retrieval cluster 306:

When the speech of a video program is analyzed, syllable graphs of the speech are obtained, and searching the syllable graphs reveals which words were spoken in which programs. However, the speech of a program of about 30 minutes needs more than 600 syllable graphs to describe, and searching a syllable graph is itself not fast. Therefore, to guarantee the retrieval speed of the system, the speech retrieval part is designed as a retrieval cluster comprising three retrieval servers: speech cache retrieval, speech optimized retrieval and speech syllable graph retrieval.

These three servers ensure that users can quickly retrieve frequently accessed speech information, that is, the information in the speech cache. When the required information is not in the speech cache, the content on the speech optimized retrieval server is searched, that is, the search is performed over a small number of preferred results from the speech analysis. Meanwhile, a background program on the retrieval server uses the conditions that users have searched for to perform an offline full search of the speech syllable graphs and updates the cache with the results. This improves retrieval speed for subsequent users.
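
The interplay of the three speech servers can be sketched as a tiered lookup with an offline cache refresh. The class name, the two search callables and the simple in-memory queue are hypothetical stand-ins for the optimized retrieval server, the syllable graph retrieval server and the background program; the text does not describe their interfaces.

```python
class SpeechRetrieval:
    """Cache first, then the optimized (reduced) search; full search runs offline."""

    def __init__(self, optimized_search, full_search):
        self.cache = {}      # condition -> results of a completed full search
        self.pending = []    # conditions awaiting the offline full search
        self.optimized_search = optimized_search
        self.full_search = full_search

    def query(self, condition):
        """Answer from the cache if possible, otherwise from the optimized tier."""
        if condition in self.cache:
            return "cache", self.cache[condition]
        if condition not in self.pending:
            self.pending.append(condition)
        return "optimized", self.optimized_search(condition)

    def run_offline_update(self):
        """Background step: full syllable graph search for every queued condition."""
        while self.pending:
            condition = self.pending.pop(0)
            self.cache[condition] = self.full_search(condition)
```

The first user to issue a condition receives the quicker but less complete answer; once the offline update has run, later users issuing the same condition are served the full result directly from the cache.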

The description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined here may be applied to other embodiments without departing from the spirit or scope of the invention. The embodiments described above are merely preferred embodiments of the invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A content-based video retrieval method, characterized by comprising:

Receiving a video search request through a human-computer interaction interface;

Triggering multiple retrieval processes according to the video search request, including:

Triggering a metadata retrieval server to search the metadata of video programs;

Triggering a subtitle retrieval server to search the XML files that store program subtitle text;

Triggering a video retrieval cluster to search the feature data of video key frames;

Triggering a speech retrieval cluster to search the speech information of video programs, including pinyin strings and syllable graphs;

Integrating the search results and returning them to the user through the human-computer interaction interface.
2. The method according to claim 1, characterized in that a metadata table, a metadata retrieval module and a metadata entry module are provided in the metadata retrieval server, wherein:

The metadata table includes one or more of the following fields: program ID, program name, director, actors, language, place of origin, type 1, type 2, file format, file size, length, screen width, screen height, program address, program file name, upload time, upper set address, upload state, whether review is required, program level, review flag, program price, program summary, and ban flag;

The metadata retrieval module includes a stored procedure that retrieves a program by program ID and a stored procedure that retrieves programs by combined conditions;

The metadata entry module includes a stored procedure that inserts specified metadata into the database table.
3. The method according to claim 1, characterized in that the subtitle retrieval server is provided with a database table for storing subtitle XML files, a table for storing server configuration information, a stored procedure for reading the configuration information, a stored procedure for XML retrieval, a stored procedure for entering XML files into the database, and a segmented index on the XML.
4. The method according to claim 3, characterized in that:

A program ID field, an XML file name field and an XML file field are provided in the subtitle XML file database table; and/or

A parameter ID field, a parameter name field and a parameter value field are provided in the server configuration information table; and/or

A keyword logical expression generation block and a search block are provided in the stored procedure for XML retrieval.
5. The method according to claim 1, characterized in that a video scene retrieval server and a video retrieval server are provided in the video retrieval cluster, wherein:

The video scene retrieval server is provided with a database table storing the scene key frame index, a stored procedure for index entry, and a package for retrieving scene key frames; and/or

The video retrieval server is provided with a database table storing video key frame XML files, a database table storing the key frame index, a database table storing server configuration information, a stored procedure for entering XML files, a package for generating the index, a package for searching the index table, and a remote link for calling programs inside the video scene retrieval server.
6. The method according to claim 5, characterized in that:

The scene key frame index database table is provided with an index ID field, three index cluster lower-bound vector fields, three index cluster upper-bound vector fields, an index content nested table, a field for the total number of key frames in the index cluster, and an index cluster maximum distance field;

Wherein the index content nested table includes an entry ID field, a field for the ID of the program to which the key frame belongs, a key frame number field, a key frame type field, a scene start time field, a scene end time field, a shot start time field, a shot end time field, a key frame time point field, and three key frame feature vector fields.
7. The method according to claim 5, characterized in that:

The video key frame XML file database table is given the same structure as the scene key frame index database table, including an index content nested table; and/or

The package for generating the index is provided with a package for adding a new image feature value to a cluster, a package for expanding a cluster by a specified threshold, and a stored procedure for creating a new cluster; and/or

The package for searching the index table is provided with a main retrieval stored procedure, a stored procedure that computes the minimum distance between the query image and a cluster, a stored procedure that computes the minimum of the maximum distance between the query image and a cluster, and a program segment for judging whether a cluster is valid.
8. The method according to claim 7, characterized in that:

The package for expanding a cluster by a specified threshold is provided with a main expansion stored procedure, a stored procedure for computing the main diagonal length of the cluster hyper-rectangle after expansion, and a stored procedure for determining whether the expanded cluster overlaps any other existing cluster; wherein the maximum permitted diagonal length of the cluster hyper-rectangle is set to 2.0; and/or

The package for adding a new image feature value to a cluster is provided with a main stored procedure for the addition and a stored procedure for judging whether the feature value of an image belongs to a given cluster.
9. The method according to claim 8, characterized in that:

In the main retrieval stored procedure, the maximum permitted distance between the query image and a cluster is set to 2.0.
10. The method according to claim 1, characterized in that:

A speech cache retrieval server, a speech optimized retrieval server and a speech syllable graph retrieval server are provided in the speech retrieval cluster.