CN103020138A

CN103020138A - Method and device for video retrieval

Info

Publication number: CN103020138A
Application number: CN2012104761657A
Authority: CN
Inventors: 宗竞
Original assignee: JIANGSU LEMAIDAO NETWORK TECHNOLOGY Co Ltd
Current assignee: JIANGSU LEMAIDAO NETWORK TECHNOLOGY Co Ltd
Priority date: 2012-11-22
Filing date: 2012-11-22
Publication date: 2013-04-03

Abstract

The invention provides a method and a device for video retrieval. The method for video retrieval comprises the following steps: acquiring texture features of compressed video data; acquiring motion features of compressed video data; and performing similarity measure according to the feature fusion of the acquired texture features of the compressed video data and the acquired motion features of the compressed video data to judge the relevance of the compressed video data. The method and the device perform video retrieval based on compressed domain features, so as to improve the processing efficiency of video retrieval.

Description

Method and device for video retrieval

Technical field

The present invention relates to network technology, and in particular to a method and device for video retrieval.

Background technology

With the rapid development of multimedia computing and the continuous improvement of network transmission technology, the amount of multimedia data accessible to people has increased sharply. Video, the most complex media format among multimedia data, has been widely applied and developed thanks to its diverse forms of expression, rich semantic content, and convenient means of recording.

Video retrieval means finding the required video clips within a large volume of video data. Content-based video retrieval means automatically locating the required video clips from a supplied example or feature description.

The purpose of research on content-based video analysis and retrieval is to build structure and indexes by processing, analyzing, and understanding video content by computer, so that video information can be obtained easily and effectively. Retrieval is carried out over large-scale video data according to the content of the video and its contextual relationships. Content-based video retrieval involves many techniques, such as video structure analysis (shot detection), automatic indexing of video data, and video clustering.

At present, research on content-based video retrieval, apart from recognizing and describing the color, texture, shape, and spatial relationships of images, concentrates mainly on video shot segmentation, feature extraction and description (including visual features such as color, texture, and shape, as well as motion information and object information), key-frame extraction, and structure analysis.

Depending on the video content submitted, video retrieval is generally divided into shot retrieval and clip retrieval. Broadly, a clip corresponds to a scene: it also consists of a series of semantically related shots, the difference being that a clip may be part or all of a complete scene. Most current research on video retrieval still concentrates on shot retrieval, while research on clip retrieval has only just begun. Yet from the user's point of view, queries against a video database usually concern video clips and seldom a single physical shot. From the point of view of information content, a video clip made up of several shots carries more semantics than a single shot and can represent an event of interest to the user, so the query results are more meaningful. Examples include retrieving an event of interest in the news, a favorite sport in a sports program, or a favorite plot in a film, and a television station checking whether a certain advertisement has been broadcast.

Most existing video retrieval systems process video data in decompressed form. Decompression costs computation time and a corresponding amount of computation, which lowers the processing efficiency of the video retrieval system.

Therefore, a video retrieval method and device based on compressed-domain features are needed to improve the processing efficiency of video retrieval.

Summary of the invention

According to one aspect of the present invention, a video retrieval method is provided, comprising the following steps: obtaining texture features of compressed video data; obtaining motion features of the compressed video data; and performing a similarity measure according to a feature fusion of the obtained texture features and the obtained motion features of the compressed video data, to judge the relevance of the compressed video data.

The compressed video data is a compressed video stream conforming to the MPEG-2 standard.

The step of obtaining the texture features of the compressed video data comprises extracting key frames of the compressed video and obtaining the texture features of those key frames.

Obtaining the motion features of the compressed video comprises extracting the motion vector field and the DCT residual coefficient matrix from the compressed video data, performing global motion analysis based on a four-parameter model to obtain the camera motion parameters and the reliable background macroblocks, and performing motion compensation to obtain the absolute motion vector of each macroblock.

According to another aspect of the present invention, a video retrieval device is provided, comprising: a first feature acquisition module for obtaining texture features of compressed video data; a second feature acquisition module for obtaining motion features of the compressed video data; and a judgment module for performing a similarity measure according to a feature fusion of the texture features obtained by the first feature acquisition module and the motion features obtained by the second feature acquisition module, to judge the relevance of the compressed video data.

The video retrieval method and device according to the present invention perform video retrieval based on compressed-domain features and can therefore improve the processing efficiency of video retrieval.

Description of drawings

Fig. 1 is a flowchart of the video retrieval method according to an embodiment of the present invention; and

Fig. 2 is a block diagram of the video retrieval device according to an embodiment of the present invention.

Embodiment

Preferred embodiments of the video retrieval method and device of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that the following description is illustrative only and does not limit the invention; those skilled in the art may devise many variations on the basis of the disclosure below, all of which fall within the scope of protection of the present invention.

In view of the processing-efficiency problem in existing video retrieval technology, embodiments of the present invention provide a content-based video retrieval solution: the texture features and motion features of the compressed video are first extracted directly and these content feature values are analyzed; fuzzy judgment theory is then introduced to fuse the features and perform video retrieval.

Digitizing video images generates a large amount of digital information. For example, one frame of a 720×576, 16-color digital image occupies 1.35 MB of storage, so full-motion video (25 frames per second) requires a bandwidth of 33.75 MB per second. At this rate, a common CD-R disc can store only 16 seconds of such moving images. For practical use, therefore, the video signal must be compressed.

Video compression is divided into lossless compression and lossy compression.
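The arithmetic behind these figures can be checked with a short sketch. It takes the per-frame size (1.35 MB) and the frame rate (25 frames per second) as stated in the text without deriving them; the 540 MB capacity is not stated in the text and is simply the value implied by the quoted 16 seconds. The function names are illustrative.

```python
# Figures quoted in the text; they are taken as given, not derived here.
FRAME_SIZE_MB = 1.35       # storage for one 720x576 frame, as stated
FRAMES_PER_SECOND = 25     # full-motion frame rate, as stated


def uncompressed_rate_mb_per_s(frame_mb: float, fps: int) -> float:
    """Data rate of an uncompressed video stream in MB per second."""
    return frame_mb * fps


def seconds_storable(capacity_mb: float, rate_mb_per_s: float) -> float:
    """Number of seconds of video that fit in a given capacity."""
    return capacity_mb / rate_mb_per_s


rate = uncompressed_rate_mb_per_s(FRAME_SIZE_MB, FRAMES_PER_SECOND)
# Assumption: 540 MB is the capacity implied by "16 seconds" at this rate.
implied_capacity_mb = rate * 16
print(f"rate = {rate:.2f} MB/s, implied capacity = {implied_capacity_mb:.0f} MB")
```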

Lossless compression means that the original data can be recovered exactly when the compressed file is played back. It is commonly used to compress data files, for example ZIP files. The algorithms commonly used for lossless compression are Huffman coding and variable run-length coding. Huffman coding gathers statistics on how often each codeword occurs and, on the principle that more frequent codes are assigned fewer bits, reduces the average code length and thereby compresses the data. This algorithm requires the probabilities of the color values in the image to be counted in advance, the coding scheme differs from image to image, and coding efficiency is not high. Variable run-length coding uses a pair of parameters, color and length, to replace a run of consecutively stored identical color values, reducing the storage occupied by the same color. This algorithm is very useful for black-and-white images but impractical for moving color images: it is too strongly affected by image complexity, so the compression ratio is too low and can hardly exceed 3:1.
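The two lossless schemes described above can be illustrated with a minimal sketch. It is not taken from the patent: `huffman_code` and `run_length_encode` are hypothetical helper names, and the input is a plain sequence of symbols standing in for color values.

```python
import heapq
from collections import Counter
from itertools import groupby


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The integer tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def run_length_encode(values):
    """Replace each run of identical values with a (value, length) pair."""
    return [(v, len(list(g))) for v, g in groupby(values)]


pixels = list("aaaaaaabbbc")                 # stand-in for a row of color values
code = huffman_code(pixels)
runs = run_length_encode(pixels)
```

The Huffman table has to be rebuilt for each image, which is the per-image cost the text mentions; the run-length output shrinks only when the input contains long runs.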

Lossy compression algorithms reduce the space occupied by a digital image by discarding a large amount of redundant information. The original image cannot be fully recovered on playback, and some detail is selectively lost; how much information is lost depends on how high a compression ratio is required. For a given compression algorithm, the higher the required compression ratio, the more image information is lost. The algorithms generally adopted combine transform coding with motion detection. The transform coding in common use is the DCT (discrete cosine transform) or the wavelet transform, and motion detection uses a block search algorithm. There are also other coding algorithms, such as object-based coding, model-based coding, and fractal coding. Compression standards in current use, such as MPEG and H.263, are all based on transform coding plus motion detection and are all lossy algorithms.
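As an illustration of the block search mentioned above, the following sketch performs an exhaustive search that minimizes the sum of absolute differences (SAD). The text does not specify the matching criterion or search strategy, so SAD, full search, and the block size and radius are assumptions, and the function names are hypothetical. Frames are plain lists of rows of gray values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def block_search(cur, ref, cx, cy, size=4, radius=2):
    """Full search: return the offset (dx, dy) into ref that minimizes SAD.

    (cx, cy) is the top-left corner of the block in the current frame.
    Candidates that fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


# Reference frame with a non-repeating pattern; current frame is the
# reference shifted one pixel to the right.
ref = [[x * x + 3 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(8)] for y in range(8)]
motion_vector = block_search(cur, ref, 3, 2)
```

In this example the block at (3, 2) in the current frame is found one pixel to the left in the reference frame.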

The MPEG series is currently the most widely used family of compression standards. MPEG (Moving Picture Experts Group) is an expert group jointly established in 1988 by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It is responsible for developing standards for the coding and decoding of television image data and audio data and for their synchronization.

The MPEG-2 standard is the member of the MPEG series directly concerned with high-quality image and audio coding for digital television broadcasting. MPEG-2 is an extension of MPEG-1: its basic coding algorithms are the same as those of MPEG-1, but it adds many functions that MPEG-1 lacks. For example, motion-vector precision is raised to half a pixel; error resilience is extended through special vectors in key frames; the precision of the discrete cosine transform is selectable; advanced prediction modes are provided; quality scalability allows images of different quality within the same video stream; variable bit rate (VBR) is supported, together with bit-rate scalability; and coding of interlaced television is added.

The MPEG-2 systems specification mainly defines how television image data, audio data, and other data are combined into one or more data streams suitable for storage or transmission. There are two forms of data stream: the program stream (PS) and the transport stream (TS). A program stream is generated by combining one or more packetized elementary streams (PES); it is used in environments where errors are relatively rare and is suited to applications that process the stream in software. A transport stream is likewise generated by combining one or more PES, but it is used in environments where errors are relatively frequent, for example in lossy or noisy transmission systems.

The video retrieval method and device according to the present invention are further described below in the context of an MPEG-2 video stream.

Fig. 1 is a flowchart of the video retrieval method according to an embodiment of the present invention.

As shown in Fig. 1, the video retrieval method according to an embodiment of the present invention obtains the texture features of the compressed video data at step S101.

To obtain the texture features of the video data, video key frames are first extracted in the compressed domain. Video data is unordered and unstructured; key-frame extraction makes it possible to organize, manage, index, and query such unstructured data effectively. Traditional key-frame extraction is performed in the pixel domain and does not meet the needs of the present invention. Compressed-domain key-frame extraction, by contrast, offers fast processing, low resource usage, and high time efficiency, and has become the preferred technique for structuring video.

Several compressed-domain key-frame extraction techniques have been proposed in the prior art. For example, the MPEG compressed video file is first partially decoded and the bitstream information read out; the luminance DC coefficients of each I frame are extracted as an image feature vector; the similarity between the feature vectors of adjacent I frames is expressed by the Euclidean distance; and key frames are then obtained by applying the algorithm's adaptive threshold.
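The selection step of this prior-art example can be sketched as follows, assuming the luminance DC coefficients of each I frame have already been read from the bitstream into a list of numbers. The text does not say how the adaptive threshold is computed; the rule used here (mean distance plus `k` standard deviations) is an assumption for illustration, as are the function names and the parameter `k`.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_key_frames(dc_vectors, k=1.0):
    """Return indices of the I frames chosen as key frames.

    dc_vectors: one luminance-DC feature vector per I frame, in order.
    k: hypothetical sensitivity factor of the adaptive threshold.
    """
    if not dc_vectors:
        return []
    dists = [euclidean(a, b) for a, b in zip(dc_vectors, dc_vectors[1:])]
    if not dists:
        return [0]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    threshold = mean + k * std     # adapts to the statistics of this video
    keys = [0]                     # the first I frame opens the first segment
    # A large jump between adjacent I frames marks the start of new content.
    keys += [i + 1 for i, d in enumerate(dists) if d > threshold]
    return keys


i_frames = [[10, 10], [11, 10], [10, 11], [50, 52], [51, 52], [50, 51]]
key_frame_indices = select_key_frames(i_frames)
```

In this toy input the only large jump lies between the third and fourth I frames, so frames 0 and 3 are selected.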

Subsequently, the texture features of these key frames are obtained. The texture of an image is an image feature that is quantified in image computation. Texture describes the spatial distribution of color and of light intensity in an image or in a small region of it. Texture feature extraction is divided into structure-based methods and statistics-based methods. Structure-based methods model the texture to be detected and search the image for repeated patterns. Existing texture feature extraction methods include the LBP (local binary patterns) method and the gray-level co-occurrence matrix method.

The LBP method extracts the LBP feature vector as follows. First, the detection window is divided into small cells of 16×16 pixels. Each pixel in a cell is compared, clockwise or counterclockwise, with the 8 points in its circular neighborhood (a circular neighborhood with a different number of points may also be used, as in the three neighborhood examples of Fig. 3-4). If the center pixel value is larger than that of the neighboring point, the neighboring point is assigned 1, otherwise 0, so that each pixel yields an 8-bit binary number (usually converted to decimal). Next, the histogram of each cell is computed, that is, the frequency with which each number (assumed to be decimal) occurs, which amounts to counting the binary patterns that record which neighbors each pixel exceeds; the histogram is then normalized. Finally, the histograms of all cells are concatenated to give the LBP texture feature of the whole image, which can then be classified with an SVM or another machine learning algorithm.
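The LBP steps above can be sketched in plain Python. The image is a list of rows of gray values; the function names are hypothetical. The bit convention follows the text (bit 1 when the center exceeds the neighbor); other descriptions of LBP use the opposite convention, which merely complements the codes. Skipping the border pixels of each cell is a simplification of this sketch, not something the text prescribes.

```python
def lbp_code(img, x, y):
    """8-bit LBP code of pixel (x, y) from its 8 immediate neighbors."""
    center = img[y][x]
    # Neighbors visited clockwise, starting at the top-left.
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for dx, dy in offsets:
        bit = 1 if center > img[y + dy][x + dx] else 0   # convention of the text
        code = (code << 1) | bit
    return code


def lbp_histogram(cell):
    """Normalized 256-bin histogram over the interior pixels of one cell."""
    hist = [0] * 256
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            hist[lbp_code(cell, x, y)] += 1
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def lbp_feature(image, cell_size=16):
    """Concatenate the normalized histograms of all full cells."""
    feature = []
    for top in range(0, len(image) - cell_size + 1, cell_size):
        for left in range(0, len(image[0]) - cell_size + 1, cell_size):
            cell = [row[left:left + cell_size]
                    for row in image[top:top + cell_size]]
            feature.extend(lbp_histogram(cell))
    return feature


patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
center_code = lbp_code(patch, 1, 1)          # bits 11100001

image = [[(x * y) % 7 for x in range(32)] for y in range(16)]
feature = lbp_feature(image)                 # two 16x16 cells
```

The resulting vector has 256 entries per cell and can be passed to any classifier or distance measure.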

The gray-level co-occurrence matrix is another texture feature extraction method. First, a direction (orientation) and a step length (step), in pixels, are defined for an image. The gray-level co-occurrence matrix T is of size N×N, and its entry M(i, j) is defined as the frequency with which a pixel of gray level i occurs at one point while a pixel of gray level j occurs at the point one step away along the defined direction. Here N is the number of gray levels. Because the co-occurrence matrix is defined by a combination of direction and step, and because one factor that determines the frequencies is the number of pixels that contribute to the matrix, which is smaller than the total number of pixels and decreases as the step increases, the resulting co-occurrence matrix is sparse. The number of gray levels N is therefore usually reduced to 8. If, for the horizontal direction, pixel pairs are counted in both the left and right directions, the result is a symmetric co-occurrence matrix. If only one direction from the current pixel (left or right) is considered, the result is called an asymmetric co-occurrence matrix.
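A minimal sketch of this definition follows. The function names and the linear quantization to 8 levels are assumptions for illustration; the matrix holds raw counts, which can be divided by their sum to obtain frequencies.

```python
def quantize(image, levels=8, max_value=255):
    """Map gray values in 0..max_value onto 0..levels-1."""
    return [[p * levels // (max_value + 1) for p in row] for row in image]


def glcm(image, dx=1, dy=0, levels=8, symmetric=False):
    """Count how often gray level i at a pixel co-occurs with gray level j
    at the pixel offset by (dx, dy). With symmetric=True each pair is also
    counted in the opposite direction."""
    matrix = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:      # pairs leaving the image are dropped
                i, j = image[y][x], image[y2][x2]
                matrix[i][j] += 1
                if symmetric:
                    matrix[j][i] += 1
    return matrix


tiny = [[0, 0, 1],
        [1, 2, 2]]
one_way = glcm(tiny, dx=1, dy=0, levels=3)
both_ways = glcm(tiny, dx=1, dy=0, levels=3, symmetric=True)
```

Only pixels with a partner inside the image contribute, which is why the number of counted pairs falls as the step grows.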

As shown in Fig. 1, the video retrieval method according to an embodiment of the present invention obtains the motion features of the compressed video data at step S102. Extraction of moving objects is an important part of video analysis. Traditional extraction methods work in the pixel domain; applying them to video stored in compressed form requires first spending considerable time decoding the compressed bitstream. To increase speed, the prior art, by analyzing the characteristics of the MPEG bitstream, already extracts moving objects directly in the compressed domain. For example, the motion vector field and the DCT residual coefficient matrix are first extracted from the compressed bitstream. Next, global motion analysis based on a four-parameter model is performed to obtain the camera motion parameters and the reliable background macroblocks, and motion compensation is applied to obtain the absolute motion vector of each macroblock. Then, motion detection based on fourth-order moments yields candidate regions containing moving objects. A correlation measure of motion magnitude and angle between macroblocks is defined from the motion correlation characteristics; the candidate regions are scanned, each macroblock that satisfies the threshold condition is clustered, and the clustering result is corrected using the residual DCT coefficients, completing the segmentation of the moving objects. Finally, post-processing is applied to further improve segmentation accuracy.
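The global motion analysis and compensation step can be sketched as below. The text does not give the form of the four-parameter model; this sketch assumes one common form, u = a·x − b·y + c and v = b·x + a·y + d (zoom and rotation in a and b, translation in c and d), fitted by least squares to macroblocks already assumed to be background. How the reliable background macroblocks are chosen is not covered here, and all names are hypothetical.

```python
def fit_four_parameter(background):
    """Least-squares fit of u = a*x - b*y + c, v = b*x + a*y + d.

    background: list of (x, y, u, v), with (x, y) the macroblock center and
    (u, v) its motion vector. Coordinates are centered on their mean so that
    the normal equations decouple into closed-form expressions.
    """
    n = len(background)
    mx = sum(blk[0] for blk in background) / n
    my = sum(blk[1] for blk in background) / n
    spread = sum((x - mx) ** 2 + (y - my) ** 2 for x, y, _, _ in background)
    a = sum((x - mx) * u + (y - my) * v for x, y, u, v in background) / spread
    b = sum((x - mx) * v - (y - my) * u for x, y, u, v in background) / spread
    c = sum(u for _, _, u, _ in background) / n
    d = sum(v for _, _, _, v in background) / n
    return a, b, c, d, mx, my


def absolute_vector(block, params):
    """Subtract the camera-induced motion from a macroblock's motion vector."""
    x, y, u, v = block
    a, b, c, d, mx, my = params
    global_u = a * (x - mx) - b * (y - my) + c
    global_v = b * (x - mx) + a * (y - my) + d
    return u - global_u, v - global_v


# Synthetic field: pure camera motion on a grid of 16x16 macroblock centers.
background = [
    (x, y, 0.02 * x + 0.01 * y + 2.0, -0.01 * x + 0.02 * y - 1.0)
    for x in range(8, 80, 16)
    for y in range(8, 80, 16)
]
params = fit_four_parameter(background)

# A macroblock on a moving object: camera motion plus its own motion (5, -3).
moving = (40, 40, 0.02 * 40 + 0.01 * 40 + 2.0 + 5.0,
          -0.01 * 40 + 0.02 * 40 - 1.0 - 3.0)
object_motion = absolute_vector(moving, params)
```

After compensation, background macroblocks have vectors close to zero, while the macroblock on the moving object keeps its own motion.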

At step S103, the video retrieval method according to an embodiment of the present invention performs a similarity measure according to a feature fusion of the texture features obtained at step S101 and the motion features obtained at step S102, to judge the relevance of the compressed video data.

With the research and development of image fusion technology, the advantages of feature fusion for image similarity measurement have gradually become evident. Each single image feature reflects the attributes of the image from a different angle; feature fusion can exploit the useful information in multiple features while eliminating, to some extent, the interference of subjective and objective factors, and is therefore a very valuable scheme. The weight of each feature can be set so as to achieve a satisfactory retrieval result.
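A weighted fusion of per-feature similarities can be sketched as follows. The text does not specify the fusion rule, so a plain weighted average stands in for it here; this is a simpler technique than the fuzzy judgment mentioned earlier in the description. The function names, the weights, the threshold, and the use of histogram intersection as the per-feature similarity are all assumptions.

```python
def histogram_intersection(p, q):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def fused_similarity(similarities, weights):
    """Weighted average of per-feature similarities, each in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * similarities[name] for name in weights) / total


def is_relevant(similarities, weights, threshold=0.6):
    """Judge relevance by comparing the fused score with a threshold."""
    return fused_similarity(similarities, weights) >= threshold


# Hypothetical per-feature scores for one query/candidate pair.
scores = {"texture": 0.9, "motion": 0.5}
weights = {"texture": 0.7, "motion": 0.3}     # illustrative weights only
fused = fused_similarity(scores, weights)
relevant = is_relevant(scores, weights)
```

Changing the weights shifts how strongly texture or motion influences the final ranking, which corresponds to the weight setting described above.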

Fig. 2 is a block diagram of the video retrieval device of the present invention. As shown in Fig. 2, the video retrieval device comprises: a first feature acquisition module 201 for obtaining texture features of compressed video data; a second feature acquisition module 202 for obtaining motion features of the compressed video data; and a judgment module 203 for performing a similarity measure according to a feature fusion of the texture features obtained by the first feature acquisition module 201 and the motion features obtained by the second feature acquisition module 202, to judge the relevance of the compressed video data.

In summary, the video retrieval method and device according to the embodiments of the present invention perform video retrieval based on compressed-domain features, avoiding the computation time and cost of decompression and thereby improving the processing efficiency of video retrieval.

Claims

1. A video retrieval method, characterized by comprising the following steps:

obtaining texture features of compressed video data (S101);

obtaining motion features of the compressed video data (S102); and

performing a similarity measure according to a feature fusion of the obtained texture features and the obtained motion features of the compressed video data, to judge the relevance of the compressed video data (S103).

2. The video retrieval method as claimed in claim 1, wherein the compressed video data is a compressed video stream conforming to the MPEG-2 standard.

3. The video retrieval method as claimed in claim 1 or 2, wherein the step of obtaining the texture features of the compressed video data comprises extracting key frames of the compressed video and obtaining the texture features of those key frames.

4. The video retrieval method as claimed in claim 1 or 2, wherein obtaining the motion features of the compressed video comprises extracting the motion vector field and the DCT residual coefficient matrix from the compressed video data, performing global motion analysis based on a four-parameter model to obtain the camera motion parameters and the reliable background macroblocks, and performing motion compensation to obtain the absolute motion vector of each macroblock.

5. A video retrieval device, characterized by comprising:

a first feature acquisition module (201) for obtaining texture features of compressed video data;

a second feature acquisition module (202) for obtaining motion features of the compressed video data; and

a judgment module (203) for performing a similarity measure according to a feature fusion of the texture features obtained by the first feature acquisition module (201) and the motion features obtained by the second feature acquisition module (202), to judge the relevance of the compressed video data.