CN101620629A

CN101620629A - Method and device for extracting video index and video downloading system

Info

Publication number: CN101620629A
Application number: CN200910147306A
Authority: CN
Inventors: 王婷
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2009-06-09
Filing date: 2009-06-09
Publication date: 2010-01-06
Also published as: WO2010142089A1

Abstract

The invention discloses a method and a device for extracting a video index, and a video downloading system. The method comprises: performing shot detection on a video file; analyzing the content of each detected shot and extracting a video key frame of each shot; performing edge-based localization of particular text in the video file; recognizing the localized text and taking the recognized text as key text; and using the key text and the video key frame of each shot as the video index of the video file. The method can provide mobile phone users with more comprehensive video index information, help them choose which videos to download, and let them accurately locate the videos they want, so that unneeded videos are not downloaded and downloads better match user needs.

Description

Method and device for extracting a video index, and video downloading system

Technical field

The present invention relates to the technical field of video search, and in particular to a method and a device for extracting a video index and to a video downloading system.

Background art

Generally speaking, video files are larger than other information files and require more complex algorithms, so they are not easy to analyze or index. However, as online video keeps growing and the amount of information in society explodes, it is increasingly important to obtain video information effectively and to help users locate the content that interests them within a massive number of videos. At present, video indexing technology is mainly applied on PCs and has not been widely applied to mobile video.

Compared with a PC, a mobile phone is constrained by factors such as bandwidth and storage capacity: its transmission speed is slow, its storage space is small, and its computing power is weak. Users are therefore usually more cautious when downloading a video with a mobile phone. Consequently, applying video indexing effectively on mobile phones, so that users can accurately locate the videos that interest them, is all the more important.

To achieve this goal, a mobile video indexing and search method has been proposed. That method first segments the video into its basic units, shots, and then extracts the shot-transition frames and provides them to the mobile phone user. To a certain extent this gives the user useful video information and helps the user choose which video file to download. However, the video information it provides is not comprehensive enough, because the first frame of a shot can hardly reflect the summary of the whole shot.

Summary of the invention

The invention provides a method and a device for extracting a video index, and a video downloading system, in order to provide users with more comprehensive video index information.

A method for extracting a video index provided by an embodiment of the invention comprises:

performing shot detection on a video file, performing content analysis on each detected shot, and extracting a video key frame of each shot; and

performing edge-based localization of particular text in the video file, recognizing the localized text, and taking the recognized text as key text;

using the key text and the video key frame of each shot as the video index for searching the video file.

A device for extracting a video index provided by an embodiment of the invention, the video index comprising the key text of the corresponding video file and the video key frame of each shot, comprises:

a shot detection unit, configured to perform shot detection on a video file;

a video key frame extraction unit, configured to perform content analysis on each detected shot and extract the video key frame of each shot;

an edge localization unit, configured to perform edge-based localization of particular text in the video file; and

a key text extraction unit, configured to recognize the localized text and take the recognized text as key text.

A video downloading system provided by an embodiment of the invention comprises:

a base station, configured to forward to a video download server a request from a terminal to search for the key text and video key frames of a video, and to send the video index received from the video download server to the terminal;

the video download server, configured to, according to the request, send a search request to run a video key frame and key text search unit, and to send the video index obtained from the video key frame and key text search unit to the base station; and

the video key frame and key text search unit, configured to perform shot detection on the video file, perform content analysis on each detected shot, and extract the video key frame of each shot; to perform edge-based localization of particular text in the video file, recognize the localized text, and take the recognized text as key text; and to return the key text and the video key frame of each shot to the video download server as the video index of the video file searched for by the terminal.

In the embodiments of the invention, both the extracted key frames and the extracted key text are provided to the user during video index extraction. Mobile phone users therefore receive more comprehensive video index information and can learn the content of a video before downloading it. This helps users choose which videos to download and accurately locate the videos they want, avoids downloading unneeded videos, and makes downloads better match user needs.

Description of drawings

Fig. 1 is a schematic flowchart of extracting a video index according to an embodiment of the invention;

Fig. 2 is a histogram of the inter-frame distances between adjacent frames of a video segment;

Fig. 3 is a frame image from an advertisement video;

Fig. 4 shows the result of applying the Sobel edge detection operator to Fig. 3;

Fig. 5 shows the result of horizontal projection on Fig. 3;

Fig. 6 shows the result of locating the particular text region in Fig. 3 with the improved edge-based text localization algorithm of the embodiment of the invention;

Fig. 7 is a schematic structural diagram of the device for extracting a video index according to an embodiment of the invention;

Fig. 8 is a schematic diagram of the system architecture of an embodiment of the invention;

Fig. 9 is a schematic diagram of the key text and key frames of a video that a mobile phone user obtains upon selecting a video file when an embodiment of the invention is used.

Detailed description of the embodiments

When a mobile video index is extracted in the prior art, the extracted video information is not comprehensive enough: extracting only the first frame of a shot can hardly reflect the summary of the shot as a whole, so it is necessary to go further and extract key frames of the shot. In addition, besides the image information, text in a video often also carries important information about the video, especially concise, eye-catching text, which usually summarizes the theme of a video segment or expresses its core content. Therefore, if the key text of a video can be located and extracted effectively and provided to the user, the core content of the video is usually provided to the user as well. This undoubtedly plays a crucial role in the user's video indexing and gives the mobile video index higher accuracy.

The basic physical unit of a video is the shot. A shot consists of a number of temporally continuous frame images captured continuously by one camera. A video is composed of many shots joined by various transitions. There are two main kinds of shot transition: abrupt change and gradual change. An abrupt change cuts directly from one shot to the next, with no editing effect in between. A gradual change adds some editing technique between the shots.

In the embodiment of the invention, shot detection is performed on a video file, content analysis is performed on each detected shot, and the video key frame of each shot is extracted; edge-based localization is performed on the particular text of the video file, the localized text is recognized, and the key text is extracted; and the key text and the video key frame of each shot are used as the video index of the video file for terminal search.

Referring to Fig. 1, the detailed process of extracting a video index in the embodiment of the invention is as follows:

Step 101: detect abrupt-change shots and gradual-transition shots in the video file.

Here the twin-comparison algorithm can be used to detect abrupt-change and gradual-transition shots in the video. The concrete steps are as follows:

Obtain the inter-frame distance of the adjacent candidate frames of each shot; determine, according to those inter-frame distances, the abrupt-change frames, the gradual-transition start frames and the gradual-transition end frames in the video file; and treat each abrupt-change frame as one abrupt-change shot, and each adjacent pair of gradual-transition start and end frames, together with the frames between them, as one gradual-transition shot.

Compared with commonly used shot detection algorithms based on pixel comparison, on histograms, or on edges, the twin-comparison algorithm detects gradual-transition shots better.

The inter-frame distance of adjacent candidate frames can be computed with the histogram comparison method. Fig. 2 shows the histogram of adjacent-frame distances for a video segment. For example, the computation can be carried out as follows:

(1) Divide the pixel colours of each candidate frame into several levels and count the number of pixels at each level to obtain the statistical histogram of each frame image;

(2) Compare the statistical histograms of adjacent candidate frames with the following formula, and obtain the inter-frame distance of the adjacent candidate frames of each shot from the distance between those histograms:

D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} | h_i(k) - h_{i+1}(k) |

where D(i, i+1) is the distance between the statistical histograms of adjacent candidate frames; M is the total number of pixels in the image; N is a natural number denoting the number of levels into which the colours are divided; h_i(k) is the number of pixels counted at level k of frame i; and h_{i+1}(k) is the number of pixels counted at level k of frame i+1.
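The histogram distance above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes each frame is a flat sequence of 8-bit grey values (the text speaks of colour levels in general), and the names `level_histogram`, `frame_distance` and the default of 16 levels are hypothetical.

```python
from typing import List, Sequence


def level_histogram(pixels: Sequence[int], levels: int, max_value: int = 256) -> List[int]:
    """Count how many pixels fall into each of the N quantisation levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p * levels // max_value] += 1
    return hist


def frame_distance(frame_a: Sequence[int], frame_b: Sequence[int], levels: int = 16) -> float:
    """D(i, i+1) = 1 / (2M) * sum over k of |h_i(k) - h_{i+1}(k)|."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and have the same pixel count")
    m = len(frame_a)  # M: total number of pixels per frame
    h_a = level_histogram(frame_a, levels)
    h_b = level_histogram(frame_b, levels)
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / (2 * m)
```

With the 1/(2M) normalisation the distance lies between 0 (identical histograms) and 1 (histograms with no level in common), which makes the thresholds independent of the frame size.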

Here, two thresholds Th and Tl can be used to detect abrupt-change shots and gradual-transition shots, where Th > Tl; Th is used to detect abrupt-change frames, and Tl is used to detect possible gradual-transition start frames.

The abrupt-change frames, gradual-transition start frames and gradual-transition end frames in the video file can be determined as follows:

(1) Build the histogram of distances between adjacent frames;

(2) If the inter-frame distance D(i, i+1) > Th and it differs substantially from the local second-largest inter-frame distance, frame i is an abrupt-change frame; if Tl < D(i, i+1) < Th, frame i is taken as a possible gradual-transition start frame;

Here the possible gradual-transition start frames are found. A gradual transition differs from an abrupt change: an abrupt change appears as a sudden jump between shots, so the gap between the local maximum inter-frame distance and the local second-largest inter-frame distance exceeds a certain threshold, and abrupt-change shots are filtered out first in this way. For example, if within a certain time period D(1, 2) is the largest inter-frame distance and D(3, 4) is the second largest, and D(1, 2) and D(3, 4) differ by more than a certain threshold, frame 1 can be identified as an abrupt-change frame;

(3) For a possible gradual-transition start frame, accumulate the inter-frame distances D(j, j+1) of adjacent frames, starting from frame i;

(4) When the inter-frame distance D(j, j+1) of the adjacent frames being accumulated falls below Tl, stop accumulating and compare the accumulated distance, which the definitions below also denote D(j, j+1), with Th. If it is greater than Th, frame j is taken as the end frame of the gradual-transition shot; otherwise the transition is not considered a gradual shot transition and this possible gradual-transition start frame is discarded;

Here, Th is the configured abrupt-change frame threshold, Tl is the possible gradual-transition start frame threshold, and Th > Tl; i is the frame number; D(i, i+1) denotes the distance between frame i and frame i+1; and D(j, j+1) denotes the accumulated inter-frame distance between frames j and j+1.

Using the twin-comparison algorithm described above, 4 abrupt-change shots and 7 gradual-transition shots can be detected in Fig. 2.
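The two-threshold procedure in steps (1) to (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment's exact procedure: the accumulated sum of distances is what is compared with Th; the additional check against the local second-largest distance in step (2) is omitted; and accumulation also stops when a distance exceeds Th, so that the cut is handled on its own. The function name `twin_comparison` and the list-based input are hypothetical.

```python
from typing import List, Tuple


def twin_comparison(dist: List[float], th: float, tl: float) -> Tuple[List[int], List[Tuple[int, int]]]:
    """dist[i] holds D(i, i+1). Returns (abrupt-change frames, (start, end) gradual transitions)."""
    if not th > tl:
        raise ValueError("Th must be greater than Tl")
    cuts: List[int] = []
    gradual: List[Tuple[int, int]] = []
    i = 0
    while i < len(dist):
        if dist[i] > th:
            # Distance above the high threshold: frame i is an abrupt-change frame.
            cuts.append(i)
            i += 1
        elif dist[i] > tl:
            # Possible gradual-transition start frame: accumulate distances from frame i.
            acc = 0.0
            j = i
            while j < len(dist) and tl <= dist[j] <= th:
                acc += dist[j]
                j += 1
            # Keep the candidate only if the accumulated distance exceeds Th.
            if acc > th:
                gradual.append((i, j))
            i = j
        else:
            i += 1
    return cuts, gradual
```

For instance, with Th = 0.5 and Tl = 0.1, a run of distances 0.2, 0.25, 0.2 accumulates to 0.65 and is reported as one gradual transition, while an isolated 0.15 is discarded.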

Step 102: perform content analysis on each detected shot and extract the video key frame of each shot.

This can be implemented as follows:

(1) Extract one or more candidate key frames from each detected shot;

(2) Perform content analysis on the one or more candidate key frames and select the video key frame of each shot accordingly.

After the shots have been extracted, key frames need to be extracted from them. In principle, a key frame should give a comprehensive summary of a shot, or in other words a summary whose content is as rich as possible. From the viewpoint of information theory, dissimilar (or weakly correlated) frame images carry more information than similar ones. The key frames extracted by the invention are used to give the user the main information of the shot. Therefore, the criterion for key-frame extraction is mainly the dissimilarity between frames.

In theory, because a shot consists of frame images that are continuous in time and highly correlated in content, choosing the few least-correlated frames as the key frames of the shot captures the most information.

Accordingly, several frames with low content correlation can be chosen from the one or more candidate key frames as the video key frames of each shot. For example, if the candidate key frames in each shot comprise the first frame f_1, the middle frame f_{N/2} and the last frame f_N, the distance between every two candidate key frames in each shot can be computed, namely D(f_1, f_{N/2}), D(f_1, f_N) and D(f_{N/2}, f_N), and each distance is compared with a preset threshold. If all of the pairwise distances are smaller than the preset threshold, the middle frame is taken as the video key frame of the shot; otherwise, the two frames with the largest distance are taken as the video key frames of the shot.

This algorithm is relatively simple and therefore fast, and it extracts at most two weakly correlated key frames from a shot to provide to the user. Compared with a PC, whose transmission speed is higher, the algorithm better meets the needs of mobile phone users, whose transmission speed is relatively slow.
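The first/middle/last selection rule can be illustrated with a short Python sketch. The function name `select_key_frames` is hypothetical, the distance is passed in as a callable so that any frame distance (for example the histogram distance D) can be used, and zero-based indices stand in for f_1, f_{N/2} and f_N.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_key_frames(shot: Sequence[Frame],
                      distance: Callable[[Frame, Frame], float],
                      threshold: float) -> List[int]:
    """Return the indices of the key frame(s) of one shot (one or two indices)."""
    if not shot:
        raise ValueError("a shot must contain at least one frame")
    first, middle, last = 0, len(shot) // 2, len(shot) - 1
    pairs = [(first, middle), (first, last), (middle, last)]
    dists = [(distance(shot[a], shot[b]), (a, b)) for a, b in pairs]
    # All candidates are similar: the middle frame alone represents the shot.
    if all(d < threshold for d, _ in dists):
        return [middle]
    # Otherwise keep the two candidates that are farthest apart.
    _, (a, b) = max(dists, key=lambda item: item[0])
    return [a, b]
```

Because only three candidates and three distances are examined per shot, the cost is constant per shot regardless of shot length, which matches the stated goal of a fast algorithm for mobile use.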

Step 103: perform edge-based localization of the particular text in the video file, that is, locate the particular text regions in the video.

Text of different sizes may appear in a video. However, text that conveys the key information of a video is eye-catching and draws attention; it usually has few characters and a large size, for example the title of a TV programme or the name of an advertised product. Fig. 3 shows a frame image from an advertisement video; the name of the advertised product is the particular text referred to in the invention and reflects the key information of the video. The invention aims to locate these particular text regions.

This can be implemented as follows:

The invention uses an improved edge-based text localization algorithm to locate the particular text in the video. The algorithm proceeds in the following steps:

(1) Perform edge extraction with the Sobel edge detection operator, that is, use the Sobel operator to extract the edges of the particular text, as shown in Fig. 4.

(2) Project the extracted edges horizontally. Let the projection value be P(i), where P(i) is the number of edge pixels in row i, as shown in Fig. 5.

(3) Set a text-row pixel threshold Tx for the projection value P(i). Extract the image rows with P(i) ≥ Tx, and merge and extract image rows whose spacing is less than or equal to a set number of pixel rows. The set number may be 3 or any other natural number. For example, with a set number of 3, if rows i1 to i2 and rows i3 to i4 all satisfy P(i) ≥ Tx, with i1 < i2 < i3 < i4, and i3 - i2 ≤ 3, then rows i1 to i4 are all extracted.

(4) Count the row pixel width of each run of consecutive extracted image rows; if it is smaller than the set threshold Tnx, discard these image rows, where Tnx is the row pixel width threshold.

(5) For image rows whose row pixel width is greater than Tnx, project the extracted edge pixels vertically. Let the projection value be P(j), the number of edge pixels in column j.

(6) Set a text-column pixel threshold Ty and extract the image columns with P(j) ≥ Ty; likewise, merge and extract image columns whose spacing is less than or equal to 3 pixel columns.

(7) Count the column pixel width of each run of consecutive extracted image columns; if it is smaller than the set column pixel width threshold Tny, discard this sub-image block.

(8) Compute the ratio of the column pixel width to the row pixel width of each sub-image block that satisfies the above conditions. If the ratio is greater than a set first threshold Tny/nx of the text column-to-row pixel width ratio and less than a set second threshold T'ny/nx of that ratio, where the first threshold is smaller than the second (the first is the lower limit and the second the upper limit), the sub-image block is taken as a particular text region. Fig. 6 shows the result of locating the particular text region in Fig. 3.
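Steps (1) to (8) can be sketched in plain Python as below. This is a rough illustration under several assumptions rather than the embodiment itself: images are lists of rows of grey values, the Sobel gradient magnitude is approximated by |Gx| + |Gy| and binarised with a hypothetical `edge_threshold`, and all function and parameter names (`sobel_edges`, `merged_runs`, `locate_text_regions`, `ratio_min`, `ratio_max`) are invented for the example. The parameters `tx`, `ty`, `tnx`, `tny` correspond to Tx, Ty, Tnx and Tny, and `ratio_min` and `ratio_max` to the lower and upper ratio thresholds.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grey values (0-255) or of 0/1 edge flags


def sobel_edges(img: Image, edge_threshold: int) -> Image:
    """Binary edge map: 1 where |Gx| + |Gy| of the 3x3 Sobel kernels reaches the threshold."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) >= edge_threshold:
                edges[y][x] = 1
    return edges


def merged_runs(projection: List[int], threshold: int, max_gap: int, min_len: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) ranges with projection >= threshold.

    Ranges whose spacing is at most max_gap are merged; ranges shorter than min_len are dropped.
    """
    runs: List[List[int]] = []
    for idx, value in enumerate(projection):
        if value < threshold:
            continue
        if runs and idx - runs[-1][1] <= max_gap:
            runs[-1][1] = idx  # extend (or bridge a small gap to) the previous range
        else:
            runs.append([idx, idx])
    return [(s, e) for s, e in runs if e - s + 1 >= min_len]


def locate_text_regions(edges: Image, tx: int, ty: int, tnx: int, tny: int,
                        ratio_min: float, ratio_max: float, max_gap: int = 3
                        ) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes of candidate text regions."""
    boxes = []
    row_proj = [sum(row) for row in edges]  # P(i): edge pixels per row
    for top, bottom in merged_runs(row_proj, tx, max_gap, tnx):
        # P(j): edge pixels per column, restricted to the current band of rows
        col_proj = [sum(edges[y][x] for y in range(top, bottom + 1))
                    for x in range(len(edges[0]))]
        for left, right in merged_runs(col_proj, ty, max_gap, tny):
            ratio = (right - left + 1) / (bottom - top + 1)  # column width / row width
            if ratio_min < ratio < ratio_max:
                boxes.append((top, left, bottom, right))
    return boxes
```

A typical call would first build the edge map with `sobel_edges(frame, edge_threshold)` and then pass it to `locate_text_regions`; suitable threshold values depend on the frame resolution and are not given in the text.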

Step 104: recognize the localized text and extract the key text.

After the particular text regions have been located with the above algorithm, they are recognized by a text recognition system and the recognized key text is extracted. For repeated text, only one instance is kept as key text of the video.

Step 105: send the key text and the video key frames of each shot to the terminal as the video index of the video file, for the user to browse. To prevent semantic confusion, key texts extracted from different video frames are displayed separated by space characters.
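The two rules for key text, keeping only one copy of repeated text and separating text from different frames with spaces, can be sketched as below. The function name `build_key_text` is hypothetical, and the input is assumed to be the recognized strings in frame order, whatever recognition system produced them.

```python
from typing import Iterable


def build_key_text(recognized: Iterable[str]) -> str:
    """Drop repeated texts (keeping the first occurrence) and join the rest with spaces."""
    seen = set()
    unique = []
    for text in recognized:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)
    return " ".join(unique)
```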

Referring to Fig. 7, a device for extracting a video index according to an embodiment of the invention is provided, the video index comprising the key text of the corresponding video file and the video key frame of each shot. The device comprises a shot detection unit 71, a video key frame extraction unit 72, an edge localization unit 73 and a key text extraction unit 74.

The shot detection unit 71 is configured to perform shot detection on a video file;

the video key frame extraction unit 72 is configured to perform content analysis on each detected shot and extract the video key frame of each shot;

the edge localization unit 73 is configured to perform edge-based localization of particular text in the video file;

the key text extraction unit 74 is configured to recognize the localized text and extract the key text.

The shot detection unit 71 comprises:

an inter-frame distance acquisition unit, configured to obtain the inter-frame distance of the adjacent candidate frames of each shot; and

a judging unit, configured to determine, according to the inter-frame distance of the adjacent candidate frames of each shot, the abrupt-change frames, gradual-transition start frames and gradual-transition end frames in the video file, and to treat each abrupt-change frame as one abrupt-change shot and the span between an adjacent gradual-transition start frame and gradual-transition end frame as one gradual-transition shot.

The judging unit is configured to: when the inter-frame distance D(i, i+1) > Th and it differs substantially from the local second-largest value, take frame i as an abrupt-change frame; if Tl < D(i, i+1) < Th, take frame i as a possible gradual-transition start frame; for a possible gradual-transition start frame, accumulate the inter-frame distances D(j, j+1) of adjacent frames starting from frame i; and when the inter-frame distance D(j, j+1) of the adjacent frames being accumulated falls below Tl, compare the accumulated inter-frame distance D(j, j+1) with Th, and if it is greater than Th, take frame j as the end frame of the gradual-transition shot, otherwise discard this possible gradual-transition start frame.

The inter-frame distance acquisition unit is configured to:

divide the pixel colours of each candidate frame into several levels and count the number of pixels at each level to obtain the statistical histogram of each frame image;

compare the statistical histograms of adjacent candidate frames with the following formula, and obtain the inter-frame distance of the adjacent candidate frames of each shot from the distance between those histograms:

D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} | h_i(k) - h_{i+1}(k) |

where M is the total number of pixels in the image; N is a natural number denoting the number of levels into which the colours are divided; h_i(k) is the number of pixels counted at level k of frame i; and h_{i+1}(k) is the number of pixels counted at level k of frame i+1.

The video key frame extraction unit 72 comprises:

a candidate frame acquisition unit, configured to extract one or more candidate key frames from each detected shot; and

a key frame acquisition unit, configured to select the video key frame of each shot according to content analysis performed on the one or more candidate key frames.

Where the one or more candidate key frames comprise the first frame, the middle frame and the last frame, the key frame acquisition unit comprises:

a comparing unit, configured to compute the distance between every two candidate key frames in each shot and compare each distance with a preset threshold; and

a determining unit, configured to take the middle frame as the video key frame of the shot if all of the pairwise distances are smaller than the preset threshold, and otherwise to take the two frames with the largest distance as the video key frames of the shot.

The edge localization unit 73 comprises:

an edge extraction unit, configured to perform edge extraction with the Sobel edge detection operator;

a first processing unit, configured to project the extracted edge pixels horizontally, the projection value being P(i); to extract the image rows with P(i) ≥ Tx and merge image rows whose spacing is less than or equal to a set number of pixel rows; and to count the row pixel width of each run of consecutive extracted image rows and discard the run if its row pixel width is smaller than the set row pixel width threshold Tnx, the set number being any natural number, for example 3;

a second processing unit, configured to, for image rows whose row pixel width is greater than Tnx, project the extracted edge pixels vertically, the projection value being P(j); and to extract the image columns with P(j) ≥ Ty and merge and extract image columns whose spacing is less than or equal to a set number of pixel columns, the set number being any natural number, for example 3; and

a localization determining unit, configured to count the column pixel width of each run of consecutive extracted image columns and, if the column pixel width is smaller than Tny, discard the sub-image block corresponding to that run; and otherwise to compute the ratio of the column pixel width to the row pixel width of the corresponding sub-image block and, if the ratio is greater than Tny/nx and less than T'ny/nx, take the sub-image block as a particular text region.

Referring to Fig. 8, a download system according to an embodiment of the invention comprises a base station 81, a video download server 82, and a video key frame and key text search unit 83.

The base station 81 is configured to forward to the video download server a request from a terminal to search for the key text and video key frames of a video, and to send the video index received from the video download server to the terminal;

the video download server 82 is configured to, according to the request, send a search request to run the video key frame and key text search unit, and to send the video index obtained from the video key frame and key text search unit to the base station;

the video key frame and key text search unit 83 is configured to, after receiving the search request, perform shot detection on the video file, perform content analysis on each detected shot, and extract the video key frame of each shot; to perform edge-based localization of particular text in the video file, recognize the localized text, and take the recognized text as key text; and to return the key text and the video key frame of each shot to the video download server as the video index of the video file searched for by the terminal.

The video key frame and key text search unit 83 is configured to use the twin-comparison algorithm to detect abrupt-change and gradual-transition shots in the video file.

The video key frame and key text search unit can be implemented as an independent physical entity or as a logical entity, for example a video key frame and key text search program installed on the video download server 82. In that case, the process in which the video download server 82 sends a search request to run the video key frame and key text search unit and obtains the video index is embodied as the process of running this search program.

With the download system of the above embodiment, when a mobile phone user selects a video file, a request to search for the key frames and key text of that video is sent to the video download server. After receiving the request, the video download server runs the video key frame and key text search program and sends the search result to the mobile phone for the user. The user thus obtains the key text and key frames of the video, as shown in Fig. 9. By browsing the key text and key frames, the user learns the key information of the video and then decides whether to download it. This largely avoids downloading unneeded video files and makes the downloaded video files better match the user's needs.

Video key frame and key text search program: the program runs on the video download server. After the user selects a video, a request to search for the key frames and key text of that video is sent to the video download server; upon receiving the request, the server runs the program and sends the search result to the mobile phone for the user.

The video index displayed by the terminal that receives it contains the important information of the corresponding video and gives the user a richer video index. The user can then, based on the video index displayed by the terminal, send a download request to the video download server via the base station and download the corresponding video file.

In the embodiments of the invention, for a video file selected by the user, not only are the key frames of the shots extracted, so that images reflecting the important information of the video are provided to the user, but key text reflecting its theme and core content is also extracted, giving the user further video information. Mobile phone users can thus learn the content of a video, which helps them choose which videos to download and accurately locate the videos they want, avoids downloading unneeded videos, and makes downloads better match user needs. Given the small storage space, slow transmission speed and weak computing power of mobile phones, the video index extraction technique of the embodiments is particularly useful.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims

1. A method for extracting a video index, characterized in that the method comprises the following steps:

performing edge-based localization of particular text in a video file, recognizing the localized text, and taking the recognized text as key text;

using the key text and the video key frame of each shot as the video index of the video file.

2. The method according to claim 1, characterized in that performing the shot detection comprises:

obtaining the inter-frame distance of the adjacent candidate frames of each shot;

determining, according to the inter-frame distance of the adjacent candidate frames of each shot, the abrupt-change frames, gradual-transition start frames and gradual-transition end frames in the video file, and treating each abrupt-change frame as one abrupt-change shot, and each adjacent pair of gradual-transition start and end frames, together with the frames between them, as one gradual-transition shot.

3. The method according to claim 2, characterized in that determining, according to the inter-frame distance between adjacent candidate frames of each shot, the abrupt shot-change frames, gradual-transition start frames and gradual-transition end frames in the video file comprises:

within a given time period, if the inter-frame distance D(i, i+1) > Th, and this distance differs substantially from the second-largest inter-frame distance, taking the i-th frame as an abrupt shot-change frame; if Tl < D(i, i+1) < Th, taking the i-th frame as a possible gradual-transition start frame;

for a possible gradual-transition start frame, accumulating the inter-frame distances D(j, j+1) of adjacent frames starting from the i-th frame; and

when, during the accumulation, the inter-frame distance D(j, j+1) of adjacent frames falls below Tl, comparing the accumulated inter-frame distance with Th; if it is greater than Th, taking the j-th frame as the end frame of the gradual-transition shot; otherwise, discarding this possible gradual-transition start frame.
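The two-threshold test of claim 3 can be illustrated with a minimal Python sketch. It is a simplified reading of the claim, not the patented implementation: the function name `detect_transitions` and its return format are hypothetical, the check against the second-largest inter-frame distance is omitted, and the accumulated sum (rather than a single distance) is assumed to be the quantity compared with Th.

```python
def detect_transitions(distances, tl, th):
    """Simplified two-threshold shot-boundary test (illustrative only).

    distances[i] holds the inter-frame distance D(i, i+1).
    Returns (cuts, gradual), where cuts lists abrupt shot-change frames
    and gradual lists (start_frame, end_frame) pairs.
    Assumption: the second-largest-distance check of the claim is omitted.
    """
    cuts, gradual = [], []
    n = len(distances)
    i = 0
    while i < n:
        d = distances[i]
        if d > th:
            # Distance above the high threshold: abrupt shot change.
            cuts.append(i)
            i += 1
        elif tl < d < th:
            # Possible gradual-transition start: accumulate distances
            # while they stay between the two thresholds.
            acc, j = 0.0, i
            while j < n and tl <= distances[j] <= th:
                acc += distances[j]
                j += 1
            # Accept only if the distance fell below Tl and the
            # accumulated distance exceeds Th; otherwise discard.
            if j < n and distances[j] < tl and acc > th:
                gradual.append((i, j))
            i = j  # j > i here, so the loop always advances
        else:
            i += 1
    return cuts, gradual
```

With Tl = 0.1 and Th = 0.5, a run of distances 0.2, 0.25, 0.3 followed by 0.02 accumulates to 0.75 and is accepted as a gradual transition, whereas an isolated 0.15 followed by 0.01 is discarded.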

4. The method according to claim 2, characterized in that obtaining the inter-frame distance between adjacent candidate frames of each shot comprises:

obtaining the distance between the statistical histograms of adjacent candidate frames using the following formula, and obtaining the inter-frame distance between adjacent candidate frames of each shot from that histogram distance:

D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_{i}(k) - h_{i+1}(k) \right|

where D(i, i+1) is the distance between the statistical histograms of adjacent candidate frames; M is the total number of image pixels; N is a natural number denoting the number of levels into which the color is divided; h_{i}(k) is the number of pixels counted at level k of the i-th frame; and h_{i+1}(k) is the number of pixels counted at level k of the (i+1)-th frame.
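As an illustration of the formula in claim 4, the following Python sketch computes the histogram distance for two frames. The representation of a frame as a flat list of integer pixel values in [0, 256), the uniform quantization into levels, and the function names `histogram` and `frame_distance` are assumptions made for the example; the patent does not prescribe them.

```python
def histogram(frame, levels, max_value=256):
    """Count pixels per quantized level.

    frame is assumed to be a flat list of integers in [0, max_value).
    """
    counts = [0] * levels
    for v in frame:
        counts[v * levels // max_value] += 1
    return counts


def frame_distance(frame_a, frame_b, levels=8):
    """D(i, i+1) = 1/(2M) * sum over k of |h_i(k) - h_{i+1}(k)|."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and of equal size")
    m = len(frame_a)  # M: total number of pixels
    h_a = histogram(frame_a, levels)
    h_b = histogram(frame_b, levels)
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / (2 * m)
```

The factor 1/(2M) normalizes the distance to the range [0, 1]: identical histograms give 0, and frames whose pixels fall into entirely different levels give 1.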

5. The method according to claim 1, characterized in that performing content analysis on each detected shot and extracting the video key frame of each shot comprises:

extracting one or more candidate key frames from each detected shot; and

selecting the video key frame of each shot according to content analysis performed on the one or more candidate key frames.

6. The method according to claim 5, characterized in that several frames with low content correlation are selected from the one or more candidate key frames as the video key frames of each shot.

7. The method according to claim 6, characterized in that the one or more candidate key frames comprise a first frame, a middle frame and a last frame, and selecting several frames with low content correlation from the one or more candidate key frames as the video key frames of each shot comprises:

calculating the distance between every two candidate key frames in each shot, and comparing the distance between every two candidate key frames with a preset threshold; and

if the distance between every two candidate key frames is less than the preset threshold, taking the middle frame as the video key frame of the corresponding shot; otherwise, taking the two frames with the largest distance as the video key frames of the corresponding shot.
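The selection rule of claim 7 can be sketched in Python as follows. The function name `select_key_frames`, the choice of the index `len(shot) // 2` as the middle frame, and the caller-supplied `distance` function are assumptions for illustration; any frame distance, such as the histogram distance of claim 4, could be passed in.

```python
from itertools import combinations


def select_key_frames(shot, distance, threshold):
    """Pick key-frame indices from the first, middle and last frames.

    shot is a non-empty sequence of frames; distance(a, b) is any
    frame-distance function supplied by the caller (hypothetical here).
    """
    if not shot:
        raise ValueError("shot must contain at least one frame")
    middle = len(shot) // 2
    candidates = [0, middle, len(shot) - 1]
    # Distance between every pair of candidate key frames.
    dists = {
        (a, b): distance(shot[a], shot[b])
        for a, b in combinations(candidates, 2)
    }
    if all(d < threshold for d in dists.values()):
        # Content barely changes: the middle frame represents the shot.
        return [middle]
    # Otherwise keep the two most dissimilar candidates.
    a, b = max(dists, key=dists.get)
    return [a, b]
```

For a nearly static shot the function returns only the middle frame index; for a shot whose content changes substantially it returns the indices of the two most dissimilar candidates.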

8. The method according to claim 1, characterized in that performing edge localization on the specific text in the video file comprises:

performing edge extraction using the Sobel edge detection operator;

performing horizontal projection on the extracted edge pixels, the abscissa value of the projection being P(i); extracting the image rows with P(i) ≥ Tx, and merging image rows separated by no more than a set number of pixel rows, where Tx is the set row pixel threshold;

counting the row pixel width of the extracted consecutive image rows, and discarding these image rows if the row pixel width is less than a set Tnx, where Tnx is the row pixel-width threshold;

for the image rows whose row pixel width is greater than the row pixel-width threshold, performing vertical projection on the extracted edge pixels, the ordinate value of the projection being P(j); extracting the image columns with P(j) ≥ Ty, merging image columns separated by no more than a set number of pixel columns, and extracting the merged image columns, where Ty is the column pixel threshold; and

counting the column pixel width of the extracted consecutive image columns; if the column pixel width is less than a set Tny, where Tny is the column pixel-width threshold, discarding the sub-image block corresponding to these consecutive image columns; otherwise, calculating the ratio of the column pixel width to the row pixel width of the corresponding sub-image block, and, if the ratio is greater than a first threshold of the ratio of text column pixel width to text row pixel width and less than a second threshold of that ratio, taking the sub-image block as a specific text region.
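The projection steps of claim 8 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the image is assumed to be a grayscale list of lists, the edge map is binarized with a hypothetical `edge_threshold` on |Gx| + |Gy|, and the names `sobel_edges`, `runs`, `locate_text`, `max_gap`, `ratio_min` and `ratio_max` are introduced only for the example. The "row pixel width" of the claim appears as `height` and the "column pixel width" as `width`.

```python
def sobel_edges(img, edge_threshold):
    """Binary edge map from the Sobel operator (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) >= edge_threshold:
                edges[y][x] = 1
    return edges


def runs(profile, threshold, max_gap):
    """Index runs where profile >= threshold, merged across small gaps."""
    out = []
    for i, p in enumerate(profile):
        if p < threshold:
            continue
        if out and i - out[-1][1] - 1 <= max_gap:
            out[-1][1] = i  # extend the previous run over the gap
        else:
            out.append([i, i])
    return [(s, e) for s, e in out]


def locate_text(edges, tx, ty, tnx, tny, ratio_min, ratio_max, max_gap=1):
    """Return (top, bottom, left, right) boxes of candidate text regions."""
    regions = []
    row_profile = [sum(row) for row in edges]  # horizontal projection P(i)
    for top, bottom in runs(row_profile, tx, max_gap):
        height = bottom - top + 1
        if height < tnx:
            continue  # band of rows too thin to be a text line
        band = edges[top:bottom + 1]
        col_profile = [sum(col) for col in zip(*band)]  # vertical P(j)
        for left, right in runs(col_profile, ty, max_gap):
            width = right - left + 1
            if width < tny:
                continue  # block too narrow
            if ratio_min < width / height < ratio_max:
                regions.append((top, bottom, left, right))
    return regions
```

In use, `sobel_edges` would be applied to a key frame and its result passed to `locate_text`; the thresholds depend on resolution and font size and would need tuning for real video.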

9. The method according to any one of claims 1 to 8, characterized in that the method further comprises: sending the video index of the video file to a terminal.

10. A device for extracting a video index, characterized in that the video index comprises the key text of the corresponding video file and the video key frame of each shot, and the device comprises:

a shot detection unit, configured to perform shot detection on the video file;

11. The device according to claim 10, characterized in that the shot detection unit comprises:

a judging unit, configured to determine, according to the inter-frame distance between adjacent candidate frames of each shot, the abrupt shot-change frames, gradual-transition start frames and gradual-transition end frames in the video file, and to take each abrupt shot-change frame as one abrupt shot, and an adjacent gradual-transition start frame and gradual-transition end frame, together with the frames between them, as one gradual-transition shot.

12. The device according to claim 11, characterized in that the judging unit is configured to: within a given time period, when the inter-frame distance D(i, i+1) > Th and this distance differs substantially from the local second-largest inter-frame distance, take the i-th frame as an abrupt shot-change frame; if Tl < D(i, i+1) < Th, take the i-th frame as a possible gradual-transition start frame; for a possible gradual-transition start frame, accumulate the inter-frame distances D(j, j+1) of adjacent frames starting from the i-th frame; and when, during the accumulation, the inter-frame distance D(j, j+1) of adjacent frames falls below Tl, compare the accumulated inter-frame distance with Th, and if it is greater than Th, take the j-th frame as the end frame of the gradual-transition shot; otherwise, discard this possible gradual-transition start frame.

13. The device according to claim 11, characterized in that the inter-frame distance acquiring unit is configured to:

D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_{i}(k) - h_{i+1}(k) \right|

where h_{i+1}(k) is the number of pixels counted at level k of the (i+1)-th frame.

14. The device according to claim 10, characterized in that the video key frame extraction unit comprises:

15. The device according to claim 14, characterized in that the one or more candidate key frames comprise a first frame, a middle frame and a last frame, and the key frame acquiring unit comprises:

16. The device according to claim 15, characterized in that the edge positioning unit comprises:

an edge extraction unit, configured to perform edge extraction using the Sobel edge detection operator;

a first processing unit, configured to perform horizontal projection on the extracted edge pixels, the abscissa value of the projection being P(i); to extract the image rows with P(i) ≥ Tx and merge image rows separated by no more than a set number of pixel rows, where Tx is the set row pixel threshold; and to count the row pixel width of the extracted consecutive image rows and discard these image rows if the row pixel width is less than a set Tnx, where Tnx is the row pixel-width threshold;

a second processing unit, configured to, for the image rows whose row pixel width is greater than Tnx, perform vertical projection on the extracted edge pixels, the ordinate value of the projection being P(j); to extract the image columns with P(j) ≥ Ty, merge image columns separated by no more than a set number of pixel columns, and extract the merged image columns, where Ty is the column pixel threshold; and

a location determining unit, configured to count the column pixel width of the extracted consecutive image columns; if the column pixel width is less than a set Tny, where Tny is the column pixel-width threshold, to discard the sub-image block corresponding to these consecutive image columns; otherwise, to calculate the ratio of the column pixel width to the row pixel width of the corresponding sub-image block, and, if the ratio is greater than a set first threshold of the ratio of text column pixel width to text row pixel width and less than a set second threshold of that ratio, to take the sub-image block as a specific text region.

17. A video downloading system, characterized in that it comprises:

18. The downloading system according to claim 17, characterized in that the video key frame and key file search unit is configured to use a twin-comparison algorithm to detect abrupt shots and gradual-transition shots in the video file.