CN101620629A - Method and device for extracting video index and video downloading system - Google Patents

Publication number
CN101620629A
Authority
CN
China
Prior art keywords
frame
video
shot
text
gradual transition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910147306A
Other languages
Chinese (zh)
Inventor
王婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN200910147306A
Priority to PCT/CN2009/073467 (WO2010142089A1)
Publication of CN101620629A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data

Abstract

The invention discloses a method and a device for extracting a video index, and a video downloading system. The method for extracting a video index comprises: performing shot detection on a video file; performing content analysis on each detected shot; extracting the video key frames of each shot; performing edge-based localization of the particular text in the video file; recognizing the located text; taking the recognized text as the key text; and using the key text and the video key frames of each shot as the video index of the video file. The method can provide mobile phone users with comprehensive video index information, effectively help them choose which videos to download, and accurately locate the videos they want, avoiding the download of unwanted videos, so that video downloads better fit users' needs.

Description

A method, device, and video downloading system for extracting a video index
Technical field
The present invention relates to the technical field of video search, and in particular to a method, a device, and a video downloading system for extracting a video index.
Background
Generally speaking, video files are larger than other information files and require more complex algorithms, so they are harder to analyze or index. However, as the volume of online video grows and the amount of information in society explodes, obtaining information about a video effectively, and helping users locate the content that interests them within a massive collection of videos, becomes increasingly important. At present, video indexing technology is applied mainly on PCs and has not been widely used for mobile video.
Compared with a PC, a mobile phone is constrained by factors such as bandwidth and storage capacity: its transmission speed is slow, its storage space is small, and its processing capability is weak. Users are therefore usually more cautious when downloading a video to a mobile phone. Consequently, applying video indexing effectively on mobile phones, so that users can accurately locate the videos that interest them, is all the more important.
To this end, a mobile video indexing and search method has been proposed. The method first segments the video into its basic units, shots, then extracts the shot-transition frames and provides them to the mobile phone user. To a certain extent this gives the user useful video information and helps in choosing which video file to download. However, the video information it provides is not comprehensive enough: the first frame of a shot can hardly summarize the whole shot.
Summary of the invention
The invention provides a method, a device, and a video downloading system for extracting a video index, so as to provide users with more comprehensive video index information.
A method for extracting a video index provided by an embodiment of the invention comprises:
performing shot detection on a video file, performing content analysis on each detected shot, and extracting the video key frames of each shot; and
performing edge-based localization of the particular text in the video file, recognizing the located text, and taking the recognized text as the key text;
using the key text and the video key frames of each shot as the video index for searching the video file.
A device for extracting a video index provided by an embodiment of the invention, the video index comprising the key text of the corresponding video file and the video key frames of each shot, comprises:
a shot detection unit, configured to perform shot detection on the video file;
a video key frame extraction unit, configured to perform content analysis on each detected shot and extract the video key frames of each shot;
an edge localization unit, configured to perform edge-based localization of the particular text in the video file;
a key text extraction unit, configured to recognize the located text and take the recognized text as the key text.
A video downloading system provided by an embodiment of the invention comprises:
a base station, configured to forward to the video download server a request from a terminal to search for the key text and video key frames of a video, and to send the video index received from the video download server to the terminal;
a video download server, configured, according to the request, to send a search request to the video key frame and key text search unit, and to send the video index obtained from that unit to the base station;
a video key frame and key text search unit, configured to perform shot detection on the video file, perform content analysis on each detected shot, and extract the video key frames of each shot; to perform edge-based localization of the particular text in the video file, recognize the located text, and take the recognized text as the key text; and to return the key text and the video key frames of each shot to the video download server as the video index with which the terminal searches the video file.
In the embodiment of the invention, because both the extracted key frames and the extracted key text are provided to the user during video index extraction, mobile phone users receive more comprehensive video index information and can learn the content of a video. This effectively helps users choose which video to download and accurately locate the video they want, avoids downloading unwanted videos, and makes video downloads better fit users' needs.
Brief description of the drawings
Fig. 1 is a schematic flowchart of extracting a video index according to the embodiment of the invention;
Fig. 2 is a histogram of the adjacent-frame distances of a video segment;
Fig. 3 is a frame from an advertisement video;
Fig. 4 shows the result of applying the Sobel edge detection operator to Fig. 3;
Fig. 5 shows the result of the horizontal projection of Fig. 3;
Fig. 6 shows the result of locating the particular text regions in Fig. 3 with the improved edge-based text localization algorithm of the embodiment of the invention;
Fig. 7 is a schematic structural diagram of the device for extracting a video index according to the embodiment of the invention;
Fig. 8 is a schematic diagram of the system architecture of the embodiment of the invention;
Fig. 9 is a schematic diagram of the key text and key frames of a video that a mobile phone user obtains on selecting a video file when the embodiment of the invention is applied.
Detailed description of the embodiments
Because the prior art extracts insufficient video information when building a mobile video index, and the first frame of a shot alone can hardly summarize that shot, it is necessary to go further and extract the key frames of each shot. Besides image information, the text in a video usually also carries important information about it, especially text that is concise and eye-catching: such text typically summarizes the theme of a video segment or expresses its core content. Therefore, if the key text of a video can be located and extracted effectively and provided to the user, the core content of the video has in most cases been provided to the user. This plays a crucial role in video indexing for the user and gives the mobile video index higher accuracy.
The basic physical unit of a video is the shot. A shot consists of a number of temporally continuous frames captured in one uninterrupted run of a camera. A video is made up of many shots joined by various kinds of transitions. There are two main kinds of shot transition: abrupt and gradual. An abrupt transition (a cut) switches directly from one shot to the next with no editing effect in between. A gradual transition inserts some editing effect between the two shots.
In the embodiment of the invention, content analysis is performed on each shot detected in the video file and the video key frames of each shot are extracted; edge-based localization is performed on the particular text of the video file, the located text is recognized, and the key text is extracted; the key text and the video key frames of each shot are then used as the video index with which a terminal searches the video file.
Referring to Fig. 1, the detailed process of extracting a video index in the embodiment of the invention is as follows:
Step 101: perform abrupt and gradual shot detection on the video file.
Here a dual comparison algorithm can be used to detect the abrupt and gradual shots of the video. The concrete steps are as follows:
Obtain the inter-frame distance between adjacent candidate frames of each shot; determine, from these inter-frame distances, the abrupt-transition frames, gradual-transition start frames, and gradual-transition end frames in the video file; treat each abrupt-transition frame as one abrupt shot, and treat each adjacent pair of gradual-transition start frame and end frame, together with the frames between them, as one gradual shot.
Compared with the commonly used shot detection algorithms based on pixel comparison, on histograms, and on edges, the dual comparison algorithm detects gradual shots better.
The inter-frame distance between adjacent candidate frames can be computed with the histogram comparison method. Fig. 2 shows the histogram of adjacent-frame distances for a video segment. For example, the computation can be implemented as follows:
(1) Divide the pixel colors of each candidate frame into several levels and count the number of pixels at each level, giving the statistical histogram of each frame;
(2) Compare the statistical histograms of adjacent candidate frames using the following formula, and take the resulting histogram distance as the inter-frame distance between the adjacent candidate frames of each shot:
$$D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_i(k) - h_{i+1}(k) \right|$$
where $D(i, i+1)$ is the distance between the statistical histograms of adjacent candidate frames; $M$ is the total number of pixels in the image; $N$ is a natural number giving the number of levels into which the colors are divided; $h_i(k)$ is the number of pixels at level $k$ in frame $i$; and $h_{i+1}(k)$ is the number of pixels at level $k$ in frame $i+1$.
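The histogram distance can be illustrated with a minimal Python sketch. It assumes single-channel frames given as rows of integer intensities in [0, 256) and a default of 8 quantization levels; the function names and these choices are illustrative assumptions, not taken from the patent.

```python
def level_histogram(frame, levels=8, max_value=256):
    """Count the pixels of a frame (rows of ints in [0, max_value)) per level."""
    hist = [0] * levels
    for row in frame:
        for pixel in row:
            hist[pixel * levels // max_value] += 1
    return hist


def frame_distance(frame_a, frame_b, levels=8):
    """D(i, i+1) = 1/(2M) * sum_k |h_i(k) - h_{i+1}(k)|, with M the pixel count."""
    h_a = level_histogram(frame_a, levels)
    h_b = level_histogram(frame_b, levels)
    m = sum(h_a)  # total number of pixels in one frame
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / (2 * m)


dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
half = [[0, 0], [255, 255]]
print(frame_distance(dark, dark), frame_distance(dark, bright), frame_distance(dark, half))
# prints 0.0 1.0 0.5
```

With the factor 1/(2M), the distance lies between 0 (identical histograms) and 1 (no pixel shares a level), which makes thresholds comparable across frame sizes.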
Here, two thresholds Th and Tl, with Th > Tl, can be used to detect abrupt and gradual shots: Th is used to detect abrupt-transition frames, and Tl is used to detect possible gradual-transition start frames.
The abrupt-transition frames, gradual-transition start frames, and gradual-transition end frames in the video file can be determined as follows:
(1) Compute the adjacent-frame distance histogram;
(2) If the inter-frame distance D(i, i+1) > Th and it differs substantially from the local second-largest inter-frame distance, frame i is an abrupt-transition frame; if Tl < D(i, i+1) < Th, frame i is taken as a possible gradual-transition start frame;
This step finds the possible gradual-transition start frames. Gradual and abrupt transitions differ: an abrupt transition appears as a sudden jump between shots, so the gap between the local maximum inter-frame distance and the local second-largest inter-frame distance exceeds a certain threshold, and abrupt shots are filtered out first in this way. For example, if within a certain time window D(1, 2) is the largest inter-frame distance, D(3, 4) is the second largest, and D(1, 2) and D(3, 4) differ by more than a certain threshold, frame 1 can be identified as an abrupt-transition frame;
(3) For a possible gradual-transition start frame, accumulate the distances D(j, j+1) between adjacent frames starting from frame i;
(4) When the adjacent-frame distance D(j, j+1) < Tl, stop accumulating and compare the accumulated distance with Th. If it is greater than Th, frame j is taken as the end frame of the gradual shot; otherwise the change is not considered a gradual shot transition and the possible gradual-transition start frame is discarded;
where Th is the set abrupt-transition frame threshold, Tl is the possible gradual-transition start frame threshold, and Th > Tl; i is the frame number; D(i, i+1) denotes the distance between frame i and frame i+1; and D(j, j+1) denotes the distance between frame j and frame j+1 that is being accumulated.
Applying the above dual comparison algorithm to Fig. 2 detects 4 abrupt shots and 7 gradual shots.
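The two-threshold procedure can be sketched in Python as follows. This is one possible reading, not the patent's implementation: it assumes the accumulated distance (rather than a single adjacent-frame distance) is what gets compared with Th, it also stops accumulating when a distance above Th is met so that a cut is not absorbed into a gradual shot, and the `window` and `gap` parameters used for the "local second-largest" check are hypothetical names with illustrative values.

```python
def detect_shots(distances, th, tl, window=5, gap=0.2):
    """Return (cuts, graduals) from distances[i] = D(i, i+1).

    cuts     - frame numbers i judged to be abrupt-transition frames
    graduals - (start_frame, end_frame) pairs of gradual transitions
    """
    cuts, graduals = [], []
    n = len(distances)
    i = 0
    while i < n:
        d = distances[i]
        if d > th:
            # Abrupt candidate: must stand out from the local second-largest value.
            lo, hi = max(0, i - window), min(n, i + window + 1)
            second = max((distances[k] for k in range(lo, hi) if k != i), default=0.0)
            if d - second > gap:
                cuts.append(i)
            i += 1
        elif d > tl:
            # Possible gradual start: accumulate until a distance drops below tl.
            accumulated, j = 0.0, i
            while j < n and tl <= distances[j] <= th:
                accumulated += distances[j]
                j += 1
            if accumulated > th:
                graduals.append((i, j))  # frame j ends the gradual shot
            i = j  # candidate consumed (kept or discarded)
        else:
            i += 1
    return cuts, graduals


distances = [0.02, 0.03, 0.80, 0.02, 0.15, 0.20, 0.25, 0.10, 0.02, 0.03]
print(detect_shots(distances, th=0.5, tl=0.08, window=3, gap=0.3))
# prints ([2], [(4, 8)])
```

In the sample data, the single large jump at frame 2 is reported as a cut, while the run of moderate distances from frame 4 accumulates to about 0.7, exceeds Th, and is reported as a gradual transition ending at frame 8.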
Step 102: perform content analysis on each detected shot and extract the video key frames of each shot.
This can be implemented as follows:
(1) Extract one or more candidate key frames from each detected shot;
(2) Select the video key frames of each shot according to a content analysis of the one or more candidate key frames.
After the shots are extracted, key frames need to be extracted from them. In principle, a key frame should provide a comprehensive summary of a shot, or in other words a summary that is as rich in content as possible. From the viewpoint of information theory, frames that differ (or are weakly correlated) carry more information than similar frames. The key frames extracted by the invention are used to give the user the main information of the shot. Therefore, the criterion for key frame extraction is mainly the dissimilarity between frames.
In theory, because a shot consists of frames that are continuous in time and highly correlated in content, choosing the few least-correlated frames as the key frames of the shot captures the most information.
Accordingly, several frames with low content correlation can be chosen from the one or more candidate key frames as the video key frames of each shot. For example, if the candidate key frames of each shot are the first frame f_1, the middle frame f_{N/2}, and the last frame f_N, the distance between every pair of candidate key frames in the shot, namely D(f_1, f_{N/2}), D(f_1, f_N), and D(f_{N/2}, f_N), can be computed and compared with a preset threshold. If every pairwise distance is smaller than the preset threshold, the middle frame is taken as the video key frame of the shot; otherwise the two frames with the largest distance are taken as the video key frames of the shot.
This algorithm is fairly simple and therefore fast, and it extracts at most two weakly correlated key frames per shot to provide to the user. Compared with a PC, whose transmission speed is faster, it better meets the needs of mobile phone users, whose transmission speed is relatively slow.
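The first/middle/last selection rule can be sketched as below. The function name, the zero-based index convention (index `n // 2` standing in for f_{N/2}), and the use of a caller-supplied `distance` function are assumptions made for illustration; in the demo, frames are plain numbers and the distance is their absolute difference.

```python
from itertools import combinations


def select_key_frames(shot, distance, threshold):
    """Return the indices of the key frame(s) of one shot.

    shot      - sequence of frames
    distance  - function giving the distance between two frames
    threshold - preset threshold on the pairwise distance
    """
    n = len(shot)
    candidates = sorted({0, n // 2, n - 1})  # first, middle, last (deduplicated)
    pairs = list(combinations(candidates, 2))
    if not pairs:  # single-frame shot
        return candidates
    dists = {pair: distance(shot[pair[0]], shot[pair[1]]) for pair in pairs}
    if all(d < threshold for d in dists.values()):
        return [n // 2]  # candidates are all similar: the middle frame suffices
    return list(max(dists, key=dists.get))  # the two most dissimilar candidates


def absolute_difference(a, b):
    return abs(a - b)


print(select_key_frames([10, 11, 12, 11, 10], absolute_difference, threshold=5))
# prints [2]
print(select_key_frames([10, 20, 30, 60, 90], absolute_difference, threshold=5))
# prints [0, 4]
```

In practice the `distance` argument could be a histogram distance between frames, so that shot detection and key frame selection share one measure.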
Step 103: perform edge-based localization of the particular text in the video file, that is, locate the particular text regions in the video.
Text of different sizes may appear in a video. However, the text that conveys the key information of a video and attracts attention usually has few characters, is relatively large, and is eye-catching, for example the title of a TV program or the name of an advertised product. Fig. 3 shows a frame from an advertisement video; the name of the advertised product is the particular text referred to in the invention and reflects the key information of the video. The invention requires these particular text regions to be located.
This can be implemented as follows:
The invention adopts an improved edge-based text localization algorithm to locate the particular text in the video. The algorithm proceeds in the following steps:
(1) Perform edge extraction with the Sobel edge detection operator, that is, use the Sobel operator to extract the edges of the particular text, as shown in Fig. 4.
(2) Project the extracted edges horizontally. Let the ordinate value of the projection be P(i), where P(i) is the number of edge pixels in row i, as shown in Fig. 5.
(3) Set a text-row pixel threshold Tx for the ordinate value P(i). Extract the image rows with P(i) ≥ Tx, and merge and extract image rows separated by no more than a set number of pixel rows. The set number can be 3 or any other natural number. For example, with a set number of 3, if the images in rows i1 to i2 and rows i3 to i4 all satisfy P(i) ≥ Tx, where i1 < i2 < i3 < i4, and i3 − i2 ≤ 3, then the image in rows i1 to i4 is extracted as a whole.
(4) Count the row pixel width of each extracted run of consecutive image rows; if it is less than the set row pixel width threshold T_nx, discard those image rows.
(5) For image rows whose width is greater than T_nx, project the extracted edge pixels vertically. Let the ordinate value of the projection be P(j), the number of edge pixels in column j.
(6) Set a text-column pixel threshold T_y and extract the image columns with P(j) ≥ T_y; likewise, merge and extract image columns separated by no more than 3 pixel columns.
(7) Count the column pixel width of each extracted run of consecutive image columns; if it is less than the set column pixel width threshold T_ny, discard that sub-image block.
(8) For each sub-image block that satisfies the above conditions, compute the ratio of its column pixel width to its row pixel width. If the ratio is greater than a set first threshold T_{ny/nx} of the ratio of text column pixel width to text row pixel width and less than a set second threshold T'_{ny/nx}, where the first threshold is the lower limit and the second threshold is the upper limit, the sub-image block is taken as a particular text region. Fig. 6 shows the result of locating the particular text regions in Fig. 3.
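Steps (2) to (8) can be sketched in Python once a binary edge map is available. The sketch assumes step (1) has already produced `edges` as rows of 0/1 flags (for example by thresholding Sobel gradient magnitudes); the function names, parameter names, and threshold values are illustrative assumptions rather than values from the patent.

```python
def find_bands(profile, threshold, max_gap=3, min_width=1):
    """Return inclusive (start, end) index ranges where profile >= threshold.

    Runs whose separation is at most max_gap are merged, and bands narrower
    than min_width are discarded.
    """
    bands = []
    for idx, value in enumerate(profile):
        if value < threshold:
            continue
        if bands and idx - bands[-1][1] <= max_gap:
            bands[-1][1] = idx  # extend, or merge across a small gap
        else:
            bands.append([idx, idx])
    return [(s, e) for s, e in bands if e - s + 1 >= min_width]


def locate_text_regions(edges, tx, ty, t_nx, t_ny, ratio_min, ratio_max, max_gap=3):
    """Return (top, bottom, left, right) boxes of candidate text regions."""
    regions = []
    row_profile = [sum(row) for row in edges]  # P(i): edge pixels per row
    for top, bottom in find_bands(row_profile, tx, max_gap, t_nx):
        # P(j): edge pixels per column, restricted to the current row band
        col_profile = [sum(edges[r][c] for r in range(top, bottom + 1))
                       for c in range(len(edges[0]))]
        for left, right in find_bands(col_profile, ty, max_gap, t_ny):
            ratio = (right - left + 1) / (bottom - top + 1)
            if ratio_min < ratio < ratio_max:  # keep wide, text-like blocks
                regions.append((top, bottom, left, right))
    return regions


# Synthetic 12 x 20 edge map: one dense block plus an isolated noise pixel.
edges = [[0] * 20 for _ in range(12)]
for r in range(4, 7):
    for c in range(3, 15):
        edges[r][c] = 1
edges[10][18] = 1
print(locate_text_regions(edges, tx=5, ty=2, t_nx=2, t_ny=4, ratio_min=1.5, ratio_max=10))
# prints [(4, 6, 3, 14)]
```

The same `find_bands` helper serves both the row pass and the column pass, since steps (3) to (4) and steps (6) to (7) apply the same threshold, merge, and minimum-width logic to different projections.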
After a particular text region has been located with the above algorithm, it is recognized with a text recognition system and the recognized key text is extracted. For repeated text, only one instance is kept as key text of the video.
Step 104: recognize the located text and extract the key text.
Step 105: send the key text and the video key frames of each shot to the terminal as the video index of the video file for the user to browse. To prevent semantic confusion, key text extracted from different video frames is displayed separated by space characters.
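Assembling the index from the recognized text and the selected key frames can be sketched as follows. The dictionary layout and the function name are hypothetical, chosen only to illustrate deduplicating repeated text and joining the key text of different frames with spaces.

```python
def build_video_index(key_frames_per_shot, recognized_texts):
    """Combine per-shot key frames and recognized text into one index record.

    key_frames_per_shot - list of key frame identifiers for each shot
    recognized_texts    - text strings recognized in successive video frames
    """
    seen, key_texts = set(), []
    for text in recognized_texts:
        text = text.strip()
        if text and text not in seen:  # keep only one copy of repeated text
            seen.add(text)
            key_texts.append(text)
    return {
        "key_text": " ".join(key_texts),  # texts from different frames, space-separated
        "key_frames": key_frames_per_shot,
    }


index = build_video_index([[3], [10, 42]], ["SuperCola", "SuperCola", "Now on sale"])
print(index)
# prints {'key_text': 'SuperCola Now on sale', 'key_frames': [[3], [10, 42]]}
```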
Referring to Fig. 7, in a device for extracting a video index according to the embodiment of the invention, the video index comprises the key text of the corresponding video file and the video key frames of each shot. The device comprises a shot detection unit 71, a video key frame extraction unit 72, an edge localization unit 73, and a key text extraction unit 74.
The shot detection unit 71 is configured to perform shot detection on the video file;
The video key frame extraction unit 72 is configured to perform content analysis on each detected shot and extract the video key frames of each shot;
The edge localization unit 73 is configured to perform edge-based localization of the particular text in the video file;
The key text extraction unit 74 is configured to recognize the located text and extract the key text.
The shot detection unit 71 comprises:
an inter-frame distance acquiring unit, configured to obtain the inter-frame distance between adjacent candidate frames of each shot;
a judging unit, configured to determine, from the inter-frame distances between adjacent candidate frames of each shot, the abrupt-transition frames, gradual-transition start frames, and gradual-transition end frames in the video file, to treat each abrupt-transition frame as one abrupt shot, and to treat the frames between an adjacent gradual-transition start frame and end frame as one gradual shot.
The judging unit is configured to: take frame i as an abrupt-transition frame when the inter-frame distance D(i, i+1) > Th and differs substantially from the local second-largest value; take frame i as a possible gradual-transition start frame if Tl < D(i, i+1) < Th; for a possible gradual-transition start frame, accumulate the adjacent-frame distances D(j, j+1) starting from frame i; and, when the adjacent-frame distance D(j, j+1) < Tl, compare the accumulated distance with Th, taking frame j as the end frame of the gradual shot if it is greater than Th and otherwise discarding the possible gradual-transition start frame;
where Th is the set abrupt-transition frame threshold, Tl is the possible gradual-transition start frame threshold, and Th > Tl; i is the frame number; D(i, i+1) denotes the distance between frame i and frame i+1; and D(j, j+1) denotes the distance between frame j and frame j+1 that is being accumulated.
The inter-frame distance acquiring unit is configured to:
divide the pixel colors of each candidate frame into several levels and count the number of pixels at each level, giving the statistical histogram of each frame;
compare the statistical histograms of adjacent candidate frames using the following formula, and take the resulting histogram distance as the inter-frame distance between the adjacent candidate frames of each shot:
$$D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_i(k) - h_{i+1}(k) \right|$$
where $M$ is the total number of pixels in the image; $N$ is a natural number giving the number of levels into which the colors are divided; $h_i(k)$ is the number of pixels at level $k$ in frame $i$; and $h_{i+1}(k)$ is the number of pixels at level $k$ in frame $i+1$.
The video key frame extraction unit 72 comprises:
a candidate frame acquiring unit, configured to extract one or more candidate key frames from each detected shot;
a key frame acquiring unit, configured to select the video key frames of each shot according to a content analysis of the one or more candidate key frames.
Where the one or more candidate key frames comprise the first frame, the middle frame, and the last frame, the key frame acquiring unit comprises:
a comparing unit, configured to compute the distance between every pair of candidate key frames in each shot and compare each pairwise distance with a preset threshold;
a determining unit, configured to take the middle frame as the video key frame of the shot if every pairwise distance is smaller than the preset threshold, and otherwise to take the two frames with the largest distance as the video key frames of the shot.
The edge localization unit 73 comprises:
an edge extraction unit, configured to perform edge extraction with the Sobel edge detection operator;
a first processing unit, configured to project the extracted edge pixels horizontally, the projection value being P(i); to extract the image rows with P(i) ≥ Tx and merge image rows separated by no more than a set number of pixel rows; and to count the row pixel width of each extracted run of consecutive image rows, discarding those image rows if the row pixel width is less than the set row pixel width threshold Tnx. The set number is any natural number, for example 3.
a second processing unit, configured, for image rows whose row pixel width is greater than Tnx, to project the extracted edge pixels vertically, the projection value being P(j); and to extract the image columns with P(j) ≥ Ty, merging and extracting image columns separated by no more than a set number of pixel columns. The set number is any natural number, for example 3.
a localization determining unit, configured to count the column pixel width of each extracted run of consecutive image columns and, if the column pixel width is less than Tny, discard the sub-image block corresponding to those image columns; otherwise to compute the ratio of the column pixel width to the row pixel width of the corresponding sub-image block and, if the ratio is greater than Tny/nx and less than T'ny/nx, take the sub-image block as a particular text region.
Referring to shown in Figure 8, a kind of download system of the embodiment of the invention comprises: base station 81, video download services device 82 and key frame of video and critical file search unit 83.
Base station 81 is used in the future that the crucial text of the search video of self terminal and the request of key frame of video send to the video download services device, and will send to described terminal from the video index of video Download Server;
Video download services device 82 is used for according to described request, sends searching request to operation key frame of video and critical file search unit, and will send to described base station from the video index of described key frame of video and the acquisition of critical file search unit;
Key frame of video and critical file search unit 83 after being used to receive searching request, detect video file, and each camera lens that detection is obtained carries out content analysis, extracts the key frame of video of each camera lens; And the particular text of video file carried out the location, edge, the text that the location obtains is discerned, with the text that identifies as crucial text; The key frame of video of described crucial text and described each camera lens is returned to the video download services device as the video index of this video file of terminal searching.
Described key frame of video and critical file search unit 83 are used to utilize dual comparison algorithm that video file is suddenlyd change and the detection of gradual change camera lens.
Key frame of video and critical file search unit can be used as an independently physical entity realization, also can be used as logic entity, such as: be that a key frame of video and crucial text search program are installed in the video download services device 82.At this moment, video download services device 82 sends the process that searching request obtains video index to operation key frame of video and critical file search unit, just is presented as the process of this search utility of operation.
Use the download system of the invention described above embodiment, the cellphone subscriber is when choosing a certain video file, promptly send the key frame of this section of search video and the request of crucial text to the video download services device, after the video download services device is received request, operation key frame of video and crucial text search program, and the result that will search for is sent to and offers the user on the mobile phone.The user just can get access to the crucial text and the key frame of this section video, as shown in Figure 9.Browse by crucial text and key frame to video, the user can get access to the key message of this section video, and then whether decision will download them.Like this, just avoided downloading to unwanted video file greatly, made fit more user's demand of the video file that downloads to.
Key frame of video and crucial text search program.Key frame of video and crucial text search program are moved in the video download services device, after the user chooses certain section video, can send the key frame of this section of search video and the request of crucial text to the video download services device, after server is received request, promptly move this program, and the result that will search for is sent on the mobile phone, offers the user.
Receive that video index that the terminal of video index shows has comprised the important information of corresponding video, offer the more more rich video of user index, after this, the user can send download request by the base station to the video download services device according to the video index that terminal shows, downloads and obtains corresponding video file.
In the embodiment of the invention, video file to user's selection, not only extract the key frame of camera lens, the image of reflecting video important information is offered the user, and extract the crucial text that can reflect its theme and core, further offer the more video information of user, make the cellphone subscriber can get access to the content information of video, effectively help the user to carry out the selection of foradownloaded video, make its accurate in locating to the video of wanting to download, avoid downloading to unwanted video, thereby make fit more user's demand of the download of video.Little based on the mobile phone shelf space, transmission speed is slow, the characteristics that arithmetic capability is weak, the embodiment of the invention extract video index this technology particularly useful.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (18)

1. A method for extracting a video index, characterized in that the method comprises the following steps:
Performing shot detection on a video file, performing content analysis on each detected shot, and extracting the video key frames of each shot; and
Performing edge localization on specific text of the video file, recognizing the localized text, and taking the recognized text as key text;
Using the key text and the video key frames of each shot as the video index of the video file.
2. The method according to claim 1, characterized in that performing the shot detection comprises:
Obtaining the inter-frame distance of the adjacent candidate frames of each shot;
Determining, according to the inter-frame distance of the adjacent candidate frames of each shot, the abrupt shot-change frames, gradual-transition start frames and gradual-transition end frames in the video file, taking one abrupt shot-change frame as one abrupt shot, and taking an adjacent gradual-transition start frame and gradual-transition end frame, together with the frames between them, as one gradual-transition shot.
3. The method according to claim 2, characterized in that determining the abrupt shot-change frames, gradual-transition start frames and gradual-transition end frames in the video file according to the inter-frame distance of the adjacent candidate frames of each shot comprises:
Within a certain time period, if the inter-frame distance D(i, i+1) > Th and the difference between this inter-frame distance and the second-largest inter-frame distance is large, taking the i-th frame as an abrupt shot-change frame; if Tl < D(i, i+1) < Th, taking the i-th frame as a possible gradual-transition start frame;
For a possible gradual-transition start frame, accumulating the inter-frame distances D(j, j+1) of adjacent frames starting from the i-th frame;
When the inter-frame distance D(j, j+1) of the adjacent frames being accumulated is less than Tl, comparing the accumulated D(j, j+1) with Th; if it is greater than Th, taking the j-th frame as the end frame of the gradual-transition shot; otherwise, discarding this possible gradual-transition start frame;
Wherein Th is the set abrupt-change frame threshold, Tl is the possible gradual-transition start frame threshold, and Th > Tl; i is the frame number; D(i, i+1) denotes the distance between the i-th frame and the (i+1)-th frame; and D(j, j+1) denotes the accumulated inter-frame distance between frames j and j+1.
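The two-threshold test in this claim can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: the claim does not say how large the gap to the second-largest distance must be, so the `window` and `margin` parameters are hypothetical, and reaching the end of the sequence is treated here as the end of the accumulation.

```python
def detect_shot_boundaries(distances, th, tl, window=5, margin=2.0):
    """Two-threshold shot boundary detection.

    distances[i] is the inter-frame distance D(i, i+1).
    Returns (cuts, gradual): cuts is a list of frame numbers i,
    gradual is a list of (start_frame, end_frame) pairs.
    `window` and `margin` are assumed parameters, not from the claim.
    """
    if th <= tl:
        raise ValueError("Th must be greater than Tl")
    cuts, gradual = [], []
    n = len(distances)
    i = 0
    while i < n:
        d = distances[i]
        if d > th:
            # Compare with the second-largest distance in a local window.
            lo, hi = max(0, i - window), min(n, i + window + 1)
            others = [distances[k] for k in range(lo, hi) if k != i]
            second = max(others) if others else 0.0
            if d >= margin * second:
                cuts.append(i)
            i += 1
        elif tl < d < th:
            # Possible gradual-transition start: accumulate until D < Tl.
            acc, j = 0.0, i
            while j < n and distances[j] >= tl:
                acc += distances[j]
                j += 1
            if acc > th:
                gradual.append((i, j))  # frame j ends the transition
            i = max(j, i + 1)           # otherwise the candidate is discarded
        else:
            i += 1
    return cuts, gradual
```

For example, with `th=0.5` and `tl=0.1`, a single large jump is reported as a cut, while a run of moderate distances whose sum exceeds `th` is reported as one gradual transition.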
4. The method according to claim 2, characterized in that obtaining the inter-frame distance of the adjacent candidate frames of each shot comprises:
Dividing the pixel colors of each candidate frame into several levels and counting the number of pixels at each level to obtain the statistical histogram of each frame image;
Obtaining the distance between the statistical histograms of adjacent candidate frames using the following formula, and obtaining the inter-frame distance of the adjacent candidate frames of each shot according to that distance,
$$D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_i(k) - h_{i+1}(k) \right|$$
Wherein D(i, i+1) is the distance between the statistical histograms of adjacent candidate frames; M is the total number of image pixels; N is a natural number denoting the number of levels into which the color is divided; h_i(k) is the number of pixels counted at level k of the i-th frame; and h_{i+1}(k) is the number of pixels counted at level k of the (i+1)-th frame.
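The histogram distance above can be computed as in the following sketch. It assumes, purely for illustration, that each frame is a list of rows of single-channel integer values in the range 0 to 255; the claim itself speaks of pixel colors and does not fix the color space or the number of levels, so `levels=16` and the function names are hypothetical.

```python
def level_histogram(frame, levels, max_value=256):
    """Count the pixels of `frame` falling into each of `levels` equal bins."""
    hist = [0] * levels
    for row in frame:
        for value in row:
            hist[value * levels // max_value] += 1
    return hist


def histogram_distance(frame_a, frame_b, levels=16):
    """D(i, i+1) = (1 / 2M) * sum_k |h_i(k) - h_{i+1}(k)|."""
    h_a = level_histogram(frame_a, levels)
    h_b = level_histogram(frame_b, levels)
    m = sum(h_a)  # total number of pixels M
    if m == 0 or m != sum(h_b):
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / (2 * m)
```

With the 1/(2M) normalization the distance lies between 0 (identical histograms) and 1 (no pixels at any shared level), which makes the thresholds Th and Tl independent of frame size.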
5. The method according to claim 1, characterized in that performing content analysis on each detected shot and extracting the video key frames of each shot comprises:
Extracting one or more candidate key frames from each obtained shot,
Selecting the video key frames of each shot according to the content analysis performed on the one or more candidate key frames.
6. The method according to claim 5, characterized in that several frames with low content correlation are selected from the one or more candidate key frames as the video key frames of each shot.
7. The method according to claim 6, characterized in that the one or more candidate key frames comprise a first frame, a middle frame and a last frame, and selecting several frames with low content correlation from the one or more candidate key frames as the video key frames of each shot comprises:
Calculating the distance between every two candidate key frames in each shot, and comparing the distance between every two candidate key frames with a preset threshold,
If the distances between every two candidate key frames are all smaller than the preset threshold, taking the middle frame as the video key frame of the corresponding shot; otherwise, taking the two frame images with the largest distance as the video key frames of the corresponding shot.
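The selection rule of this claim can be sketched as follows. The function name `select_key_frames` and the `distance` callable are assumptions for the example; any frame distance could be passed in, for instance a histogram distance.

```python
from itertools import combinations


def select_key_frames(first, middle, last, distance, threshold):
    """Pick key frames for one shot from its first, middle and last frames.

    If all pairwise distances are below `threshold`, the shot is treated as
    visually uniform and only the middle frame is kept; otherwise the two
    most dissimilar candidates are kept.
    """
    candidates = {"first": first, "middle": middle, "last": last}
    pair_distances = {
        (a, b): distance(candidates[a], candidates[b])
        for a, b in combinations(candidates, 2)
    }
    if all(d < threshold for d in pair_distances.values()):
        return [middle]
    a, b = max(pair_distances, key=pair_distances.get)
    return [candidates[a], candidates[b]]
```

In a usage example with plain numbers standing in for frames and the absolute difference as the distance, three close values yield only the middle one, while a shot whose last frame differs strongly yields the first and last frames.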
8. The method according to claim 1, characterized in that performing edge localization on the specific text of the video file comprises:
Performing edge extraction with the Sobel edge detection operator;
Performing horizontal projection on the extracted edge pixels, the abscissa value of the projection being P(i), extracting the image rows with P(i) ≥ Tx, and merging image rows separated by no more than a set number of pixel rows, wherein Tx is the set row pixel threshold;
Counting the row pixel width of the extracted consecutive image rows, and discarding these image rows if the row pixel width is less than the set Tnx, wherein Tnx is the row pixel width threshold;
For the image rows whose row pixel width is greater than the row pixel width threshold, performing vertical projection on the extracted edge pixels, the ordinate value of the projection being P(j), extracting the image columns with P(j) ≥ Ty, merging image columns separated by no more than a set number of pixel columns, and extracting the merged image columns, wherein Ty is the column pixel threshold;
Counting the column pixel width of the extracted consecutive image columns; if the column pixel width is less than the set Tny, wherein Tny is the column pixel width threshold, discarding the sub-image block corresponding to these consecutive image columns; otherwise, calculating the ratio of the column pixel width to the row pixel width of the corresponding sub-image block, and if the ratio is greater than a first threshold of the ratio of text column pixel width to text row pixel width and less than a second threshold of that ratio, taking the sub-image block as a specific text region.
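The projection steps of this claim can be sketched as below. The sketch assumes the Sobel edge extraction has already been done and binarized, so the input is a list of rows of 0/1 edge flags. The function names, the single shared `max_gap` for rows and columns, and the use of "greater than or equal" for the width thresholds are assumptions of this example.

```python
def runs_above(profile, threshold, max_gap, min_width):
    """Return (start, end) ranges, end exclusive, where profile >= threshold.

    Runs separated by at most `max_gap` entries are merged, and runs
    narrower than `min_width` are discarded.
    """
    runs = []
    for idx, value in enumerate(profile):
        if value < threshold:
            continue
        if runs and idx - runs[-1][1] <= max_gap:
            runs[-1][1] = idx + 1
        else:
            runs.append([idx, idx + 1])
    return [(s, e) for s, e in runs if e - s >= min_width]


def locate_text_regions(edges, tx, tnx, ty, tny, max_gap, ratio_min, ratio_max):
    """Locate candidate text blocks as (top, bottom, left, right), ends exclusive."""
    regions = []
    row_profile = [sum(row) for row in edges]           # horizontal projection P(i)
    for top, bottom in runs_above(row_profile, tx, max_gap, tnx):
        band = edges[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # vertical projection P(j)
        for left, right in runs_above(col_profile, ty, max_gap, tny):
            # Ratio of column pixel width to row pixel width of the block.
            ratio = (right - left) / (bottom - top)
            if ratio_min < ratio < ratio_max:
                regions.append((top, bottom, left, right))
    return regions
```

The located blocks would then be passed to text recognition; the ratio test reflects the idea that a line of caption text is typically much wider than it is tall.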
9. The method according to any one of claims 1 to 8, characterized in that the method further comprises: sending the video index of the video file to a terminal.
10. A device for extracting a video index, characterized in that the video index comprises the key text of the corresponding video file and the video key frames of each shot, the device comprising:
A shot detection unit, configured to perform shot detection on a video file;
A video key frame extraction unit, configured to perform content analysis on each detected shot and extract the video key frames of each shot;
An edge localization unit, configured to perform edge localization on the specific text of the video file;
A key text extraction unit, configured to recognize the localized text and take the recognized text as key text.
11. The device according to claim 10, characterized in that the shot detection unit comprises:
An inter-frame distance acquiring unit, configured to obtain the inter-frame distance of the adjacent candidate frames of each shot;
A judging unit, configured to determine, according to the inter-frame distance of the adjacent candidate frames of each shot, the abrupt shot-change frames, gradual-transition start frames and gradual-transition end frames in the video file, to take one abrupt shot-change frame as one abrupt shot, and to take an adjacent gradual-transition start frame and gradual-transition end frame, together with the frames between them, as one gradual-transition shot.
12. The device according to claim 11, characterized in that the judging unit is configured to: within a certain time period, when the inter-frame distance D(i, i+1) > Th and the difference between this inter-frame distance and the local second-largest inter-frame distance is large, take the i-th frame as an abrupt shot-change frame; if Tl < D(i, i+1) < Th, take the i-th frame as a possible gradual-transition start frame; for a possible gradual-transition start frame, accumulate the inter-frame distances D(j, j+1) of adjacent frames starting from the i-th frame; when the inter-frame distance D(j, j+1) of the adjacent frames being accumulated is less than Tl, compare the accumulated D(j, j+1) with Th, and if it is greater than Th, take the j-th frame as the end frame of the gradual-transition shot, otherwise discard this possible gradual-transition start frame;
Wherein Th is the set abrupt-change frame threshold, Tl is the possible gradual-transition start frame threshold, and Th > Tl; i is the frame number; D(i, i+1) denotes the distance between the i-th frame and the (i+1)-th frame; and D(j, j+1) denotes the accumulated inter-frame distance between frames j and j+1.
13. The device according to claim 11, characterized in that the inter-frame distance acquiring unit is configured to:
Divide the pixel colors of each candidate frame into several levels and count the number of pixels at each level to obtain the statistical histogram of each frame image;
Obtain the distance between the statistical histograms of adjacent candidate frames using the following formula, and obtain the inter-frame distance of the adjacent candidate frames of each shot according to that distance,
$$D(i, i+1) = \frac{1}{2M} \sum_{k=1}^{N} \left| h_i(k) - h_{i+1}(k) \right|$$
Wherein D(i, i+1) is the distance between the statistical histograms of adjacent candidate frames; M is the total number of image pixels; N is a natural number denoting the number of levels into which the color is divided; h_i(k) is the number of pixels counted at level k of the i-th frame; and h_{i+1}(k) is the number of pixels counted at level k of the (i+1)-th frame.
14. The device according to claim 10, characterized in that the video key frame extraction unit comprises:
A candidate frame acquiring unit, configured to extract one or more candidate key frames from each obtained shot,
A key frame acquiring unit, configured to select the video key frames of each shot according to the content analysis performed on the one or more candidate key frames.
15. The device according to claim 14, characterized in that the one or more candidate key frames comprise a first frame, a middle frame and a last frame, and the key frame acquiring unit comprises:
A comparing unit, configured to calculate the distance between every two candidate key frames in each shot and to compare the distance between every two candidate key frames with a preset threshold,
A determining unit, configured to take the middle frame as the video key frame of the corresponding shot if the distances between every two candidate key frames are all smaller than the preset threshold, and otherwise to take the two frame images with the largest distance as the video key frames of the corresponding shot.
16. The device according to claim 15, characterized in that the edge localization unit comprises:
An edge extraction unit, configured to perform edge extraction using the Sobel edge detection operator;
A first processing unit, configured to perform horizontal projection on the extracted edge pixels, the abscissa value of the projection being P(i), to extract the image rows with P(i) ≥ Tx, and to merge image rows separated by no more than a set number of pixel rows, wherein Tx is the set row pixel threshold; and to count the row pixel width of the extracted consecutive image rows and discard these image rows if the row pixel width is less than the set Tnx, wherein Tnx is the row pixel width threshold;
A second processing unit, configured to, for the image rows whose row pixel width is greater than Tnx, perform vertical projection on the extracted edge pixels, the ordinate value of the projection being P(j), extract the image columns with P(j) ≥ Ty, merge image columns separated by no more than a set number of pixel columns, and extract the merged image columns, wherein Ty is the column pixel threshold;
A localization determining unit, configured to count the column pixel width of the extracted consecutive image columns; if the column pixel width is less than the set Tny, wherein Tny is the column pixel width threshold, to discard the sub-image block corresponding to these consecutive image columns; otherwise, to calculate the ratio of the column pixel width to the row pixel width of the corresponding sub-image block, and if the ratio is greater than a set first threshold of the ratio of text column pixel width to text row pixel width and less than a set second threshold of that ratio, to take the sub-image block as a specific text region.
17. A video download system, characterized by comprising:
A base station, configured to send a request from a terminal to search for the key text and video key frames of a video to a video download server, and to send the video index from the video download server to the terminal;
The video download server, configured to send, according to the request, a search request to a video key frame and key file search unit, and to send the video index obtained from the video key frame and key file search unit to the base station;
The video key frame and key file search unit, configured to perform detection on the video file, perform content analysis on each detected shot, and extract the video key frames of each shot; to perform edge localization on the specific text of the video file, recognize the localized text, and take the recognized text as key text; and to return the key text and the video key frames of each shot to the video download server as the video index of the video file searched for by the terminal.
18. The download system according to claim 17, characterized in that the video key frame and key file search unit is configured to detect abrupt shots and gradual-transition shots in the video file using a dual-comparison algorithm.
CN200910147306A 2009-06-09 2009-06-09 Method and device for extracting video index and video downloading system Pending CN101620629A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN200910147306A CN101620629A (en) 2009-06-09 2009-06-09 Method and device for extracting video index and video downloading system
PCT/CN2009/073467 WO2010142089A1 (en) 2009-06-09 2009-08-24 Method and device for extracting video index and video download system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910147306A CN101620629A (en) 2009-06-09 2009-06-09 Method and device for extracting video index and video downloading system

Publications (1)

Publication Number Publication Date
CN101620629A true CN101620629A (en) 2010-01-06

Family

ID=41513865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910147306A Pending CN101620629A (en) 2009-06-09 2009-06-09 Method and device for extracting video index and video downloading system

Country Status (2)

Country Link
CN (1) CN101620629A (en)
WO (1) WO2010142089A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021857A (en) * 2006-10-20 2007-08-22 鲍东山 Video searching system based on content analysis
CN101620629A (en) * 2009-06-09 2010-01-06 中兴通讯股份有限公司 Method and device for extracting video index and video downloading system

Also Published As

Publication number Publication date
WO2010142089A1 (en) 2010-12-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20100106