CN1993989A - Contents processing device, contents processing method, and computer program - Google Patents

Contents processing device, contents processing method, and computer program

Info

Publication number
CN1993989A
CN1993989A CN200680000555A
Authority
CN
China
Prior art keywords
captions
frame
scene change
caption area
change point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200680000555
Other languages
Chinese (zh)
Other versions
CN100551014C (en)
Inventor
奥田尚生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Corp filed Critical Sony Corp
Publication of CN1993989A publication Critical patent/CN1993989A/en
Application granted granted Critical
Publication of CN100551014C publication Critical patent/CN100551014C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A change of topic in video content is detected by using the captions (subtitles) contained in the images, and the content is divided by topic. First, scene-change points, at which the scene changes sharply because the picture is switched, are detected in the video content. Next, an average image is formed from the frames one second before and one second after each scene-change point and is used to detect, with high precision, whether a caption is displayed at the scene-change point. Segments in which the same still caption is displayed are then detected, and index information on the time period of each such segment is created.

Description

Contents processing apparatus, contents processing method, and computer program
Technical field
The present invention relates to a content processing apparatus configured to perform processing such as indexing on video content obtained, for example, by recording television programs, and to a content processing method and a computer program. In particular, the present invention relates to a content processing apparatus configured to determine scene changes in a recorded television program on the basis of the title (i.e., topic) of the program and to segment or classify the scenes, and to a content processing method and a computer program.
More specifically, the present invention relates to a content processing apparatus configured to detect topic changes on the basis of the captions (telops) contained in video content, to segment the video content on the basis of the detected topics, and to create an index, and to a content processing method and a computer program. In particular, the present invention relates to a content processing apparatus configured to detect topic changes with a relatively small amount of data by using the captions contained in the video content, and to a content processing method and a computer program.
Background Art
In today's information society, the importance of broadcasting is immeasurable. Television broadcasting in particular, which delivers sound and images directly to viewers, has a great influence on them. Broadcast technology encompasses a wide range of techniques, such as signal processing, transmission, and reception, and audio and video information processing.
The household penetration rate of television sets is very high, and the television programs broadcast by the various stations are watched by the general public. As another way of watching broadcast content, viewers can record the content and play back the recorded content at an arbitrarily selected time.
Recently, advances in digital technology have made it possible to store large amounts of audio-visual data. For example, hard disk drives (HDDs) with capacities of tens to hundreds of gigabytes can be bought relatively inexpensively, and HDD-based recording devices and personal computers (PCs) that can record and play back television programs are available on the market. An HDD is a randomly accessible device. Therefore, when playing back recorded programs, it is not necessary to play them in recording order, as with a conventional video tape; any recorded program (or any scene or segment within a program) can be played back directly. A viewing style in which a receiving device, such as a television set or a video recording and playback device, receives broadcast content, stores it temporarily in a large storage device (such as a hard disk unit), and then plays back the stored content is called "server broadcasting". With a server broadcasting system, unlike the conventional television system, viewers do not have to watch a program while it is being broadcast but can watch it at any selected time.
The increase in the hard disk capacity of server broadcasting systems has allowed viewers to record tens of hours of television programs. However, viewers generally cannot watch all of the television content recorded on the hard disk. If viewers could retrieve only the scenes of interest and perform digest viewing, they could use the recorded content efficiently and effectively.
To enable scene retrieval and digest viewing of recorded content, the images must be indexed. As a video indexing method, a method is known in which scene-change points, i.e., frames at which the video signal changes greatly, are detected and indexing is performed.
For example, a scene-change detection method is known in which a scene change is detected when the sum of the differences between the histograms of components representing the images of two sequential image fields or frames exceeds a predetermined threshold (see, for example, Patent Document 1). When the histograms are formed, a constant is distributed to each relevant bin and its adjacent bins and added; a new histogram is calculated by normalization; and, by using the newly calculated histograms, scene changes are detected between every two sequential image fields or frames. In this way, scene changes can be detected accurately even in fading images.
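The histogram comparison described above can be sketched in Python as follows. The bin count, the uniform spreading to adjacent bins, and the threshold are illustrative assumptions rather than the patent's actual parameters, and frames are modeled as flat lists of 8-bit luminance values:

```python
def spread_histogram(pixels, bins=16, levels=256):
    """Histogram in which each pixel's contribution is also added to the
    adjacent bins, then normalised -- an assumed reading of the smoothing
    described in Patent Document 1."""
    hist = [0.0] * bins
    width = levels // bins
    for p in pixels:
        b = min(p // width, bins - 1)
        for nb in (b - 1, b, b + 1):      # the bin plus its adjacent bins
            if 0 <= nb < bins:
                hist[nb] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]      # normalisation step

def is_scene_change(frame_a, frame_b, threshold=0.5):
    """Scene change when the summed absolute histogram difference of two
    sequential frames exceeds a threshold (illustrative value)."""
    ha, hb = spread_histogram(frame_a), spread_histogram(frame_b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) > threshold
```

Because the histograms are normalised, the comparison is insensitive to frame size, which is consistent with the claim that fading images are handled robustly.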
A television program contains many scene-change points. Usually, treating the time period corresponding to a particular title (i.e., topic) as the unit for segmenting and classifying video content is considered suitable for digest viewing. However, even while the same title continues, the scene may change frequently. Therefore, a video indexing method that relies only on scene-change points will not necessarily provide the index that the user wants.
A video and audio content compiling apparatus has been proposed that is configured to detect video cut positions by using the video data, perform sound clustering by using the audio data, create an index by integrating the video data and the audio data, and compile, retrieve, and select video content according to the index information (see, for example, Patent Document 2). According to this video and audio content compiling apparatus, index information obtained from the audio information (distinguishing speech, silence, and music) is linked with the scene-change points. In this way, segments of significant images and sound can be detected as scenes, and less significant scene-change points can be ignored. However, because a single television program contains many scene-change points, the video content cannot be segmented on the basis of different topics.
Usually, in producing and editing television broadcasts such as news programs and variety shows, a method is adopted in which a caption that explicitly or implicitly represents the title of the program is displayed in a corner of the picture frame. The caption displayed in the picture frame can be used as an important clue for identifying or estimating the topic of the broadcast program during the display period of the caption. Therefore, captions can be extracted from the video content, and video indexing can be performed in which the displayed caption defines an index of the content.
For example, a broadcast program content menu production apparatus has been proposed that is configured to detect the captions contained in picture frames as characteristic image segments and to automatically produce a menu representing the content of the broadcast program by extracting only the image data corresponding to the captions (see, for example, Patent Document 3). Usually, edge detection must be performed to detect captions in frames. However, edge calculation adds a heavy processing load; an apparatus that performs edge detection on every picture frame requires a large amount of calculation. In addition, the main purpose of that apparatus is to automatically generate a program guide for a news program by using the captions extracted from the video data; it does not identify topic changes in the news program on the basis of the detected captions, nor does it use the topics to index the images. In other words, it provides no solution to the problem of indexing images on the basis of information related to the captions detected in picture frames.
[Patent Document 1]
Japanese Unexamined Patent Application Publication No. 2004-282318
[Patent Document 2]
Japanese Unexamined Patent Application Publication No. 2002-271741
[Patent Document 3]
Japanese Unexamined Patent Application Publication No. 2004-364234
Summary of the invention
An object of the present invention is to provide an excellent content processing apparatus that can appropriately index recorded video content and segment the video content into scenes by determining scene changes on the basis of the title (i.e., topic) of the program, and to provide a content processing method and a computer program.
Another object of the present invention is to provide an excellent content processing apparatus configured to detect topic changes in video content by using the captions contained in the images and to segment and index the content by topic, and to provide a content processing method and a computer program.
Another object of the present invention is to provide an excellent content processing apparatus configured to detect title changes with a relatively small amount of data by using the captions contained in the video content, and to provide a content processing method and a computer program.
In view of the above problems, a first aspect of the present invention provides a content processing apparatus configured to process video content data including temporally ordered picture frames, the apparatus comprising: a scene-change detection unit configured to detect scene-change points in the video content to be processed, a scene-change point being a point between two picture frames at which the scene of one picture frame differs significantly from the scene of the other; a topic detection unit configured to detect, in the video content to be processed, a segment corresponding to a topic, i.e., a segment of consecutive picture frames in whose caption area the same still caption is displayed; and an index storage unit configured to store index information indicating the time period corresponding to the segment detected by the topic detection unit.
It has become common to receive broadcast content, such as television programs, store it temporarily in a receiving device, and then play it back. The growth of hard disk capacity has made it possible to record television programs corresponding to tens of hours. Therefore, it is useful to retrieve only the scenes of interest from the recorded content and allow the viewer to perform digest viewing. To make scene retrieval and digest viewing of recorded content possible, the images must be indexed.
Traditionally, a method of producing an index by detecting scene-change points in video content is known. However, because a television program contains many scene-change points, such an index is not necessarily optimal for viewers.
In broadcast television programs such as news programs and variety shows, a caption representing the topic of the program is often displayed in one of the four corners of the picture frame. Therefore, captions can be extracted from the video content, and the displayed content of a caption can be used as an index. However, to extract captions from video content, edge detection must be performed on each picture frame. The large amount of calculation required is a problem.
Therefore, the content processing apparatus according to the present invention first detects the scene-change points contained in the video content to be processed, and then detects whether a caption is displayed in the picture frames immediately before and after each scene-change point. If a caption is detected, the segment in which the same still caption is displayed is detected. In this way, the amount of edge detection performed to extract captions is reduced, and thus the processing load imposed by topic detection is reduced.
For example, the topic detection unit produces an average image of the picture frames in the periods one second before and one second after the scene-change point, and detects the captions contained in this average image. If a caption is displayed continuously before and after the scene-change point, the caption portion of the average image remains sharp while the other portions are blurred. In this way, the accuracy of caption detection can be improved. Caption detection can be performed, for example, by edge detection.
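The averaging step can be illustrated as a simple 50/50 alpha blend of grayscale frames: a pixel belonging to a caption shown in both frames keeps its value, while pixels whose backgrounds differ collapse toward a midtone. The flat list-of-pixels frame model is an assumption for illustration:

```python
def average_image(frame_a, frame_b):
    """Pixel-wise mean of two frames (a 50/50 alpha blend).

    Persistent caption pixels stay sharp; differing background
    pixels are pulled toward a blurred midtone."""
    return [(a + b) / 2.0 for a, b in zip(frame_a, frame_b)]
```

In the example below, the first pixel (a caption pixel at luminance 200 in both frames) is preserved, while the remaining pixels, which differ between frames, average out toward 125.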
The topic detection unit compares the caption detected in the average image with the caption displayed in the caption area of each picture frame preceding the scene-change point within the segment in which the same still caption is displayed, and determines the point at which the caption disappears as the start point of the topic. Similarly, the topic detection unit compares the caption detected in the average image with the caption displayed in the caption area of each picture frame following the scene-change point within the segment, and determines the point at which the caption disappears as the end point of the topic. Whether the caption has disappeared from the caption area can be determined with a small processing load by calculating the average color of each color component in the caption area of each picture frame being compared, and determining whether the Euclidean distance between the average colors of the frames exceeds a predetermined threshold. Of course, by adopting a known scene-change detection method, the point at which the caption disappears can be detected more accurately.
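One reading of the average-color test is the sketch below, where a caption region is modeled as a list of (R, G, B) tuples; the distance threshold of 40 is an arbitrary illustrative value, not one given in the patent:

```python
import math

def mean_color(region):
    """Per-channel mean colour of a list of (R, G, B) pixels."""
    n = len(region)
    return tuple(sum(p[c] for p in region) / n for c in range(3))

def caption_vanished(region_a, region_b, threshold=40.0):
    """Treat a large Euclidean distance between the mean colours of the
    two frames' caption regions as the caption disappearing."""
    dist = math.dist(mean_color(region_a), mean_color(region_b))
    return dist > threshold
```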
However, there is a problem in that, when the average color in the caption area is calculated, the influence of the background color surrounding the caption within the caption area is large. Therefore, as an alternative, a method of determining whether a caption is present by using edge information has been proposed. In other words, edge images are determined in the caption areas of the frames to be compared, and the presence of a caption in the caption area is determined on the basis of a comparison of those edge images. More specifically, the edge images in the caption areas of the frames to be compared are determined; when the number of pixels in the edge image detected in the caption area decreases significantly, it is determined that the caption has disappeared, and when the change in the number of pixels is small, it is determined that the caption is being displayed continuously. In addition, when the number of pixels of the edge image increases significantly, it can be determined that a new caption has appeared.
When the caption changes, the number of pixels of the edge image may not change very much. Even when the change between frames in the number of edge-image pixels in the caption area is small, the change of caption, i.e., the start and end position of a caption, can be determined when the number of edge pixels in the image resulting from the logical AND of the edge pixels of the two edge images decreases significantly (for example, to one third or less).
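The edge-pixel-count rules above, including the logical-AND overlap test, might be combined as in the following sketch. The drop and rise ratios are assumptions, the one-third overlap floor follows the example in the text, and edge images are modeled as sets of pixel coordinates:

```python
def classify_caption_transition(edges_prev, edges_next,
                                drop=0.5, rise=2.0, overlap_floor=1 / 3):
    """edges_prev / edges_next: sets of (x, y) edge-pixel positions in the
    caption region of two frames. Threshold values are illustrative."""
    n_prev, n_next = len(edges_prev), len(edges_next)
    if n_prev and n_next / n_prev < drop:
        return "disappeared"          # edge count drops sharply
    if n_prev and n_next / n_prev > rise:
        return "appeared"             # edge count rises sharply
    overlap = len(edges_prev & edges_next)   # logical AND of edge images
    if n_prev and overlap / n_prev <= overlap_floor:
        return "changed"              # similar edge count, different caption
    return "continuing"
```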
The topic detection unit determines the length of the segment on the basis of the start point and the end point, and, if the length of the segment is longer than a predetermined time period, determines that the segment corresponds to a topic. In this way, false detection can be prevented.
The topic detection unit can determine whether a detected caption is a genuine caption on the basis of the size and position of the caption area in which the caption is detected in the frame. The position at which captions appear in a picture frame and the size of the captions are determined according to conventions commonly established by broadcasters. By detecting captions with reference to the position and size of captions in the picture frame on the basis of these established conventions, false detection can be reduced.
A second aspect of the present invention provides a computer program written in a computer-readable form for executing, on a computer system, processing of video content including temporally ordered picture frames, the processing comprising the steps of: detecting scene-change points in the video content to be processed, a scene-change point being a point between two picture frames at which the scene of one picture frame differs significantly from the scene of the other; detecting, on the basis of the picture frames immediately before and after each scene-change point detected in the scene-change detection step, whether a caption is displayed in the caption area of the picture frames, and thereby detecting, in the video content to be processed, a segment of consecutive picture frames in whose caption area the same still caption is displayed; storing index information indicating the time period corresponding to the segment detected in the segment detection step; and, when a topic is selected from the index information stored in the storing step, playing back the segment of the corresponding video content from the start time to the end time indicated by the index information.
The computer program according to the second aspect of the present invention defines a computer program written in a computer-readable form for executing predetermined processing on a computer system. In other words, by installing the computer program according to the second aspect of the present invention on a computer system, cooperative operations are performed on the computer system, so that the same operations as those of the content processing apparatus according to the first aspect of the present invention can be achieved.
The present invention provides an excellent content processing apparatus configured to detect topic changes in video content on the basis of the captions contained in the video content, to segment the video content on the basis of the detected topics, and to create an index, as well as a content processing method and a computer program.
The present invention also provides an excellent content processing apparatus configured to detect title changes with a relatively small amount of data by using the captions contained in the video content, as well as a content processing method and a computer program.
For example, according to the present invention, a recorded television program can be segmented by topic. By segmenting the television program by topic and adding an index, the user can watch the television program in an efficient way, such as digest viewing. For example, the user can check where a topic begins when playing back the recorded content, and if the topic does not interest them, the user can jump to the next topic. In addition, when the recorded video content is stored on a DVD, editing operations such as storing only selected topics become easy.
Other objects and advantages of the present invention are described in detail below with reference to the accompanying drawings.
Description of drawings
Fig. 1 is a schematic diagram illustrating the functional structure of a video content processing apparatus according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating the caption areas contained in an exemplary scene of a television program;
Fig. 3 is a flowchart of the topic detection process for detecting segments of video content in which the same still caption is displayed;
Figs. 4 to 7 illustrate a method of detecting captions in an average image obtained from the images immediately before and after a scene-change point;
Fig. 8 illustrates an example of the structure of the caption detection areas in a picture frame of 720 × 480 pixels;
Fig. 9 illustrates how the start position of a topic is detected from a frame sequence;
Fig. 10 is a flowchart illustrating the process of detecting the start position of a topic from a frame sequence;
Fig. 11 illustrates how the end position of a topic is detected from a frame sequence; and
Fig. 12 is a flowchart illustrating the process of detecting the end position of a topic from a frame sequence.
Reference numeral
10 video content processing apparatus
11 image storage unit
12 scene-change detection unit
13 topic detection unit
14 index storage unit
15 playback unit
Embodiment
Embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
Fig. 1 is a schematic diagram illustrating the functional structure of a video content processing apparatus 10 according to an embodiment of the present invention. The video content processing apparatus 10 shown in the figure comprises an image storage unit 11, a scene-change detection unit 12, a topic detection unit 13, an index storage unit 14, and a playback unit 15.
The image storage unit 11 demodulates and stores broadcast waves, and also stores video content downloaded from information sources via the Internet. For example, the image storage unit 11 may be constituted by an HDD recorder.
The scene-change detection unit 12 retrieves the video content subject to topic detection from the image storage unit 11, tracks the scenes (scenery or settings) contained in the consecutive images, and detects the scene-change points at which the scene changes significantly because the picture is switched.
For example, the scene-change detection unit 12 can adopt the scene-change detection method disclosed in Japanese Unexamined Patent Application Publication No. 2004-282318, assigned to the present assignee. More specifically, histograms of representative image components are produced for every two consecutive fields or frames, and a scene-change point is determined when the sum of the differences between the calculated histograms exceeds a predetermined threshold. When the histograms are produced, a constant is distributed to the relevant bin and its adjacent bins and added. Then, another histogram is calculated by normalization. By using these newly produced histograms, scene changes can be detected between every two images on the screen. Thus, scene changes can be detected accurately even in fading images.
The topic detection unit 13 detects segments in which the same still caption is displayed in the video content subject to topic detection, and outputs each detected segment as a segment of the television program corresponding to a particular topic.
In television programs such as news programs and variety shows, the captions displayed in the picture frames can be used as important clues for identifying or estimating the topic of the segment of the program in which the captions are displayed. However, the amount of calculation required to detect and extract captions is very large. Therefore, according to this embodiment, segments in which the same still caption is displayed are detected on the basis of the detected scene-change points in the video content, in such a way that the number of picture frames on which edge detection must be performed is reduced as much as possible. A segment in which the same still caption is displayed can be regarded as a segment of the television program corresponding to a particular topic. When segmentation, indexing, and digest viewing of the video content are performed, this segment can appropriately be handled as a single unit. The details of the topic detection processing are explained below.
The index storage unit 14 stores time information related to each segment, detected by the topic detection unit 13, in which the same still caption is displayed. The following table shows an example structure of the time information stored in the index storage unit 14. In this table, a record is provided for each detected segment. In each record, the title corresponding to the topic of the segment, the start time of the segment, and the end time of the segment are recorded. For example, the index information can be written in a structured description language such as Extensible Markup Language (XML). The title of the topic can be the title of the video content (or television program) or the character information of the displayed caption.
Table 1
Content name    Start time [s]    End time [s]
Video 1         20                45
                60                80
Video 2         10                25
                30                45
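Since the text notes that the index information can be written in a structured description language such as XML, the table above could be serialized as in the following sketch; the element names (`index`, `segment`, `start`, `end`) are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

def build_index(records):
    """Serialize (content_name, start_sec, end_sec) records as XML
    index information (element names are illustrative)."""
    root = ET.Element("index")
    for name, start, end in records:
        seg = ET.SubElement(root, "segment", content=name)
        ET.SubElement(seg, "start").text = str(start)
        ET.SubElement(seg, "end").text = str(end)
    return ET.tostring(root, encoding="unicode")
```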
The playback unit 15 retrieves the video content designated for playback from the image storage unit 11, and decodes and demodulates the retrieved video content so as to output it as images and sound. According to this embodiment, the playback unit 15 retrieves the appropriate index information from the index storage unit 14 on the basis of the content name, so as to display the video content with the index information linked to it. For example, when a topic is selected from the index information managed by the index storage unit 14, the corresponding video content is retrieved from the image storage unit 11 and is played back from the start time to the end time of the segment indicated by the index information.
Next, the topic detection processing performed by the topic detection unit 13 for detecting segments of the video content in which the same still caption is displayed will be described in detail.
According to this embodiment, the frames immediately before and after each scene-change point detected by the scene-change detection unit 12 are used to detect whether a caption is displayed in the picture frames. Only when a displayed caption is detected is the segment in which the same still caption is displayed detected, so the amount of edge detection processing used to extract captions can be reduced. Therefore, the processing load imposed by topic detection can be reduced.
For example, in television programs of various genres (such as news programs and variety shows), captions are displayed to aid understanding and to appeal to or attract the viewers' attention. In many cases, as shown in Fig. 2, a still caption is displayed in one of four areas on the screen. Usually, a still caption has the following characteristics:
1) it represents the title (subject, etc.) of the broadcast television program; and
2) it is displayed continuously while the television program stays on the same title.
For example, in a news program, while a particular news item is being broadcast, the title of that news item may be displayed continuously. The topic detection unit 13 detects such segments of the program in which a still caption is displayed, and adds an index to each detected segment as corresponding to a particular topic. The topic detection unit 13 can also produce a thumbnail of the detected still caption, or recognize the characters of the displayed caption to obtain character information corresponding to the title of the particular topic.
Fig. 3 is a flowchart of the topic detection processing performed by the topic detection unit 13 for detecting segments of the video content in which the same still caption is displayed.
First, the video frame at the first scene-change point is retrieved from the video content to be processed (step S1). An average image is produced from the picture frames corresponding to one second before and one second after the scene-change point (step S2). Then, caption detection is performed on the average image (step S3). If a caption continues to be displayed before and after the scene-change point, the caption portion of the average image remains sharp while the other portions are blurred. Therefore, the detection accuracy of captions can be improved. The picture frames used to generate the average image are not limited to the frames one second before and after the scene-change point. As long as the picture frames used to obtain the average image are taken from points before and after the scene-change point, more than two picture frames may be used.
Figs. 4 to 6 illustrate the process of detecting captions in an average image generated from the picture frames before and after a scene-change point. Because the scene of one picture frame differs significantly from the scene of the other, the frame obtained by averaging the two picture frames, as in alpha blending, is blurred. If, as shown in Fig. 5, the same still caption is displayed continuously before and after the scene-change point, the caption portion of the average image remains sharp and stands out from the blurred background. Therefore, the caption can be extracted with high precision by performing edge detection. If, as shown in Fig. 6, a caption is displayed in only one of the picture frames before and after the scene-change point (or if the caption displayed in one picture frame differs from the caption displayed in the other), the caption portion of the average image is blurred in the same way as the background. In this way, captions are not detected erroneously.
Usually, the luminance of a caption is higher than that of the background. Therefore, a method of detecting captions by using edge information can be adopted. For example, the input image is subjected to YUV conversion, and edge calculation is then performed on the Y component. For the edge calculation, the caption information processing method described in Japanese Unexamined Patent Application Publication No. 2004-343352, assigned to the present assignee, or the artificial image extraction method described in Japanese Unexamined Patent Application Publication No. 2004-318256 can be adopted.
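The YUV-conversion-plus-edge step can be illustrated with the standard BT.601 luma formula and a one-dimensional gradient threshold. The threshold value of 60 is an assumption for illustration; the cited patents describe more elaborate extraction methods:

```python
def rgb_to_y(r, g, b):
    """BT.601 luma (the Y component of a YUV conversion)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def horizontal_edges(y_row, threshold=60):
    """Indices where the luma gradient along a row exceeds a threshold --
    a minimal stand-in for the edge calculation on the Y component."""
    return [i for i in range(1, len(y_row))
            if abs(y_row[i] - y_row[i - 1]) > threshold]
```

A bright caption against a dark background produces strong luma gradients at the character boundaries, which is why edge calculation on the Y component alone suffices.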
If captions are detected in the average image (step S4), a rectangular area that satisfies the following conditions is determined to be an actual caption:
1) the area is larger than a predetermined size (for example, larger than 80 × 30 pixels); and
2) the area does not span more than one caption area (see Fig. 2).
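The two conditions above can be sketched as a simple filter; the caption-area coordinates below are illustrative assumptions, not the layout shown in Fig. 2:

```python
# Sketch of the step-S4 filtering: a detected rectangle counts as an
# actual caption only if it exceeds the minimum size and lies inside
# exactly one predefined caption area.

CAPTION_AREAS = [(0, 0, 360, 120), (360, 0, 720, 120)]  # (x0, y0, x1, y1)

def contains(area, rect):
    ax0, ay0, ax1, ay1 = area
    x0, y0, x1, y1 = rect
    return ax0 <= x0 and ay0 <= y0 and x1 <= ax1 and y1 <= ay1

def is_actual_caption(rect, min_w=80, min_h=30):
    x0, y0, x1, y1 = rect
    big_enough = (x1 - x0) > min_w and (y1 - y0) > min_h
    in_one_area = sum(contains(a, rect) for a in CAPTION_AREAS) == 1
    return big_enough and in_one_area
```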
The position at which captions appear in an image frame and their size are determined by conventions commonly established by broadcasters. Detecting captions with reference to the position and size prescribed by these conventions reduces false detection. Fig. 8 shows an example of the layout of caption detection areas in an image frame measuring 720 × 480 pixels.
When captions are detected, the caption area of the detected captions is compared, one frame at a time, with the caption area in each image frame preceding the scene change point. The image frame immediately after the frame in which the captions disappear from the caption area is determined to be the start point of the segment corresponding to a particular topic (step S5).
Fig. 9 illustrates the detection of the start position of a topic from a frame sequence in step S5. As shown in the figure, starting from the scene change point at which captions were detected in step S3, the caption areas of the frames are compared one by one, moving toward earlier frames. The frame immediately following the frame in which the captions disappear from the caption area is detected as the start position of the topic.
Fig. 10 is a flowchart of the process of detecting the start position of a topic from a frame sequence. First, while a frame exists before the current frame position (step S21), that frame is obtained (step S22), and the caption areas of the frames are compared (step S23). If there is no change in the caption area ("No" in step S24), the captions are being displayed continuously, so the process returns to step S21 and repeats. If there is a change in the caption area ("Yes" in step S24), the captions have disappeared; the frame immediately after this frame is output as the start position of the topic, and the processing routine ends.
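The loop of steps S21 to S24 can be sketched as follows; the per-frame "signature" is an abstract stand-in for the caption-area comparison described in the text:

```python
# Sketch of the Fig. 10 loop: starting at the scene change point, walk
# toward earlier frames and compare each frame's caption-area signature;
# the topic starts at the frame just after the first mismatching frame.

def topic_start(signatures, sc_index):
    """signatures[i] describes the caption area of frame i."""
    i = sc_index
    while i - 1 >= 0:                                 # step S21: earlier frame exists
        if signatures[i - 1] != signatures[sc_index]:  # step S24: caption area changed
            return i                                   # frame just after the change
        i -= 1                                         # captions continue; keep scanning
    return 0                                           # captions reach the first frame

sigs = ["a", "a", "B", "B", "B", "B"]  # captions "B" first appear at frame 2
assert topic_start(sigs, sc_index=4) == 2
```

The end-position search of Fig. 12 is the mirror image of this loop, scanning toward later frames instead.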
Similarly, the caption area of the detected captions is compared, one frame at a time, with the caption area in each image frame following the scene change point. The image frame immediately before the frame in which the captions disappear from the caption area is determined to be the end point of the segment corresponding to the topic (step S6).
Fig. 11 illustrates the detection of the end position of a topic from a frame sequence. As shown in the figure, starting from the scene change point at which captions were detected in step S3, the caption areas of the frames are compared one by one, moving toward later frames. The frame immediately before the frame in which the captions disappear from the caption area is detected as the end position of the topic.
Fig. 12 is a flowchart of the process of detecting the end position of a topic from a frame sequence. First, while a frame exists after the current frame position (step S31), that frame is obtained (step S32), and the caption areas of the frames are compared (step S33). If there is no change in the caption area ("No" in step S34), the captions are being displayed continuously, so the process returns to step S31 and repeats. If there is a change in the caption area ("Yes" in step S34), the captions have disappeared; the frame immediately before this frame is output as the end position of the topic, and the processing routine ends.
When the caption disappearance positions are detected as shown in Figs. 9 and 11, they can be found accurately by comparing the caption areas of the frames one by one, moving toward earlier and later frames from the scene change point taken as the origin. To reduce the processing load, the approximate positions at which the captions disappear can instead be detected by either of the following:
1) comparing only the I pictures (intra-coded pictures) in a coded stream, such as MPEG, that consists of alternately arranged I pictures and P pictures (forward inter-frame predictive coded pictures); or
2) comparing one image frame per second.
For example, the average color of the RGB components of the caption areas of the image frames being compared can be calculated, and whether the captions have disappeared from the caption area can be determined by checking whether the Euclidean distance between the average colors of the image frames exceeds a predetermined threshold. In this way, caption disappearance can be determined with little processing load. In other words, the captions are determined to have disappeared at the n-th image frame before or after the scene change point when equation (1) below is satisfied, where R0avg, G0avg, and B0avg represent the average color (that is, the averages of the RGB components) of the caption area in the image frame at the scene change point, and Rnavg, Gnavg, and Bnavg represent the average color of the caption area in the n-th image frame from the scene change point. The threshold is, for example, 60.
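Equation (1) is not reproduced in this text, but from the surrounding description it compares the mean caption-area RGB color at the scene change point with that of the n-th frame using the Euclidean distance against the threshold (example value 60); a sketch under that reading:

```python
# Sketch of the color-distance test for caption disappearance. The caption
# areas are modeled as lists of (R, G, B) pixel tuples; sample values are
# illustrative only.
import math

def mean_rgb(pixels):
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def captions_gone(area_at_sc, area_at_n, thresh=60.0):
    """True when the mean-color distance exceeds the threshold."""
    c0, cn = mean_rgb(area_at_sc), mean_rgb(area_at_n)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c0, cn)))
    return dist > thresh

sc_area = [(250, 250, 250), (240, 240, 240)]  # bright caption pixels at SC
later   = [(30, 30, 30), (40, 40, 40)]        # dark area: caption has vanished
assert captions_gone(sc_area, later)
```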
When fixed captions disappear within a segment in which no scene change occurs, the average image contains a sharp background and blurred captions, as shown in Fig. 7; in other words, the result is the opposite of that shown in Fig. 5. The same applies when fixed captions appear within a segment in which no scene change occurs. To detect the point at which the captions disappear more precisely, the scene-change detection method disclosed in Japanese Unexamined Patent Application Publication No. 2004-282318 can be applied to the caption area.
A problem here is that, when the average color of the caption area is calculated, the background color included in the caption area alongside the captions has a large influence, which reduces detection accuracy. As an alternative, a method of determining the presence of captions using edge information is proposed. That is, the edge images in the caption areas of the frames to be compared are computed, and the presence of captions in the caption area is determined on the basis of a comparison of those edge images. More specifically, when the number of edge pixels detected in the caption area decreases significantly, the captions can be determined to have disappeared; conversely, when the change in the number of pixels is small, the captions can be determined to be displayed continuously.
For example, let SC denote the scene change point, Rect the caption area at SC, and EdgeImg1 the edge image of Rect at SC. EdgeImgN denotes the edge image computed for the caption area Rect of the n-th frame from SC (counting toward the beginning or the end of the time axis). The edge images are binarized with a predetermined threshold (for example, 128). In step S23 of the flowchart shown in Fig. 10 and step S33 of the flowchart shown in Fig. 12, the numbers of edge points (edge pixels) in EdgeImg1 and EdgeImgN are compared. When the number of edge points decreases significantly (for example, to one third or less), the captions can be judged to have disappeared (conversely, when the number of edge points increases significantly, the captions can be judged to have appeared).
When the numbers of edge points in EdgeImg1 and EdgeImgN do not differ greatly, the captions can be judged to be displayed continuously. There is, however, the possibility that the captions have changed even though the number of edge points shows no large change. Therefore, the logical AND of the corresponding pixels of EdgeImg1 and EdgeImgN is computed, and when the number of edge points in the resulting image has decreased significantly (for example, to one third or less), the captions are judged to have changed, that is, this is taken as a start or end point of captions. Detection accuracy can be improved in this way.
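The two edge-image tests can be sketched together; binary edge images are flattened to lists of 0/1 values, and the one-third ratio follows the example in the text:

```python
# Sketch of the edge comparisons: (a) a sharp drop in edge-pixel count
# signals caption disappearance; (b) when counts are similar, the
# per-pixel AND of the binarized edge images catches captions that merely
# changed (edges moved, count stayed the same).

def count(img):
    return sum(img)

def caption_event(edge1, edge_n):
    """Return 'gone', 'changed', or 'same' for two binary edge images."""
    c1, cn = count(edge1), count(edge_n)
    if cn <= c1 / 3:
        return "gone"                         # edge points dropped to <= 1/3
    anded = [a & b for a, b in zip(edge1, edge_n)]
    if count(anded) <= c1 / 3:
        return "changed"                      # edges moved: different captions
    return "same"                             # same captions continue

e_sc    = [1, 1, 1, 0, 0, 0]
e_moved = [0, 0, 0, 1, 1, 1]                  # same count, different positions
assert caption_event(e_sc, e_moved) == "changed"
```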
Next, the caption display time is determined by subtracting the caption start point determined in step S5 from the caption end point determined in step S6. By recognizing a segment as corresponding to a particular topic only when the captions have been displayed for a predetermined period of time (step S7), the possibility of false detection can be reduced. Genre information about the television program can also be obtained from an electronic program guide (EPG), and the threshold for the caption display time can be changed according to the genre. For example, because captions are displayed for relatively long periods in news programs, the threshold can be set to 30 seconds, whereas for variety shows it can be set to 10 seconds.
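Step S7 with the genre-dependent thresholds can be sketched as follows; the two threshold values come from the example above, while the genre labels and the fallback value are illustrative assumptions:

```python
# Sketch of step S7: keep a segment only if its captions stayed on screen
# for at least the genre-dependent minimum display time taken from EPG data.

MIN_DISPLAY_SECONDS = {"news": 30, "variety": 10}  # example values from the text

def is_topic_segment(start_sec, end_sec, genre, default=20):
    """default is an assumed fallback for genres not listed above."""
    display_time = end_sec - start_sec  # end point minus start point
    return display_time >= MIN_DISPLAY_SECONDS.get(genre, default)

assert is_topic_segment(100, 145, "news")      # 45 s >= 30 s
assert not is_topic_segment(100, 115, "news")  # 15 s < 30 s
assert is_topic_segment(100, 115, "variety")   # 15 s >= 10 s
```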
The caption start point and end point identified in step S7 as a segment corresponding to a particular topic are stored in the index storage unit 14 (step S8).
The topic detection unit 13 queries the scene change detection unit 12 to confirm whether the video content contains a scene change point after the caption end point detected in step S6 (step S9). When no scene change point is found after the caption end point, the entire processing routine ends. When a scene change point is found after the caption end point, the frame at the next scene change point is retrieved (step S10), the process returns to step S2, and the topic detection process described above is repeated.
When no captions are detected in step S4 at the scene change point being processed, the topic detection unit 13 queries the scene change detection unit 12 to confirm whether the video content contains a subsequent scene change point (step S11). When no subsequent scene change point is contained, the entire processing routine ends. Conversely, when a subsequent scene change point is contained, the frame at the next scene change point is retrieved (step S10), the process returns to step S2, and the topic detection process described above is repeated.
In the present embodiment, the caption detection process is carried out on the assumption that caption areas are located at the four corners of the video screen, as shown in Fig. 2. In many television programs, however, the current time is displayed in one of these areas. To prevent false detection, character information can be obtained from the detected captions, and when the recognized characters are numerals, the detected captions can be determined not to be actual captions.
In some cases, captions may disappear from the screen and the same captions may reappear after a few seconds. In such cases, when the following conditions are satisfied, the caption displays are regarded as one continuous display (that is, the same topic continues), which prevents extra index entries from being generated even though the caption display is interrupted:
1) equation (1) is satisfied between the caption area before the captions disappear and the caption area after the captions reappear;
2) the numbers of pixels of the edge images in the caption area before the captions disappear and after they reappear are substantially the same, and the number of pixels of the image obtained by taking the logical AND of the corresponding pixels of the edge images is also substantially the same; and
3) the period during which the captions are absent is equal to or less than a threshold (for example, 5 seconds).
For example, the genre information of the television program can be obtained from the EPG so that the threshold for the interruption period can be changed according to the genre of the program (such as a news program or a variety show).
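The reappearance rule can be sketched as interval merging; matching caption areas are abstracted to an equal signature, the 5-second gap comes from the example above, and the genre table and its values are illustrative assumptions:

```python
# Sketch: merge two caption intervals into one topic when their captions
# match and the gap between them is at or below a genre-dependent threshold.

GAP_THRESHOLD = {"news": 5, "variety": 3}  # seconds; illustrative values

def merge_intervals(intervals, genre, default_gap=5):
    """intervals: time-sorted list of (start, end, signature) tuples."""
    merged = [list(intervals[0])]
    limit = GAP_THRESHOLD.get(genre, default_gap)
    for start, end, sig in intervals[1:]:
        prev = merged[-1]
        if sig == prev[2] and start - prev[1] <= limit:
            prev[1] = end                    # treat as one continuous topic
        else:
            merged.append([start, end, sig])
    return [tuple(m) for m in merged]

spans = [(0, 40, "hdl"), (43, 90, "hdl"), (120, 150, "wx")]
assert merge_intervals(spans, "news") == [(0, 90, "hdl"), (120, 150, "wx")]
```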
Industrial Applicability
The present invention has been described above in detail with reference to a specific embodiment. It is obvious, however, that those skilled in the art can modify or alter the embodiment without departing from the scope of the present invention.
This specification has mainly described the case in which indexing is performed on video content obtained by recording television programs, but the gist of the present invention is not limited thereto. The content processing apparatus according to the present invention can equally index a variety of video content that is produced and collected for purposes other than television broadcasting and that contains caption areas representing topics.
In essence, the present invention has been disclosed by way of example, and the contents of this specification should not be construed as limiting. The essence of the present invention should be determined from the appended claims.

Claims (23)

1. A content processing apparatus for processing video content comprising chronologically ordered image frames, the apparatus comprising:
a scene change detection unit configured to detect, in the video content to be processed, a scene change point at which the scene changes significantly due to frame switching;
a topic detection unit configured to detect, in the video content to be processed, a segment consisting of a plurality of consecutive image frames in which the same fixed captions appear; and
an index storage unit configured to store index information relating to the time of the segment, detected by the topic detection unit, in which the same fixed captions appear.
2. The content processing apparatus according to claim 1, further comprising:
a playback unit configured to link, when the video content is played, the index information managed in the index storage unit with the video content.
3. The content processing apparatus according to claim 2, wherein, when a topic is selected from the index information managed in the index storage unit, the playback unit plays and outputs the portion of the video content from the start time to the end time of the segment represented by the index information.
4. The content processing apparatus according to claim 1, wherein the topic detection unit detects whether captions appear at corresponding positions by using the frames immediately before and after the scene change point detected by the scene change detection unit.
5. The content processing apparatus according to claim 1, wherein the topic detection unit creates an average image of the frames within a predetermined period before and after the scene change point, and performs caption detection on the average image.
6. The content processing apparatus according to claim 5, wherein the topic detection unit:
compares the caption areas of the frames preceding the scene change point, and detects, as the start position of a topic, the frame immediately after the frame in which the captions disappear from the caption area; and
compares the caption areas of the frames following the scene change point, and detects, as the end position of the topic, the frame immediately before the frame in which the captions disappear from the caption area.
7. The content processing apparatus according to claim 6, wherein the topic detection unit calculates the average value of each color component in the caption area of each frame to be compared, and determines whether the captions have disappeared from the caption area by determining whether the Euclidean distance between the average colors of the image frames exceeds a predetermined threshold.
8. The content processing apparatus according to claim 6, wherein the topic detection unit computes the edge image in the caption area of each frame to be compared, and determines the presence of captions in the caption area on the basis of a comparison of the edge images in the caption areas of the frames.
9. The content processing apparatus according to claim 8, wherein the topic detection unit computes the edge image in the caption area of each frame to be compared, determines that the captions have disappeared when the number of pixels of the edge image detected in the caption area decreases significantly, and determines that the same captions appear continuously when the change in the number of pixels is small.
10. The content processing apparatus according to claim 9, wherein, when the change in the number of pixels of the edge image detected in the caption area is small, the topic detection unit computes the logical AND of the corresponding edge pixels of the edge images, and determines that the captions have changed when the number of edge pixels in the resulting image decreases significantly.
11. The content processing apparatus according to claim 6, wherein the topic detection unit determines the caption display time from the detected caption start position to the detected end position, and determines a topic only when the caption display time is longer than a predetermined period.
12. The content processing apparatus according to claim 6, wherein the topic detection unit determines whether the detected captions are valid on the basis of the size or position information of the caption area in which the captions are detected in a frame.
13. A content processing method for processing, in a content processing system configured on a computer, video content comprising chronologically ordered image frames, the method comprising:
a scene change detection step in which a scene change detection component included in the computer detects, in the video content to be processed, a scene change point at which the scene changes significantly due to frame switching;
a topic detection step in which a topic detection component included in the computer detects whether captions appear at the scene change point by using the frames before and after the scene change point detected in the scene change detection step, and detects a segment in which the same fixed captions appear in a plurality of consecutive image frames before and after the scene change point at which the captions were detected; and
an index storage step in which an index storage component included in the computer stores index information relating to the time of the segment, detected in the topic detection step, in which the same fixed captions appear.
14, the method for processing video content as claimed in claim 13 also comprises:
Play step, when from the index information of the index stores step, storing, having selected theme, play and output from by the zero-time of the index information representative of corresponding video content fragment to the concluding time.
15, content processing method as claimed in claim 13 wherein, in the topic detection step, is created in before the scene change point and the average image of the frame of predetermined period afterwards, and this average image is carried out captions detect.
16. The content processing method according to claim 15,
wherein, in the topic detection step,
the caption areas of the frames preceding the scene change point are compared, and the frame immediately after the frame in which the captions disappear from the caption area is detected as the start position of a topic, and
the caption areas of the frames following the scene change point are compared, and the frame immediately before the frame in which the captions disappear from the caption area is detected as the end position of the topic.
17. The content processing method according to claim 16, wherein, in the topic detection step, the average value of each color component in the caption area of each frame to be compared is calculated, and whether the captions have disappeared from the caption area is determined by determining whether the Euclidean distance between the average colors of the image frames exceeds a predetermined threshold.
18. The content processing method according to claim 16, wherein, in the topic detection step, the edge image in the caption area of each frame to be compared is computed, and the presence of captions in the caption area is determined on the basis of a comparison of the edge images in the caption areas of the frames.
19. The content processing method according to claim 18, wherein, in the topic detection step, the edge image in the caption area of each frame to be compared is computed, the captions are determined to have disappeared when the number of pixels of the edge image detected in the caption area decreases significantly, and the same captions are determined to appear continuously when the change in the number of pixels is small.
20. The content processing method according to claim 19, wherein, in the topic detection step, when the change in the number of pixels of the edge image detected in the caption area is small, the logical AND of the corresponding edge pixels of the edge images is computed, and the captions are determined to have changed when the number of edge pixels in the resulting image decreases significantly.
21. The content processing method according to claim 16, wherein, in the topic detection step, the caption display time from the detected caption start position to the detected end position is determined, and a topic is determined only when the caption display time is longer than a predetermined period.
22. The content processing method according to claim 16, wherein, in the topic detection step, whether the detected captions are valid is determined on the basis of the size or position information of the caption area in which the captions are detected in a frame.
23. A computer program written in a computer-readable form for executing, on a computer system, processing of video content comprising chronologically ordered image frames, the processing comprising:
a scene change detection step of detecting, in the video content to be processed, a scene change point at which the scene changes significantly due to frame switching;
a topic detection step of detecting whether captions appear at the scene change point by using the frames before and after the scene change point detected in the scene change detection step, and detecting a segment in which the same fixed captions appear in a plurality of consecutive image frames before and after the scene change point at which the captions were detected;
an index storage step of storing index information relating to the time of the segment, detected in the topic detection step, in which the same fixed captions appear; and
a playback step of, when a topic is selected from the index information stored in the index storage step, playing and outputting the portion of the video content from the start time to the end time of the segment represented by the index information.
CNB200680000555XA Expired - Fee Related CN100551014C (en) 2005-05-26 2006-05-10 Contents processing device and contents processing method

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2005153419 2005-05-26
JP153419/2005 2005-05-26
JP108310/2006 2006-04-11

Publications (2)

Publication Number Publication Date
CN1993989A true CN1993989A (en) 2007-07-04
CN100551014C CN100551014C (en) 2009-10-14

Family

ID=38214988

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200680000555XA Expired - Fee Related CN100551014C (en) 2005-05-26 2006-05-10 Contents processing device and contents processing method

Country Status (1)

Country Link
CN (1) CN100551014C (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510978B (en) * 2008-02-13 2012-01-04 索尼株式会社 Image display apparatus, image display method, program, and record medium
CN101650958B (en) * 2009-07-23 2012-05-30 中国科学院声学研究所 Extraction method and index establishment method of movie video scene fragment
CN106020625A (en) * 2011-08-31 2016-10-12 观致汽车有限公司 Interactive system and method for controlling vehicle application through same
CN106658196A (en) * 2017-01-11 2017-05-10 北京小度互娱科技有限公司 Method and device for embedding advertisement based on video embedded captions
CN111091811A (en) * 2019-11-22 2020-05-01 珠海格力电器股份有限公司 Method and device for processing voice training data and storage medium
CN111091811B (en) * 2019-11-22 2022-04-22 珠海格力电器股份有限公司 Method and device for processing voice training data and storage medium
CN113342240A (en) * 2021-05-31 2021-09-03 深圳市捷视飞通科技股份有限公司 Courseware switching method and device, computer equipment and storage medium
CN113342240B (en) * 2021-05-31 2024-01-26 深圳市捷视飞通科技股份有限公司 Courseware switching method and device, computer equipment and storage medium
WO2023019510A1 (en) * 2021-08-19 2023-02-23 浙江吉利控股集团有限公司 Data indexing method, apparatus and device, and storage medium

Also Published As

Publication number Publication date
CN100551014C (en) 2009-10-14

Similar Documents

Publication Publication Date Title
CN1299505C (en) Recommending media content on a media system
KR101237229B1 (en) Contents processing device and contents processing method
US7941031B2 (en) Video processing apparatus, IC circuit for video processing apparatus, video processing method, and video processing program
JP4202316B2 (en) Black field detection system and method
CN1425249A (en) System and method for accessing multimedia summary of video program
US20080044085A1 (en) Method and apparatus for playing back video, and computer program product
US20090196569A1 (en) Video trailer
CN1169362C (en) Method and apparatus for recording TV signal, method and apparatus for reproducing TV signal, apparatus for recording and reproducing TV signal, and recording medium
CN1993989A (en) Contents processing device, contents processing method, and computer program
CN1679261A (en) Method of content identification, device, and software
CN1394342A (en) Apparatus for reproducing information signal stored on storage medium
CN1645357A (en) Apparatus, method and computer product for recognizing video contents and for video recording
CN1520561A (en) Streaming video bookmarks
US8214368B2 (en) Device, method, and computer-readable recording medium for notifying content scene appearance
CN1934650A (en) AV content processing device, AV content processing method, av content processing program, and integrated circuit used in av content processing device
KR20070028253A (en) Techniques for navigating multiple video streams
JP2010074518A (en) Program recommendation device
CN1627807A (en) Method for extracting program and apparatus for extracting program
CN1622092A (en) Content viewing support apparatus and content viewing support method, and computer program
US9147434B2 (en) Information processing apparatus and information processing method
CN1682533A (en) A video recorder unit and method of operation therefor
JP4784450B2 (en) Information processing apparatus, information processing method, and program
CN1574923A (en) Video display method of video system and image processing apparatus
JP2009027428A (en) Recording/reproduction system and recording/reproduction method
JP2007266838A (en) Recording and reproducing device, recording and reproducing method, and record medium recorded with recording and reproducing program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091014

Termination date: 20150510

EXPY Termination of patent right or utility model