CN101127899B - Hint information description method - Google Patents

Hint information description method

Info

Publication number
CN101127899B
CN101127899B (application CN200710162216.8A)
Authority
CN
China
Prior art keywords
metadata
regeneration
scene
descriptor
content
Prior art date
Legal status
Expired - Fee Related
Application number
CN200710162216.8A
Other languages
Chinese (zh)
Other versions
CN101127899A (en)
Inventor
守屋芳美
西川博文
关口俊一
浅井光太郎
山田悦久
乙井研二
黑田慎一
小川文伸
Current Assignee
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date
Filing date
Publication date
Application filed by Mitsubishi Electric Corp
Publication of CN101127899A
Application granted
Publication of CN101127899B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Multimedia content containing moving pictures and audio is divided into multiple scenes, and metadata is generated for each of the scenes obtained by the division. In addition to scene section information and titles, it is possible to generate metadata containing scene structure information that describes the hierarchical structure of the content. Further, in order to regenerate metadata more appropriately, the regeneration is performed using metadata optimization hint information that describes the types of descriptors contained in the metadata.

Description

Hint information description method
This application is a divisional application of Chinese patent application No. 03808260.8, filed on March 20, 2003, entitled "Metadata editing device, metadata reproduction device, metadata distribution device, metadata retrieval device, metadata regeneration condition setting device, and metadata distribution method".
technical field
The present invention relates to a metadata editing device, a metadata reproduction device, a metadata distribution device, a metadata retrieval device, a metadata regeneration condition setting device, a content distribution device, and a metadata distribution method that divide multimedia content containing moving images and audio into multiple scenes and generate metadata for each of the divided scenes.
background art
A conventional moving-image management apparatus divides an image into multiple scenes and then edits, for each scene, an index assembled from the section information required for reproduction, a scene number, and an image representative of the scene. It comprises means for generating one or more such indexes, means for assigning to each index a title indicating a retrieval target, and means for searching the indexes by title at retrieval time and reproducing the scenes of the retrieved index one by one in scene-number order. By editing an index in which only the necessary scenes are arranged, only those scenes can be reproduced (see, for example, Japanese Unexamined Patent Publication No. 2001-028722, page 1, Fig. 1).
However, in the conventional moving-image management apparatus described above, an index is generated only from the section information required for reproducing each scene, the scene number, and the image representing the scene, so the structure that the video data possesses, such as its hierarchy, cannot be managed.
Furthermore, when a registered image is retrieved, the retrieval is performed with the title assigned to the index, so an appropriate title must be entered in order to obtain an appropriate retrieval result.
The present invention has been made to solve these problems, and an object of the invention is to obtain a metadata editing device capable of generating metadata, such as index information on the structure possessed by content such as video data, in addition to the section information and titles of scenes.
Another object of the invention is to obtain a metadata reproduction device, a metadata distribution device, a metadata retrieval device, a metadata regeneration condition setting device, a content distribution device, and a metadata distribution method that, using the metadata generated by the metadata editing device, can collect and reproduce only the scenes the user wants to see, and can retrieve desired scenes using the feature values and other information described in the metadata.
summary of the invention
A metadata editing device according to the present invention comprises: a scene dividing unit that divides multimedia content containing at least one of moving images and audio into multiple scenes and generates, for each divided scene, scene section information metadata indicating the start position and end position of the scene; a scene description editing unit that performs hierarchical editing of the scenes of the multimedia content on the basis of the scene section information metadata from the scene dividing unit and generates scene structure information metadata describing the hierarchical structure of the multimedia content; and a metadata description unit that integrates the scene section information metadata and the scene structure information metadata and generates, in a predetermined format, metadata describing the contents and structure of the multimedia content.
A metadata distribution device according to the present invention comprises: a hint information analysis unit that analyzes metadata optimization hint information describing the types and contents of the descriptors contained in metadata; a metadata analysis/regeneration unit that analyzes metadata describing the contents and structure of multimedia content containing at least one of moving images and audio, on the basis of the analyzed metadata optimization hint information and a condition concerning metadata regeneration, and regenerates second metadata; and a metadata distribution unit that distributes the second metadata regenerated by the metadata analysis/regeneration unit to a client terminal.
Further, a metadata distribution method according to the present invention comprises the steps of: analyzing metadata optimization hint information describing the types of descriptors contained in metadata; analyzing metadata describing the contents and structure of multimedia content containing at least one of moving images and audio, on the basis of the analyzed metadata optimization hint information and a condition concerning metadata regeneration, and regenerating second metadata; and distributing the regenerated second metadata to a client terminal.
brief description of the drawings
Fig. 1 is a block diagram showing the configuration of the metadata editing device according to Embodiment 1 of the present invention.
Fig. 2 is a diagram showing a news video as an example of an editing target of the metadata editing device according to Embodiment 1 of the present invention.
Fig. 3 is a diagram showing an example of the scene section information metadata generated by the scene dividing unit of the metadata editing device according to Embodiment 1 of the present invention.
Fig. 4 is a diagram showing an example of the scene structure information metadata generated by the scene description editing unit of the metadata editing device according to Embodiment 1 of the present invention.
Fig. 5 is a diagram showing an example of screen images of the content reproduction/display unit and the user input unit of the metadata editing device according to Embodiment 1 of the present invention.
Fig. 6 is a block diagram showing the configuration of the metadata editing device according to Embodiment 2 of the present invention.
Fig. 7 is a diagram for explaining the operation of the metadata editing device according to Embodiment 2 of the present invention.
Fig. 8 is a block diagram showing the configuration of the metadata reproduction device according to Embodiment 3 of the present invention.
Fig. 9 is a diagram for explaining the operation of the metadata reproduction device according to Embodiment 3 of the present invention.
Fig. 10 is a block diagram showing the configuration of the content distribution system according to Embodiment 4 of the present invention.
Fig. 11 is a diagram showing the structural information of content (a news video example) output from the metadata analysis unit of the metadata distribution server according to Embodiment 4 of the present invention.
Fig. 12 is a diagram showing a structural example of content restructured by the metadata regeneration unit of the content distribution system according to Embodiment 4 of the present invention.
Fig. 13 is a block diagram showing the configuration of the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 14 is a diagram showing an example of video content used to explain the metadata optimization hint information handled by the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 15 is a diagram showing a description example of metadata in MPEG-7 handled by the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 16 is a diagram showing a format example of the metadata optimization hint information used by the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 17 is a diagram showing an example of the metadata optimization hint information used by the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 18 is a flowchart showing the operation of the metadata analysis/regeneration unit of the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 19 is a flowchart showing the operation of the metadata analysis/regeneration unit of the metadata distribution server according to Embodiment 5 of the present invention.
Fig. 20 is a block diagram showing the configuration of the metadata retrieval server according to Embodiment 6 of the present invention.
Fig. 21 is a flowchart showing the operation of the metadata analysis unit of the metadata retrieval server according to Embodiment 6 of the present invention.
Fig. 22 is a block diagram showing the configuration of the client terminal according to Embodiment 7 of the present invention.
Fig. 23 is a block diagram showing the configuration of the content distribution server according to Embodiment 8 of the present invention.
Embodiment
Embodiments of the present invention are described below with reference to the accompanying drawings:
the metadata editing devices according to Embodiments 1 and 2,
the metadata reproduction device according to Embodiment 3,
the content distribution system according to Embodiment 4,
the metadata distribution server according to Embodiment 5,
the metadata retrieval server according to Embodiment 6,
the client terminal according to Embodiment 7, and
the content distribution server according to Embodiment 8.
Embodiment 1
Embodiment 1 describes a metadata editing device that divides multimedia content containing moving images and audio into multiple scenes and generates metadata (index information) containing a hierarchical scene structure description and the feature values of each scene.
The metadata editing device according to Embodiment 1 of the present invention will be described with reference to the drawings. Fig. 1 is a block diagram showing the configuration of the metadata editing device according to Embodiment 1. In the figures, the same reference numerals denote the same or corresponding parts.
In Fig. 1, the metadata editing device 100 comprises a content reproduction/display unit 2, a scene dividing unit 3, a thumbnail image generation unit 4, a scene description editing unit 5, a text information assigning unit 6, a feature extraction unit 7, a user input unit 8, and a metadata description unit 9.
The content reproduction/display unit 2 reproduces and displays multimedia content 10 to be edited, which consists of video data, audio data, and the like. The scene dividing unit 3 divides the content into multiple scenes. The thumbnail image generation unit 4 extracts a representative frame of each scene as a thumbnail image. The scene description editing unit 5 edits the scenes hierarchically through grouping of the scenes divided by the scene dividing unit 3, combination and deletion of scenes, and generation of relation information between scenes. The text information assigning unit 6 attaches various kinds of text information to each scene. The feature extraction unit 7 extracts the features of each scene.
The user input unit 8 outputs instruction information from the user, as user input information 11, to the content reproduction/display unit 2, the scene dividing unit 3, the thumbnail image generation unit 4, the scene description editing unit 5, and the text information assigning unit 6.
The metadata description unit 9 integrates the scene section information metadata 12, the scene thumbnail image information metadata 13, the scene structure information metadata 14, the text information metadata 15, and the feature description metadata 16 output by the scene dividing unit 3, the thumbnail image generation unit 4, the scene description editing unit 5, the text information assigning unit 6, and the feature extraction unit 7, and generates metadata 17 describing the contents and structure of the multimedia content in a prescribed format.
The operation of the metadata editing device according to Embodiment 1 will now be described with reference to the drawings. Fig. 2 is a diagram showing the structure of a news video as an example of an editing target of the metadata editing device according to Embodiment 1.
The case of editing the news video having the structure shown in Fig. 2 is described as an example.
First, when multimedia content 10 such as video content stored in a content storage unit (not shown) is input via a network or the like, the content reproduction/display unit 2 of the metadata editing device 100 reproduces and displays it for editing.
When the user of the metadata editing device 100, while watching the reproduced picture, inputs the cut positions of a scene, that is, a scene start position and end position, through the user input unit 8, the scene dividing unit 3 generates scene section information metadata 12 indicating the scene start position and end position input by the user.
Fig. 3 is a diagram showing an example of the scene section information metadata generated by the scene dividing unit of the metadata editing device according to Embodiment 1.
The scene section information metadata 12 shown in Fig. 3 is an example generated from the news video shown in Fig. 2. As shown in Fig. 3, the scene dividing unit 3 generates, for each scene cut out of the news video content, such as "news in brief", "home news", and "world news", scene section information metadata 12 indicating the section information of the scene, that is, its start position and end position.
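As an illustration (not part of the patent text), the following minimal Python sketch models the scene section information metadata 12 of Fig. 3 as a list of start/end entries; the field names, the time format, and the concrete values are assumptions chosen for this example.
```python
from dataclasses import dataclass

@dataclass
class SceneSection:
    """Scene section information metadata: one entry per cut-out scene."""
    title: str   # label later assigned by the text information assigning unit
    start: str   # scene start position, here written as "HH:MM:SS:frame"
    end: str     # scene end position

# Hypothetical sections for the news video of Fig. 2 / Fig. 3
scene_sections = [
    SceneSection("news in brief", "00:00:00:00", "00:01:30:00"),
    SceneSection("home news",     "00:01:30:00", "00:05:10:00"),
    SceneSection("world news",    "00:05:10:00", "00:08:45:00"),
]
```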
When the user instructs scene editing through the user input unit 8, the scene description editing unit 5 performs hierarchical editing of the scenes successively cut out by the scene dividing unit 3, on the basis of the scene section information metadata 12 from the scene dividing unit 3, and outputs scene structure information metadata 14. Hierarchical editing of scenes means, for example, grouping of scenes, re-division of a scene, combination of scenes, and deletion of scenes. Grouping of scenes means, for example, gathering scenes associated with a particular feature, such as "home news", "world news", and "Economic News" from the news video shown in Fig. 2, into a single "news" group as shown in Fig. 4. Re-division of a scene means dividing one scene into multiple scenes, and combination of scenes means collecting multiple scenes into one scene.
Fig. 4 is a diagram showing an example of the scene structure information metadata generated by the scene description editing unit of the metadata editing device according to Embodiment 1.
The scene structure information metadata 14 shown in Fig. 4 describes the hierarchical structure of the video content generated as a result of the editing in the scene description editing unit 5. In Fig. 4, through scene editing such as grouping, re-division, and combination of scenes in the scene description editing unit 5, a scene called "news" gathers "news in brief", "news", "special issue", "sports", and so on, and the "news" scene in turn hierarchically gathers "home news", "world news", "Economic News", and so on.
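The grouping operation just described can be pictured with the short sketch below, which builds a tree of the kind shown in Fig. 4; the node layout, labels, and time values are illustrative assumptions, not taken from the figure itself.
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SceneNode:
    """A node of the scene structure information metadata: a scene or a group of scenes."""
    title: str
    section: Optional[Tuple[str, str]] = None          # (start, end) for leaf scenes
    children: List["SceneNode"] = field(default_factory=list)

def group_scenes(group_title: str, scenes: List[SceneNode]) -> SceneNode:
    """Grouping: add a new node and hang the selected scenes under it."""
    return SceneNode(group_title, children=scenes)

# Rebuilding a hierarchy in the spirit of Fig. 4 from illustrative leaf scenes
news = group_scenes("news", [
    SceneNode("home news",     ("00:01:30:00", "00:03:20:00")),
    SceneNode("world news",    ("00:03:20:00", "00:05:10:00")),
    SceneNode("Economic News", ("00:05:10:00", "00:06:40:00")),
])
root = SceneNode("news program", children=[
    SceneNode("news in brief", ("00:00:00:00", "00:01:30:00")),
    news,
])
```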
The scene structure information metadata 14 shown in Fig. 4 generated in the scene description editing unit 5 is output to the metadata description unit 9.
Meanwhile, the thumbnail image generation unit 4 generates, on the basis of the scene section information metadata 12 from the scene dividing unit 3, a representative frame as a thumbnail image from each scene cut out by the scene dividing unit 3, outputs the generated thumbnail information to the metadata description unit 9 as thumbnail image information metadata 13, and registers it in the metadata description unit 9. Here the user may select the thumbnail through the user input unit 8, but it is also possible to automatically take the head frame or multiple frames at fixed time intervals as representative frames, or to automatically detect scene change points and take those frames as representative frames. The thumbnail image information metadata 13 is the position information of the thumbnail within the video content (a frame number or time), or location information such as the URL of the thumbnail image.
The feature extraction unit 7 extracts, on the basis of the scene section information metadata 12 from the scene dividing unit 3, visual feature values possessed by each scene, such as motion, color, or the shape of objects contained in the scene. The extracted feature values are output to the metadata description unit 9 as feature description metadata 16 and registered in the metadata description unit 9.
The text information assigning unit 6 assigns, on the basis of the scene section information metadata 12 from the scene dividing unit 3, various kinds of text information, such as the title, abstract, keywords, comments, and importance of each scene, to each scene. The text information may be entered by the user through the user input unit 8, or assigned automatically by analyzing the audio information and captions contained in the content. The text information is output to the metadata description unit 9 as text information metadata 15 and registered in the metadata description unit 9.
Fig. 5 shows an example of the screen images of the content reproduction/display unit and the user input unit 8 of the metadata editing device according to Embodiment 1. In Fig. 5, the video reproduction screen G1 corresponds to a screen image of the content reproduction/display unit 2, and the content to be edited is reproduced and displayed on this video reproduction screen G1. Although not shown in Fig. 5, the screen has the user interface found in an ordinary video reproduction device, such as instruction buttons for "play", "stop", "rewind", "fast forward", and "frame advance". Below the video reproduction screen G1, a scene division instruction screen G2 is displayed. This scene division instruction screen G2 has, for example, a slider form, and the user can indicate the start position and end position of a scene of the video displayed on the video reproduction screen G1 while watching that video. The scene division instruction screen G2 can also indicate the position of a thumbnail between the start position and end position of a scene. When the position of a thumbnail is designated through the scene division instruction screen G2, the thumbnail image generation unit 4 generates a thumbnail image from the frame at the designated position of the video content.
The thumbnail image at the position designated through the scene division instruction screen G2 is displayed, as scene division information, on the scene division information display screen G3. On this scene division information display screen G3, in addition to the thumbnail image, the information of the start position and end position of each scene can be displayed for each scene, as shown in Fig. 3.
Next, on the tree structure generation instruction/display screen G4, scene editing is instructed by the user. While watching the scene division information, such as the thumbnail images displayed on the scene division information display screen G3, the user generates a tree representing the hierarchical structure that the video content has.
As an operation method, for example, when grouping scenes, a new node is added on the tree and the scenes to be grouped are added under that node. As the operation of adding a scene, a method can be considered in which the scene to be added is selected on the scene division information display screen G3 and added to the node by dragging. By providing the user input unit 8 with a user interface that selects a scene on the scene division information display screen G3 or the tree structure generation instruction/display screen G4 and assigns text information to that scene via the text information assigning unit 6, text information for a scene can be input.
The metadata description unit 9 integrates the various metadata output by the scene dividing unit 3, the thumbnail image generation unit 4, the scene description editing unit 5, the text information assigning unit 6, and the feature extraction unit 7, and generates a metadata file described in a prescribed description format. The description format of the metadata may be a proprietary format, but in Embodiment 1 MPEG-7, standardized by ISO, is used. MPEG-7 specifies a format for describing the structure and features of content, and has an XML file format and a binary format.
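To make the integration step concrete, the sketch below serializes a scene tree into a simplified XML description loosely modeled on the video-segment style discussed later in the text; the element names are simplified assumptions and do not reproduce the actual MPEG-7 schema.
```python
import xml.etree.ElementTree as ET

def scene_to_xml(scene, parent=None):
    """Serialize one scene dict (title, optional (start, end), children) into simplified XML."""
    tag = "VideoSegment"
    seg = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
    ET.SubElement(seg, "Title").text = scene["title"]
    if scene.get("section"):
        start, end = scene["section"]
        t = ET.SubElement(seg, "MediaTime")
        ET.SubElement(t, "MediaTimePoint").text = start
        ET.SubElement(t, "MediaTimeEnd").text = end
    if scene.get("children"):
        decomposition = ET.SubElement(seg, "TemporalDecomposition")
        for child in scene["children"]:
            scene_to_xml(child, decomposition)
    return seg

root = {"title": "news program", "children": [
    {"title": "news in brief", "section": ("00:00:00:00", "00:01:30:00")},
    {"title": "news", "children": [
        {"title": "home news",  "section": ("00:01:30:00", "00:05:10:00")},
        {"title": "world news", "section": ("00:05:10:00", "00:08:45:00")},
    ]},
]}
print(ET.tostring(scene_to_xml(root), encoding="unicode"))
```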
Thus, the metadata editing device 100 according to Embodiment 1, being provided with the scene description editing unit 5 that edits scenes hierarchically and the feature extraction unit 7 that extracts features from the scenes, can generate metadata describing the hierarchical structure possessed by content such as video data and the feature values of each scene.
The multimedia content 10 input to the content reproduction/display unit 2 may be obtained in various ways: from a content server (not shown) on a network, from a content storage unit (not shown) inside the metadata editing device 100, or from a medium (not shown) such as a CD or DVD. Likewise, the metadata output from the metadata description unit 9 may be stored in a content server (not shown) on a network, in a metadata storage unit (not shown) inside the metadata editing device, or together with the content in a medium (not shown) such as a CD or DVD.
In Embodiment 1 the device has been described as having both the scene description editing unit 5 and the feature extraction unit 7, but the invention is not limited to this; it is of course possible to provide only the scene description editing unit 5, or only the feature extraction unit 7.
Embodiment 2
In Embodiment 1 above, the case in which all scene division is performed manually was described. Embodiment 2 describes a metadata editing device characterized by having a scene change detection unit that automatically detects scene change points.
The metadata editing device according to Embodiment 2 of the present invention will be described with reference to the drawings. Fig. 6 is a block diagram showing the configuration of the metadata editing device according to Embodiment 2.
In Fig. 6, the metadata editing device 100A comprises a content reproduction/display unit 2, a scene dividing unit 3, a thumbnail image generation unit 4, a scene description editing unit 5, a text information assigning unit 6, a feature extraction unit 7, a user input unit 8, a metadata description unit 9, and a scene change detection unit 39. Reference numeral 40 denotes automatically detected scene start position information.
The operation of the metadata editing device according to Embodiment 2 will now be described with reference to the drawings.
Fig. 7 is a diagram for explaining the operation of the metadata editing device according to Embodiment 2.
Except for the scene change detection unit 39 and the scene dividing unit 3, the operation is the same as in Embodiment 1. Here the operation specific to Embodiment 2 is described.
The scene change detection unit 39 automatically detects scene changes and cut points. Scene change detection is performed, for example, on the basis of inter-frame pixel differences or inter-frame histogram differences of color and luminance. The scene dividing unit 3 determines scene start positions and end positions according to the scene change points detected by the scene change detection unit 39.
Here, the processing of the scene change detection unit 39 and the scene dividing unit 3 is described in detail, taking the case in which the content to be edited is a news video as an example.
The case in which a color histogram is used as the feature value for scene change detection is described.
The scene change detection unit 39 calculates a color histogram for each frame. As color systems there are HSV, RGB, YCbCr, and so on; here the HSV color space is used. The HSV color space consists of three elements: hue (H), saturation (S), and value (luminance, V). A histogram is calculated for each element. Then, from the obtained histograms, the inter-frame histogram difference is calculated, for example, according to the following (formula 1). It is assumed that the first N frames counted from the start frame of a scene (for example N = 3) belong to the same scene, that is, contain no scene change point. In addition, as initial feature values of the scene, the mean and the standard deviation (sd) of the histogram differences of the first N frame intervals are obtained according to the following (formula 2).
$$\mathrm{sum}_i = \sum_{k=1}^{\mathrm{bin\_H}} \left| H_i(k) - H_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_S}} \left| S_i(k) - S_{i-1}(k) \right| + \sum_{k=1}^{\mathrm{bin\_V}} \left| V_i(k) - V_{i-1}(k) \right|$$
(formula 1)
where
sum_i: sum of the histogram differences between frame i and frame i-1
H_i(k): histogram of hue, bin_H: number of histogram bins
S_i(k): histogram of saturation, bin_S: number of histogram bins
V_i(k): histogram of value (luminance), bin_V: number of histogram bins
$$\mathrm{mean} = \frac{1}{N-1}\sum_{i=1}^{N-1}\mathrm{sum}_i, \qquad \mathrm{sd} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(\mathrm{sum}_i - \mathrm{mean}\right)^2}$$
(formula 2)
where
mean: mean of the inter-frame histogram differences
sd: standard deviation of the inter-frame histogram differences
Then, for frame N+1 and subsequent frames, a frame whose inter-frame histogram difference is larger than mean + λ·sd is taken as a scene change point and as a candidate start position of a new scene.
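As an illustration of the detection rule just described (formula 1, formula 2, and the mean + λ·sd threshold), the following minimal sketch assumes the frames are already available as HSV arrays with values in [0, 1] and that at least N frames exist; the bin count, λ, and N are illustrative values, and no particular decoding library is implied.
```python
import numpy as np

def hsv_histogram(frame_hsv, bins=16):
    """Per-channel histograms of an HSV frame given as an (H, W, 3) float array in [0, 1]."""
    return [np.histogram(frame_hsv[..., c], bins=bins, range=(0.0, 1.0))[0] for c in range(3)]

def histogram_difference(hist_a, hist_b):
    """Formula 1: sum of absolute bin differences over the H, S, and V histograms."""
    return sum(np.abs(a - b).sum() for a, b in zip(hist_a, hist_b))

def detect_scene_change_candidates(frames_hsv, n_initial=3, lam=3.0):
    """Return frame indices whose difference to the previous frame exceeds mean + lam * sd."""
    hists = [hsv_histogram(f) for f in frames_hsv]
    diffs = [histogram_difference(hists[i], hists[i - 1]) for i in range(1, len(hists))]
    initial = np.array(diffs[:n_initial - 1], dtype=float)   # first N frames assumed cut-free
    mean = initial.mean()
    sd = initial.std()              # denominator N-1 over the N-1 initial differences, as in formula 2
    threshold = mean + lam * sd
    # diffs[i] is the difference between frame i+1 and frame i; only frames after the first N qualify
    return [i + 1 for i, d in enumerate(diffs) if i + 1 >= n_initial and d > threshold]
```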
Now consider the case in which multiple scene start position candidates are obtained and, as in a news video, an image of a fixed pattern is inserted at the switching between news items.
In a news video, at the switching between news items, an image of a fixed pattern is often inserted, for example an image consisting of the announcer, the studio set and background, and an explanatory caption (subtitle). Therefore, such fixed-pattern images (called template images), or metadata describing the feature values of the template images, are registered in advance. The feature values of a template image include, for example, the color histogram of the template image, or its motion pattern (at the switching of a news item, the area containing the announcer shows only a small amount of motion, and so on).
When template images are registered in advance, as shown in Fig. 7, the image corresponding to a scene change point is matched against the template images, and if the similarity is high, that scene change point is registered as the start position of a scene. The similarity can be evaluated, for example, by inter-frame differences or inter-frame color histogram differences.
When the feature values of the template images are registered in advance, the feature values are extracted from the image corresponding to a scene change point and matched against the feature values of the template images, and if the similarity is high, that scene change point is registered as the start position of a scene. The information on the start position of the scene is output to the scene dividing unit 3.
The scene dividing unit 3 determines the scene start positions and end positions according to the scene start position information 40 automatically detected by the scene change detection unit 39. As in Embodiment 1, the scene dividing unit 3 of Embodiment 2 can also determine scene start positions and end positions according to instructions from the user.
The scene dividing unit 3 may also output the scene section information metadata 12 describing the start position and end position of a scene to the scene change detection unit 39, so that the scene change detection unit 39 detects the scene change points contained within that scene.
The scene description editing unit 5 can re-divide and combine the scenes automatically detected by the scene change detection unit 39, on the basis of the scene section information metadata 12 from the scene dividing unit 3. The details of the scene description editing unit 5 are the same as in Embodiment 1.
Thus, the metadata editing device 100A according to Embodiment 2 can, like Embodiment 1, generate metadata describing the hierarchical structure possessed by content such as video data and the feature values of each scene, and in addition, by providing the scene change detection unit 39, can automatically detect the scene change points of the content.
Embodiment 3
Embodiment 3 describes a metadata reproduction device that performs summary reproduction of video, retrieval, and the like using the metadata generated by the metadata editing devices of Embodiments 1 and 2.
The metadata reproduction device according to Embodiment 3 of the present invention will be described with reference to the drawings. Fig. 8 is a block diagram showing the configuration of the metadata reproduction device according to Embodiment 3.
In Fig. 8, the metadata reproduction device 200 comprises a metadata analysis unit 19, a structure display unit 20, a thumbnail image display unit 21, a user input unit 22, a retrieval unit 23, a retrieval result display unit 24, a summary generation unit 25, a summary structure display unit 26, and a content reproduction unit 27.
The metadata analysis unit 19 analyzes metadata 28 describing the hierarchical scene structure possessed by the content, information such as the thumbnail of each scene, and the feature values of each scene. The structure display unit 20 displays the scene structure 29 obtained from the metadata analysis result, that is, the hierarchical structure of the content. The thumbnail image display unit 21 displays the thumbnail image information 30 obtained from the metadata analysis result.
The user input unit 22 gives instructions for retrieval, reproduction, and so on. The retrieval unit 23 performs retrieval according to a retrieval instruction from the user (retrieval condition 31) and the scene feature values and text information 32 obtained from the metadata. The retrieval result display unit 24 displays the retrieval result 33. The summary generation unit 25 generates a summary according to a summary generation instruction from the user (summary generation condition 34). The summary structure display unit 26 displays the structure 38 of the summarized content. The content reproduction unit 27 reproduces and displays the content 37 according to the summary information 35, a content reproduction instruction 36, and the content.
The operation of the metadata reproduction device according to Embodiment 3 will now be described with reference to the drawings.
First, the metadata analysis unit 19 receives as input the metadata 28 describing the hierarchical scene structure possessed by the content, the thumbnail of each scene, the feature values of each scene, and so on, and analyzes the metadata.
In Embodiment 3, since the metadata 28 is described in the format prescribed by MPEG-7, as generated by the metadata description unit 9 of Embodiments 1 and 2, it may be either a text file described in XML or a binary file encoded in the binary format.
Accordingly, if the metadata 28 is described in XML, the metadata analysis unit 19 has the function of an XML parser that parses the XML file; if the metadata 28 is encoded in the binary format, it has the function of a decoder that decodes the metadata 28.
The structure display unit 20 receives the analysis result of the metadata analysis unit 19 and displays the hierarchical scene structure 29 of the content, for example displaying the scene structure of the content in tree form together with the title of each scene, as shown in Fig. 4.
The thumbnail image display unit 21 receives the analysis result (thumbnail image information 30) of the metadata analysis unit 19 and displays a list of thumbnail images of the content.
The retrieval unit 23 retrieves the scenes contained in the content according to a retrieval instruction given by the user through the user input unit 22. At this time, the user inputs a retrieval condition through the user input unit 22 by specifying a keyword, a sample picture, or the like. The retrieval unit 23 retrieves the scenes matching the retrieval condition 31 presented by the user (a keyword or the features of a sample picture) on the basis of the scene feature values and the text information 32, such as scene titles, described in the metadata.
When the retrieval by the retrieval unit 23 is finished, the retrieval result display unit 24 receives the retrieval result 33 of the retrieval unit 23 and displays it, for example by displaying the thumbnail images of the scenes matching the retrieval condition.
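As a rough sketch of this retrieval step, the fragment below matches a keyword against scene titles and a sample color histogram against the scene feature values, assuming each scene's metadata has already been parsed into a dict; the field names and the L1 distance measure are illustrative choices, not taken from the patent.
```python
import numpy as np

def retrieve_scenes(scenes, keyword=None, sample_hist=None, max_distance=0.5):
    """Return scenes whose title contains the keyword and/or whose color histogram is close to the sample."""
    results = []
    for scene in scenes:  # each scene: {"title": str, "start": str, "end": str, "color_hist": np.ndarray}
        if keyword is not None and keyword.lower() not in scene["title"].lower():
            continue
        if sample_hist is not None:
            # L1 distance between normalized histograms; smaller means more similar
            if np.abs(scene["color_hist"] - sample_hist).sum() > max_distance:
                continue
        results.append(scene)
    return results
```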
The summary generation unit 25 generates a summary of the content according to a summary generation instruction given by the user through the user input unit 22. At this time, the user inputs, through the user input unit 22, information such as the reproduction time of the summarized content and the user's preferences. For example, when the content is a news video, the user inputs preference information such as wanting mainly to watch the sports portion of the news, or wanting a one-hour news program summarized into 20 minutes. The summary generation unit 25 then generates summary information 35 matching the summary condition on the basis of the scene reproduction times and the text information 32, such as scene titles, described in the metadata. This summary information 35 is, for example, a reproduction list of the scenes contained in the summarized content, that is, a list describing the location information, such as the URL, of the content and the start positions and end positions of the scenes to be reproduced within that content.
The content reproduction unit 27 specifies the target content on the basis of the content location information contained in the summary information 35, and obtains, reproduces, and displays the scenes to be reproduced according to the scene list contained in the summary information 35. As another example, the summary information may hierarchically describe the scene structure of the summarized content.
Fig. 9 shows an example in which the scene structure of the summarized content is described hierarchically. Fig. 9(a) shows an example of the scene structure of the original content. Each scene is given an importance in the range 0.0 to 1.0, where 1.0 means the highest importance and 0.0 the lowest. The importance is calculated, for example, on the basis of the user's preferences. For example, when a preference such as "the soccer matches of team A, and in particular the match results and goal scenes that must be seen" is registered in advance, an importance reflecting that user preference is attached to each scene.
Then, when a summary is generated in Fig. 9(a) using only the scenes with the highest importance, the scene structure of the summary becomes as shown in Fig. 9(b). Each scene has metadata such as the location information (for example the URL) of the content containing the scene and the position information (start position and end position) of the scene within that content. The information about the scene structure 38 of the summary is passed to the summary structure display unit 26, which displays the scene structure of the summary, for example in the tree form shown in Fig. 9(b).
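A minimal sketch of the pruning described for Fig. 9 is given below: keep only the scenes whose importance reaches a threshold and emit a reproduction list of (URL, start, end) entries. The data layout, the threshold of 1.0, and the example values are assumptions for illustration.
```python
def generate_summary(scene_tree, threshold=1.0):
    """Walk a hierarchical scene structure and collect leaf scenes with importance >= threshold."""
    playlist = []

    def visit(node):
        children = node.get("children", [])
        if not children:
            if node.get("importance", 0.0) >= threshold:
                playlist.append((node["url"], node["start"], node["end"]))
        else:
            for child in children:
                visit(child)

    visit(scene_tree)
    return playlist

# Hypothetical soccer-program structure in the spirit of Fig. 9(a)
program = {"title": "Soccer game program", "children": [
    {"title": "goal scene",  "importance": 1.0, "url": "http://example.com/match.mp4",
     "start": "00:12:00:00", "end": "00:12:30:00"},
    {"title": "corner kick", "importance": 0.5, "url": "http://example.com/match.mp4",
     "start": "00:20:00:00", "end": "00:20:20:00"},
]}
print(generate_summary(program))   # only the goal scene survives
```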
When the user selects, through the user input unit 22, one or more scenes to be reproduced from the scene structure displayed on the structure display unit 20 or the summary structure display unit 26, or from the thumbnails of the scenes displayed on the thumbnail image display unit 21 or the retrieval result display unit 24, the scenes contained in the content can be reproduced and displayed by the content reproduction unit 27.
Thus, with the metadata reproduction device 200 according to Embodiment 3, it is possible, using the metadata generated by the metadata editing devices described in Embodiments 1 and 2, to collect and reproduce only the scenes the user wants to see, and to retrieve desired scenes using the feature values described in the metadata.
In Embodiment 3 the content reproduction unit 27 is inside the metadata reproduction device 200, but the content reproduction unit may be in another device. This corresponds to a case in which, for example, the operations and display related to the scene structure and thumbnail images, that is, to the metadata, are performed on a mobile phone, a portable information terminal, or the like, while the processing and display related to the reproduction of the multimedia content are performed on a terminal (for example a PC) connected to the mobile phone or portable information terminal over a network.
Embodiment 4
Embodiment 4 describes a metadata distribution server (metadata distribution device) that distributes the metadata of content to client terminals, and a content distribution server that scalably restructures the content according to the terminal capabilities of the client terminals and distributes it.
The content distribution system according to Embodiment 4 of the present invention will be described with reference to the drawings. Fig. 10 is a block diagram showing the configuration of the content distribution system according to Embodiment 4.
In Fig. 10, the content distribution system 300 comprises a metadata distribution server 400, various client terminals 481 to 48n, and a content distribution server 500.
The metadata distribution server 400 is composed of a metadata storage unit 41, a metadata analysis unit 42, a terminal capability determination unit 43, a metadata regeneration unit 44, and a metadata distribution unit 45.
The metadata storage unit 41 stores, for example, metadata generated by the metadata editing devices of Embodiments 1 and 2. The metadata analysis unit 42 analyzes the metadata 49 describing the structure and features of the content. The terminal capability determination unit 43 determines the terminal capability of a client terminal on the basis of information 51 about the capability of the client terminal. The metadata regeneration unit 44 restructures the content according to the terminal capability of the client terminal on the basis of the metadata analysis result 50, and regenerates metadata 52 describing the restructured content. The metadata distribution unit 45 distributes the metadata 53 regenerated by the metadata regeneration unit 44 to the various client terminals 481 to 48n.
The metadata storage unit 41 may also be provided outside the metadata distribution server 400 of Embodiment 4. In that case the metadata distribution server 400 receives the metadata 49 from the metadata storage unit 41 over a network (not shown) or the like.
The content distribution server 500 is composed of a content storage unit 46 and a content distribution unit 47.
The content storage unit 46 stores content 55. The content distribution unit 47 distributes content 56 in accordance with content distribution requests 54 from the client terminals 481 to 48n.
As in the case of the metadata distribution server 400, the content storage unit 46 may also be provided outside the content distribution server 500. In that case the content distribution server 500 receives the content data 55 over a network (not shown).
The operation of the content distribution system according to Embodiment 4 will now be described with reference to the drawings.
First, on the metadata distribution server 400 side, the metadata analysis unit 42 analyzes the metadata stored in the metadata storage unit 41. The operation of the metadata analysis unit 42 is the same as that of the metadata analysis unit 19 of the metadata reproduction device 200 of Embodiment 3. By analyzing the metadata, the metadata analysis unit 42 obtains information about the structure and features of each piece of content.
Fig. 11 shows the structural information of content (for example a news video) output from the metadata analysis unit of the metadata distribution server according to Embodiment 4. In Fig. 11, the hierarchical scene structure of the content is represented by a tree. Each node of the tree corresponds to a scene, and each node is associated with scene information. Scene information means scene characteristics such as the title, abstract, time information of the scene start position and end position, the thumbnail, representative frame, thumbnail shot, and representative shot of the scene, and visual feature values such as color and motion. In Fig. 11, of the various kinds of scene information, only the scene titles are shown.
Here the client terminals are assumed to be various information appliances with different terminal capabilities. Terminal capability means the communication speed, the processing speed, the image formats that can be reproduced and displayed, the image resolution, the user input capability, and so on. For example, the client terminal 481 is assumed to be a PC (personal computer) with sufficient communication speed, processing speed, display capability, and user input capability, the client terminal 482 a mobile phone, and the other client terminals PDAs and the like. Information about the capability of each terminal is sent from each of the client terminals 481 to 48n.
The terminal capability determination unit 43 analyzes the information 51 about terminal capability sent from each of the client terminals 481 to 48n, determines the distributable image format, the maximum image resolution, the content length, and so on, and outputs them to the metadata regeneration unit 44. For example, when the original content is video content encoded in MPEG-2 with a large image resolution, the client terminal 481, which has sufficient capability, can reproduce the original content as it is. This client terminal 481 also has the video summary reproduction and retrieval functions described in Embodiment 3. The client terminal 482, on the other hand, can reproduce only short video shots encoded in MPEG-4, and the maximum image resolution it can display is also small.
The metadata regeneration unit 44 restructures the content according to the terminal capability of each of the client terminals 481 to 48n received from the terminal capability determination unit 43, regenerates metadata 52 describing the structure and contents of the restructured content, and outputs it to the metadata distribution unit 45. For example, since the original metadata is distributed to the client terminal 481 as it is, no restructuring of the content is performed for it. For the client terminal 482, on the other hand, which has only the function of reproducing short video shots, not all scenes can be reproduced, so the content is restructured using short video shots of the important scenes.
Fig. 12 shows a structural example of the content restructured by the metadata regeneration unit according to Embodiment 4. As shown in Fig. 12, the important scenes are extracted from the news scenes, and the content is composed only of the representative shot or representative frame of each of those scenes. Furthermore, since the client terminal 482 does not have the retrieval function described in Embodiment 3, the scene feature values used for retrieval need not be described in the metadata. Therefore, the metadata regeneration unit 44 regenerates metadata describing only the restructured scene structure and the position information of the representative shot or representative frame of each scene, and outputs it to the metadata distribution unit 45.
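The fragment below sketches this regeneration step for a low-capability terminal: keep only the important scenes, keep only their representative-shot position information, and drop the feature descriptors used for retrieval. The capability flags and field names are assumptions used for illustration.
```python
def regenerate_metadata(scenes, terminal):
    """Restructure scene metadata for a terminal described by simple capability flags."""
    if terminal.get("full_capability"):
        return scenes                      # e.g. a PC: distribute the original metadata unchanged
    regenerated = []
    for scene in scenes:
        if scene.get("importance", 0.0) < terminal.get("min_importance", 0.5):
            continue                       # keep only the important scenes
        entry = {"title": scene["title"],
                 "representative_shot": scene["representative_shot"]}  # position info (URI) only
        # feature values (color, motion) are dropped: the terminal has no retrieval function
        regenerated.append(entry)
    return regenerated

phone = {"full_capability": False, "min_importance": 0.75}
```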
The metadata distribution unit 45 distributes the metadata 53 generated by the metadata regeneration unit 44 to the client terminals 481 to 48n.
Each of the client terminals 481 to 48n analyzes the metadata 53 distributed by the metadata distribution unit 45 and obtains the scene structure information of the content. When the user of a client terminal selects a scene to be reproduced, the client terminal sends the position information of the selected scene to the content distribution unit 47 of the content distribution server 500.
The content distribution unit 47 of the content distribution server 500 obtains the position information of the scene sent from the client terminal, obtains the corresponding content 55 from the content storage unit 46, and distributes it to the client terminal. In the case of the client terminal 481, the start position and end position of the scene are sent and the corresponding scene of the original content is distributed. In the case of the client terminal 482, the location information (a URI or the like) of the representative shot of the scene is sent. When the representative shot cannot be reproduced or displayed on the client terminal 482 because of its image format, image resolution, image file size, or the like, the content distribution unit 47 performs format conversion, image resolution conversion, summarization of the content to reduce the file size, and so on, and then sends it.
Thus, the metadata distribution server 400 according to this embodiment can regenerate metadata in accordance with the capability of each of the client terminals 481 to 48n and distribute it to each client terminal.
Although Fig. 10 shows the metadata distribution server 400 and the content distribution server 500 configured separately, the present invention is not limited to this; the content distribution server may be provided inside the metadata distribution server, the metadata distribution server may be provided inside the content distribution server, and the metadata distribution server and the content distribution server may of course be provided in the same server. In that case, since the content distribution unit 47 can easily learn the capability of each of the client terminals 481 to 48n from the terminal capability determination unit 43, it can restructure the content, for example by format conversion, in accordance with the capability of each client terminal 481 to 48n and distribute it to that terminal.
In Embodiment 4 the metadata stored in the metadata storage unit 41 has been described as metadata generated, for example, by the metadata editing devices of Embodiments 1 and 2, but the invention is not limited to this, and metadata generated by devices other than the metadata editing devices of Embodiments 1 and 2 may of course be stored.
Embodiment 5
Embodiment 5 describes another example of the metadata distribution server described in Embodiment 4. The metadata distribution server of Embodiment 4 regenerates metadata on the basis of the terminal information sent from the client terminals. Embodiment 5 describes a metadata distribution server (metadata distribution device) characterized by a metadata analysis/regeneration unit that, in order to regenerate the metadata more appropriately, performs the regeneration using metadata optimization hint information, which is information for the regeneration of the metadata.
The metadata distribution server according to Embodiment 5 of the present invention will be described with reference to the drawings. Fig. 13 is a block diagram showing the configuration of the metadata distribution server according to Embodiment 5.
In Fig. 13, the metadata distribution server 400A comprises a hint information analysis unit 61, a metadata analysis/regeneration unit 63, and a metadata distribution unit 45.
The hint information analysis unit 61 analyzes the metadata optimization hint information 60 and outputs the result. The metadata analysis/regeneration unit 63 analyzes the metadata 49 describing the structure and features of the content on the basis of the analyzed metadata optimization hint information 62 and a condition 65 concerning metadata regeneration, such as information about the capability of the client terminal or the user's preferences, and outputs restructured metadata 64. The metadata distribution unit 45 distributes the metadata 53 to the client terminals.
The metadata storage unit 41 (see Fig. 10) stores the metadata 49 describing the structure and features of the content and the metadata optimization hint information 60, which is information for regenerating the metadata 49. The metadata optimization hint information 60 for regenerating the metadata 49 is information describing what kinds of information are contained in the metadata 49 and in what quantity, and describing the outline and complexity of the metadata 49.
The operation of the metadata distribution server according to Embodiment 5 will now be described with reference to the drawings.
The metadata optimization hint information 60 is described in detail, taking video content having the structure shown in Fig. 14 as an example.
The video content (Root) (Soccer game program) consists roughly of two scenes (Scene1, Scene2), the first half and the second half, and the first-half scene further consists of multiple scenes (Scene1-1, Scene1-2, ..., Scene1-n) such as goal scenes and corner-kick scenes. In Fig. 14, the temporal hierarchical structure between the scenes is represented by a tree structure.
The corresponding metadata 49 describes this temporal hierarchical structure of the content, that is, the temporal relations between the scenes and the start time and length of each scene. For each scene it also describes, in accordance with its level in the hierarchy, the features possessed by the scene (for example the color histogram and the complexity of motion), text information such as the title, abstract, genre, and annotations, the importance, and so on. In Embodiment 5, MPEG-7, standardized by ISO, is used as the description format of the metadata.
Fig. 15 shows a description example of the metadata in MPEG-7. In MPEG-7 each scene is described in a unit called a video segment. Within each video segment, time information (the start point and length of the scene), the title, the abstract, the genre, and so on are described. The information described in a video segment may differ according to the hierarchical level of the video segment. In the example of Fig. 15 the importance is described in the video segments of level 2 and level 3 but not in those of level 4, while the color and motion feature values are described only in the video segments of level 4.
The temporal hierarchical relations between scenes can be expressed by describing video segments recursively. In the description example of Fig. 15, the description called "time division" describes that one video segment is composed of multiple temporally divided video segments. In MPEG-7 the spatial hierarchical structure possessed by the content can be described in the same way; in that case, instead of the "time division" description, a "spatial division" description expressing that one video segment is composed of multiple spatially divided video segments is used.
The metadata optimization hint information 60 for regenerating the metadata 49 describes the types and contents of the information (descriptors) contained in the metadata 49. Thus, for the metadata of Fig. 15, the metadata optimization hint information 60 contains the descriptor expressing the temporal hierarchical structure of the content ("time division"), the descriptors expressing the color histogram and the complexity of motion, and the descriptors expressing the title, abstract, genre, and importance. Further, as an index representing the contents and complexity of the description, the depth of the hierarchical structure of the video segments is at most 4 (level 1 to level 4), and the importance takes five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}). As viewpoints of the importance, the importance when viewed from the viewpoint of "TeamA" and the importance when viewed from the viewpoint of "TeamB" are described. The hierarchical position at which the importance is described (at which level of video segment it is described) is also contained.
The form example of Figure 16 representation element data optimization information 60.In the metadata option information 60 that Figure 16 represents, contain meta data file information and metadata inscape information.
The position of meta data file information descriptive metadata file, meta data file size, meta data file form (represents XML format, the file format of binary format etc.), grammar file information (position of the grammar file of regulation metadata grammer), represent that comprising (appearance) appearance of prime number of wanting in the metadata wants prime number etc., for the information of the resource (memory size needed for the storage/parsing carrying out metadata and the treatment system (S/W) etc. of resolving needed for metadata) needed for prediction processing metadata.In addition, such as, when with XML descriptive metadata, specify that the dtd file of this descriptor format (grammer) and schema file etc. are suitable with the grammar file of the form of regulation meta data file, and the position of the grammar file of dtd file and schema file etc. is described grammar file information.
The metadata component element information describes the kinds of descriptors that make up the metadata and their contents. It contains the name of each descriptor contained in the metadata, the frequency (number of times) with which the descriptor appears in the metadata, whether the descriptor contains (completely describes) all descriptors that it can contain syntactically, and, when the descriptor is described recursively, the temporal or spatial hierarchy (maximum depth) that the descriptor has. For example, in the metadata description of Figure 15, the "video segment" descriptor is described recursively and has a structure of at most four levels, so the maximum hierarchical depth of the "video segment" descriptor is 4.
Further, for a descriptor contained in a recursively described descriptor, the appearance position (hierarchy level) at which that descriptor appears is also a piece of hint information. For example, "importance" is a descriptor contained in "video segment"; when it is contained only in the video segments at level 3 or shallower, that is, not contained in the video segments at level 4, the position at which "importance" appears is at most level 3. Besides specifying the appearance position by hierarchy level in this way, when IDs are assigned to the "video segments" containing "importance" or to the "video segment" descriptors themselves, the appearance positions can also be described as a list of IDs. For a descriptor having a value, the type of the descriptor and the range of values the descriptor can take are also pieces of hint information. For example, when the importance is expressed by five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) from each of the viewpoints "TeamA" and "TeamB", the values the importance can take are the list {0.0, 0.25, 0.5, 0.75, 1.0} of floating-point type. The above description is repeated for each descriptor that is a component element of the metadata.
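The following sketch shows one possible way to hold the metadata file information and the metadata component element information as Python dataclasses, continuing the hypothetical example above. The field names, file names, and numbers are illustrative assumptions only; the actual format is the one defined in Figure 16.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class MetadataFileInfo:
    """Metadata file information: used to predict the resources needed to process the metadata."""
    location: str                 # where the metadata file is stored
    size_bytes: int
    file_format: str              # e.g. "XML" or "binary"
    syntax_file_location: str     # location of the DTD/schema file defining the syntax
    element_count: int            # number of elements appearing in the metadata

@dataclass
class DescriptorHint:
    """Metadata component element information for one descriptor."""
    name: str                           # e.g. "video_segment", "importance", "title"
    occurrences: int                    # how many times it appears in the metadata
    complete: bool                      # whether it contains all descriptors it can contain syntactically
    max_depth: Optional[int] = None     # maximum hierarchy depth when described recursively
    appearance_levels: Optional[List[int]] = None   # hierarchy levels where it appears
    appearance_ids: Optional[List[str]] = None      # or: IDs of the segments containing it
    value_type: Optional[str] = None                # e.g. "float"
    value_range: Optional[Tuple[float, ...]] = None # e.g. (0.0, 0.25, 0.5, 0.75, 1.0)

@dataclass
class MetadataOptimizationHint:
    file_info: MetadataFileInfo
    descriptors: List[DescriptorHint] = field(default_factory=list)

# Hypothetical hint information for the Figure 15-style metadata sketched earlier.
hint = MetadataOptimizationHint(
    file_info=MetadataFileInfo("soccer.xml", 120_000, "XML", "mpeg7.xsd", 250),
    descriptors=[
        DescriptorHint("video_segment", occurrences=40, complete=True, max_depth=4),
        DescriptorHint("importance", occurrences=20, complete=True,
                       appearance_levels=[2, 3], value_type="float",
                       value_range=(0.0, 0.25, 0.5, 0.75, 1.0)),
        DescriptorHint("title", occurrences=12, complete=True,
                       appearance_ids=["seg-2", "seg-3", "seg-7"]),
        DescriptorHint("color_histogram", occurrences=8, complete=True, appearance_levels=[4]),
        DescriptorHint("motion_complexity", occurrences=8, complete=True, appearance_levels=[4]),
    ],
)
```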
Figure 17 shows an example of metadata optimization hint information described in the format of Figure 16. The example of the metadata optimization hint information 60 shown in Figure 17 contains metadata file information and metadata component element information for "video segment" and "title".
Next, a method of regenerating metadata by using the metadata optimization hint information 60 is described with reference to Figure 13.
The hint information analysis unit 61 parses the metadata optimization hint information 60 described in the prescribed format. The metadata parsing/regeneration unit 63 uses the analyzed metadata optimization hint information 62 output from the hint information analysis unit 61 to parse the metadata 49, and outputs metadata 64 regenerated according to a condition 65 for metadata regeneration.
Figure 18 shows an example of the metadata parsing method performed by the metadata parsing/regeneration unit 63 using the analyzed metadata optimization hint information 62. In this example, only the video segments having an importance of 0.5 or more are extracted from the original metadata 49, and metadata consisting only of the descriptions concerning the extracted video segments is regenerated.
First, the metadata parsing/regeneration unit 63 identifies the descriptors needed for regeneration according to the condition 65 for metadata regeneration (step S1). Here, since only the video segments having an importance of 0.5 or more are to be extracted, "importance" and "video segment" are the descriptors needed for regeneration.
Next, it is judged from the analyzed metadata optimization hint information 62 whether the descriptor identified in step S1 (in the following, the "importance" descriptor is taken as an example) is contained in the metadata 49 (step S2).
When the "importance" descriptor is contained in the metadata, the metadata is parsed (step S3); when it is not contained, the parsing of the metadata is terminated (step S4).
Further, when the analyzed metadata optimization hint information 62 specifies that the appearance positions of "importance" extend only up to level 3 of the hierarchy, the parsing of level 4 and deeper levels is not performed once the parsing of the video segments up to level 3 has been completed (step S5), and the parsing is terminated (step S6).
To parse other metadata 49 as needed, the processing from step S1 onward is then repeated. Likewise, when the metadata optimization hint information 62 specifies that the number of occurrences of the "importance" descriptor is 20, the parsing of the metadata is terminated (step S6) at the point when 20 "importance" descriptors have been parsed (step S5). After the parsing of the metadata is terminated in step S4 or step S6, the processing from step S1 onward is repeated to parse other metadata as needed.
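A minimal Python sketch of this Figure 18 flow is given below, reusing the hypothetical `root_segment` and `MetadataOptimizationHint` structures introduced earlier. The function name, the condition arguments, and the dictionary-based traversal are assumptions made for illustration; the actual unit operates on MPEG-7 documents rather than Python dictionaries.

```python
def regenerate_by_importance(metadata, hints, viewpoint="TeamA", threshold=0.5):
    """Steps S1-S6: keep only segments whose importance from the given viewpoint
    is at or above the threshold, skipping levels the hint rules out."""
    # S1: the descriptors needed for regeneration are "importance" and "video_segment".
    imp_hint = next((h for h in hints.descriptors if h.name == "importance"), None)
    # S2/S4: if the hint says no "importance" descriptor exists, stop without parsing.
    if imp_hint is None:
        return None
    # S5/S6: the hint tells us the deepest level at which "importance" can appear.
    max_level = max(imp_hint.appearance_levels) if imp_hint.appearance_levels else float("inf")
    kept = []

    def walk(segment):
        if segment["level"] > max_level:            # never descend past that level
            return
        imp = segment.get("importance", {}).get(viewpoint)
        if imp is not None and imp >= threshold:    # S3: extract matching segments
            kept.append({k: v for k, v in segment.items() if k != "children"})
        for child in segment.get("children", []):
            walk(child)
        # (The occurrence count in the hint could likewise be used to stop early.)

    walk(metadata)
    return kept   # descriptions from which the regenerated metadata 64 is composed

# e.g. highlights = regenerate_by_importance(root_segment, hint)
```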
Figure 19 shows another example of the metadata parsing method using the analyzed metadata optimization hint information 62. In this example, only the video segments containing the "title" descriptor are extracted and the metadata is regenerated. Whether the "title" descriptor is contained in the metadata is judged in the same way as in the example of Figure 18.
When the "title" descriptor is contained in the metadata, the metadata parsing/regeneration unit 63 judges whether a video segment matches one of the IDs of the appearance positions described in the metadata optimization hint information (step S13).
When a video segment does not match any of the IDs, it does not contain the "title" descriptor, so the parsing of the description of that video segment is skipped (step S16).
When a video segment matches an ID, the description of that video segment is parsed in order to obtain the "title" descriptor (step S15).
Then, when the parsing of all the video segments matching the IDs of the appearance positions has been completed (step S17), no further video segments containing the "title" descriptor exist in the metadata, so the parsing is terminated (step S18).
To parse other metadata as needed, the processing from step S11 onward is repeated. The metadata 64 regenerated from the descriptors extracted by the above parsing is then output.
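The sketch below mirrors this Figure 19 flow under the same assumptions as before: segments are assumed to carry a hypothetical `id` field, and the hint's `appearance_ids` list tells the parser which segments are worth descending into.

```python
def extract_titles(metadata, hints):
    """Steps S11-S18: parse only the segments whose IDs appear in the hint's
    appearance-position list; other segment descriptions are skipped."""
    title_hint = next((h for h in hints.descriptors if h.name == "title"), None)
    if title_hint is None:                      # "title" is not present at all
        return []
    wanted = set(title_hint.appearance_ids or [])
    remaining = set(wanted)
    titles = []

    def walk(segment):
        if not remaining:                       # S17/S18: all listed segments parsed
            return
        if segment.get("id") in wanted:         # S13/S15: parse matching segments only
            titles.append((segment["id"], segment.get("title")))
            remaining.discard(segment["id"])
        # S16: a non-matching segment's own description is skipped; in this sketch
        # its children are still visited, since they may carry listed IDs.
        for child in segment.get("children", []):
            walk(child)

    walk(metadata)
    return titles
```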
The metadata distribution unit 45 distributes the regenerated metadata 64 to the various client terminals.
Although not illustrated, the location of the metadata file, the metadata file size, the number of elements appearing in the metadata, the information about the metadata component elements, and so on change after the metadata is regenerated, so metadata optimization hint information corresponding to the regenerated metadata may also be regenerated.
As described above, all the descriptors contained in metadata conventionally had to be parsed in order to regenerate the metadata. In the present embodiment 5, however, the descriptors of the metadata 49 are parsed by using the metadata optimization hint information 60, which describes the list of descriptors contained in the metadata 49 together with their appearance positions, numbers of occurrences, and so on. The parsing of metadata 49 that is itself unnecessary for regenerating the metadata can therefore be omitted, and since the parsing of descriptors that do not match the regeneration condition can be omitted according to the appearance positions and numbers of occurrences, the processing cost (processing amount, memory usage, and the like) involved in parsing and regenerating the metadata can be reduced.
Embodiment 6
The above embodiment 5 described a metadata distribution server that uses the metadata optimization hint information 60 for metadata regeneration to reduce the processing cost of parsing and regenerating metadata. The present embodiment 6 describes a metadata retrieval server (metadata retrieval device) that uses the metadata optimization hint information to reduce the processing involved in retrieving metadata.
The metadata retrieval server according to embodiment 6 of the present invention is described with reference to the drawings. Figure 20 is a block diagram showing the configuration of the metadata retrieval server according to embodiment 6 of the present invention.
In Figure 20, the metadata retrieval server 600 has a hint information analysis unit 61, a metadata analysis unit 71, and a retrieval unit 73.
Since the hint information analysis unit 61 is the same as in the above embodiment 5, its description is omitted. The metadata analysis unit 71 uses the analyzed metadata optimization hint information 62 and a search condition 70 to parse, efficiently and at a small processing cost, the metadata 49 describing the structure and features of the "content". The retrieval unit 73 uses the analysis result 72 of the metadata to retrieve the "content" that meets the search condition.
Next, the operation of the metadata retrieval server according to the present embodiment 6 is described with reference to the drawings.
Figure 21 is a flowchart showing the operation of the metadata analysis unit of the metadata retrieval server according to the present embodiment 6.
The metadata analysis unit 71 parses one or more pieces of metadata by using the metadata optimization hint information 62 corresponding to each piece of metadata. Here, parsing the metadata means extracting from the metadata the feature descriptions needed for the retrieval. For example, when the color feature amount of a video segment is given as the search condition and video segments having features close to it are to be retrieved, the video segments having feature descriptions related to color need to be extracted. In the metadata example of Figure 15, color feature descriptions ("color histogram") are attached to the video segments at level 4, so the descriptions related to the video segments at level 4 are extracted.
The metadata analysis unit 71 parses the search condition 70 and identifies the descriptors that are effective for the retrieval (step S21). The search condition may be given as a feature amount according to a description defined in MPEG-7, or as an image, a keyword, or the like. When the search condition is given as a feature amount according to an MPEG-7 description (for example, color layout information), that descriptor (color layout information) is effective for the retrieval. When the search condition is given as a keyword, descriptors in text form (title, summary, annotation, and so on) are effective for the retrieval.
Next, with reference to the metadata optimization hint information 62, whether the selected descriptor is contained in the metadata 49 is judged (step S22). When the descriptor used for the retrieval is not contained in the metadata 49, the parsing of that metadata 49 is terminated (step S24) and other metadata 49 is parsed as needed.
When the selected descriptor is contained in the metadata 49, that metadata is parsed (step S23). The metadata parsing method is the same as in the above embodiment 5: the metadata parsing processing shown in Figures 18 and 19 is performed efficiently by using the metadata optimization hint information 62 (steps S25 to S26). Through the above processing, the metadata analysis unit 71 extracts the feature descriptions needed for the retrieval.
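A sketch of the step S21–S26 flow on the retrieval side is given below, under the same assumptions as the earlier sketches. The mapping from search-condition kinds to effective descriptors and the dictionary-based search condition are illustrative only.

```python
def effective_descriptors(search_condition):
    """Step S21: pick the descriptors that are effective for the given search condition."""
    if search_condition["kind"] == "feature":        # e.g. an MPEG-7 style feature amount
        return [search_condition["descriptor"]]      # e.g. "color_histogram"
    if search_condition["kind"] == "keyword":        # text search
        return ["title", "summary", "annotation"]
    return []

def parse_for_search(metadata, hints, search_condition):
    wanted = effective_descriptors(search_condition)
    present = {h.name for h in hints.descriptors}
    # S22/S24: skip this metadata entirely if no effective descriptor is present.
    if not any(name in present for name in wanted):
        return []
    # S23/S25/S26: parse, collecting only segments that carry an effective descriptor.
    found = []

    def walk(segment):
        if any(name in segment for name in wanted):
            found.append(segment)
        for child in segment.get("children", []):
            walk(child)

    walk(metadata)
    return found   # feature descriptions handed to the retrieval unit

# e.g. parse_for_search(root_segment, hint, {"kind": "feature", "descriptor": "color_histogram"})
```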
The retrieval unit 73 uses the analysis result of the metadata (the feature descriptions needed for the retrieval) output from the metadata analysis unit 71 to retrieve the "content" that meets the search condition. In the above example, the metadata analysis unit 71 outputs the descriptions related to the video segments having color feature descriptions ("color histogram"), so the retrieval unit 73 judges their suitability against the color feature amount (histogram) given as the search condition and outputs the information (for example, "time information") of the suitable video segments as the retrieval result 74.
As described above, in the present embodiment 6, the metadata 49 is parsed by using the metadata optimization hint information 60, so the parsing of metadata 49 that itself contains nothing needed can be omitted; furthermore, since the parsing of descriptors not needed for the retrieval can be omitted according to the appearance positions and numbers of occurrences, the processing cost (processing amount, memory usage, and the like) involved in retrieving the metadata can be reduced.
Embodiment 7
The above embodiments 5 and 6 described the server side that uses the metadata optimization hint information; the present embodiment 7 describes a client terminal (metadata regeneration condition setting device) that uses the metadata optimization hint information.
The client terminal according to embodiment 7 of the present invention is described with reference to the drawings. Figure 22 is a block diagram showing the configuration of the client terminal according to embodiment 7 of the present invention.
In Figure 22, the client terminal 48A has a hint information analysis unit 80 and a metadata regeneration condition setting unit 82.
Figure 22 shows only the part of the functions of the client terminal 48A that sets the condition for metadata regeneration by using the metadata optimization hint information 60.
Next, the operation of the client terminal according to the present embodiment 7 is described with reference to the drawings.
The hint information analysis unit 80 parses the metadata optimization hint information 60 described in the prescribed format. Since this hint information analysis unit 80 is the same as in the above embodiment 5, its detailed description is omitted.
The metadata regeneration condition setting unit 82 sets a condition 83 for metadata regeneration according to the analysis result 81 output from the hint information analysis unit 80. Setting a condition means, for example, selecting the descriptors unneeded by the client terminal 48A from among the kinds of descriptors contained in the metadata optimization hint information 60. When the client terminal 48A has no retrieval function using feature amounts, descriptors of feature amounts such as the color histogram and the complexity of motion are unneeded.
As another example of condition setting, since the metadata becomes more complex as the hierarchical structure describing the scene relations of the "content" becomes deeper, the depth of the hierarchy that the client terminal can handle is set according to the maximum depth of the hierarchical structure described in the metadata optimization hint information 60. In yet another example, the threshold of importance used for selecting scenes from the viewpoint the user pays attention to is set according to the information, described in the metadata optimization hint information 60, on the values the importance can take.
As described above, when the importance takes five discrete values ({0.0, 0.25, 0.5, 0.75, 1.0}) from each of the viewpoints "TeamA" and "TeamB", a setting can be made so that, for example, only the scenes having an importance of 0.5 or more from the viewpoint of "TeamA" are selected.
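The sketch below shows how a client might derive such a regeneration condition 83 from the parsed hint information, continuing the same hypothetical structures. The condition format (a plain dictionary) and the argument names are assumptions; the patent does not prescribe them.

```python
def set_regeneration_condition(hints, client_max_depth=2, has_feature_search=False,
                               viewpoint="TeamA", importance_threshold=0.5):
    """Build a metadata-regeneration condition from the hint information."""
    names = {h.name for h in hints.descriptors}
    condition = {}
    # Drop feature-amount descriptors the terminal cannot use.
    if not has_feature_search:
        condition["exclude"] = sorted(names & {"color_histogram", "motion_complexity"})
    # Cap the hierarchy depth at what the terminal can handle.
    seg = next((h for h in hints.descriptors if h.name == "video_segment"), None)
    if seg and seg.max_depth:
        condition["max_depth"] = min(seg.max_depth, client_max_depth)
    # Choose an importance threshold from the values the hint says importance can take.
    imp = next((h for h in hints.descriptors if h.name == "importance"), None)
    if imp and imp.value_range:
        allowed = [v for v in imp.value_range if v >= importance_threshold]
        if allowed:
            condition["importance"] = {"viewpoint": viewpoint, "min": min(allowed)}
    return condition   # sent to the metadata distribution server as condition 83
```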
The condition 83 for metadata regeneration set by the metadata regeneration condition setting unit 82 is sent to the metadata distribution server. On the metadata distribution server side, the metadata is restructured according to the condition for metadata regeneration and the terminal performance of the client terminal. For example, when the maximum depth of the hierarchical structure of the original metadata is 4 and the depth of the hierarchy the client terminal can handle is set to 2 in the metadata regeneration condition, the metadata is restructured so that the maximum depth of its hierarchical structure becomes 2.
When the metadata regeneration condition selects and sets only the scenes having an importance of 0.5 or more from the viewpoint of "TeamA", metadata consisting only of the scenes matching that condition is regenerated. As in the above embodiment 5, the metadata can be regenerated efficiently by using the metadata optimization hint information.
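For the depth-capping case just described, a minimal server-side sketch over the same hypothetical nested-dictionary model might look as follows; segments deeper than the level the client can handle are simply folded away.

```python
def cap_depth(segment, max_depth):
    """Restructure the segment hierarchy so its maximum depth becomes max_depth."""
    pruned = {k: v for k, v in segment.items() if k != "children"}
    if segment["level"] < max_depth:
        pruned["children"] = [cap_depth(c, max_depth) for c in segment.get("children", [])]
    return pruned   # children at levels deeper than max_depth are dropped

# e.g. a terminal that can handle only two levels:
shallow_metadata = cap_depth(root_segment, max_depth=2)
```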
As described above, in the present embodiment 7, the condition for metadata regeneration can be set by using the metadata optimization hint information 60, so metadata suited to the client terminal and the application can be generated.
Embodiment 8
The above embodiments 5 and 6 described servers that regenerate metadata by using the metadata optimization hint information and distribute the regenerated metadata. The present embodiment 8 describes a "content" distribution server ("content" distribution device) that parses metadata by using the metadata optimization hint information and uses the analysis result to regenerate and distribute "content" suited to the client terminal and the user's preferences.
The "content" distribution server according to embodiment 8 of the present invention is described with reference to the drawings. Figure 23 is a block diagram showing the configuration of the "content" distribution server according to embodiment 8 of the present invention.
In Figure 23, the "content" distribution server 500A has a hint information analysis unit 61, a metadata analysis unit 86, and a "content" reconstruction/distribution unit 88.
Next, the operation of the "content" distribution server according to the present embodiment 8 is described with reference to the drawings.
Since the operation of the hint information analysis unit 61 is the same as in the above embodiment 5, its description is omitted.
The metadata analysis unit 86 uses the analyzed metadata optimization hint information 62 output from the hint information analysis unit 61 to parse the metadata 49 and extract the descriptions that match a condition 85 concerning the client terminal, the user's "content" preferences, and the like. The way the hint information is used is the same as in the above embodiment 5; the difference from embodiment 5 is that the extracted descriptions are used not to regenerate metadata but to reconstruct the "content". The descriptions extracted by this metadata analysis unit 86, that is, the analyzed metadata 87, are output to the "content" reconstruction/distribution unit 88.
" content " reconstructs/Dispatching Unit 88, according to the description of being extracted by metadata resolution unit 86, reconstructs " content " 89.Here, we are described with the example stated in above-described embodiment 5.In the example of embodiment 5, only extract the video-frequency band that there is importance degree and be more than or equal to the feature of 0.5 from metadata 49, the metadata that regeneration is only made up of the description relevant with the video-frequency band extracted.
In the present embodiment 8, only the video segments having an importance of 0.5 or more are extracted from the metadata 49, and "content" 90 consisting only of the scenes corresponding to the extracted video segments is regenerated and distributed. Since the descriptions related to the extracted video segments describe the location of the corresponding "content" and the positions (time information) of those video segments within the "content", the corresponding scenes can be cut out of the "content" to reconstruct the "content" 90 and the reconstructed "content" 90 can then be distributed; alternatively, the corresponding scenes may be cut out of the "content" and distributed sequentially as they are cut out.
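A sketch of this reconstruction step is given below: the time information carried by the extracted segment descriptions is used to cut the corresponding scenes out of the source "content". The `cut_scene` callable, the file name, and the earlier helper names are placeholders for whatever media-editing operation and data the server actually uses.

```python
def reconstruct_content(extracted_segments, source_uri, cut_scene):
    """Cut out the scenes named by the extracted descriptions and return them in order.
    `cut_scene(source_uri, start, duration)` stands in for the real editing step."""
    scenes = []
    for seg in sorted(extracted_segments, key=lambda s: s["start"]):
        scenes.append(cut_scene(source_uri, seg["start"], seg["duration"]))
    return scenes   # concatenate into one reconstructed "content", or distribute sequentially

# Usage with the earlier importance-based extraction (hypothetical names throughout):
# highlights = regenerate_by_importance(root_segment, hint)
# clips = reconstruct_content(highlights, "soccer.mp4", cut_scene=my_cutter)
```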
As described above, in the "content" distribution server 500A according to the present embodiment 8, the metadata 49 is parsed by using the metadata optimization hint information 60, which describes the list of descriptors contained in the metadata 49 together with their appearance positions, numbers of occurrences, and so on. The parsing of metadata 49 that itself contains nothing needed can therefore be omitted, and since the parsing of descriptors that do not match the regeneration condition can be omitted according to the appearance positions and numbers of occurrences, the processing cost (processing amount, memory usage, and the like) involved in parsing the metadata and reconstructing the "content" when regenerating and distributing "content" suited to the client terminal and the user's preferences can be reduced.
As described above, according to the present invention, multimedia "content" containing moving images and audio is divided into plural scenes, the divided scenes are edited, and scene structure information metadata describing the hierarchical structure of the multimedia "content" is generated, so that metadata describing the hierarchical structure possessed by multimedia "content" containing video data and the like can be generated.

Claims (7)

1. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein the name of each descriptor contained in the metadata is described in said information.
2. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein the range of values that each descriptor contained in the metadata can take is described in said information.
3. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein the number of occurrences and the appearance position of each descriptor contained in the metadata are described in said information.
4. The metadata regeneration method according to claim 3, characterized in that the appearance position of each descriptor contained in the metadata is described as a list of IDs.
5. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein, when a descriptor contained in the metadata is described recursively, the maximum depth of the hierarchical structure is described in said information.
6. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein the location of the syntax file defining the syntax of the metadata is described in said information.
7. A metadata regeneration method, characterized by comprising:
a step of parsing information for regenerating metadata that contains descriptors describing the structure and features of "content";
a step of determining the descriptors needed for the regeneration according to a condition for metadata regeneration;
a step of judging, by using the parsed information, whether the determined descriptors needed for the regeneration are contained in the metadata;
a step of parsing the metadata, when it is judged that the determined descriptors needed for the regeneration are contained in the metadata, by extracting the determined descriptors needed for the regeneration from the metadata; and
a step of regenerating and outputting metadata composed of the descriptors extracted by the parsing of the metadata,
wherein information indicating whether each descriptor contains all of the descriptors that it can contain syntactically is described in said information.
CN200710162216.8A 2002-04-12 2003-03-20 Hint information description method Expired - Fee Related CN101127899B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2002110259 2002-04-12
JP2002110259 2002-04-12
JP2002-110259 2002-04-12
JP2002178169 2002-06-19
JP2002-178169 2002-06-19
JP2002178169 2002-06-19

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CNB038082608A Division CN100367794C (en) 2002-04-12 2003-03-20 Meta data edition device, meta data reproduction device, meta data distribution device, meta data search device, meta data reproduction condition setting device, and meta data distribution method

Publications (2)

Publication Number Publication Date
CN101127899A CN101127899A (en) 2008-02-20
CN101127899B true CN101127899B (en) 2015-04-01

Family

ID=39095796

Family Applications (2)

Application Number Title Priority Date Filing Date
CN2007101622172A Expired - Fee Related CN101132528B (en) 2002-04-12 2003-03-20 Metadata reproduction apparatus, metadata delivery apparatus, metadata search apparatus, metadata re-generation condition setting apparatus
CN200710162216.8A Expired - Fee Related CN101127899B (en) 2002-04-12 2003-03-20 Hint information description method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN2007101622172A Expired - Fee Related CN101132528B (en) 2002-04-12 2003-03-20 Metadata reproduction apparatus, metadata delivery apparatus, metadata search apparatus, metadata re-generation condition setting apparatus

Country Status (1)

Country Link
CN (2) CN101132528B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010021813A (en) * 2008-07-11 2010-01-28 Hitachi Ltd Information recording and reproducing device and method of recording and reproducing information
JP5322550B2 (en) * 2008-09-18 2013-10-23 三菱電機株式会社 Program recommendation device
KR20110047768A (en) * 2009-10-30 2011-05-09 삼성전자주식회사 Apparatus and method for displaying multimedia contents
KR102159896B1 (en) 2010-04-13 2020-09-25 지이 비디오 컴프레션, 엘엘씨 Inheritance in sample array multitree subdivision
CN106358045B (en) 2010-04-13 2019-07-19 Ge视频压缩有限责任公司 Decoder, coding/decoding method, encoder and coding method
TWI815295B (en) 2010-04-13 2023-09-11 美商Ge影像壓縮有限公司 Sample region merging
BR122020007923B1 (en) 2010-04-13 2021-08-03 Ge Video Compression, Llc INTERPLANE PREDICTION
JP2012093991A (en) * 2010-10-27 2012-05-17 Buffalo Inc Tag information management device, tag information management system, tag information management program, tag information management method
KR20140097516A (en) * 2011-12-28 2014-08-06 인텔 코포레이션 Real-time natural language processing of datastreams
KR20130134546A (en) * 2012-05-31 2013-12-10 삼성전자주식회사 Method for create thumbnail images of videos and an electronic device thereof
CN102833492B (en) * 2012-08-01 2016-12-21 天津大学 A kind of video scene dividing method based on color similarity
CN108429803B (en) * 2018-03-08 2021-10-26 南京坚卓软件科技有限公司 User design data communication device of electronic commerce website and communication method thereof
CN108829881B (en) * 2018-06-27 2021-12-03 深圳市腾讯网络信息技术有限公司 Video title generation method and device
CN110248250A (en) * 2018-09-27 2019-09-17 浙江大华技术股份有限公司 A kind of method and device of video playback

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003067397A (en) * 2001-06-11 2003-03-07 Matsushita Electric Ind Co Ltd Content control system
CN101398843A (en) * 1999-10-11 2009-04-01 韩国电子通信研究院 Video summary description scheme and method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398843A (en) * 1999-10-11 2009-04-01 韩国电子通信研究院 Video summary description scheme and method
JP2003067397A (en) * 2001-06-11 2003-03-07 Matsushita Electric Ind Co Ltd Content control system

Also Published As

Publication number Publication date
CN101127899A (en) 2008-02-20
CN101132528B (en) 2011-08-03
CN101132528A (en) 2008-02-27

Similar Documents

Publication Publication Date Title
KR100997599B1 (en) Method for processing contents
Martínez et al. MPEG-7: the generic multimedia content description standard, part 1
CN101127899B (en) Hint information description method
JP4732418B2 (en) Metadata processing method
WO2002086760A1 (en) Meta data creation apparatus and meta data creation method
JP4652389B2 (en) Metadata processing method
KR100678895B1 (en) Apparatus and method for creating model-based segment metadata
Kim et al. MPEG-7-based metadata generator and its browser

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150401

Termination date: 20200320

CF01 Termination of patent right due to non-payment of annual fee