US6744922B1 - Signal processing method and video/voice processing device - Google Patents


Info

Publication number
US6744922B1
Authority
US
United States
Prior art keywords
chain
similarity
segments
chains
video signal
Legal status
Expired - Fee Related
Application number
US09/647,303
Inventor
Toby Walker
Current Assignee
Sony Corp
Original Assignee
Sony Corp
Application filed by Sony Corp
Assigned to SONY CORPORATION: assignment of assignors interest; assignor: WALKER, TOBY
Application granted
Publication of US6744922B1
Anticipated expiration
Expired - Fee Related (current legal status)

Classifications

    • G06F16/71 Information retrieval of video data; indexing; data structures therefor; storage structures
    • G06F16/7834 Retrieval of video data using metadata automatically derived from the content, using audio features
    • G06F16/785 Retrieval of video data using metadata automatically derived from the content, using low-level visual features of the video content, using colour or luminescence
    • G06F16/7857 Retrieval of video data using metadata automatically derived from the content, using low-level visual features of the video content, using texture
    • G06T7/00 Image analysis
    • G06V20/40 Scenes; scene-specific elements in video content
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • G06F2218/16 Pattern recognition specially adapted for signal processing; classification; matching by matching signal segments
    • G11B27/28 Indexing; addressing; timing or synchronising by using information signals recorded by the same method as the main recording

Definitions

  • the present invention relates to a signal processing method for detecting and analyzing a pattern reflecting the semantics underlying a signal, and to a video signal processor for detecting and analyzing a visual and/or audio pattern reflecting the semantics underlying a video signal.
  • a story board is a panel formed from a sequence of images that define the main scenes in a video application.
  • a story board is prepared by decomposing video data into so-called shots and displaying a representative image of each shot.
  • most image extraction techniques automatically detect and extract shots from video data, as disclosed, for example, in “G. Ahanger and T. D. C. Little: A Survey of Technologies for Parsing and Indexing Digital Video, Journal of Visual Communication and Image Representation 7: 28-4, 1996”.
  • the conventional image extraction techniques can detect only so-called local video structures, or general video structures that rely on special knowledge.
  • an object of the present invention is to overcome the above-mentioned drawbacks of the prior art by providing a signal processing method and a video signal processor that can extract high-level video structures from a wide variety of video data.
  • the above object can be attained by providing a signal processing method for detecting and analyzing a pattern reflecting the semantics of the content of a signal, the method including, according to the present invention, the steps of: extracting, from a segment consisting of a sequence of consecutive frames that together form the signal, at least one feature that characterizes the properties of the segment; calculating, using the extracted feature, a criterion for measuring the similarity between a pair of segments for every extracted feature, and measuring the similarity between a pair of segments according to that similarity measurement criterion; and detecting, using the feature and the similarity measurement criterion, a similarity chain consisting of two or more mutually similar segments.
  • the above object can also be attained by providing a video signal processor for detecting and analyzing a visual and/or audio pattern reflecting the semantics of the content of a supplied video signal, the processor including, according to the present invention: means for extracting, from a visual and/or audio segment consisting of a sequence of consecutive visual and/or audio frames that together form the video signal, at least one feature that characterizes the properties of the visual and/or audio segment; means for calculating, using the extracted feature, a criterion for measuring the similarity between a pair of visual segments and/or audio segments for every extracted feature, and for measuring the similarity between a pair of visual segments and/or audio segments according to that similarity measurement criterion; and means for detecting, using the feature and the similarity measurement criterion, a similarity chain consisting of two or more mutually similar visual and/or audio segments.
  • FIG. 1 explains the structure of a video data to which the present invention is applicable, using a video data model.
  • FIG. 2 explains a similarity chain used to extract a local video structure.
  • FIG. 3 explains a similarity chain used to extract a global video structure.
  • FIG. 4 is a block diagram of an embodiment of the video signal processor according to the present invention.
  • FIG. 5 is a flow chart of the series of operations performed in the video signal processor to detect and analyze a video structure.
  • FIG. 6 explains the sampling of dynamic features in the video signal processor.
  • FIG. 7 explains a basic similarity chain.
  • FIG. 8 explains a linked similarity chain.
  • FIG. 9 explains a cyclic chain.
  • FIG. 10 explains a series of operations effected in detecting a basic similarity chain using the batch clustering technique in the video signal processor.
  • FIG. 11 explains the dissimilarity threshold.
  • FIG. 12 is a flow chart of a series of operations effected in filtering the basic similarity chain in the video signal processor.
  • FIG. 13 is a flow chart of a series of operations effected in detecting a basic similarity chain using the consecutive clustering technique in the video signal processor.
  • FIG. 14 is a flow chart of a series of operations effected in detecting a linked similarity chain in the video signal processor.
  • FIG. 15 is a flow chart of a series of operations effected in detecting a cyclic chain in the video signal processor.
  • FIG. 16 is a flow chart of a series of operations effected in detecting a scene using the similarity chain in the video signal processor.
  • FIG. 17 is a flow chart of a series of operations effected in detecting a news item using the similarity chain in the video signal processor.
  • FIG. 18 is a flow chart of a series of operations effected in detecting a play in a sports program using the similarity chain in the video signal processor.
  • FIG. 19 is a flow chart of the series of operations performed in the video signal processor for topic detection, in which cyclic chain detection and scene detection are combined using the similarity chain.
  • the embodiment of the present invention is a video signal processor that automatically detects and extracts desired content from recorded video data. More particularly, the video signal processor detects and analyzes structure patterns of images and/or sound that reflect the semantics underlying the video data. For this analysis, the concept of a “similarity chain” (hereinafter also simply called a “chain”) is introduced.
  • FIG. 1 shows a video data model, to which the present invention is applicable, with a three-level hierarchy: frames, segments and similarity chains.
  • the video data model includes a sequence of frames at the lowest level.
  • at the next level up, the video data model includes a sequence of consecutive segments.
  • at the highest level, the video data model includes similarity chains, each consisting of a sequence of segments that follow a certain similarity pattern.
  • the video data includes both visual and audio information. That is, the frames in the video data include visual frames, each a single still image, and audio frames, each representing audio information sampled over a short period, generally tens to hundreds of milliseconds.
  • a visual segment consists of a sequence of visual frames picked up consecutively by a single camera.
  • such a segment is called a “shot”.
  • the segments include visual segments and/or audio segments, and each segment is the fundamental unit of a video structure.
  • audio segments can be defined in many different ways, as described below by way of example.
  • audio segments may be bounded by periods of silence in the video data, detected by a well-known method.
  • each audio segment may be formed from a sequence of audio frames classified into categories such as speech, music, noise and silence, as disclosed in “D. Kimber and L.
  • audio segments may be determined based on audio cut points, i.e. points where a certain feature changes greatly between two successive audio frames, as disclosed, for example, in “S. Pfeiffer, S. Fischer and E. Wolfgang: Automatic Audio Content Analysis, Proceedings of ACM Multimedia 96, November 1996, pp. 21-30”.
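As a rough illustration of the cut-point approach, the sketch below declares an audio cut wherever a per-frame feature jumps by more than a threshold between successive audio frames. The choice of feature (here a hypothetical per-frame loudness value), the threshold and the names `audio_cut_points` and `cuts_to_segments` are assumptions for illustration, not the method of the cited paper.

```python
from typing import List, Sequence, Tuple


def audio_cut_points(features: Sequence[float], threshold: float) -> List[int]:
    """Return every index k where a cut falls between audio frames k-1 and k.

    A cut is declared when the per-frame feature (assumed here to be a
    scalar such as loudness) changes by more than `threshold`.
    """
    return [
        k
        for k in range(1, len(features))
        if abs(features[k] - features[k - 1]) > threshold
    ]


def cuts_to_segments(n_frames: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Convert cut indices into half-open [start, end) audio segments."""
    bounds = [0, *cuts, n_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Hypothetical loudness values for eight audio frames: quiet, loud, quiet.
levels = [0.10, 0.12, 0.11, 0.80, 0.82, 0.79, 0.05, 0.06]
cuts = audio_cut_points(levels, threshold=0.3)
print(cuts)                                 # [3, 6]
print(cuts_to_segments(len(levels), cuts))  # [(0, 3), (3, 6), (6, 8)]
```

A vector-valued feature would only need the absolute difference replaced by a distance between vectors.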
  • a similarity chain consists of a plurality of mutually similar segments ordered in time. Its structure pattern falls into one of several kinds, depending on the relations between the similar segments in the chain and on the constraints imposed on the pattern.
  • the index $i_j$ is the number of the segment in the original video data, and the subscript $j$ on $i$ indicates that the segment occupies the $j$-th position in time within the similarity chain.
  • since a similarity chain consists of temporally discrete segments, there may be time gaps between its elements.
  • that is, segments $S_{i_j}$ and $S_{i_{j+1}}$ are not necessarily contiguous in the original video data.
  • video data contains cues from which the viewer can perceive its outline.
  • the simplest and most important of these cues is the structure pattern of similar visual or audio segments.
  • this structure pattern is exactly the information to be acquired using similarity chains.
  • the most important basic similarity patterns for video data analysis are the basic similarity chain, the linked similarity chain, the local chain and the cyclic chain.
  • in a “basic similarity chain”, all segments are similar to each other; however, no constraint is placed on the structure pattern of the segments. A basic similarity chain can be obtained using a grouping or clustering algorithm to group the segments.
  • in a “linked similarity chain”, each pair of adjacent segments is similar.
  • in a “local chain”, the time interval between each pair of adjacent segments is shorter than a predetermined time.
  • in a “cyclic chain”, each segment is similar to the segment m positions after it; that is, the cyclic chain consists of a group of m segments repeated in a similar manner.
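The four chain types can be stated as simple predicates. This sketch assumes that two segments are "similar" when a dissimilarity (here an L1 distance between feature vectors) is at most a threshold, that the gap in a local chain is measured from the end of one segment to the start of the next, and that chains are lists in time order. `Segment` and the predicate names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    start: float                # start time (seconds)
    end: float                  # end time (seconds)
    feature: Tuple[float, ...]  # e.g. a color histogram


Dissimilarity = Callable[[Segment, Segment], float]


def l1(a: Segment, b: Segment) -> float:
    """L1 distance between feature vectors (one possible criterion)."""
    return sum(abs(x - y) for x, y in zip(a.feature, b.feature))


def is_basic(chain: Sequence[Segment], d: Dissimilarity, thr: float) -> bool:
    """Every pair of segments in the chain is similar."""
    return all(d(a, b) <= thr for a, b in combinations(chain, 2))


def is_linked(chain: Sequence[Segment], d: Dissimilarity, thr: float) -> bool:
    """Each pair of temporally adjacent segments is similar."""
    return all(d(a, b) <= thr for a, b in zip(chain, chain[1:]))


def is_local(chain: Sequence[Segment], max_gap: float) -> bool:
    """The gap between adjacent segments is shorter than max_gap."""
    return all(b.start - a.end < max_gap for a, b in zip(chain, chain[1:]))


def is_cyclic(chain: Sequence[Segment], m: int, d: Dissimilarity, thr: float) -> bool:
    """Each segment is similar to the segment m positions later."""
    return len(chain) > m and all(
        d(chain[k], chain[k + m]) <= thr for k in range(len(chain) - m)
    )


# A dialogue: shots of person A and person B alternate (A, B, A, B).
a1 = Segment(0, 5, (1.0, 0.0))
b1 = Segment(5, 9, (0.0, 1.0))
a2 = Segment(9, 14, (0.9, 0.1))
b2 = Segment(14, 18, (0.1, 0.9))
dialogue = [a1, b1, a2, b2]
print(is_cyclic(dialogue, 2, l1, 0.3))  # True
print(is_linked(dialogue, l1, 0.3))     # False
```

The dialogue example is cyclic with m = 2 but not linked, since A and B shots are never similar to each other.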
  • these similarity chains can be used to extract local video structures, such as scenes, and global video structures, such as news items, as described later.
  • a “scene” is a meaningful organization of segments, obtained by detecting visual segments (shots) or audio segments and grouping them on the basis of a feature, for example one indicating the amount of perceptual activity within each segment.
  • what constitutes a scene is subjective and depends on the content or genre of the video data. Here, however, a scene is assumed to be a group of repeated patterns of visual or audio segments whose features are similar to each other.
  • in the example of FIG. 2, the visual segments form two temporally overlapping chains, one for the segments of component A and one for the segments of component B.
  • temporally overlapping local chains can be used to detect a group of related visual segments, i.e. a scene.
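One simple way to turn overlapping local chains into scenes, assumed here for illustration, is to take each chain's time span (from its first segment's start to its last segment's end) and merge the spans that overlap. Segments are plain (start, end) pairs, and `chain_span` and `scenes_from_chains` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def chain_span(chain: Sequence[Span]) -> Span:
    """Time span of a time-ordered chain: first start to last end."""
    return (chain[0][0], chain[-1][1])


def scenes_from_chains(chains: Sequence[Sequence[Span]]) -> List[Span]:
    """Merge the spans of temporally overlapping chains into scenes."""
    scenes: List[Span] = []
    for start, end in sorted(chain_span(c) for c in chains):
        if scenes and start < scenes[-1][1]:
            # This chain overlaps the current scene: extend the scene.
            scenes[-1] = (scenes[-1][0], max(scenes[-1][1], end))
        else:
            scenes.append((start, end))
    return scenes


# Dialogue: shots of A and B alternate, giving two interleaved chains.
chain_a = [(0, 5), (9, 14)]
chain_b = [(5, 9), (14, 18)]
chain_c = [(30, 35), (40, 44)]  # an unrelated, later chain
print(scenes_from_chains([chain_a, chain_b, chain_c]))  # [(0, 18), (30, 44)]
```

The strict overlap test keeps chains that merely touch in separate scenes; that is a design choice of this sketch, not something the text specifies.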
  • referring to FIG. 3, an example of a similarity chain used to extract a global video structure is explained.
  • consider a news program with a fixed structure.
  • in such a program, segments of the newscaster introducing each news item appear, followed, for example, by segments of a correspondent reporting on site.
  • the repeatedly appearing visual segments of the newscaster together form a global chain. Since each newscaster segment corresponds to one news item, news items can be detected automatically using the global chain.
  • as shown in FIG. 3, the global chain can be used to detect each topic in video data consisting of a plurality of news items, i.e. topics A, B, C, D, and so on.
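Once the newscaster segments have been found as a global chain, a hypothetical `news_items` helper can cut the program at the start of each of those segments, so that each topic runs until the next newscaster segment begins. This assumes each item starts with its newscaster segment; the timestamps in the usage lines are made up.

```python
from typing import List, Sequence, Tuple


def news_items(anchor_starts: Sequence[float], program_end: float) -> List[Tuple[float, float]]:
    """Split a program into items, one per anchor (newscaster) segment.

    Each item runs from one anchor segment's start to the next anchor's
    start; the last item runs to the end of the program.
    """
    starts = sorted(anchor_starts)
    ends = starts[1:] + [program_end]
    return list(zip(starts, ends))


# Hypothetical start times (seconds) of the newscaster segments in the global chain.
items = news_items([0, 120, 300, 410], program_end=600)
for label, (start, end) in zip("ABCD", items):
    print(f"topic {label}: {start}-{end} s")
```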
  • the video signal processor is generally indicated by reference numeral 10.
  • the video signal processor 10 uses the features of segments in the video data to determine the similarity between segments and to detect the above-mentioned similarity chains automatically.
  • the video signal processor 10 is applicable to both visual and audio segments.
  • the video signal processor 10 can analyze the similarity chains to extract and reconstruct both scenes, which are local video structures, and high-level global video structures such as topics.
  • the video signal processor 10 includes:
    • a video segmentor 11, which segments an input video data stream into visual segments, audio segments, or both;
    • a video segment memory 12, which stores the segments of the video data;
    • a visual feature extractor 13, which extracts a feature for each visual segment;
    • an audio feature extractor 14, which extracts a feature for each audio segment;
    • a segment feature memory 15, which stores the features of the visual and audio segments;
    • a chain detector 16, which groups the visual and audio segments into chains;
    • a feature similarity measurement block 17, which determines the similarity between two segments; and
    • a chain analyzer 18, which detects and analyzes a variety of video structures.
  • the video segmentor 11 receives a video data stream consisting of visual and audio data in any of various digital formats, including compressed video formats such as Moving Picture Experts Group Phase 1 (MPEG1), Moving Picture Experts Group Phase 2 (MPEG2) and digital video (DV), and divides the video data into visual segments, audio segments, or both.
  • the video segment memory 12 stores the video data segments supplied by the video segmentor 11 and, when queried by the chain detector 16, supplies those segments to it.
  • the visual feature extractor 13 extracts a feature for each visual segment produced by the video segmentor 11.
  • the visual feature extractor 13 can process compressed video data without fully decompressing it. It supplies the extracted feature of each visual segment to the downstream segment feature memory 15.
  • the audio feature extractor 14 extracts a feature for each audio segment produced by the video segmentor 11.
  • the audio feature extractor 14 can process compressed audio data without fully decompressing it. It supplies the extracted feature of each audio segment to the downstream segment feature memory 15.
  • the segment feature memory 15 stores the visual and audio segment features supplied by the visual and audio feature extractors 13 and 14, respectively. When queried by the downstream feature similarity measurement block 17, it supplies the stored features and segments to that block.
  • the chain detector 16 groups the visual segments and the audio segments into chains.
  • starting from each segment in a group of segments, the chain detector 16 detects repeated patterns of similar segments and groups such segments into the same chain. After grouping the segments into candidate chains, it determines the final set of chains in a second, filtering step.
  • the chain detector 16 supplies the detected chains to the downstream chain analyzer 18.
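The two-stage process (candidate grouping, then filtering) can be sketched with a simple leader-style clustering. This is a stand-in chosen for brevity, not the batch or consecutive clustering of FIGS. 10 and 13; the threshold, the `min_len` filter and the name `detect_chains` are assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def detect_chains(
    segments: Sequence[T],
    dissim: Callable[[T, T], float],
    threshold: float,
    min_len: int = 2,
) -> List[List[int]]:
    """Group time-ordered segments into candidate chains, then filter them.

    Candidate step: each segment joins the first chain whose first member
    (its "leader") lies within `threshold`; otherwise it starts a new chain.
    Filtering step: chains with fewer than `min_len` segments are dropped.
    Returned chains hold segment indices in time order.
    """
    candidates: List[List[int]] = []
    for idx, seg in enumerate(segments):
        for chain in candidates:
            if dissim(segments[chain[0]], seg) <= threshold:
                chain.append(idx)
                break
        else:
            candidates.append([idx])
    return [chain for chain in candidates if len(chain) >= min_len]


# Scalar "features" stand in for real feature vectors here.
features = [0.0, 5.0, 0.2, 5.1, 9.0, 0.1]
print(detect_chains(features, lambda a, b: abs(a - b), threshold=0.5))
# [[0, 2, 5], [1, 3]]
```

Comparing only against each chain's leader keeps the pass linear in the number of chains per segment, at the cost of depending on which segment happens to come first.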
  • the feature similarity measurement block 17 determines the similarity between two segments, querying the segment feature memory 15 to retrieve the features of a given segment.
  • the chain analyzer 18 analyzes the chain structure detected by the chain detector 16 to detect a variety of local and global video structures.
  • the chain analyzer 18 can adjust the details of the video structures to suit a specific application, as described in detail later.
  • the video signal processor 10 detects video structures using similarity chains by performing the series of operations outlined in FIG. 5.
  • at step S1, the video signal processor 10 divides the video data supplied to the video segmentor 11 into visual segments, audio segments, or both, as described below.
  • the video segmentation method employed in the video signal processor 10 is not limited to any particular one.
  • for example, the video signal processor 10 segments video data by the method disclosed in the previously mentioned “G. Ahanger and T. D. C. Little: A Survey of Technologies for Parsing and Indexing Digital Video, Journal of Visual Communication and Image Representation 7: 28-4, 1996”. This video segmentation method is well known in this field.
  • the video signal processor 10 according to the present invention can employ any video segmentation method.
  • at step S2, the video signal processor 10 extracts features: by means of the visual feature extractor 13 and the audio feature extractor 14, it calculates features that characterize the properties of each segment. Applicable features include, for example, the duration of each segment, visual features such as color histograms and texture features, audio features such as frequency analysis results, level and pitch, and activity measurements. Of course, the video signal processor 10 according to the present invention is not limited to these features.
  • at step S3, the video signal processor 10 measures the similarity between segments using their features. More specifically, the feature similarity measurement block 17 measures the dissimilarity between segments and determines how similar two segments are according to its similarity measurement criterion, which is calculated from the features extracted at step S2.
  • at step S4, the video signal processor 10 detects chains of similar segments using the dissimilarity measurement criteria calculated at step S3 and the features extracted at step S2.
  • at step S5, the video signal processor 10 analyzes the chains, using those detected at step S4 to determine and output the local and/or global video structure of the video data.
  • in this way, the video signal processor 10 can detect chain structures in video data. Using the result, the user can index and summarize the content of the video data and quickly access points of interest in it.
  • the video segmentation at step S1 is discussed below.
  • the video signal processor 10 divides the video data supplied to the video segmentor 11 into visual segments, audio segments, or both. Many techniques are available for automatically detecting the boundaries between segments in video data and, as mentioned above, the video signal processor 10 according to the present invention is not limited to any particular video segmentation method. However, the accuracy of chain detection in the video signal processor 10 depends substantially on the accuracy of the video segmentation performed before it.
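Since the segmentation method is left open, the following shows only one common possibility, assumed here for illustration: a shot boundary is declared wherever the grayscale histograms of two successive frames differ (in L1 distance) by more than a threshold. Frames are assumed to be flat lists of 8-bit pixel values; `gray_histogram`, `shot_boundaries` and the threshold of 0.5 are hypothetical choices, not taken from the patent or the cited survey.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized histogram of 8-bit (0..255) grayscale pixel values."""
    hist = [0.0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return [h / len(frame) for h in hist]


def shot_boundaries(
    frames: Sequence[Sequence[int]], bins: int = 8, threshold: float = 0.5
) -> List[int]:
    """Indices k where a shot boundary falls between frames k-1 and k."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [
        k
        for k in range(1, len(hists))
        if sum(abs(a - b) for a, b in zip(hists[k - 1], hists[k])) > threshold
    ]


dark, bright = [10] * 16, [240] * 16  # tiny synthetic 16-pixel frames
print(shot_boundaries([dark, dark, bright, bright, dark]))  # [2, 4]
```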
  • the “features” are attributes of segments that characterize their properties and provide the data from which the similarity between different segments is determined.
  • the visual and audio feature extractors 13 and 14 calculate the features of each segment.
  • the video signal processor 10 is not limited to any particular features.
  • features considered to be effective in the video signal processor 10 include visual features, audio features and visual-audio features, as described below.
  • the requirement on these features is that a dissimilarity can be determined from them.
  • in some cases, the video signal processor 10 has to perform feature extraction and video segmentation simultaneously.
  • the features described below meet the above requirement.
  • the first kind of feature concerns the image and is hereinafter referred to as a “visual feature”.
  • a visual segment is composed of successive visual frames. Therefore, by extracting an appropriate visual frame from a visual segment, the depicted content of the segment can be represented by that frame; that is, the similarity between appropriately extracted visual frames can be used as the similarity between visual segments.
  • the visual feature is one of the most important features usable in the video signal processor 10.
  • a visual feature by itself represents only static information. Using a method described later, the video signal processor 10 can extract dynamic features of visual segments based on the visual features.
  • the colors of images are important cues for determining the similarity between two images.
  • the use of color histograms to determine the similarity between images is well known, as disclosed, for example, in “G. Ahanger and T. D. C. Little: A Survey of Technologies for Parsing and Indexing Digital Video, Journal of Visual Communication and Image Representation 7: 28-4, 1996”.
  • a color histogram is obtained by dividing a three-dimensional color space, such as HSV or RGB, into n regions and computing the relative frequency with which the pixels of an image fall into each region. The result is an n-dimensional vector.
  • a color histogram can also be extracted directly from compressed video data, as disclosed in U.S. Pat. No. 5,708,767.
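The histogram construction described above can be sketched as follows, assuming RGB input quantized into q levels per channel (so n = q³ regions) and an L1 distance as the dissimilarity between histograms. An HSV version would first convert each pixel, for example with Python's `colorsys.rgb_to_hsv`. The function names and the choice q = 4 are illustrative assumptions, and the pixel list is assumed to be non-empty.

```python
from typing import Iterable, List, Tuple


def color_histogram(pixels: Iterable[Tuple[int, int, int]], q: int = 4) -> List[float]:
    """n = q**3 bin RGB histogram of relative pixel frequencies.

    Each 8-bit channel is quantized into q levels, so the RGB cube is
    divided into q*q*q regions; entry i is the fraction of pixels in region i.
    """
    hist = [0.0] * (q ** 3)
    count = 0
    for r, g, b in pixels:
        ir, ig, ib = (c * q // 256 for c in (r, g, b))
        hist[(ir * q + ig) * q + ib] += 1
        count += 1
    return [h / count for h in hist]


def histogram_l1(h1: List[float], h2: List[float]) -> float:
    """L1 distance: 0 for identical histograms, at most 2."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


reddish = color_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
bluish = color_histogram([(0, 0, 255)] * 3 + [(255, 0, 0)])
print(histogram_l1(reddish, bluish))  # 1.0
```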
  • such a histogram represents the overall color tone of an image but contains no timing information. For this reason, the video signal processor 10 calculates a video correlation as another visual feature.
  • mutual overlapping of several similar segments is an important indication that the segments together form a chain structure. For example, in a dialogue scene the camera alternates between two persons, pointing at whichever of them is currently speaking. Usually, when the same person is shot again, the camera returns to nearly the same position from which he or she was previously shot.
  • in the video signal processor 10, the original images are thinned and reduced to grayscale images of M × N pixels (both M and N may be small; for example, M × N may be 8 × 8), and a video correlation is calculated from these reduced grayscale images. That is, each reduced grayscale image is interpreted as an MN-dimensional feature vector.
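A minimal sketch of the reduced-image feature, assuming block averaging as the reduction and Pearson correlation as the comparison between the resulting MN-dimensional vectors (neither choice is specified here). `reduce_gray` and `correlation` are hypothetical names, and the image height and width are assumed to be multiples of M and N.

```python
import math
from typing import List, Sequence


def reduce_gray(image: Sequence[Sequence[float]], m: int = 8, n: int = 8) -> List[float]:
    """Block-average an H x W grayscale image to m x n; return it flattened.

    Assumes H is a multiple of m and W a multiple of n. The result is the
    MN-dimensional feature vector described above.
    """
    bh, bw = len(image) // m, len(image[0]) // n
    vector = []
    for i in range(m):
        for j in range(n):
            block = [
                image[y][x]
                for y in range(i * bh, (i + 1) * bh)
                for x in range(j * bw, (j + 1) * bw)
            ]
            vector.append(sum(block) / len(block))
    return vector


def correlation(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if one is flat)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    denom = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / denom if denom else 0.0


# A 16x16 horizontal gradient, a brighter copy, and a mirrored copy.
shot = [[float(x) for x in range(16)] for _ in range(16)]
brighter = [[2.0 * p + 10.0 for p in row] for row in shot]
mirrored = [row[::-1] for row in shot]
print(round(correlation(reduce_gray(shot), reduce_gray(brighter)), 6))  # 1.0
print(round(correlation(reduce_gray(shot), reduce_gray(mirrored)), 6))  # -1.0
```

Because correlation ignores overall brightness and contrast, a camera returning to the same position in a dialogue scene should still score close to 1 even if exposure changes slightly.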
  • Features other than the visual features described above concern sound.
  • This feature will be referred to as “audio feature” hereinafter.
  • the audio feature can represent the content of an audio segment.
  • Frequency analysis, pitch, level, etc. may be used as audio features. These audio features are known from various documents.
  • The video signal processor 10 can perform a frequency analysis, for example by Fourier transform, to determine the distribution of frequency information within a single audio frame.
  • The video signal processor 10 can use FFT (Fast Fourier Transform) components, frequency histograms, power spectra and other features.
  • The video signal processor 10 may use pitch measures, such as mean pitch and maximum pitch, and sound levels, such as mean loudness and maximum loudness, as effective audio features for representing audio segments.
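As a rough illustration of the spectral and level features above, the sketch below computes a power spectrum with a naive DFT (a real system would use an FFT library) and uses the mean and peak absolute amplitude as simple stand-ins for mean and maximum loudness. Both choices, and the names `power_spectrum` and `level_features`, are assumptions made for illustration.

```python
import cmath
import math

def power_spectrum(frame):
    """Power of DFT bins 0..N//2 for a real-valued audio frame.
    Naive O(N^2) DFT, for illustration only."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(x_k) ** 2 / n)
    return spectrum

def level_features(frame):
    """Simple level proxies: (mean absolute amplitude, peak absolute amplitude)."""
    mags = [abs(s) for s in frame]
    return sum(mags) / len(mags), max(mags)
```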
  • The cepstrum features used in the video signal processor 10 include the cepstral coefficients and their first- and second-order differential coefficients, and may be cepstral coefficients obtained from the FFT spectrum or from LPC (linear predictive coding).
  • Further features are common to image and sound. They are neither purely visual nor purely audio features, but they provide useful information for representing the features of segments included in a chain.
  • The video signal processor 10 uses activity as a common visual-audio feature.
  • Activity is an index of how dynamic or static the content of a segment feels. For example, if a segment feels visually dynamic, the activity reflects how rapidly the camera moves along an object or how rapidly the object being shot changes.
  • i and j are frames
  • F is a feature measured between the frames i and j
  • d_F(i, j) is the dissimilarity measurement criterion for the feature F
  • b and f are the numbers of the first and last frames of one segment.
  • The video signal processor 10 can calculate the video activity V_F using, for example, the color histogram described above as the feature F.
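The activity equation itself is not reproduced in this text. Given the definitions above, one plausible reading, which is only an assumption here, is the mean dissimilarity d_F(i, j) over all frame pairs i, j between b and f. The sketch uses an L1 distance as d_F purely for illustration; `activity` and `l1` are hypothetical names.

```python
def l1(u, v):
    """L1 distance between two per-frame feature vectors (illustrative d_F)."""
    return sum(abs(x - y) for x, y in zip(u, v))

def activity(frame_features, d, b, f):
    """ASSUMED form of the activity: mean of d(F_i, F_j) over all frame
    pairs (i, j) with b <= i, j <= f. Static segments give 0."""
    frames = range(b, f + 1)
    total = sum(d(frame_features[i], frame_features[j]) for i in frames for j in frames)
    return total / (f - b + 1) ** 2
```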
  • The features described above, including the visual features, basically represent static information about a segment. To represent the features of a segment accurately, however, dynamic information must also be taken into consideration. For this reason, the video signal processor 10 represents dynamic information by the feature sampling method described below.
  • The video signal processor 10 extracts more than one static feature, at different time points within one segment. It determines the number of features to extract by balancing the highest fidelity of segment description against minimal data redundancy. For example, when a certain image in the segment can be designated as the key frame of that segment, a histogram calculated from the key frame is a feature to sample.
  • Suppose a sample is always selected at a predetermined time point, for example, at the last time point of a segment.
  • In that case, samples from any two segments that fade out to black frames will be the same black frames, so no distinguishing features will be acquired. That is, the two selected frames will be judged extremely similar to each other whatever the image content of the segments. This problem arises because such samples are not good central values.
  • The video signal processor 10 is therefore adapted not to extract a feature at such a fixed point, but to extract a statistically central value over the entire segment.
  • The general feature sampling method will be described for two cases: (1) a feature can be represented as a real-valued n-dimensional vector, and (2) only a dissimilarity measurement criterion is available. Note that the best-known visual and audio features, such as histograms and power spectra, fall under case (1).
  • In case (1), the number of samples is predetermined to be k, and the video signal processor 10 automatically partitions the features of the entire segment into k different groups using the well-known k-means clustering method, as disclosed in “L. Kaufman and P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, 1990”.
  • From each of the k groups, the video signal processor 10 selects as the sample value the group centroid, or the sample closest to the centroid.
  • The computational complexity of these operations in the video signal processor 10 increases only linearly with the number of samples.
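A minimal sketch of the case (1) sampling step, assuming plain k-means with a deterministic initialisation from evenly spaced frames (the initialisation scheme and the name `kmeans_samples` are illustrative choices). For each group it returns the member closest to the centroid, so every sample is an actual frame feature.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_samples(features, k, iters=20):
    """Return up to k representative per-frame feature vectors of one segment
    (requires k <= len(features))."""
    step = len(features) / k
    centroids = [list(features[int(i * step)]) for i in range(k)]  # evenly spaced init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in features:
            nearest = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
            groups[nearest].append(x)
        for c, members in enumerate(groups):
            if members:  # an empty group keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # For each non-empty group, pick the member closest to its centroid.
    return [min(members, key=lambda x: sq_dist(x, centroids[c]))
            for c, members in enumerate(groups) if members]
```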
  • In case (2), the video signal processor 10 forms the k groups using the k-medoids algorithm, also disclosed in “L. Kaufman and P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, 1990”.
  • The video signal processor 10 then uses the medoid of each of the k groups as the sample value.
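For case (2), only a dissimilarity function is needed. The sketch below uses the simple alternating variant of k-medoids (assign each item to its nearest medoid, then re-pick each group's medoid) rather than the full PAM procedure described by Kaufman and Rousseeuw; `kmedoids_samples` is a hypothetical name.

```python
def kmedoids_samples(items, k, d, iters=20):
    """Pick k representative items using only a dissimilarity function d(a, b)."""
    step = len(items) / k
    medoids = [int(i * step) for i in range(k)]  # indices of the initial medoids
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for idx, item in enumerate(items):
            nearest = min(range(k), key=lambda m: d(item, items[medoids[m]]))
            groups[nearest].append(idx)
        new_medoids = []
        for m, group in enumerate(groups):
            if not group:
                new_medoids.append(medoids[m])  # keep the medoid of an empty group
                continue
            # Medoid = member with the smallest total dissimilarity to its group.
            new_medoids.append(min(group, key=lambda c: sum(d(items[c], items[o]) for o in group)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return [items[m] for m in medoids]
```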
  • The dissimilarity measurement criterion for the extracted dynamic features is established on the basis of the dissimilarity measurement criterion for the underlying static features, as will be described further below.
  • In this way, the video signal processor 10 can extract a plurality of static features and use them together to represent a dynamic feature.
  • The video signal processor 10 can extract various features. However, no single feature is generally sufficient by itself to represent a segment. For this reason, the video signal processor 10 can select a set of mutually complementary features by combining different features. For example, by combining the color histogram and the image correlation described above, the video signal processor 10 can obtain more information than either feature provides alone.
  • The video signal processor 10 measures the dissimilarity between segments by means of the feature similarity measurement block 17.
  • A small value of the dissimilarity measurement criterion indicates that two features are similar to each other; a large value indicates that they are not.
  • The function that calculates the dissimilarity between two segments S_1 and S_2 with respect to the feature F is defined as the dissimilarity measurement criterion d_F(S_1, S_2).
  • Some dissimilarity measurement criteria are applicable only to specific features.
  • However, many dissimilarity measurement criteria are generally applicable to measuring the similarity between features represented as points in an n-dimensional space.
  • Such criteria include the Euclidean distance, the inner product, the L1 distance, etc.
  • The video signal processor 10 adopts the L1 distance as its criterion.
  • As features representing dynamic features, the video signal processor 10 extracts static features at different time points in a segment. Then, to determine the similarity between two extracted dynamic features, the video signal processor 10 uses a dissimilarity criterion built on the criterion for measuring the dissimilarity between the underlying static features.
  • The dissimilarity measurement criterion for dynamic features is most advantageously established as the dissimilarity between the most similar pair of static features, one selected from each dynamic feature.
  • $$d_S(SF_1, SF_2) = \min_{F_1 \in SF_1,\; F_2 \in SF_2} d_F(F_1, F_2) \qquad (4)$$
  • The function d_F(F_1, F_2) in equation (4) above is the criterion for measuring the dissimilarity between the static features F on which equation (4) is based. Note that, depending on the case, the maximum or mean of the dissimilarities between features may be taken instead of the minimum.
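Equation (4) can be sketched directly, with the L1 distance as the static criterion d_F. A dynamic feature is represented as a list of sampled static feature vectors; the `agg` parameter is an illustrative addition that switches to the maximum or mean variants mentioned above, and the function names are hypothetical.

```python
def l1_distance(u, v):
    """L1 distance between two static feature vectors (the base criterion d_F)."""
    return sum(abs(a - b) for a, b in zip(u, v))

def dynamic_dissimilarity(sf1, sf2, d=l1_distance, agg=min):
    """Equation (4): base dissimilarity of the closest pair (F_1, F_2),
    F_1 taken from sf1 and F_2 from sf2. agg=max or a mean gives the variants."""
    return agg(d(f1, f2) for f1 in sf1 for f2 in sf2)
```

For example, `agg=statistics.mean` gives the mean variant.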
  • One solution to this problem is to calculate the dissimilarity over several features as a weighted combination of the per-feature dissimilarities. That is, when k features F_1, F_2, ..., F_k are available, the video signal processor 10 uses a dissimilarity measurement criterion d_F(S_1, S_2) for the combined features.
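The combining equation is not reproduced here. A common form, assumed for this sketch, is a weighted sum of the per-feature criteria d_{F_i}(S_1, S_2) with non-negative weights normalised to sum to 1. `combined_dissimilarity` and the dictionary representation of a segment's features are hypothetical.

```python
def combined_dissimilarity(s1, s2, criteria, weights):
    """ASSUMED weighted-sum combination: criteria[i](s1, s2) returns the
    per-feature dissimilarity d_Fi(s1, s2); weights are normalised to sum to 1."""
    total_w = sum(weights)
    return sum(w * d(s1, s2) for d, w in zip(criteria, weights)) / total_w
```

Usage: with per-feature criteria such as `lambda a, b: abs(a["x"] - b["x"])`, larger weights make the corresponding feature dominate the combined criterion.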
  • Thus, the video signal processor 10 can calculate a dissimilarity measurement criterion, using the features extracted at step S2 in FIG. 5, to determine the similarity between the segments under consideration.
  • The video signal processor 10 uses the dissimilarity measurement criterion and the extracted features to detect similarity chains, which indicate linkages between similar segments.
  • Several types of similarity chains are defined below, and the algorithms for detecting each type are described in detail.
  • Since the similarity chain types defined below are independent of each other, one similarity chain may belong to more than one type in the video signal processor 10. Similarity chains are therefore referred to by combinations of the defined type names.
  • For example, a “local uniform linked chain” is a similarity chain that has the properties of local, uniform, and linked similarity chains, as described later.
  • Similarity chains generally fall into those in which the relation between similar segments is restricted and those whose structure is restricted.
  • A “chain C” means a series of segments S_{i_1}, ..., S_{i_m}.
  • The index i_k represents the number of the segment within the video data in which it is included, and the subscript k indicates that the segment is at the k-th position on the time base within the similarity chain.
  • The segments in a chain are always ordered on the time base: i_k < i_{k+1} for all k = 1, ..., m − 1. Further, |C| = m indicates the length of the chain.
  • C_start and C_end indicate the start time and end time, respectively, of the chain C in the video data. More precisely, the start time of the chain C is the start time of its first segment, and the end time of the chain C is the end time of its last segment. Moreover, when a certain segment is denoted A, segments similar to it are denoted A′, A″, A‴, .... Finally, two segments are similar when the dissimilarity measurement criterion between them is smaller than a dissimilarity threshold, which will be described further below. This is written “similar(S_1, S_2)”.
  • The similarity chains in which the relation between their similar segments is restricted include the basic, linked and cyclic similarity chains.
  • the basic similarity chain is a chain C in which all segments are similar to each other as shown in FIG. 7 . Note that there is no structural restriction for the basic similarity chain. In many cases, the basic similarity chain is obtained using a grouping algorithm or clustering algorithm for grouping similar segments.
  • the linked similarity chain is a chain C in which adjacent segments are similar to each other as shown in FIG. 8 .
  • That is, similar(S_k, S_{k+1}) holds for all k = 1, ..., |C| − 1.
  • The segments in this similarity chain can be denoted A′, A″, A‴, ....
  • The cyclic chain is a chain C_cyclic in which each segment is similar to the segment m positions later, as shown in FIG. 9. That is, similar(S_k, S_{k+m}) holds for all k = 1, ..., |C| − m.
  • The cyclic chain is composed of an approximate repetition of a series of m segments.
  • The segments of a cyclic chain can be denoted S_1, S_2, ..., S_m, S_1′, S_2′, ..., S_m′, S_1″, S_2″, ..., S_m″, ..., S_1‴, S_2‴, ..., S_m‴.
  • the similarity chains whose structure is restricted include the local and uniform chains.
  • The local chain is a chain C in which the time interval between each pair of adjacent segments is shorter than a predetermined time. Namely, when the maximum permissible interval between two adjacent segments in the chain is denoted “gap”, a local chain satisfies i_{k+1} − i_k ≤ gap for the adjacent segments S_{i_k} and S_{i_{k+1}}, for all k = 1, ..., |C| − 1.
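The linked, cyclic and local conditions can each be checked with a small predicate. In this sketch a chain is represented by its segment numbers i_1, ..., i_m, and `similar` is any caller-supplied test, such as “dissimilarity below the threshold”; all function names are illustrative.

```python
def is_linked(chain, similar):
    """Linked chain: every pair of adjacent segments is similar."""
    return all(similar(a, b) for a, b in zip(chain, chain[1:]))

def is_cyclic(chain, similar, m):
    """Cyclic chain of period m: each segment is similar to the one m positions later."""
    return all(similar(chain[k], chain[k + m]) for k in range(len(chain) - m))

def is_local(chain, gap):
    """Local chain: consecutive segment numbers differ by at most `gap`."""
    return all(b - a <= gap for a, b in zip(chain, chain[1:]))
```

Usage: for a dialogue scene alternating between two speakers A and B, the chain of segment numbers [1, 3, 4, 7] with labels A, B, A, B is cyclic with m = 2 but not linked.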
  • The “uniformity(C)” of the chain C, given by expression (6), takes a value from 0 to 1. A small value means that the time intervals between the segments are distributed nearly uniformly. When uniformity(C) is smaller than a predetermined uniformity threshold, the chain C is regarded as a uniform chain.
  • The video signal processor 10 adopts either a batch clustering technique or a consecutive clustering technique to detect basic similarity chains.
  • The batch clustering technique detects all chains collectively. This technique, however, requires the whole video to be segmented before chain detection starts.
  • The consecutive clustering technique detects chains one after another.
  • Consecutive chain detection can be performed in real time; in other words, chains can be detected while the video data is being acquired or recorded.
  • However, consecutive analysis has an accuracy problem. In the consecutive clustering technique, no global information is available for determining the optimum chain structure, and the clustering is sensitive to the order in which segments are supplied as input. Consecutive clustering therefore tends to yield lower-quality results.
  • When the video signal processor 10 adopts the batch clustering technique, it detects basic similarity chains in two steps, as shown in FIG. 10.
  • First, the video signal processor 10 detects candidate chains. More specifically, it detects similar segments in the video data and groups them into clusters. Each cluster of segments thus obtained is an initial candidate for a basic similarity chain.
  • The video signal processor 10 can use any clustering technique to determine the initial candidate similarity chains.
  • Here, the video signal processor 10 adopts the hierarchical clustering method disclosed in “L. Kaufman and P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons, 1990”.
  • This algorithm first pairs the two most similar segments and then, at each level, uses an inter-cluster dissimilarity criterion to merge the two most similar clusters.
  • The criterion d_C(C_1, C_2) for the dissimilarity between two clusters C_1 and C_2 is defined as the minimum dissimilarity between two segments, one taken from each cluster.
  • The video signal processor 10 may adopt a maximum or mean function in place of the minimum function given by expression (7), as necessary.
  • If run to completion, the hierarchical clustering method would eventually put all segments of the video data into a single group.
  • The video signal processor 10 is therefore adapted to judge whether two segments are similar to each other by comparing their dissimilarity with a dissimilarity threshold δ_sim, as shown in FIG. 11.
  • The dissimilarity threshold δ_sim is a threshold for determining whether two segments belong to the same chain.
  • The video signal processor 10 groups segments into clusters so that no two clusters whose dissimilarity exceeds the dissimilarity threshold δ_sim are merged.
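A minimal sketch of threshold-stopped hierarchical clustering, assuming segments are numbered 0..n−1 and `d(i, j)` returns their dissimilarity. The `linkage` argument defaults to the minimum of expression (7); passing `max` (or a mean) gives the alternatives. This naive version recomputes all cluster pairs each round and is meant only to show the logic; `threshold_clusters` is a hypothetical name.

```python
def threshold_clusters(n, d, delta_sim, linkage=min):
    """Agglomerative clustering of segments 0..n-1 that stops once the closest
    pair of clusters is no longer below delta_sim."""
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dc = linkage(d(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or dc < best[0]:
                    best = (dc, a, b)
        dc, a, b = best
        if dc >= delta_sim:  # the most similar clusters are not similar enough
            break
        clusters[a] = sorted(clusters[a] + clusters[b])  # keep segments in time order
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Each resulting cluster, already sorted by segment number, is an initial candidate chain.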
  • The dissimilarity threshold δ_sim may be set by the user or automatically. However, when a fixed value is used as δ_sim, its optimum value depends on the content of the video data. For example, for highly varied video data δ_sim has to be set high, whereas for less varied video data it has to be set low. In general, a high δ_sim results in fewer clusters being detected, and a low δ_sim results in more clusters being detected.
  • The video signal processor 10 can automatically determine an effective dissimilarity threshold δ_sim by the method described below.
  • When there are n segments, the video signal processor 10 determines δ_sim using statistics, such as the mean and median, of the distribution of dissimilarities over the n(n − 1)/2 segment pairs.
  • The dissimilarity threshold δ_sim can be given by aμ + bσ, where μ and σ are the mean and standard deviation of the dissimilarity distribution and a and b are constants. We have found through extensive experience that setting the constants a and b to 0.5 and 0.1, respectively, gives good results.
  • The video signal processor 10 need not determine the dissimilarity of every segment pair; it only has to sample, at random from the set of all segment pairs, enough pairs that their mean μ and standard deviation σ approximate the true values. Using this mean μ and standard deviation σ, the video signal processor 10 can automatically obtain an appropriate dissimilarity threshold δ_sim. That is, when the total number of segment pairs is n and C is an arbitrary small constant, the video signal processor 10 computes the dissimilarities of Cn segment pairs and thereby automatically determines an appropriate dissimilarity threshold δ_sim.
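The sampling-based estimate of δ_sim might look like the sketch below. It draws random pairs of distinct segments, uses a = 0.5 and b = 0.1 as defaults, and computes σ as a population standard deviation; the sample size, seed and name `estimate_delta_sim` are illustrative assumptions.

```python
import random
import statistics

def estimate_delta_sim(n, d, num_pairs, a=0.5, b=0.1, seed=0):
    """delta_sim = a*mu + b*sigma, where mu and sigma are estimated from
    num_pairs randomly sampled segment pairs instead of all n(n-1)/2 pairs."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct segment numbers
        values.append(d(i, j))
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return a * mu + b * sigma
```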
  • The video signal processor 10 then arranges the segments in each cluster in temporal order to automatically obtain the initial candidates for basic similarity chains.
  • Next, the video signal processor 10 has to determine which chain candidates form an important framework of the video structure or are relevant to it. To this end, at step S12 the video signal processor 10 filters the chains using a chain quality determination criterion, a numerical measure of chain quality. That is, the video signal processor 10 evaluates the importance and relevance of the chain candidates for video structure analysis, and outputs as the result of chain detection only those candidates whose quality exceeds a predetermined quality determination criterion threshold.
  • The simplest relevance function used in filtering is a Boolean function indicating whether a chain candidate is acceptable or not. Note, however, that the video signal processor 10 may adopt a more complex relevance function.
  • The length, density, strength, etc. of a chain are used as chain quality determination criteria.
  • The chain length is defined as the number of segments included in the chain.
  • The video signal processor 10 can use the chain length as a chain quality determination criterion to reject short chains, to the extent that short chains can be regarded as mere noise. For example, a chain with only a single segment carries no information. That is, the length-based criterion accepts a chain only when it includes at least a minimum permissible number of segments.
  • The chain density is defined as the ratio of the number of segments in a chain to the total number of segments in the part of the video data that the chain spans. It is useful because, in some cases, chains should preferably be concentrated in a limited time range; in such cases the video signal processor 10 should use the chain density as the chain quality determination criterion.
  • The chain strength indicates how similar the segments in a chain are to each other. The more similar the segments are, the stronger the chain.
  • The video signal processor 10 may determine the chain strength by many methods, such as the intra-chain similarity method, averaging the dissimilarity over all possible segment pairs, or taking the maximum dissimilarity over all possible segment pairs.
  • The intra-chain similarity method expresses the similarity among the segments in a chain as the mean dissimilarity between each segment and the most typical segment of the chain, S_centroid, as given by expression (9).
  • $$d_{\mathrm{centroid}} = \frac{1}{|C|} \sum_{S \in C} d_F(S, S_{\mathrm{centroid}}) \qquad (9)$$
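The three quality criteria can be sketched as follows, with a chain represented by its segment numbers. Expression (9) does not say how S_centroid is chosen; this sketch assumes the medoid (the member with the smallest total dissimilarity to the others). A smaller d_centroid means a stronger chain. Function names are illustrative.

```python
def chain_length(chain):
    """Number of segments in the chain."""
    return len(chain)

def chain_density(chain):
    """Chain segments / all segments in the span i_1..i_m that the chain occupies."""
    return len(chain) / (chain[-1] - chain[0] + 1)

def chain_strength(chain, d):
    """Expression (9), with S_centroid ASSUMED to be the chain's medoid."""
    centroid = min(chain, key=lambda c: sum(d(c, s) for s in chain))
    return sum(d(s, centroid) for s in chain) / len(chain)
```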
  • The video signal processor 10 filters chains by the series of operations shown in FIG. 12.
  • First, at step S21, the video signal processor 10 initializes the chains list C_list with the candidate chains and empties the filtering chains list C_filtered.
  • At step S22, the video signal processor 10 judges whether the chains list C_list is empty.
  • If it is not empty, at step S23 the video signal processor 10 removes the first element of C_list, taking it as the chain C.
  • At step S24, the video signal processor 10 calculates the chain quality determination criterion for the chain C.
  • At step S25, the video signal processor 10 judges whether the chain quality determination criterion is larger than the quality determination criterion threshold.
  • If it is not, the video signal processor 10 returns to step S22 to process another chain.
  • If it is, the video signal processor 10 adds the chain C to the filtering chains list C_filtered at step S26.
  • Then, at step S27, the video signal processor 10 judges whether the chains list C_list is empty.
  • When C_list is empty, the video signal processor 10 terminates the series of operations, since no candidate chains remain.
  • Otherwise, the video signal processor 10 returns to step S23.
  • The video signal processor 10 thus repeats these operations until the chains list C_list becomes empty.
  • In this way, the video signal processor 10 can filter the chains to determine which chain candidates form an important framework of the video structure or are relevant to it.
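The FIG. 12 loop reduces to a few lines, with step numbers noted in the comments. `quality` stands for whichever chain quality determination criterion is chosen (for example, chain length), and `filter_chains` is a hypothetical name.

```python
from collections import deque

def filter_chains(candidates, quality, threshold):
    """Keep only candidate chains whose quality exceeds threshold (FIG. 12)."""
    c_list = deque(candidates)           # S21: initialise C_list with the candidates
    c_filtered = []                      #      and empty C_filtered
    while c_list:                        # S22/S27: stop once C_list is empty
        chain = c_list.popleft()         # S23: take the first chain
        if quality(chain) > threshold:   # S24-S25: evaluate and compare
            c_filtered.append(chain)     # S26: keep it
    return c_filtered
```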
  • The video signal processor 10 can thus detect basic similarity chains using the batch clustering technique.
  • The video signal processor 10 can also detect basic similarity chains using the consecutive clustering technique mentioned above instead of the batch clustering technique. That is, the video signal processor 10 processes the segments of the video data one by one, in the order in which they are supplied, repeatedly listing candidate chains and updating the chains list. In this case too, the main process of chain detection is performed in two steps, as in batch clustering: first, the video signal processor 10 uses a consecutive clustering algorithm to detect clusters of similar segments; next, it filters the detected clusters using a chain quality determination criterion similar to that of the batch clustering technique. The consecutive clustering technique used in the video signal processor 10 thus differs from the batch clustering technique in that chain filtering is performed at an earlier stage.
  • A consecutive clustering algorithm is used to cluster the segments.
  • Most consecutive clustering methods make locally optimal decisions. More specifically, each time a new segment is supplied, the consecutive clustering algorithm judges whether to assign the segment to an existing cluster or to generate a new cluster containing it.
  • More elaborate consecutive clustering algorithms update the cluster partition itself each time a new segment is supplied, to prevent the bias that the input order of segments would otherwise introduce. Such an algorithm is known from “J. Roure and L.
  • As one example of a consecutive clustering algorithm, the video signal processor 10 performs the series of operations shown in FIG. 13. It is assumed here that the video data is divided into segments S_1, ..., S_n. Note that the series of operations described below includes the chain analysis step.
  • The video signal processor 10 initializes the chains list C_list to an empty state at step S31 and sets the segment number i to 1 at step S32.
  • At step S33, the video signal processor 10 judges whether the segment number i is less than or equal to the total number n of segments.
  • If it is not, the video signal processor 10 terminates the series of operations, since no segments remain.
  • Otherwise, the video signal processor 10 acquires the segment S_i at step S34, and judges at step S35 whether the chains list C_list is empty.
  • If C_list is not empty, the video signal processor 10 detects, at step S36, the chain C_min whose dissimilarity to the segment S_i is minimum.
  • Here, the dissimilarity criterion d_SC(C, S) between a chain C and a segment S equals the inter-cluster dissimilarity criterion of expression (7) defined for the batch clustering technique, with its second argument taken as a cluster containing only the segment S_i.
  • The minimum dissimilarity d_SC(C_min, S_i) between the chain C_min and the segment S_i is denoted d_min.
  • At step S37, the video signal processor 10 uses the dissimilarity threshold δ_sim described for the batch clustering technique to determine whether the minimum dissimilarity d_min is smaller than δ_sim.
  • If it is not, the video signal processor 10 goes to step S42, where it generates a new chain C_new whose only element is the segment S_i, and adds C_new to the chains list C_list at step S43. It then goes to step S39.
  • If it is, the video signal processor 10 adds the segment S_i to the chain C_min at step S38; namely, C_min ← C_min ∪ {S_i}.
  • At step S39, the video signal processor 10 filters the chains. That is, for each element chain C ∈ C_list, the video signal processor 10 determines the quality of C as described above, keeps only the chains whose chain quality determination criterion is larger than the quality determination criterion threshold, and adds them to the filtering chains list C_filtered.
  • At step S40, the video signal processor 10 analyzes the chains consecutively. That is, it passes the chains list C_filtered, as filtered at that point, to the analysis module.
  • The video signal processor 10 increments the segment number i by 1 at step S41 and returns to step S33.
  • The video signal processor 10 repeats the above series of operations until the segment number i exceeds the total number n of segments. It then detects each element chain of the chains list C_list, as it stands once i has reached n, as a basic similarity chain.
  • The series of operations shown in FIG. 13 assumes that the total number n of segments in the input video data is already known. In many cases, however, n is not known in advance. In that case, the consecutive clustering algorithm should judge at step S33 in FIG. 13 whether to continue or end the series of operations depending on whether further segments are supplied.
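A sketch of the FIG. 13 loop follows. It consumes segments from any iterable, which covers the case where n is not known in advance. `d_sc` is the chain-to-segment dissimilarity (for basic chains, a single-link minimum in the spirit of expression (7)), and `analyze` is a hypothetical callback standing in for the analysis module. All names are illustrative.

```python
def consecutive_basic_chains(segments, d_sc, delta_sim, quality, threshold, analyze=None):
    """Grow basic similarity chains one segment at a time (FIG. 13)."""
    c_list = []                                          # S31: empty chains list
    for seg in segments:                                 # S33-S34, S41: next segment, if any
        c_min, d_min = None, float("inf")
        if c_list:                                       # S35
            c_min = min(c_list, key=lambda c: d_sc(c, seg))   # S36
            d_min = d_sc(c_min, seg)
        if d_min < delta_sim:                            # S37
            c_min.append(seg)                            # S38: C_min <- C_min + {S_i}
        else:
            c_list.append([seg])                         # S42-S43: new single-segment chain
        c_filtered = [c for c in c_list if quality(c) > threshold]   # S39
        if analyze is not None:
            analyze([list(c) for c in c_filtered])       # S40: pass snapshot copies
    return c_list
```

The test uses scalar stand-ins for segment features; in practice the chain elements would be segment records and `d_sc` would use the extracted features.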
  • the video signal processor 10 can detect a basic similarity chain using the consecutive clustering technique.
  • the detection of linked similarity chains by the video signal processor 10 can be regarded as a special case of basic similarity chain detection.
  • To detect linked similarity chains using the consecutive clustering algorithm, the video signal processor 10 performs the operations shown in FIG. 14. It is assumed here that the video data is divided into segments S_1, ..., S_n. Note that the series of operations described below includes the chain analysis step.
  • The video signal processor 10 initializes the chains list C_list to an empty state at step S51 and sets the segment number i to 1 at step S52.
  • At step S53, the video signal processor 10 judges whether the segment number i is less than or equal to the total number n of segments.
  • If it is not, the video signal processor 10 terminates the series of operations, since no segments remain.
  • Otherwise, the video signal processor 10 acquires the segment S_i at step S54, and at step S55 finds the chain C_min whose dissimilarity to the segment S_i is minimum.
  • Here, the dissimilarity criterion d_SC(C, S) is defined as the dissimilarity between the target segment and the last segment of the chain C.
  • At step S56, the video signal processor 10 uses the dissimilarity threshold δ_sim described above to determine whether the minimum dissimilarity d_min is smaller than δ_sim.
  • If it is not, the video signal processor 10 goes to step S61, where it generates a new chain C_new whose only element is the segment S_i, and adds C_new to the chains list C_list at step S62. It then goes to step S58.
  • If it is, the video signal processor 10 appends the segment S_i to the end of the chain C_min at step S57; namely, C_min ← (C_min, S_i).
  • At step S58, the video signal processor 10 filters the chains. That is, for each element chain C ∈ C_list, the video signal processor 10 determines the quality of C as described above, keeps only the chains whose chain quality determination criterion is larger than the quality determination criterion threshold, and adds them to the filtering chains list C_filtered. Note that the video signal processor 10 may omit this step.
  • At step S59, the video signal processor 10 analyzes the chains consecutively. That is, it passes the chains list C_filtered, as filtered at that point, to the analysis module.
  • The video signal processor 10 increments the segment number i by 1 at step S60 and returns to step S53.
  • The video signal processor 10 repeats the above series of operations until the segment number i exceeds the total number n of segments. It then detects each element chain of the chains list C_list, as it stands once i has reached n, as a linked similarity chain.
  • The video signal processor 10 can thus detect linked similarity chains using the consecutive clustering technique.
  • The series of operations shown in FIG. 14 assumes that the total number n of segments in the input video data is already known. In many cases, however, n is not known in advance. In that case, the consecutive clustering algorithm should judge at step S53 in FIG. 14 whether to continue or end the series of operations depending on whether further segments are supplied.
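The linked-chain variant differs only in comparing each new segment with the last segment of each chain. The sketch omits the optional filtering and analysis steps; `consecutive_linked_chains` and the scalar segment values are illustrative.

```python
def consecutive_linked_chains(segments, d, delta_sim):
    """Grow linked similarity chains one segment at a time (FIG. 14).
    d(a, b) is the segment-to-segment dissimilarity."""
    c_list = []                                               # S51
    for seg in segments:                                      # S53-S54, S60
        c_min, d_min = None, float("inf")
        if c_list:
            c_min = min(c_list, key=lambda c: d(c[-1], seg))  # S55: compare with last segment
            d_min = d(c_min[-1], seg)
        if d_min < delta_sim:                                 # S56
            c_min.append(seg)                                 # S57: append at the end
        else:
            c_list.append([seg])                              # S61-S62: new chain
    return c_list
```

For example, with δ_sim = 2 the segments 0, 1.5, 3, 10, 4.2 yield the chains [0, 1.5, 3, 4.2] and [10]: 0 and 4.2 end up in one linked chain even though they are not similar to each other, because only adjacent members need to be similar.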
  • A cyclic chain C_cyclic can be regarded as a set of k different basic or linked similarity chains {C_1, ..., C_k}.
  • The segments in the cyclic chain C_cyclic are denoted S_1, ..., S_n, and C(S_i) denotes the number, from 1 to k, of the chain in which the segment S_i appears.
  • Since C_cyclic is a cyclic chain, the list of chain numbers C(S_1), C(S_2), ..., C(S_n) takes the form i_1, ..., i_k, i_1, ..., i_k, ..., i_1, ..., i_k.
  • Here, i_1, ..., i_k is an ordering of the chain numbers 1, ..., k, in other words, an arbitrary list in which no chain appears twice.
  • A cyclic chain i_1, i_1, ..., i_1, with one segment per cycle, will hereinafter be referred to as a “basic cyclic chain”.
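Checking whether a list of chain numbers has this cyclic form is straightforward. The sketch below, with the hypothetical name `cycle_period`, requires complete cycles and returns k (1 for a basic cyclic chain) or None.

```python
def cycle_period(chain_numbers):
    """Return k if C(S_1), ..., C(S_n) has the form i_1..i_k repeated, with
    i_1..i_k k distinct chain numbers; otherwise return None."""
    if not chain_numbers:
        return None
    order = list(dict.fromkeys(chain_numbers))  # i_1, ..., i_k in first-appearance order
    k = len(order)
    if len(chain_numbers) % k:
        return None                              # last cycle is incomplete
    if all(c == order[pos % k] for pos, c in enumerate(chain_numbers)):
        return k                                 # k == 1: basic cyclic chain
    return None
```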
  • the video signal processor 10 finds cyclic chains approximate to each other in the video data by a series of operations shown in FIG. 15 .
  • if necessary, a restrictive condition can be added that each original basic cyclic chain be uniform. The operations performed under this restrictive condition are discussed below.
  • the video signal processor 10 detects the basic cyclic chains included in the video data, generates an initial chains list from the detection result, and updates the initial chains list so that every basic cyclic chain in it satisfies the above restrictive condition.
  • the video signal processor 10 detects an initial chains list C list using an algorithm for detection of the above-mentioned basic similarity chains or linked similarity chains at step S 71 .
  • at step S 72, the video signal processor 10 checks the uniformity of each chain C included in the initial chains list.
  • if a chain C is not uniform, the video signal processor 10 divides it into a plurality of sub chains so that the time intervals between the sub chains are maximized.
  • the video signal processor 10 filters the resulting sub chains using the chain quality criterion described for the basic similarity chain and linked similarity chain detection algorithms, and adds the uniform sub chains that pass to the initial chains list C list.
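The splitting step can be sketched as cutting a chain at its largest time gaps, which keeps the intervals between the resulting sub chains as large as possible. This sketch assumes that segments are represented by their start times in ascending order, that the caller chooses the number of sub chains `parts` (at least 1), and that the quality criterion is simply a minimum segment count; `split_at_largest_gaps`, `uniform_sub_chains`, `parts` and `min_segments` are hypothetical names.

```python
from typing import List


def split_at_largest_gaps(times: List[float], parts: int) -> List[List[float]]:
    """Cut an ascending list of segment times at its parts-1 largest gaps."""
    # Index j means "cut between times[j-1] and times[j]".
    by_gap = sorted(range(1, len(times)),
                    key=lambda j: times[j] - times[j - 1], reverse=True)
    cuts = sorted(by_gap[: parts - 1])
    bounds = [0] + cuts + [len(times)]
    return [times[a:b] for a, b in zip(bounds, bounds[1:])]


def uniform_sub_chains(times: List[float], parts: int,
                       min_segments: int = 2) -> List[List[float]]:
    """Split, then keep only sub chains passing a (hypothetical) quality test."""
    return [c for c in split_at_largest_gaps(times, parts)
            if len(c) >= min_segments]
```

With times 0, 1, 2, 10, 11, 12, 30 and three parts, the cuts fall at the two largest gaps (18 and 8), and the lone segment at 30 is then filtered out.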
  • at step S 73, the video signal processor 10 detects, from the chains list C list, a pair of temporally overlapping chains C 1 and C 2, that is, $\{C_1, C_2 \mid [C_1^{start}, C_1^{end}] \cap [C_2^{start}, C_2^{end}] \neq \emptyset\}$.
  • at step S 74, the video signal processor 10 judges whether such overlapping chains C 1 and C 2 exist.
  • if no such pair exists, the video signal processor 10 terminates the series of operations, regarding the chains list C list as already consisting of cyclic chains.
  • if such a pair exists, the video signal processor 10 evaluates, at steps S 75 to S 78, the consistency between cycles in the set of the two chains C 1 and C 2 to determine whether together they form one cyclic chain.
  • at step S 75, the video signal processor 10 merges the two chains C 1 and C 2 to form a new cyclic chain C M.
  • segments included in the new chain C M will be indicated with references S 1 , S 2 , . . . , S
  • at step S 76, the video signal processor 10 takes the chain number C(S 1) of the chain in which segment S 1 appears as C, and decomposes the chain C M into sub chains C M 1, C M 2, . . . , C M k, starting a new sub chain at each appearance of C in the list of chain numbers C(S 1), C(S 2), . . . , C(S
  • the video signal processor 10 will provide a list of sub chains as given by the expression (14):
  • at step S 77, the video signal processor 10 finds the sub chain C M cycle that appears most frequently, as given by expression (15):
    $$C_M^{cycle} = \arg\max_{C_M^j} \left| \{ C_M^i \mid C_M^i = C_M^j,\ i \in \{1, \ldots, k\} \} \right| \qquad (15)$$
  • at step S 78, the video signal processor 10 evaluates whether the most frequently appearing sub chain C M cycle can be the first cycle of the original chain C M. That is, the video signal processor 10 defines a consistency factor, mesh, as the ratio of the number of appearances of C M cycle acquired at step S 76 to the total number k of sub chains, and judges at step S 79 whether the consistency factor exceeds a predetermined threshold, as given by expression (16).
    $$\mathrm{mesh} = \frac{\left| \{ C_M^i \mid C_M^i = C_M^{cycle},\ i \in \{1, \ldots, k\} \} \right|}{k} \qquad (16)$$
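Steps S 76 to S 79 can be sketched on the list of chain numbers of the merged chain C M: split the list at each appearance of its first chain number, take the most frequent sub chain as in expression (15), and compute the ratio of expression (16). The function names and the threshold value used below are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

SubChain = Tuple[Hashable, ...]


def decompose(labels: Sequence[Hashable]) -> List[SubChain]:
    """Step S 76: start a new sub chain at every appearance of C = C(S_1)."""
    first = labels[0]
    subs: List[List[Hashable]] = []
    for lab in labels:
        if lab == first:
            subs.append([])
        subs[-1].append(lab)
    return [tuple(s) for s in subs]


def consistency(labels: Sequence[Hashable]) -> Tuple[SubChain, float]:
    """Return (C_M^cycle, mesh) per expressions (15) and (16)."""
    subs = decompose(labels)
    cycle, count = Counter(subs).most_common(1)[0]  # expression (15)
    return cycle, count / len(subs)                 # expression (16)


def forms_cyclic_chain(labels: Sequence[Hashable], threshold: float) -> bool:
    """Step S 79: accept the merge if mesh exceeds the threshold."""
    return consistency(labels)[1] > threshold
```

For the chain numbers 1, 2, 1, 2, 1, 3, 1, 2, three of the four sub chains equal (1, 2), so mesh = 0.75.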
  • if the consistency factor does not exceed the threshold, the video signal processor 10 returns to step S 73 and repeats similar operations to detect other overlapping chains.
  • if the consistency factor exceeds the threshold, the video signal processor 10 removes the chains C 1 and C 2 from the chains list C list at step S 80, adds the chain C M to the chains list C list at step S 81, and then returns to step S 73.
  • the video signal processor 10 repeats this series of operations for all the cyclic chains in the chains list C list until no overlapping chains remain, thereby obtaining the final chains list C list of cyclic chains.
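The whole loop from step S 73 to step S 81 can be sketched as follows. Each chain is assumed to be a time-sorted tuple of (time, chain number) pairs; a pair that fails the consistency test is remembered so that the search moves on to other overlapping chains; and the consistency factor is recomputed inline so the block stands alone. The names `merge_cyclic` and `_mesh` and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# A chain: ((time, chain number), ...) sorted by time (assumed representation).
Chain = Tuple[Tuple[float, int], ...]


def _mesh(labels: List[int]) -> float:
    """Consistency factor: share of sub chains equal to the most frequent one."""
    subs: List[Tuple[int, ...]] = []
    cur: List[int] = []
    for lab in labels:
        if lab == labels[0] and cur:  # a new sub chain starts at each C(S_1)
            subs.append(tuple(cur))
            cur = []
        cur.append(lab)
    subs.append(tuple(cur))
    return Counter(subs).most_common(1)[0][1] / len(subs)


def merge_cyclic(chains: List[Chain], threshold: float) -> List[Chain]:
    chains = list(chains)
    rejected: Set[FrozenSet[Chain]] = set()
    while True:
        # S 73: find a temporally overlapping pair not yet rejected.
        pair = next(
            ((a, b) for a, b in combinations(chains, 2)
             if a[0][0] <= b[-1][0] and b[0][0] <= a[-1][0]
             and frozenset((a, b)) not in rejected),
            None,
        )
        if pair is None:  # S 74: no overlapping chains remain
            return chains
        a, b = pair
        merged: Chain = tuple(sorted(set(a) | set(b)))  # S 75
        if _mesh([lab for _, lab in merged]) > threshold:  # S 76 to S 79
            chains.remove(a)  # S 80
            chains.remove(b)
            chains.append(merged)  # S 81
        else:
            rejected.add(frozenset((a, b)))  # look for other pairs
```

Each accepted merge shortens the list and each rejection is recorded, so the loop always terminates.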
  • in this way, the video signal processor 10 can detect various kinds of variations of similar segments based on the dissimilarity determination criterion and the extracted features.
  • the video signal processor 10 uses the detected chains to determine and output the local video structure and/or global video structure of the video data. How the results of chain analysis are used to detect basic structure patterns occurring in video data is described in detail below.
  • a scene is the most basic local video structure, positioned at a higher level than the segments described previously, and is composed of a series of semantically connected segments.
  • the video signal processor 10 can detect such scenes using the chains.
  • the requirement that chains must satisfy for scene detection in the video signal processor 10 is that the time interval between successive segments does not exceed a predetermined value called the “time threshold”.
  • chains satisfying this requirement are hereinafter referred to as “local chains”.
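Dividing a chain into local chains amounts to cutting it wherever two successive segments are farther apart than the time threshold. A minimal sketch, assuming segments are given by ascending start times; the function name is hypothetical, and dropping single-segment pieces follows the definition of a chain as two or more similar segments.

```python
from typing import List


def split_into_local_chains(times: List[float],
                            time_threshold: float) -> List[List[float]]:
    """Cut a chain wherever successive segments are more than
    `time_threshold` apart; every piece is then a local chain."""
    if not times:
        return []
    pieces: List[List[float]] = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > time_threshold:
            pieces.append([])
        pieces[-1].append(cur)
    # A chain needs at least two segments.
    return [p for p in pieces if len(p) >= 2]
```

With a time threshold of 30, the segments at 0, 5, 8, 60, 63 and 200 yield the local chains [0, 5, 8] and [60, 63]; the isolated segment at 200 forms no chain.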
  • the video signal processor 10 effects a series of operations as shown in FIG. 16 to detect scenes using the chains.
  • first, the video signal processor 10 acquires a local chains list.
  • the video signal processor 10 detects, at step S 91 , a set of initial chains lists using the above-mentioned basic similarity chain detection algorithm.
  • the video signal processor 10 removes the chains C from the chains list at step S 93 .
  • the video signal processor 10 adds the sub chains C i to the chains list at step S 94 . Upon completion of this operation, all the chains become local ones.
  • at step S 95, the video signal processor 10 detects, from the chains list, a pair of temporally overlapping chains C 1 and C 2, that is, $\{C_1, C_2 \mid [C_1^{start}, C_1^{end}] \cap [C_2^{start}, C_2^{end}] \neq \emptyset\}$.
  • at step S 96, the video signal processor 10 judges whether such temporally overlapping chains C 1 and C 2 exist.
  • if no such chains exist, the video signal processor 10 regards each chain included in the chains list as one scene and terminates the series of operations.
  • if such chains exist, the video signal processor 10 merges, at step S 97, the temporally overlapping chains C 1 and C 2 into a new chain C M.
  • at step S 98, the video signal processor 10 removes the temporally overlapping chains C 1 and C 2 from the chains list, adds the chain C M to the chains list, and then returns to step S 95 to repeat similar operations.
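Because merging two overlapping chains yields a chain spanning both, repeating steps S 95 to S 98 until nothing overlaps should give the same scenes as a single sweep over the chains' time spans sorted by start time. The sketch below uses that sweep; representing each local chain only by its (start, end) span, and the name `detect_scenes`, are assumptions.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def detect_scenes(local_chains: List[Span]) -> List[Span]:
    """Merge temporally overlapping local chains; each result is one scene."""
    scenes: List[Span] = []
    for start, end in sorted(local_chains):
        if scenes and start <= scenes[-1][1]:  # overlaps the current scene
            scenes[-1] = (scenes[-1][0], max(scenes[-1][1], end))
        else:
            scenes.append((start, end))
    return scenes
```

In a dialogue, for instance, the two talkers' local chains (0, 50) and (10, 55) overlap and therefore collapse into the single scene (0, 55).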
  • in this way, the video signal processor 10 can detect scenes, which are local structure patterns in the video data, using the chains.
  • in a dialogue, for example, the video signal processor 10 acquires, at steps S 91 to S 94, a local chain of the segments of each talker.
  • the video signal processor 10 then puts these scenes together to form a single large scene covering the entire dialogue.
  • the video signal processor 10 can detect a dialogue scene.
  • the video signal processor 10 can detect scenes consecutively by effecting the above-mentioned algorithm consecutively.
  • a news program has a cyclic structure in which each news item begins, for example, with an introductory statement by an anchor and is followed by reports from one or more sites. That is, such a video structure can be regarded as a simple cyclic structure in which the interval from one anchor shot to the next anchor shot is one cycle.
  • for automatic detection of news items using chains, the video signal processor 10 effects a series of operations as shown in FIG. 17.
  • the video signal processor 10 uses the aforementioned cyclic chain detection algorithm to detect cyclic chains, thereby acquiring a list of cyclic chains. Each cycle in this list may or may not represent a news item.
  • at step S 102, the video signal processor 10 removes all cyclic chains whose cycles are shorter than a predetermined ratio of the total length of the video data. That is, the video signal processor 10 excludes cyclic chains whose cycles are too short to plausibly represent a news item. Such short cycles can occur, for example, when an emcee interviews a guest, or when any other short-period cycle occurs in a newscast.
  • at step S 103, the video signal processor 10 selects the cyclic chain with the shortest time duration among all the cyclic chains not excluded at step S 102.
  • the video signal processor 10 will remove that cyclic chain from the cyclic chains list.
  • the video signal processor 10 repeats this operation until no cyclic chain overlaps any other cyclic chain.
  • the cyclic chains remaining after step S 103 constitute the list of detected news items; that is, each cycle of the cyclic chains acquired at step S 103 represents one news item.
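Steps S 102 and S 103 can be sketched as below. Each cyclic chain is assumed to be a list of (start, end) cycles in time order; judging a chain's cycle length by its mean cycle duration, and the names `detect_news_items` and `min_ratio`, are assumptions rather than details given in the text.

```python
from typing import List, Tuple

Cycle = Tuple[float, float]  # (start, end) of one cycle
CyclicChain = List[Cycle]


def _span(c: CyclicChain) -> Tuple[float, float]:
    return c[0][0], c[-1][1]


def _overlap(a: CyclicChain, b: CyclicChain) -> bool:
    (s1, e1), (s2, e2) = _span(a), _span(b)
    return s1 < e2 and s2 < e1


def detect_news_items(chains: List[CyclicChain], total_length: float,
                      min_ratio: float) -> List[Cycle]:
    # S 102: drop chains whose cycles are too short to be news items.
    kept = [c for c in chains
            if sum(e - s for s, e in c) / len(c) >= min_ratio * total_length]
    # S 103: while any two chains overlap, remove the shortest chain.
    while any(_overlap(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]):
        kept.remove(min(kept, key=lambda c: _span(c)[1] - _span(c)[0]))
    # Each remaining cycle is taken as one news item.
    return [cycle for c in kept for cycle in c]
```

In the test case, a short interview chain is dropped at step S 102, and a shorter sub-cycle chain nested inside the anchor-to-anchor chain is removed at step S 103, leaving three news items.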
  • in this way, the video signal processor 10 can automatically detect news items using chains.
  • the video signal processor 10 works without problems even when the newscaster changes from one news item to another, for example at the changeover from main news to sports to business segments in a newscast.
  • sports video data are characterized by a fixed pattern in which a play consisting of the same series of steps is repeated many times.
  • in baseball, for example, a play basically consists of a pitcher throwing the ball and a batter trying to hit it.
  • other sports whose video data have such a play structure include football, rugby, and similar games.
  • when video data having the above play structure are broadcast, they present a repeated group of segments showing the various parts of a play. That is, in the video data, segments showing the pitcher are followed by segments showing the batter, and when the batter hits the ball thrown by the pitcher, the segments showing the batter are followed by segments showing the outfielders.
  • in the video data of a baseball game, segments representing the pitcher are detected as one chain,
  • segments representing the batter are detected as another chain,
  • and segments representing the outfielders and various other scenes are detected as other chains.
  • the play structure in such a sportscast therefore forms a cyclic pattern detectable using the aforementioned cyclic chain detection method.
  • another example of such a play structure is found in video data of tennis matches.
  • the video data of tennis game is composed of cycles such as a serve, a volley, a serve and then a volley.
  • the video signal processor 10 uses such segments to detect a play.
  • the video signal processor 10 can detect an approximate game play structure.
  • broadcast video data of ski jumping are generally composed of segments showing a jumper getting ready to start, skiing down the approach, jumping, and landing.
  • the video data is a repetition of a series of segments for each jumper.
  • further restrictions have to be set to exclude inappropriate chains; which restrictions are appropriate for this purpose depends on the sport under consideration.
  • for example, the video signal processor 10 may use an empirical rule that, among the detected cyclic chains, only those having long cycles are detected as plays.
  • the video signal processor 10 effects a series of operations as outlined in FIG. 18 to use the chain detection method for automatic detection of plays in a sportscast.
  • the video signal processor 10 detects cyclic chains using the aforementioned cyclic chain detection algorithm.
  • the video signal processor 10 applies a quality criterion to the detected chains list to filter it and remove insubstantial chains.
  • one example of a quality criterion is whether a cyclic chain covers the majority of the program: such cyclic chains are kept, while the other cyclic chains are excluded.
  • the video signal processor 10 may additionally adopt a restrictive condition specific to each sport.
  • the video signal processor 10 can automatically detect a play in a sportscast by analyzing the chains.
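The majority-coverage quality criterion, optionally combined with the empirical long-cycle rule mentioned earlier, can be sketched as a filter over cyclic chains. Each cyclic chain is assumed to be a list of (start, end) cycles; the one-half majority threshold follows the text, while measuring coverage by the chain's overall span, the `min_cycle` parameter and the function name are hypothetical.

```python
from typing import List, Tuple

Cycle = Tuple[float, float]


def keep_play_chains(chains: List[List[Cycle]], program_length: float,
                     min_cycle: float = 0.0) -> List[List[Cycle]]:
    kept = []
    for cycles in chains:
        covered = cycles[-1][1] - cycles[0][0]  # span of the chain
        mean_cycle = sum(e - s for s, e in cycles) / len(cycles)
        # Quality criterion: the chain covers the majority of the program.
        # Optional empirical rule: only long cycles count as plays.
        if covered / program_length > 0.5 and mean_cycle >= min_cycle:
            kept.append(cycles)
    return kept
```

A chain of 60-second plays spanning 3000 of 3600 seconds passes the coverage test, while a brief replay chain does not; raising `min_cycle` above 60 would exclude the plays as well.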
  • the video data of many TV programs, such as dramas, comedies, and variety shows, are composed of the aforementioned scenes.
  • such video data have a so-called “topic” structure as a higher-level structure.
  • some topic structures consist of a sequence of several related scenes.
  • a topic does not always begin, as a topic in a newscast does, with introductory segments by a studio emcee. For example, segments of a logo image or segments of the main emcee may serve as visual cues in place of the introductory segments, and the same program music played at the start of every new topic may serve as an audio cue.
  • the video signal processor 10 effects a series of operations as outlined in FIG. 19 to detect a topic by the combination of the cyclic detection using chains and the scene detection.
  • at step S 121, the video signal processor 10 detects basic similarity chains to obtain a basic similarity chains list.
  • at step S 122, the video signal processor 10 detects cyclic chains to obtain a cyclic chains list.
  • at step S 123, the video signal processor 10 applies the algorithm shown in FIG. 16 to the basic similarity chains list detected at step S 121 to extract scene structures.
  • the video signal processor 10 can acquire a scenes list.
  • at step S 124, the video signal processor 10 compares the cyclic chains list detected at step S 122 with each scene detected at step S 123, and removes all cyclic chains whose cycles are shorter than the scenes in the detected scenes list. Each cycle of the remaining cyclic chains contains one or more scenes and is taken as a candidate topic.
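Step S 124 can be sketched as keeping only the cyclic chains in which every cycle contains at least one whole detected scene, which discards chains whose cycles are shorter than the scenes. Representing cycles and scenes as (start, end) pairs, and the name `topic_candidates`, are assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def topic_candidates(cyclic_chains: List[List[Interval]],
                     scenes: List[Interval]) -> List[List[Interval]]:
    def holds_scene(cycle: Interval) -> bool:
        cs, ce = cycle
        return any(cs <= s and e <= ce for s, e in scenes)

    # Keep a chain only if every one of its cycles holds some scene;
    # each such cycle is then a candidate topic.
    return [c for c in cyclic_chains if all(holds_scene(cy) for cy in c)]
```

A logo-driven chain with cycles (0, 90) and (90, 200) survives because each cycle spans whole scenes, whereas a chain with 20-second cycles inside a 40-second scene is removed.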
  • the video signal processor 10 can detect topics by a combination of the cyclic detection using the chains and the scene detection.
  • the video signal processor 10 can detect topics with higher accuracy by imposing additional restrictions and quality requirements on the operations at step S 124.
  • the video signal processor 10 can determine and output a variety of local video structures and/or global video structures.
  • the video signal processor 10 can detect similarity chains composed of a plurality of visual or audio segments similar to each other. Then, the video signal processor 10 can analyze the detected similarity chains to extract high-level video structures. In particular, the video signal processor 10 can analyze local and global video structures within a common framework.
  • the video signal processor 10 can perform the series of operations fully automatically, without requiring the user to know the structure of the video data content in advance.
  • the video signal processor 10 can analyze the video structure consecutively. Further, if the platform used in the video signal processor 10 has sufficient computing capability, the video signal processor 10 can analyze the video structure in real time. Thus, the video signal processor 10 can be applied both to prerecorded video data and to live video broadcasts; for example, it is applicable to a live sportscast to detect plays.
  • as a result of the video structure detection, the video signal processor 10 can provide a new, high-level basis for video browsing. That is, the video signal processor 10 enables content-based access to video data using high-level video structures such as topics rather than segments. For example, by displaying scenes, the video signal processor 10 lets the user quickly grasp the outline of a program and rapidly find parts of interest.
  • the video signal processor 10 can provide a powerful new way of accessing a newscast by permitting the user to select, watch, and listen to the news item by item.
  • as a result of the video structure detection, the video signal processor 10 can provide a basis for automatic summarization of video data.
  • a coherent summary should be prepared not simply by combining significant segments of the video data, but by decomposing the video data into meaningful components from which it can be reconstructed, and combining appropriate segments based on those components.
  • video structures detected by the video signal processor 10 provide the basic information for such summarization.
  • the video signal processor 10 can classify video data into genres.
  • for example, the video signal processor 10 can be adapted to detect only tennis matches.
  • the video signal processor 10 can be incorporated in a video editing system in a broadcasting station to edit video data based on its contents.
  • the video signal processor 10 can be used as a home appliance to analyze data recorded on a home video recorder and automatically extract video structures from the data. Further, the video signal processor 10 can be used to summarize the contents of video data and edit the video data based on its contents.
  • the video signal processor 10 can be used as a tool to complement the user's manual analysis of the contents of video data.
  • the video signal processor 10 can make navigation of the contents, and analysis of the video structure, of video data easier by visualizing the results of chain detection.
  • the video signal processor 10 can be applied to home electronic appliances such as set-top box, digital video recorder, home server, etc.
  • the present invention provides a signal processing method for detecting and analyzing a pattern reflecting the semantics of the content of a signal, the method including the steps of: extracting, from a segment consisting of a sequence of consecutive frames forming the signal, at least one feature which characterizes the properties of the segment; calculating, using the extracted feature, a criterion for measuring the similarity between a pair of segments for every extracted feature, and measuring the similarity between a pair of segments according to the similarity measurement criterion; and detecting, using the feature and the similarity determination criterion, a similarity chain consisting of two or more mutually similar segments.
  • the signal processing method can thus detect basic structure patterns composed of similar segments in a signal and analyze how the structure patterns are combined, thereby making it possible to extract high-level structures.
  • the present invention provides a video signal processor for detecting and analyzing a visual and/or audio pattern reflecting the semantics of the content of a supplied video signal
  • the apparatus including means for extracting, from a visual and/or audio segment consisting of a sequence of consecutive visual and/or audio frames forming the video signal, at least one feature which characterizes the properties of the visual and/or audio segment; means for calculating, using the extracted feature, a criterion for measuring the similarity between a pair of visual segments and/or audio segments for every extracted feature, and measuring the similarity between a pair of visual segments and/or audio segments according to the similarity measurement criterion; and means for detecting, using the feature and the similarity determination criterion, a similarity chain consisting of two or more mutually similar visual and/or audio segments.
  • the video signal processor can thus determine and output basic structure patterns, each composed of similar visual and/or audio segments in a video signal, and analyze how the structure patterns are combined, thereby making it possible to extract high-level video structures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Television Signal Processing For Recording (AREA)
US09/647,303 1999-01-29 2000-01-27 Signal processing method and video/voice processing device Expired - Fee Related US6744922B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2306999 1999-01-29
JP11-023069 1999-01-29
PCT/JP2000/000422 WO2000045603A1 (fr) 1999-01-29 2000-01-27 Signal processing method and video/voice signal processing device

Publications (1)

Publication Number Publication Date
US6744922B1 true US6744922B1 (en) 2004-06-01

Family

ID=12100124

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/647,303 Expired - Fee Related US6744922B1 (en) 1999-01-29 2000-01-27 Signal processing method and video/voice processing device

Country Status (3)

Country Link
US (1) US6744922B1 (fr)
EP (1) EP1067800A4 (fr)
WO (1) WO2000045603A1 (fr)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020146168A1 (en) * 2001-03-23 2002-10-10 Lg Electronics Inc. Anchor shot detection method for a news video browsing system
US20020191012A1 (en) * 2001-05-10 2002-12-19 Markus Baumeister Display of follow-up information relating to information items occurring in a multimedia device
US20040008789A1 (en) * 2002-07-10 2004-01-15 Ajay Divakaran Audio-assisted video segmentation and summarization
US20040085323A1 (en) * 2002-11-01 2004-05-06 Ajay Divakaran Video mining using unsupervised clustering of video content
US20050108622A1 (en) * 1999-01-30 2005-05-19 Lg Electronics Inc. Method of describing multiple level digest segment information scheme for multimedia contents and apparatus for generating digest stream from described multiple level digest segment information scheme and method thereof
US20050200762A1 (en) * 2004-01-26 2005-09-15 Antonio Barletta Redundancy elimination in a content-adaptive video preview system
US20060110057A1 (en) * 2004-11-23 2006-05-25 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US20060165375A1 (en) * 2004-11-26 2006-07-27 Samsung Electronics Co., Ltd Recordable PVR using metadata and recording control method thereof
US20060228048A1 (en) * 2005-04-08 2006-10-12 Forlines Clifton L Context aware video conversion method and playback system
US20070296863A1 (en) * 2006-06-12 2007-12-27 Samsung Electronics Co., Ltd. Method, medium, and system processing video data
US20090103886A1 (en) * 2005-06-27 2009-04-23 Matsushita Electric Industrial Co., Ltd. Same scene detection method, device, and storage medium containing program
US7596549B1 (en) 2006-04-03 2009-09-29 Qurio Holdings, Inc. Methods, systems, and products for analyzing annotations for related content
US20090257649A1 (en) * 2005-08-17 2009-10-15 Masaki Yamauchi Video scene classification device and video scene classification method
US20100067745A1 (en) * 2008-09-16 2010-03-18 Ivan Kovtun System and method for object clustering and identification in video
US7779004B1 (en) 2006-02-22 2010-08-17 Qurio Holdings, Inc. Methods, systems, and products for characterizing target systems
US7840903B1 (en) 2007-02-26 2010-11-23 Qurio Holdings, Inc. Group content representations
US20100329563A1 (en) * 2007-11-01 2010-12-30 Gang Luo System and Method for Real-time New Event Detection on Video Streams
US20110149169A1 (en) * 2008-06-26 2011-06-23 Nec Corporation High-quality content generating system, method therefor, and program
US8005841B1 (en) * 2006-04-28 2011-08-23 Qurio Holdings, Inc. Methods, systems, and products for classifying content segments
US8046814B1 (en) * 2003-10-22 2011-10-25 The Weather Channel, Inc. Systems and methods for formulating and delivering video having perishable information
US20120076357A1 (en) * 2010-09-24 2012-03-29 Kabushiki Kaisha Toshiba Video processing apparatus, method and system
US20120106798A1 (en) * 2009-07-01 2012-05-03 Nec Corporation System and method for extracting representative feature
US8200010B1 (en) * 2007-09-20 2012-06-12 Google Inc. Image segmentation by clustering web images
CN102056026B (zh) * 2009-11-06 2013-04-03 中国移动通信集团设计院有限公司 Audio-video synchronization detection method and system, and speech detection method and system
US20130156321A1 (en) * 2011-12-16 2013-06-20 Shigeru Motoi Video processing apparatus and method
US8615573B1 (en) 2006-06-30 2013-12-24 Quiro Holdings, Inc. System and method for networked PVR storage and content capture
US8923607B1 (en) 2010-12-08 2014-12-30 Google Inc. Learning sports highlights using event detection
US20160155001A1 (en) * 2013-07-18 2016-06-02 Longsand Limited Identifying stories in media content
US20160172004A1 (en) * 2014-01-07 2016-06-16 Panasonic Intellectual Property Management Co., Ltd. Video capturing apparatus
US9436876B1 (en) * 2014-12-19 2016-09-06 Amazon Technologies, Inc. Video segmentation techniques
US9473803B2 (en) * 2014-08-08 2016-10-18 TCL Research America Inc. Personalized channel recommendation method and system
US20180101540A1 (en) * 2016-10-10 2018-04-12 Facebook, Inc. Diversifying Media Search Results on Online Social Networks
US11847829B2 (en) 2020-03-23 2023-12-19 Alibaba Group Holding Limited Method, apparatus, electronic device, and computer storage medium for video processing

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
US7296231B2 (en) * 2001-08-09 2007-11-13 Eastman Kodak Company Video structuring by probabilistic merging of video segments
GB0406504D0 (en) * 2004-03-23 2004-04-28 British Telecomm Method and system for detecting audio and video scene changes
JP4215681B2 (ja) * 2004-05-26 2009-01-28 株式会社東芝 Moving image processing apparatus and method
KR20060116335A (ko) * 2005-05-09 2006-11-15 삼성전자주식회사 Apparatus and method for summarizing video using events, and computer-readable recording medium storing a computer program for controlling the apparatus
CN101395607B (zh) 2006-03-03 2011-10-05 皇家飞利浦电子股份有限公司 Method and device for automatically generating a summary of a plurality of images
FR2909506A1 (fr) * 2006-12-01 2008-06-06 France Telecom Structuring of a digital data stream
EP2291995A1 (fr) * 2008-06-24 2011-03-09 Koninklijke Philips Electronics N.V. Image processing
WO2010055242A1 (fr) * 2008-11-13 2010-05-20 France Telecom Method for segmenting multimedia content, and corresponding device and computer program
CN111385670A (zh) * 2018-12-27 2020-07-07 深圳Tcl新技术有限公司 Method, system, apparatus and storage medium for playing video clips of a target character

Patent Citations (9)

Publication number Priority date Publication date Assignee Title
JPH0259976A (ja) 1988-08-26 1990-02-28 Matsushita Electric Works Ltd Block integration processing method
JPH07193748A (ja) 1993-12-27 1995-07-28 Nippon Telegr & Teleph Corp <Ntt> Moving image processing method and apparatus
US5664227A (en) * 1994-10-14 1997-09-02 Carnegie Mellon University System and method for skimming digital audio/video data
EP0711078A2 (fr) 1994-11-04 1996-05-08 Matsushita Electric Industrial Co., Ltd. Apparatus for encoding and decoding images
US5751377A (en) 1994-11-04 1998-05-12 Matsushita Electric Industrial Co., Ltd. Picture coding apparatus and decoding apparatus
JPH08181995A (ja) 1994-12-21 1996-07-12 Matsushita Electric Ind Co Ltd Moving image encoding apparatus and moving image decoding apparatus
US5821945A (en) * 1995-02-03 1998-10-13 The Trustees Of Princeton University Method and apparatus for video browsing based on content and structure
JPH10257436A (ja) 1997-03-10 1998-09-25 Atsushi Matsushita Method for automatically structuring moving images hierarchically, and browsing method using the same
US6278446B1 (en) * 1998-02-23 2001-08-21 Siemens Corporate Research, Inc. System for interactive organization and browsing of video

Non-Patent Citations (11)

Title
Aoki, et al. "A shot classification method of selecting effective key-frames for video browsing", ACM, pp. 1-10, 1996.* *
Maybury, et al. "Multimedia summaries of broadcast news", IEEE, pp. 442-449, 1997.* *
Merlino, et al. "Broadcast news navigation using story segmentation", ACM, pp. 381-391, 1997.* *
Nakamura, et al. "Semantic analysis for video contents extraction-spotting by association in news video", ACM, pp. 393-401, 1997.* *
Rui, et al. "Exploring video structure beyond the shots", IEEE, pp. 1-4, 1998.* *
Taskiran, et al. "A compressed video database structured for active browsing and search", IEEE, pp. 133-137, 1998.* *
Vasconcelos, et al. "Bayesian modeling of video editing and structure: semantic features for video summarization and browsing", IEEE, pp. 153-157, 1998.* *
Yeung, et al. "Extracting story units from long programs for video browsing and navigation", IEEE, pp. 296-305, 1996.* *
Yeung, et al. "Time-constrained clustering for segmentation of video into story units", IEEE, pp. 375-380, 1996.* *
Yeung, et al. "Video visualization for compact presentation and fast browsing of pictorial content", IEEE, pp. 771-785, 1997. *
Yoshitaka, et al. "Content-based retrieval of video data by the grammar of film", IEEE, pp. 310-317, 1997.* *

Cited By (55)

Publication number Priority date Publication date Assignee Title
US20050108622A1 (en) * 1999-01-30 2005-05-19 Lg Electronics Inc. Method of describing multiple level digest segment information scheme for multimedia contents and apparatus for generating digest stream from described multiple level digest segment information scheme and method thereof
US7406655B2 (en) 1999-01-30 2008-07-29 Lg Electronics, Inc. Method of describing multiple level digest segment information scheme for multimedia contents and apparatus for generating digest stream from described multiple level digest segment information scheme and method thereof
US7392467B2 (en) * 1999-01-30 2008-06-24 Lg Electronics, Inc. Method of describing multiple level digest segment information scheme for multimedia contents and apparatus for generating digest stream from described multiple level digest segment information scheme and method thereof
US20020146168A1 (en) * 2001-03-23 2002-10-10 Lg Electronics Inc. Anchor shot detection method for a news video browsing system
US20020191012A1 (en) * 2001-05-10 2002-12-19 Markus Baumeister Display of follow-up information relating to information items occurring in a multimedia device
US7149972B2 (en) * 2001-05-10 2006-12-12 Koninklijke Philips Electronics N.V. Display of follow-up information relating to information items occurring in a multimedia device
US7349477B2 (en) * 2002-07-10 2008-03-25 Mitsubishi Electric Research Laboratories, Inc. Audio-assisted video segmentation and summarization
US20040008789A1 (en) * 2002-07-10 2004-01-15 Ajay Divakaran Audio-assisted video segmentation and summarization
US20040085323A1 (en) * 2002-11-01 2004-05-06 Ajay Divakaran Video mining using unsupervised clustering of video content
US7375731B2 (en) * 2002-11-01 2008-05-20 Mitsubishi Electric Research Laboratories, Inc. Video mining using unsupervised clustering of video content
US8046814B1 (en) * 2003-10-22 2011-10-25 The Weather Channel, Inc. Systems and methods for formulating and delivering video having perishable information
US8090200B2 (en) * 2004-01-26 2012-01-03 Sony Deutschland Gmbh Redundancy elimination in a content-adaptive video preview system
US20050200762A1 (en) * 2004-01-26 2005-09-15 Antonio Barletta Redundancy elimination in a content-adaptive video preview system
US7650031B2 (en) * 2004-11-23 2010-01-19 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US20060110057A1 (en) * 2004-11-23 2006-05-25 Microsoft Corporation Method and system for detecting black frames in a sequence of frames
US20060165375A1 (en) * 2004-11-26 2006-07-27 Samsung Electronics Co., Ltd Recordable PVR using metadata and recording control method thereof
US20060228048A1 (en) * 2005-04-08 2006-10-12 Forlines Clifton L Context aware video conversion method and playback system
US7526725B2 (en) * 2005-04-08 2009-04-28 Mitsubishi Electric Research Laboratories, Inc. Context aware video conversion method and playback system
US20090103886A1 (en) * 2005-06-27 2009-04-23 Matsushita Electric Industrial Co., Ltd. Same scene detection method, device, and storage medium containing program
US20090257649A1 (en) * 2005-08-17 2009-10-15 Masaki Yamauchi Video scene classification device and video scene classification method
US8233708B2 (en) 2005-08-17 2012-07-31 Panasonic Corporation Video scene classification device and video scene classification method
US7779004B1 (en) 2006-02-22 2010-08-17 Qurio Holdings, Inc. Methods, systems, and products for characterizing target systems
US7596549B1 (en) 2006-04-03 2009-09-29 Qurio Holdings, Inc. Methods, systems, and products for analyzing annotations for related content
US8005841B1 (en) * 2006-04-28 2011-08-23 Qurio Holdings, Inc. Methods, systems, and products for classifying content segments
US20070296863A1 (en) * 2006-06-12 2007-12-27 Samsung Electronics Co., Ltd. Method, medium, and system processing video data
US9118949B2 (en) 2006-06-30 2015-08-25 Qurio Holdings, Inc. System and method for networked PVR storage and content capture
US8615573B1 (en) 2006-06-30 2013-12-24 Quiro Holdings, Inc. System and method for networked PVR storage and content capture
US7840903B1 (en) 2007-02-26 2010-11-23 Qurio Holdings, Inc. Group content representations
US8200010B1 (en) * 2007-09-20 2012-06-12 Google Inc. Image segmentation by clustering web images
US8428360B2 (en) * 2007-11-01 2013-04-23 International Business Machines Corporation System and method for real-time new event detection on video streams
US20100329563A1 (en) * 2007-11-01 2010-12-30 Gang Luo System and Method for Real-time New Event Detection on Video Streams
US9215479B2 (en) 2007-11-01 2015-12-15 International Business Machines Corporation System and method for real-time new event detection on video streams
US20110149169A1 (en) * 2008-06-26 2011-06-23 Nec Corporation High-quality content generating system, method therefor, and program
US8879004B2 (en) * 2008-06-26 2014-11-04 Nec Corporation High-quality content generation system, method therefor, and program
US8150169B2 (en) * 2008-09-16 2012-04-03 Viewdle Inc. System and method for object clustering and identification in video
US20100067745A1 (en) * 2008-09-16 2010-03-18 Ivan Kovtun System and method for object clustering and identification in video
US20120106798A1 (en) * 2009-07-01 2012-05-03 Nec Corporation System and method for extracting representative feature
US9361517B2 (en) * 2009-07-01 2016-06-07 Nec Corporation System and method for extracting representative feature
CN102056026B (zh) * 2009-11-06 2013-04-03 中国移动通信集团设计院有限公司 Audio/video synchronization detection method and system, and speech detection method and system
US8879788B2 (en) * 2010-09-24 2014-11-04 Kabushiki Kaisha Toshiba Video processing apparatus, method and system
US20120076357A1 (en) * 2010-09-24 2012-03-29 Kabushiki Kaisha Toshiba Video processing apparatus, method and system
US8923607B1 (en) 2010-12-08 2014-12-30 Google Inc. Learning sports highlights using event detection
US11556743B2 (en) 2010-12-08 2023-01-17 Google Llc Learning highlights using event detection
US10867212B2 (en) 2010-12-08 2020-12-15 Google Llc Learning highlights using event detection
US9715641B1 (en) 2010-12-08 2017-07-25 Google Inc. Learning highlights using event detection
US20130156321A1 (en) * 2011-12-16 2013-06-20 Shigeru Motoi Video processing apparatus and method
US8873861B2 (en) * 2011-12-16 2014-10-28 Kabushiki Kaisha Toshiba Video processing apparatus and method
US9734408B2 (en) * 2013-07-18 2017-08-15 Longsand Limited Identifying stories in media content
US20160155001A1 (en) * 2013-07-18 2016-06-02 Longsand Limited Identifying stories in media content
US20160172004A1 (en) * 2014-01-07 2016-06-16 Panasonic Intellectual Property Management Co., Ltd. Video capturing apparatus
US9473803B2 (en) * 2014-08-08 2016-10-18 TCL Research America Inc. Personalized channel recommendation method and system
US10528821B2 (en) 2014-12-19 2020-01-07 Amazon Technologies, Inc. Video segmentation techniques
US9436876B1 (en) * 2014-12-19 2016-09-06 Amazon Technologies, Inc. Video segmentation techniques
US20180101540A1 (en) * 2016-10-10 2018-04-12 Facebook, Inc. Diversifying Media Search Results on Online Social Networks
US11847829B2 (en) 2020-03-23 2023-12-19 Alibaba Group Holding Limited Method, apparatus, electronic device, and computer storage medium for video processing

Also Published As

Publication number Publication date
EP1067800A1 (fr) 2001-01-10
WO2000045603A1 (fr) 2000-08-03
EP1067800A4 (fr) 2005-07-27

Similar Documents

Publication Publication Date Title
US6744922B1 (en) Signal processing method and video/voice processing device
US6928233B1 (en) Signal processing method and video signal processor for detecting and analyzing a pattern reflecting the semantics of the content of a signal
US7027508B2 (en) AV signal processing apparatus for detecting a boundary between scenes, method and recording medium therefore
US8233708B2 (en) Video scene classification device and video scene classification method
US7474698B2 (en) Identification of replay segments
Del Fabro et al. State-of-the-art and future challenges in video scene detection: a survey
CN100409236C (zh) Streaming video bookmarks
US6931595B2 (en) Method for automatic extraction of semantically significant events from video
Hanjalic Generic approach to highlights extraction from a sport video
Han et al. An integrated baseball digest system using maximum entropy method
US20080044085A1 (en) Method and apparatus for playing back video, and computer program product
EP1067786B1 (fr) Data description method and data processing unit
US20110243529A1 (en) Electronic apparatus, content recommendation method, and program therefor
Li et al. A general framework for sports video summarization with its application to soccer
US7214868B2 (en) Acoustic signal processing apparatus and method, signal recording apparatus and method and program
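JP2000285243A (ja) Signal processing method and video/audio processing device
TWI408950B (zh) System and method for analyzing sports video, and computer-readable recording medium storing a program
JP5257356B2 (ja) Content division position determination device, content viewing control device, and program
JP2000285242A (ja) Signal processing method and video/audio processing device
CN1679027A (zh) Device and method for detecting content properties in a sequence of video images
US20100079673A1 Video processing apparatus and method thereof
KR100510098B1 (ko) Apparatus and method for automatically detecting golf video events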
Chudasama et al. Views Detection from Cricket Video using Low Level Features.
Waseemullah et al. Unsupervised Ads Detection in TV Transmissions
Kyperountas et al. Scene change detection using audiovisual clues

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY CORPORATION, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WALKER, TOBY;REEL/FRAME:011165/0422

Effective date: 20000920

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20160601