JP4340907B2 - Method and apparatus for creating audio-visual summaries - Google Patents
Method and apparatus for creating audio-visual summaries
- Publication number
- JP4340907B2 (application number JP2005107342A)
- Authority
- JP
- Japan
- Prior art keywords
- audio
- image
- segments
- visual
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000000034 method Methods 0.000 title claims abstract description 63
- 230000000007 visual effect Effects 0.000 title description 54
- 238000010801 machine learning Methods 0.000 claims abstract description 23
- 238000013528 artificial neural network Methods 0.000 claims abstract description 3
- 238000003066 decision tree Methods 0.000 claims abstract description 3
- 238000012549 training Methods 0.000 claims description 9
- 230000033001 locomotion Effects 0.000 claims description 7
- 230000001427 coherent effect Effects 0.000 claims description 5
- 238000013459 approach Methods 0.000 abstract description 9
- 238000005192 partition Methods 0.000 description 20
- 239000013598 vector Substances 0.000 description 10
- 230000001174 ascending effect Effects 0.000 description 6
- 238000001514 detection method Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 238000004880 explosion Methods 0.000 description 4
- 238000000354 decomposition reaction Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000006399 behavior Effects 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 101100072002 Arabidopsis thaliana ICME gene Proteins 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 238000009472 formulation Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 230000002045 lasting effect Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000013178 mathematical model Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/16—Analogue secrecy systems; Analogue subscription systems
- H04N7/162—Authorising the user terminal, e.g. by paying; Registering the use of a subscription channel, e.g. billing
- H04N7/165—Centralised control of user terminal ; Registering at central
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/438—Presentation of query results
- G06F16/4387—Presentation of query results by the use of playlists
- G06F16/4393—Multimedia presentations, e.g. slide shows, multimedia albums
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/73—Querying
- G06F16/738—Presentation of query results
- G06F16/739—Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/266—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
- H04N21/26603—Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/434—Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
- H04N21/4341—Demultiplexing of audio and video streams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Library & Information Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Computer Security & Cryptography (AREA)
- Computational Linguistics (AREA)
- Television Signal Processing For Recording (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
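A typical video program contains both an audio track and an image track, either of which may run long and without interruption. To create a summary of such a video program, both the audio track and the image track that make up the video must be segmented into meaningful, manageable operation units. For example, meaningful audio operation units include the utterance of a single word, a single phrase, or a single sentence, or any other audio segment with a coherent acoustic profile. Similarly, possible image operation units include a single camera shot, a series of consecutive camera shots, and a cluster of image frames grouped by some criterion.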
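As described above, FIG. 1 is a schematic flow diagram showing the operation of one embodiment of the machine-learning-based video summarization system and method. The system examines the image track and the audio track of the input video. In addition, the system can examine any closed captions associated with the input video. The video summarization system and method can perform alignment among these three input components: the closed captions, the audio track, and the image track. Feature extraction and specialized operations can also be performed on each input component. The extracted features and the output of each component operation are then fed into the machine-learning-based summarization module to create either an integrated audio-visual summary, or an audio-centric or image-centric summary. The following operations are generally performed on each of the input components.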
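After the audio and visual summaries have been created, the last problem to solve is how to synchronize the two. Let V = (I, A) be a video sequence consisting of an audio track A and an image track I. The audio summary of V is written A_sum = {A(t_i, τ_i) ∈ A | i = 1, ..., N(A_sum)}, where A(t_i, τ_i) denotes an audio segment that starts at time t_i and lasts for duration τ_i, and N(A_sum) denotes the number of audio segments that make up A_sum. All audio segments in A_sum are arranged in ascending order of their start times t_i. Similarly, the visual summary of V is written I_sum = {I(t_j, τ_j) ∈ I | j = 1, ..., N(I_sum)}, with all components sorted in ascending order of their start times.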
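The notation above can be made concrete with a small data structure. The following is a minimal sketch, not the patent's implementation: the class name `Segment`, the helper `make_summary`, and all numeric values are hypothetical and chosen only for illustration. It represents a segment A(t, τ) or I(t, τ) by its start time and duration, and keeps each summary sorted in ascending order of start time as the text requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A segment A(t, tau) or I(t, tau): starts at `start`, lasts `duration`."""
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def make_summary(segments):
    """Return the segments in ascending order of start time."""
    return sorted(segments, key=lambda s: s.start)


# Hypothetical example values, not taken from the patent.
audio_summary = make_summary([Segment(40, 10), Segment(0, 10), Segment(20, 5)])
image_summary = make_summary([Segment(30, 10), Segment(0, 10)])

n_audio = len(audio_summary)   # plays the role of N(A_sum)
n_image = len(image_summary)   # plays the role of N(I_sum)
audio_length = sum(s.duration for s in audio_summary)
```

Keeping both summaries as time-ordered lists means any later synchronization step can walk the two lists in parallel instead of searching them.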
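As described above, systems and methods for video summarization based on a machine learning framework require training data consisting of a sufficient number of sample video summaries created in advance by human experts. Such systems and methods can create video summaries by learning from the experts' sample summaries and mimicking the behavior those samples exhibit. In some cases, however, obtaining expert-made sample video summaries is too expensive or very difficult. In such cases, it is preferable to provide systems and methods that do not require training data.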
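... time slots can be provided, so the total number of available time slots is $S_{total} = \sum_{i=1}^{P} S_i$. The problem can now be stated as follows: given a total of O frame clusters and S_total time slots for the video summary, determine the optimal matching between frame clusters and time slots that satisfies the two constraints above.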
Cluster 1, consisting of I(0,10),
Cluster 2, consisting of I(10,10) and I(50,10),
Cluster 3, consisting of I(30,10),
Cluster 4, consisting of I(20,10) and I(40,10),
Cluster 5, consisting of I(60,10).
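The five clusters listed above can be written down directly as data. This is only an illustrative sketch: the dictionary layout and the variable names are assumptions, and treating the cluster count as the O of the matching problem is likewise an assumption, since this excerpt does not say so explicitly. Each pair (t, τ) stands for an image segment I(t, τ).

```python
# Each pair (t, tau) stands for an image segment I(t, tau) from the list above.
clusters = {
    1: [(0, 10)],
    2: [(10, 10), (50, 10)],
    3: [(30, 10)],
    4: [(20, 10), (40, 10)],
    5: [(60, 10)],
}

# Number of frame clusters (assumed to correspond to O in the text).
num_clusters = len(clusters)

# Flatten to (start, duration, cluster id) and list the segments in time order.
timeline = sorted(
    (t, tau, cid) for cid, segs in clusters.items() for (t, tau) in segs
)
cluster_sequence = [cid for _, _, cid in timeline]

# Check that consecutive segments meet end to start, with no gaps or overlaps.
contiguous = all(
    t + tau == next_t
    for (t, tau, _), (next_t, _, _) in zip(timeline, timeline[1:])
)
```

Read in time order, the segments visit the clusters in the sequence 1, 2, 4, 3, 4, 2, 5, which shows that clusters 2 and 4 each group frames from two non-adjacent parts of the timeline.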
Claims (3)
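- A method of creating an audio-centric audio-visual summary of a video program having an audio track and an image track, the method comprising:
selecting a time length Lsum for the audio-visual summary;
an identifying step of identifying one or more speech units and non-speech units from the audio track in accordance with a machine learning method that, based on given audio characteristics, image characteristics, and text characteristics related to the desired content of the audio-visual summary, predicts, relying on training data, for each audio segment in the video program the probability that it is included in the audio-visual summary;
adding one or more audio segments to the audio-visual summary in descending order of the probability until the time length Lsum is reached;
segmenting the image track into image segments having coherent image profiles and motion profiles for use as image operation units; and
selecting one or more image segments corresponding to the one or more added audio segments,
wherein the identifying step comprises:
detecting, from the audio track, non-speech audio segments containing non-speech sounds;
segmenting the non-speech audio segments into non-speech units having coherent acoustic profiles;
generating audio characteristics of each non-speech unit for calculating the probability;
removing the non-speech audio segments from the audio track;
performing speech recognition on the remaining audio segments of the audio track from which the non-speech audio segments have been removed, to generate a speech transcript;
generating, based on the speech transcript, speech units having meaningful speech content; and
generating audio characteristics of each speech unit for calculating the probability.
- The method of claim 1, further comprising, when closed captions are present, synchronizing the closed captions with the speech transcript.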
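- The method of claim 1, wherein the probability is calculated according to a method selected from the group consisting of a naive Bayes method, a decision tree method, a neural network method, and a maximum entropy method.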
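The selection step of claim 1 (adding audio segments in descending order of probability until the time length Lsum is reached) can be sketched as follows. This is a hedged illustration, not the claimed implementation: the names `AudioSegment` and `select_audio_segments` are hypothetical, the probabilities are made-up inputs standing in for the output of whichever classifier is used, and skipping a segment that would overshoot Lsum (rather than truncating it) is an assumption the claims do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSegment:
    start: float        # start time t
    duration: float     # duration tau
    probability: float  # predicted probability of inclusion in the summary


def select_audio_segments(segments, l_sum):
    """Add segments in descending order of probability until l_sum is reached."""
    chosen, total = [], 0.0
    for seg in sorted(segments, key=lambda s: s.probability, reverse=True):
        if total >= l_sum:
            break
        # Assumption: a segment that would overshoot l_sum is skipped, not cut.
        if total + seg.duration <= l_sum:
            chosen.append(seg)
            total += seg.duration
    # Return the chosen segments in their original temporal order.
    return sorted(chosen, key=lambda s: s.start)


# Hypothetical probabilities, standing in for a trained model's predictions.
candidates = [
    AudioSegment(0, 10, 0.2),
    AudioSegment(10, 10, 0.9),
    AudioSegment(20, 15, 0.7),
    AudioSegment(35, 10, 0.8),
    AudioSegment(45, 5, 0.4),
]
summary = select_audio_segments(candidates, l_sum=30)
summary_length = sum(s.duration for s in summary)
```

Sorting the result by start time keeps the summary consistent with the ascending-start-time ordering used for the summaries in the description. A stricter reading of "until Lsum is reached" could instead stop at the first segment that does not fit.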
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25453400P | 2000-12-12 | 2000-12-12 | |
US10/011,215 US6925455B2 (en) | 2000-12-12 | 2001-10-25 | Creating audio-centric, image-centric, and integrated audio-visual summaries |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001376561A Division JP3705429B2 (ja) | 2000-12-12 | 2001-12-11 | Method for creating audio-visual summaries |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005309427A (ja) | 2005-11-04 |
JP4340907B2 (ja) | 2009-10-07 |
Family
ID=26682129
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001376561A Expired - Fee Related JP3705429B2 (ja) | 2000-12-12 | 2001-12-11 | Method for creating audio-visual summaries |
JP2005107342A Expired - Fee Related JP4340907B2 (ja) | 2000-12-12 | 2005-04-04 | Method and apparatus for creating audio-visual summaries |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001376561A Expired - Fee Related JP3705429B2 (ja) | 2000-12-12 | 2001-12-11 | Method for creating audio-visual summaries |
Country Status (2)
Country | Link |
---|---|
US (1) | US6925455B2 (ja) |
JP (2) | JP3705429B2 (ja) |
Families Citing this family (119)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8028314B1 (en) | 2000-05-26 | 2011-09-27 | Sharp Laboratories Of America, Inc. | Audiovisual information management system |
US8020183B2 (en) | 2000-09-14 | 2011-09-13 | Sharp Laboratories Of America, Inc. | Audiovisual management system |
US20030038796A1 (en) * | 2001-02-15 | 2003-02-27 | Van Beek Petrus J.L. | Segmentation metadata for audio-visual content |
US6520032B2 (en) * | 2001-03-27 | 2003-02-18 | Trw Vehicle Safety Systems Inc. | Seat belt tension sensing apparatus |
US20030163815A1 (en) * | 2001-04-06 | 2003-08-28 | Lee Begeja | Method and system for personalized multimedia delivery service |
US20030088687A1 (en) | 2001-12-28 | 2003-05-08 | Lee Begeja | Method and apparatus for automatically converting source video into electronic mail messages |
US8060906B2 (en) * | 2001-04-06 | 2011-11-15 | At&T Intellectual Property Ii, L.P. | Method and apparatus for interactively retrieving content related to previous query results |
US7904814B2 (en) | 2001-04-19 | 2011-03-08 | Sharp Laboratories Of America, Inc. | System for presenting audio-video content |
CA2386303C (en) | 2001-05-14 | 2005-07-05 | At&T Corp. | Method for content-based non-linear control of multimedia playback |
JP4426743B2 (ja) * | 2001-09-13 | 2010-03-03 | パイオニア株式会社 | Video information summarizing apparatus, video information summarizing method, and video information summarizing processing program |
US7474698B2 (en) | 2001-10-19 | 2009-01-06 | Sharp Laboratories Of America, Inc. | Identification of replay segments |
US8214741B2 (en) | 2002-03-19 | 2012-07-03 | Sharp Laboratories Of America, Inc. | Synchronization of video and data |
US6940540B2 (en) * | 2002-06-27 | 2005-09-06 | Microsoft Corporation | Speaker detection and tracking using audiovisual data |
US7657907B2 (en) | 2002-09-30 | 2010-02-02 | Sharp Laboratories Of America, Inc. | Automatic user profiling |
KR101109023B1 (ko) * | 2003-04-14 | 2012-01-31 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Method and apparatus for summarizing a music video using content analysis |
KR100708337B1 (ko) | 2003-06-27 | 2007-04-17 | 주식회사 케이티 | Apparatus and method for automatic video summarization using fuzzy-based OC-SVM |
US7372991B2 (en) * | 2003-09-26 | 2008-05-13 | Seiko Epson Corporation | Method and apparatus for summarizing and indexing the contents of an audio-visual presentation |
JP3848319B2 (ja) * | 2003-11-11 | 2006-11-22 | キヤノン株式会社 | Information processing method and information processing apparatus |
DE60319710T2 (de) * | 2003-11-12 | 2009-03-12 | Sony Deutschland Gmbh | Method and apparatus for automatic dissection of segmented audio signals |
EP1531458B1 (en) * | 2003-11-12 | 2008-04-16 | Sony Deutschland GmbH | Apparatus and method for automatic extraction of important events in audio signals |
EP1531478A1 (en) * | 2003-11-12 | 2005-05-18 | Sony International (Europe) GmbH | Apparatus and method for classifying an audio signal |
EP1538536A1 (en) * | 2003-12-05 | 2005-06-08 | Sony International (Europe) GmbH | Visualization and control techniques for multimedia digital content |
JP2007519987A (ja) * | 2003-12-05 | 2007-07-19 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | System and method for integrated analysis of intrinsic and extrinsic audio-visual data |
US8356317B2 (en) * | 2004-03-04 | 2013-01-15 | Sharp Laboratories Of America, Inc. | Presence based technology |
US8949899B2 (en) | 2005-03-04 | 2015-02-03 | Sharp Laboratories Of America, Inc. | Collaborative recommendation system |
US7594245B2 (en) | 2004-03-04 | 2009-09-22 | Sharp Laboratories Of America, Inc. | Networked video devices |
JP2006197115A (ja) * | 2005-01-12 | 2006-07-27 | Fuji Photo Film Co Ltd | Imaging apparatus and image output apparatus |
WO2007004110A2 (en) * | 2005-06-30 | 2007-01-11 | Koninklijke Philips Electronics N.V. | System and method for the alignment of intrinsic and extrinsic audio-visual information |
US8949235B2 (en) * | 2005-11-15 | 2015-02-03 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Methods and systems for producing a video synopsis using clustering |
DK1955205T3 (da) | 2005-11-15 | 2012-10-15 | Yissum Res Dev Co | Method and system for producing a video synopsis |
JP2009520278A (ja) * | 2005-12-16 | 2009-05-21 | ネクストバイオ | Systems and methods for scientific information knowledge management |
US9183349B2 (en) | 2005-12-16 | 2015-11-10 | Nextbio | Sequence-centric scientific information management |
US8364665B2 (en) * | 2005-12-16 | 2013-01-29 | Nextbio | Directional expression-based scientific information knowledge management |
US20070157228A1 (en) | 2005-12-30 | 2007-07-05 | Jason Bayer | Advertising with video ad creatives |
US8032840B2 (en) * | 2006-01-10 | 2011-10-04 | Nokia Corporation | Apparatus, method and computer program product for generating a thumbnail representation of a video sequence |
RU2440606C2 (ru) * | 2006-03-03 | 2012-01-20 | Конинклейке Филипс Электроникс Н.В. | Method and apparatus for automatically generating a summary of a plurality of images |
US8689253B2 (en) | 2006-03-03 | 2014-04-01 | Sharp Laboratories Of America, Inc. | Method and system for configuring media-playing sets |
US8682654B2 (en) * | 2006-04-25 | 2014-03-25 | Cyberlink Corp. | Systems and methods for classifying sports video |
WO2007127695A2 (en) | 2006-04-25 | 2007-11-08 | Elmo Weber Frank | Prefernce based automatic media summarization |
CN101485123B (zh) * | 2006-07-04 | 2014-08-20 | 皇家飞利浦电子股份有限公司 | Method of content substitution |
KR20090027758A (ko) * | 2006-07-04 | 2009-03-17 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Method of content substitution |
US20080085055A1 (en) * | 2006-10-06 | 2008-04-10 | Cerosaletti Cathleen D | Differential cluster ranking for image record access |
WO2008050649A1 (fr) * | 2006-10-23 | 2008-05-02 | Nec Corporation | Content summarizing system, method, and program |
US8677409B2 (en) * | 2007-01-05 | 2014-03-18 | At&T Intellectual Property I, L.P | Methods, systems, and computer program products for categorizing/rating content uploaded to a network for broadcasting |
BRPI0720802B1 (pt) | 2007-02-01 | 2021-10-19 | Briefcam, Ltd. | Method and system for generating a video synopsis from an uninterrupted video stream source such as one generated by a video security camera |
US8204359B2 (en) * | 2007-03-20 | 2012-06-19 | At&T Intellectual Property I, L.P. | Systems and methods of providing modified media content |
US9870796B2 (en) * | 2007-05-25 | 2018-01-16 | Tigerfish | Editing video using a corresponding synchronized written transcript by selection from a text viewer |
US20080300872A1 (en) * | 2007-05-31 | 2008-12-04 | Microsoft Corporation | Scalable summaries of audio or visual content |
WO2008150109A1 (en) * | 2007-06-04 | 2008-12-11 | Enswers Co., Ltd. | Method of processing moving picture and apparatus thereof |
WO2009111581A1 (en) * | 2008-03-04 | 2009-09-11 | Nextbio | Categorization and filtering of scientific data |
KR101614160B1 (ko) * | 2008-07-16 | 2016-04-20 | 한국전자통신연구원 | Multi-object audio encoding apparatus and decoding apparatus supporting a post-downmix signal |
US8259082B2 (en) | 2008-09-12 | 2012-09-04 | At&T Intellectual Property I, L.P. | Multimodal portable communication interface for accessing video content |
US20100070863A1 (en) * | 2008-09-16 | 2010-03-18 | International Business Machines Corporation | method for reading a screen |
US9141859B2 (en) | 2008-11-17 | 2015-09-22 | Liveclips Llc | Method and system for segmenting and transmitting on-demand live-action video in real-time |
US9141860B2 (en) * | 2008-11-17 | 2015-09-22 | Liveclips Llc | Method and system for segmenting and transmitting on-demand live-action video in real-time |
US9142216B1 (en) * | 2012-01-30 | 2015-09-22 | Jan Jannink | Systems and methods for organizing and analyzing audio content derived from media files |
US10002192B2 (en) * | 2009-09-21 | 2018-06-19 | Voicebase, Inc. | Systems and methods for organizing and analyzing audio content derived from media files |
US8707381B2 (en) * | 2009-09-22 | 2014-04-22 | Caption Colorado L.L.C. | Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs |
US9191639B2 (en) | 2010-04-12 | 2015-11-17 | Adobe Systems Incorporated | Method and apparatus for generating video descriptions |
CN102385861B (zh) * | 2010-08-31 | 2013-07-31 | 国际商业机器公司 | System and method for generating a text content summary from speech content |
JP5259670B2 (ja) * | 2010-09-27 | 2013-08-07 | 株式会社東芝 | Content summarizing apparatus and content summary display apparatus |
US9185469B2 (en) | 2010-09-30 | 2015-11-10 | Kodak Alaris Inc. | Summarizing image collection using a social network |
US9489732B1 (en) * | 2010-12-21 | 2016-11-08 | Hrl Laboratories, Llc | Visual attention distractor insertion for improved EEG RSVP target stimuli detection |
US8380711B2 (en) * | 2011-03-10 | 2013-02-19 | International Business Machines Corporation | Hierarchical ranking of facial attributes |
CN103186578A (zh) * | 2011-12-29 | 2013-07-03 | 方正国际软件(北京)有限公司 | Processing system and processing method for sound effects in comics |
US9367745B2 (en) | 2012-04-24 | 2016-06-14 | Liveclips Llc | System for annotating media content for automatic content understanding |
US20130283143A1 (en) | 2012-04-24 | 2013-10-24 | Eric David Petajan | System for Annotating Media Content for Automatic Content Understanding |
US9412372B2 (en) * | 2012-05-08 | 2016-08-09 | SpeakWrite, LLC | Method and system for audio-video integration |
US9699485B2 (en) * | 2012-08-31 | 2017-07-04 | Facebook, Inc. | Sharing television and video programming through social networking |
US10346542B2 (en) | 2012-08-31 | 2019-07-09 | Verint Americas Inc. | Human-to-human conversation analysis |
US9459768B2 (en) * | 2012-12-12 | 2016-10-04 | Smule, Inc. | Audiovisual capture and sharing framework with coordinated user-selectable audio and video effects filters |
US9158435B2 (en) * | 2013-03-15 | 2015-10-13 | International Business Machines Corporation | Synchronizing progress between related content from different mediums |
US9804729B2 (en) | 2013-03-15 | 2017-10-31 | International Business Machines Corporation | Presenting key differences between related content from different mediums |
US9495365B2 (en) | 2013-03-15 | 2016-11-15 | International Business Machines Corporation | Identifying key differences between related content from different mediums |
US20140362290A1 (en) * | 2013-06-06 | 2014-12-11 | Hallmark Cards, Incorporated | Facilitating generation and presentation of sound images |
US8947596B2 (en) * | 2013-06-27 | 2015-02-03 | Intel Corporation | Alignment of closed captions |
US9368106B2 (en) | 2013-07-30 | 2016-06-14 | Verint Systems Ltd. | System and method of automated evaluation of transcription quality |
CN104183239B (zh) * | 2014-07-25 | 2017-04-19 | 南京邮电大学 | Text-independent speaker recognition method based on a weighted Bayesian mixture model |
US20160098395A1 (en) * | 2014-10-01 | 2016-04-07 | Charter Communications Operating, Llc | System and method for separate audio program translation |
CN107005676A (zh) * | 2014-12-15 | 2017-08-01 | 索尼公司 | Information processing method, video processing apparatus, and program |
KR102306538B1 (ko) * | 2015-01-20 | 2021-09-29 | 삼성전자주식회사 | Apparatus and method for editing content |
WO2016149438A1 (en) | 2015-03-17 | 2016-09-22 | Cornell University | Depth field imaging apparatus, methods, and applications |
US9940932B2 (en) * | 2016-03-02 | 2018-04-10 | Wipro Limited | System and method for speech-to-text conversion |
US9858340B1 (en) | 2016-04-11 | 2018-01-02 | Digital Reasoning Systems, Inc. | Systems and methods for queryable graph representations of videos |
US11409791B2 (en) | 2016-06-10 | 2022-08-09 | Disney Enterprises, Inc. | Joint heterogeneous language-vision embeddings for video tagging and search |
US10083369B2 (en) | 2016-07-01 | 2018-09-25 | Ricoh Company, Ltd. | Active view planning by deep learning |
US10535371B2 (en) * | 2016-09-13 | 2020-01-14 | Intel Corporation | Speaker segmentation and clustering for video summarization |
US10432789B2 (en) * | 2017-02-09 | 2019-10-01 | Verint Systems Ltd. | Classification of transcripts by sentiment |
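JP6355800B1 (ja) * | 2017-06-28 | 2018-07-11 | ヤフー株式会社 | Learning device, generation device, learning method, generation method, learning program, and generation program |
KR102452644B1 (ko) * | 2017-10-31 | 2022-10-11 | 삼성전자주식회사 | Electronic apparatus, speech recognition method, and recording medium |
CN108175426B (zh) * | 2017-12-11 | 2020-06-02 | 东南大学 | Lie detection method based on a deep recurrent conditional restricted Boltzmann machine |
KR102542788B1 (ko) * | 2018-01-08 | 2023-06-14 | 삼성전자주식회사 | Electronic apparatus, control method thereof, and computer program product |
KR102468214B1 (ko) * | 2018-02-19 | 2022-11-17 | 삼성전자주식회사 | Apparatus and system for providing content based on a user's utterance |
JP2019160071A (ja) * | 2018-03-15 | 2019-09-19 | Jcc株式会社 | Summary creation system and summary creation method |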
US20190294886A1 (en) * | 2018-03-23 | 2019-09-26 | Hcl Technologies Limited | System and method for segregating multimedia frames associated with a character |
US10679069B2 (en) | 2018-03-27 | 2020-06-09 | International Business Machines Corporation | Automatic video summary generation |
US10372991B1 (en) | 2018-04-03 | 2019-08-06 | Google Llc | Systems and methods that leverage deep learning to selectively store audiovisual content |
US10558761B2 (en) * | 2018-07-05 | 2020-02-11 | Disney Enterprises, Inc. | Alignment of video and textual sequences for metadata analysis |
WO2020014223A1 (en) * | 2018-07-09 | 2020-01-16 | Tree Goat Media, LLC | Systems and methods for transforming digital audio content into visual topic-based segments |
US11100918B2 (en) | 2018-08-27 | 2021-08-24 | American Family Mutual Insurance Company, S.I. | Event sensing system |
WO2020053862A1 (en) * | 2018-09-13 | 2020-03-19 | Ichannel.Io Ltd. | A system and computerized method for subtitles synchronization of audiovisual content using the human voice detection for synchronization |
US11822888B2 (en) | 2018-10-05 | 2023-11-21 | Verint Americas Inc. | Identifying relational segments |
US10977872B2 (en) | 2018-10-31 | 2021-04-13 | Sony Interactive Entertainment Inc. | Graphical style modification for video games using machine learning |
US11636673B2 (en) | 2018-10-31 | 2023-04-25 | Sony Interactive Entertainment Inc. | Scene annotation using machine learning |
US11375293B2 (en) * | 2018-10-31 | 2022-06-28 | Sony Interactive Entertainment Inc. | Textual annotation of acoustic effects |
US11039177B2 (en) | 2019-03-19 | 2021-06-15 | Rovi Guides, Inc. | Systems and methods for varied audio segment compression for accelerated playback of media assets |
US11102523B2 (en) * | 2019-03-19 | 2021-08-24 | Rovi Guides, Inc. | Systems and methods for selective audio segment compression for accelerated playback of media assets by service providers |
GB2587627B (en) * | 2019-10-01 | 2023-05-03 | Sony Interactive Entertainment Inc | Apparatus and method for generating a recording |
US11270123B2 (en) * | 2019-10-22 | 2022-03-08 | Palo Alto Research Center Incorporated | System and method for generating localized contextual video annotation |
US11032620B1 (en) * | 2020-02-14 | 2021-06-08 | Sling Media Pvt Ltd | Methods, systems, and apparatuses to respond to voice requests to play desired video clips in streamed media based on matched close caption and sub-title text |
US11425181B1 (en) | 2021-05-11 | 2022-08-23 | CLIPr Co. | System and method to ingest one or more video streams across a web platform |
US11355155B1 (en) | 2021-05-11 | 2022-06-07 | CLIPr Co. | System and method to summarize one or more videos based on user priorities |
US11445273B1 (en) * | 2021-05-11 | 2022-09-13 | CLIPr Co. | System and method for creating a video summary based on video relevancy |
US11610402B2 (en) | 2021-05-11 | 2023-03-21 | CLIPr Co. | System and method for crowdsourcing a video summary for creating an enhanced video summary |
US11683558B2 (en) * | 2021-06-29 | 2023-06-20 | The Nielsen Company (Us), Llc | Methods and apparatus to determine the speed-up of media programs using speech recognition |
US11736773B2 (en) * | 2021-10-15 | 2023-08-22 | Rovi Guides, Inc. | Interactive pronunciation learning system |
EP4423748A1 (en) * | 2021-10-27 | 2024-09-04 | Microsoft Technology Licensing, LLC | Machine learning driven teleprompter |
US11902690B2 (en) * | 2021-10-27 | 2024-02-13 | Microsoft Technology Licensing, Llc | Machine learning driven teleprompter |
US11785278B1 (en) * | 2022-03-18 | 2023-10-10 | Comcast Cable Communications, Llc | Methods and systems for synchronization of closed captions with content output |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6335927B1 (en) * | 1996-11-18 | 2002-01-01 | Mci Communications Corporation | System and method for providing requested quality of service in a hybrid network |
US5867494A (en) * | 1996-11-18 | 1999-02-02 | Mci Communication Corporation | System, method and article of manufacture with integrated video conferencing billing in a communication system architecture |
US5999525A (en) * | 1996-11-18 | 1999-12-07 | Mci Communications Corporation | Method for video telephony over a hybrid network |
US6754181B1 (en) * | 1996-11-18 | 2004-06-22 | Mci Communications Corporation | System and method for a directory service supporting a hybrid communication system architecture |
US5867495A (en) * | 1996-11-18 | 1999-02-02 | Mci Communications Corporations | System, method and article of manufacture for communications utilizing calling, plans in a hybrid network |
US6731625B1 (en) * | 1997-02-10 | 2004-05-04 | Mci Communications Corporation | System, method and article of manufacture for a call back architecture in a hybrid network with support for internet telephony |
JP3325809B2 (ja) * | 1997-08-15 | 2002-09-17 | 日本電信電話株式会社 | Video production method and apparatus, and recording medium recording the method |
2001
- 2001-10-25: US application US10/011,215, patent US6925455B2 (en), not active (Expired - Fee Related)
- 2001-12-11: JP application JP2001376561A, patent JP3705429B2 (ja), not active (Expired - Fee Related)

2005
- 2005-04-04: JP application JP2005107342A, patent JP4340907B2 (ja), not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
US6925455B2 (en) | 2005-08-02 |
US20020093591A1 (en) | 2002-07-18 |
JP3705429B2 (ja) | 2005-10-12 |
JP2002251197A (ja) | 2002-09-06 |
JP2005309427A (ja) | 2005-11-04 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP4340907B2 (ja) | Audio-visual summary creation method and apparatus | |
JP4981026B2 (ja) | Synthesis of composite news stories | |
CN113709561B (zh) | Video editing method, apparatus, device, and storage medium | |
US10134440B2 (en) | Video summarization using audio and visual cues | |
US20080187231A1 (en) | Summarization of Audio and/or Visual Data | |
CN106021496A (zh) | Video search method and video search apparatus | |
EP0966717A2 (en) | Multimedia computer system with story segmentation capability and operating program therefor | |
Kaushal et al. | A framework towards domain specific video summarization | |
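CN114938462B (zh) | Intelligent editing method and system for lecture videos, electronic device, and storage medium | |
CN115580758A (zh) | Video content generation method and apparatus, electronic device, and storage medium | |
CN107066488A (zh) | Automatic segmentation method for film and television plot segments based on semantic analysis of film and television content | |
CN114363695B (zh) | Video processing method, apparatus, computer device, and storage medium | |
CN114996506A (zh) | Corpus generation method, apparatus, electronic device, and computer-readable storage medium | |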
Toklu et al. | Videoabstract: a hybrid approach to generate semantically meaningful video summaries | |
Sundaram | Segmentation, structure detection and summarization of multimedia sequences | |
KR102294817B1 (ko) | Video analysis apparatus and method | |
US20240037941A1 (en) | Search results within segmented communication session content | |
WO2023235580A1 (en) | Video-based chapter generation for a communication session | |
Jitaru et al. | LRRo: a lip reading data set for the under-resourced Romanian language | |
Bechet et al. | Detecting person presence in TV shows with linguistic and structural features | |
JP2006157688A (ja) | Method, apparatus, and program for assigning semantic labels to video scenes | |
Xu et al. | Automatic generated recommendation for movie trailers | |
Ide et al. | Assembling personal speech collections by monologue scene detection from a news video archive | |
Doudpota | Mining movie archives for song sequences | |
Dong et al. | Educational documentary video segmentation and access through combination of visual, audio and text understanding |
Legal Events
Code | Title | Description |
---|---|---|
A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20080107 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20080311 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20080512 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20081125 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20090126 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20090610 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20090623 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20120717; Year of fee payment: 3 |
R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20120717; Year of fee payment: 3 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20130717; Year of fee payment: 4 |
LAPS | Cancellation because of no payment of annual fees | |