WO2019182075A1 - Information processing method and information processing device - Google Patents

Information processing method and information processing device

Info

Publication number
WO2019182075A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
reproduction
signal
reference signal
information processing
Prior art date
Application number
PCT/JP2019/011933
Other languages
French (fr)
Japanese (ja)
Inventor
佳孝 浦谷
克己 石川
康之介 加藤
Original Assignee
ヤマハ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤマハ株式会社 filed Critical ヤマハ株式会社
Publication of WO2019182075A1 publication Critical patent/WO2019182075A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G10K15/04 Sound-producing devices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/91 Television signal processing therefor
    • H04N5/93 Regeneration of the television signal or of selected parts thereof

Definitions

  • This disclosure relates to a technology for reproducing content.
  • In Patent Document 1, watermark data including a time code is embedded in the sound of content reproduced in a movie theater or the like.
  • the user terminal of the user who uses the movie theater reproduces another content in conjunction with the reproduction of the content by detecting the sound.
  • Patent Document 1 has a problem that it is necessary to embed watermark data in the content in advance.
  • the audio of the content changes by embedding the watermark data for synchronization.
  • an information processing method analyzes the temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by playing back the first content, with a reference signal temporally corresponding to the second content, and causes a playback device to play back the second content in conjunction with the reproduction of the first content based on the analyzed temporal correspondence.
  • An information processing apparatus includes an analysis unit that analyzes the temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by playing back the first content, with a reference signal temporally corresponding to the second content, and a reproduction control unit that causes a playback device to play back the second content in conjunction with the reproduction of the first content based on the temporal correspondence analyzed by the analysis unit.
  • FIG. 1 is a block diagram illustrating a configuration of a playback system 100 according to an aspect of the present disclosure.
  • the reproduction system 100 is a computer system for reproducing the first content C1 and the second content C2 in conjunction with each other.
  • the playback system 100 of this embodiment includes a playback device 20 and an information processing device 30.
  • the playback device 20 is an output device that plays back the first content C1.
  • the first content C1 is a moving image work composed of sound Vc and video Mc.
  • the first content C1 is represented by an audio signal representing the sound Vc and a video signal representing the video Mc.
  • a movie or a television program is exemplified as the first content C1.
  • the playback device 20 plays back the first content C1 stored on an optical disk such as a DVD.
  • the playback device 20 includes a display device 23 (for example, a liquid crystal display panel) that displays the video Mc included in the first content C1, and a sound emitting device 21 (for example, a speaker) that emits the sound included in the first content C1. That is, the reproduction by the playback device 20 includes video display and sound emission. Note that the playback device 20 may include only the sound emitting device 21. That is, the display device 23 can be omitted from the playback device 20.
  • the information processing apparatus 30 is carried by a user who views the first content C1 reproduced by the reproduction apparatus 20.
  • an information terminal such as a mobile phone, a smartphone, or a tablet terminal is used as the information processing apparatus 30.
  • the information processing apparatus 30 plays back the second content C2.
  • the second content C2 is played back in conjunction with the playback of the first content C1 by the playback device 20.
  • the second content C2 is information (for example, sound or image) related to the first content C1, for example.
  • a privilege video for the purchaser of the first content C1 is provided to the information processing apparatus 30 as the second content C2.
  • the second content C2 of the present embodiment is, for example, an image showing a time series of subtitles (for example, character strings representing the characters' lines) that represent the content of the first content C1.
  • the second content C2 may include a section where no image is actually displayed.
  • the information processing apparatus 30 includes a sound collection device 31, a reproduction device 33, a control device 35, and a storage device 37.
  • the control device 35 is a processing circuit such as a CPU (Central Processing Unit), and controls each element constituting the information processing device 30.
  • the control device 35 includes at least one circuit.
  • the playback device 33 is an output device that plays back various types of information under the control of the control device 35.
  • the playback device 33 is a liquid crystal display panel.
  • the storage device 37 (memory) is configured by a known recording medium such as a magnetic recording medium or a semiconductor recording medium, or a combination of a plurality of types of recording media, and stores a program executed by the control device 35 and various data used by the control device 35.
  • the storage device 37 of the present embodiment stores the second content C2.
  • the second content C2 is acquired in advance from, for example, a distribution device (for example, a WEB server).
  • a storage device 37 (for example, a cloud storage) separate from the information processing device 30 may be prepared, and the control device 35 may perform writing and reading with respect to the storage device 37 via a mobile communication network or a communication network such as the Internet. That is, the storage device 37 can be omitted from the information processing device 30.
  • the sound collection device 31 is an acoustic device (microphone) that collects surrounding sounds. Specifically, the sound collection device 31 collects the sound Vc emitted by the reproduction of the first content C1, and generates the sound signal Y representing the waveform of the sound Vc.
  • FIG. 2 schematically shows the first content C1.
  • the sound Vc of the portion (hereinafter referred to as “playback portion”) T1 of the first content C1 played by the playback device 20 is collected.
  • sound collection by the sound collection device 31 is started in response to a user operation on the information processing device 30.
  • the sound Vc of the reproduction portion T1 is continuously collected over a predetermined period (for example, a period sufficiently short with respect to the time length of the first content C1).
  • the playback portion T1 moves in the future direction (direction in which time passes) on the time axis.
  • the control device 35 realizes a plurality of functions (analysis unit 352 and reproduction control unit 354) for reproducing the second content C2 by executing a plurality of tasks according to the program stored in the storage device 37.
  • the function of the control device 35 may be realized by a set of a plurality of devices (that is, a system), or part or all of the function of the control device 35 may be realized by a dedicated electronic circuit (for example, a signal processing circuit).
  • the analysis unit 352 analyzes the temporal correspondence between the first content C1 and the second content C2.
  • the analysis unit 352 of the present embodiment identifies a portion (hereinafter referred to as “processing portion”) T2 that temporally corresponds to the reproduction portion T1 of the first content C1 in the second content C2.
  • processing portion T2 is specified.
  • the reference signal Z is a signal temporally corresponding to the second content C2. Specifically, the reference signal Z has the same time axis as the second content C2. In the present embodiment, the start point of the reference signal Z and the start point of the second content C2 match, and the end point of the reference signal Z and the end point of the second content C2 match. For example, a signal representing the waveform of the sound Vc emitted by the reproduction of the first content C1 is used as the reference signal Z.
  • the reference signal Z is a signal representing the sound Vc over the entire time axis of the first content C1.
  • the reference signal Z is distributed from the distribution device together with the second content C2, for example, and stored in the storage device 37 in advance.
  • the analysis unit 352 specifies a portion P (hereinafter referred to as “similar portion”) of the reference signal Z that is similar to the waveform of the acoustic signal Y.
  • the reference signal Z is divided into a plurality of sections (hereinafter referred to as “analysis sections”) on the time axis, and an index (hereinafter referred to as “similar index”) indicating similarity to the acoustic signal Y is calculated for each analysis section.
  • the time length of the analysis section is equal to the time length of the acoustic signal Y, for example. Each analysis interval may overlap each other.
  • the cross-correlation calculated for the analysis section and the acoustic signal Y is used as a similarity index between the analysis section and the acoustic signal Y.
  • the absolute value of the cross-correlation becomes maximum.
  • An analysis section in which the absolute value of the cross-correlation is maximum among the plurality of analysis sections is specified as the similar portion P. That is, by calculating the cross-correlation between the acoustic signal Y and the reference signal Z, the acoustic signal Y and the reference signal Z are compared.
  • the analysis unit 352 specifies a portion of the second content C2 that corresponds temporally to the similar portion P of the reference signal Z as the processing portion T2. A portion of the second content C2 that matches the similar portion P on the time axis is specified as the processing portion T2.
  • the analysis unit 352 analyzes the temporal correspondence between the first content C1 and the second content C2 by comparing the acoustic signal Y and the reference signal Z (that is, specifies the processing portion T2).
  • the playback control unit 354 causes the playback device 33 to play back the second content C2 in conjunction with the playback of the first content C1, based on the temporal correspondence between the first content C1 and the second content C2. Specifically, the reproduction control unit 354 reproduces the second content C2 in parallel with the reproduction of the first content C1.
  • the reproduction control unit 354 causes the processing portion T2 of the second content C2 to coincide with the reproduction portion T1 of the first content C1 on the time axis (that is, in synchronization) and reproduces the second content C2.
  • the second content C2 is played back by the playback device 33 so that the end point of the processing part T2 coincides with the end point of the playback part T1 on the time axis.
  • FIG. 3 is a display example of the second content C2 by the playback device 33.
  • the playback device 33 plays back the second content C2 in conjunction with the playback of the first content C1 under the control of the playback control unit 354. Specifically, the playback device 33 plays back the second content C2 by matching the processing portion T2 of the second content C2 with the playback portion T1 of the first content C1 in time.
  • the playback of the second content C2 also proceeds in conjunction with the progress of the playback of the first content C1. That is, the second content C2 is played back in parallel with the playback of the first content C1 by the playback device 20.
  • the second content C2 representing the subtitle "Hello", which represents the words of a character appearing in the reproduced portion T1 of the first content C1, is displayed.
  • FIG. 4 is a flowchart of a process in which the control device 35 reproduces the second content C2.
  • the process of FIG. 4 is started, for example, triggered by sound collection of the sound Vc (generation of the sound signal Y) by the sound collection device 31.
  • the analysis unit 352 specifies a similar portion P similar to the acoustic signal Y of the first content C1 in the reference signal Z (Sa1). Specifically, among the plurality of analysis sections in the reference signal Z, the analysis section for which the similarity index (absolute value of cross-correlation) calculated for the analysis section and the acoustic signal Y is maximum is specified as the similar portion P.
  • the analysis unit 352 identifies a processing portion T2 that temporally corresponds to the similar portion P in the second content C2 (Sa2).
  • Steps Sa1 and Sa2 are processes for analyzing the temporal correspondence between the first content C1 and the second content C2 by comparing the acoustic signal Y with the reference signal Z.
  • the reproduction control unit 354 causes the reproduction device 33 to reproduce the second content C2 in parallel with the reproduction of the first content C1 based on the temporal correspondence between the first content C1 and the second content C2 (Sa3).
  • the first content C1 is configured.
  • the data needs to be embedded in the acoustic signal in advance.
  • the temporal correspondence between the first content C1 and the second content C2 is analyzed by comparing the acoustic signal Y and the reference signal Z, and, based on the analyzed temporal correspondence, the second content C2 is reproduced in conjunction with the reproduction of the first content C1.
  • because the acoustic signal Y and the reference signal Z are compared by calculating the cross-correlation between them, the similarity of the waveforms of the acoustic signal Y and the reference signal Z is taken into account.
  • the reproduction of the second content C2 can be linked with the reproduction of the first content C1 with high accuracy.
  • the second content C2 is reproduced in parallel with the reproduction of the first content C1, but the second content C2 may be reproduced subsequent to the reproduction of the first content C1, for example.
  • the second content C2 is reproduced immediately after the first content C1 ends.
  • the end point of the reference signal Z coincides with the start point of the second content C2. That is, the reference signal Z temporally corresponds to the second content C2.
  • the reproduction control unit 354 causes the reproduction device 33 to reproduce the second content C2 when the time elapses from the end point of the similar portion P of the reference signal Z to the end point of the reference signal Z.
  • the configuration for reproducing the second content C2 in conjunction with the reproduction of the first content C1 includes both the configuration for reproducing the second content C2 in parallel with the reproduction of the first content C1 and the configuration for reproducing the second content C2 subsequent to the reproduction of the first content C1.
  • the second content C2 is reproduced in parallel with the reproduction of the first content C1
  • the second content C2 is reproduced subsequent to the reproduction of the first content C1
  • the reference signal Z is not limited to the above illustration.
  • a signal obtained by reducing the amount of information of the sound Vc (for example, a signal whose sound quality has been reduced by applying data compression to the signal of the sound Vc) may be used as the reference signal Z.
  • the reference signal Z is not limited to a signal representing the sound Vc.
  • a signal generated by performing some processing on the sound Vc may be used as the reference signal Z.
  • a signal obtained by extracting the portions of the sound Vc of the first content C1 whose volume exceeds a predetermined value, or a signal in which a pulse is arranged at each rising point (attack portion) of the sound Vc of the first content C1, may be used as the reference signal Z.
  • the signal representing the sound Vc emitted by the reproduction of the first content C1 is used as the reference signal Z, it is not necessary to prepare the reference signal Z separately from the first content C1.
  • the target and waveform represented by the reference signal Z are arbitrary.
  • the signal representing the sound Vc throughout the time axis of the first content C1 is used as the reference signal Z.
  • a signal representing the sound Vc of the first content C1 for each of a plurality of sections separated from each other on the time axis may be used as the reference signal Z. That is, a signal obtained by intermittently removing portions of the sound Vc of the first content C1 is used as the reference signal Z.
  • the first content C1 is composed of the sound Vc and the video Mc
  • the first content C1 may be composed of only the sound Vc. That is, it is not essential that the first content C1 includes the video Mc.
  • the image showing the caption of the first content C1 is exemplified as the second content C2.
  • the second content C2 is not limited to the image showing the caption.
  • an image (still image or video) that introduces an actor appearing in the first content C1 or a story of the first content C1 may be used as the second content C2.
  • the sound related to the first content C1 may be used as the second content C2.
  • a sound emitting device for example, a speaker
  • the second content C2 may be composed of both sound and images. That is, reproduction by the reproduction device 33 includes image display and sound emission.
  • Various types of information related to the first content C1 can be adopted as the second content C2.
  • information irrelevant to the first content C1 for example, advertisement
  • when sound related to the sound Vc of the first content C1 (for example, a higher-quality version of the sound represented by the sound Vc included in the first content C1) is the second content C2, a signal representing that sound may be used as the reference signal Z. In the above configuration, it is not necessary to prepare the reference signal Z and the second content C2 separately.
  • the second content C2 that is continuous over the entire time axis of the first content C1 is adopted, but as illustrated in FIG. 5, second content C2 corresponding to only a part of the first content C1 on the time axis may be adopted. That is, it is assumed that there may be no processing portion T2 corresponding to the similar portion P of the reference signal Z in the second content C2.
  • reproduction portion T1 of the first content C1 reaches the second content C2
  • reproduction of the second content C2 is started.
  • the end point of the similar portion P has passed to the start point of the second content C2
  • the second content C2 is played back by the playback device 33.
  • the second content C2 may be composed of a plurality of parts separated from each other on the time axis. Based on the temporal correspondence between the first content C1 and the second content C2, a portion corresponding to the reproduction portion T1 of the first content C1 is reproduced among a plurality of portions constituting the second content C2.
  • the similarity index indicating the degree of similarity between the acoustic signal Y and the reference signal Z is not limited to the cross-correlation. That is, the method of comparing the acoustic signal Y and the reference signal Z is not limited to the calculation of the cross-correlation between the acoustic signal Y and the reference signal Z.
  • the sound collection device 31 starts collecting the sound Vc in response to a user operation on the information processing device 30, but the sound collection by the sound collection device 31 may instead be executed periodically, for example in parallel with the reproduction of the first content C1.
  • the information processing apparatus 30 may store a plurality of second contents C2 of different types. For example, the second content C2 selected by the user among the plurality of second contents C2 respectively associated with the first content C1 is reproduced. With the above configuration, it is possible to reproduce the second content C2 desired by the user.
  • the playback device 20 and the information processing device 30 may be configured integrally.
  • the playback device of the information processing device 30 plays back the second content C2 in conjunction with the playback of the first content C1.
  • the information processing apparatus 30 is realized by cooperation between a computer (specifically, the control apparatus 35) and a program.
  • the program according to the above-described form can be provided in a form stored in a computer-readable recording medium and installed in the computer.
  • the recording medium is, for example, a non-transitory recording medium, and an optical recording medium (optical disk) such as a CD-ROM is a good example, but it may include any known form of recording medium, such as a semiconductor recording medium or a magnetic recording medium.
  • the non-transitory recording medium includes any recording medium other than a transitory, propagating signal, and does not exclude a volatile recording medium. It is also possible to provide a program to a computer in the form of distribution via a communication network.
  • the information processing method analyzes the temporal correspondence between the first content and the second content by comparing an acoustic signal, generated by collecting sound emitted by playing back the first content, with a reference signal temporally corresponding to the second content, and causes a playback device to play back the second content in conjunction with the reproduction of the first content based on the analyzed temporal correspondence.
  • in the above aspect, the temporal correspondence between the first content and the second content is analyzed by comparing the acoustic signal generated by collecting the sound emitted by the reproduction of the first content with the reference signal temporally corresponding to the second content, and the second content is reproduced in conjunction with the reproduction of the first content based on the analyzed temporal correspondence. That is, it is not necessary to embed in the first content any data (watermark data in the technique of Patent Document 1) for making the reproduction of the second content temporally correspond to the reproduction of the first content.
  • the reference signal may be a signal representing the sound emitted by the reproduction of the first content, or a signal obtained by processing the signal.
  • because the signal representing the sound emitted by the reproduction of the first content, or the signal obtained by processing that signal, is used as the reference signal, there is no need to prepare the reference signal separately from the first content.
  • the acoustic signal and the reference signal may be compared by calculating a cross-correlation between the acoustic signal and the reference signal.
  • because the acoustic signal and the reference signal are compared by calculating the cross-correlation between the acoustic signal and the reference signal, the similarity of the waveforms of the acoustic signal and the reference signal is taken into account.
  • the reproduction of the second content can be linked with the reproduction of the first content with high accuracy.
  • the second content may be reproduced by a reproduction device.
  • compared with the configuration in which the second content is reproduced subsequent to the reproduction of the first content, the user can easily grasp the temporal correspondence between the first content and the second content.
  • the aspect of the present disclosure can also be realized as an information processing apparatus that executes the information processing method exemplified above, or a program that causes a computer to execute the information processing method of each aspect exemplified above.
  • DESCRIPTION OF SYMBOLS 100 ... Reproduction system, 20 ... Reproduction apparatus, 30 ... Information processing apparatus, 31 ... Sound collection apparatus, 33 ... Reproduction apparatus, 35 ... Control apparatus, 37 ... Storage apparatus, 352 ... Analysis part, 354 ... Reproduction control part.
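The similar-portion search described in this section (steps Sa1 and Sa2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names, the `hop` parameter, and the `sample_rate` argument are hypothetical, signals are plain lists of samples, and the similarity index is taken to be the absolute value of the zero-lag cross-correlation between the acoustic signal Y and each analysis section of the reference signal Z.

```python
from typing import Sequence, Tuple


def cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag cross-correlation (inner product) of two equal-length sections."""
    return sum(x * y for x, y in zip(a, b))


def find_similar_portion(y: Sequence[float], z: Sequence[float],
                         hop: int = 1) -> Tuple[int, int]:
    """Step Sa1: return (start, end) sample indices of the analysis section
    of z whose absolute cross-correlation with y is largest.

    Each analysis section has the same length as y; sections overlap
    whenever hop (an assumed parameter) is smaller than len(y).
    """
    n = len(y)
    if n == 0 or n > len(z):
        raise ValueError("y must be non-empty and no longer than z")
    best_start, best_score = 0, -1.0
    for start in range(0, len(z) - n + 1, hop):
        score = abs(cross_correlation(y, z[start:start + n]))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_start + n


def processing_portion(similar: Tuple[int, int],
                       sample_rate: float) -> Tuple[float, float]:
    """Step Sa2: map the similar portion P to the processing portion T2 in
    seconds, assuming Z and C2 share the same time axis."""
    start, end = similar
    return start / sample_rate, end / sample_rate


if __name__ == "__main__":
    reference_z = [0.0, 0.0, 1.0, -2.0, 3.0, 0.0, 0.0, 0.0]
    acoustic_y = [1.0, -2.0, 3.0]
    p = find_similar_portion(acoustic_y, reference_z)
    print(p, processing_portion(p, sample_rate=2.0))
```

The raw inner product favors loud sections of the reference signal; dividing by the energy of each section is a common refinement, but the text above does not specify it, so it is left out here.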

Abstract

The purpose of the present invention is to achieve temporal correspondence in playback between a plurality of contents, without embedding synchronization data in the contents. This information processing device (30) is provided with: an analysis unit (352) which compares a sound signal (Y) generated by collecting a sound (Vc) emitted by playing back a first content (C1) with a reference signal (Z) temporally corresponding to a second content (C2), thereby analyzing temporal correspondence between the first content (C1) and the second content (C2); and a playback control unit (354) which, on the basis of the temporal correspondence analyzed by the analysis unit (352), causes a playback device (33) to play back the second content (C2) in conjunction with the playback of the first content (C1).

Description

Information processing method and information processing apparatus
 This disclosure relates to a technology for reproducing content.
 Technologies for playing back other content in conjunction with the playback of specific content have been proposed. For example, in Patent Document 1, watermark data including a time code is embedded in the sound of content reproduced in a movie theater or the like. The user terminal of a user in the movie theater detects that sound and thereby reproduces other content in conjunction with the reproduction of the content.
Japanese Patent No. 6163680
 However, the technique of Patent Document 1 has the problem that watermark data must be embedded in the content in advance. There is also the problem that embedding the watermark data for synchronization changes the audio of the content. In view of the above circumstances, an object of the present disclosure is to make the reproduction of a plurality of contents correspond temporally without embedding synchronization data in the contents.
 In order to solve the above problem, an information processing method according to an aspect of the present disclosure analyzes the temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by playing back the first content, with a reference signal temporally corresponding to the second content, and causes a playback device to play back the second content in conjunction with the reproduction of the first content based on the analyzed temporal correspondence.
 An information processing apparatus according to an aspect of the present disclosure includes an analysis unit that analyzes the temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by playing back the first content, with a reference signal temporally corresponding to the second content, and a reproduction control unit that causes a playback device to play back the second content in conjunction with the reproduction of the first content based on the temporal correspondence analyzed by the analysis unit.
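The linked playback performed by the reproduction control unit can be sketched as simple time arithmetic once the similar portion of the reference signal is known. The following is a hedged illustration only: the function names and the `elapsed_s` parameter (time that has passed since sound collection ended, for example analysis latency) are assumptions introduced here. It covers the two cases mentioned in this document: playing the second content in parallel with the first content, and starting the second content after the first content ends, where the end point of the reference signal coincides with the start point of the second content.

```python
def parallel_position(similar_end_s: float, elapsed_s: float) -> float:
    """Position in the second content (seconds) to play right now when the
    second content runs in parallel with the first content.

    similar_end_s: end point of the similar portion on the shared time axis,
                   which matches the end of the collected playback portion.
    elapsed_s:     assumed time elapsed since sound collection ended.
    """
    return similar_end_s + elapsed_s


def delay_until_subsequent_start(reference_length_s: float,
                                 similar_end_s: float,
                                 elapsed_s: float) -> float:
    """Remaining wait (seconds) before starting the second content when it
    follows the first content, i.e. the time from the end of the similar
    portion to the end of the reference signal, minus time already elapsed."""
    return max(0.0, reference_length_s - similar_end_s - elapsed_s)


if __name__ == "__main__":
    # Similar portion ends 12 s into the reference; 0.5 s spent on analysis.
    print(parallel_position(12.0, 0.5))
    # Reference lasts 100 s; similar portion ended at 90 s; 2 s elapsed.
    print(delay_until_subsequent_start(100.0, 90.0, 2.0))
```

Clamping the delay at zero simply starts the second content immediately if the first content has already ended by the time the analysis completes; that choice is an assumption, not something the text prescribes.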
FIG. 1 is a block diagram illustrating the configuration of a playback system.
FIG. 2 is a schematic diagram of first content, a reference signal, and second content.
FIG. 3 is a display example of the second content by a playback device.
FIG. 4 is a flowchart of the processing of a control device.
FIG. 5 is a schematic diagram of first content, a reference signal, and second content according to a modification.
FIG. 6 is a schematic diagram of second content according to a modification.
 図1は、本開示の一態様に係る再生システム100の構成を例示するブロック図である。再生システム100は、第1コンテンツC1と第2コンテンツC2とを連動して再生するためのコンピュータシステムである。本実施形態の再生システム100は、再生装置20と情報処理装置30とで構成される。 FIG. 1 is a block diagram illustrating a configuration of a playback system 100 according to an aspect of the present disclosure. The reproduction system 100 is a computer system for reproducing the first content C1 and the second content C2 in conjunction with each other. The playback system 100 of this embodiment includes a playback device 20 and an information processing device 30.
The playback device 20 is an output device that plays back the first content C1. The first content C1 is a video work composed of sound Vc and video Mc. Specifically, the first content C1 is represented by an audio signal representing the sound Vc and a video signal representing the video Mc. Movies and television programs are examples of the first content C1. For example, the playback device 20 plays back first content C1 stored on an optical disc such as a DVD. The playback device 20 includes a display device 23 (for example, a liquid crystal display panel) that displays the video Mc included in the first content C1, and a sound emitting device 21 (for example, a speaker) that emits the sound included in the first content C1. That is, playback by the playback device 20 encompasses both display of video and emission of sound. Note that the playback device 20 may include only the sound emitting device 21; that is, the display device 23 may be omitted from the playback device 20.
The information processing device 30 is carried by a user who views the first content C1 played back by the playback device 20. For example, an information terminal such as a mobile phone, a smartphone, or a tablet terminal is used as the information processing device 30. The information processing device 30 of this embodiment plays back the second content C2. Specifically, the second content C2 is played back in conjunction with the playback of the first content C1 by the playback device 20. The second content C2 is, for example, information (for example, sound or images) related to the first content C1. For example, bonus video for purchasers of the first content C1 is provided to the information processing device 30 as the second content C2. Information that is continuous over the entire time axis of the first content C1 (from its start point to its end point) is an example of the second content C2. The second content C2 of this embodiment is, for example, a series of images showing a time series of subtitles (for example, character strings representing the lines spoken by characters) that express the content of the first content C1. Note that the second content C2 may include sections in which no image is actually displayed.
The information processing device 30 includes a sound collection device 31, a playback device 33, a control device 35, and a storage device 37. The control device 35 is a processing circuit such as a CPU (Central Processing Unit) and controls each element of the information processing device 30. The control device 35 includes at least one circuit. The playback device 33 is an output device that plays back various kinds of information under the control of the control device 35. For example, the playback device 33 is a liquid crystal display panel.
The storage device 37 (memory) is composed of a known recording medium such as a magnetic recording medium or a semiconductor recording medium, or of a combination of multiple types of recording media, and stores the program executed by the control device 35 and the various data used by the control device 35. The storage device 37 of this embodiment stores the second content C2. The second content C2 is acquired in advance from, for example, a distribution device (for example, a web server). Note that a storage device 37 separate from the information processing device 30 (for example, cloud storage) may be provided, and the control device 35 may write to and read from that storage device 37 via a communication network such as a mobile communication network or the Internet. That is, the storage device 37 may be omitted from the information processing device 30.
The sound collection device 31 is an audio device (microphone) that collects ambient sound. Specifically, the sound collection device 31 collects the sound Vc emitted by playback of the first content C1 and generates an acoustic signal Y representing the waveform of that sound Vc.
FIG. 2 schematically shows the first content C1. The sound Vc of the portion T1 of the first content C1 played back by the playback device 20 (hereinafter the "playback portion") is collected. For example, sound collection by the sound collection device 31 starts in response to a user operation on the information processing device 30. In practice, once sound collection by the sound collection device 31 starts, the sound Vc of the playback portion T1 is collected continuously over a predetermined period (for example, a period sufficiently short relative to the duration of the first content C1). As playback of the first content C1 progresses, the playback portion T1 moves toward the future on the time axis (the direction in which time passes).
By executing multiple tasks according to the program stored in the storage device 37, the control device 35 implements multiple functions (an analysis unit 352 and a playback control unit 354) for playing back the second content C2. Note that the functions of the control device 35 may be implemented by a set of multiple devices (that is, a system), and some or all of the functions of the control device 35 may be implemented by a dedicated electronic circuit (for example, a signal processing circuit).
The analysis unit 352 analyzes the temporal correspondence between the first content C1 and the second content C2. The analysis unit 352 of this embodiment identifies the portion T2 of the second content C2 (hereinafter the "processing portion") that temporally corresponds to the playback portion T1 of the first content C1. The processing portion T2 is identified by comparing the acoustic signal Y generated by the sound collection device 31 with a reference signal Z stored in the storage device 37.
The reference signal Z is a signal that temporally corresponds to the second content C2. Specifically, the time axis of the reference signal Z coincides with that of the second content C2. In this embodiment, the start point of the reference signal Z coincides with the start point of the second content C2, and the end point of the reference signal Z coincides with the end point of the second content C2. For example, a signal representing the waveform of the sound Vc emitted by playback of the first content C1 is used as the reference signal Z. The reference signal Z is a signal representing the sound Vc over the entire time axis of the first content C1. The reference signal Z is, for example, distributed from the distribution device together with the second content C2 and stored in the storage device 37 in advance.
Specifically, the analysis unit 352 identifies the portion P of the reference signal Z (hereinafter the "similar portion") that is similar to the waveform of the acoustic signal Y. The reference signal Z is divided into multiple sections on the time axis (hereinafter "analysis sections"), and an index indicating similarity to the acoustic signal Y (hereinafter the "similarity index") is calculated for each analysis section. The duration of each analysis section is, for example, equal to the duration of the acoustic signal Y. The analysis sections may overlap one another.
The cross-correlation calculated between an analysis section and the acoustic signal Y is used as the similarity index between that analysis section and the acoustic signal Y. When the analysis section and the acoustic signal Y contain common sound (that is, when their similarity is high), the absolute value of the cross-correlation is at its maximum. Of the multiple analysis sections, the one for which the absolute value of the cross-correlation is largest is identified as the similar portion P. In other words, the acoustic signal Y and the reference signal Z are compared by calculating their cross-correlation. The analysis unit 352 identifies the portion of the second content C2 that temporally corresponds to the similar portion P of the reference signal Z as the processing portion T2; that is, the portion of the second content C2 that coincides with the similar portion P on the time axis is identified as the processing portion T2. By comparing the acoustic signal Y with the reference signal Z in this way, the analysis unit 352 analyzes the temporal correspondence between the first content C1 and the second content C2 (that is, identifies the processing portion T2).
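The search for the similar portion P can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name find_similar_portion and the hop parameter are hypothetical, and the cross-correlation is normalized by signal energy (an addition made here so that loud analysis sections do not dominate), whereas the description specifies only the absolute value of the cross-correlation.

```python
from math import sqrt


def find_similar_portion(y, z, hop=1):
    """Return (start_index, score) of the analysis section of z most similar to y.

    y   : samples of the acoustic signal Y (the microphone recording)
    z   : samples of the reference signal Z, at the same sampling rate
    hop : spacing in samples between analysis sections (hop < len(y) means overlap)
    """
    n = len(y)
    if n == 0 or len(z) < n:
        raise ValueError("reference signal must be at least as long as the recording")

    y_norm = sqrt(sum(v * v for v in y))
    best_start, best_score = 0, -1.0

    # Each analysis section has the same length as the acoustic signal Y.
    for start in range(0, len(z) - n + 1, hop):
        section = z[start:start + n]
        corr = sum(a * b for a, b in zip(y, section))
        norm = y_norm * sqrt(sum(v * v for v in section))
        # Similarity index: absolute value of the (normalized) cross-correlation.
        score = abs(corr) / norm if norm > 0 else 0.0
        if score > best_score:
            best_start, best_score = start, score

    return best_start, best_score
```

Dividing the returned start index by the sampling rate gives the start time of the similar portion P on the time axis that the reference signal Z shares with the second content C2.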
On the basis of the temporal correspondence between the first content C1 and the second content C2, the playback control unit 354 causes the playback device 33 to play back the second content C2 in conjunction with the playback of the first content C1. Specifically, the playback control unit 354 causes the second content C2 to be played back in parallel with the playback of the first content C1. The playback control unit 354 causes the second content C2 to be played back with its processing portion T2 aligned on the time axis (that is, synchronized) with the playback portion T1 of the first content C1. More specifically, it causes the playback device 33 to play back the second content C2 such that the end point of the processing portion T2 coincides on the time axis with the end point of the playback portion T1.
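Aligning the end point of the processing portion T2 with the end point of the playback portion T1 amounts to simple arithmetic on the shared time axis, sketched below. The function and parameter names are hypothetical, and the sketch assumes that the device records the clock time at which sound collection ended and that the first content C1 keeps playing at normal speed afterwards.

```python
def second_content_position(p_start_s, y_duration_s, capture_end_s, now_s):
    """Return the position (in seconds) at which the second content C2 should be now.

    p_start_s     : start of the similar portion P on the time axis of Z (and of C2)
    y_duration_s  : duration of the acoustic signal Y (equal to the length of P)
    capture_end_s : clock reading when sound collection ended
    now_s         : current clock reading (same clock as capture_end_s)
    """
    elapsed_s = now_s - capture_end_s
    if elapsed_s < 0:
        raise ValueError("now_s must not precede capture_end_s")

    # The end of P corresponds to the end of the playback portion T1 ...
    p_end_s = p_start_s + y_duration_s
    # ... and the first content has advanced by elapsed_s since then.
    return p_end_s + elapsed_s
```

On a real device the two clock readings would typically come from a monotonic clock such as time.monotonic(), so that the time spent on the analysis itself is compensated.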
FIG. 3 shows an example of the second content C2 displayed by the playback device 33. Under the control of the playback control unit 354, the playback device 33 plays back the second content C2 in conjunction with the playback of the first content C1. Specifically, the playback device 33 plays back the second content C2 with its processing portion T2 temporally aligned with the playback portion T1 of the first content C1. Playback of the second content C2 advances in step with playback of the first content C1. That is, the second content C2 is played back in parallel with the playback of the first content C1 by the playback device 20. For example, second content C2 showing the subtitle "こんにちは" ("Hello"), which represents a line spoken by a character appearing in the playback portion T1 of the first content C1, is displayed.
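When the second content C2 is a time series of subtitles, choosing what to display at a given position can be sketched as a lookup over timed cues. The cue format (start, end, text) and the function name active_subtitle are assumptions made for illustration; the description does not specify how subtitles are stored.

```python
from bisect import bisect_right


def active_subtitle(cues, position_s):
    """Return the subtitle text to show at position_s, or None if none applies.

    cues : list of (start_s, end_s, text), sorted by start_s and non-overlapping
    """
    starts = [start for start, _end, _text in cues]
    # Index of the last cue whose start time is <= position_s.
    i = bisect_right(starts, position_s) - 1
    if i >= 0 and position_s < cues[i][1]:
        return cues[i][2]
    # Sections with no cue correspond to sections where no image is displayed.
    return None


# Hypothetical cue list for illustration only.
example_cues = [(1.0, 3.0, "こんにちは"), (5.0, 7.0, "さようなら")]
```

Calling active_subtitle with the current position of the second content C2 on each display refresh keeps the subtitle in step with the first content C1.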
FIG. 4 is a flowchart of the process by which the control device 35 plays back the second content C2. The process of FIG. 4 is started, for example, when the sound collection device 31 collects the sound Vc (generates the acoustic signal Y). When the process starts, the analysis unit 352 identifies the similar portion P of the reference signal Z that is similar to the acoustic signal Y of the first content C1 (Sa1). Specifically, of the multiple analysis sections of the reference signal Z, the one for which the similarity index (the absolute value of the cross-correlation) calculated between that analysis section and the acoustic signal Y is largest is identified as the similar portion P. The analysis unit 352 then identifies the processing portion T2 of the second content C2 that temporally corresponds to the similar portion P (Sa2). Steps Sa1 and Sa2 constitute the processing that analyzes the temporal correspondence between the first content C1 and the second content C2 by comparing the acoustic signal Y with the reference signal Z. On the basis of that temporal correspondence, the playback control unit 354 causes the playback device 33 to play back the second content C2 in parallel with the playback of the first content C1 (Sa3).
For example, in a configuration that uses synchronization data (watermark data in the technique of Patent Document 1) to make playback of the second content C2 temporally correspond to playback of the first content C1, that data must be embedded in advance in the acoustic signal constituting the first content C1. In this embodiment, by contrast, the temporal correspondence between the first content C1 and the second content C2 is analyzed by comparing the acoustic signal Y with the reference signal Z, and the second content C2 is played back in conjunction with the playback of the first content C1 on the basis of the analyzed temporal correspondence. It is therefore unnecessary to embed in the first content C1 any data for making playback of the second content C2 temporally correspond to (that is, synchronize with) playback of the first content C1. This embodiment has the further advantage that the change in sound caused by embedding synchronization data in an acoustic signal does not occur.
In this embodiment, the acoustic signal Y and the reference signal Z are compared by calculating their cross-correlation, so playback of the second content C2 can be linked to playback of the first content C1 with high accuracy, taking into account the similarity of waveform between the acoustic signal Y and the reference signal Z.
<Modifications>
Modifications are described below. Two or more aspects freely selected from the following examples may be combined as appropriate to the extent that they do not contradict one another.
(1) In the embodiment described above, the second content C2 is played back in parallel with the playback of the first content C1, but the second content C2 may instead be played back following the playback of the first content C1. For example, the second content C2 is played back immediately after the first content C1 ends. In this configuration, the end point of the reference signal Z coincides with the start point of the second content C2; that is, the reference signal Z still temporally corresponds to the second content C2. The playback control unit 354 causes the playback device 33 to play back the second content C2 once the time from the end point of the similar portion P of the reference signal Z to the end point of the reference signal Z has elapsed on the time axis. Configurations that play back the second content C2 in conjunction with the playback of the first content C1 thus include both a configuration that plays back the second content C2 in parallel with the playback of the first content C1 and a configuration that plays back the second content C2 following the playback of the first content C1. However, the embodiment described above, in which the second content C2 is played back in parallel with the playback of the first content C1, has the advantage that the user can grasp the temporal correspondence between the first content C1 and the second content C2 more easily than in a configuration in which the second content C2 is played back following the playback of the first content C1.
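The waiting time in modification (1) can be sketched as follows. The function name and the elapsed_since_capture_s parameter (time already spent since sound collection ended, for example on the analysis) are assumptions added for illustration.

```python
def delay_until_second_content(z_duration_s, p_end_s, elapsed_since_capture_s=0.0):
    """Return how many seconds to wait before starting the second content C2.

    z_duration_s            : end point of the reference signal Z (= start point of C2)
    p_end_s                 : end point of the similar portion P on the time axis of Z
    elapsed_since_capture_s : time that has already passed since sound collection ended
    """
    remaining_s = z_duration_s - p_end_s - elapsed_since_capture_s
    # If the first content has already ended, start immediately.
    return max(0.0, remaining_s)
```

For instance, if the reference signal Z lasts 5400 seconds, the similar portion P ends at 5395 seconds, and 1.5 seconds have passed since sound collection ended, the second content C2 would start 3.5 seconds later.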
(2) A signal representing the sound Vc emitted by playback of the first content C1 was given as an example of the reference signal Z, but the reference signal Z is not limited to this example. For example, a signal carrying a reduced amount of information relative to the sound Vc (for example, a signal whose sound quality has been lowered by applying data compression to the signal of the sound Vc) may be used as the reference signal Z.
The reference signal Z is also not limited to a signal representing the sound Vc. For example, a signal generated by applying some processing to the sound Vc may be used as the reference signal Z. For example, the reference signal Z may be a signal obtained by extracting the portions of the sound Vc of the first content C1 whose volume exceeds a predetermined value, or a signal in which pulses are arranged at the rising points (attack portions) of the sound Vc of the first content C1. However, a configuration that uses a signal representing the sound Vc emitted by playback of the first content C1 as the reference signal Z does not require the reference signal Z to be prepared separately from the first content C1. As long as the signal can be analyzed temporally against the acoustic signal Y, what the reference signal Z represents and what waveform it has are arbitrary.
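The two processed reference signals mentioned above can be sketched as follows. Both functions are simplified illustrations with hypothetical names: instantaneous absolute amplitude stands in for "volume", and a rising point is taken to be the first sample that crosses the threshold after a quiet sample, which is far cruder than a practical onset detector.

```python
def gate_by_level(samples, threshold):
    """Keep only the samples whose absolute amplitude exceeds threshold; zero the rest."""
    return [s if abs(s) > threshold else 0.0 for s in samples]


def onset_pulses(samples, threshold):
    """Return a pulse train with 1.0 at each quiet-to-loud transition, 0.0 elsewhere."""
    pulses = []
    previously_loud = False
    for s in samples:
        loud = abs(s) > threshold
        # A pulse marks the rising point (attack) of a loud passage.
        pulses.append(1.0 if loud and not previously_loud else 0.0)
        previously_loud = loud
    return pulses
```

If such a processed signal is used as the reference signal Z, the acoustic signal Y would presumably need the same processing before the two are compared, so that like is compared with like.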
(3) A signal representing the sound Vc over the entire time axis of the first content C1 was used as the reference signal Z, but the reference signal Z may instead be, for example, a signal representing the sound Vc of the first content C1 in each of multiple sections separated from one another on the time axis. That is, a signal obtained by intermittently removing the sound Vc of the first content C1 is used as the reference signal Z.
(4) The first content C1 was composed of the sound Vc and the video Mc, but it may be composed of the sound Vc alone. That is, it is not essential that the first content C1 include the video Mc.
(5) In the embodiment described above, images showing subtitles for the first content C1 were given as an example of the second content C2, but the second content C2 is not limited to images showing subtitles. For example, images (still images or video) introducing the actors who appear in the first content C1 or the story of the first content C1 may be used as the second content C2. Sound related to the first content C1 may also be used as the second content C2. In a configuration in which the second content C2 includes sound, a sound emitting device (for example, a speaker) that emits that sound is used as the playback device 33. The second content C2 may also be composed of both sound and images. That is, playback by the playback device 33 encompasses both display of images and emission of sound. Various kinds of information related to the first content C1 may be adopted as the second content C2. Information unrelated to the first content C1 (for example, an advertisement) may also be used as the second content C2.
Note that when sound related to the sound Vc of the first content C1 (for example, sound that represents the sound Vc included in the first content C1 at higher sound quality) is used as the second content C2, a signal representing that related sound may be used as the reference signal Z. In this configuration, the reference signal Z and the second content C2 need not be prepared separately.
(6) Second content C2 that is continuous over the entire time axis of the first content C1 was adopted, but as illustrated in FIG. 5, second content C2 that corresponds to only part of the first content C1 on the time axis may be adopted. In that case, the second content C2 may contain no processing portion T2 that temporally corresponds to the similar portion P of the reference signal Z. In this configuration, playback of the second content C2 starts when the playback portion T1 of the first content C1 reaches the second content C2. Specifically, the playback device 33 is caused to play back the second content C2 once the time from the end point of the similar portion P to the start point of the second content C2 has elapsed.
As illustrated in FIG. 6, the second content C2 may also be composed of multiple portions separated from one another on the time axis. On the basis of the temporal correspondence between the first content C1 and the second content C2, the portion that corresponds to the playback portion T1 of the first content C1, among the multiple portions constituting the second content C2, is played back.
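The behavior in modification (6), for second content C2 that covers only some portions of the time axis, can be sketched as a small decision function. The name plan_playback, the (start, end) portion format, and the three result labels are assumptions made for illustration.

```python
def plan_playback(portions, position_s):
    """Decide what to do with the second content C2 at position_s on the shared time axis.

    portions : list of (start_s, end_s) for each portion of C2, sorted by start_s
    Returns one of:
      ("play", index, offset_s)  play portion `index`, starting offset_s into it
      ("wait", index, delay_s)   wait delay_s seconds, then start portion `index`
      ("done", None, 0.0)        no portion of C2 remains
    """
    for index, (start_s, end_s) in enumerate(portions):
        if position_s < start_s:
            # The playback portion T1 has not yet reached this portion of C2.
            return ("wait", index, start_s - position_s)
        if position_s < end_s:
            # The playback portion T1 falls inside this portion of C2.
            return ("play", index, position_s - start_s)
    return ("done", None, 0.0)
```

Here position_s would be the end point of the similar portion P, adjusted for any time that has passed since sound collection ended.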
(7) The cross-correlation calculated between an analysis section and the acoustic signal Y was used as the similarity index, but the similarity index indicating the degree of similarity between the acoustic signal Y and the reference signal Z is not limited to cross-correlation. That is, the method of comparing the acoustic signal Y with the reference signal Z is not limited to calculating their cross-correlation.
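One way to leave the choice of similarity index open is to pass it in as a function, as sketched below. The negative mean absolute difference shown here is only one hypothetical alternative chosen for illustration; the description does not name any particular index other than cross-correlation, and this one assumes that Y and Z have comparable gain, which a microphone recording will not generally satisfy without level normalization.

```python
def negative_mean_abs_diff(y, section):
    """Similarity index: larger (closer to 0) means the two waveforms differ less."""
    return -sum(abs(a - b) for a, b in zip(y, section)) / len(y)


def best_section(y, z, index_fn, hop=1):
    """Return the start index of the analysis section of z that maximizes index_fn."""
    n = len(y)
    if n == 0 or len(z) < n:
        raise ValueError("reference signal must be at least as long as the recording")
    candidates = range(0, len(z) - n + 1, hop)
    # index_fn(y, section) can be any similarity index where larger means more similar.
    return max(candidates, key=lambda start: index_fn(y, z[start:start + n]))
```

Any other index with the convention "larger means more similar" can be supplied as index_fn without changing the search itself.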
(8) Collection of the sound Vc by the sound collection device 31 was started in response to a user operation on the information processing device 30, but sound collection by the sound collection device 31 may instead be performed periodically, for example in parallel with the playback of the first content C1.
(9) Multiple pieces of second content C2 of different types may be stored in the information processing device 30. For example, of multiple pieces of second content C2 each related to the first content C1, the one selected by the user is played back. This configuration makes it possible to play back the second content C2 that the user wants.
(10) A configuration in which the playback device 20 and the information processing device 30 are separate was illustrated, but the playback device 20 and the information processing device 30 may be configured as a single unit. For example, the playback device of the information processing device 30 plays back the second content C2 in conjunction with the playback of the first content C1.
(11) As described above, the information processing device 30 is realized by cooperation between a computer (specifically, the control device 35) and a program. The program according to the embodiment described above may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but the recording medium may be of any known type, such as a semiconductor recording medium or a magnetic recording medium. Note that non-transitory recording media include any recording medium other than a transitory, propagating signal, and volatile recording media are not excluded. The program may also be provided to a computer by distribution via a communication network.
<Appendix>
From the embodiments exemplified above, the following configurations, for example, can be understood.
An information processing method according to a first aspect of the present disclosure analyzes a temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by playback of the first content, with a reference signal that temporally corresponds to the second content, and causes a playback device to play back the second content in conjunction with the playback of the first content on the basis of the analyzed temporal correspondence. In this aspect, the temporal correspondence between the first content and the second content is analyzed by comparing the acoustic signal with the reference signal, and the second content is played back in conjunction with the playback of the first content on the basis of that correspondence. It is therefore unnecessary to embed in the first content any data (watermark data in the technique of Patent Document 1) for making playback of the second content temporally correspond to playback of the first content.
The reference signal may be a signal representing the sound emitted by playback of the first content, or a signal obtained by processing that signal. In this aspect, because a signal representing the sound emitted by playback of the first content, or a signal obtained by processing that signal, is used as the reference signal, the reference signal need not be prepared separately from the first content.
The acoustic signal and the reference signal may be compared by calculating the cross-correlation between them. In this aspect, because the acoustic signal and the reference signal are compared by calculating their cross-correlation, playback of the second content can be linked to playback of the first content with high accuracy, taking into account the similarity of waveform between the acoustic signal and the reference signal.
The second content may be reproduced by the reproduction device in parallel with the reproduction of the first content. In this aspect, because the second content is reproduced in parallel with the first content, the user can grasp the temporal correspondence between the first content and the second content more easily than in a configuration in which the second content is reproduced after the first content.
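One way to picture parallel reproduction is as a mapping from a matched position to the current playback position. The sketch below assumes that the analysis has already yielded the position in the second content (in seconds) that corresponds to the start of the captured sound; the function name `synced_position` and all of its parameters are hypothetical and do not come from the disclosure.

```python
from typing import Optional


def synced_position(
    matched_position_s: float,
    capture_start_s: float,
    now_s: float,
    duration_s: float,
) -> Optional[float]:
    """Return where the second content should be playing right now.

    matched_position_s: position in the second content matched to the
        start of the captured sound (result of the analysis step).
    capture_start_s / now_s: clock readings on the same time base.
    duration_s: total length of the second content.
    Returns None when the computed position falls outside the content.
    """
    position = matched_position_s + (now_s - capture_start_s)
    if position < 0 or position >= duration_s:
        return None
    return position
```

For example, if the captured sound was matched to 12.0 s and half a second has passed since capture began, the second content would be reproduced from 12.5 s so that it runs alongside the first content.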
The aspects of the present disclosure can also be realized as an information processing device that executes the information processing method exemplified above, or as a program that causes a computer to execute the information processing method of each aspect exemplified above.
This application is based on Japanese Patent Application No. 2018-056349, filed on March 23, 2018, the contents of which are incorporated herein by reference.
DESCRIPTION OF SYMBOLS: 100 ... reproduction system, 20 ... reproduction device, 30 ... information processing device, 31 ... sound collection device, 33 ... reproduction device, 35 ... control device, 37 ... storage device, 352 ... analysis unit, 354 ... reproduction control unit.

Claims (8)

  1.  An information processing method realized by a computer, the method comprising:
     analyzing a temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by reproduction of the first content, with a reference signal that temporally corresponds to the second content; and
     causing a reproduction device to reproduce the second content in conjunction with the reproduction of the first content on the basis of the analyzed temporal correspondence.
  2.  The information processing method according to claim 1, wherein the reference signal is a signal representing the sound emitted by the reproduction of the first content, or a signal obtained by processing that signal.
  3.  The information processing method according to claim 1 or claim 2, wherein the acoustic signal and the reference signal are compared by calculating a cross-correlation between the acoustic signal and the reference signal.
  4.  The information processing method according to any one of claims 1 to 3, wherein the reproduction device is caused to reproduce the second content in parallel with the reproduction of the first content.
  5.  An information processing device comprising:
     an analysis unit that analyzes a temporal correspondence between first content and second content by comparing an acoustic signal, generated by collecting sound emitted by reproduction of the first content, with a reference signal that temporally corresponds to the second content; and
     a reproduction control unit that causes a reproduction device to reproduce the second content in conjunction with the reproduction of the first content on the basis of the temporal correspondence analyzed by the analysis unit.
  6.  The information processing device according to claim 5, wherein the reference signal is a signal representing the sound emitted by the reproduction of the first content, or a signal obtained by processing that signal.
  7.  The information processing device according to claim 5 or claim 6, wherein the analysis unit compares the acoustic signal with the reference signal by calculating a cross-correlation between the acoustic signal and the reference signal.
  8.  The information processing device according to any one of claims 5 to 7, wherein the reproduction control unit causes the reproduction device to reproduce the second content in parallel with the reproduction of the first content.
PCT/JP2019/011933 2018-03-23 2019-03-20 Information processing method and information processing device WO2019182075A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018056349A JP7102826B2 (en) 2018-03-23 2018-03-23 Information processing method and information processing equipment
JP2018-056349 2018-03-23

Publications (1)

Publication Number Publication Date
WO2019182075A1 true WO2019182075A1 (en) 2019-09-26

Family

ID=67987849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/011933 WO2019182075A1 (en) 2018-03-23 2019-03-20 Information processing method and information processing device

Country Status (2)

Country Link
JP (1) JP7102826B2 (en)
WO (1) WO2019182075A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015501583A (en) * 2011-10-19 2015-01-15 Thomson Licensing System and method for automatically discovering content programs
JP2015149705A (en) * 2013-10-21 2015-08-20 ソニー株式会社 Information processing device and method, and program
JP2016111492A (en) * 2014-12-05 2016-06-20 株式会社テレビ朝日 Terminal equipment, server device, and program
JP2016119561A (en) * 2014-12-19 2016-06-30 ティアック株式会社 Portable device with wireless lan function, and recording system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105940679B (en) 2014-01-31 2019-08-06 交互数字Ce专利控股公司 Method and apparatus for synchronizing the playback at two electronic equipments

Also Published As

Publication number Publication date
JP2019169852A (en) 2019-10-03
JP7102826B2 (en) 2022-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19770383

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19770383

Country of ref document: EP

Kind code of ref document: A1