JP2006157692A

JP2006157692A - Video reproducing method and device thereof, and program

Info

Publication number: JP2006157692A
Application number: JP2004347281A
Authority: JP
Inventors: Makoto Muto (武藤 誠); Satoshi Shimada (嶌田 聡); Kazu Miyagawa (宮川 和); Hiroshi Konishi (小西 宏志); Toshikazu Karitsuka (狩塚 俊和); Masashi Morimoto (森本 正志)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-11-30
Filing date: 2004-11-30
Publication date: 2006-06-15
Anticipated expiration: 2024-11-30
Also published as: JP4353084B2

Abstract

PROBLEM TO BE SOLVED: To allow users' voice comments to be heard without overlapping while a video is being viewed.

SOLUTION: Video reproduction is started, and the presence or absence of user voice is determined. When voice is present, it is determined whether the voice is non-language. Non-language voice is reproduced as it is; for any other voice, the voice of the user with the highest evaluation value within the utterance section is reproduced.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

本発明は、映像再生方法及び装置及びプログラムに係り、特に、映像視聴者の音声を映像と共に再生する映像再生方法及び装置及びプログラムに関する。 The present invention relates to a video playback method, apparatus, and program, and more particularly, to a video playback method, apparatus, and program for playing back audio of a video viewer together with video.

今日、映像と共に付加的な情報提示としてテキストや静止画を映像に動機させて提示することで、ユーザへより多様な情報の提示を行うことが行われている。 Today, text and still images are presented in synchronization with video as additional information, so that a wider variety of information can be offered to users.

このようなシステムにおいて、テキストや静止画を提示する場合、複数の情報が同時に提示された場合においても、利用者は必要な情報を取捨選択して読んだり、見たりすることができる。しかしながら、付加情報として複数の音声を同時に提示しようとした場合、音声を同時再生すると音声の重複によってユーザは内容が理解できないため、それを回避するためにシステム側で音声の再生を制御する必要があり、そのためのいくつかの方法が提案されている。 In such a system, when text or still images are presented, the user can pick out and read or view the necessary information even if several pieces of information are presented at the same time. However, when multiple voices are to be presented simultaneously as additional information, playing them back at the same time makes them overlap so that the user cannot understand the content. To avoid this, the system must control voice playback, and several methods for doing so have been proposed.

従来の第1の方法として、特定の話者からの発声要求によって他の発声者の音声をカットして、当該発声者の発声を優先的に再生するという方法がある（例えば、特許文献1参照）。 As a first conventional method, there is a method in which, in response to an utterance request from a specific speaker, the voices of the other speakers are cut and the utterance of that speaker is reproduced preferentially (see, for example, Patent Document 1).

また、従来の第２の方法として、ユーザの発声履歴に基づいて、各ユーザのゲインを制御して、会話の中心となる相手の聞き分けを容易にする方法がある（例えば、特許文献2参照）。 As a second conventional method, there is a method that controls the gain of each user based on the users' utterance histories, making it easier to pick out the partner who is at the center of the conversation (see, for example, Patent Document 2).

特開平4−１９１３５０号公報 特開２０００−４９９４８号公報 Patent Document 1: Japanese Patent Laid-Open No. 4-191350; Patent Document 2: Japanese Patent Laid-Open No. 2000-49948

しかしながら、上記従来の第１の方法には、特定話者の発声に重複して他の話者の発声があった場合に、該特定話者の発声開始時点、終了時点において、他の話者の発声が途切れて、または、途中から再生されるという問題がある。例えば、ある話者が話している途中で、当該特定話者が話し始めると、もともと話していた話者の発声が途中で途切れて、内容の理解に支障をきたす場合がある。 However, the first conventional method has the following problem: when another speaker's utterance overlaps the specific speaker's utterance, the other speaker's utterance is cut off at the point where the specific speaker starts speaking, or is reproduced only from partway through at the point where the specific speaker stops. For example, if the specific speaker starts talking while another speaker is in mid-utterance, the utterance of the speaker who was originally talking is cut off midway, which may hinder understanding of the content.

また、上記従来の第２の方法には、会話の中心でない人の聞き分けをも容易にする目的では利用できないという問題がある。例えば、ユーザが会話の中心でない人の音声を聞きたい場合もあるが、そのような用途には利用できない。 The second conventional method has the problem that it cannot be used to make a person who is not at the center of the conversation easier to pick out as well. For example, a user may want to hear the voice of someone who is not at the center of the conversation, but the method cannot serve that purpose.

本発明は、上記の点に鑑みなされたもので、ユーザの音声コメントを、重なることなく聞くことが可能な映像再生方法及び装置及びプログラムを提供することを目的とする。 The present invention has been made in view of the above points, and its object is to provide a video playback method, apparatus, and program that allow users' voice comments to be heard without overlapping.

図１は、本発明の原理説明図である。 FIG. 1 is an explanatory diagram of the principle of the present invention.

本発明（請求項１）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生方法において、
映像を読み込んで再生を開始する再生開始ステップ（ステップ１）と、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、該ユーザ毎に設けられている視聴者音声ファイルに音声が格納されているかを判定する音声存在判定ステップ（ステップ２）と、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定し、非言語の場合には該音声を再生する非言語判定ステップ（ステップ３）と、
非言語判定ステップ（ステップ３）において、非言語ではないと判定された場合には、音声区間情報ファイルに格納されている音声区間に対応するユーザの音声について、ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイルをユーザＩＤに基づいて参照し、最も高い評価値を有するユーザの音声を再生する音声再生ステップ（ステップ４）と、
音声再生ステップ（ステップ４）で音声再生された次の音声区間に対して、音声存在判定ステップ（ステップ２）、非言語判定ステップ（ステップ３）、音声再生ステップ（ステップ４）を映像ファイルの映像が終了するまで繰り返す繰り返しステップ（ステップ５）と、を行う。 The present invention (Claim 1) is a video playback method for playing back a group of voices of a plurality of users who are viewing the same video together with the video.
A playback start step (step 1) for reading video and starting playback;
An audio presence determination step (step 2) of referring to an audio section information file, provided for each user, that stores audio sections each consisting of the start time and end time of an utterance made about the video, and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file provided for each user;
A non-language determination step (step 3) of determining, when audio is stored in the viewer audio file, whether the audio is non-language, and reproducing the audio if it is non-language;
An audio reproduction step (step 4) of, when the audio is determined in the non-language determination step (step 3) not to be non-language, referring, based on the user ID, to a user evaluation information file that stores an evaluation value predetermined for each user, and reproducing, among the users' audio corresponding to the audio sections stored in the audio section information files, the audio of the user with the highest evaluation value; and
A repetition step (step 5) of repeating the audio presence determination step (step 2), the non-language determination step (step 3), and the audio reproduction step (step 4) for the audio section following the one reproduced in the audio reproduction step (step 4), until the video of the video file ends.
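The selection rule of steps 3 and 4 can be illustrated with a small sketch. This is only one possible reading of the claim, not the patent's implementation: the `Utterance` record, the `evaluation` dictionary, and the greedy "best-rated user wins each overlap" strategy are assumptions introduced here, and the non-language flag is assumed to come from a separate classifier that the text does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    user_id: str
    start: float                # seconds from the start of the video
    end: float
    non_language: bool = False  # assumed to be supplied by a separate classifier


def overlaps(a: Utterance, b: Utterance) -> bool:
    """True when the two audio sections share any stretch of time."""
    return a.start < b.end and b.start < a.end


def select_for_playback(utterances, evaluation):
    """Pick the utterances to reproduce alongside the video.

    Non-language utterances are always kept. Language utterances are
    considered from the highest evaluation value downwards, and one is kept
    only if it does not overlap a language utterance that was already kept.
    """
    non_language = [u for u in utterances if u.non_language]
    language = [u for u in utterances if not u.non_language]
    language.sort(key=lambda u: (-evaluation[u.user_id], u.start))
    accepted = []
    for u in language:
        if not any(overlaps(u, v) for v in accepted):
            accepted.append(u)
    return sorted(non_language + accepted, key=lambda u: u.start)
```

With this reading, a lower-rated comment is dropped only where it collides with a higher-rated one; comments made in otherwise silent stretches are still reproduced.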

本発明は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生方法において、
映像を読み込んで再生を開始する再生開始ステップと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、該ユーザ毎に設けられている視聴者音声ファイルに音声が格納されているかを判定する音声存在判定ステップと、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定し、非言語の場合には該音声を再生する非言語判定ステップと、
非言語判定ステップにおいて、非言語ではないと判定された場合には、音声区間情報ファイルに格納されている音声区間に対応するユーザの音声について、ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイルをユーザＩＤに基づいて参照し、評価値に基づいて再生する音量を決定し、音声を再生する音声再生ステップと、
音声再生ステップで音声再生された次の音声区間に対して、音声存在判定ステップ、非言語判定ステップ、音声再生ステップを映像ファイルの映像が終了するまで繰り返す繰り返しステップと、を行う。 The present invention relates to a video playback method for playing back a group of voices of a plurality of users who are viewing the same video together with the video.
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, provided for each user, that stores audio sections each consisting of the start time and end time of an utterance made about the video, and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file provided for each user;
A non-language determination step of determining, when audio is stored in the viewer audio file, whether the audio is non-language, and reproducing the audio if it is non-language;
An audio reproduction step of, when the audio is determined in the non-language determination step not to be non-language, referring, based on the user ID, to a user evaluation information file that stores an evaluation value predetermined for each user, for the users' audio corresponding to the audio sections stored in the audio section information files, determining the reproduction volume based on the evaluation value, and reproducing the audio; and
A repetition step of repeating the audio presence determination step, the non-language determination step, and the audio reproduction step for the audio section following the one reproduced in the audio reproduction step, until the video of the video file ends.
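The volume-based variant can be sketched as follows. The text here only says that the volume is determined from the evaluation value, so the linear mapping below (gain proportional to the evaluation value, normalized by the highest value among the users speaking at that moment) is an assumption, as are the function and parameter names.

```python
def playback_gains(active_user_ids, evaluation, non_language_ids=frozenset()):
    """Return {user_id: gain in [0, 1]} for the voices sounding at one moment.

    Assumption: evaluation values are positive numbers, and non-language
    voices are reproduced at full volume because the claim does not
    attenuate them.
    """
    speakers = [u for u in active_user_ids if u not in non_language_ids]
    top = max((evaluation[u] for u in speakers), default=0)
    gains = {u: 1.0 for u in active_user_ids if u in non_language_ids}
    for u in speakers:
        # The best-rated speaker gets gain 1.0; the others scale down linearly.
        gains[u] = evaluation[u] / top if top > 0 else 1.0
    return gains
```

For example, with evaluation values 2 and 4 for two overlapping speakers, the second is reproduced at full volume and the first at half volume, so both remain audible but the higher-rated comment dominates.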

本発明（請求項３）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生方法において、
映像を読み込んで再生を開始する再生開始ステップと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、該ユーザ毎に設けられている視聴者音声ファイルに音声が格納されているかを判定する音声存在判定ステップと、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定し、非言語以外で発声が重複しないユーザのグループを全ての音声区間情報ファイルを検索することにより取得する第１のユーザグループ検索ステップと、
第１のユーザグループ検索ステップにより取得したユーザのグループに基づいて、視聴者音声ファイルから該グループに属する全てのユーザの音声を読み出して音声を再生する第１の音声再生ステップと、
第１のユーザグループ検索ステップにおいて選択されなかったユーザの中で、発声が重複しないユーザのグループを、該ユーザの音声区間情報ファイルを検索することにより検索する第２のユーザグループ検索ステップと、
第２のユーザグループ検索ステップで検索されたグループの全ユーザの音声を視聴者音声ファイルから読み出して、映像と共に遡って再生する第２の音声再生ステップと、
全てのユーザの音声の再生が終了するまで、第２のユーザグループ検索ステップ及び第２の音声再生ステップを繰り返す繰り返しステップと、を行う。 The present invention (Claim 3) is a video playback method for playing back a group of voices of a plurality of users who are viewing the same video together with the video.
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, provided for each user, that stores audio sections each consisting of the start time and end time of an utterance made about the video, and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file provided for each user;
A first user group search step of determining, when audio is stored in the viewer audio file, whether the audio is non-language, and obtaining, by searching all the audio section information files, a group of users whose utterances, other than non-language ones, do not overlap;
A first audio reproduction step of reading the audio of all users belonging to the group obtained in the first user group search step from the viewer audio files and reproducing it;
A second user group search step of searching, among the users not selected in the first user group search step, for a group of users whose utterances do not overlap, by searching those users' audio section information files;
A second audio reproduction step of reading the audio of all users in the group found in the second user group search step from the viewer audio files, and rewinding and reproducing it together with the video; and
A repetition step of repeating the second user group search step and the second audio reproduction step until the audio of all users has been reproduced.
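The group search can be pictured as splitting the users into playback passes, where the users in one pass never talk over each other. The sketch below is an assumption about how such groups might be formed (a simple greedy scan in the order the users are listed); the claim does not say how the group is chosen. The input is assumed to contain language sections only, with non-language sections already filtered out.

```python
def sections_overlap(a, b):
    """a and b are (start, end) tuples in seconds."""
    return a[0] < b[1] and b[0] < a[1]


def users_conflict(sections_a, sections_b):
    """True when any section of one user overlaps any section of the other."""
    return any(sections_overlap(a, b) for a in sections_a for b in sections_b)


def playback_passes(sections_by_user):
    """Split users into passes; within a pass no two users' sections overlap.

    The first pass corresponds to the first user group; each later pass is a
    group found among the users left over, to be reproduced after rewinding.
    """
    remaining = list(sections_by_user)
    passes = []
    while remaining:
        group = []
        for user in remaining:
            if not any(
                users_conflict(sections_by_user[user], sections_by_user[g])
                for g in group
            ):
                group.append(user)
        passes.append(group)
        remaining = [u for u in remaining if u not in group]
    return passes
```

The loop always terminates because the first remaining user is accepted into every new group, so each pass removes at least one user.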

本発明（請求項４）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生方法において、
映像を読み込んで再生を開始する再生開始ステップと、
ユーザ毎に設けられている視聴者音声ファイルのうち、未再生のユーザの音声について、ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイルを参照して、評価値が最大のユーザの音声を該ユーザの視聴者音声ファイルから読み出して再生する第１の音声再生ステップと、
前回再生されたユーザの次の発声までの時間間隔に収まる他のユーザの音声を、ユーザ毎に設けられている映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する全ての音声区間情報ファイルを検索することにより取得する発声候補検索ステップと、
発声候補検索ステップで取得したユーザの音声のうち、前回再生されたユーザの次に評価値が高いユーザの音声を該ユーザの視聴者音声ファイルから読み出して再生する第２の音声再生ステップと、
全てのユーザの音声を再生するまで、発声候補検索ステップ及び第２の音声再生ステップを繰り返す繰り返しステップと、を行う。 The present invention (Claim 4) is a video playback method for playing back a group of voices of a plurality of users who are viewing the same video together with the video.
A playback start step for reading video and starting playback;
A first audio reproduction step of referring, for the audio of users not yet reproduced among the viewer audio files provided for each user, to a user evaluation information file that stores an evaluation value predetermined for each user, and reading the audio of the user with the highest evaluation value from that user's viewer audio file and reproducing it;
An utterance candidate search step of obtaining the audio of other users that fits within the time interval until the next utterance of the previously reproduced user, by searching all the audio section information files, each of which is provided for a user and stores audio sections consisting of the start time and end time of an utterance made about the video;
A second audio reproduction step of reading, from among the users' audio obtained in the utterance candidate search step, the audio of the user with the next-highest evaluation value after the previously reproduced user from that user's viewer audio file and reproducing it; and
A repetition step of repeating the utterance candidate search step and the second audio reproduction step until the audio of all users has been reproduced.
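The utterance candidate search step can be sketched as a filter over the other users' audio sections. The claim leaves open what "fits within the time interval" means; the sketch assumes it means that a section lies entirely between the end of the utterance just reproduced and the start of the same user's next utterance, at its original timestamps. The `Utterance` record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    user_id: str
    start: float  # seconds from the start of the video
    end: float


def gap_candidates(played, next_start, utterances, evaluation):
    """Other users' utterances that fit in the gap after `played`.

    `next_start` is the start time of the next utterance by the user who
    was just reproduced. Candidates are returned best-rated user first.
    """
    fits = [
        u for u in utterances
        if u.user_id != played.user_id
        and u.start >= played.end
        and u.end <= next_start
    ]
    return sorted(fits, key=lambda u: evaluation[u.user_id], reverse=True)
```

Under this reading, the second audio reproduction step would take the first element of the returned list, and the search would then be repeated for the remaining users.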

本発明（請求項５）は、請求項１の音声再生ステップにおいて、
音声区間内に最も高い評価値を有するユーザの音声を視聴者音声ファイルから読み出して再生する際に、
所定の時間マージンＴの音声区間の再生の重複を許容する。 According to the present invention (Claim 5), in the sound reproduction step of Claim 1,
when the audio of the user with the highest evaluation value in the audio section is read from the viewer audio file and reproduced,
overlapping reproduction of audio sections is permitted up to a predetermined time margin T.
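The time margin can be illustrated as a tolerance in the overlap test. This sketch assumes that T is an upper bound on how long two sections may sound together before they are treated as conflicting; the function names are hypothetical.

```python
def overlap_length(a, b):
    """Length in seconds of the time shared by sections a and b (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def conflicts(a, b, margin_t):
    """Two sections conflict only when they overlap for longer than T."""
    return overlap_length(a, b) > margin_t
```

With T = 1.0, a comment that starts half a second before the previous one ends would still be reproduced in full, while one that overlaps by two seconds would be subject to the evaluation-based selection.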

本発明（請求項６）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生方法において、
映像を読み込んで再生を開始する再生開始ステップと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、該ユーザ毎に設けられている視聴者音声ファイルに音声が格納されているかを判定する音声存在判定ステップと、
視聴者音声ファイルに音声が格納されている場合に、非言語の場合には該音声を再生する非言語判定ステップと、
非言語判定ステップにおいて、非言語ではないと判定された第１の音声について、該第１の音声の開始時刻と該第１の音声の次に発声された他のユーザの第２の音声の開始時刻との差が該第１の音声の区間長よりも短い場合には、該第１の音声を短縮して再生する音声再生ステップと、
音声区間情報ファイルに格納されている音声区間の全てに対して、音声存在判定ステップ、非言語判定ステップ、音声再生ステップを繰り返す繰り返しステップと、を行う。 The present invention (Claim 6) is a video playback method for playing back a plurality of audio groups of users who are viewing the same video together with the video.
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, provided for each user, that stores audio sections each consisting of the start time and end time of an utterance made about the video, and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file provided for each user;
A non-language determination step of reproducing the audio, when audio is stored in the viewer audio file, if the audio is non-language;
An audio reproduction step of, for a first audio determined in the non-language determination step not to be non-language, shortening and reproducing the first audio when the difference between the start time of the first audio and the start time of a second audio, uttered by another user next after the first audio, is shorter than the section length of the first audio; and
A repetition step of repeating the audio presence determination step, the non-language determination step, and the audio reproduction step for all the audio sections stored in the audio section information file.
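The shortening condition is a simple comparison of two durations, sketched below. How the audio is actually shortened (truncation, faster playback, or something else) is not stated in this passage, so the function only computes the time available to the first audio; its name and signature are assumptions.

```python
def playback_duration(first_start, first_end, second_start):
    """Seconds for which the first audio may be reproduced.

    If the second audio starts before the first one would finish, the first
    is shortened to the time between the two start times; otherwise it is
    reproduced at its full section length.
    """
    length = first_end - first_start
    available = max(second_start - first_start, 0.0)
    if available < length:
        return available
    return length
```

For instance, a first audio spanning 10 s to 16 s followed by a second audio starting at 14 s gets 4 seconds instead of its full 6.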

図２は、本発明の原理構成図である。 FIG. 2 is a principle configuration diagram of the present invention.

本発明（請求項７）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生装置であって、
映像を格納した映像ファイル１０１と、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する複数の音声区間情報ファイル２０と、
ユーザ毎に設けられている該ユーザの音声が格納されている複数の視聴者音声ファイル１０と、
ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイル１０８と、
映像・音声を再生する再生手段１１０と、
音声の再生を制御するための再生制御手段１２０と、を有し、
再生手段１１０は、
映像ファイル１０１から映像を読み込んで再生を開始する映像再生開始手段と、
再生制御手段１２０からの指示に基づいて、音声を映像に同期させて再生させる手段と、を有し、
再生制御手段１２０は、
音声区間情報ファイル２０を参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、視聴者音声ファイル１０に音声が格納されているかを判定する音声存在判定手段と、
視聴者音声ファイル１０に音声が格納されている場合に、該音声が非言語か否かを判定する非言語判定手段と、
非言語判定手段において、非言語ではないと判定された場合には、音声区間情報ファイル２０に格納されている音声区間に対応するユーザの音声について、ユーザ評価情報ファイル１０８をユーザＩＤに基づいて参照し、最も高い評価値を有するユーザの音声を再生手段に再生させ、非言語判定手段において非言語と判定された場合には該非言語の音声を再生手段に再生させる音声再生制御手段と、
再生手段１１０で音声再生された次の音声区間に対して、音声存在判定手段、非言語判定手段、音声再生制御手段を映像ファイルの映像が終了するまで繰り返す繰り返し手段と、
を有する。 The present invention (Claim 7) is a video playback device that plays back a group of voices of a plurality of users who are viewing the same video together with the video,
A video file 101 storing video;
A plurality of audio section information files 20 for storing audio sections including start time and end time of utterance for video provided for each user;
A plurality of viewer audio files 10 each storing audio of the user provided for each user;
A user evaluation information file 108 storing evaluation values predetermined for each user;
Playback means 110 for playing back video and audio;
Reproduction control means 120 for controlling the reproduction of sound,
The playback means 110 has:
Video playback start means for reading video from the video file 101 and starting playback; and
Means for reproducing audio in synchronization with the video based on an instruction from the playback control means 120,
The playback control means 120 has:
Audio presence determination means for referring to the audio section information file 20 and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file 10;
Non-language determination means for determining, when audio is stored in the viewer audio file 10, whether the audio is non-language;
Audio reproduction control means for, when the non-language determination means determines that the audio is not non-language, referring to the user evaluation information file 108 based on the user ID for the users' audio corresponding to the audio sections stored in the audio section information file 20 and causing the playback means to reproduce the audio of the user with the highest evaluation value, and for, when the non-language determination means determines that the audio is non-language, causing the playback means to reproduce that non-language audio; and
Repeating means for repeating the audio presence determination means, the non-language determination means, and the audio reproduction control means for the audio section following the one reproduced by the playback means 110, until the video of the video file ends.

本発明（請求項８）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生であって、
映像を格納した映像ファイルと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する複数の音声区間情報ファイルと、
ユーザ毎に設けられている該ユーザの音声が格納されている複数の視聴者音声ファイルと、
ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイルと、
映像・音声を再生する再生手段と、
音声の再生を制御するための再生制御手段と、を有し、
再生手段は、
映像ファイルから映像を読み込んで再生を開始する映像再生開始手段と、
再生制御手段からの指示に基づいて、音声を映像に同期させて再生させる手段と、を有し、
再生制御手段は、
音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、視聴者音声ファイルに音声が格納されているかを判定する音声存在判定手段と、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定する非言語判定手段と、
非言語判定手段において、非言語ではないと判定された場合には、音声区間情報ファイルに格納されている音声区間に対応するユーザの音声について、ユーザ評価情報ファイルを該ユーザのユーザＩＤに基づいて参照し、評価値に基づいて再生手段に再生させる音量を決定し、該音量でユーザごとの音声を再生手段に再生させ、該非言語判定手段において、非言語と判定された場合には、非言語の音声を該再生手段に再生させる音声再生制御手段と、
音声再生された次の音声区間に対して、音声存在判定手段、非言語判定手段、音量制御手段、音声再生制御手段を映像ファイルの映像が終了するまで繰り返す繰り返し手段と、
を有する。 The present invention (Claim 8) is a video playback device that plays back a group of voices of a plurality of users who are viewing the same video together with the video,
A video file containing the video,
A plurality of audio section information files for storing audio sections consisting of start time and end time of utterance for video provided for each user;
A plurality of viewer audio files storing the user's audio provided for each user;
A user evaluation information file storing evaluation values predetermined for each user;
Playback means for playing back video and audio;
Playback control means for controlling the playback of audio,
The playback means has:
Video playback start means for reading video from the video file and starting playback; and
Means for reproducing audio in synchronization with the video based on an instruction from the playback control means,
The playback control means has:
Audio presence determination means for referring to the audio section information file and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file;
Non-language determination means for determining, when audio is stored in the viewer audio file, whether the audio is non-language;
Audio reproduction control means for, when the non-language determination means determines that the audio is not non-language, referring to the user evaluation information file based on the user ID of each user whose audio corresponds to the audio sections stored in the audio section information file, determining from the evaluation value the volume at which the playback means is to reproduce the audio, and causing the playback means to reproduce each user's audio at that volume, and for, when the non-language determination means determines that the audio is non-language, causing the playback means to reproduce the non-language audio; and
Repeating means for repeating the audio presence determination means, the non-language determination means, the volume control means, and the audio reproduction control means for the audio section following the one that has been reproduced, until the video of the video file ends.

本発明（請求項９）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生装置であって、
映像を格納した映像ファイルと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する複数の音声区間情報ファイルと、
ユーザ毎に設けられている該ユーザの音声が格納されている複数の視聴者音声ファイルと、
映像・音声を再生する再生手段と、
音声の再生を制御するための再生制御手段と、を有し、
再生手段は、
映像ファイルから映像を読み込んで再生を開始する映像再生開始手段と、
再生制御手段からの指示に基づいて、音声を映像に同期させて再生させる手段と、を有し、
再生制御手段は、
音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、視聴者音声ファイルに音声が格納されているかを判定する音声存在判定手段と、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定する非言語判定手段と、
非言語以外で発声が重複しないユーザのグループを全ての音声区間情報ファイルを検索することにより取得する第１のユーザグループ検索手段と、
第１のユーザグループ検索手段により取得したユーザのグループに基づいて、視聴者音声ファイルから該グループに属する全てのユーザの音声を読み出して、再生手段に音声を再生させる第１の音声再生制御手段と、
第１のユーザグループ検索手段において選択されなかったユーザの中で、発声が重複しないユーザのグループを、音声区間情報ファイルを検索することにより検索する第２のユーザグループ検索手段と、
第２のユーザグループ検索手段で検索されたグループに属する全てのユーザの音声を視聴者音声ファイルから読み出して、再生手段に映像と共に遡って再生させる第２の音声再生制御手段と、
全てのユーザの音声の再生が終了するまで、第２のユーザグループ検索手段及び第２の音声再生制御手段を繰り返す繰り返し手段と、を有する。 The present invention (Claim 9) is a video playback device that plays back a group of voices of a plurality of users who are viewing the same video together with the video,
A video file containing the video,
A plurality of audio section information files for storing audio sections consisting of start time and end time of utterance for video provided for each user;
A plurality of viewer audio files storing the user's audio provided for each user;
Playback means for playing back video and audio;
Playback control means for controlling the playback of audio,
The playback means has:
Video playback start means for reading video from the video file and starting playback; and
Means for reproducing audio in synchronization with the video based on an instruction from the playback control means,
The playback control means has:
Audio presence determination means for referring to the audio section information file and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file;
Non-language determination means for determining, when audio is stored in the viewer audio file, whether the audio is non-language;
First user group search means for obtaining, by searching all the audio section information files, a group of users whose utterances, other than non-language ones, do not overlap;
First audio reproduction control means for reading the audio of all users belonging to the group obtained by the first user group search means from the viewer audio files and causing the playback means to reproduce it;
Second user group search means for searching, among the users not selected by the first user group search means, for a group of users whose utterances do not overlap, by searching the audio section information files;
Second audio reproduction control means for reading the audio of all users belonging to the group found by the second user group search means from the viewer audio files and causing the playback means to rewind and reproduce it together with the video; and
Repeating means for repeating the second user group search means and the second audio reproduction control means until the audio of all users has been reproduced.

本発明（請求項１０）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生装置であって、
映像を格納した映像ファイルと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する複数の音声区間情報ファイルと、
ユーザ毎に設けられている該ユーザの音声が格納されている複数の視聴者音声ファイルと、
ユーザ毎に予め決められている評価値が格納されているユーザ評価情報ファイルと、
映像・音声を再生する再生手段と、
音声の再生を制御するための音声再生制御手段と、を有し、
再生手段は、
映像ファイルから映像を読み込んで再生を開始する映像再生開始手段と、
再生制御手段からの指示に基づいて、音声を映像に同期させて再生させる手段と、を有し、
音声再生制御手段は、
視聴者音声ファイルのうち、未再生のユーザの音声について、ユーザ評価情報ファイルを参照して、評価値が最大のユーザの音声を視聴者音声ファイルから読み出して再生手段に再生させる第１の音声再生制御手段と、
前回再生されたユーザの次の発声までの時間間隔に収まる他のユーザの音声を、全ての音声区間情報ファイルを検索することにより取得する発声候補検索手段と、
発声候補検索手段で取得したユーザの音声のうち、ユーザ評価情報ファイルを参照して、前回再生されたユーザの次に評価値が高いユーザの音声を、該ユーザの視聴者音声ファイルから読み出して再生手段に再生させる第２の音声再生制御手段と、
全てのユーザの音声を再生するまで、発声候補検索手段及び第２の音声再生制御手段を繰り返す繰り返し手段と、を有する。 The present invention (Claim 10) is a video playback device that plays back a group of voices of a plurality of users who are viewing the same video together with the video,
A video file containing the video,
A plurality of audio section information files for storing audio sections consisting of start time and end time of utterance for video provided for each user;
A plurality of viewer audio files storing the user's audio provided for each user;
A user evaluation information file storing evaluation values predetermined for each user;
Playback means for playing back video and audio;
Audio reproduction control means for controlling audio reproduction,
The playback means has:
Video playback start means for reading video from the video file and starting playback; and
Means for reproducing audio in synchronization with the video based on an instruction from the playback control means,
The audio reproduction control means has:
First audio reproduction control means for referring to the user evaluation information file for the audio of users not yet reproduced among the viewer audio files, reading the audio of the user with the highest evaluation value from the viewer audio file, and causing the playback means to reproduce it;
Utterance candidate search means for obtaining, by searching all the audio section information files, the audio of other users that fits within the time interval until the next utterance of the previously reproduced user;
Second audio reproduction control means for referring to the user evaluation information file and, from among the users' audio obtained by the utterance candidate search means, reading the audio of the user with the next-highest evaluation value after the previously reproduced user from that user's viewer audio file and causing the playback means to reproduce it; and
Repeating means for repeating the utterance candidate search means and the second audio reproduction control means until the audio of all users has been reproduced.

本発明（請求項１１）は、請求項７の音声再生制御手段において、
音声区間内に最も高い評価値を有するユーザの音声を視聴者音声ファイルから読み出して、再生手段に再生させる際に、
所定の時間マージンＴの音声区間の再生の重複を許容する手段を含む。 According to the present invention (claim 11), in the sound reproduction control means of claim 7,
when the audio of the user with the highest evaluation value in the audio section is read from the viewer audio file and reproduced by the playback means,
means for permitting overlapping reproduction of audio sections up to a predetermined time margin T is included.

本発明（請求項１２）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生装置であって、
映像を格納した映像ファイルと、
ユーザ毎に設けられている、映像に対する発声の開始時刻、終了時刻からなる音声区間を格納する複数の音声区間情報ファイルと、
ユーザ毎に設けられている該ユーザの音声が格納されている複数の視聴者音声ファイルと、
映像・音声を再生する再生手段と、
音声の再生を制御するための音声再生制御手段と、を有し、
再生手段は、
映像ファイルから映像を読み込んで再生を開始する映像再生開始手段と、
再生制御手段からの指示に基づいて、音声を映像に同期させて再生させる手段と、を有し、
音声再生制御手段は、
音声区間情報ファイルを参照し、該開始時刻、該終了時刻が設定されているか否かに基づいて、視聴者音声ファイルに音声が格納されているかを判定する音声存在判定手段と、
視聴者音声ファイルに音声が格納されている場合に、該音声が非言語か否かを判定する非言語判定手段と、
非言語判定手段において、非言語であると判定された音声を再生手段に再生させ、非言語ではないと判定された第１の音声について、該第１の音声の開始時刻と該第１の音声の次に発声された他のユーザの第２の音声の開始時刻との差が該第１の音声の区間長よりも短い場合には、該第１の音声を短縮して該再生手段に再生させる音声再生制御手段と、
音声区間情報ファイルに格納されている音声区間の全てに対して、音声存在判定手段、非言語判定手段、音声再生制御手段を繰り返す繰り返し手段と、を有する。 The present invention (Claim 12) is a video playback device that plays back a group of voices of a plurality of users who are viewing the same video together with the video,
A video file containing the video,
A plurality of audio section information files for storing audio sections consisting of start time and end time of utterance for video provided for each user;
A plurality of viewer audio files storing the user's audio provided for each user;
Playback means for playing back video and audio;
Audio reproduction control means for controlling audio reproduction,
The playback means has:
Video playback start means for reading video from the video file and starting playback; and
Means for reproducing audio in synchronization with the video based on an instruction from the playback control means,
The audio reproduction control means has:
Audio presence determination means for referring to the audio section information file and determining, based on whether the start time and the end time are set, whether audio is stored in the viewer audio file;
Non-language determination means for determining, when audio is stored in the viewer audio file, whether the audio is non-language;
Audio reproduction control means for causing the playback means to reproduce audio that the non-language determination means has determined to be non-language, and for, with respect to a first audio determined not to be non-language, shortening the first audio and causing the playback means to reproduce it when the difference between the start time of the first audio and the start time of a second audio, uttered by another user next after the first audio, is shorter than the section length of the first audio; and
Repeating means for repeating the audio presence determination means, the non-language determination means, and the audio reproduction control means for all the audio sections stored in the audio section information file.

本発明（請求項１３）は、同一の映像を視聴中の複数のユーザの音声群を映像と共に再生する映像再生プログラムであって、
請求項１乃至６記載の映像再生方法を実現するための処理をコンピュータに実行させるプログラムである。 The present invention (Claim 13) is a video playback program for playing back a group of voices of a plurality of users who are viewing the same video together with the video,
A program for causing a computer to execute processing for realizing the video playback method according to any one of claims 1 to 6.

上述のように本発明によれば、映像視聴者の音声を映像と共に再生する場合において、複数のユーザの音声を聞き取りに妨げなく聞くことが可能となる。 As described above, according to the present invention, when the voices of video viewers are reproduced together with the video, the voices of a plurality of users can be heard without their intelligibility being impaired.

以下、図面と共に本発明の実施の形態を説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

図３は、本発明の一実施の形態におけるシステム構成を示す。同図に示すシステムは、記憶媒体に格納された複数のファイルと、映像再生装置として利用される計算機１００及び、計算機１００の処理結果を出力するスピーカを含むモニタ２００から構成される。 FIG. 3 shows a system configuration according to an embodiment of the present invention. The system shown in the figure consists of a plurality of files stored in a storage medium, a computer 100 used as the video playback device, and a monitor 200, including a speaker, that outputs the processing results of the computer 100.

ファイルは、映像ファイル１０１、視聴者音声ファイルＡ１０２、音声区間ファイルＡ１０３、視聴者音声ファイルＢ１０４、音声区間ファイルＢ１０５、視聴者音声ファイルＣ１０６、音声区間ファイルＣ１０７、ユーザ評価情報ファイル１０８がある。なお、同図では、視聴者音声ファイルと音声区間ファイルのグループの数を３つとしたが、この数は１以上であれば任意でよい。例えば、映像視聴中の２人の音声を聞く場合は、視聴者音声ファイルＣ１０６、音声区間ファイル１０７は不要であり、Ｎ人の視聴者に対してＮ個の視聴者音声ファイルと、音声区間ファイルを用意すれば本発明は実現可能である。 The files are a video file 101, a viewer audio file A102, an audio section file A103, a viewer audio file B104, an audio section file B105, a viewer audio file C106, an audio section file C107, and a user evaluation information file 108. In the figure there are three pairs of viewer audio file and audio section file, but the number may be any number of one or more. For example, when the voices of two people viewing the video are to be heard, the viewer audio file C106 and the audio section file C107 are unnecessary; in general, the present invention can be implemented by preparing N viewer audio files and N audio section files for N viewers.

映像ファイル１０１は、ユーザが視聴するファイルである。 The video file 101 is a file that the user views.

視聴者音声ファイルＡ１０２は、ある視聴者Ａが映像ファイル１０１を見ながら発声した音声をマイク等（図示せず）によって録音して得られた音声ファイルである。ここで、映像ファイル１０１中の時間と、視聴者音声ファイルＡ１０２中の時間は同期が取れているとする。すなわち、映像ファイル１０１中のある時刻に発せられた音声は、視聴者音声ファイルＡ１０２中の当該時刻に対応付けられて記録される。 The viewer audio file A102 is an audio file obtained by recording a voice uttered by a viewer A while watching the video file 101 with a microphone or the like (not shown). Here, it is assumed that the time in the video file 101 and the time in the viewer audio file A102 are synchronized. That is, the sound uttered at a certain time in the video file 101 is recorded in association with the time in the viewer sound file A102.

音声区間ファイルＡ１０３は、視聴者音声ファイルＡ１０２中で、音声が発声された区間群の開始時刻、終了時刻を有するファイルである。当該音声区間ファイルの例を図４に示す。当該音声区間ファイルは、音声ＩＤに対応付けられた発声の開始時刻／終了時刻が格納される。また、音声区間ファイルＡ１０３を無しとして、計算機１００によって音声区間ファイルＡ１０３に相当する情報を生成するとしてもよい。 The audio section file A103 holds the start time and end time of each section of the viewer audio file A102 in which speech was uttered. FIG. 4 shows an example of such an audio section file. The file stores the start time and end time of each utterance in association with a voice ID. Alternatively, the audio section file A103 may be omitted, and the computer 100 may generate the equivalent information itself.
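
As a rough illustration of the information an audio section file carries, the following Python sketch models one record per utterance. The comma-separated layout, the `Section` class, and the `parse_section_file` function are assumptions made for this example; the text states only that a start time and an end time are stored against each voice ID, and the layout of FIG. 4 is not reproduced here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    voice_id: str  # ID of the utterance
    start: float   # start time, in seconds from the start of the video (assumed unit)
    end: float     # end time, in seconds from the start of the video (assumed unit)


def parse_section_file(text: str) -> list[Section]:
    """Parse rows of 'voice_id,start,end' (an assumed layout) into Section records."""
    rows = csv.reader(io.StringIO(text))
    return [Section(voice_id, float(start), float(end)) for voice_id, start, end in rows]


# Hypothetical contents of an audio section file for one viewer.
sections_a = parse_section_file("a1,3.0,4.5\na2,10.0,12.0\n")
```

Keeping one record per utterance makes the later overlap checks simple interval comparisons.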

視聴者音声ファイルＢ１０４は、視聴者音声ファイルＡ１０２の音声を発したユーザとは別のユーザが映像ファイル１０１を視聴しながら発した音声のファイルである。ここで、視聴者音声ファイルＡ１０２と同様に映像ファイル１０１との同期がとられているとする。 The viewer audio file B104 is a file of the voice uttered, while viewing the video file 101, by a user other than the user whose voice is in the viewer audio file A102. As with the viewer audio file A102, it is assumed to be synchronized with the video file 101.

音声区間ファイルＢ１０５は、視聴者音声ファイルＢ１０４の中で、音声が発声された区間群の開始時刻／終了時刻を有するファイルである。 The audio section file B105 is a file having the start time / end time of the section group in which the voice is uttered in the viewer audio file B104.

視聴者音声ファイルＣ１０６は、視聴者音声ファイルＡ１０２の音声を発したユーザ、視聴者音声ファイルＢ１０４の音声を発したユーザとは別のユーザが映像ファイル１０１を視聴しながら発した音声のファイルである。ここで、視聴者音声ファイルＡ１０２と同様に映像ファイル１０１との同期がとられている。 The viewer audio file C106 is a file of the voice uttered, while viewing the video file 101, by a user other than the users whose voices are in the viewer audio files A102 and B104. As with the viewer audio file A102, it is synchronized with the video file 101.

音声区間ファイルＣ１０７は、視聴者音声ファイルＣ１０６の中で、音声が発声された区間群の開始時刻／終了時刻を有するファイルである。 The audio section file C107 is a file having the start time / end time of the section group in which the voice is uttered in the viewer audio file C106.

なお、上記の視聴者音声ファイル１０２，１０４，１０６の構成は、映像ＩＤ、音声を入力したシーンＩＤ、音声ＩＤ、音声を有し、映像ファイル１０１とはシーンＩＤにより同期を取るものとする。 The viewer audio files 102, 104, and 106 have a video ID, a scene ID to which audio is input, an audio ID, and audio, and are synchronized with the video file 101 by the scene ID.

ユーザ評価情報ファイル１０８は、視聴者音声ファイルＡ１０２を発声したユーザ（ユーザＡ）、視聴者音声ファイルＢ１０４を発声したユーザ（ユーザＢ）、視聴者音声ファイルＣ１０４を発声したユーザ（ユーザＣ）の発声の重要度に関する情報を有する。例えば、図５のように、各ユーザに対して、その発声の重要度を０から１００までの数値で指定する。なお、ユーザ評価情報ファイル１０８は、例えば、予め投票等を行い、その投票数等によりユーザ毎に評価を行い、評価値を付与して格納することにより生成されているものとする。 The user evaluation information file 108 holds information on the importance of the utterances of the user who uttered the viewer audio file A102 (user A), the user who uttered the viewer audio file B104 (user B), and the user who uttered the viewer audio file C106 (user C). For example, as shown in FIG. 5, the importance of each user's utterances is specified as a number from 0 to 100. The user evaluation information file 108 is assumed to have been generated by, for example, holding a vote in advance, evaluating each user according to the number of votes received, and storing the resulting evaluation values.

計算機１００は、映像ファイル１０１、視聴者音声ファイルＡ１０２、音声区間ファイルＣ１０６、音声区間ファイルＣ１０７、ユーザ評価情報ファイル１０８の入力と、モニタ２００への映像、音声の送出と、各種処理を行う。 The computer 100 takes as input the video file 101, the viewer audio file A102 through the audio section file C107, and the user evaluation information file 108, sends video and audio to the monitor 200, and performs the various kinds of processing.

モニタ２００は、計算機１００から送られる映像を表示し、ユーザに提示する。 The monitor 200 displays the video sent from the computer 100 and presents it to the user.

図３に示すシステムでは、予め複数のユーザが同一の映像を視聴しながら発声した音声を、任意のユーザに対して当該映像と当該音声とを提示する。例えば、ユーザが発声する音声としては「凄い」、「うまそうだ」、「ヤッホー」などが考えられる。 In the system shown in FIG. 3, voices that a plurality of users uttered beforehand while viewing the same video are presented, together with that video, to an arbitrary user. Examples of such utterances include “Amazing!”, “That looks delicious”, and “Yoo-hoo!”.

［第１の実施の形態］ [First Embodiment]
本実施の形態では、蓄積されている音声が非言語・言語であるかを判定し、さらに、言語である場合に、予め定められている評価値に基づいて、再生する音声を決定するものである。 In the present embodiment, it is determined whether each stored voice is non-verbal or verbal, and, for verbal voices, the voice to be played back is selected on the basis of predetermined evaluation values.

図６は、本発明の第１の実施の形態における映像再生装置の構成を示す。 FIG. 6 shows the configuration of the video reproduction apparatus according to the first embodiment of the present invention.

同図に示す映像再生装置は、上記で説明した映像ファイル１０１、視聴者音声ファイルＡ１０２、音声区間ファイルＡ１０３、視聴者音声ファイルＢ１０４、音声区間ファイルＢ１０５、視聴者音声ファイルＣ１０６、音声区間ファイルＣ１０７，ユーザ評価情報ファイル１０８及びモニタ２００が映像再生装置として利用される計算機１００に接続されている構成である。 In the video playback apparatus shown in the figure, the video file 101, viewer audio file A102, audio section file A103, viewer audio file B104, audio section file B105, viewer audio file C106, audio section file C107, and user evaluation information file 108 described above, together with the monitor 200, are connected to the computer 100 used as the video playback apparatus.

映像再生装置１００は、再生部１１０、音声再生制御部１２０から構成される。 The video playback device 100 includes a playback unit 110 and an audio playback control unit 120.

なお、モニタ２００には、スピーカが内蔵されている構成とし、再生部１１０から音声が出力されるものとする。 The monitor 200 has a built-in speaker, and audio is output from the playback unit 110.

以下、映像再生装置１００の動作を説明する。 Hereinafter, the operation of the video reproduction apparatus 100 will be described.

図７は、本発明の第１の実施の形態における動作のフローチャートである。 FIG. 7 is a flowchart of the operation in the first embodiment of the present invention.

ステップ１０１）再生部１１０は、映像ファイル１０１を読み込み、再生を開始して、モニタ２００への映像、音声の送出を開始する。 Step 101) The playback unit 110 reads the video file 101, starts playback, and starts sending video and audio to the monitor 200.

ステップ１０２）音声再生制御部１２０は、現時点において、視聴者音声ファイルＡ１０２、視聴者音声ファイルＢ１０４、視聴者音声ファイルＣ１０６のいずれかに音声が存在するかどうかを、音声区間ファイルＡ１０３、音声区間ファイルＢ１０５、音声区間ファイルＣ１０７を参照して判定する。判定方法としては、各音声区間ファイルの開始時刻及び終了時刻の欄がＮＵＬＬ以外であれば、音声が存在すると判定する。音声が存在する場合には、ステップ１０３に移行し、存在しない場合は当該ステップを繰り返す。 Step 102) The audio playback control unit 120 determines whether a voice is currently present in any of the viewer audio files A102, B104, and C106 by referring to the audio section files A103, B105, and C107. A voice is judged to be present if the start time and end time fields of an audio section file are non-NULL. If a voice is present, the process proceeds to step 103; otherwise, this step is repeated.
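
The presence check of step 102 might be sketched as follows. The text describes the test as the start and end time fields being non-NULL; this sketch assumes one plausible reading, namely that a voice is present when the current playback time lies inside a recorded section. The function name `users_speaking_at` and the dictionary layout are hypothetical.

```python
def users_speaking_at(now, section_files):
    """Return the users who have an utterance section covering time `now`.

    `section_files` maps a user name to a list of (start, end) pairs in seconds,
    standing in for the audio section files A103, B105, and C107.
    """
    return [
        user
        for user, sections in section_files.items()
        if any(start <= now < end for start, end in sections)
    ]


# Hypothetical section data for three viewers.
section_files = {
    "A": [(3.0, 4.5)],
    "B": [(4.0, 6.0)],
    "C": [],
}
speaking = users_speaking_at(4.2, section_files)
```

An empty result corresponds to the "no voice present" branch, in which step 102 is simply repeated.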

ステップ１０３）ステップ１０２で音声が存在すると判定された場合には、音声が存在した視聴者音声ファイルについて、音声が非言語音声か否かを判定する。ここで、非言語音声とは「えー」「あー」等の音声を指す。音声信号に対して、非言語音声か否かの判定を行う処理としては、例えば、「金澤博史，クリスマエダ、竹林洋一、“計算機と対話のための非言語音声の認識と合成”電子情報通信学会論文誌、D-II, Vol.J77-D-II No.8, pp.1512-1421, 1994」等を用いることができる。また、一括してセンタに非言語／言語の判定を依頼する方法も考えられる。 Step 103) If step 102 determined that a voice is present, it is determined, for the viewer audio file containing that voice, whether the voice is non-verbal speech. Here, non-verbal speech means sounds such as “er” and “ah.” The determination of whether a speech signal is non-verbal can be made using, for example, the method of Hiroshi Kanazawa, Chris Maeda, and Yoichi Takebayashi, “Recognition and synthesis of non-verbal speech for interaction with a computer,” IEICE Transactions D-II, Vol. J77-D-II, No. 8, pp. 1512-1421, 1994. Alternatively, the verbal/non-verbal determination may be requested from a center in a batch.

ここでは、映像再生処理中に判定を行う例を挙げたが、映像再生処理前に各音声ファイル（１０２，１０４，１０６）に対して非言語音声かどうかの判定を行い、音声区間ファイル（１０３，１０５，１０７）に付加情報として記録しておき、本ステップでは、当該音声区間ファイル中の当該付加情報を参照することによって、非言語音声かどうかの判定を行ってもよい。非言語である場合は、ステップ１０５に移行する。非言語でない場合にはステップ１０４に移行する。これは、非言語の音声に関しては、言語の音声と重なった場合に、該言語の理解を妨げる影響が小さいことを考慮して、図８のａ，ｂ，ｃのように非言語音声は全て再生するためである。例えば、「ありがとう」という音声と「おじゃまします」という音声が同時に再生されると、それぞれの音声が聞き取りにくいが、「あー」という非言語音声と、「おじゃまします」という音声が同時に再生されたとしても、「おじゃまします」の音声の聞き取りはあまり妨げられない。 Although the determination is made here during video playback, it may instead be made for each audio file (102, 104, 106) before playback and recorded as additional information in the audio section files (103, 105, 107); this step then decides whether a voice is non-verbal by referring to that additional information. If the voice is non-verbal, the process proceeds to step 105; if not, it proceeds to step 104. The reason is that a non-verbal voice does little to hinder the understanding of a verbal voice that it overlaps, so all non-verbal voices are played back, as with a, b, and c in FIG. 8. For example, if the utterances “Thank you” and “Pardon the intrusion” are played at the same time, both are hard to make out; but if the non-verbal “ah” and “Pardon the intrusion” are played at the same time, “Pardon the intrusion” remains largely intelligible.

なお、非音声を判定する処理をセンタ側で行うことも可能である。この場合には、視聴者音声ファイル、映像ファイルを当該装置内に持つ必要はない。 The non-verbal speech determination can also be performed on the center side. In that case, the apparatus itself need not hold the viewer audio files or the video file.

ステップ１０４）音声再生制御部１２０は、ステップ１０２で認められた音声に対して、当該音声の発声区間を音声区間ファイル１０３、１０５，１０７を参照して同定し、当該区間内に当該ユーザよりも評価の高いユーザの音声があるかどうかを、ユーザ評価情報ファイル１０８と音声区間ファイル１０３，１０５，１０７を参照して判定する。音声がある場合はステップ１０２に移行し、音声がない場合にはステップ１０５に移行する。例えば、図８において、ユーザＡの音声ｄに対して、当該音声の区間内には、ユーザＡよりも評価の高いユーザの音声は存在しないので、ステップ１０５に移行して、当該音声が再生される。一方、図８において、ユーザＢの音声ｅに対して、当該音声の区間内にはユーザＢよりも評価の高いユーザＡの音声が存在するので、ステップ１０２に移行し、当該音声については再生しない。 Step 104) For the voice found in step 102, the audio playback control unit 120 identifies its utterance section by referring to the audio section files 103, 105, and 107, and then determines, by referring to the user evaluation information file 108 and the audio section files 103, 105, and 107, whether that section contains a voice of a user with a higher evaluation than the speaker. If it does, the process returns to step 102; if it does not, the process proceeds to step 105. For example, in FIG. 8, the section of user A's voice d contains no voice of a user with a higher evaluation than user A, so the process proceeds to step 105 and the voice is played back. By contrast, the section of user B's voice e contains a voice of user A, whose evaluation is higher than user B's, so the process returns to step 102 and the voice is not played back.

これは、評価の高いユーザの音声は、評価のユーザの音声と比べて、聞く必要性が高いことを考慮して、評価の高いユーザの音声と、評価の低いユーザの音声とが重なった場合は、評価の高いユーザの音声のみを再生することによって、音声の重なりによる音声聞き取りの妨げの影響をなくすものである。例えば、図８のユーザＡの発声とユーザＢの発声が重なっている部分（ｄ，ｅ）において、ユーザＡの発声ｄは再生されるが、ユーザＢの発声ｅは再生されない。 This reflects the view that the voice of a highly evaluated user is more worth hearing than that of a user with a lower evaluation: when the two overlap, only the highly evaluated user's voice is played back, which removes the interference that overlapping voices would cause. For example, where the utterances of user A and user B overlap in FIG. 8 (d and e), user A's utterance d is played back but user B's utterance e is not.
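
The decision made in steps 103 and 104 can be sketched in Python as below. This is only an illustration under stated assumptions: the `Utterance` class, the `should_play` function, and the evaluation values 80 and 50 are hypothetical, and two sections are treated as overlapping when they share any interval of time. Users with equal evaluations do not suppress one another in this sketch, since the text speaks only of a higher evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    user: str
    start: float             # seconds from the start of the video (assumed unit)
    end: float
    nonverbal: bool = False  # result of the step 103 determination


def should_play(u, utterances, evaluation):
    """Decide whether utterance `u` is played back (steps 103 and 104)."""
    if u.nonverbal:
        return True  # step 103: non-verbal speech is always played back
    for other in utterances:
        if other.user == u.user:
            continue
        overlaps = other.start < u.end and u.start < other.end
        if overlaps and evaluation[other.user] > evaluation[u.user]:
            return False  # step 104: a higher-evaluated user speaks in this section
    return True


# Hypothetical data shaped like utterances d and e in FIG. 8.
evaluation = {"A": 80, "B": 50}
d = Utterance("A", 10.0, 14.0)
e = Utterance("B", 12.0, 15.0)
```

With this data, `should_play(d, [d, e], evaluation)` is true and `should_play(e, [d, e], evaluation)` is false, matching the FIG. 8 example in which d is played back and e is not.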

ステップ１０５）再生部１１０は、ステップ１０２で存在すると判定された音声を再生する。 Step 105) The reproduction unit 110 reproduces the sound determined to exist at step 102.

このように、映像とユーザの音声を同期して再生することにより、映像の各シーンにおいて、他のユーザの発声を聞きながら映像を楽しむことができる。 In this way, by reproducing the video and the user's voice in synchronization, it is possible to enjoy the video while listening to the voices of other users in each scene of the video.

ステップ１０６）映像の末尾まで再生されたかどうかを判定し、映像の末尾である場合は、ステップ１０７に移行する。映像の末尾でない場合はステップ１０２に移行する。 Step 106) It is determined whether or not the end of the video has been reproduced. If it is the end of the video, the process proceeds to Step 107. If it is not the end of the video, the process proceeds to step 102.

ステップ１０７）再生部１１０は、映像と音声の再生を停止する。 Step 107) The reproduction unit 110 stops the reproduction of the video and audio.

このように、本実施の形態では、複数の音声の中で評価の高いユーザの音声を漏れなく聞くことができる。 As described above, in this embodiment, it is possible to listen to the voice of a user who is highly evaluated among a plurality of voices without omission.

［第２の実施の形態］ [Second Embodiment]
本実施の形態では、予め定められている評価値に基づいて、再生する音量を制御する例を説明する。 In the present embodiment, an example is described in which the playback volume is controlled on the basis of predetermined evaluation values.

本実施の形態における映像再生装置１００の構成は、前述の第１の実施の形態における図６の構成と同様であるが、音声再生制御部１２０と再生部１１０の動作が異なる。 The configuration of the video playback apparatus 100 in the present embodiment is the same as the configuration of FIG. 6 in the first embodiment described above, but the operations of the audio playback control unit 120 and the playback unit 110 are different.

本実施の形態では、前述の第１の実施の形態のように、発声区間内に自分よりも高い評価の人の発声区間があるか否かを判定するのではなく、ユーザ評価情報ファイル１０８に予め定められているユーザの評価値の大小に応じて、再生する音量を制御するものである。 Unlike the first embodiment, the present embodiment does not determine whether an utterance section contains the utterance section of a user with a higher evaluation; instead, it controls the playback volume according to the magnitude of each user's evaluation value, which is predetermined in the user evaluation information file 108.

図９は、本発明の第２の実施の形態における動作のフローチャートである。同図では、前述の第１の実施の形態における図７と同様の動作については、図７と同一のステップ番号を付し、その説明を省略する。 FIG. 9 is a flowchart of the operation in the second embodiment of the present invention. In the figure, the same operations as those in FIG. 7 in the first embodiment are denoted by the same step numbers as those in FIG. 7, and the description thereof is omitted.

ステップ２０４）音声再生制御部１２０は、ステップ１０２で存在すると判定された音声に対して、ユーザ評価情報ファイル１０８を参照して、評価の大小に応じて再生時の音量を決定する。例えば、図１０の例では、ユーザＡ、ユーザＢ、ユーザＣの順に大きくする。図１０では、音量の大きさを四角形の縦の辺の長さの長短で模式的に示している。 Step 204) For the voice determined to be present in step 102, the audio playback control unit 120 refers to the user evaluation information file 108 and sets the playback volume according to the evaluation. In the example of FIG. 10, the volume is highest for user A, followed by user B and then user C. FIG. 10 represents the volume schematically by the height of each rectangle.

音量を決定する際は、例えば、次式で計算されるｃ値を原音声波形に乗算することにより決定される。ここで、εは、評価値であり、図５に示すように、０から１００の値をとり、値が大きいほど評価が高い。 The volume is set, for example, by multiplying the original speech waveform by the value c given by the following equation, where ε is the evaluation value, which ranges from 0 to 100 as shown in FIG. 5, with larger values indicating a higher evaluation.

ｃ＝ε／１００ c = ε / 100

これは、評価の高いユーザの音声は、評価の低いユーザの音声と比べて、聞く必要性が高いことを考慮して、評価の高いユーザの音声の音量を相対的に大きく再生することによって、音声の重なりによる音声聞き取りの妨げの影響を少なくするためである。例えば、図１０のユーザＡの発声ｆとユーザＢの発声ｇが重なっている部分において、再生部１１０において、ユーザＡの発声は、ユーザＢの発声よりも大きな音量で再生されるので、ユーザＡの発声の聞き取りがあまり妨げられない。 This reflects the view that the voice of a highly evaluated user is more worth hearing than that of a user with a lower evaluation: playing the highly evaluated user's voice at a relatively higher volume lessens the interference caused by overlapping voices. For example, where user A's utterance f and user B's utterance g overlap in FIG. 10, the playback unit 110 plays user A's utterance louder than user B's, so user A's utterance remains largely intelligible.
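
The volume rule c = ε / 100 can be applied to a waveform as in the following minimal sketch. The function names `gain` and `scale_waveform`, and the representation of the waveform as a list of floating-point samples, are assumptions for illustration only.

```python
def gain(evaluation: float) -> float:
    """Return the multiplier c = evaluation / 100 for an evaluation in [0, 100]."""
    if not 0 <= evaluation <= 100:
        raise ValueError("evaluation must lie between 0 and 100")
    return evaluation / 100


def scale_waveform(samples, evaluation):
    """Multiply every sample of the original waveform by c."""
    c = gain(evaluation)
    return [c * s for s in samples]


# A hypothetical user with evaluation 50 is played at half amplitude.
quieter = scale_waveform([0.5, -1.0], 50)
```

An evaluation of 100 leaves the waveform unchanged, and an evaluation of 0 silences the user entirely.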

このように、本実施の形態では、複数の音声の中で評価の高いユーザの音声を大きな音で、評価の低いユーザの音声を小さな音で聞くことができる。 As described above, in the present embodiment, it is possible to hear a user's voice with a high evaluation among a plurality of voices with a loud sound and a user's voice with a low evaluation with a small sound.

［第３の実施の形態］ [Third Embodiment]
本実施の形態では、音声区間が重複しないユーザ群毎に、映像を複数回再生するものである。 In the present embodiment, the video is played back multiple times, once for each group of users whose utterance sections do not overlap.

図１１は、本発明の第３の実施の形態における動作のフローチャートである。同図において、前述の第１の実施の形態の図７のフローチャートにおける動作と同一動作については、同一符号を付し、その説明を省略する。 FIG. 11 is a flowchart of the operation in the third embodiment of the present invention. In the figure, the same operations as those in the flowchart of FIG. 7 of the first embodiment are denoted by the same reference numerals, and the description thereof is omitted.

ステップ５０２）音声再生制御部１２０は、非言語以外で重複のない発声パターンを持つユーザのグループを、音声区間ファイル１０３，１０５，１０７を参照して検索する。これは、音声区間が重複しない複数のユーザの音声を同時に再生しても、音声の重なりが生じないので、ユーザの音声の聞き取りを妨げることを防ぐことができるからである。例えば、図１２のｈに示すユーザＡとユーザＣからなるグループの場合は、非言語以外での重複がなく、同時に再生しても聞き取りの妨げが生じない（ユーザＡとユーザＣの重複している部分は非言語の部分であるため妨げが生じない）。 Step 502) The audio playback control unit 120 searches the audio section files 103, 105, and 107 for a group of users whose utterances do not overlap one another except in non-verbal speech. The voices of users whose utterance sections do not overlap can be played back together without any overlap, and therefore without hindering comprehension. For example, the group consisting of user A and user C, shown as h in FIG. 12, has no overlap other than in non-verbal speech, so playing the two together does not hinder comprehension (the portion where user A and user C overlap is non-verbal and causes no interference).

ステップ５０３）再生部１１０は、ステップ５０２で得られたユーザのグループのユーザ（ユーザＡ，ユーザＣ）の音声を再生する。 Step 503) The reproduction unit 110 reproduces the voice of the user (user A, user C) in the user group obtained in step 502.

ステップ５０４）音声再生制御部１２０は、ステップ５０２で得られたユーザのグループを除くユーザの非言語以外での重複のない発声パターンを持つユーザのグループをステップ５０２と同様に検索する。例えば、図１２の場合は、残りのユーザであるユーザＢが検索され、ユーザＢのみが重複のない発声パターンを持つユーザのグループとして検索される。これは、ステップ５０３において、音声が再生されなかった（グループｈにグループ化されなかった）ユーザの音声を聞くために行う。 Step 504) In the same way as in step 502, the audio playback control unit 120 searches the users not in the group obtained in step 502 for a group of users whose utterances do not overlap except in non-verbal speech. In the case of FIG. 12, the only remaining user is user B, so the group found consists of user B alone. This is done so that the voices of users that were not played back in step 503 (those not included in group h) can be heard.

ステップ５０５）再生部１１０は、ステップ５０４で得られたユーザのグループに含まれるユーザの中で最初の音声の時点に遡って映像と当該ユーザの音声を再生する。 Step 505) The playback unit 110 goes back to the time of the earliest utterance among the users in the group obtained in step 504 and plays back the video together with those users' voices.

ステップ５０６）ステップ５０６において全てのユーザの音声を再生したかを判定する。全てのユーザの音声を再生した場合は、ステップ１０６に移行し、全てのユーザの音声を再生していない場合は、ステップ５０４に移行する。 Step 506) It is determined whether the voices of all users have been played back. If they have, the process proceeds to step 106; if not, it returns to step 504.

これは、図１２のように全ユーザの音声を発声区間の重複なく再生するために、映像を複数回時間的に遡って再生して、それに同期してユーザの音声を再生するためである。 In this way, as shown in FIG. 12, the video is played back several times, rewinding each time, with the users' voices played in synchronization, so that the voices of all users are played back without any overlap between utterance sections.
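
One way to picture the grouping performed in steps 502 to 506 is the greedy sketch below. The text does not say how the search for a non-overlapping group is carried out, so the greedy order, the function names, and the sample times are all assumptions; the input is taken to contain only verbal sections, because overlaps in non-verbal speech are permitted.

```python
def sections_overlap(a, b):
    """True if two (start, end) sections share any interval of time."""
    return a[0] < b[1] and b[0] < a[1]


def users_conflict(sections_x, sections_y):
    """True if any verbal section of one user overlaps one of the other's."""
    return any(sections_overlap(x, y) for x in sections_x for y in sections_y)


def playback_passes(verbal_sections):
    """Split users into groups with no mutual overlap; one video pass per group."""
    remaining = list(verbal_sections)
    passes = []
    while remaining:
        group = []
        for user in remaining:
            if all(not users_conflict(verbal_sections[user], verbal_sections[g])
                   for g in group):
                group.append(user)
        passes.append(group)
        remaining = [u for u in remaining if u not in group]
    return passes


# Hypothetical verbal sections: B overlaps A, while C overlaps neither.
verbal_sections = {"A": [(0.0, 2.0), (10.0, 12.0)], "B": [(1.0, 3.0)], "C": [(5.0, 7.0)]}
passes = playback_passes(verbal_sections)
```

Here the first pass plays users A and C together and a second pass plays user B alone, which mirrors the grouping described for FIG. 12.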

このように、本実施の形態では、全てのユーザの全ての音声を音声の重複なしで漏れなく聞くことができる。 Thus, in the present embodiment, all utterances of all users can be heard, with no overlap and none omitted.

［第４の実施の形態］ [Fourth Embodiment]
本実施の形態では、評価が相対的に低いユーザの音声については、映像との同期をずらして、発声区間が重複しないように再生するものである。 In the present embodiment, the voices of users with relatively low evaluations are played back out of synchronization with the video so that utterance sections do not overlap.

本実施の形態における映像再生装置１００の構成は、前述の第１の実施の形態における図６の構成と同様であるが、再生部１１０と音声再生制御部１２０の動作が異なる。 The configuration of the video playback device 100 in the present embodiment is the same as the configuration of FIG. 6 in the first embodiment described above, but the operations of the playback unit 110 and the audio playback control unit 120 are different.

図１３は、本発明の第４の実施の形態における動作のフローチャートである。同図において、前述の図９、図１１と同一の動作については同一のステップ番号を付与し、その説明を省略する。 FIG. 13 is a flowchart of the operation in the fourth embodiment of the present invention. In the figure, the same step numbers are assigned to the same operations as those in FIGS. 9 and 11, and the description thereof is omitted.

ステップ７０２）まだ、音声再生制御部１２０は、音声を再生していないユーザのうちで、評価が最大のユーザをユーザ評価情報ファイル１０８を参照して同定し、再生部１１０において当該ユーザの最初の音声を再生する。 Step 702) The audio playback control unit 120 refers to the user evaluation information file 108 to identify, among the users whose voices have not yet been played back, the user with the highest evaluation, and the playback unit 110 plays back that user's first utterance.

なお、音声が再生されたか否かを判定する処理については、音声区間ファイルの音声区間毎にフラグを付与しておき、当該処理の前にフラグがオンかオフかを判定することで実現できる。また、これ以外の方法でも再生済みか否かを判定できる方法であればよい。 The process for determining whether or not the sound has been reproduced can be realized by assigning a flag for each sound section of the sound section file and determining whether the flag is on or off before the process. Further, any method other than this may be used as long as it can be determined whether or not the reproduction has been completed.

ステップ７０３）音声再生制御部１２０は、ステップ７０２で音声を再生したユーザの次の発声までの時間間隔を音声区間ファイル１０３，１０５，１０７を参照して計算し、その時間間隔に収まる他のユーザ発声を検索し、再生部１１０において再生する。ここで、当該時間に収まる発声が複数存在した場合は、発声の開始時刻に対する再生が終わった時刻に最も近い発声を採用する。例えば、図１４のように、ユーザＡの最初の発声ｊの間隔Ｔに収まるユーザＢの発声ｎを再生する。詳細な処理については、図１５を用いて後述する。 Step 703) The audio playback control unit 120 refers to the audio section files 103, 105, and 107 to calculate the time interval until the next utterance of the user whose voice was played back in step 702, searches for utterances of other users that fit within that interval, and has the playback unit 110 play them back. If several utterances fit, the one whose start time is closest to the time at which the preceding playback ended is chosen. For example, as in FIG. 14, user B's utterance n, which fits within the interval T following user A's first utterance j, is played back. The processing is described in detail later with reference to FIG. 15.

これは、評価の高いユーザの音声は、評価の低いユーザの音声と比べて、映像と同期して再生する必要性が高いことを考慮して、評価の高いユーザの音声は、映像と同期して再生し、相対的に低い評価のユーザの音声は映像と同期をずらして再生することによって、音声の重なりによる音声聞き取りの妨げの映像を小さくして、かつ、全体の総再生時間を短縮するためである。 This reflects the view that the voice of a highly evaluated user has a greater need to be played in synchronization with the video than that of a user with a lower evaluation: the highly evaluated user's voice is played in sync with the video, and the voices of users with relatively low evaluations are played out of sync, which reduces the interference caused by overlapping voices and also shortens the total playback time.

ステップ７０４）再生部１１０は、ステップ７０２で再生した次のユーザの音声を再生する。 Step 704) The playback unit 110 plays back the next utterance of the user whose voice was played back in step 702.

ステップ７０５）ステップ７０２で再生したユーザの音声を全て再生したかどうかを判定し、全て再生した場合はステップ５０６に移行し、全て再生していない場合はステップ７０３に移行する。 Step 705) It is determined whether all utterances of the user whose voice was played back in step 702 have been played back. If so, the process proceeds to step 506; if not, it returns to step 703.

次に、上記のステップ７０３の処理について詳しく説明する。 Next, the processing in step 703 will be described in detail.

図１５は、本発明の第４の実施例の評価値が低いユーザの音声を再生する動作のフローチャートである。以下の説明では、図１４の例を用いて説明する。 FIG. 15 is a flowchart of the operation of reproducing the voice of a user with a low evaluation value according to the fourth embodiment of the present invention. The following description will be made using the example of FIG.

ステップ７０３１）ユーザＡの次の発声ｋの開始時刻から今の発声ｊの終了時刻を引き、空き時間Ｔとする。 Step 7031) The free time T is obtained by subtracting the end time of user A's current utterance j from the start time of user A's next utterance k.

ステップ７０３２）空き時間Ｔに収まる他のユーザの発声があるかを判定し、ない場合には、ステップ７０４の処理に移行し、ある場合には、ステップ７０３３に移行する。 Step 7032) It is determined whether any other user's utterance fits within the free time T. If none does, the process proceeds to step 704; if one does, it proceeds to step 7033.

ステップ７０３３）今発声したユーザＡの発声ｊの終了時刻に最も近い開始時刻を有する他のユーザ（ユーザＢ）の発声ｎをピックアップし、再生し、ステップ７０３１に移行する。 Step 7033) The utterance n of another user (user B) whose start time is closest to the end time of user A's just-played utterance j is selected and played back, and the process returns to step 7031.
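
Steps 7031 to 7033 can be read as a gap-filling loop, sketched below. The text does not state how the free time changes after an utterance has been inserted, so this sketch assumes that the remaining free time shrinks by the length of each utterance played; the function name `fill_gap`, the tuple layout, and the sample times are likewise hypothetical.

```python
def fill_gap(j_end, k_start, candidates):
    """Choose other users' utterances to play between utterances j and k.

    `candidates` is a list of (user, start, end) tuples, in seconds, that have
    not yet been played back. Returns the chosen utterances in playing order.
    """
    free = k_start - j_end  # step 7031: free time T
    pending = list(candidates)
    chosen = []
    while True:
        # Step 7032: keep only the utterances short enough to fit the free time.
        fitting = [u for u in pending if (u[2] - u[1]) <= free]
        if not fitting:
            return chosen  # nothing fits: go on to step 704
        # Step 7033: take the utterance whose start time is closest to j's end.
        pick = min(fitting, key=lambda u: abs(u[1] - j_end))
        chosen.append(pick)
        pending.remove(pick)
        free -= pick[2] - pick[1]  # assumption: the played length uses up free time


# Hypothetical example: user A's utterance j ends at 5.0 s and k starts at 10.0 s.
candidates = [("B", 4.5, 7.5), ("C", 6.0, 8.5), ("C", 8.0, 9.5)]
chosen = fill_gap(5.0, 10.0, candidates)
```

In this example the 3.0-second utterance of user B is chosen first, after which only the 1.5-second utterance of user C still fits in the remaining 2.0 seconds.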

本実施の形態によれば、全てのユーザの全ての音声を、音声の重複なく漏れなく聞くことができ、かつ、第３の実施の形態に比べて短時間で全ての音声を聞くことができる。 According to the present embodiment, all utterances of all users can be heard without overlap or omission, and in less time than in the third embodiment.

［第５の実施の形態］ [Fifth Embodiment]
本実施の形態は、前述の第１の実施の形態における、発声区間内に自分よりも高い評価の人の発声区間があるかどうかの判定（ステップ１０４）において、一定時間ｔの重複を許容するものである。 In the present embodiment, an overlap of up to a fixed time t is tolerated in the first embodiment's determination (step 104) of whether an utterance section contains the utterance section of a user with a higher evaluation.

本実施の形態における映像再生装置１００の構成は、前述の第１の実施の形態における図６の構成と同様であるが、音声再生制御部１２０の動作が異なる。 The configuration of the video playback device 100 in the present embodiment is the same as the configuration of FIG. 6 in the first embodiment described above, but the operation of the audio playback control unit 120 is different.

図１６は、本発明の第５の実施の形態における動作のフローチャートである。同図において、前述の第１の実施の形態における図９の動作と同一の動作については、同一のステップ番号を付し、その説明を省略する。 FIG. 16 is a flowchart of the operation in the fifth embodiment of the present invention. In the figure, operations identical to those of FIG. 7 in the first embodiment are given the same step numbers, and their description is omitted.

ステップ９０４）音声再生制御部１２０は、マージンＴ秒を除く発声区間内に自分よりも高い評価の人の発声区間があるかをユーザ評価情報ファイル１０８を参照して判定する。当該発声区間がある場合にはステップ１０２に移行する。当該発声区間がない場合は、ステップ１０５に移行する。Ｔの値は、例えば、既定値として１秒とする。ここで、図１７の場合、ユーザＡの発声ｏの発声区間とユーザＢの発声ｐの発声区間の重複時間ｑはマージンＴよりも短いので、これらの発声は同時に再生する。 Step 904) The audio playback control unit 120 refers to the user evaluation information file 108 to determine whether the utterance section, excluding a margin of T seconds, contains the utterance section of a user with a higher evaluation than the speaker. If it does, the process returns to step 102; if it does not, the process proceeds to step 105. T is set to, for example, a default value of 1 second. In the case of FIG. 17, the overlap q between the section of user A's utterance o and the section of user B's utterance p is shorter than the margin T, so both utterances are played back.

これは、ある発声の末尾と、それとは別の発声の先頭の重なりが短時間であれば、聞き取りの妨げにならないという人間の知覚特性を考慮したものである。 This takes into account the human perceptual characteristic that if the overlap between the end of a certain utterance and the beginning of another utterance is short, it will not hinder listening.
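
The margin test of step 904 might be sketched as follows. The function names and the sample times are hypothetical, and the sketch assumes that an overlap exactly equal to the margin is still tolerated, a case the text leaves open.

```python
def overlap_seconds(a, b):
    """Length in seconds of the overlap between two (start, end) sections."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def suppressed_by(own, higher, margin=1.0):
    """True if a higher-evaluated user's section overlaps `own` by more than `margin`."""
    return overlap_seconds(own, higher) > margin


# Hypothetical sections shaped like o (user A) and p (user B) in FIG. 17.
o = (10.0, 14.0)
p = (13.5, 17.0)
p_is_suppressed = suppressed_by(p, o)  # overlap q = 0.5 s, below the 1 s margin
```

Because the 0.5-second overlap is shorter than the margin, user B's utterance is not suppressed and both utterances are played back.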

また、本実施の形態では、音声の再生を時間的に密にすることができるので、より多くの音声を再生することができる。 In the present embodiment, since the sound can be played back densely in time, more sounds can be played back.

このように、本実施の形態によれば、複数の音声の中で評価の高いユーザの音声を漏れなく聞くことができ、第１の実施の形態に比べて多くの音声を聞くことができる。 As described above, according to the present embodiment, it is possible to hear the voice of a user who is highly evaluated among a plurality of voices without omission, and it is possible to hear more voices compared to the first embodiment.

［第６の実施の形態］ [Sixth Embodiment]
本実施の形態は、ユーザの発声区間が重複する場合に、音声を早送りして再生時間を短縮することによって、音声の重複による聞き取りの妨げをなくすものである。 In the present embodiment, when users' utterance sections overlap, a voice is played back faster so that its playback time is shortened, which removes the interference caused by overlapping voices.

本実施の形態における映像再生装置１００の構成は、前述の第１の実施の形態における図６の構成の構成に含まれるユーザ評価情報ファイル１０８を使用しない以外は同様であるが、音声再生制御部１２０の動作が異なる。 The configuration of the video playback apparatus 100 in the present embodiment is the same as that of FIG. 6 in the first embodiment, except that the user evaluation information file 108 is not used and the operation of the audio playback control unit 120 differs.

図１８は、本発明の第６の実施の形態における動作のフローチャートである。同図において、前述の第１の実施の形態における図７のフローチャートと同一の動作については、同一のステップ番号を付し、その説明を省略する。 FIG. 18 is a flowchart of the operation in the sixth embodiment of the present invention. In the figure, the same operations as those in the flowchart of FIG. 7 in the first embodiment are denoted by the same step numbers, and the description thereof is omitted.

ステップ１１０４）再生制御部１２０は、ステップ１０２で音声が存在すると判定された音声の開始時刻と、全ユーザの発声の中で時間的に次の開始時刻との時間差を音声区間ファイル１０３，１０５，１０７を参照して計算する。求められた時間差がステップ１０２で認められた発声の区間長よりも短いかどうかを判定する。短い場合はステップ１１０５に移行し、短くない場合はステップ１０５に移行する。 Step 1104) The audio playback control unit 120 refers to the audio section files 103, 105, and 107 to calculate the time difference between the start time of the voice determined to be present in step 102 and the next start time among the utterances of all users. It then determines whether this time difference is shorter than the length of the utterance section found in step 102. If it is shorter, the process proceeds to step 1105; otherwise, it proceeds to step 105.

ステップ１１０５）再生部１１０は、ステップ１０２で存在すると判定された発声をステップ１１０４で計算した時間差に図１９のように縮めて再生する。例えば、図１０の例において、ユーザＡの発声ｒの開始時刻との時間差ｕは、ユーザＡの発声ｒの発声の区間長ｖよりも短いので、ユーザＡの発声１２０１を当該時間差ｕに短縮して再生する。 Step 1105) The playback unit 110 plays back the utterance determined to be present in step 102, compressed to the time difference calculated in step 1104, as shown in FIG. 19. For example, in FIG. 19, the time difference u between the start time of user A's utterance r and the start time of the next utterance is shorter than the section length v of utterance r, so utterance r is shortened to the time difference u and played back.

時間を縮めて再生する方法としては、例えば、ソフトウェアサンプラが利用できる。 For example, a software sampler can be used as a method for reducing the playback time.
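
The arithmetic behind steps 1104 and 1105 can be sketched as below. Only the target duration and the resulting speed-up factor are computed; the time compression of the audio itself (for which the text suggests a software sampler) is outside the scope of this sketch. The function name `playback_plan` and the handling of utterances that start at the same instant are assumptions.

```python
def playback_plan(start, end, next_start):
    """Return (duration, speed) for an utterance, given the next utterance's start.

    `duration` is the time the utterance may occupy and `speed` is the factor
    by which it must be sped up to fit. Times are in seconds.
    """
    length = end - start      # section length v
    gap = next_start - start  # time difference u (step 1104)
    if 0 < gap < length:
        return gap, length / gap  # step 1105: compress the utterance to u
    # Not shorter, or a simultaneous start (a case the text does not cover):
    # play the utterance unchanged, as in step 105.
    return length, 1.0


# Hypothetical utterance r: 4 s long, with the next utterance starting 2 s after it.
duration, speed = playback_plan(10.0, 14.0, 12.0)
```

In this example the 4-second utterance is compressed to 2 seconds, that is, played at twice its normal speed.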

これは、図１９のように発声区間が他のユーザの発声と重複する場合に、発声の時間を短縮して再生することにより発声の重複による聞き取りの妨げを防ぎ、映像を時間的に遡って再生することなく、全ユーザの音声を聞くものである。 In this way, when an utterance section overlaps another user's utterance as in FIG. 19, shortening the utterance's playback time prevents the overlap from hindering comprehension, and the voices of all users can be heard without rewinding the video.

このように、本実施の形態によれば映像を遡って再生することなく、全てのユーザの全ての音声を、音声の重複なく漏れなく聞くことができる。 As described above, according to the present embodiment, it is possible to listen to all the sounds of all users without any overlap without reproducing the video retroactively.

上記の第１〜第６の実施の形態における、図７、図９、図１１、図１３、図１５、図１６、図１８の動作をプログラムとして構築し、映像再生装置として利用されるコンピュータにインストールして実行する、または、ネットワークを介して流通させることが可能である。 The operations of FIGS. 7, 9, 11, 13, 15, 16, and 18 in the first to sixth embodiments described above can be built as a program, which can be installed and executed on a computer used as a video playback apparatus or distributed over a network.

なお、本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において種々変更・応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications can be made within the scope of the claims.

The present invention is applicable to techniques for playing back the audio of video viewers together with the video.

FIG. 1 is a diagram explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a system configuration diagram according to an embodiment of the present invention.
FIG. 4 is an example of an audio section file according to an embodiment of the present invention.
FIG. 5 is an example of a user evaluation information file according to an embodiment of the present invention.
FIG. 6 is a configuration diagram of the video playback device according to the first embodiment of the present invention.
FIG. 7 is a flowchart of the operation in the first embodiment of the present invention.
FIG. 8 is a diagram showing users' utterances and the evaluation of those utterances in the first embodiment of the present invention.
FIG. 9 is a flowchart of the operation in the second embodiment of the present invention.
FIG. 10 is a diagram showing the relationship between users' utterances and the playback volume based on the evaluation value in the second embodiment of the present invention.
FIG. 11 is a flowchart of the operation in the third embodiment of the present invention.
FIG. 12 is a diagram showing the relationship between users' utterances and the playback order in the third embodiment of the present invention.
FIG. 13 is a flowchart of the operation in the fourth embodiment of the present invention.
FIG. 14 is a diagram showing the relationship between utterances and playback in the fourth embodiment of the present invention.
FIG. 15 is a flowchart of the operation of playing back the voice of a user with a low evaluation value in the fourth embodiment of the present invention.
FIG. 16 is a flowchart of the operation in the fifth embodiment of the present invention.
FIG. 17 is a diagram showing the relationship between utterances and playback in the fifth embodiment of the present invention.
FIG. 18 is a flowchart of the operation in the sixth embodiment of the present invention.
FIG. 19 is a diagram showing the relationship between utterances and playback in the sixth embodiment of the present invention.

Explanation of symbols

10 Viewer audio file
20 Audio section file
100 Computer, video playback device
101 Video file
102 Viewer audio file A
103 Audio section information file A
104 Viewer audio file B
105 Audio section information file B
106 Viewer audio file C
107 Audio section information file C
108 User evaluation information file
110 Playback means, playback unit
120 Playback control means, playback control unit
200 Monitor

Claims

In a video playback method for playing back audio groups of a plurality of users viewing the same video together with the video,
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, which is provided for each user and stores audio sections each consisting of a start time and an end time of an utterance made during the video, and determining, based on whether the start time and the end time are set, whether audio is stored in a viewer audio file provided for each user;
A non-language determination step of determining, when audio is stored in the viewer audio file, whether or not the audio is non-language, and playing back the audio if it is non-language;
An audio reproduction step of, when the non-language determination step determines that the audio is not non-language, referring, based on the user ID, to a user evaluation information file storing an evaluation value predetermined for each user, for the users' voices corresponding to the audio section stored in the audio section information file, and playing back the voice of the user having the highest evaluation value; and
A step of repeating the audio presence determination step, the non-language determination step, and the audio reproduction step for the audio section following the one played back in the audio reproduction step, until the video of the video file ends;
A video playback method characterized by performing the above steps.
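The selection logic of this claim can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the dictionary keys (`user`, `start`, `end`, `non_language`) and the function name `choose` are hypothetical, and the candidates are assumed to be the utterances of different users that fall in the same audio section.

```python
def choose(candidates: list[dict], evaluation: dict[str, int]) -> list[dict]:
    """Pick the utterances to play back for one audio section."""
    # Audio presence determination: audio exists only if both times are set.
    present = [
        c for c in candidates
        if c.get("start") is not None and c.get("end") is not None
    ]
    # Non-language determination: non-language audio is always played back.
    chosen = [c for c in present if c["non_language"]]
    # Audio reproduction: among language utterances, keep only the one whose
    # user has the highest evaluation value.
    language = [c for c in present if not c["non_language"]]
    if language:
        chosen.append(max(language, key=lambda c: evaluation[c["user"]]))
    return chosen


evaluation = {"A": 3, "B": 5, "C": 1}  # hypothetical evaluation values
section = [
    {"user": "A", "start": 10.0, "end": 14.0, "non_language": False},
    {"user": "B", "start": 11.0, "end": 15.0, "non_language": False},
    {"user": "C", "start": 12.0, "end": 13.0, "non_language": True},
]
print([c["user"] for c in choose(section, evaluation)])
```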

In a video playback method for playing back audio groups of a plurality of users viewing the same video together with the video,
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, which is provided for each user and stores audio sections each consisting of a start time and an end time of an utterance made during the video, and determining, based on whether the start time and the end time are set, whether audio is stored in a viewer audio file provided for each user;
A non-language determination step of determining, when audio is stored in the viewer audio file, whether or not the audio is non-language, and playing back the audio if it is non-language;
An audio reproduction step of, when the non-language determination step determines that the audio is not non-language, referring, based on the user ID, to a user evaluation information file storing an evaluation value predetermined for each user, for the users' voices corresponding to the audio section stored in the audio section information file, determining the playback volume based on the evaluation value, and playing back the audio; and
A step of repeating the audio presence determination step, the non-language determination step, and the audio reproduction step for the audio section following the one played back in the audio reproduction step, until the video of the video file ends;
A video playback method characterized by performing the above steps.
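The claim states only that the volume is determined from the evaluation value. As one possible mapping, assumed here purely for illustration, the sketch below scales the volume linearly so that the user with the highest evaluation value plays at full volume. The function name `volume_for` and the parameter `max_volume` are hypothetical.

```python
def volume_for(evaluation: dict[str, float], user: str, max_volume: float = 1.0) -> float:
    """Return a playback volume proportional to the user's evaluation value.

    Assumption: a linear mapping; the claim itself does not fix the formula.
    """
    top = max(evaluation.values())
    if top <= 0:
        return 0.0  # no positive evaluation values to scale against
    return max_volume * max(evaluation[user], 0.0) / top


evaluation = {"A": 4, "B": 2, "C": 1}  # hypothetical evaluation values
for user in evaluation:
    print(user, volume_for(evaluation, user))
```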

In a video playback method for playing back audio groups of a plurality of users viewing the same video together with the video,
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, which is provided for each user and stores audio sections each consisting of a start time and an end time of an utterance made during the video, and determining, based on whether the start time and the end time are set, whether audio is stored in a viewer audio file provided for each user;
A first user group search step of determining, when audio is stored in the viewer audio file, whether or not the audio is non-language, and acquiring, by searching all the audio section information files, a group of users whose utterances other than non-language do not overlap;
A first audio reproduction step of reading, based on the group of users acquired in the first user group search step, the audio of all users belonging to the group from the viewer audio files and playing it back;
A second user group search step of acquiring, by searching the audio section information files of the users not selected in the first user group search step, a group of users among them whose utterances do not overlap;
A second audio reproduction step of reading the audio of all users in the group found in the second user group search step from the viewer audio files and playing it back together with the video by going back in the video; and
A step of repeating the second user group search step and the second audio reproduction step until playback of the audio of all users is completed;
A video playback method characterized by performing the above steps.
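The repeated group search can be sketched as a greedy partition of users into playback passes. This is a sketch under assumptions: the claim does not state the order in which users are considered, so dictionary order is used here, the sections are assumed to contain only language utterances, and the names `overlaps` and `group_users` are hypothetical.

```python
Interval = tuple[float, float]  # (start time, end time) in seconds


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two audio sections share any time."""
    return a[0] < b[1] and b[0] < a[1]


def group_users(sections: dict[str, list[Interval]]) -> list[list[str]]:
    """Split users into passes so that no utterances overlap within a pass.

    The first pass corresponds to the first user group search step; each later
    pass corresponds to one repetition of the second user group search step.
    """
    remaining = list(sections)
    passes = []
    while remaining:
        group, taken, rest = [], [], []
        for user in remaining:
            if any(overlaps(s, t) for s in sections[user] for t in taken):
                rest.append(user)  # conflicts with this pass, defer the user
            else:
                group.append(user)
                taken.extend(sections[user])
        passes.append(group)
        remaining = rest
    return passes


sections = {
    "A": [(0.0, 5.0), (20.0, 25.0)],
    "B": [(3.0, 8.0)],
    "C": [(10.0, 15.0)],
}
print(group_users(sections))
```

Each pass after the first would be played back while the video is played again from an earlier point.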

In a video playback method for playing back audio groups of a plurality of users viewing the same video together with the video,
A playback start step for reading video and starting playback;
A first audio reproduction step of referring, for the not-yet-played user audio among the viewer audio files provided for each user, to a user evaluation information file storing an evaluation value predetermined for each user, and reading the audio of the user having the highest evaluation value from that user's viewer audio file and playing it back;
An utterance candidate search step of acquiring other users' utterances that fit within the time interval until the next utterance of the previously played user, by searching all the audio section information files, which are provided for each user and store audio sections each consisting of a start time and an end time of an utterance made during the video;
A second audio reproduction step of reading, from among the users' voices acquired in the utterance candidate search step, the voice of the user having the next highest evaluation value from that user's viewer audio file and playing it back; and
A step of repeating the utterance candidate search step and the second audio reproduction step until the audio of all users has been played back;
A video playback method characterized by performing the above steps.
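One way to read the utterance candidate search is as filling the gap before the previously played user's next utterance with other users' utterances, taken in descending order of evaluation value. The sketch below follows that reading; it is an interpretation, not the claimed procedure, and the name `fill_gap` and the `(user, start, end)` tuple layout are hypothetical.

```python
def fill_gap(
    gap: tuple[float, float],
    candidates: list[tuple[str, float, float]],
    evaluation: dict[str, int],
) -> list[tuple[str, float, float]]:
    """Choose utterances of other users that fit inside `gap`.

    `gap` is the interval until the next utterance of the previously played
    user; each candidate is (user, start, end).
    """
    gap_start, gap_end = gap
    # Utterance candidate search: keep only utterances that fit in the gap.
    fitting = [c for c in candidates if c[1] >= gap_start and c[2] <= gap_end]
    # Prefer users with higher evaluation values.
    fitting.sort(key=lambda c: evaluation[c[0]], reverse=True)
    chosen = []
    for user, start, end in fitting:
        # Accept only if it does not overlap anything already chosen.
        if all(end <= s or start >= e for _, s, e in chosen):
            chosen.append((user, start, end))
    return sorted(chosen, key=lambda c: c[1])


evaluation = {"B": 2, "C": 3, "D": 1}  # hypothetical evaluation values
candidates = [("B", 6.0, 10.0), ("C", 8.0, 12.0), ("D", 18.0, 25.0), ("C", 13.0, 16.0)]
print(fill_gap((5.0, 20.0), candidates, evaluation))
```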

The video playback method according to claim 1, wherein, in the audio reproduction step, when the voice of the user having the highest evaluation value in the audio section is read from the viewer audio file and played back, overlapping playback of audio sections is allowed within a predetermined time margin T.
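The time margin T can be illustrated with a small overlap test. The sketch assumes that two audio sections are treated as conflicting only when they overlap for longer than T; this reading, and the function name `conflicts`, are assumptions made for illustration.

```python
def conflicts(a: tuple[float, float], b: tuple[float, float], margin_t: float) -> bool:
    """True if sections `a` and `b` overlap for longer than the margin T.

    Each section is (start time, end time) in seconds. Overlaps of at most
    `margin_t` seconds are tolerated and both sections may be played back.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])  # negative when disjoint
    return overlap > margin_t


print(conflicts((0.0, 5.0), (4.5, 9.0), margin_t=1.0))  # 0.5 s overlap, tolerated
print(conflicts((0.0, 5.0), (3.0, 9.0), margin_t=1.0))  # 2.0 s overlap, conflict
```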

In a video playback method for playing back audio groups of a plurality of users viewing the same video together with the video,
A playback start step for reading video and starting playback;
An audio presence determination step of referring to an audio section information file, which is provided for each user and stores audio sections each consisting of a start time and an end time of an utterance made during the video, and determining, based on whether the start time and the end time are set, whether audio is stored in a viewer audio file provided for each user;
A non-language determination step of determining, when audio is stored in the viewer audio file, whether or not the audio is non-language, and playing back the audio if it is non-language;
An audio reproduction step of, for a first voice determined in the non-language determination step not to be non-language, shortening and playing back the first voice when the difference between the start time of the first voice and the start time of a second voice, uttered next by another user, is shorter than the section length of the first voice; and
A step of repeating the audio presence determination step, the non-language determination step, and the audio reproduction step for all the audio sections stored in the audio section information file;
A video playback method characterized by performing the above steps.

A video playback apparatus that plays back, together with the video, a group of voices of a plurality of users who are viewing the same video,
A video file containing the video,
A plurality of audio section information files that are provided for each user and store an audio section including a start time and an end time of utterance for the video;
A plurality of viewer audio files storing the user's audio provided for each user;
A user evaluation information file storing evaluation values predetermined for each user;
Playback means for playing back video and audio;
Playback control means for controlling the playback of audio,
The playback means includes
Video playback start means for reading video from the video file and starting playback; and
Means for playing back audio in synchronization with the video based on an instruction from the playback control means,
The playback control means includes
Audio presence determination means for referring to the audio section information file and determining whether audio is stored in the viewer audio file based on whether the start time and the end time are set;
Non-language determination means for determining whether or not the audio is non-language when audio is stored in the viewer audio file;
Audio reproduction control means for, when the non-language determination means determines that the audio is not non-language, referring, based on the user ID, to the user evaluation information file for the users' voices corresponding to the audio section stored in the audio section information file and causing the playback means to play back the voice of the user having the highest evaluation value, and for, when the non-language determination means determines that the audio is non-language, causing the playback means to play back the non-language audio; and
Repeating means for repeating the audio presence determination means, the non-language determination means, and the audio reproduction control means for the audio section following the one played back by the playback means, until the video of the video file ends,
A video playback apparatus characterized by comprising the above means.

A video playback apparatus that plays back, together with the video, a group of voices of a plurality of users who are viewing the same video,
A video file containing the video,
A plurality of audio section information files that are provided for each user and store an audio section including a start time and an end time of utterance for the video;
A plurality of viewer audio files storing the user's audio provided for each user;
A user evaluation information file storing evaluation values predetermined for each user;
Playback means for playing back video and audio;
Playback control means for controlling the playback of audio,
The playback means includes
Video playback start means for reading video from the video file and starting playback; and
Means for playing back audio in synchronization with the video based on an instruction from the playback control means,
The playback control means includes
Audio presence determination means for referring to the audio section information file and determining whether audio is stored in the viewer audio file based on whether the start time and the end time are set;
Non-language determination means for determining whether or not the audio is non-language when audio is stored in the viewer audio file;
Audio reproduction control means for, when the non-language determination means determines that the audio is not non-language, referring to the user evaluation information file for the users' voices corresponding to the audio section stored in the audio section information file, determining, based on the evaluation value, the volume at which the playback means plays back the audio, and causing the playback means to play back each user's voice at that volume, and for, when the non-language determination means determines that the audio is non-language, causing the playback means to play back the non-language audio; and
Repeating means for repeating the audio presence determination means, the non-language determination means, the volume control means, and the audio reproduction control means for the audio section following the one played back, until the video of the video file ends,
A video playback apparatus characterized by comprising the above means.

A video playback apparatus that plays back, together with the video, a group of voices of a plurality of users who are viewing the same video,
A video file containing the video,
A plurality of audio section information files that are provided for each user and store an audio section including a start time and an end time of utterance for the video;
A plurality of viewer audio files storing the user's audio provided for each user;
Playback means for playing back video and audio;
Playback control means for controlling the playback of audio,
The playback means includes
Video playback start means for reading video from the video file and starting playback; and
Means for playing back audio in synchronization with the video based on an instruction from the playback control means,
The playback control means includes
Audio presence determination means for referring to the audio section information file and determining whether audio is stored in the viewer audio file based on whether the start time and the end time are set;
Non-language determination means for determining whether or not the audio is non-language when audio is stored in the viewer audio file;
First user group search means for acquiring, by searching all the audio section information files, a group of users whose utterances other than non-language do not overlap;
First audio reproduction control means for reading, based on the group of users acquired by the first user group search means, the audio of all users belonging to the group from the viewer audio files and causing the playback means to play it back;
Second user group search means for acquiring, by searching the audio section information files of the users not selected by the first user group search means, a group of users among them whose utterances do not overlap;
Second audio reproduction control means for reading the audio of all users belonging to the group found by the second user group search means from the viewer audio files and causing the playback means to play it back by going back in the video; and
Repeating means for repeating the second user group search means and the second audio reproduction control means until playback of the audio of all users is completed,
A video playback apparatus characterized by comprising the above means.

A video playback apparatus that plays back, together with the video, a group of voices of a plurality of users who are viewing the same video,
A video file containing the video,
A plurality of audio section information files that are provided for each user and store an audio section including a start time and an end time of utterance for the video;
A plurality of viewer audio files storing the user's audio provided for each user;
A user evaluation information file storing evaluation values predetermined for each user;
Playback means for playing back video and audio;
Audio reproduction control means for controlling audio reproduction,
The playback means includes
Video playback start means for reading video from the video file and starting playback; and
Means for playing back audio in synchronization with the video based on an instruction from the audio reproduction control means,
The audio reproduction control means includes
First audio reproduction control means for referring to the user evaluation information file for the not-yet-played user audio among the viewer audio files, reading the voice of the user having the highest evaluation value from the viewer audio file, and causing the playback means to play it back;
Utterance candidate search means for acquiring, by searching all the audio section information files, other users' utterances that fit within the time interval until the next utterance of the previously played user;
Second audio reproduction control means for referring to the user evaluation information file and, from among the users' voices acquired by the utterance candidate search means, reading the voice of the user having the next highest evaluation value after the previously played user from that user's viewer audio file and causing the playback means to play it back; and
Repeating means for repeating the utterance candidate search means and the second audio reproduction control means until the audio of all users has been played back,
A video playback apparatus characterized by comprising the above means.

The video playback apparatus according to claim 7, wherein the audio reproduction control means further includes means for allowing, when the voice of the user having the highest evaluation value in the audio section is read from the viewer audio file and played back by the playback means, overlapping playback of audio sections within a predetermined time margin T.

A video playback apparatus that plays back, together with the video, a group of voices of a plurality of users who are viewing the same video,
A video file containing the video,
A plurality of audio section information files that are provided for each user and store an audio section including a start time and an end time of utterance for the video;
A plurality of viewer audio files storing the user's audio provided for each user;
Playback means for playing back video and audio;
Audio reproduction control means for controlling audio reproduction,
The playback means includes
Video playback start means for reading video from the video file and starting playback; and
Means for playing back audio in synchronization with the video based on an instruction from the audio reproduction control means,
The audio reproduction control means includes
Audio presence determination means for referring to the audio section information file and determining whether audio is stored in the viewer audio file based on whether the start time and the end time are set;
Non-language determination means for determining whether or not the audio is non-language when audio is stored in the viewer audio file;
Voice playback control means for causing the playback means to play back audio that the non-language determination means has determined to be non-language, and, for a first voice determined not to be non-language, shortening the first voice and causing the playback means to play it back when the difference between the start time of the first voice and the start time of a second voice, uttered next by another user, is shorter than the section length of the first voice; and
Repeating means for repeating the audio presence determination means, the non-language determination means, and the voice playback control means for all the audio sections stored in the audio section information file,
A video playback apparatus characterized by comprising the above means.

A video playback program for playing back, together with the video, a group of voices of a plurality of users who are viewing the same video,
the video playback program causing a computer to execute processing for realizing the video playback method according to claim 1.