WO2024053094A1 - Media information emphasis playback device, media information emphasis playback method, and media information emphasis playback program - Google Patents

Media information emphasis playback device, media information emphasis playback method, and media information emphasis playback program

Info

Publication number
WO2024053094A1
Authority
WO
WIPO (PCT)
Prior art keywords
media information
playback
emphasis
user
unit
Prior art date
Application number
PCT/JP2022/033902
Other languages
French (fr)
Japanese (ja)
Inventor
麻衣子 井元
真二 深津
淳一 中嶋
馨亮 長谷川
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to PCT/JP2022/033902
Publication of WO2024053094A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk

Definitions

  • The present invention relates to a media information emphasis playback device, a media information emphasis playback method, and a media information emphasis playback program.
  • So far, for two-way communication between a small number of users in remote locations, a system has been proposed in which the remote users view a live distribution while sharing the excitement with each other [Non-Patent Document 1]. Although such a system may increase the feeling of participation and excitement among the remote users, it does not provide a sense of unity with the event venue.
  • The present invention has been made in view of the above circumstances, and its object is to provide a media information emphasis playback device, a media information emphasis playback method, and a media information emphasis playback program that allow a user in a remote location to feel a high sense of participation in the event venue.
  • the media information emphasis playback device includes a media information reception section, a user state acquisition section, an emotion estimation section, and a media information emphasis playback section.
  • the media information receiving unit receives media information including video and audio.
  • the user status acquisition unit acquires status information indicating the viewing status of the user.
  • the emotion estimator estimates the user's emotion during viewing based on the status information input from the user status acquisition unit.
  • the media information emphasis reproduction section emphatically reproduces the media information input from the media information reception section based on the estimation result input from the emotion estimation section.
  • The media information emphasis playback method includes the steps of: receiving media information including video and audio; acquiring status information indicating the user's state during viewing; estimating the user's emotion during viewing based on the status information; and emphasizing and reproducing the media information based on the estimation result of the user's emotion during viewing.
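As a rough illustration of how the four units described above could interact, the following is a minimal Python sketch. All class names, function names, thresholds, and gain values here are hypothetical assumptions for illustration only, not taken from the publication:

```python
from dataclasses import dataclass

# Hypothetical container for received media information (video plus audio).
@dataclass
class MediaInfo:
    video: object
    audio_volume: float  # playback gain; 1.0 means unchanged


def receive_media() -> MediaInfo:
    """Stands in for the media information reception unit (34)."""
    return MediaInfo(video=None, audio_volume=1.0)


def acquire_user_state() -> dict:
    """Stands in for the user state acquisition unit (31): camera video,
    microphone audio, and biological information, reduced to toy features."""
    return {"smiling": True, "heart_rate": 95}


def estimate_emotion(state: dict) -> str:
    """Stands in for the emotion estimation unit (32); returns one of the
    three emotions named in the text: 'positive', 'neutral', 'negative'.
    The thresholds are invented for illustration."""
    if state.get("smiling") and state.get("heart_rate", 0) > 90:
        return "positive"
    if state.get("heart_rate", 0) < 60:
        return "negative"
    return "neutral"


def emphasize(media: MediaInfo, emotion: str) -> MediaInfo:
    """Stands in for the media information emphasis playback unit (33)."""
    if emotion == "positive":
        media.audio_volume *= 1.2  # play back louder
    elif emotion == "negative":
        media.audio_volume *= 0.8  # play back softer
    return media  # 'neutral' leaves the media information unchanged


media = emphasize(receive_media(), estimate_emotion(acquire_user_state()))
print(media.audio_volume)  # 1.2 for the sample state above
```

In the actual device the same flow would operate on live media streams and real sensor input rather than on these placeholder values.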
  • the media information emphasis playback program causes a computer having a processor and a storage device to execute the functions of the media information reception section, user state acquisition section, emotion estimation section, and media information emphasis playback section of the above-mentioned media information emphasis playback device.
  • According to the present invention, there are provided a media information emphasis playback device, a media information emphasis playback method, and a media information emphasis playback program that allow a user in a remote location to feel a high sense of participation in an event venue.
  • FIG. 1 is a block diagram of a media information transmitting and receiving system including a media information emphasizing playback device according to an embodiment.
  • FIG. 2 is a block diagram showing the hardware configuration of the media information emphasizing playback device according to the embodiment.
  • FIG. 3 is a flowchart showing the flow of processing executed by the media information emphasizing playback device according to the embodiment.
  • FIG. 1 is a block diagram of a media information transmitting and receiving system including a media information emphasizing playback device according to an embodiment.
  • In FIG. 1, only one of the N bases Rn is illustrated.
  • the configuration of each base Rn is similar.
  • Base O is an event venue where an event will be held.
  • Media information including video and audio of the event is distributed from base O (event venue) via the IP network 70.
  • the base Rn is a remote location that receives and views media information distributed from the base O (event venue) via the IP network 70.
  • the remote location is the home of the user viewing the media information.
  • the base O (event venue) is provided with a server 10, a video shooting device 21, an event audio recording device 22, and an audience audio recording device 23.
  • the server 10 includes a media information generation section 11 and a media information transmission section 12.
  • the event held at the event venue may be, for example, a music concert, a play, a sports competition, etc.
  • the video photographing device 21 includes a camera and its related equipment, and photographs the event.
  • the video shooting device 21 outputs the shot video of the event to the media information generation unit 11 of the server 10.
  • the event audio recording device 22 includes a microphone and its related equipment, and records audio etc. generated by implementing the event.
  • sounds generated by implementing an event will be simply referred to as event sounds.
  • the event sounds include voices uttered by performers, sounds made by the performers, sound effects, and the like.
  • the event sounds include voices uttered by competitors, sounds produced by the competitors, sounds made for the progress of the competition, and the like.
  • the event audio recording device 22 outputs the recorded event audio to the media information generation unit 11 of the server 10.
  • the audience audio recording device 23 includes a microphone and its related equipment, and records audio etc. generated by the audience at the event.
  • the sounds generated by the audience at the event will be simply referred to as audience sounds.
  • the audience sounds include cheers emitted by the audience, sounds made by the audience making noises, and the like.
  • the audience audio recording device 23 outputs the recorded audience audio to the media information generation unit 11 of the server 10.
  • The media information generation unit 11 generates media information including video and audio (event audio and audience audio) based on the event video input from the video shooting device 21, the event audio input from the event audio recording device 22, and the audience audio input from the audience audio recording device 23. The media information is the information distributed to the bases Rn via the IP network 70. The media information generation unit 11 outputs the generated media information to the media information transmission unit 12.
  • the media information generation unit 11 may separate event audio, audience audio, or both using a known audio analysis technique.
  • the media information generation unit 11 may separate event audio into voices and background sounds.
  • An example of such an audio analysis technique is disclosed in "Masashi Nishiyama, Makoto Hirohata, Toshiyuki Ono. Sound source separation for volume balance adjustment between voice and background sound. Information Processing Society of Japan Research Report. Vol. 2013-CVIM-187 No. 46."
  • the media information generation unit 11 may separate the event audio for each sound source.
  • An example of such an audio analysis technique is disclosed in "Mizuki Kobayashi, Hiroshi Tezuka, Mari Inaba. Proposal of instrument sound separation method using musical scores. Entertainment Computing Symposium (EC2015). September 2015."
  • Instead of the media information generation unit 11 separating the event audio and the audience audio, the event audio recording device 22 may separate the event audio, and the audience audio recording device 23 may separate the audience audio. That is, the audio may be separated at the base O.
  • the media information transmitter 12 transmits the media information input from the media information generator 11 to the IP network 70.
  • At the base O (event venue), instead of providing the event audio recording device 22 for recording event audio and the audience audio recording device 23 for recording audience audio, one audio recording device may be provided to record a mixture of event audio and audience audio.
  • the base Rn (remote location) is provided with a media information emphasis playback device 30, a camera 41, a microphone 42, a biological information measurement device 43, and a playback information output device 44.
  • the reproduction information output device 44 has a display and a speaker, and outputs video and audio based on the reproduction information input from the media information emphasis reproduction device 30. By viewing the video and audio output from the playback information output device 44, the user views the event being held at the base O (event venue). In the following, it is assumed that the user is viewing an event through the reproduction information output device 44.
  • the media information emphasis playback device 30 is a user terminal, receives media information distributed from the base O (event venue), and outputs playback information to the playback information output device 44.
  • the media information emphasis playback device 30 includes a user state acquisition section 31 , an emotion estimation section 32 , a media information emphasis playback section 33 , and a media information reception section 34 .
  • the media information receiving unit 34 receives media information transmitted from the server 10 at the base O (event venue) via the IP network 70 and outputs it to the media information emphasis reproduction unit 33.
  • the camera 41 is installed by the user himself to photograph the user.
  • the camera 41 photographs the user and outputs the video information to the user status acquisition section 31.
  • the microphone 42 is installed by the user himself so as to pick up the user's voice.
  • the microphone 42 picks up the user's voice and background sound, and outputs the voice information to the user status acquisition unit 31.
  • The biological information measuring device 43 measures the user's biological information. The biological information includes brain function and heart rate. The electrodes and sensors included in the biological information measuring device 43 are therefore attached to the user by the user himself/herself. The biological information measuring device 43 outputs the measured biological information to the user state acquisition unit 31.
  • The user status acquisition unit 31 acquires, as status information indicating the user's status, the video information input from the camera 41, the audio information input from the microphone 42, and the biological information input from the biological information measuring device 43.
  • the user state acquisition unit 31 outputs the acquired state information to the emotion estimation unit 32.
  • The camera 41, the microphone 42, and the biological information measuring device 43 are installed at the base Rn as devices capable of acquiring the user's status; however, not all of them need be provided. It is sufficient that at least one device capable of acquiring the user's status is provided.
  • the emotion estimating unit 32 estimates the user's emotion during viewing based on the status information input from the user status acquisition unit 31. For example, the emotion estimation unit 32 uses a known emotion estimation technique to estimate whether the user's emotion is one of three emotions: "positive,” “neutral,” and “negative.”
  • An example of such an emotion estimation technique is "Atsushi Okada, Joeji Uemura, Kazuya Mera, Yoshiaki Kurosawa, Toshiyuki Takezawa. Real-time emotion estimation system from facial expressions, acoustic information, and text information. The 31st Annual Conference of the Japanese Society for Artificial Intelligence, 2017."
  • the emotion estimation section 32 outputs the estimation result to the media information emphasis reproduction section 33.
  • the media information emphasizing reproduction section 33 emphatically reproduces the media information input from the media information receiving section 34 based on the estimation result input from the emotion estimation section 32.
  • emphasizing and reproducing media information based on the estimation result means reproducing the media information by changing the media information according to the estimation result. Therefore, depending on the estimation result, emphasizing and reproducing media information based on the estimation result includes reproducing the media information as it is without changing the media information.
  • emphasizing reproduction of media information based on the estimation result will also be simply referred to as emphasizing reproduction.
  • changing the media information includes changing the volume of the audio of the media information, processing the video of the media information, or both. Furthermore, changing the media information also includes changing the media information once and then changing it again to return to the original media information, resulting in no change to the media information.
  • Each time an estimation result is input from the emotion estimation unit 32, the media information emphasis playback unit 33 temporarily stores the estimation result and compares it with the previously input estimation result to determine whether the estimation result has changed. After the determination, the media information emphasis playback unit 33 updates the temporarily stored estimation result. If, as a result of the determination, the estimation result has changed, the media information emphasis playback unit 33 changes the emphasis playback of the media information.
  • changing the emphasis reproduction of media information means changing the change given to the media information.
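The comparison logic described above (store each estimation result, compare it with the previous one, and change the emphasis playback only when they differ) can be sketched as follows; the class and method names are hypothetical, not from the publication:

```python
class EmphasisController:
    """Hypothetical controller mirroring the comparison step of the media
    information emphasis playback unit 33: each new estimation result is
    compared with the temporarily stored previous result."""

    def __init__(self):
        self.previous = None  # temporarily stored estimation result

    def on_estimation(self, result: str) -> bool:
        """Return True when the emphasis playback should be changed,
        i.e. when the new result differs from the previous one."""
        changed = self.previous is not None and result != self.previous
        self.previous = result  # update the stored result after the determination
        return changed


ctrl = EmphasisController()
print(ctrl.on_estimation("neutral"))   # False: first result, nothing to compare
print(ctrl.on_estimation("positive"))  # True: neutral -> positive
print(ctrl.on_estimation("positive"))  # False: no change
```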
  • the estimation result of the emotion estimation unit 32 is one of the three emotions "positive”, “neutral”, and "negative".
  • When the estimation result changes to "positive", for example, the media information emphasis playback unit 33 plays back the media information at a higher audio volume than before.
  • the media information emphasis playback section 33 may add an AR (augmented reality) effect to the video to create a sense of excitement and play back the media information.
  • AR effects include confetti and lighting.
  • When the estimation result changes to "negative", for example, the media information emphasis playback unit 33 plays back the media information at a lower audio volume than before. Furthermore, the media information emphasis playback unit 33 may add an AR effect that raises the user's mood to the video and play back the media information.
  • the audio to be changed may be either event audio or audience audio, or both.
  • the voice to be changed may be any of the separated voices or all of the separated voices.
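One way to realize changing only some of the separated audio, as described above, is to apply an independent gain to each separated track. The sketch below is an illustration only; the track names and gain values are assumptions:

```python
def apply_gains(tracks: dict, gains: dict) -> dict:
    """Apply an independent gain to each separated audio track.
    Tracks are plain lists of samples; unnamed tracks keep a gain of 1.0,
    i.e. they are reproduced unchanged."""
    return {name: [s * gains.get(name, 1.0) for s in samples]
            for name, samples in tracks.items()}


tracks = {"event": [0.5, -0.5], "audience": [0.25, -0.25]}
boosted = apply_gains(tracks, {"audience": 2.0})  # emphasize only the audience audio
print(boosted["event"])     # [0.5, -0.5] (unchanged)
print(boosted["audience"])  # [0.5, -0.5] (doubled)
```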
  • When there is no change in the estimation result, the media information emphasis playback unit 33 continues to reproduce the media information as before.
  • the media information emphasis playback unit 33 outputs playback information that emphasizes playback of media information based on the estimation result to the playback information output device 44.
  • the reproduction information output device 44 outputs video and audio based on the reproduction information input from the media information emphasis reproduction device 30.
  • the media information emphasizing playback device 30 is configured with a personal computer, a server computer, or the like.
  • FIG. 2 is a block diagram showing the hardware configuration of the media information emphasis playback device 30 according to the embodiment.
  • The media information emphasis playback device 30 includes a processor 51, a ROM (Read Only Memory) 52, a RAM (Random Access Memory) 53, an auxiliary storage device 54, an input/output interface 55, and a communication interface 56.
  • the processor 51, ROM 52, RAM 53, auxiliary storage device 54, input/output interface 55, and communication interface 56 are electrically connected to each other via a bus 57, and exchange data via the bus 57.
  • The processor 51 is configured with a general-purpose hardware processor including, for example, a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit).
  • the processor 51 controls the ROM 52, RAM 53, auxiliary storage device 54, input/output interface 55, and communication interface 56 as a whole.
  • the ROM 52 is a nonvolatile memory that forms part of the main storage device.
  • the ROM 52 non-temporarily stores a startup program necessary for starting the processor 51.
  • the processor 51 is activated by executing a program in the ROM 52.
  • the ROM 52 is composed of, for example, an EPROM (Erasable Programmable Read Only Memory), and stores various startup settings in addition to the startup program.
  • the RAM 53 is a volatile memory that forms part of the main storage device.
  • the RAM 53 temporarily stores programs necessary for processing by the processor 51 and data necessary for executing the programs.
  • the processor 51 calculates data in the RAM 53 by executing a program in the RAM 53, and stores the calculation results in the RAM 53.
  • the auxiliary storage device 54 is composed of nonvolatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • the auxiliary storage device 54 non-temporarily stores programs executed by the processor 51 and data necessary for executing the programs.
  • the processor 51 reads programs and data in the auxiliary storage device 54 into the RAM 53, and executes various functions by executing the programs.
  • the input/output interface 55 is connected to an external input device 61 , output device 62 , etc., and enables input of information from the input device 61 and output of information to the output device 62 .
  • the input/output interface 55 may be a wired interface or a wireless interface.
  • the wired interface includes a port to which a device is connected.
  • Wireless interfaces include Bluetooth (registered trademark), WiFi (registered trademark), and the like.
  • the input device 61 includes a camera 41, a microphone 42, and a biological information measuring device 43.
  • Input device 61 may further include a keyboard, mouse, touch panel, receiving device, disk drive, and the like.
  • the input device 61 is not limited to this, and may include any other input equipment.
  • Output device 62 includes playback information output device 44 .
  • Output devices 62 may further include displays, transmitters, disk drives, and the like.
  • the output device 62 is not limited to this, and may include any other output equipment.
  • the input device 61 and the output device 62 may be configured with an input/output device 63 having both functions.
  • the program non-temporarily stored in the auxiliary storage device 54 is provided to the computer via, for example, a computer-readable recording medium 64 on which the program is non-temporarily recorded.
  • Such a computer-readable recording medium 64 is also referred to as a non-transitory computer-readable storage medium.
  • Non-transitory computer-readable recording media include disks such as flexible disks, optical disks (CD-ROM, CD-R, DVD-ROM, DVD-R, etc.), magneto-optical disks (MO, etc.), and semiconductor memories.
  • the programs non-temporarily stored in the auxiliary storage device 54 include a media information emphasis playback program.
  • The media information emphasis playback program is a program that causes the computer constituting the media information emphasis playback device 30 to implement the functions of the user state acquisition unit 31, the emotion estimation unit 32, the media information emphasis playback unit 33, and the media information reception unit 34.
  • When the recording medium 64 is a disk, the program is read into the auxiliary storage device 54 via the disk drive serving as the input device 61 and the input/output interface 55; when the recording medium 64 is a semiconductor memory, it is read in via the port serving as the input/output interface 55, and is stored non-temporarily. Alternatively, the program may be stored on a server on the network, downloaded from the server, and stored non-temporarily in the auxiliary storage device 54.
  • the communication interface 56 enables communication of information to and from the IP network 70. That is, the communication interface 56 makes it possible to receive media information distributed from the base O (event venue).
  • the processor 51 executes the program in the ROM 52, loads the OS into the RAM 53, and starts it.
  • the processor 51 monitors input of instructions, connection of external devices, etc. under the control of the OS. Further, the processor 51 sets a program area and a data area in the RAM 53 under the control of the OS.
  • The processor 51 reads the media information emphasis playback program from the auxiliary storage device 54 into the program area of the RAM 53, and reads the data necessary for executing the program from the auxiliary storage device 54 into the data area of the RAM 53.
  • The processor 51 calculates data in the data area according to the media information emphasis playback program and writes the calculation results into the data area. Through such operations, the processor 51, the RAM 53, the auxiliary storage device 54, the input/output interface 55, and the communication interface 56 work together to implement the functions of the user status acquisition unit 31, the emotion estimation unit 32, the media information emphasis playback unit 33, and the media information reception unit 34 of the media information emphasis playback device 30.
  • FIG. 3 is a flowchart showing the flow of emphasized playback processing executed by the media information emphasized playback device according to the embodiment.
  • the media information emphasis reproduction section 33 always outputs reproduction information to the reproduction information output device 44.
  • In step S1, the user state acquisition unit 31 acquires, as status information indicating the user's state, the video information input from the camera 41, the audio information input from the microphone 42, and the biological information input from the biological information measuring device 43.
  • In step S2, the emotion estimation unit 32 estimates the user's emotion during viewing based on the state information acquired in step S1.
  • In step S3, the media information emphasis playback unit 33 compares the previous estimation result with the current estimation result obtained in step S2, and determines whether the estimation result has changed. If there is no change in the estimation result, the process returns to step S1. If there is a change in the estimation result, the process advances to step S4.
  • In step S4, the media information emphasis playback unit 33 changes the emphasis playback of the media information.
  • the media information emphasis reproduction section 33 changes the volume of the audio of the media information according to the estimation result. An example will be explained below.
  • A, B, C, and D are set in advance as coefficients for changing the sound volume.
  • A is a numerical value satisfying 0.8 ≤ A ≤ 1.
  • B is a numerical value satisfying 0.5 ≤ B ≤ 0.8.
  • C is a numerical value satisfying 1 ≤ C ≤ 1.2.
  • D is a numerical value satisfying 1.2 ≤ D ≤ 1.5.
  • the media information emphasis playback unit 33 changes the audio volume to A times the previous volume.
  • the media information emphasis playback unit 33 changes the audio volume to B times the previous volume.
  • the media information emphasis playback unit 33 changes the audio volume to C times the previous volume.
  • the media information emphasis playback unit 33 changes the audio volume to A times the previous volume.
  • the media information emphasis playback unit 33 changes the audio volume to C times the previous volume.
  • the media information emphasis playback unit 33 changes the audio volume to D times the previous volume.
  • the volume level may be changed immediately at the timing when the estimation result changes, or it may be changed linearly so that the volume reaches a predetermined level after a certain period of time (for example, one second).
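The gradual, linear volume change described above could be realized, for example, by interpolating the gain over a fixed duration. The sketch below is one such realization; the function signature, parameter names, and step count are assumptions for illustration:

```python
def volume_ramp(current: float, coeff: float,
                duration_s: float = 1.0, steps: int = 10):
    """Return a list of (time, volume) points that move the volume linearly
    from `current` to `current * coeff` over `duration_s` seconds."""
    target = current * coeff
    dt = duration_s / steps
    return [(i * dt, current + (target - current) * i / steps)
            for i in range(steps + 1)]


# Example: ramp up by coefficient D = 1.3 (within the stated 1.2-1.5 range).
schedule = volume_ramp(1.0, 1.3)
print(len(schedule), schedule[0][1])  # 11 1.0
```

Setting `steps=1` and `duration_s=0.0` would correspond to the immediate volume change also mentioned above.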
  • In step S5, after the emphasis playback of the media information has been changed, the process returns to step S1 while the user continues viewing; when the user finishes viewing, the user ends the operation of the media information emphasis playback device 30.
  • a media information reproduction technique is provided that allows a user in a remote location to feel a high sense of participation in an event venue.
  • Since the distributed video that the user is viewing changes according to the emotional ups and downs of the user viewing the live event from a remote location, the user can feel as if his or her way of viewing the event (cheering, feelings, emotions) is acting on (propagating to) the event venue and other viewers, which can increase the sense of participation (satisfaction) in and unity with the event.
  • Because the emphasis playback is performed by the media information emphasis playback device 30, which is a user terminal, the volume can be adjusted and effects can be added immediately, without communication delay, and the user can feel as if his or her viewing (watching) situation is affecting, in real time, the performers at the event venue, the audience there, or the audience in remote locations.
  • the present invention is not limited to the above-described embodiments, and can be variously modified at the implementation stage without departing from the gist thereof.
  • each embodiment may be implemented in combination as appropriate, and in that case, the combined effect can be obtained.
  • the embodiments described above include various inventions, and various inventions can be extracted by combinations selected from the plurality of constituent features disclosed. For example, if a problem can be solved and an effect can be obtained even if some constituent features are deleted from all the constituent features shown in the embodiment, the configuration from which these constituent features are deleted can be extracted as an invention.

Abstract

This media information emphasis playback device comprises a media information reception unit, a user state acquisition unit, an emotion inference unit, and a media information emphasis playback unit. The media information reception unit receives media information which includes video and audio. The user state acquisition unit acquires state information which indicates a state of a user during viewing. The emotion inference unit infers an emotion of the user during viewing, on the basis of the state information which has been input from the user state acquisition unit. On the basis of an inference result which has been input from the emotion inference unit, the media information emphasis playback unit performs emphasis playback of the media information which has been input from the media information reception unit.

Description

メディア情報強調再生装置、メディア情報強調再生方法、およびメディア情報強調再生プログラムMedia information emphasizing playback device, media information emphasizing playback method, and media information emphasizing playback program
 本発明は、メディア情報強調再生装置、メディア情報強調再生方法、およびメディア情報強調再生プログラムに関する。 The present invention relates to a media information emphasizing reproduction device, a media information emphasizing reproduction method, and a media information emphasizing reproduction program.
 近年、イベント会場で行われているエンタテインメントやスポーツのライブ配信を自宅等で視聴するライブビューイングイベントが増えてきている。 In recent years, there has been an increase in the number of live viewing events where people can watch live streaming of entertainment and sports taking place at event venues from their homes.
 実際のイベント会場では、自分が盛り上がると周囲の観客もそれにつられて盛り上がるなどの感情や情動の共起により一体感や盛り上がり感を得られるが、自宅等の遠隔地で配信映像を視聴するだけでは、そのような相互作用は起こらず、一体感を得られにくい。 At an actual event venue, you can get a sense of unity and excitement through the co-occurrence of feelings and emotions, such as when you get excited, the audience around you gets excited as well, but you can't just watch the distributed video from a remote location such as your home. , such interactions do not occur, and it is difficult to achieve a sense of unity.
 これまでに、遠隔地のユーザ同士(少人数)の双方向コミュニケーションにおいては、遠隔地のユーザ同士が盛り上がりを共有しながらライブ配信を視聴するシステムが提案されている[非特許文献1]。遠隔地のユーザの参加感や盛り上がりを高めることはできても、イベント会場との一体感を得られるわけではない。 So far, in two-way communication between users in remote locations (a small number of people), a system has been proposed in which users in remote locations view live distribution while sharing the excitement with each other [Non-Patent Document 1]. Although it may be possible to increase the feeling of participation and excitement among users in remote locations, it does not provide a sense of unity with the event venue.
 イベント会場と遠隔地との双方向通信により、遠隔地のユーザの感情や情動をイベント会場に共有し、相互作用を引き起こして一体感や盛り上がり感を高めるためには、遠隔地のユーザが、自分のアクション(感情や情動の共有)に対するフィードバックを感じられる(相互作用を認識する)こと、相互作用が即時的であることが必要である。 Through two-way communication between the event venue and the remote location, users at the remote location must be able to It is necessary for the user to be able to feel feedback (recognize the interaction) for their actions (share feelings and emotions), and for the interaction to be immediate.
 しかしながら、遠隔地とイベント会場との間の双方向の通信では通信遅延が生じるため、相互作用を認識するまでに時間がかかってしまい、即時性が失われる。 However, communication delays occur in two-way communication between a remote location and the event venue, so it takes time to recognize the interaction, resulting in a loss of immediacy.
 また、遠隔地に多数のユーザが存在する場合、ユーザが、自分のアクションに対するフィードバックを認識しづらい。 Additionally, when there are many users in remote locations, it is difficult for the users to recognize feedback on their actions.
 The present invention has been made in view of the above circumstances, and an object thereof is to provide a media information emphasis playback device, a media information emphasis playback method, and a media information emphasis playback program that allow a user at a remote location to feel a strong sense of participation in an event venue.
 One aspect of the present invention is a media information emphasis playback device. The media information emphasis playback device includes a media information reception unit, a user state acquisition unit, an emotion estimation unit, and a media information emphasis playback unit. The media information reception unit receives media information including video and audio. The user state acquisition unit acquires state information indicating the user's state during viewing. The emotion estimation unit estimates the user's emotion during viewing based on the state information input from the user state acquisition unit. The media information emphasis playback unit emphatically plays back the media information input from the media information reception unit based on the estimation result input from the emotion estimation unit.
 One aspect of the present invention is a media information emphasis playback method. The media information emphasis playback method includes the steps of: receiving media information including video and audio; acquiring state information indicating the user's state during viewing; estimating the user's emotion during viewing based on the state information; and emphatically playing back the media information based on the result of estimating the user's emotion during viewing.
 One aspect of the present invention is a media information emphasis playback program. The media information emphasis playback program causes a computer having a processor and a storage device to execute the functions of the media information reception unit, the user state acquisition unit, the emotion estimation unit, and the media information emphasis playback unit of the above media information emphasis playback device.
 According to the present invention, a media information emphasis playback device, a media information emphasis playback method, and a media information emphasis playback program are provided that allow a user at a remote location to feel a strong sense of participation in an event venue.
FIG. 1 is a block diagram of a media information transmission/reception system including a media information emphasis playback device according to an embodiment. FIG. 2 is a block diagram showing the hardware configuration of the media information emphasis playback device according to the embodiment. FIG. 3 is a flowchart showing the flow of processing executed by the media information emphasis playback device according to the embodiment.
<Configuration example>
[Functional configuration]
 First, a media information transmission/reception system including the media information emphasis playback device according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram of the media information transmission/reception system including the media information emphasis playback device according to the embodiment.
 The media information transmission/reception system is constructed between one base O and N bases Rn (n = 1, 2, ..., N). FIG. 1 shows only one of the N bases Rn; each base Rn has the same configuration.
 Base O is an event venue where an event is held. From base O (the event venue), media information including video and audio of the event is distributed via an IP network 70. Base Rn is a remote location at which the media information distributed from base O (the event venue) is received via the IP network 70 and viewed. For example, the remote location is the home of a user who views the media information.
(Base O (event venue))
 Base O (the event venue) is provided with a server 10, a video capture device 21, an event audio recording device 22, and an audience audio recording device 23. The server 10 includes a media information generation unit 11 and a media information transmission unit 12.
 The event held at the event venue may be of any kind, for example a music concert, a play, or a sports competition.
 The video capture device 21 includes a camera and related equipment and captures video of the event. The video capture device 21 outputs the captured event video to the media information generation unit 11 of the server 10.
 The event audio recording device 22 includes a microphone and related equipment and records the audio generated by holding the event. Hereinafter, for convenience, the audio generated by holding the event is simply referred to as event audio. For example, when the event is a music concert, a play, a show, or the like, the event audio includes voices uttered by the performers, sounds produced by the performers, sound effects, and the like. When the event is a sports competition or the like, the event audio includes voices uttered by the competitors, sounds produced by the competitors, sounds played to advance the competition, and the like. The event audio recording device 22 outputs the recorded event audio to the media information generation unit 11 of the server 10.
 The audience audio recording device 23 includes a microphone and related equipment and records the audio produced by the audience at the event. Hereinafter, for convenience, the audio produced by the audience at the event is simply referred to as audience audio. For example, when the event is a sports competition or the like, the audience audio includes cheers from the audience, sounds the audience makes with noisemakers, and the like. The audience audio recording device 23 outputs the recorded audience audio to the media information generation unit 11 of the server 10.
 The media information generation unit 11 generates media information including video and audio (event audio and audience audio) based on the event video input from the video capture device 21, the event audio input from the event audio recording device 22, and the audience audio input from the audience audio recording device 23. The media information is the information distributed to the bases Rn via the IP network 70. The media information generation unit 11 outputs the generated media information to the media information transmission unit 12.
 When generating the media information, the media information generation unit 11 may separate the event audio, the audience audio, or both using a known audio analysis technique. For example, the media information generation unit 11 may separate the event audio into voice and background sound. An example of such a technique is disclosed in "Masashi Nishiyama, Makoto Hirohata, Toshiyuki Ono. Sound source separation for volume balance adjustment between voice and background sound. IPSJ SIG Technical Report, Vol. 2013-CVIM-187, No. 46." The media information generation unit 11 may also separate the event audio by sound source. An example of such a technique is disclosed in "Mizuki Kobayashi, Hiroshi Tezuka, Mari Inaba. Proposal of an instrument sound separation method using musical scores. Entertainment Computing Symposium (EC2015), September 2015."
 Although an example has been described here in which the media information generation unit 11 separates the event audio and the audience audio, the event audio recording device 22 may instead separate the event audio and the audience audio recording device 23 may separate the audience audio. Furthermore, although an example has been described in which the audio separation is performed on the base O side, the audio separation may instead be performed at the base Rn.
 The media information transmission unit 12 transmits the media information input from the media information generation unit 11 to the IP network 70.
 Although an example has been described here in which base O (the event venue) is provided with the event audio recording device 22 for recording event audio and the audience audio recording device 23 for recording audience audio, a single audio recording device may be provided in place of these two devices to record a mixture of the event audio and the audience audio.
(Base Rn (remote location))
 At base Rn (a remote location), there is a user who receives the media information distributed from base O (the event venue) via the IP network 70 and remotely views the event being held at base O (the event venue). Hereinafter, the user at base Rn (the remote location) is simply referred to as the user.
 Base Rn (the remote location) is provided with a media information emphasis playback device 30, a camera 41, a microphone 42, a biological information measurement device 43, and a playback information output device 44.
 The playback information output device 44 has a display and a speaker, and outputs video and audio based on the playback information input from the media information emphasis playback device 30. By viewing the video and audio output from the playback information output device 44, the user views the event being held at base O (the event venue). In the following, it is assumed that the user views the event through the playback information output device 44.
 The media information emphasis playback device 30 is a user terminal; it receives the media information distributed from base O (the event venue) and outputs playback information to the playback information output device 44.
 The media information emphasis playback device 30 includes a user state acquisition unit 31, an emotion estimation unit 32, a media information emphasis playback unit 33, and a media information reception unit 34.
 The media information reception unit 34 receives the media information transmitted from the server 10 at base O (the event venue) via the IP network 70 and outputs it to the media information emphasis playback unit 33.
 The camera 41 is installed by the user so as to capture the user. The camera 41 captures video of the user and outputs the video information to the user state acquisition unit 31.
 The microphone 42 is installed by the user so as to pick up the user's voice. The microphone 42 picks up the user's voice and the background sound, and outputs the audio information to the user state acquisition unit 31.
 The biological information measurement device 43 measures the user's biological information, such as brain waves and heart rate. For this purpose, the electrodes and sensors included in the biological information measurement device 43 are attached to the user by the user. The biological information measurement device 43 outputs the measured biological information to the user state acquisition unit 31.
 The user state acquisition unit 31 acquires the video information input from the camera 41, the audio information input from the microphone 42, and the biological information input from the biological information measurement device 43 as state information indicating the user's state. The user state acquisition unit 31 outputs the acquired state information to the emotion estimation unit 32.
 Although an example has been described here in which the camera 41, the microphone 42, and the biological information measurement device 43 are provided at base Rn as devices capable of acquiring the user's state, not all of these devices need be provided; it suffices that at least one device capable of acquiring the user's state is provided.
 The emotion estimation unit 32 estimates the user's emotion during viewing based on the state information input from the user state acquisition unit 31. For example, the emotion estimation unit 32 uses a known emotion estimation technique to estimate which of three emotions, "positive", "neutral", or "negative", the user's emotion corresponds to. An example of such a technique is disclosed in "Atsushi Okada, Joji Uemura, Kazuya Mera, Yoshiaki Kurosawa, Toshiyuki Takezawa. Real-time emotion estimation system from facial expressions, acoustic information, and text information. The 31st Annual Conference of the Japanese Society for Artificial Intelligence, 2017." The emotion estimation unit 32 outputs the estimation result to the media information emphasis playback unit 33.
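 The three-class estimation described above can be illustrated with a minimal rule-based sketch. A real estimator would combine facial-expression, acoustic, and biosignal models as in the cited work; the feature names (`voice_level`, `heart_rate_bpm`) and thresholds below are purely illustrative assumptions, not part of the specification.

```python
def estimate_emotion(voice_level: float, heart_rate_bpm: float) -> str:
    """Map simple state features to 'positive', 'neutral', or 'negative'.

    voice_level is a normalized loudness in [0, 1]. The thresholds are
    illustrative assumptions only.
    """
    if voice_level > 0.7 and heart_rate_bpm > 90:
        return "positive"   # cheering loudly with an elevated heart rate
    if voice_level < 0.2 and heart_rate_bpm < 70:
        return "negative"   # quiet and calm: likely disengaged
    return "neutral"
```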
 The media information emphasis playback unit 33 emphatically plays back the media information input from the media information reception unit 34, based on the estimation result input from the emotion estimation unit 32.
 Here, emphatically playing back the media information based on the estimation result means playing back the media information while applying changes to it according to the estimation result. Accordingly, depending on the estimation result, emphatically playing back the media information also includes playing back the media information as-is without applying any change. Hereinafter, emphatically playing back the media information based on the estimation result is also simply referred to as emphasis playback.
 Applying a change to the media information includes changing the volume of the audio of the media information, processing the video of the media information, or both. It also includes the case where a change is applied once and a later change restores the original media information, with the net result that the media information is unchanged.
 Each time an estimation result is input from the emotion estimation unit 32, the media information emphasis playback unit 33 temporarily stores it and compares it with the previously input estimation result to determine whether the estimation result has changed. After the determination, the media information emphasis playback unit 33 updates the temporarily stored estimation result. If the determination finds that the estimation result has changed, the media information emphasis playback unit 33 changes the emphasis playback of the media information. Here, changing the emphasis playback of the media information means changing the change applied to the media information.
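 The compare-store-update cycle described above can be sketched as a small stateful helper. This is a hypothetical class for illustration, not part of the specification:

```python
class EstimateChangeDetector:
    """Temporarily stores the last estimation result and reports changes."""

    def __init__(self):
        self._last = None  # previously input estimation result

    def update(self, estimate):
        """Compare the new estimate with the stored one, then store it.

        Returns True when the estimate differs from the previous one
        (including the very first input), i.e. when emphasis playback
        should be changed.
        """
        changed = estimate != self._last
        self._last = estimate   # update the temporarily stored result
        return changed
```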
 For example, as described above, assume that the estimation result of the emotion estimation unit 32 is one of the three emotions "positive", "neutral", and "negative".
 When the estimation result of the emotion estimation unit 32 changes to "positive", the user is considered to be feeling excited. In this case, the media information emphasis playback unit 33 plays back the media information with the audio volume raised above its previous level. Furthermore, the media information emphasis playback unit 33 may play back the media information with an AR (augmented reality) effect expressing excitement added to the video. Examples of such AR effects include confetti and lighting.
 When the estimation result of the emotion estimation unit 32 changes to "negative", the user is considered to be feeling deflated. In this case, the media information emphasis playback unit 33 plays back the media information with the audio volume lowered below its previous level. Furthermore, the media information emphasis playback unit 33 may play back the media information with an AR effect added to the video to lift the user's mood.
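 As a minimal sketch of this behavior, the playback unit can map each newly detected emotion to a gain applied to the audio volume. The specific factors below are illustrative assumptions only (raise on "positive", lower on "negative", leave unchanged on "neutral"):

```python
def gain_for_emotion(emotion):
    """Return a volume multiplier for a newly detected emotion.

    The factor values are assumptions for illustration, not values
    taken from the specification.
    """
    factors = {"positive": 1.3, "neutral": 1.0, "negative": 0.7}
    return factors[emotion]

def apply_emphasis(volume, emotion):
    """Scale the current playback volume according to the emotion."""
    return volume * gain_for_emotion(emotion)
```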
 The audio to be changed may be either the event audio or the audience audio, or both. When the audio has been separated, the audio to be changed may be any or all of the separated audio streams.
 If the determination finds no change in the estimation result, the updated estimation result is the same as the previous one, and the media information emphasis playback unit 33 plays back the media information as before.
 The media information emphasis playback unit 33 outputs, to the playback information output device 44, playback information obtained by emphatically playing back the media information based on the estimation result. As described above, the playback information output device 44 outputs video and audio based on the playback information input from the media information emphasis playback device 30.
(Hardware configuration)
 Next, the hardware configuration of the media information emphasis playback device 30 will be described. The media information emphasis playback device 30 is configured by, for example, a personal computer, a server computer, or the like.
 FIG. 2 is a block diagram showing the hardware configuration of the media information emphasis playback device 30 according to the embodiment. As shown in FIG. 2, the media information emphasis playback device 30 has a processor 51, a ROM (Read Only Memory) 52, a RAM (Random Access Memory) 53, an auxiliary storage device 54, an input/output interface 55, and a communication interface 56.
 The processor 51, the ROM 52, the RAM 53, the auxiliary storage device 54, the input/output interface 55, and the communication interface 56 are electrically connected to one another via a bus 57 and exchange data via the bus 57.
 The processor 51 is configured by a general-purpose hardware processor including, for example, a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The processor 51 controls the ROM 52, the RAM 53, the auxiliary storage device 54, the input/output interface 55, and the communication interface 56 as a whole.
 The ROM 52 is a nonvolatile memory constituting part of the main storage. The ROM 52 non-transitorily stores a boot program required when the processor 51 starts up. The processor 51 starts up by executing the program in the ROM 52. The ROM 52 is configured by, for example, an EPROM (Erasable Programmable Read Only Memory), and stores various boot-time settings in addition to the boot program.
 The RAM 53 is a volatile memory constituting part of the main storage. The RAM 53 temporarily stores the programs required for processing by the processor 51 and the data required for executing those programs. By executing a program in the RAM 53, the processor 51 operates on the data in the RAM 53 and stores the operation results in the RAM 53.
 The auxiliary storage device 54 is configured by a nonvolatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The auxiliary storage device 54 non-transitorily stores the programs executed by the processor 51 and the data required for executing those programs. The processor 51 reads the programs and data in the auxiliary storage device 54 into the RAM 53 and executes various functions by executing the programs.
 The input/output interface 55 is connected to an external input device 61, an output device 62, and the like, and enables input of information from the input device 61 and output of information to the output device 62. The input/output interface 55 may be a wired interface or a wireless interface. A wired interface includes, for example, a port to which a device is connected. A wireless interface includes Bluetooth (registered trademark), WiFi (registered trademark), and the like.
 The input device 61 includes the camera 41, the microphone 42, and the biological information measurement device 43. The input device 61 may further include a keyboard, a mouse, a touch panel, a receiving device, a disk drive, and the like; it is not limited to these and may include any other input equipment. The output device 62 includes the playback information output device 44. The output device 62 may further include a display, a transmitting device, a disk drive, and the like; it is not limited to these and may include any other output equipment. The input device 61 and the output device 62 may be configured as an input/output device 63 having the functions of both.
 The programs non-transitorily stored in the auxiliary storage device 54 are provided to the computer via, for example, a computer-readable recording medium 64 on which the programs are non-transitorily recorded. Such a recording medium 64 is called a non-transitory computer-readable recording medium. Non-transitory computer-readable recording media include disks such as flexible disks, optical disks (CD-ROM, CD-R, DVD-ROM, DVD-R, etc.), and magneto-optical disks (MO, etc.), as well as semiconductor memories and the like.
 The programs non-transitorily stored in the auxiliary storage device 54 include a media information emphasis playback program. The media information emphasis playback program is a program that causes the computer constituting the media information emphasis playback device 30 to implement the functions of the user state acquisition unit 31, the emotion estimation unit 32, the media information emphasis playback unit 33, and the media information reception unit 34.
 A program to be non-transitorily stored in the auxiliary storage device 54 is read into the auxiliary storage device 54 and non-transitorily stored there via a disk drive serving as the input device 61 and the input/output interface 55 when the recording medium 64 is a disk, or via a port of the input/output interface 55 when the recording medium 64 is a semiconductor memory. Alternatively, the program may be stored on a server on a network, downloaded from the server, and non-transitorily stored in the auxiliary storage device 54.
 The communication interface 56 enables communication of information with the IP network 70. That is, the communication interface 56 enables reception of the media information distributed from base O (the event venue).
 At startup, the processor 51 executes the program in the ROM 52 and loads and starts the OS in the RAM 53. Under the control of the OS, the processor 51 monitors instruction inputs, connection of external devices, and the like, and sets a program area and a data area in the RAM 53. In response to an instruction input to start the media information emphasis playback device 30, the processor 51 reads the media information emphasis playback program from the auxiliary storage device 54 into the program area of the RAM 53, and reads the data required for executing the media information emphasis playback program from the auxiliary storage device 54 into the data area of the RAM 53. The processor 51 operates on the data in the data area according to the media information emphasis playback program and writes the operation results into the data area. Through these operations, the processor 51, the RAM 53, the auxiliary storage device 54, the input/output interface 55, and the communication interface 56 cooperate to implement the functions of the user state acquisition unit 31, the emotion estimation unit 32, the media information emphasis playback unit 33, and the media information reception unit 34 of the media information emphasis playback device 30.
[Operation example]
 Next, the emphasis playback processing executed by the media information emphasis playback device 30 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of the emphasis playback processing executed by the media information emphasis playback device according to the embodiment. Here, it is assumed that the media information emphasis playback unit 33 continuously outputs playback information to the playback information output device 44.
 In step S1, the user state acquisition unit 31 acquires the video information input from the camera 41, the audio information input from the microphone 42, and the biological information input from the biological information measuring device 43 as state information indicating the state of the user.
 In step S2, the emotion estimation unit 32 estimates the user's emotion during viewing based on the state information acquired in step S1.
 In step S3, the media information emphasis playback unit 33 compares the previous estimation result obtained in step S2 with the current estimation result and determines whether the estimation result has changed. If the estimation result has not changed, the process returns to step S1. If the estimation result has changed, the process proceeds to step S4.
 In step S4, the media information emphasis playback unit 33 changes the emphasis playback of the media information.
 In the following, an example is described in which the estimation result of the emotion estimation unit 32 is one of the three emotions "positive", "neutral", and "negative", and the media information emphasis playback unit 33 changes the volume of the audio of the media information according to the estimation result.
 Here, it is assumed that A, B, C, and D are set in advance as coefficients for changing the audio volume, where A is a value satisfying 0.8 ≤ A < 1, B is a value satisfying 0.5 < B < 0.8, C is a value satisfying 1 ≤ C < 1.2, and D is a value satisfying 1.2 ≤ D < 1.5.
 When the estimation result changes from "positive" to "neutral", the media information emphasis playback unit 33 changes the audio volume to A times its previous level.
 When the estimation result changes from "positive" to "negative", the media information emphasis playback unit 33 changes the audio volume to B times its previous level.
 When the estimation result changes from "neutral" to "positive", the media information emphasis playback unit 33 changes the audio volume to C times its previous level.
 When the estimation result changes from "neutral" to "negative", the media information emphasis playback unit 33 changes the audio volume to A times its previous level.
 When the estimation result changes from "negative" to "neutral", the media information emphasis playback unit 33 changes the audio volume to C times its previous level.
 When the estimation result changes from "negative" to "positive", the media information emphasis playback unit 33 changes the audio volume to D times its previous level.
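The six transition rules above amount to a lookup table keyed by the (previous, current) emotion pair. The sketch below is one way this could be implemented; the concrete coefficient values are illustrative choices within the ranges stated above, not values fixed by the embodiment, and the function name is hypothetical.

```python
# Illustrative coefficients within the ranges stated in the embodiment
# (assumed example values, not prescribed by the disclosure).
A = 0.9   # 0.8 <= A < 1   : mild attenuation
B = 0.6   # 0.5 < B < 0.8  : strong attenuation
C = 1.1   # 1 <= C < 1.2   : mild boost
D = 1.3   # 1.2 <= D < 1.5 : strong boost

# Volume multiplier applied when the estimated emotion changes
# from the first element of the key to the second.
VOLUME_FACTOR = {
    ("positive", "neutral"):  A,
    ("positive", "negative"): B,
    ("neutral",  "positive"): C,
    ("neutral",  "negative"): A,
    ("negative", "neutral"):  C,
    ("negative", "positive"): D,
}

def update_volume(volume: float, previous: str, current: str) -> float:
    """Return the new audio volume after an emotion transition.

    If the estimation result has not changed, the volume is left as is
    (corresponding to returning to step S1 in the flowchart).
    """
    if previous == current:
        return volume
    return volume * VOLUME_FACTOR[(previous, current)]
```

For example, a change from "negative" to "positive" multiplies the current volume by D, while two consecutive "neutral" results leave it unchanged.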
 The volume may be changed immediately at the moment the estimation result changes, or it may be changed linearly so that it reaches the target level after a fixed time (for example, one second).
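The gradual variant mentioned above can be realized by linearly interpolating between the old volume and the target volume over the ramp period. A minimal sketch follows; the one-second default duration mirrors the example in the text, and the function name is an assumption for illustration.

```python
def ramp_volume(old: float, target: float, elapsed: float, duration: float = 1.0) -> float:
    """Linearly interpolate the volume from `old` toward `target`.

    `elapsed` is the time in seconds since the estimation result changed;
    once `duration` has passed, the volume stays at `target`.
    """
    if elapsed >= duration:
        return target
    fraction = elapsed / duration
    return old + (target - old) * fraction
```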
 After the emphasis playback of the media information has been changed, in step S5, the process returns to step S1 as long as the user continues viewing; when the user finishes viewing, the user terminates the operation of the media information emphasis playback device 30.
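Putting the flowchart together, the control loop of steps S1 through S5 can be sketched as below. The callables stand in for the corresponding units (31, 32, 33) and for the user's viewing state; their names are hypothetical and not part of the disclosed interface.

```python
def emphasis_playback_loop(acquire_state, estimate_emotion, change_emphasis, keep_viewing):
    """Run the S1-S5 loop of the flowchart in FIG. 3.

    acquire_state()      -> state information of the user (S1)
    estimate_emotion(s)  -> "positive" / "neutral" / "negative" (S2)
    change_emphasis(p, c)   applies the changed emphasis playback (S4)
    keep_viewing()       -> False once the user finishes viewing (S5)
    """
    previous = None
    while keep_viewing():
        state = acquire_state()               # S1: acquire state information
        current = estimate_emotion(state)     # S2: estimate emotion
        if previous is not None and current != previous:   # S3: changed?
            change_emphasis(previous, current)             # S4: change playback
        previous = current
    # The user has finished viewing; device operation ends here.
```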
 <Effects>
 According to the embodiment, a media information playback technique is provided that allows a user in a remote location to feel a strong sense of participation in the event venue.
 In other words, because the distributed video the user is watching changes in accordance with the emotional ups and downs of the user viewing the live event from a remote location, the user feels as if his or her viewing behavior (cheering, feelings, emotions) were acting on (propagating to) the event venue and other viewers, which can heighten the sense of participation in (satisfaction with) and unity with the event.
 By executing the entire series of processes on the media information emphasis playback device 30, which is the user terminal, volume adjustment and effect application can be performed immediately without communication delay, and the user can feel in real time as if his or her viewing (spectating) state were acting on the performers and spectators at the event venue and on spectators in remote locations.
 Furthermore, since no data is transmitted from the media information emphasis playback device 30, which is the user terminal, to the event venue side, there is no need to build a bidirectional distribution infrastructure, and construction and operation costs can be reduced.
 Note that the present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects can be obtained. Furthermore, the above embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent features. For example, even if some constituent features are deleted from all the constituent features shown in the embodiment, as long as the problem can still be solved and the effects obtained, the configuration from which those constituent features have been deleted can be extracted as an invention.
  O … Base (event venue)
  10 … Server
  11 … Media information generation unit
  12 … Media information transmission unit
  21 … Video shooting device
  22 … Event audio recording device
  23 … Audience audio recording device
  Rn … Base (remote location)
  30 … Media information emphasis playback device
  31 … User state acquisition unit
  32 … Emotion estimation unit
  33 … Media information emphasis playback unit
  34 … Media information reception unit
  41 … Camera
  42 … Microphone
  43 … Biological information measuring device
  44 … Playback information output device
  51 … Processor
  52 … ROM
  53 … RAM
  54 … Auxiliary storage device
  55 … Input/output interface
  56 … Communication interface
  57 … Bus
  61 … Input device
  62 … Output device
  63 … Input/output device
  64 … Recording medium
  70 … IP network

Claims (8)

  1.  A media information emphasis playback device comprising:
      a media information reception unit that receives media information including video and audio;
      a user state acquisition unit that acquires state information indicating the state of a user during viewing;
      an emotion estimation unit that estimates the user's emotion during viewing based on the state information input from the user state acquisition unit; and
      a media information emphasis playback unit that emphatically plays back the media information input from the media information reception unit based on the estimation result input from the emotion estimation unit.
  2.  The media information emphasis playback device according to claim 1, wherein, when the estimation result of the emotion estimation unit changes, the media information emphasis playback unit changes the emphasis playback of the media information.
  3.  The media information emphasis playback device according to claim 2, wherein the emotion estimation unit estimates which of the three emotions "positive", "neutral", and "negative" the user's emotion is.
  4.  The media information emphasis playback device according to claim 3, wherein, when the estimation result of the emotion estimation unit changes to "positive", the media information emphasis playback unit increases the volume of the audio and plays back the media information.
  5.  The media information emphasis playback device according to claim 4, wherein the media information emphasis playback unit adds an AR effect to the video and plays back the media information.
  6.  The media information emphasis playback device according to claim 3, wherein, when the estimation result of the emotion estimation unit changes to "negative", the media information emphasis playback unit decreases the volume of the audio and plays back the media information.
  7.  A media information emphasis playback method comprising:
      receiving media information including video and audio;
      acquiring state information indicating the state of a user during viewing;
      estimating the user's emotion during viewing based on the state information; and
      emphatically playing back the media information based on the estimation result of the user's emotion during viewing.
  8.  A media information emphasis playback program that causes a computer having a processor and a storage device to execute the functions of the media information reception unit, the user state acquisition unit, the emotion estimation unit, and the media information emphasis playback unit of the media information emphasis playback device according to claim 1.
PCT/JP2022/033902 2022-09-09 2022-09-09 Media information emphasis playback device, media information emphasis playback method, and media information emphasis playback program WO2024053094A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/033902 WO2024053094A1 (en) 2022-09-09 2022-09-09 Media information emphasis playback device, media information emphasis playback method, and media information emphasis playback program

Publications (1)

Publication Number Publication Date
WO2024053094A1 true WO2024053094A1 (en) 2024-03-14

Family

ID=90192178

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/033902 WO2024053094A1 (en) 2022-09-09 2022-09-09 Media information emphasis playback device, media information emphasis playback method, and media information emphasis playback program

Country Status (1)

Country Link
WO (1) WO2024053094A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110142413A1 (en) * 2009-12-04 2011-06-16 Lg Electronics Inc. Digital data reproducing apparatus and method for controlling the same
WO2016088566A1 (en) * 2014-12-03 2016-06-09 ソニー株式会社 Information processing apparatus, information processing method, and program

