WO2012070534A1 - Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device - Google Patents


Info

Publication number
WO2012070534A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
unit
audio
display
signal
Prior art date
Application number
PCT/JP2011/076814
Other languages
French (fr)
Japanese (ja)
Inventor
忠夫 森下
Original Assignee
シャープ株式会社 (Sharp Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2012070534A1

Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/41 - Structure of client; Structure of client peripherals
                            • H04N 21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
                                • H04N 21/4223 - Cameras
                        • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
                                • H04N 21/4312 - involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
                                    • H04N 21/4314 - for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
                            • H04N 21/439 - Processing of audio elementary streams
                                • H04N 21/4394 - involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
                            • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
                                • H04N 21/44008 - involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
                                • H04N 21/4402 - involving reformatting operations of video signals for household redistribution, storage or real-time display
                                    • H04N 21/440236 - by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
                        • H04N 21/47 - End-user applications
                            • H04N 21/488 - Data services, e.g. news ticker
                                • H04N 21/4884 - for displaying subtitles

Definitions

  • The present invention relates to a video/audio output device that enables a viewer to view a plurality of videos simultaneously, a video/audio output method, and a television receiver including the video/audio output device.
  • FIG. 13 shows a display example of a screen when a plurality of videos are displayed on the same screen. As shown in FIG. 13, by dividing the display screen into a plurality of areas, it is possible to display different arbitrary videos in the respective areas. This figure shows an example in which the display screen is divided into four areas. In this way, the viewer can view a plurality of videos at the same time.
  • A viewer can recognize a plurality of videos simultaneously, but very few people can follow a plurality of audio streams at the same time. For this reason, although it is easy to output the audio of every video displayed on the same screen simultaneously, doing so is of little use to the viewer.
  • Patent Document 1 discloses a device that allows the contents of a plurality of videos displayed on the same screen to be followed and understood.
  • In the disclosed configuration, audio is output from a built-in speaker for one screen, while subtitles are displayed on the other screen instead of audio being output.
  • This makes it possible to follow the contents of a plurality of programs at the same time without listening to the audio of one program through headphones or an external device.
  • In Patent Document 1, however, a button or the like must be operated in order to select the video whose audio is output from among the plurality of videos. Such an operation is troublesome for a user who is watching a broadcast program and hinders viewing.
  • The present invention has been made in view of the above problems. Its object is to provide a video/audio output device, a video/audio output method, and a television receiver including the video/audio output device that generate captions for the videos that do not output audio among a plurality of videos displayed on the same screen, and that make it easy to switch the video whose audio is output.
  • In order to solve the above problem, the video/audio output device according to the present invention is a video/audio output device including an output unit that outputs the video signal of a video to a display unit that displays the video and outputs the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas for displaying different videos. The device includes: a caption generation unit that generates first caption data for caption display from the audio signal of the video displayed in each of the display areas; a first determination unit that determines which of the plurality of display areas the user is viewing; and a synthesis unit that, for each display area other than the display area determined to be viewed by the user, generates a composite video signal by combining the video signal of the video displayed in that display area with the first caption data generated from the audio signal of that video. The output unit outputs to the audio output unit the audio signal of the video displayed in the display area determined to be viewed by the user.
  • According to this configuration, a composite video signal with captions is generated for each display area that the user is not viewing. The composite video signal is sent to the display unit and displayed on the screen. On the other hand, for the display area determined to be viewed by the user, the video signal of the video displayed in that display area is sent to the display unit as it is and displayed on the screen. Furthermore, only the audio signal of the video displayed in the display area determined to be viewed by the user is sent to the audio output unit and output as audio. As a result, only the audio of the video in the display area determined to be viewed by the user is output, and captions are displayed on the videos in the other display areas.
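The routing rule just described can be sketched in code. The following is an illustrative sketch only, not part of the patent disclosure; the region list, `gazed_index`, and the returned plan are assumed names.

```python
def route_outputs(regions, gazed_index):
    """For each display region, decide whether it shows captions and
    whether its audio is routed to the speaker.

    regions:      list of region names (one per displayed video)
    gazed_index:  index of the region the viewer is looking at
    Returns a dict: region -> {"audio": bool, "captions": bool}.
    """
    plan = {}
    for i, region in enumerate(regions):
        viewed = (i == gazed_index)
        plan[region] = {
            "audio": viewed,          # only the viewed region is heard
            "captions": not viewed,   # every other region shows captions
        }
    return plan

plan = route_outputs(["A", "B", "C", "D"], gazed_index=2)
# Region "C" gets audio and no captions; A, B, and D get captions only.
```

Exactly one region ever has `"audio": True`, matching the configuration in which only the viewed display area's audio signal reaches the audio output unit.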
  • In order to solve the above problem, a television receiver according to the present invention includes any one of the video/audio output devices described above.
  • According to this configuration, a television receiver having the above effects can be provided.
  • In order to solve the above problem, the video/audio output method according to the present invention is a method of outputting the video signal of a video to a display unit that displays the video and outputting the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas for displaying different videos. The method includes: a generation step of generating caption data for caption display from the audio signal of the video displayed in each of the display areas; a determination step of determining which of the plurality of display areas the user is viewing; a synthesis step of, for each display area other than the display area determined to be viewed in the determination step, combining the video signal of the video displayed in that display area with the caption data generated from the audio signal of that video; and an output step of outputting to the audio output unit the audio signal of the video displayed in the display area determined in the determination step.
  • According to this method, the display unit displays a plurality of videos, but only the audio of the video in the display area determined to be viewed by the user is output, while captions are displayed on the videos in the other display areas.
  • As a result, the viewer can follow a plurality of videos simultaneously.
  • Furthermore, since the video/audio output apparatus generates the captions displayed on the videos in the other display areas from the audio signals of those videos, captions can be displayed for any video. In addition, since only the audio of the video in the display area determined to be viewed by the user is output, the video whose audio is output can be switched easily.
  • In the figure, (a) shows the display state of the display screen of the TV receiver when the user is moving his or her line of sight, (b) shows the display state when the user is gazing at one video, and (c) shows the display state of the display screen of the TV receiver when audio is output.
  • The video/audio output device according to the present embodiment divides the display screen 60 into a plurality of display areas, making it possible to display a different arbitrary video in each display area.
  • This figure shows an example in which the display screen 60 is divided into four areas.
  • In addition, the video/audio output device has a function of outputting the audio of only one of the plurality of displayed videos while displaying captions for the other videos.
  • The video/audio output device can be applied to a content reproduction apparatus such as a television receiver, a personal computer, or a personal computer with a television function.
  • In the following, a case in which the video/audio output device according to the present embodiment is applied to a television receiver (hereinafter referred to as a TV receiver) is described as an example.
  • First, the overall configuration of the TV receiver including the video/audio output device according to the present embodiment is described, and then the video/audio output device is described in detail.
  • FIG. 3 is a block diagram illustrating a detailed main configuration of the TV receiver 50 according to the present embodiment.
  • the TV receiver 50 has a CPU (Central Processing Unit) 30 and a nonvolatile memory 28 connected to a bus.
  • The operation of the TV receiver 50 is controlled by various control programs stored in the nonvolatile memory 28 and executed by the CPU 30. That is, the TV receiver 50 is controlled by a computer system including the CPU 30, and a program for operating the TV receiver 50 by the computer system is stored in the nonvolatile memory 28.
  • The nonvolatile memory 28 is usually constituted by RAM (Random Access Memory), but may partially include ROM (Read Only Memory), and may also include rewritable flash memory or the like.
  • The nonvolatile memory 28 stores an OS (Operating System) and various control software for operating the CPU 30, data related to program information such as electronic program guide (EPG) data received via broadcast waves, OSD image data necessary for OSD (On Screen Display) rendering, and the like.
  • The nonvolatile memory 28 also has a work area that serves as the work memory required for various control operations.
  • the TV receiver 50 is provided with an analog tuner unit 11 (reception unit) as well as a digital tuner unit 13 (reception unit), and can receive analog broadcasts.
  • Various external devices 36 can be connected to the external input unit 6, such as a hard disk drive (HDD), a solid-state memory device such as an SD card, or a disc device for Blu-ray Disc (BD), DVD (Digital Versatile Disc), or Compact Disc (CD).
  • the TV receiver 50 is provided with an IP (Internet Protocol) broadcast tuner unit 29, and can receive IP broadcasts.
  • The TV receiver 50 includes a camera 2, a line-of-sight recognition unit 3 (first determination unit), a display device 7 (display unit), a speaker 8 (audio output unit), an AV switch unit 12, a digital demodulation unit 14, a demultiplexer (DEMUX) 15, a video decode/capture unit 16, a video selector unit 17, an image synthesis unit 18 (synthesis unit), a display control unit 19 (output unit), an audio decode unit 20, an audio selector unit 21, an audio output selection unit 22, an audio output control unit 23 (output unit), an EPG/OSD reservation processing unit 24, a remote control light receiving unit 25, a channel selection unit 26, a communication control unit 27, and an adding circuit 37.
  • the analog tuner unit 11 selects an analog television broadcast signal received via the antenna 9 for receiving an analog broadcast, and selects a channel to be received according to a channel selection instruction from the channel selection unit 26.
  • The received signal from the analog tuner unit 11 is separated into an audio signal and a video signal by the AV switch unit 12 (first acquisition unit); the video signal is input to the video selector unit 17, and the audio signal is input to the audio selector unit 21.
  • the digital tuner unit 13 selects a digital television broadcast signal received via the digital broadcast receiving antenna 10 and selects a channel to be received according to a channel selection instruction from the channel selection unit 26.
  • The received signal from the digital tuner unit 13 is demodulated by the digital demodulation unit 14 and sent to the separation unit (DEMUX) 15 (first acquisition unit).
  • The IP broadcast tuner unit 29 selects an IP broadcast signal received via the communication control unit 27 connected to a telephone line, a LAN (Local Area Network), or the like, and selects a specific IP broadcast to be received according to a channel selection instruction from the channel selection unit 26. The reception signal from the IP broadcast tuner unit 29 is output to the demultiplexing unit (DEMUX) 15.
  • the separation unit (DEMUX) 15 separates the multiplexed video signal and audio signal input from the digital demodulation unit 14 or the IP broadcast tuner unit 29, respectively.
  • the separated video signal is sent to the video decoding / capturing unit 16 and the audio signal is sent to the audio decoding unit 20.
  • the separation unit (DEMUX) 15 extracts data such as EPG data included in the broadcast signal and sends the data to the EPG / OSD reservation processing unit 24.
  • the broadcast signal extracted by the separation unit (DEMUX) 15 is recorded in the nonvolatile memory 28 by writing control by the CPU 30 as necessary.
  • the video decode / capture unit 16 decodes the video signal separated by the separation unit (DEMUX) 15 or captures video information included in the video signal as a still image.
  • the video signal decoded by the video decode / capture unit 16 is sent to the video selector unit 17.
  • the video signal from the analog tuner unit 11 is input to the video selector unit 17, and the video signal from the external input unit 6 is also input.
  • the video selector unit 17 selects and outputs one video signal from these input video signals according to a control signal from the CPU 30, and sends it to the image composition unit 18.
  • The image synthesis unit 18 synthesizes the input video signal with the caption data for caption display (first caption data) generated by the voice recognition unit 4 (caption generation unit) described later. As will be detailed later, the image synthesis unit 18 combines the caption data with the video signals of those videos, among the plurality of videos displayed on the display device 7, that do not output audio. The image synthesis unit 18 further applies video processing such as noise reduction, sharpness adjustment, or contrast adjustment to the synthesized video signal (composite video signal), converting it into a video signal optimal for the display device 7. When there is no caption data to synthesize (for example, for the video that outputs audio, or when only one video is displayed on the display device 7), the video signal from the video selector unit 17 undergoes the video processing as it is.
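The compositing step can be illustrated with a minimal sketch that blends a darkened caption band into the bottom of a frame, on top of which caption text would then be drawn. This is not the patented implementation; the band height and opacity are assumed parameters, and font rendering and caption timing are omitted.

```python
import numpy as np

def composite_caption_band(frame, band_height=40, alpha=0.6):
    """Darken a horizontal band at the bottom of a video frame so that
    caption text drawn over it stays legible.

    frame: (H, W, 3) uint8 array; band_height and alpha are illustrative.
    """
    out = frame.astype(np.float32)
    out[-band_height:] *= (1.0 - alpha)   # blend the band toward black
    return out.astype(np.uint8)

frame = np.full((120, 160, 3), 200, dtype=np.uint8)   # a flat gray frame
composited = composite_caption_band(frame)
```

The video that outputs audio would skip this step entirely, its signal passing through the video processing unchanged, as stated above.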
  • the display controller 19 controls the input video signal to be output to the display device 7 for display.
  • the display control unit 19 can output the video signal by combining the video signal with EPG data or OSD data created by the EPG / OSD reservation processing unit 24 described later.
  • the display device 7 displays the transmitted video signal on the screen.
  • the audio decoding unit 20 decodes the audio signal separated by the separation unit (DEMUX) 15.
  • the audio decoding unit 20 sends the decoded audio signal to the audio selector unit 21.
  • The audio selector unit 21 receives the audio signal from the AV switch unit 12, the audio signal from the external input unit 6, and the audio signal from the audio decoding unit 20, and, under the control of the CPU 30, selects the audio signal corresponding to the video signal selected by the video selector unit 17.
  • the selected voice signal is output to the voice recognition unit 4 and the voice output selection unit 22.
  • the voice recognition unit 4 performs processing for generating caption data from the selected voice signal.
  • the generated caption data is output to the image synthesis unit 18 and synthesized with a predetermined video signal.
  • the audio output selection unit 22 selects an audio signal of a video that outputs audio from among a plurality of videos displayed on the display device 7.
  • the selected audio signal is output to the audio output control unit 23, converted to an audio signal optimal for reproduction on the speaker 8 by the audio output control unit 23, and supplied to the speaker 8. If there is only one video to be displayed on the display device 7, the audio output selection unit 22 does not need to select the audio to be output.
  • the EPG / OSD reservation processing unit 24 creates an electronic program guide based on EPG data periodically updated and stored, and draws OSD data stored in advance in the nonvolatile memory 28.
  • The OSD data is data, stored in advance in the nonvolatile memory 28, for drawing various information such as a setting menu screen, a volume gauge, the current time, or the selected channel.
  • the EPG / OSD reservation processing unit 24 determines the layout of the display position of the OSD data to be drawn on the display screen of the display device 7 in accordance with an instruction from the CPU 30.
  • The EPG data or OSD data created by the EPG/OSD reservation processing unit 24 is added to the video signal output from the image synthesis unit 18 by the adding circuit 37 and output to the display device 7.
  • the EPG / OSD reservation processing unit 24 also performs program reservation processing using the electronic program guide.
  • The communication control unit 27 performs control for establishing communication via a network such as a telephone network, a LAN, the Internet, or a home network standard such as DLNA (Digital Living Network Alliance). The TV receiver may connect with another apparatus through a network, or with a video service through the Internet. The connection with the other apparatus may be either wired or wireless.
  • The remote control light receiving unit 25 receives an optical signal from the remote control device 5 (hereinafter referred to as the remote controller 5) and thereby receives a control signal from the remote controller 5. Instructions from the viewer, such as turning the power of the TV receiver 50 on and off, raising and lowering the volume, and selecting a viewing channel, are given via the remote controller 5.
  • the camera 2 records the user (especially the eye 40), and the line-of-sight recognition unit 3 determines the position of the line of sight of the user based on the video recorded by the camera 2 as described in detail later.
  • The camera 2, the separation unit (DEMUX) 15, the video decode/capture unit 16, the audio decoding unit 20, the EPG/OSD reservation processing unit 24, the remote control light receiving unit 25, the channel selection unit 26, the communication control unit 27, the nonvolatile memory 28, the IP broadcast tuner unit 29, and the CPU 30 are connected via a bus.
  • FIG. 1 is a block diagram showing a main configuration of the video / audio output apparatus 1.
  • the video / audio output device 1 includes a camera 2, a line-of-sight recognition unit 3, a voice recognition unit 4, an image synthesis unit 18, a voice output selection unit 22, and a decoder 35.
  • The decoder 35 separates and decodes the video signal and the audio signal from the broadcast signal received by the TV receiver 50 or from the external device 36. That is, the decoder 35 includes the various members needed to generate a video signal suitable for video output and an audio signal suitable for audio output from the broadcast signal or the external device 36, such as the above-described separation unit (DEMUX) 15, video selector unit 17, and audio selector unit 21. In FIG. 1, for ease of understanding, these members are collectively drawn as the decoder 35.
  • The video/audio output device 1 is a device that can display a different arbitrary video in each display area by dividing the display screen into a plurality of display areas. It is therefore necessary to separate and generate as many video signals and audio signals from the broadcast signal or the external device 36 as there are videos to be displayed on the display device 7. Accordingly, the video/audio output device 1 includes, as one structural unit, a subunit 1′ having the voice recognition unit 4 and the decoder 35, and includes as many subunits 1′ as there are videos to be displayed on the display device 7. That is, when the display device 7 displays four different videos, the video/audio output device 1 includes four subunits 1′. In other words, by increasing or decreasing the number of subunits 1′, the number of videos that can be viewed simultaneously can be increased or decreased.
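The one-subunit-per-video organization can be sketched as follows. The class and method names are illustrative, not from the patent text, and the decoder is a placeholder rather than a real demultiplexer.

```python
from dataclasses import dataclass

@dataclass
class Subunit:
    """One decoding pipeline per displayed video: a decoder that yields
    a (video_signal, audio_signal) pair for its source."""
    source: str

    def decode(self, broadcast_signal):
        # Placeholder: a real decoder would demultiplex and decode the
        # broadcast signal here.
        return (f"video<{self.source}>", f"audio<{self.source}>")

@dataclass
class VideoAudioOutputDevice:
    subunits: list  # one Subunit per display area

    def decoded_streams(self, signal):
        return [u.decode(signal) for u in self.subunits]

# Displaying four different videos requires four subunits.
device = VideoAudioOutputDevice([Subunit(s) for s in ["ch1", "ch2", "ch3", "ch4"]])
streams = device.decoded_streams("broadcast")
```

Adding or removing a `Subunit` changes the number of simultaneously viewable videos, mirroring the increase or decrease of subunits 1′ described above.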
  • the analog tuner unit 11 and the digital tuner unit 13 may be included in the decoder 35.
  • The video displayed on the display device 7 may be either a digital broadcast program or an analog broadcast program, and may be any of a terrestrial broadcast program, a satellite broadcast program, or a CATV program.
  • Alternatively, video from the external device 36 may be displayed via the external input unit 6; there is no particular limitation.
  • First, the video/audio output device 1 processes the broadcast signal of the broadcast program to be displayed, or the data contained in the external device 36, with the decoder 35 of the subunit 1′ to obtain the video signal and the audio signal.
  • the obtained video signal and audio signal are output to the image synthesis unit 18 and the audio output selection unit 22, respectively, as shown in FIG.
  • the audio signal from the decoder 35 is also output to the audio recognition unit 4.
  • the voice recognition unit 4 performs processing for generating caption data from the voice signal. More specifically, subtitle data in which a person's conversation and narration in the video are converted into text is generated based on the audio signal.
  • the generated caption data is output to the image composition unit 18.
  • The camera 2 captures the user, and the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2. More specifically, it determines from the position of the user's line of sight which of the plurality of videos displayed on the screen of the TV receiver 50 the user is viewing. The determination result is output to the image synthesis unit 18 and the audio output selection unit 22.
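Once a gaze point on the screen has been estimated, determining which display area the user is viewing reduces to a point-in-rectangle test. The sketch below assumes the gaze coordinates are already available; estimating them from the camera image is a separate problem the patent leaves to known methods.

```python
def region_at(gaze_x, gaze_y, regions):
    """Return the index of the display region containing the gaze point,
    or None if the gaze falls outside the screen.

    Each region is an (x, y, w, h) rectangle in screen coordinates.
    """
    for i, (x, y, w, h) in enumerate(regions):
        if x <= gaze_x < x + w and y <= gaze_y < y + h:
            return i
    return None

# Four equal quadrants of a 1920x1080 screen:
quads = [(0, 0, 960, 540), (960, 0, 960, 540),
         (0, 540, 960, 540), (960, 540, 960, 540)]
idx = region_at(1200, 700, quads)   # falls in the bottom-right quadrant
```

The returned index plays the role of the determination result sent to the image synthesis unit 18 and the audio output selection unit 22.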
  • For each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data generated by the voice recognition unit 4 from its audio signal. That is, for each video that the user is not viewing, a video signal with captions is generated.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • On the other hand, for the video determined to be viewed by the user, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • In the audio output selection unit 22, based on the determination result from the line-of-sight recognition unit 3, only the audio signal of the video determined to be viewed by the user is sent to the speaker 8 via the audio output control unit 23, and its audio is output. As a result, in the TV receiver 50, only the audio of the video determined to be viewed by the user is output, while captions are displayed instead of audio for the other videos.
  • In the video/audio output device 1, only the audio of the video determined by the line-of-sight recognition unit 3 to be viewed by the user is output. That is, when the user moves his or her line of sight to view another video, the line-of-sight recognition unit 3 determines the position of the user's line of sight and identifies the video that the user has newly started viewing.
  • This information is output to the image synthesis unit 18 and the audio output selection unit 22, and for each video other than the video newly determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with its caption data.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • On the other hand, for the video newly determined to be viewed by the user, the video signal is sent directly to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, the audio output of the video that was previously determined to be viewed is stopped, and that video is now displayed with captions.
  • Captions are not displayed for the video newly determined to be viewed by the user; audio is output for it instead.
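The switching behaviour when the user's gaze moves can be sketched as a small state holder. An immediate switch is assumed here for illustration; the patent text leaves timing details open, and a real receiver might well ignore very short glances.

```python
class AudioSwitcher:
    """Track which display region currently has audio, and switch when
    the line-of-sight recognizer reports a new region."""

    def __init__(self, current=0):
        self.current = current

    def update(self, gazed_region):
        """Return (muted_region, unmuted_region) when a switch occurs,
        or None when the gaze stayed put or left the screen."""
        if gazed_region is None or gazed_region == self.current:
            return None
        previous, self.current = self.current, gazed_region
        # The previous region loses audio and goes back to captions;
        # the new region loses captions and gains audio.
        return (previous, gazed_region)

switcher = AudioSwitcher(current=0)
change = switcher.update(2)   # user looks at region 2
```

The tuple returned on a switch corresponds to the two updates described above: captions resume on the previously viewed video, and audio replaces captions on the newly viewed one.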
  • As described above, a plurality of videos are displayed on the TV receiver 50, but only the audio of the video determined to be viewed by the user is output, and captions are displayed on the other videos.
  • As a result, the viewer can follow a plurality of videos simultaneously.
  • the video / audio output device 1 since the video / audio output device 1 generates subtitles to be displayed on other video based on the audio signal of the video, it is possible to display subtitles for any broadcast program or external device 36. is there.
  • Moreover, with the video/audio output device 1, the video whose audio is output can be switched easily.
  • FIG. 4 shows an example in which the display screen 60 is divided into four areas, and
  • FIG. 5 shows an example in which the display screen 60 is divided into seven areas.
  • As shown in FIG. 4, the display screen 60 may be divided equally into four areas A to D, or, as shown in FIG. 5, divided unequally into seven areas A to G. The display screen 60 can thus be divided equally or into areas of different sizes, and various other division methods are possible. The TV receiver 50 may be provided with several division layouts so that the user can select the desired one.
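The division-and-lookup step above can be sketched as follows. This is an illustrative assumption, not the patent's implementation; the function names and the 1920x1080 geometry are hypothetical.

```python
# Hypothetical sketch: dividing a screen into regions and mapping a
# gaze point to the region that contains it.

def make_equal_quadrants(width, height):
    """Divide a width x height screen into four equal regions A-D."""
    hw, hh = width // 2, height // 2
    return {
        "A": (0, 0, hw, hh),           # top-left
        "B": (hw, 0, width, hh),       # top-right
        "C": (0, hh, hw, height),      # bottom-left
        "D": (hw, hh, width, height),  # bottom-right
    }

def region_at(regions, x, y):
    """Return the name of the region containing gaze point (x, y)."""
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

regions = make_equal_quadrants(1920, 1080)
print(region_at(regions, 1500, 200))  # a point in the top-right -> "B"
```

An uneven seven-area layout like FIG. 5 would simply register seven rectangles in the same dictionary.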
  • The voice recognition unit 4 generates caption data based on a video's audio signal, and any known method can be applied for this generation.
  • For example, the technique disclosed in JP 2004-80069 A can be used. Specifically, the types of sound in the audio signal are first distinguished, and noise discrimination is performed. This is because the actual soundtrack mixes in car sounds, wind sounds, and other noise depending on the scene, and it is difficult to convert the audio to text unless human voices are separated from that noise. The speech in the audio signal is then converted into characters.
  • A configuration that, when generating the caption data, further discriminates the characters, distinguishes male from female voices, or selects the characters to display is also suitably used in this embodiment.
  • the audio of the video can be converted into text and displayed as subtitles on the display screen.
  • the technology applicable as the processing method of the speech recognition unit 4 according to the present embodiment is not limited to the above-described processing method, and it goes without saying that other technologies can also be applied.
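As a rough sketch of the flow just described, the following pseudo-pipeline separates speech segments from noise before transcribing them. The functions `is_speech` and `transcribe` are placeholders for a real voice-activity detector and speech recognizer; they and the segment data layout are assumptions for illustration, not APIs from the cited patent.

```python
# Hypothetical sketch of the voice recognition unit's subtitle flow:
# discriminate human speech from noise, then convert speech to text.

def is_speech(segment):
    # Placeholder voice-activity check: treat segments tagged "speech"
    # as human voice; car noise, wind, etc. are filtered out here.
    return segment["kind"] == "speech"

def transcribe(segment):
    # Placeholder recognizer: a real system would run ASR on the audio.
    return segment["text"]

def generate_captions(audio_segments):
    """Produce caption strings only from segments judged to be speech."""
    return [transcribe(s) for s in audio_segments if is_speech(s)]

segments = [
    {"kind": "noise", "text": ""},
    {"kind": "speech", "text": "Hello, and welcome."},
    {"kind": "speech", "text": "Today's top story..."},
]
print(generate_captions(segments))
```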
  • The line-of-sight recognition unit 3 determines the position of the user's line of sight, and any known method can be applied for this determination.
  • the technique disclosed in Japanese Patent Laid-Open No. 2005-100366 can be used.
  • In this document, three eye-gaze direction detection methods are disclosed: a direction-specific image correlation method, a black pixel region detection method, and an edge feature point detection method. Each detection method is briefly described below.
  • First, the sections of the display screen blink in sequence, and the position of the user's eyeball as the user's gaze follows them is recorded; this establishes the reference for each direction.
  • In the direction-specific image correlation method, eye-periphery images are registered as the above-mentioned references for determining the direction; the image of the user's eye is matched against the registered eye-periphery images, and the line-of-sight direction is determined from the eye-periphery image that gives the highest correlation.
  • In the black pixel region detection method, an iris-region image including the pupil is registered as the reference for determining the direction; an enlarged image of the position of the user's eye is matched against the registered iris-region images, and the line-of-sight direction is determined from the iris-region image that gives the highest correlation.
  • In the edge feature point detection method, attention is paid to the luminance changes among the iris region, the white of the eye, and the eyelid, and the line-of-sight direction is detected using edge detection. Edge detection is performed with a Sobel filter after image enhancement and smoothing by a median filter.
  • the technology applicable as the processing method of the line-of-sight recognition unit 3 according to the present embodiment is not limited to the detection method described above, and it goes without saying that other technologies can also be applied.
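The smoothing-then-Sobel step of the edge feature point method can be sketched as below. This is a minimal pure-Python illustration (3x3 median filter, then an |gx| + |gy| Sobel gradient magnitude), not the code of the cited patent.

```python
# Illustrative sketch: median smoothing followed by Sobel edge detection,
# as used to find iris / white-of-eye / eyelid boundaries.
from statistics import median

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # Sobel horizontal kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # Sobel vertical kernel

def median3(img):
    """3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = median(
                img[a][b] for a in (i - 1, i, i + 1) for b in (j - 1, j, j + 1)
            )
    return out

def sobel_magnitude(img):
    """Approximate gradient magnitude |gx| + |gy| at interior pixels."""
    h, w = len(img), len(img[0])
    mag = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0
            for a in range(3):
                for b in range(3):
                    p = img[i + a - 1][j + b - 1]
                    gx += p * KX[a][b]
                    gy += p * KY[a][b]
            mag[i][j] = abs(gx) + abs(gy)
    return mag
```

In practice the eye image would first be passed through `median3` to suppress sensor noise, and the strong responses of `sobel_magnitude` would mark the iris and eyelid contours used as feature points.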
  • FIG. 6 is an enlarged view of the user's eye 40.
  • FIG. 7 is a diagram showing the display state of the display screen 60 of the TV receiver 50 in the case of FIG. 6. In the following description, it is assumed that the display screen 60 of the TV receiver 50 is divided into four areas.
  • First, the user's eye 40 is extracted from the video recorded by the camera 2. The region within the extracted outline of the eye 40 is then divided into four equal parts, and it is determined which of the four parts the dark (iris) portion occupies most. For example, in the case of FIG. 6, the user's iris mostly occupies the upper left part, so it can be determined that the user is looking toward the upper right of the display screen 60. As a result, as shown in FIG. 7, the TV receiver 50 outputs only the audio of the video displayed in the upper right area, and subtitles are displayed on the videos in the other areas.
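The quadrant heuristic described above can be sketched as follows. The grayscale data layout, the darkness threshold, and the mirror mapping are illustrative assumptions based on the FIG. 6/7 example (a pupil in the upper left of the camera image implies the user is looking at the upper right of the screen).

```python
# Hypothetical sketch of the quadrant heuristic: split the eye's bounding
# box into four parts, find where the dark (iris) pixels cluster, and
# mirror that into a screen region.

def dominant_quadrant(eye, dark_threshold=80):
    """Return the eye-image quadrant holding the most dark pixels."""
    h, w = len(eye), len(eye[0])
    counts = {"UL": 0, "UR": 0, "LL": 0, "LR": 0}
    for i, row in enumerate(eye):
        for j, pixel in enumerate(row):
            if pixel < dark_threshold:
                key = ("U" if i < h // 2 else "L") + ("L" if j < w // 2 else "R")
                counts[key] += 1
    return max(counts, key=counts.get)

# The camera image is mirrored relative to the screen: a pupil in the
# upper-LEFT of the image means the user looks at the upper-RIGHT region.
MIRROR = {"UL": "upper-right", "UR": "upper-left",
          "LL": "lower-right", "LR": "lower-left"}

def gaze_region(eye):
    return MIRROR[dominant_quadrant(eye)]

# Dark pixels (0) clustered in the image's upper-left quadrant:
eye = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
print(gaze_region(eye))  # -> "upper-right"
```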
  • Incidentally, gaze direction detection techniques such as those described above are commonly used for communication by patients with amyotrophic lateral sclerosis (ALS). Such gaze recognition for ALS patients requires high recognition accuracy.
  • (a) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the user moves his or her line of sight.
  • (b) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the user is gazing at one video.
  • (c) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the audio output is switched.
  • the display screen 60 of the TV receiver 50 is divided into four areas (A to D).
  • The line-of-sight recognition unit 3 does not determine that the user's line of sight has moved to another area unless an area other than area A (areas B to D) is watched for a predetermined time or more. That is, while the audio of the video in area A is being output, the audio output does not switch even if the user moves his or her line of sight in the directions of arrows X, Y, and Z in the figure and glances at the videos in areas B to D. Therefore, while listening to the audio of the video in area A, the user can check what kinds of video are displayed in areas B to D.
  • Suppose that, as shown in (b) of FIG. 8, area B is then watched for the predetermined time or more. When the line-of-sight recognition unit 3 detects that the user has watched area B for the predetermined time or more, it determines that the user is now viewing area B.
  • This determination result is output to the image synthesizing unit 18 and the audio output selecting unit 22, and the image synthesizing unit 18 synthesizes, for each of the videos in areas A, C, and D, the video signal of that video with its subtitle data.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the video in area B, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen. Therefore, as shown in (c) of FIG. 8, the audio of the video in area A, which was previously determined to be viewed by the user, is stopped, and that video is displayed with subtitles. Conversely, subtitles are no longer displayed on the video in area B, which has newly been determined to be viewed by the user, and its audio is output instead.
  • the predetermined time may be any time and is not particularly limited.
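The dwell-time rule can be sketched as a small state machine. The class name, the 2-second default, and the sampling interface are assumptions for illustration; the patent leaves the predetermined time unspecified.

```python
# Hypothetical sketch of the dwell-time switching rule: audio only moves
# to a region after the gaze has stayed there continuously for a
# threshold duration, so brief glances do not switch the audio.

class AudioSwitcher:
    def __init__(self, initial_region, dwell_seconds=2.0):
        self.active = initial_region   # region whose audio currently plays
        self.dwell = dwell_seconds
        self._candidate = None
        self._since = None

    def update(self, gazed_region, now):
        """Feed one gaze sample; return the region whose audio plays."""
        if gazed_region == self.active:
            self._candidate = None          # back on the active region
        elif gazed_region != self._candidate:
            self._candidate = gazed_region  # new candidate: start timing
            self._since = now
        elif now - self._since >= self.dwell:
            self.active = gazed_region      # dwelled long enough: switch
            self._candidate = None
        return self.active

sw = AudioSwitcher("A", dwell_seconds=2.0)
sw.update("B", 0.0)         # glance at B ...
print(sw.update("A", 1.0))  # ... back to A before 2 s: still "A"
sw.update("B", 5.0)
print(sw.update("B", 7.5))  # gazed at B for 2.5 s: now "B"
```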
  • FIG. 10 is a block diagram showing a main configuration of the video / audio output device 1a.
  • The video/audio output device 1a includes a voice recognition unit 4, an image synthesis unit 18, an audio output selection unit 22, a selector 31, and a decoder 35.
  • The video/audio output device 1a includes as many subunits 1a′, each having the voice recognition unit 4, the selector 31, and the decoder 35 as one structural unit, as there are videos to be displayed on the display device 7.
  • the video / audio output device 1a also includes a camera 2 and a line-of-sight recognition unit 3, but these members are not shown in the figure. Further, since members other than the selector 31 are the same members as those in the first embodiment, their functions are not mentioned here.
  • the broadcast signal may include subtitle data (second subtitle data) for subtitle display of the broadcast program created by the broadcast station.
  • the video signal, the audio signal, and the caption data can be obtained by separating the broadcast signal by the decoder 35. Therefore, in the video / audio output device 1a according to the present embodiment, when caption data is included in the received broadcast signal, the decoder 35 separates and generates the video signal, the audio signal, and the caption data, and the caption Data is output to the selector 31.
  • the video signal is output to the image synthesis unit 18, and the audio signal is output to the audio recognition unit 4 and the audio output selection unit 22.
  • Subtitle data generated by the voice recognition unit 4 is also output to the selector 31.
  • The selector 31 is set to preferentially use the caption data obtained from the broadcast signal for image composition. Therefore, when both the caption data obtained from the broadcast signal and the caption data from the voice recognition unit 4 are input, the selector 31 outputs the former to the image composition unit 18. On the other hand, when no caption data is included in the broadcast signal and only the caption data from the voice recognition unit 4 is input, the caption data from the voice recognition unit 4 is output to the image composition unit 18.
  • For each video other than the one determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data sent from the selector 31.
  • the synthesized video signal (synthesized video signal) is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the video determined to be viewed by the user, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • Based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and its audio is output.
  • the TV receiver 50 only the audio of the video determined to be viewed by the user is output, and the subtitles are displayed on the other videos.
  • As described above, the selector 31 is configured to preferentially output the caption data obtained from the broadcast signal to the image composition unit 18. That is, in the video/audio output device 1a, when a video other than the one determined to be viewed by the user is a broadcast program and caption data is included in the program's broadcast signal, the caption data created by the broadcast station (more precisely, by the creator of the broadcast program) is used preferentially. This allows the subtitles intended by the creator of the broadcast program to be displayed to the user.
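The selector's priority rule reduces to a simple fallback; a minimal sketch follows (the function name and `None`-means-absent convention are illustrative assumptions).

```python
# Sketch of the selector's priority rule: broadcast-supplied caption data
# (second caption data), when present, wins over captions generated by
# the voice recognition unit from the audio signal.

def select_caption(broadcast_caption, recognized_caption):
    """Prefer the broadcaster's caption data; fall back to recognized text."""
    if broadcast_caption is not None:
        return broadcast_caption
    return recognized_caption

print(select_caption("official caption", "ASR caption"))  # -> "official caption"
print(select_caption(None, "ASR caption"))                # -> "ASR caption"
```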
  • the video / audio output device includes a selector that switches subtitle data to be output to the image synthesis unit, and a subtitle recognition unit that controls subtitle display.
  • the detailed configuration will be described with reference to FIG.
  • FIG. 11 is a block diagram showing a main configuration of the video / audio output device 1b.
  • the video / audio output device 1b includes a voice recognition unit 4, an image synthesis unit 18, an audio output selection unit 22, a selector 31, a caption recognition unit 32, and a decoder 35.
  • The video/audio output device 1b includes as many subunits 1b′, each having the voice recognition unit 4, the selector 31, the subtitle recognition unit 32, and the decoder 35 as one structural unit, as there are videos to be displayed on the display device 7.
  • the video / audio output device 1b also includes a camera 2 and a line-of-sight recognition unit 3, but these members are not shown in the figure. Further, since members other than the caption recognition unit 32 are the same members as those in the first and second embodiments, their functions are not mentioned here.
  • captions may be embedded (included) in the video. That is, subtitles may be part of the video.
  • The caption recognition unit 32 determines whether captions are embedded in the video of each broadcast program to be displayed. When it determines that captions are embedded in a video other than the one the user is viewing, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of that video.
  • The program information of the broadcast program (information such as the program title, genre, program content, and performers), the video signal, and the audio signal (and, if present, the caption data) are sent to the decoder 35.
  • the program information and the video signal are output to the caption recognition unit 32.
  • the video signal is also output to the image synthesis unit 18, and the audio signal is output to the audio recognition unit 4 and the audio output selection unit 22. Note that when the broadcast signal includes caption data, the caption data is output to the selector 31.
  • the subtitle data generated by the voice recognition unit 4 is also output to the selector 31.
  • The selector 31 is set to preferentially use the caption data obtained from the broadcast signal for image composition. Therefore, when both the caption data obtained from the broadcast signal and the caption data from the voice recognition unit 4 are input, the selector 31 outputs the former to the image composition unit 18. On the other hand, when no caption data is included in the broadcast signal and only the caption data from the voice recognition unit 4 is input, the caption data from the voice recognition unit 4 is output to the image composition unit 18.
  • The subtitle recognition unit 32 determines, based on the input program information, whether the genre of the broadcast program is a movie or a drama. At the same time, based on the input video signal, it determines whether a character string is included in the video of the broadcast program. Specifically, as shown in FIG. 12, character strings displayed at the left edge (region P in the figure), the right edge (region Q in the figure), and the bottom edge (region R in the figure) of the video display screen 60 are detected by pattern recognition. If the subtitle recognition unit 32 determines both that the genre of the broadcast program is a movie or a drama and that the video of the broadcast program includes a character string, it concludes that subtitles are embedded in the video of the broadcast program. It therefore instructs the image composition unit 18 not to combine the caption data from the selector 31 with the video signal of that program.
  • Based on the instruction from the subtitle recognition unit 32, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of the broadcast program designated by the subtitle recognition unit 32.
  • The video signal of that program is instead sent as it is, via the display control unit 19, to the display device 7 and displayed on the screen.
  • Similarly, the video signal of the video determined to be viewed by the user is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the other videos, the caption data from the selector 31 is combined with the video signal as usual.
  • The synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. When the broadcast program designated by the caption recognition unit 32 is itself the video determined by the line-of-sight recognition unit 3 to be viewed by the user, it is processed as usual.
  • Based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and its audio is output.
  • the TV receiver 50 only the audio of the video determined to be viewed by the user is output, and the subtitles are displayed on the other videos.
  • In other words, when the video displayed to the user is a broadcast program, its genre is a movie or a drama, and the video of the broadcast program includes a character string, it is determined that subtitles are embedded in the video of the broadcast program.
  • In that case, the video/audio output device 1b does not combine the caption data from the selector 31 with the video signal of the program. Therefore, in the video/audio output device 1b, when a movie or drama broadcast on TV is displayed as a video other than the one determined to be viewed by the user and its video includes a character string, the device can determine that captions are embedded in the video, so the caption data from the selector 31 is not displayed on the display screen. This allows the original subtitles attached to the movie or drama to be displayed.
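The caption recognition unit's rule described above can be sketched as follows. `has_text_in` stands in for a real pattern-recognition routine over the edge regions of the frame, and the genre set and region names (P, Q, R in FIG. 12) are illustrative assumptions.

```python
# Hypothetical sketch of the embedded-caption rule: if the genre is
# movie/drama AND pattern recognition finds a character string at the
# left, right, or bottom edge of the frame, treat the video as having
# captions already embedded and suppress the selector's caption data.

EMBEDDED_GENRES = {"movie", "drama"}
EDGE_REGIONS = ("left", "right", "bottom")  # regions P, Q, R in FIG. 12

def captions_embedded(genre, has_text_in):
    """has_text_in: region name -> bool (pattern-recognition result)."""
    if genre not in EMBEDDED_GENRES:
        return False
    return any(has_text_in(region) for region in EDGE_REGIONS)

def should_compose_selector_captions(genre, has_text_in):
    # The image synthesis unit skips the selector's caption data when
    # captions are judged to be embedded in the video itself.
    return not captions_embedded(genre, has_text_in)

detected = {"left": False, "right": False, "bottom": True}
print(should_compose_selector_captions("movie", detected.get))  # -> False
print(should_compose_selector_captions("news", detected.get))   # -> True
```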
  • In the above, the broadcast program genre is limited to movies and dramas; however, the genre is not limited to these, and other genres may also be included.
  • Further, in the above, subtitles are determined to be embedded when the video displayed to the user is a broadcast program, the genre is a movie or a drama, and the video of the broadcast program includes a character string; however, the means for determining whether subtitles are embedded in a video is not necessarily limited to this.
  • the video / audio output device 1b described above has a configuration in which the caption recognition unit 32 is added to the video / audio output device 1a according to the second embodiment, the present invention is not necessarily limited thereto.
  • a configuration in which the caption recognition unit 32 is added to the video / audio output device 1 according to the first embodiment also falls within the scope of the present invention.
  • the above-described TV receiver 50 can also function as a normal TV receiver that displays only one image on the display screen of the display device 7. Therefore, for example, the configuration may be such that the user can switch between a mode for viewing a plurality of videos and a mode for viewing only one video via the remote controller 5.
  • Although the channel selection method used by the user has not been specifically described, a conventionally known method can be adopted for channel selection. For example, the user can select one or more channels to view via the remote controller 5, or a channel can be selected by another method.
  • The video/audio output device according to an aspect of the present invention preferably further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires the video signal and the audio signal of a broadcast program from the received broadcast wave and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data. When a program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of that program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data instead of the first subtitle data.
  • With the above configuration, when second subtitle data for subtitle display is included in the broadcast wave, the video/audio output device preferentially uses the second subtitle data obtained from the broadcast wave. That is, the device preferentially uses the second subtitle data created by the broadcast station (more precisely, by the creator of the broadcast program). This allows the subtitles intended by the creator of the broadcast program to be displayed to the user.
  • The video/audio output device according to an aspect of the present invention preferably further includes a second determination unit that determines whether subtitles are included in the video displayed in each display area. For a display area whose video is determined to include subtitles, the synthesis unit does not generate a composite video signal combining the video signal of that video with the first subtitle data generated from its audio signal; instead, the output unit outputs the video signal of that video to the display unit as it is.
  • With the above configuration, the video/audio output device does not combine the first subtitle data with the video signal of a video that already contains subtitles, so the original subtitles attached to the video can be displayed.
  • The video/audio output device according to an aspect of the present invention preferably further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of the program being broadcast and program information including the genre of the program. When a program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are included in the video when the genre of the program is a movie or a drama and a character string is included in the video. With the above configuration, the video/audio output device does not combine the first subtitle data or the second subtitle data with the video signal of that video, so the original subtitles attached to the movie or drama can be displayed.
  • In the video/audio output device according to an aspect of the present invention, it is preferable that the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed to that display area for a predetermined time or more.
  • With the above configuration, the first determination unit does not determine that the user's line of sight has moved to another display area unless that area is gazed at for the predetermined time or more. That is, while the audio of the video in one display area is being output, the audio output does not switch even if the user moves his or her line of sight and glances at the video in another display area. Therefore, while listening to the audio of the video in one display area, the user can check what kinds of video and content are displayed in the other display areas.
  • The video/audio output device according to an aspect of the present invention preferably further includes a camera that records the user, and the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing. With this configuration, the display area the user is viewing can be identified from video of the user, in particular of the user's eyeball.
  • The video/audio output device according to an aspect of the present invention preferably further includes a second acquisition unit that acquires the video signal and the audio signal from an externally connected device. With this configuration, video from an externally connected device can also be displayed.
  • The video/audio output device can be applied to, for example, a television receiver, a personal computer, a personal digital assistant (PDA), a mobile phone, or a personal computer with a television function.
  • Reference signs: 1 Video/audio output device; 2 Camera; 3 Line-of-sight recognition unit; 4 Voice recognition unit; 18 Image synthesis unit; 22 Audio output selection unit; 31 Selector; 32 Subtitle recognition unit; 35 Decoder; 40 Eye; 50 Television receiver; 60 Display screen; 100 Content playback device


Abstract

For each video to be displayed, the present invention acquires the video signal and the audio signal of the video from a decoder (35). An audio recognition unit (4) generates caption data on the basis of the audio signal, and a line-of-sight recognition unit (3) determines which video the user is looking at on the basis of video recorded by a camera (2). An image combination unit (18) combines the caption data with the video signals of the videos other than the one being viewed. The combined video signals are sent to a display device, which is not shown. The video signal of the viewed video is sent as-is to the display device. An audio output selection unit (22) outputs only the audio signal of the viewed video to a speaker, which is not shown.

Description

VIDEO/AUDIO OUTPUT DEVICE, VIDEO/AUDIO OUTPUT METHOD, AND TELEVISION RECEIVER HAVING THE VIDEO/AUDIO OUTPUT DEVICE
The present invention relates to a video/audio output device capable of simultaneously viewing a plurality of videos, a video/audio output method, and a television receiver including the video/audio output device.
In recent years, there has been increasing demand among viewers to watch another video on the same screen while watching a broadcast program. Against this background, many television receivers having a function of displaying a plurality of broadcast programs or video images on the same screen have been proposed.
FIG. 13 shows a display example of a screen when a plurality of videos are displayed on the same screen. As shown in FIG. 13, by dividing the display screen into a plurality of areas, a different arbitrary video can be displayed in each area. This figure shows an example in which the display screen is divided into four areas. In this way, the viewer can view a plurality of videos at the same time.
However, most television receivers having such a function output only the audio of one video and mute the audio of the other videos. For example, in the case of FIG. 13, only the audio of the video displayed in the upper left area of the screen is output, and the audio of the videos displayed in the other three areas is muted. In this case, the receiver is set so that the viewer can arbitrarily switch the video whose audio is output. With such usage, although a plurality of videos can be viewed at the same time, the content of the videos other than the one outputting audio is difficult to follow because their sound is muted.
In general, a plurality of videos can be recognized simultaneously, but very few people can recognize a plurality of audio streams at the same time. Therefore, although it is technically easy to output the audio of a plurality of videos displayed on the same screen simultaneously, doing so is not useful for the viewer.
Therefore, Patent Document 1 discloses a scheme for viewing the contents of a plurality of videos displayed on the same screen in an understandable manner. Specifically, this document discloses a configuration in which, when performing two-screen display, audio is output from a built-in speaker for one screen, and subtitles are displayed instead of audio for the other screen. According to this configuration, when two-screen display is performed on a single video display device, the contents of a plurality of programs can be viewed in an understandable manner at the same time without listening to the audio of one program through headphones or an external device.
Japanese Unexamined Patent Application Publication No. 2007-13725 (published January 18, 2007)
However, with the configuration disclosed in Patent Document 1 described above, it cannot be said that a plurality of contents can be sufficiently viewed in an understandable manner at the same time. This is because the configuration disclosed in this document is premised on the broadcast station having created the subtitle data to be displayed on the videos other than the one outputting audio. In Japan, most broadcast stations do not create subtitle data for their broadcast programs, so even with the technique disclosed in Patent Document 1, there are many cases where no subtitles are displayed on the videos that are not outputting audio. If no subtitles are displayed, it is difficult to say that a plurality of contents can be viewed in an understandable manner at the same time. Therefore, at present, the technique disclosed in Patent Document 1 is difficult to apply.
In addition, in the configuration disclosed in Patent Document 1, the user must operate a button or the like to select the video whose audio is output from among the plurality of videos. Such an operation is troublesome for a user who is watching a broadcast program and hinders viewing.
The present invention has been made in view of the above problems, and its object is to provide a video/audio output device that can create subtitles for the videos, among a plurality of videos displayed on the same screen, whose audio is not output, and that can easily switch the video whose audio is output, as well as a video/audio output method and a television receiver including the video/audio output device.
In order to solve the above problems, a video/audio output device according to an aspect of the present invention is a video/audio output device including an output unit that outputs the video signal of a video to a display unit that displays the video and outputs the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas that display mutually different videos. The device includes: a subtitle generation unit that generates first subtitle data for subtitle display from the audio signal of the video displayed in each display area; a first determination unit that determines which of the plurality of display areas the user is viewing; and a synthesis unit that generates, for each display area other than the display area determined to be viewed by the user, a composite video signal obtained by combining the video signal of the video displayed in that display area with the first subtitle data generated from the audio signal of that video. The output unit outputs the audio signal of the video displayed in the display area determined to be viewed by the user to the audio output unit, and outputs the video signal of the video displayed in that display area and the composite video signals of the other display areas to the display unit.
 According to the above configuration, a composite video signal with captions is generated for each display region the user is not viewing. The composite video signal is sent to the display unit and shown on the screen. For the display region determined to be viewed by the user, in contrast, the video signal of the video displayed in that region is sent to the display unit unchanged and shown on the screen. Furthermore, only the audio signal of the video displayed in the display region determined to be viewed by the user is sent to the audio output unit and reproduced. As a result, only the audio of the video in the viewed display region is output, while captions are displayed on the videos in the other display regions.
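The behavior described in the preceding paragraph can be sketched as a simple routing rule. This is an illustration only (the claims specify behavior, not an implementation, and all names here are hypothetical):

```python
# Sketch of the routing rule: the viewed region's audio goes to the
# speaker, every other region's video is composited with captions.
# All names are hypothetical; signals are modeled as plain strings.

def route_signals(regions, viewed_index):
    """regions: list of dicts with 'video' and 'audio' signal identifiers.
    viewed_index: index of the region the user is judged to be watching.
    Returns (audio_out, video_out): the single audio signal to reproduce
    and, per region, either the plain video or a captioned composite."""
    audio_out = regions[viewed_index]["audio"]
    video_out = []
    for i, region in enumerate(regions):
        if i == viewed_index:
            video_out.append(region["video"])            # shown as-is
        else:
            # non-viewed regions: captions generated from their own audio
            video_out.append(region["video"] + "+captions")
    return audio_out, video_out
```

When `viewed_index` changes, re-running the same rule yields the switched routing, which is what makes the switch "easy" from the user's point of view.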
 As described above, although a plurality of videos are displayed on the display unit, only the audio of the video in the display region determined to be viewed by the user is output, and captions are displayed on the videos in the other display regions. This allows the user to follow a plurality of videos simultaneously. In particular, since the captions displayed on the other videos are generated by the video/audio output device itself from the audio signals of those videos, captions can be displayed for any video whatsoever.
 When the user shifts to viewing a different display region, only the audio of the video in the newly viewed display region is output, and captions are displayed on the videos in the other display regions. Thus, according to the video/audio output device of this aspect of the present invention, the video whose audio is output can be switched easily.
 To solve the above problem, a television receiver according to one aspect of the present invention includes any one of the video/audio output devices described above.
 According to the above configuration, it is possible to provide a television receiver that can generate captions for the videos whose audio is not output among a plurality of videos displayed on the same screen, and that can easily switch which video's audio is output.
 To solve the above problem, a video/audio output method according to one aspect of the present invention outputs a video signal of a video to a display unit that displays the video and outputs an audio signal of the video to an audio output unit that outputs the audio of the video, the display unit including a plurality of display regions that display mutually different videos, the method including: a generation step of generating caption data for caption display from the audio signal of the video displayed in each display region; a determination step of determining which of the plurality of display regions the user is viewing; a synthesis step of generating, for each display region other than the display region determined in the determination step to be viewed by the user, a composite video signal by combining the video signal of the video displayed in that display region with the caption data generated from the audio signal of that video; and an output step of outputting, to the audio output unit, the audio signal of the video displayed in the display region determined in the determination step, and outputting, to the display unit, the video signal of the video displayed in the display region determined in the determination step together with the composite video signals of the other display regions.
 According to the above method, it is possible to generate captions for the videos whose audio is not output among a plurality of videos displayed on the same screen, and to easily switch which video's audio is output.
 Other objects, features, and advantages of the present invention will be fully understood from the description below. The advantages of the present invention will also become apparent from the following description with reference to the accompanying drawings.
 According to the video/audio output device of one aspect of the present invention, although a plurality of videos are displayed on the display unit, only the audio of the video in the display region determined to be viewed by the user is output, and captions are displayed on the videos in the other display regions. This allows the user to follow a plurality of videos simultaneously. In particular, since the captions displayed on the videos in the other display regions are generated by the video/audio output device itself from the audio signals of those videos, captions can be displayed for any video whatsoever. Moreover, since only the audio of the video in the display region determined to be viewed by the user is output, the video whose audio is output can be switched easily.
FIG. 1 is a block diagram showing the main configuration of a video/audio output device according to one embodiment of the present invention.
FIG. 2 is a diagram showing a display example of the screen when a plurality of videos are displayed on the same screen in one embodiment of the present invention.
FIG. 3 is a block diagram showing the detailed main configuration of a TV receiver according to one embodiment of the present invention.
FIG. 4 is a diagram showing an example of dividing a display screen into a plurality of regions.
FIG. 5 is a diagram showing another example of dividing a display screen into a plurality of regions.
FIG. 6 is an enlarged view of a user's eye.
FIG. 7 is a diagram showing the display state of the display screen of the TV receiver in the case of FIG. 6.
FIG. 8(a) is a diagram showing the display state of the display screen of the TV receiver while the user is moving his or her line of sight, FIG. 8(b) is a diagram showing the display state while the user is gazing at one video, and FIG. 8(c) is a diagram showing the display state when the audio output has been switched.
FIG. 9 is a diagram showing an outline of a content playback device including a video/audio output device according to one embodiment of the present invention.
FIG. 10 is a block diagram showing the main configuration of a video/audio output device according to another embodiment of the present invention.
FIG. 11 is a block diagram showing the main configuration of a video/audio output device according to another embodiment of the present invention.
FIG. 12 is a diagram showing a video displayed on a display screen.
FIG. 13 is a diagram showing a display example of the screen when a plurality of videos are displayed on the same screen.
 [First Embodiment]
 A first embodiment according to the present invention will be described below. In the following description, various limitations preferable for carrying out the present invention are imposed, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 As shown in FIG. 2, the video/audio output device according to the present embodiment divides the display screen 60 into a plurality of display regions, making it possible to display a different arbitrary video in each display region. The figure shows an example in which the display screen 60 is divided into four regions. The video/audio output device according to the present embodiment also has a function of outputting only the audio of one of the plurality of displayed videos while displaying captions for the other videos. Such a video/audio output device is applicable to content playback devices such as television receivers, personal computers, and personal computers with a television function. In the following, the case where the video/audio output device according to the present embodiment is applied to a television receiver (hereinafter referred to as a TV receiver) will be described as an example. First, the overall configuration of a TV receiver including the video/audio output device according to the present embodiment will be described, and then the video/audio output device will be described in detail.
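For illustration, assuming equal-sized regions (the actual division may differ; FIGS. 4 and 5 show other division examples), splitting a display screen into a grid of display regions can be sketched as follows. All names are hypothetical:

```python
def split_screen(width, height, cols=2, rows=2):
    """Divide a display of width x height pixels into a grid of equal
    display regions, returned as (x, y, w, h) tuples in row-major order.
    A 2x2 grid corresponds to the four-region example of FIG. 2."""
    w, h = width // cols, height // rows
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]
```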
 (Detailed internal configuration of the TV receiver 50)
 First, the detailed main configuration of the TV receiver 50 will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the detailed main configuration of the TV receiver 50 according to the present embodiment.
 As shown in FIG. 3, the TV receiver 50 has a CPU (Central Processing Unit) 30 and a nonvolatile memory 28 connected to a bus, and the operation of the TV receiver 50 is controlled by the CPU 30 and various control programs stored in the nonvolatile memory 28. That is, the TV receiver 50 is controlled by a computer system including the CPU 30, and a program for operating the TV receiver 50 by means of the computer system is stored in the nonvolatile memory 28.
 The nonvolatile memory 28 is usually constituted by a RAM (Random Access Memory), but may partially include a ROM (Read Only Memory). It may also include a rewritable flash memory or the like. The nonvolatile memory 28 stores an OS (Operating System) and various control software for operating the CPU 30, data related to program information such as electronic program guide (EPG) data received via broadcast waves, OSD image data necessary for performing OSD (On Screen Display) display, and the like. The nonvolatile memory 28 also has a work area that serves as a work memory necessary for various control operations.
 The TV receiver 50 is provided with an analog tuner unit 11 (reception unit) in addition to a digital tuner unit 13 (reception unit), and can therefore also receive analog broadcasts. Various external devices 36 can be connected to the external input unit 6 (second acquisition unit), such as a hard disk drive (HDD), solid-state memory such as an SD card, and disc devices for Blu-ray Disc (BD), DVD (Digital Versatile Disc), Compact Disc (CD), and the like. Furthermore, the TV receiver 50 includes an IP (Internet Protocol) broadcast tuner unit 29 and can also receive IP broadcasts.
 In addition to the above members, the TV receiver 50 includes a camera 2, a line-of-sight recognition unit 3 (first determination unit), a display device 7 (display unit), a speaker 8 (audio output unit), an AV switch unit 12, a digital demodulation unit 14, a separation unit (DEMUX; DeMultiplexer) 15, a video decode/capture unit 16, a video selector unit 17, an image synthesis unit 18 (synthesis unit), a display control unit 19 (output unit), an audio decode unit 20, an audio selector unit 21, an audio output selection unit 22, an audio output control unit 23 (output unit), an EPG/OSD/reservation processing unit 24, a remote control light receiving unit 25, a channel selection unit 26, a communication control unit 27, and an addition circuit 37.
 The analog tuner unit 11 selects an analog television broadcast signal received via the analog broadcast receiving antenna 9, tuning to the channel to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the analog tuner unit 11 is separated into an audio signal and a video signal by the AV switch unit 12 (first acquisition unit); the video signal is input to the video selector unit 17 and the audio signal is input to the audio selector unit 21.
 The digital tuner unit 13 selects a digital television broadcast signal received via the digital broadcast receiving antenna 10, tuning to the channel to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the digital tuner unit 13 is demodulated by the digital demodulation unit 14 and sent to the separation unit (DEMUX) 15 (first acquisition unit).
 The IP broadcast tuner unit 29 selects an IP broadcast signal received via the communication control unit 27 connected to a telephone line, a LAN (Local Area Network), or the like, tuning to the specific IP broadcast to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the IP broadcast tuner unit 29 is output to the separation unit (DEMUX) 15.
 The separation unit (DEMUX) 15 separates the multiplexed video and audio signals input from the digital demodulation unit 14 or the IP broadcast tuner unit 29. It sends the separated video signal to the video decode/capture unit 16 and the separated audio signal to the audio decode unit 20. The separation unit (DEMUX) 15 further extracts data included in the broadcast signal, such as EPG data, and sends it to the EPG/OSD/reservation processing unit 24. The data extracted by the separation unit (DEMUX) 15 is recorded in the nonvolatile memory 28 under write control by the CPU 30 as necessary.
 The video decode/capture unit 16 decodes the video signal separated by the separation unit (DEMUX) 15 and captures video information included in the video signal as still images. The video signal decoded by the video decode/capture unit 16 is sent to the video selector unit 17.
 As already described, the video signal from the analog tuner unit 11 is input to the video selector unit 17, as is the video signal from the external input unit 6. In accordance with a control signal from the CPU 30, the video selector unit 17 selects one video signal from these input video signals and outputs it to the image synthesis unit 18.
 The image synthesis unit 18 combines the input video signal with the caption data (first caption data) for caption display generated by the voice recognition unit 4 (caption generation unit) described later. As will be described in detail later, of the plurality of videos displayed on the display device 7, the image synthesis unit 18 combines caption data with the video signals of the videos whose audio is not output. The image synthesis unit 18 also applies video processing to the combined video signal (composite video signal), such as noise reduction, sharpness adjustment, or contrast adjustment, converting it into a video signal optimal for the display device 7. When there is no caption data to combine (for example, for the video whose audio is output, or when only one video is displayed on the display device 7), the video processing is applied directly to the video signal from the video selector unit 17.
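As a toy illustration of the compositing step only: real compositing operates on pixel buffers, but modeling a frame as a list of text rows shows the pass-through versus overlay behavior described above (all names hypothetical):

```python
def composite_captions(frame_rows, caption):
    """Overlay caption text on the bottom of a frame.

    frame_rows: the frame modeled as a list of equal-width text rows.
    caption: caption text generated from the audio signal, or "" if none.
    Returns a new frame; when there is no caption data the frame passes
    through unchanged, mirroring the behavior of the image synthesis unit.
    """
    if not caption:
        return list(frame_rows)                 # no caption data: pass-through
    out = list(frame_rows)
    out[-1] = caption.center(len(out[-1]))      # caption replaces the bottom row
    return out
```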
 The display control unit 19 controls output of the input video signal to the display device 7 for display. The display control unit 19 can output the video signal together with EPG data or OSD data created by the EPG/OSD/reservation processing unit 24 described later. The display device 7 displays the received video signal on its screen.
 The audio decode unit 20 decodes the audio signal separated by the separation unit (DEMUX) 15 and sends the decoded audio signal to the audio selector unit 21.
 The audio selector unit 21 receives the audio signal from the AV switch unit 12, the audio signal from the external input unit 6, and the audio signal from the audio decode unit 20, and, under control of the CPU 30, selects the audio signal corresponding to the video signal selected by the video selector unit 17. The selected audio signal is output to the voice recognition unit 4 and the audio output selection unit 22. As will be described in detail later, the voice recognition unit 4 generates caption data from the selected audio signal; the generated caption data is output to the image synthesis unit 18 and combined with the corresponding video signal. The audio output selection unit 22, on the other hand, selects the audio signal of the video whose audio is to be output from among the plurality of videos displayed on the display device 7. The selected audio signal is output to the audio output control unit 23, where it is converted into an audio signal optimal for reproduction by the speaker 8 and supplied to the speaker 8. When only one video is displayed on the display device 7, the audio output selection unit 22 need not select the audio to output.
 The EPG/OSD/reservation processing unit 24 creates an electronic program guide based on EPG data that is periodically updated and stored, and renders OSD data stored in advance in the nonvolatile memory 28. The OSD data is data, stored in advance in the nonvolatile memory 28, for rendering various information such as a settings menu screen, a volume gauge, the current time, or the selected channel.
 In accordance with instructions from the CPU 30, the EPG/OSD/reservation processing unit 24 also determines the layout, such as the display position on the screen of the display device 7, of the OSD data to be rendered. The EPG data or OSD data created by the EPG/OSD/reservation processing unit 24 is added by the addition circuit 37 to the video signal output by the image synthesis unit 18 and output to the display device 7. The EPG/OSD/reservation processing unit 24 also performs program reservation processing and the like using the electronic program guide.
 The communication control unit 27 performs control to establish communication via a network such as a telephone line, a LAN, the Internet, or a home network standard such as DLNA (Digital Living Network Alliance). The TV receiver may connect to other devices through a network, or to a video service through the Internet. The connection to such other devices may be either wired or wireless.
 The remote control light receiving unit 25 receives optical signals from the remote controller 5 (hereinafter referred to as the remote control 5) and accepts control signals from the remote control 5. Instructions from the viewer, such as turning the power of the TV receiver 50 on and off, turning the volume up and down, and selecting the viewing channel, are given via the remote control 5.
 The camera 2 records the user (in particular, the eyes 40); as will be described in detail later, the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2.
 The camera 2, the separation unit (DEMUX) 15, the video decode/capture unit 16, the audio decode unit 20, the EPG/OSD/reservation processing unit 24, the remote control light receiving unit 25, the channel selection unit 26, the communication control unit 27, the nonvolatile memory 28, the IP broadcast tuner unit 29, and the CPU 30 are connected via the bus.
 (Configuration of the video/audio output device 1)
 The detailed configuration of the video/audio output device 1 will be described below with reference to FIG. 1. FIG. 1 is a block diagram showing the main configuration of the video/audio output device 1.
 As shown in FIG. 1, the video/audio output device 1 includes the camera 2, the line-of-sight recognition unit 3, the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, and a decoder 35. The decoder 35 separates video and audio signals from the broadcast signal received by the TV receiver 50 or from the external device 36 and decodes them. That is, the decoder 35 encompasses the various members needed in the process of separating and generating, from the broadcast signal or the external device 36, a video signal suitable for video output and an audio signal suitable for audio output, such as the separation unit (DEMUX) 15, the video selector unit 17, and the audio selector unit 21 described above. In FIG. 1, to keep the figure easy to understand, these members are shown collectively as the decoder 35.
 As described above, the video/audio output device 1 divides the display screen into a plurality of display regions, making it possible to display a different arbitrary video in each display region. It is therefore necessary to separate and generate as many video and audio signals from the broadcast signal or the external device 36 as there are videos displayed on the display device 7. The video/audio output device 1 therefore includes as many subunits 1', each consisting of the voice recognition unit 4 and the decoder 35 as one structural unit, as there are videos displayed on the display device 7. That is, when four different videos are displayed on the display device 7, four subunits 1' are provided. In other words, by increasing or decreasing the number of subunits 1', the number of videos that can be viewed simultaneously can be increased or decreased.
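The one-subunit-per-video arrangement can be sketched as follows. The class names are hypothetical, and the decode and speech-recognition stages are stand-in stubs; the point is only that capacity scales with the number of subunits:

```python
class SubUnit:
    """One decoder + caption-generator pair handling a single video source
    (a hypothetical model of the patent's subunit 1')."""
    def __init__(self, source):
        self.source = source

    def decode(self):
        # Stand-ins for the real decode stage: yields the separated
        # video and audio signals for this subunit's source.
        return {"video": f"{self.source}:video", "audio": f"{self.source}:audio"}


class VideoAudioOutput:
    """Holds one subunit per display region; adding or removing subunits
    changes how many videos can be shown at once."""
    def __init__(self, sources):
        self.subunits = [SubUnit(s) for s in sources]

    def capacity(self):
        return len(self.subunits)
```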
 When a plurality of broadcast programs are displayed on the display device 7, it is necessary to receive as many broadcast signals as there are broadcast programs to be displayed. Accordingly, as many analog tuner units 11 and digital tuner units 13 as there are broadcast signals to be received must be provided; in other words, it is preferable to provide the same number of analog tuner units 11 and digital tuner units 13 as subunits 1'. The analog tuner unit 11 and the digital tuner unit 13 may be included in the decoder 35.
 The videos displayed on the display device 7 may be either digital broadcast programs or analog broadcast programs, and may be terrestrial broadcast programs, satellite broadcast programs, or CATV programs. They are not limited to broadcast programs either; video from the external device 36 may also be displayed via the external input unit 6, and there is no particular limitation.
 (Processing flow of the video/audio output device 1)
 Next, the processing of the video/audio output device 1 will be described, again with reference to FIG. 1. First, assume that a plurality of videos, whether broadcast programs selected by the user or videos from the external device 36, are displayed on the display screen of the TV receiver 50.
 As described above, in the video/audio output device 1, the broadcast signal of each broadcast program to be displayed, or the data included in the external device 36, is processed by the decoder 35 of the corresponding subunit 1' to obtain a video signal and an audio signal. As shown in FIG. 1, the obtained video signal and audio signal are output to the image synthesis unit 18 and the audio output selection unit 22, respectively. At this time, the audio signal from the decoder 35 is also output to the voice recognition unit 4. The voice recognition unit 4 generates caption data from the audio signal; more specifically, based on the audio signal, it generates caption data in which the conversation, narration, and the like of the people in the video are converted into text. The generated caption data is output to the image synthesis unit 18.
 Meanwhile, the camera 2 films the user, and the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2. More specifically, based on the video recorded by the camera 2, it determines the position of the user's line of sight and thereby which of the plurality of videos displayed on the screen of the TV receiver 50 the user is viewing. The determination result is output to the image synthesis unit 18 and the audio output selection unit 22.
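Assuming the screen is divided into rectangular display regions, once a gaze position on the screen has been estimated from the camera video, the judgment "which region is the user viewing" reduces to a point-in-rectangle lookup. A minimal sketch under that assumption (gaze estimation itself is outside this sketch, and all names are hypothetical):

```python
def region_under_gaze(gaze_x, gaze_y, regions):
    """Return the index of the display region containing the gaze point,
    or None if the gaze falls outside every region (e.g. the user is
    looking away from the screen).

    regions: list of (x, y, w, h) rectangles in screen coordinates."""
    for i, (x, y, w, h) in enumerate(regions):
        if x <= gaze_x < x + w and y <= gaze_y < y + h:
            return i
    return None
```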
 On the basis of the determination result from the line-of-sight recognition unit 3, for each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data that the voice recognition unit 4 generated from its audio signal. That is, for each video the user is not viewing, a video signal with captions is generated. The synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen.
 In addition, on the basis of the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and the audio is output. As a result, the TV receiver 50 outputs only the audio of the video determined to be viewed by the user; for the other videos, captions are displayed instead of audio being output.
 Thus, the video/audio output device 1 according to the present embodiment outputs only the audio of the video that the line-of-sight recognition unit 3 has determined the user is viewing. That is, when the user moves his or her line of sight and begins viewing another video, the line-of-sight recognition unit 3 determines the new position of the user's line of sight and identifies the video that the user has newly begun viewing. This information is output to the image synthesis unit 18 and the audio output selection unit 22. For each video other than the one newly determined to be viewed, the image synthesis unit 18 combines the video signal of that video with its caption data; the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video newly determined to be viewed, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, the audio output of the video that had previously been determined to be viewed stops, and that video is displayed with captions; conversely, captions are no longer displayed on the newly viewed video, and its audio is output instead.
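The per-region behavior described above can be sketched as follows. This is a minimal illustration only, not the patent's actual implementation; the function and field names (`compose_outputs`, `video`, `audio`, `captions`) are assumptions introduced for the sketch.

```python
# Sketch of the gaze-driven output rule: the viewed region's video
# passes through and its audio is selected; every other region's
# video is composited with its generated caption data instead.

def compose_outputs(regions, viewed_region):
    """Return (frames-per-region, selected-audio) per the gaze result."""
    frames, audio = {}, None
    for name, stream in regions.items():
        if name == viewed_region:
            frames[name] = stream["video"]      # video passed through as-is
            audio = stream["audio"]             # only this audio is output
        else:
            # non-viewed video: overlay its caption text on the frame
            frames[name] = stream["video"] + " [CC: " + stream["captions"] + "]"
    return frames, audio

regions = {
    "A": {"video": "news", "audio": "news-audio", "captions": "breaking..."},
    "B": {"video": "drama", "audio": "drama-audio", "captions": "a line..."},
}
frames, audio = compose_outputs(regions, viewed_region="A")
print(audio)        # news-audio
print(frames["B"])  # drama frame with its captions overlaid
```

Switching the gaze target simply means calling the same routine with a different `viewed_region`, which mirrors how the device re-routes audio and captions when the user's gaze moves.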
 As described above, a plurality of videos are displayed on the TV receiver 50, but only the audio of the video determined to be viewed by the user is output, and captions are displayed on the other videos. This allows the user to follow a plurality of videos simultaneously. In particular, since the video/audio output device 1 itself generates the captions displayed on the other videos from their audio signals, captions can be displayed for any broadcast program or any video from the external device 36.
 When the user changes the video being viewed, only the audio of the newly viewed video is output, and captions are displayed on the other videos. In this way, the video/audio output device 1 according to the present embodiment makes it easy to switch which video's audio is output.
 (Division of the display screen)
 It was stated above that, in order to display a plurality of videos on the display screen of the display device 7, the display screen is divided into a plurality of regions; the division method is not particularly limited. Examples of dividing the display screen into a plurality of regions are shown in FIGS. 4 and 5. FIG. 4 shows an example in which the display screen 60 is divided into four regions, and FIG. 5 shows an example in which it is divided into seven regions.
 As shown in FIG. 4, the display screen 60 may be divided equally into four regions A to D, or, as shown in FIG. 5, divided unequally into seven regions A to G. Thus, the display screen 60 can be divided equally or into regions of different sizes, and various modifications of the division method are possible. The TV receiver 50 may offer a number of division-method variations so that the user can select a desired one.
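A divided layout like that of FIG. 4 can be represented as a set of rectangles, with a hit test that maps a gaze position on the screen to the region containing it. The normalized coordinate system and the names below are assumptions for illustration, not part of the patent.

```python
# Sketch of mapping a gaze point to a screen region. The equal 2x2
# layout mirrors FIG. 4; coordinates are normalized to [0, 1].

LAYOUT_4 = {  # region name -> (left, top, right, bottom)
    "A": (0.0, 0.0, 0.5, 0.5), "B": (0.5, 0.0, 1.0, 0.5),
    "C": (0.0, 0.5, 0.5, 1.0), "D": (0.5, 0.5, 1.0, 1.0),
}

def region_at(layout, x, y):
    """Return the name of the region containing the point (x, y)."""
    for name, (l, t, r, b) in layout.items():
        if l <= x < r and t <= y < b:
            return name
    return None

print(region_at(LAYOUT_4, 0.75, 0.25))  # B (upper right)
```

An unequal seven-region layout like FIG. 5 would use the same hit test with different rectangles, which is why the division method can be varied freely.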
 (Processing of the voice recognition unit 4)
 As described above, the voice recognition unit 4 generates caption data on the basis of the audio signal of a video, and a known method can be used for this generation. For example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2004-80069 can be used. Specifically, the types of sound contained in the audio signal are first classified and noise is discriminated. This is necessary because actual audio, depending on the scene, mixes human speech with car noise, wind noise, and other sounds, and it is difficult to transcribe the speech unless it is distinguished from such noise. The speech in the audio signal is then transcribed and converted into text. As disclosed in that document, configurations that identify the speaking character, distinguish male from female voices, or select particular speakers when generating caption data can also be suitably used in this embodiment.
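The two-stage flow above (noise discrimination, then transcription) can be sketched as follows. `classify_segment` and `transcribe` are placeholders standing in for a real speech recognizer; they, and the segment representation, are assumptions made for this sketch.

```python
# Rough sketch of caption generation: discriminate speech from noise,
# then transcribe only the speech segments into caption text.

def generate_captions(audio_segments, classify_segment, transcribe):
    """Keep segments classified as human speech and join their
    transcriptions into a single caption string."""
    lines = []
    for seg in audio_segments:
        if classify_segment(seg) == "speech":   # noise discrimination step
            lines.append(transcribe(seg))       # text conversion step
    return " ".join(lines)

# toy stand-ins for demonstration only
segments = [("speech", "hello"), ("noise", None), ("speech", "world")]
caption = generate_captions(
    segments,
    classify_segment=lambda s: s[0],
    transcribe=lambda s: s[1],
)
print(caption)  # hello world
```

Speaker identification or male/female discrimination, as mentioned above, would amount to a finer-grained `classify_segment` whose label is attached to each caption line.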
 Using conventionally known techniques such as these, the audio of a video can be converted into text and displayed as captions on the display screen. Needless to say, the techniques applicable as the processing method of the voice recognition unit 4 according to this embodiment are not limited to the methods described above; other techniques can also be applied.
 (Processing of the line-of-sight recognition unit 3)
 As described above, the line-of-sight recognition unit 3 determines the position of the user's line of sight, and a known method can be used for this determination. For example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-100366 can be used. That document discloses three methods for detecting the gaze direction: a direction-specific image correlation method, a black-pixel-region detection method, and an edge-feature-point detection method. Each detection method is briefly described below.
 In the direction-specific image correlation method and the black-pixel-region detection method, sections of the display screen are blinked in sequence, and the positions of the user's eyeballs while the user follows them with his or her eyes are recorded and used as references for determining the gaze direction. In the direction-specific image correlation method, eye-periphery images are registered as these references; the current image of the user's eyes is matched against the registered eye-periphery images, and the gaze direction is determined from the eye-periphery image giving the highest correlation. In the black-pixel-region detection method, images of the iris region including the pupil are registered as the references; an enlarged image of the user's eye position is matched against the registered iris-region images, and the gaze direction is determined from the iris-region image giving the highest correlation.
 In the edge-feature-point detection method, on the other hand, attention is paid to the luminance changes among the iris region, the white of the eye, and the eyelid, and the gaze direction is detected using edge detection. To facilitate edge detection, image enhancement and smoothing with a median filter are performed, and edges are then detected with a Sobel filter.
 Using conventionally known techniques such as these, the direction of the user's line of sight can be detected and the video that the user is viewing can be identified. Needless to say, the techniques applicable as the processing method of the line-of-sight recognition unit 3 according to this embodiment are not limited to the detection methods described above; other techniques can also be applied.
 For example, besides the detection methods described above, there are other methods of determining the user's gaze direction. One such method is described with reference to FIGS. 6 and 7. FIG. 6 is an enlarged view of the user's eye 40, and FIG. 7 shows the display state of the display screen 60 of the TV receiver 50 in the case of FIG. 6. The following description assumes that the display screen 60 of the TV receiver 50 is divided into four regions.
 First, the region of the user's eye 40 is extracted from the video recorded by the camera 2. The white of the eye, within the extracted outline of the eye 40, is then divided into four equal quadrants, and it is determined which of the four quadrants the iris (the dark part of the eye) occupies most. In the case of FIG. 6, for example, the iris mostly occupies the upper-left quadrant; since the camera faces the user, the image is mirrored, so it can be determined that the user is viewing the upper-right direction of the display screen 60. As a result, as shown in FIG. 7, the TV receiver 50 outputs only the audio of the video displayed in the upper-right region, and captions are displayed on the videos in the other regions.
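The quadrant decision above can be sketched as a simple argmax over per-quadrant iris pixel counts, followed by a horizontal mirror. The pixel counts are assumed to come from a prior image-processing step not shown here, and all names are illustrative assumptions.

```python
# Sketch of the quadrant method: pick the quadrant the iris occupies
# most, then mirror left/right because the camera faces the user.

MIRROR = {
    "upper-left": "upper-right", "upper-right": "upper-left",
    "lower-left": "lower-right", "lower-right": "lower-left",
}

def viewed_screen_region(dark_pixel_counts):
    """dark_pixel_counts: dict of quadrant name -> iris pixel count."""
    dominant = max(dark_pixel_counts, key=dark_pixel_counts.get)
    return MIRROR[dominant]

counts = {"upper-left": 820, "upper-right": 120,
          "lower-left": 60, "lower-right": 40}
print(viewed_screen_region(counts))  # upper-right, as in FIGS. 6 and 7
```

With the screen divided into four regions, the returned quadrant maps directly to the region whose audio should be output.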
 Gaze-direction detection techniques such as those described above are generally used to support communication by patients with amyotrophic lateral sclerosis (ALS), and such use requires high recognition accuracy. In contrast, detecting which position on the display screen 60 the user is viewing, as in this embodiment, is easy as long as the display device 7 is large, and malfunctions are few.
 (Operation of the line-of-sight recognition unit 3)
 While viewing the display screen 60 on which a plurality of videos are displayed, the user also looks at videos other than the one whose audio is being output; that is, the user's line of sight is also directed at those other videos. If the audio output were switched every time the user's line of sight moved to another video, the user would be confused, and following the content of a plurality of videos simultaneously would become impossible.
 Therefore, in this embodiment, it is preferable to switch the audio output only when the user has gazed at a video whose audio is not being output for a predetermined time or longer. This configuration is described below with reference to FIG. 8. In FIG. 8, (a) shows the display state of the display screen 60 of the TV receiver 50 while the user is moving his or her line of sight, (b) shows the display state while the user is gazing at one video, and (c) shows the display state after the audio output has been switched. These figures assume that the display screen 60 of the TV receiver 50 is divided into four regions (A to D).
 Suppose, as shown in (a) of FIG. 8, that the audio of the video in region A of the display screen 60 is being output. As described above, unless the user gazes at a region other than region A (regions B to D) for the predetermined time or longer, the line-of-sight recognition unit 3 does not determine that the user's line of sight has moved to another region. That is, the audio output does not switch even if, while the audio of region A continues, the user moves his or her line of sight in the directions of arrows X, Y, and Z in the figure to look at the videos in regions B to D. The user can therefore check what videos are displayed in regions B to D, and what their content is, while still listening to the audio of region A.
 Then, for example, when the user wishes to have the audio of the video in region B output, the user gazes at region B for the predetermined time or longer, as shown in (b) of FIG. 8. When the line-of-sight recognition unit 3 detects that the user has gazed at region B for the predetermined time or longer, it determines that the user has newly begun viewing region B. This information is output to the image synthesis unit 18 and the audio output selection unit 22; the image synthesis unit 18 combines the video signals of regions A, C, and D with their respective caption data, and the synthesized video signals are sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video in region B, newly determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, as shown in (c) of FIG. 8, the audio output of region A, which had previously been determined to be viewed, stops, and the video in region A is displayed with captions; conversely, captions are no longer displayed on the video in region B, and its audio is output instead. The predetermined time may be of any length and is not particularly limited.
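The dwell-time rule above can be sketched as a small state machine: the audio region changes only after the gaze has stayed on another region for at least a threshold time. The patent leaves the predetermined time open, so the threshold value, the timestamps, and the class name below are all assumptions for illustration.

```python
# Sketch of dwell-time gaze switching: glances shorter than
# DWELL_SECONDS leave the audio output unchanged.

DWELL_SECONDS = 2.0  # stand-in for the "predetermined time"

class GazeSwitcher:
    def __init__(self, initial_region):
        self.audio_region = initial_region
        self._candidate = None
        self._since = None

    def update(self, gazed_region, t):
        """Feed one gaze sample (region name, time in seconds);
        return the region whose audio should currently be output."""
        if gazed_region == self.audio_region:
            self._candidate = None              # glancing back resets dwell
        elif gazed_region != self._candidate:
            self._candidate, self._since = gazed_region, t
        elif t - self._since >= DWELL_SECONDS:
            self.audio_region = gazed_region    # sustained gaze: switch
            self._candidate = None
        return self.audio_region

sw = GazeSwitcher("A")
sw.update("B", 0.0)          # first glance at B: dwell starts
print(sw.update("B", 1.0))   # A (still under the threshold)
print(sw.update("B", 2.5))   # B (dwell reached, audio switches)
```

Moving the gaze across X, Y, and Z as in (a) of FIG. 8 keeps resetting the dwell timer, so brief checks of the other regions never switch the audio.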
 (Applications of the video/audio output device 1)
 The above description gave an example in which the video/audio output device 1 is applied to the TV receiver 50, which is the content reproduction device 100 shown in FIG. 9, but the invention is not necessarily limited to this. For example, a personal computer, a portable information terminal (PDA: Personal Digital Assistant), a mobile phone, a personal computer with a television function, or the like can also serve as the content reproduction device 100.
 [Second Embodiment]
 A second embodiment according to the present invention is described below. As in the first embodiment, the following description includes various limitations that are preferable for carrying out the present invention, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 (Configuration of the video/audio output device 1a)
 The video/audio output device according to this embodiment is characterized by including a selector that switches the caption data output to the image synthesis unit. Its detailed configuration is described with reference to FIG. 10, which is a block diagram showing the main configuration of the video/audio output device 1a.
 As shown in FIG. 10, the video/audio output device 1a includes the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, a selector 31, and the decoder 35. The video/audio output device 1a includes one subunit 1a′, consisting of the voice recognition unit 4, the selector 31, and the decoder 35, for each video to be displayed on the display device 7. The video/audio output device 1a also includes the camera 2 and the line-of-sight recognition unit 3, but these members are omitted from the figure. Since the members other than the selector 31 are the same as in the first embodiment, their functions are not described again here.
 Depending on the broadcast program, the broadcast signal may contain caption data (second caption data) created by the broadcast station for displaying captions for that program. In this case, the video signal, the audio signal, and the caption data can be obtained by demultiplexing the broadcast signal with the decoder 35. In the video/audio output device 1a according to this embodiment, therefore, when the received broadcast signal contains caption data, the decoder 35 separates and generates the video signal, the audio signal, and the caption data, and outputs the caption data to the selector 31. The video signal is output to the image synthesis unit 18, and the audio signal is output to the voice recognition unit 4 and the audio output selection unit 22.
 The caption data generated by the voice recognition unit 4 is also output to the selector 31, but the selector 31 is set so that, when the broadcast signal contains caption data, the caption data obtained from the broadcast signal is used preferentially for image synthesis. Accordingly, when both the caption data obtained from the broadcast signal and the caption data obtained from the voice recognition unit 4 are input, the selector 31 outputs the former to the image synthesis unit 18. When the broadcast signal contains no caption data and only the caption data obtained from the voice recognition unit 4 is input, the selector 31 outputs that caption data to the image synthesis unit 18.
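The selector's priority rule amounts to a simple fallback, which can be sketched as follows. Using `None` to represent "no broadcast caption data supplied" is an assumption made for the sketch.

```python
# Sketch of the selector 31's rule: broadcast-supplied caption data,
# when present, takes precedence over captions generated by the
# voice recognition unit.

def select_captions(broadcast_captions, generated_captions):
    """Return the caption data to pass to the image synthesis unit."""
    if broadcast_captions is not None:
        return broadcast_captions    # broadcaster's captions take priority
    return generated_captions        # fall back to generated captions

print(select_captions("official caption", "asr caption"))  # official caption
print(select_captions(None, "asr caption"))                # asr caption
```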
 On the basis of the determination result from the line-of-sight recognition unit 3, for each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data for that video sent from the selector 31. The synthesized video signal (composite video signal) is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen.
 In addition, on the basis of the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and the audio is output. As a result, the TV receiver 50 outputs only the audio of the video determined to be viewed by the user, and captions are displayed on the other videos.
 Thus, when the broadcast signal of a broadcast program contains caption data, the selector 31 is configured to output the caption data obtained from the broadcast signal to the image synthesis unit 18 preferentially. That is, when a video other than the one determined to be viewed by the user is a broadcast program whose broadcast signal contains caption data, the video/audio output device 1a preferentially uses the caption data generated by the broadcast station (more precisely, by the creator of the broadcast program). This makes it possible to show the user the captions that the creator of the broadcast program intended.
 [Third Embodiment]
 A third embodiment according to the present invention is described below. As in the first embodiment, the following description includes various limitations that are preferable for carrying out the present invention, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 (Configuration of the video/audio output device 1b)
 The video/audio output device according to this embodiment is characterized by including a selector that switches the caption data output to the image synthesis unit, and a caption recognition unit that controls caption display. Its detailed configuration is described with reference to FIG. 11, which is a block diagram showing the main configuration of the video/audio output device 1b.
 As shown in FIG. 11, the video/audio output device 1b includes the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, the selector 31, a caption recognition unit 32, and the decoder 35. The video/audio output device 1b includes one subunit 1b′, consisting of the voice recognition unit 4, the selector 31, the caption recognition unit 32, and the decoder 35, for each video to be displayed on the display device 7. The video/audio output device 1b also includes the camera 2 and the line-of-sight recognition unit 3, but these members are omitted from the figure. Since the members other than the caption recognition unit 32 are the same as in the first and second embodiments, their functions are not described again here.
 Depending on the movie or drama broadcast on TV, captions may already be embedded in (included in) the video; that is, the captions may form part of the video itself. In this case, if the image synthesis unit 18 combines the video signal of such a video with the caption data for that video sent from the selector 31, captions are displayed twice, which makes them hard to read and the content difficult to follow. Therefore, in the video/audio output device 1b according to this embodiment, the caption recognition unit 32 (second determination unit) determines whether captions are embedded in the video of the broadcast program to be displayed. When it is determined that captions are embedded in a video other than the one the user is viewing, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of that video.
 More specifically, the decoder 35 separates and generates, from the received broadcast signal, the program information of the broadcast program (information such as the program title, genre, program content, and performers), the video signal, and the audio signal (and, in some cases, caption data). The program information and the video signal are output to the caption recognition unit 32. The video signal is also output to the image synthesis unit 18, and the audio signal is output to the voice recognition unit 4 and the audio output selection unit 22. When the broadcast signal contains caption data, that caption data is output to the selector 31.
 As described above, the caption data generated by the voice recognition unit 4 is also output to the selector 31, but the selector 31 is set so that, when the broadcast signal contains caption data, the caption data obtained from the broadcast signal is used preferentially for image synthesis. Accordingly, when both the caption data obtained from the broadcast signal and the caption data obtained from the voice recognition unit 4 are input, the selector 31 outputs the former to the image synthesis unit 18. When the broadcast signal contains no caption data and only the caption data obtained from the voice recognition unit 4 is input, the selector 31 outputs that caption data to the image synthesis unit 18.
 Meanwhile, the caption recognition unit 32 determines, on the basis of the input program information, whether the genre of the broadcast program is a movie or a drama. At the same time, it determines, on the basis of the input video signal, whether the video of the broadcast program contains a character string. Specifically, as shown in FIG. 12, it uses pattern recognition to detect whether a character string is present at the left edge (region P in the figure), the right edge (region Q), or the bottom edge (region R) of the video on the display screen 60. When the caption recognition unit 32 determines that the genre of the broadcast program is a movie or a drama and that the video of the program contains a character string, it concludes that captions are embedded in the video of that program, and instructs the image synthesis unit 18 not to combine the caption data from the selector 31 with the video signal of that program.
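The decision logic above combines two tests: the genre must be movie or drama, and text must be detected in at least one of the edge regions P, Q, or R. The sketch below assumes a placeholder `detect_text_in_region` standing in for the pattern-recognition step; all names are illustrative.

```python
# Sketch of the embedded-caption decision of the caption recognition
# unit: genre check AND edge-region text check (regions of FIG. 12).

EDGE_REGIONS = ("P", "Q", "R")  # left, right, and bottom edges

def has_embedded_captions(genre, detect_text_in_region):
    """Return True when caption overlaying should be suppressed."""
    if genre not in ("movie", "drama"):
        return False
    return any(detect_text_in_region(r) for r in EDGE_REGIONS)

# toy detector: pretend text was found only at the bottom edge (R)
found = {"P": False, "Q": False, "R": True}
print(has_embedded_captions("movie", found.get))  # True
print(has_embedded_captions("news", found.get))   # False
```

Requiring both conditions keeps incidental on-screen text in other genres (scoreboards, tickers) from being mistaken for embedded captions.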
 Based on this instruction, for the broadcast program designated by the subtitle recognition unit 32, the image composition unit 18 sends the video signal of the program to the display control unit 19 as-is, without combining the subtitle data from the selector 31 with it. The video signal sent to the display control unit 19 is forwarded to the display device 7 and displayed on the screen. Likewise, for the video determined by the line-of-sight recognition unit 3 to be the one the user is viewing, its video signal is sent as-is to the display device 7 via the display control unit 19 and displayed on the screen. For broadcast programs not designated by the subtitle recognition unit 32, the subtitle data from the selector 31 is combined with the video signal of the program as usual; the composite video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. When the broadcast program designated by the subtitle recognition unit 32 is itself the video determined by the line-of-sight recognition unit 3 to be the one the user is viewing, it is likewise processed as usual.
 In addition, based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be the one the user is viewing to the speaker 8 via the audio output control unit 23, and that audio is output. As a result, the TV receiver 50 outputs only the audio of the video the user is determined to be viewing, while subtitles are displayed on the other videos.
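The per-area output decision described above (audio for the viewed area, subtitle overlay for the rest) can be summarized as a small routing function. This is an illustrative sketch with hypothetical names, not the patent's implementation:

```python
def route_outputs(display_areas, viewed_area):
    """Decide, per display area, whether to output its audio or to show
    it with generated subtitles instead (and no audio)."""
    decisions = {}
    for area in display_areas:
        if area == viewed_area:
            decisions[area] = "audio"      # sound routed to the speaker
        else:
            decisions[area] = "subtitles"  # video shown muted, with subtitles
    return decisions
```

Exactly one area receives audio; every other area is displayed with its subtitle-composited video signal (subject to the embedded-subtitle exception described above).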
 In this way, when a video to be shown to the user is a broadcast program whose genre is a movie or a drama and whose video contains a character string, it is determined that subtitles are embedded in the video of that program. When a video other than the one the user is determined to be viewing is judged in this way, the video/audio output device 1b is configured not to combine the subtitle data from the selector 31 with the video signal of that program. Therefore, when the video/audio output device 1b displays a TV-broadcast movie or drama in a display area other than the one the user is determined to be viewing and the video of that movie or drama contains a character string, the device can determine that subtitles are embedded in the video, and the subtitle data from the selector 31 is not displayed on the screen. This allows the original subtitles attached to the movie or drama to be shown.
 Although the above description is limited to the case where the genre of the broadcast program is a movie or a drama, the invention is not limited to this: other genres may be added to, or substituted for, these genres. Also, while the above determines that subtitles are embedded in the video of a broadcast program when the program's genre is a movie or a drama and its video contains a character string, the means for determining whether subtitles are embedded in the video is not necessarily limited to this.
 Furthermore, although the video/audio output device 1b described above is configured by adding the subtitle recognition unit 32 to the video/audio output device 1a according to the second embodiment, the configuration is not necessarily limited to this. For example, a configuration in which the subtitle recognition unit 32 is added to the video/audio output device 1 according to the first embodiment also falls within the scope of the present invention.
 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means appropriately modified within the scope of the claims are also included in the technical scope of the present invention.
 Incidentally, it goes without saying that the TV receiver 50 described above can also function as an ordinary TV receiver that displays only a single video on the display screen of the display device 7. Therefore, for example, the device may be configured so that the user can switch, via the remote controller 5, between a mode for viewing multiple videos and a mode for viewing a single video. Although the method by which the user selects channels has not been specifically described, any conventionally known channel selection method can be adopted; for example, the user can select one or more channels to view via the remote controller 5, or can select channels by other means.
 [Summary of Embodiments]
 As described above, the video/audio output device according to one aspect of the present invention further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data. When the program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of the program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data of the program rather than the first subtitle data.
 With the above configuration, when the broadcast wave includes second subtitle data for subtitle display, the video/audio output device preferentially uses the second subtitle data obtained from the broadcast wave. The device thus gives priority to the second subtitle data produced by the broadcaster (more precisely, by the creator of the broadcast program), so the subtitles the program's creator intended can be displayed to the user.
 The video/audio output device according to one aspect of the present invention further includes a second determination unit that determines whether subtitles are already contained in the video displayed in each display area. The synthesis unit does not generate a composite video signal combining the video signal of a video displayed in a display area determined to contain subtitles with the first subtitle data generated from the audio signal of that video; instead, the output unit outputs the video signal of that video to the display unit as-is.
 With the above configuration, when subtitles are already embedded in a video displayed in a display area other than the one the user is determined to be viewing, the video/audio output device does not combine the first subtitle data with the video signal of that video. This allows the original subtitles attached to the video to be displayed.
 The video/audio output device according to one aspect of the present invention further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast as well as program information including genre information of the program. When the program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are contained in the video when the genre of the program is a movie or a drama and the video contains a character string.
 With the above configuration, when the genre of the program is a movie or a drama and the video of the program contains a character string, it is determined that subtitles are embedded in the video of the program. As a result, the video/audio output device does not combine the first subtitle data or the second subtitle data with the video signal of that video, so the original subtitles attached to the movie or drama can be displayed.
 In the video/audio output device according to one aspect of the present invention, the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed at that display area for at least a predetermined time.
 With the above configuration, the determination unit does not judge that the user's line of sight has moved to another display area unless the user gazes at that area for at least the predetermined time. That is, while the audio of the video in one display area is being output, the audio output does not switch even if the user moves his or her gaze to look at the videos in other display areas. The user can therefore check what is being shown in the other display areas, and what it is about, while continuing to listen to the audio of the video in one display area.
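The dwell-time behavior can be sketched as a small state machine: the "viewed" area changes only after the gaze has rested on a new area for at least a threshold time. This is an illustrative sketch under stated assumptions; the class name and the threshold value are hypothetical, and the patent does not specify a particular dwell time:

```python
class GazeJudge:
    """Switch the 'viewed' display area only after the gaze has rested on a
    new area for at least `dwell` seconds (hypothetical threshold)."""

    def __init__(self, initial_area, dwell=2.0):
        self.current = initial_area   # area whose audio is being output
        self.dwell = dwell            # required gaze duration in seconds
        self._candidate = None        # area the gaze has moved to
        self._since = None            # time the gaze first landed there

    def update(self, gazed_area, now):
        """Feed one gaze sample; return the area judged to be viewed."""
        if gazed_area == self.current:
            self._candidate = None    # gaze returned; cancel any switch
            return self.current
        if gazed_area != self._candidate:
            self._candidate = gazed_area
            self._since = now         # start timing the new area
        elif now - self._since >= self.dwell:
            self.current = gazed_area # dwell satisfied: switch audio
            self._candidate = None
        return self.current
```

Brief glances at other areas never reach the threshold, so the audio stays with the current area, matching the behavior described above.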
 The video/audio output device according to one aspect of the present invention further includes a camera that records the user, and the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing.
 With the above configuration, the display area the user is viewing can be identified based on video of the user, in particular of the user's eyes.
 The video/audio output device according to one aspect of the present invention further includes a second acquisition unit that acquires the video signal and the audio signal from an externally connected device.
 With the above configuration, video from an externally connected device can also be displayed.
 The specific embodiments and examples given in the detailed description serve only to clarify the technical content of the present invention. The invention should not be interpreted narrowly as being limited to those specific examples; various modifications may be made within the spirit of the invention and the scope of the claims set out below.
 The video/audio output device according to one aspect of the present invention is applicable to, for example, a television receiver, a personal computer, a portable information terminal (PDA: Personal Data Assistant), a mobile phone, or a personal computer with a television function.
1, 1a, 1b  Video/audio output device
2  Camera
3  Line-of-sight recognition unit
4  Speech recognition unit
18  Image composition unit
22  Audio output selection unit
31  Selector
32  Subtitle recognition unit
35  Decoder
40  Eye
50  Television receiver
60  Display screen
100  Content playback device

Claims (9)

  1.  A video/audio output device comprising an output unit that outputs a video signal of a video to a display unit that displays the video and outputs an audio signal of the video to an audio output unit that outputs the audio of the video, wherein
     the display unit includes a plurality of display areas each displaying a different video,
     the device further comprising:
     a subtitle generation unit that generates first subtitle data for subtitle display from the audio signal of the video displayed in each display area;
     a first determination unit that determines which of the plurality of display areas the user is viewing; and
     a synthesis unit that, for each display area other than the display area determined to be viewed by the user, generates a composite video signal by combining the video signal of the video displayed in that display area with the first subtitle data generated from the audio signal of that video,
     wherein the output unit outputs, to the audio output unit, the audio signal of the video displayed in the display area determined to be viewed by the user, and outputs, to the display unit, the video signal of the video displayed in that display area together with the composite video signals for the other display areas.
  2.  The video/audio output device according to claim 1, further comprising:
     a reception unit that receives a broadcast wave; and
     a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data,
     wherein, when the program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of the program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data of the program rather than the first subtitle data.
  3.  The video/audio output device according to claim 1, further comprising a second determination unit that determines whether subtitles are contained in the video displayed in each display area, wherein
     the synthesis unit does not generate a composite video signal combining the video signal of the video displayed in a display area determined to contain subtitles with the first subtitle data generated from the audio signal of that video, and
     the output unit outputs, to the display unit, the video signal of the video displayed in the display area determined to contain subtitles.
  4.  The video/audio output device according to claim 3, further comprising:
     a reception unit that receives a broadcast wave; and
     a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast as well as program information including genre information of the program,
     wherein, when the program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are contained in the video when the genre of the program is a movie or a drama and the video contains a character string.
  5.  The video/audio output device according to any one of claims 1 to 4, wherein the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed at that display area for at least a predetermined time.
  6.  The video/audio output device according to any one of claims 1 to 5, further comprising a camera that records the user,
     wherein the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing.
  7.  The video/audio output device according to any one of claims 1 to 6, further comprising a second acquisition unit that acquires the video signal and the audio signal from an externally connected device.
  8.  A television receiver comprising the video/audio output device according to any one of claims 1 to 7.
  9.  A video/audio output method for outputting a video signal of a video to a display unit that displays the video and outputting an audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas each displaying a different video, the method comprising:
     a generation step of generating subtitle data for subtitle display from the audio signal of the video displayed in each display area;
     a determination step of determining which of the plurality of display areas the user is viewing;
     a synthesis step of generating, for each display area other than the display area determined in the determination step to be viewed by the user, a composite video signal by combining the video signal of the video displayed in that display area with the subtitle data generated from the audio signal of that video; and
     an output step of outputting, to the audio output unit, the audio signal of the video displayed in the display area determined in the determination step, and outputting, to the display unit, the video signal of the video displayed in that display area together with the composite video signals for the other display areas.
PCT/JP2011/076814 2010-11-26 2011-11-21 Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device WO2012070534A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-263890 2010-11-26
JP2010263890 2010-11-26

Publications (1)

Publication Number Publication Date
WO2012070534A1

Family

ID=46145876

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/076814 WO2012070534A1 (en) 2010-11-26 2011-11-21 Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device

Country Status (1)

Country Link
WO (1) WO2012070534A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000278626A (en) * 1999-03-29 2000-10-06 Sanyo Electric Co Ltd Multiple screens sound output controller
JP2005203863A (en) * 2004-01-13 2005-07-28 Casio Comput Co Ltd Television broadcast receiving device
JP2005252365A (en) * 2004-03-01 2005-09-15 Sony Corp Image signal processing apparatus, image signal processing method, program and medium recording the same
JP2007013725A (en) * 2005-06-30 2007-01-18 Toshiba Corp Video display device and video display method
JP2010109852A (en) * 2008-10-31 2010-05-13 Hitachi Ltd Video indexing method, video recording and playback device, and video playback device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3886444A4 (en) * 2018-11-27 2022-07-13 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method and apparatus, and electronic device and computer-readable medium
US11418832B2 (en) 2018-11-27 2022-08-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method, electronic device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
JP4796209B1 (en) Display device, control device, television receiver, display device control method, program, and recording medium
WO2011125905A1 (en) Automatic operation-mode setting apparatus for television receiver, television receiver provided with automatic operation-mode setting apparatus, and automatic operation-mode setting method
US8750386B2 (en) Content reproduction device
JP2014072586A (en) Display device, display method, television receiver, program, and recording medium
JP5362834B2 (en) Display device, program, and computer-readable storage medium storing program
JP2007311942A (en) Content display apparatus
US20180241925A1 (en) Reception device, reception method, and program
JPWO2012001905A1 (en) Playback apparatus, voice selection method, and voice selection program
US20120194633A1 (en) Digital Broadcast Receiver
JP2007282077A (en) Broadcast receiver and broadcast receiving method
JP2011166315A (en) Display device, method of controlling the same, program, and recording medium
WO2011118837A1 (en) Display device and control method of same, television, program and storage medium
WO2012070534A1 (en) Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device
JP2010124429A (en) Video processing apparatus, video processing method and video processing program
KR101798961B1 (en) Broadcasting Signal Receiver and Driving Method thereof
JP5082562B2 (en) Digital broadcast receiving method and apparatus
WO2011013669A1 (en) Display device, program, and computer-readable storage medium having program recorded therein
JP2014150434A (en) Image output device
KR20100072681A (en) Apparatus and method for image displaying in image display device
KR20170106740A (en) Apparatus and method for playing subtitle based on gaze
KR20090002810A (en) Method for storing the broadcast on a data broadcast and a imaging apparatus having the same
KR101124735B1 (en) Apparatus and method for displaying subtitle and digital caption in digital display device
JP2008099091A (en) Image processing method and television receiver
JP2015126384A (en) Electronic apparatus, image display apparatus, and display method of the same
KR20100043581A (en) Apparatus and method for output controlling main screen and sub screen of television

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11843782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11843782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP