WO2012070534A1 - Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device - Google Patents


Info

Publication number
WO2012070534A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
unit
audio
display
signal
Prior art date
Application number
PCT/JP2011/076814
Other languages
French (fr)
Japanese (ja)
Inventor
忠夫 森下
Original Assignee
シャープ株式会社 (Sharp Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2012070534A1

Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/41 - Structure of client; Structure of client peripherals
                            • H04N 21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
                                • H04N 21/4223 - Cameras
                        • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
                                • H04N 21/4312 - involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
                                    • H04N 21/4314 - for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
                            • H04N 21/439 - Processing of audio elementary streams
                                • H04N 21/4394 - involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
                            • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
                                • H04N 21/44008 - involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
                                • H04N 21/4402 - involving reformatting operations of video signals for household redistribution, storage or real-time display
                                    • H04N 21/440236 - by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
                        • H04N 21/47 - End-user applications
                            • H04N 21/488 - Data services, e.g. news ticker
                                • H04N 21/4884 - for displaying subtitles

Definitions

  • The present invention relates to a video/audio output device that enables a viewer to view a plurality of videos simultaneously, a video/audio output method, and a television receiver including the video/audio output device.
  • FIG. 13 shows a display example of a screen when a plurality of videos are displayed on the same screen. As shown in FIG. 13, by dividing the display screen into a plurality of areas, it is possible to display different arbitrary videos in the respective areas. This figure shows an example in which the display screen is divided into four areas. In this way, the viewer can view a plurality of videos at the same time.
  • A viewer can recognize a plurality of videos simultaneously, but very few people can follow a plurality of audio streams at the same time. For this reason, although it is easy to output the audio of every video displayed on the same screen simultaneously, doing so is of little use to the viewer.
  • Patent Document 1 discloses a device that allows the contents of a plurality of videos displayed on the same screen to be followed and understood.
  • In the disclosed configuration, audio is output from a built-in speaker for one screen, while subtitles are displayed on the other screen instead of audio being output.
  • This makes it possible to follow the contents of a plurality of programs at the same time without listening to the audio of one program through headphones or an external device.
  • In Patent Document 1, however, a button or the like must be operated in order to select the video whose audio is output from among the plurality of videos. Such an operation is troublesome for a user who is watching a broadcast program and hinders viewing.
  • The present invention has been made in view of the above problems. Its object is to provide a video/audio output device, a video/audio output method, and a television receiver including the video/audio output device that generate captions for the videos that do not output audio among a plurality of videos displayed on the same screen, and that make it easy to switch the video whose audio is output.
  • In order to solve the above problem, the video/audio output device according to the present invention is a video/audio output device including an output unit that outputs the video signal of a video to a display unit that displays the video and outputs the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas for displaying different videos. The device includes: a caption generation unit that generates first caption data for caption display from the audio signal of the video displayed in each of the display areas; a first determination unit that determines which of the plurality of display areas the user is viewing; and a synthesis unit that, for each display area other than the display area determined to be viewed by the user, generates a composite video signal by combining the video signal of the video displayed in that display area with the first caption data generated from the audio signal of that video. The output unit outputs to the audio output unit the audio signal of the video displayed in the display area determined to be viewed by the user.
  • According to this configuration, a composite video signal with captions is generated for each display area that the user is not viewing. The composite video signal is sent to the display unit and displayed on the screen. On the other hand, for the display area determined to be viewed by the user, the video signal of the video displayed in that display area is sent to the display unit as it is and displayed on the screen. Furthermore, only the audio signal of the video displayed in the display area determined to be viewed by the user is sent to the audio output unit and output as audio. As a result, only the audio of the video in the display area determined to be viewed by the user is output, and captions are displayed on the videos in the other display areas.
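The routing rule just described can be sketched in code. The following is an illustrative sketch only, not part of the patent disclosure; the region list, `gazed_index`, and the returned plan are assumed names.

```python
def route_outputs(regions, gazed_index):
    """For each display region, decide whether it shows captions and
    whether its audio is routed to the speaker.

    regions:      list of region names (one per displayed video)
    gazed_index:  index of the region the viewer is looking at
    Returns a dict: region -> {"audio": bool, "captions": bool}.
    """
    plan = {}
    for i, region in enumerate(regions):
        viewed = (i == gazed_index)
        plan[region] = {
            "audio": viewed,          # only the viewed region is heard
            "captions": not viewed,   # every other region shows captions
        }
    return plan

plan = route_outputs(["A", "B", "C", "D"], gazed_index=2)
# Region "C" gets audio and no captions; A, B, and D get captions only.
```

Exactly one region ever has `"audio": True`, matching the configuration in which only the viewed display area's audio signal reaches the audio output unit.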
  • In order to solve the above problem, a television receiver according to the present invention includes any one of the video/audio output devices described above.
  • According to this configuration, a television receiver having the above effects can be provided.
  • In order to solve the above problem, the video/audio output method according to the present invention is a method of outputting the video signal of a video to a display unit that displays the video and outputting the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas for displaying different videos. The method includes: a generation step of generating caption data for caption display from the audio signal of the video displayed in each of the display areas; a determination step of determining which of the plurality of display areas the user is viewing; a synthesis step of, for each display area other than the display area determined to be viewed in the determination step, combining the video signal of the video displayed in that display area with the caption data generated from the audio signal of that video; and an output step of outputting to the audio output unit the audio signal of the video displayed in the display area determined in the determination step.
  • According to this method, the display unit displays a plurality of videos, but only the audio of the video in the display area determined to be viewed by the user is output, while captions are displayed on the videos in the other display areas.
  • As a result, the viewer can follow a plurality of videos simultaneously.
  • Furthermore, since the video/audio output apparatus generates the captions displayed on the videos in the other display areas from the audio signals of those videos, captions can be displayed for any video. In addition, since only the audio of the video in the display area determined to be viewed by the user is output, the video whose audio is output can be switched easily.
  • In the figure, (a) shows the display state of the display screen of the TV receiver when the user is moving his or her line of sight, (b) shows the display state when the user is gazing at one video, and (c) shows the display state of the display screen of the TV receiver when audio is output.
  • The video/audio output device according to the present embodiment divides the display screen 60 into a plurality of display areas, making it possible to display a different arbitrary video in each display area.
  • This figure shows an example in which the display screen 60 is divided into four areas.
  • In addition, the video/audio output device has a function of outputting the audio of only one of the plurality of displayed videos while displaying captions for the other videos.
  • The video/audio output device can be applied to a content reproduction apparatus such as a television receiver, a personal computer, or a personal computer with a television function.
  • In the following, a case in which the video/audio output device according to the present embodiment is applied to a television receiver (hereinafter referred to as a TV receiver) is described as an example.
  • First, the overall configuration of the TV receiver including the video/audio output device according to the present embodiment is described, and then the video/audio output device is described in detail.
  • FIG. 3 is a block diagram illustrating a detailed main configuration of the TV receiver 50 according to the present embodiment.
  • the TV receiver 50 has a CPU (Central Processing Unit) 30 and a nonvolatile memory 28 connected to a bus.
  • The operation of the TV receiver 50 is controlled by various control programs stored in the nonvolatile memory 28 and executed by the CPU 30. That is, the TV receiver 50 is controlled by a computer system including the CPU 30, and a program for operating the TV receiver 50 by the computer system is stored in the nonvolatile memory 28.
  • The nonvolatile memory 28 is usually constituted by RAM (Random Access Memory), but may partially include ROM (Read Only Memory), and may also include rewritable flash memory or the like.
  • The nonvolatile memory 28 stores an OS (Operating System) and various control software for operating the CPU 30, data related to program information such as electronic program guide (EPG) data received via broadcast waves, OSD image data necessary for OSD (On Screen Display) rendering, and the like.
  • The nonvolatile memory 28 also has a work area that serves as the work memory required for various control operations.
  • the TV receiver 50 is provided with an analog tuner unit 11 (reception unit) as well as a digital tuner unit 13 (reception unit), and can receive analog broadcasts.
  • Various external devices 36 can be connected to the external input unit 6, such as a hard disk drive (HDD), a solid-state memory device such as an SD card, or a disc device for Blu-ray Disc (BD), DVD (Digital Versatile Disc), or Compact Disc (CD).
  • the TV receiver 50 is provided with an IP (Internet Protocol) broadcast tuner unit 29, and can receive IP broadcasts.
  • The TV receiver 50 includes a camera 2, a line-of-sight recognition unit 3 (first determination unit), a display device 7 (display unit), a speaker 8 (audio output unit), an AV switch unit 12, a digital demodulation unit 14, a demultiplexer (DEMUX) 15, a video decode/capture unit 16, a video selector unit 17, an image synthesis unit 18 (synthesis unit), a display control unit 19 (output unit), an audio decode unit 20, an audio selector unit 21, an audio output selection unit 22, an audio output control unit 23 (output unit), an EPG/OSD reservation processing unit 24, a remote control light receiving unit 25, a channel selection unit 26, a communication control unit 27, and an adding circuit 37.
  • the analog tuner unit 11 selects an analog television broadcast signal received via the antenna 9 for receiving an analog broadcast, and selects a channel to be received according to a channel selection instruction from the channel selection unit 26.
  • The received signal from the analog tuner unit 11 is separated into an audio signal and a video signal by the AV switch unit 12 (first acquisition unit); the video signal is input to the video selector unit 17, and the audio signal is input to the audio selector unit 21.
  • the digital tuner unit 13 selects a digital television broadcast signal received via the digital broadcast receiving antenna 10 and selects a channel to be received according to a channel selection instruction from the channel selection unit 26.
  • The received signal from the digital tuner unit 13 is demodulated by the digital demodulation unit 14 and sent to the separation unit (DEMUX) 15 (first acquisition unit).
  • The IP broadcast tuner unit 29 selects an IP broadcast signal received via the communication control unit 27 connected to a telephone line, a LAN (Local Area Network), or the like, and selects a specific IP broadcast to be received according to a channel selection instruction from the channel selection unit 26. The reception signal from the IP broadcast tuner unit 29 is output to the demultiplexing unit (DEMUX) 15.
  • the separation unit (DEMUX) 15 separates the multiplexed video signal and audio signal input from the digital demodulation unit 14 or the IP broadcast tuner unit 29, respectively.
  • the separated video signal is sent to the video decoding / capturing unit 16 and the audio signal is sent to the audio decoding unit 20.
  • the separation unit (DEMUX) 15 extracts data such as EPG data included in the broadcast signal and sends the data to the EPG / OSD reservation processing unit 24.
  • the broadcast signal extracted by the separation unit (DEMUX) 15 is recorded in the nonvolatile memory 28 by writing control by the CPU 30 as necessary.
  • the video decode / capture unit 16 decodes the video signal separated by the separation unit (DEMUX) 15 or captures video information included in the video signal as a still image.
  • the video signal decoded by the video decode / capture unit 16 is sent to the video selector unit 17.
  • the video signal from the analog tuner unit 11 is input to the video selector unit 17, and the video signal from the external input unit 6 is also input.
  • the video selector unit 17 selects and outputs one video signal from these input video signals according to a control signal from the CPU 30, and sends it to the image composition unit 18.
  • The image synthesis unit 18 synthesizes the input video signal with the caption data for caption display (first caption data) generated by the voice recognition unit 4 (caption generation unit) described later. As will be detailed later, the image synthesis unit 18 combines the caption data with the video signals of those videos, among the plurality of videos displayed on the display device 7, that do not output audio. The image synthesis unit 18 further applies video processing such as noise reduction, sharpness adjustment, or contrast adjustment to the synthesized video signal (composite video signal), converting it into a video signal optimal for the display device 7. When there is no caption data to synthesize (for example, for the video that outputs audio, or when only one video is displayed on the display device 7), the video signal from the video selector unit 17 undergoes the video processing as it is.
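The compositing step can be illustrated with a minimal sketch that blends a darkened caption band into the bottom of a frame, on top of which caption text would then be drawn. This is not the patented implementation; the band height and opacity are assumed parameters, and font rendering and caption timing are omitted.

```python
import numpy as np

def composite_caption_band(frame, band_height=40, alpha=0.6):
    """Darken a horizontal band at the bottom of a video frame so that
    caption text drawn over it stays legible.

    frame: (H, W, 3) uint8 array; band_height and alpha are illustrative.
    """
    out = frame.astype(np.float32)
    out[-band_height:] *= (1.0 - alpha)   # blend the band toward black
    return out.astype(np.uint8)

frame = np.full((120, 160, 3), 200, dtype=np.uint8)   # a flat gray frame
composited = composite_caption_band(frame)
```

The video that outputs audio would skip this step entirely, its signal passing through the video processing unchanged, as stated above.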
  • the display controller 19 controls the input video signal to be output to the display device 7 for display.
  • the display control unit 19 can output the video signal by combining the video signal with EPG data or OSD data created by the EPG / OSD reservation processing unit 24 described later.
  • the display device 7 displays the transmitted video signal on the screen.
  • the audio decoding unit 20 decodes the audio signal separated by the separation unit (DEMUX) 15.
  • the audio decoding unit 20 sends the decoded audio signal to the audio selector unit 21.
  • The audio selector unit 21 receives the audio signal from the AV switch unit 12, the audio signal from the external input unit 6, and the audio signal from the audio decoding unit 20, and, under the control of the CPU 30, selects the audio signal corresponding to the video signal selected by the video selector unit 17.
  • the selected voice signal is output to the voice recognition unit 4 and the voice output selection unit 22.
  • the voice recognition unit 4 performs processing for generating caption data from the selected voice signal.
  • the generated caption data is output to the image synthesis unit 18 and synthesized with a predetermined video signal.
  • the audio output selection unit 22 selects an audio signal of a video that outputs audio from among a plurality of videos displayed on the display device 7.
  • the selected audio signal is output to the audio output control unit 23, converted to an audio signal optimal for reproduction on the speaker 8 by the audio output control unit 23, and supplied to the speaker 8. If there is only one video to be displayed on the display device 7, the audio output selection unit 22 does not need to select the audio to be output.
  • the EPG / OSD reservation processing unit 24 creates an electronic program guide based on EPG data periodically updated and stored, and draws OSD data stored in advance in the nonvolatile memory 28.
  • The OSD data is data, stored in advance in the nonvolatile memory 28, for drawing various information such as a setting menu screen, a volume gauge, the current time, or the selected channel.
  • the EPG / OSD reservation processing unit 24 determines the layout of the display position of the OSD data to be drawn on the display screen of the display device 7 in accordance with an instruction from the CPU 30.
  • The EPG data or OSD data created by the EPG/OSD reservation processing unit 24 is added to the video signal output from the image synthesis unit 18 by the adding circuit 37 and output to the display device 7.
  • the EPG / OSD reservation processing unit 24 also performs program reservation processing using the electronic program guide.
  • The communication control unit 27 performs control for establishing communication via a network such as a telephone network, a LAN, the Internet, or a home network standard such as DLNA (Digital Living Network Alliance). The TV receiver may connect with another apparatus through a network, or with a video service through the Internet. The connection with the other apparatus may be either wired or wireless.
  • The remote control light receiving unit 25 receives an optical signal from the remote control device 5 (hereinafter referred to as the remote controller 5) and thereby receives a control signal from the remote controller 5. Instructions from the viewer, such as turning the power of the TV receiver 50 on and off, raising and lowering the volume, and selecting a viewing channel, are given via the remote controller 5.
  • the camera 2 records the user (especially the eye 40), and the line-of-sight recognition unit 3 determines the position of the line of sight of the user based on the video recorded by the camera 2 as described in detail later.
  • The camera 2, the separation unit (DEMUX) 15, the video decode/capture unit 16, the audio decoding unit 20, the EPG/OSD reservation processing unit 24, the remote control light receiving unit 25, the channel selection unit 26, the communication control unit 27, the nonvolatile memory 28, the IP broadcast tuner unit 29, and the CPU 30 are connected via a bus.
  • FIG. 1 is a block diagram showing a main configuration of the video / audio output apparatus 1.
  • the video / audio output device 1 includes a camera 2, a line-of-sight recognition unit 3, a voice recognition unit 4, an image synthesis unit 18, a voice output selection unit 22, and a decoder 35.
  • The decoder 35 separates and decodes the video signal and the audio signal from the broadcast signal received by the TV receiver 50 or from the external device 36. That is, the decoder 35 includes the various members needed to generate a video signal suitable for video output and an audio signal suitable for audio output from the broadcast signal or the external device 36, such as the above-described separation unit (DEMUX) 15, video selector unit 17, and audio selector unit 21. In FIG. 1, for ease of understanding, these members are collectively drawn as the decoder 35.
  • The video/audio output device 1 is a device that can display a different arbitrary video in each display area by dividing the display screen into a plurality of display areas. It is therefore necessary to separate and generate as many video signals and audio signals from the broadcast signal or the external device 36 as there are videos to be displayed on the display device 7. Accordingly, the video/audio output device 1 includes, as one structural unit, a subunit 1′ having the voice recognition unit 4 and the decoder 35, and includes as many subunits 1′ as there are videos to be displayed on the display device 7. That is, when the display device 7 displays four different videos, the video/audio output device 1 includes four subunits 1′. In other words, by increasing or decreasing the number of subunits 1′, the number of videos that can be viewed simultaneously can be increased or decreased.
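The one-subunit-per-video organization can be sketched as follows. The class and method names are illustrative, not from the patent text, and the decoder is a placeholder rather than a real demultiplexer.

```python
from dataclasses import dataclass

@dataclass
class Subunit:
    """One decoding pipeline per displayed video: a decoder that yields
    a (video_signal, audio_signal) pair for its source."""
    source: str

    def decode(self, broadcast_signal):
        # Placeholder: a real decoder would demultiplex and decode the
        # broadcast signal here.
        return (f"video<{self.source}>", f"audio<{self.source}>")

@dataclass
class VideoAudioOutputDevice:
    subunits: list  # one Subunit per display area

    def decoded_streams(self, signal):
        return [u.decode(signal) for u in self.subunits]

# Displaying four different videos requires four subunits.
device = VideoAudioOutputDevice([Subunit(s) for s in ["ch1", "ch2", "ch3", "ch4"]])
streams = device.decoded_streams("broadcast")
```

Adding or removing a `Subunit` changes the number of simultaneously viewable videos, mirroring the increase or decrease of subunits 1′ described above.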
  • the analog tuner unit 11 and the digital tuner unit 13 may be included in the decoder 35.
  • The video displayed on the display device 7 may be either a digital broadcast program or an analog broadcast program, and may be any of a terrestrial broadcast program, a satellite broadcast program, or a CATV program.
  • Alternatively, video from the external device 36 may be displayed via the external input unit 6; there is no particular limitation.
  • First, the video/audio output device 1 processes the broadcast signal of the broadcast program to be displayed, or the data contained in the external device 36, with the decoder 35 of the subunit 1′ to obtain the video signal and the audio signal.
  • the obtained video signal and audio signal are output to the image synthesis unit 18 and the audio output selection unit 22, respectively, as shown in FIG.
  • the audio signal from the decoder 35 is also output to the audio recognition unit 4.
  • the voice recognition unit 4 performs processing for generating caption data from the voice signal. More specifically, subtitle data in which a person's conversation and narration in the video are converted into text is generated based on the audio signal.
  • the generated caption data is output to the image composition unit 18.
  • The camera 2 captures the user, and the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2. More specifically, it determines from the position of the user's line of sight which of the plurality of videos displayed on the screen of the TV receiver 50 the user is viewing. The determination result is output to the image synthesis unit 18 and the audio output selection unit 22.
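Once a gaze point on the screen has been estimated, determining which display area the user is viewing reduces to a point-in-rectangle test. The sketch below assumes the gaze coordinates are already available; estimating them from the camera image is a separate problem the patent leaves to known methods.

```python
def region_at(gaze_x, gaze_y, regions):
    """Return the index of the display region containing the gaze point,
    or None if the gaze falls outside the screen.

    Each region is an (x, y, w, h) rectangle in screen coordinates.
    """
    for i, (x, y, w, h) in enumerate(regions):
        if x <= gaze_x < x + w and y <= gaze_y < y + h:
            return i
    return None

# Four equal quadrants of a 1920x1080 screen:
quads = [(0, 0, 960, 540), (960, 0, 960, 540),
         (0, 540, 960, 540), (960, 540, 960, 540)]
idx = region_at(1200, 700, quads)   # falls in the bottom-right quadrant
```

The returned index plays the role of the determination result sent to the image synthesis unit 18 and the audio output selection unit 22.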
  • For each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data generated by the voice recognition unit 4 from its audio signal. That is, for each video that the user is not viewing, a video signal with captions is generated.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • On the other hand, for the video determined to be viewed by the user, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • In the audio output selection unit 22, based on the determination result from the line-of-sight recognition unit 3, only the audio signal of the video determined to be viewed by the user is sent to the speaker 8 via the audio output control unit 23, and its audio is output. As a result, in the TV receiver 50, only the audio of the video determined to be viewed by the user is output, while captions are displayed instead of audio for the other videos.
  • In the video/audio output device 1, only the audio of the video determined by the line-of-sight recognition unit 3 to be viewed by the user is output. That is, when the user moves his or her line of sight to view another video, the line-of-sight recognition unit 3 determines the position of the user's line of sight and identifies the video that the user has newly started viewing.
  • This information is output to the image synthesis unit 18 and the audio output selection unit 22, and for each video other than the video newly determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with its caption data.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • On the other hand, for the video newly determined to be viewed by the user, the video signal is sent directly to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, the audio output of the video that was previously determined to be viewed is stopped, and that video is now displayed with captions.
  • Captions are not displayed for the video newly determined to be viewed by the user; audio is output for it instead.
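The switching behaviour when the user's gaze moves can be sketched as a small state holder. An immediate switch is assumed here for illustration; the patent text leaves timing details open, and a real receiver might well ignore very short glances.

```python
class AudioSwitcher:
    """Track which display region currently has audio, and switch when
    the line-of-sight recognizer reports a new region."""

    def __init__(self, current=0):
        self.current = current

    def update(self, gazed_region):
        """Return (muted_region, unmuted_region) when a switch occurs,
        or None when the gaze stayed put or left the screen."""
        if gazed_region is None or gazed_region == self.current:
            return None
        previous, self.current = self.current, gazed_region
        # The previous region loses audio and goes back to captions;
        # the new region loses captions and gains audio.
        return (previous, gazed_region)

switcher = AudioSwitcher(current=0)
change = switcher.update(2)   # user looks at region 2
```

The tuple returned on a switch corresponds to the two updates described above: captions resume on the previously viewed video, and audio replaces captions on the newly viewed one.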
  • As described above, a plurality of videos are displayed on the TV receiver 50, but only the audio of the video determined to be viewed by the user is output, and captions are displayed on the other videos.
  • As a result, the viewer can follow a plurality of videos simultaneously.
  • the video / audio output device 1 since the video / audio output device 1 generates subtitles to be displayed on other video based on the audio signal of the video, it is possible to display subtitles for any broadcast program or external device 36. is there.
  • Moreover, with the video/audio output device 1, the video whose audio is output can be switched easily.
  • FIG. 4 shows an example in which the display screen 60 is divided into four areas, and
  • FIG. 5 shows an example in which the display screen 60 is divided into seven areas.
  • As shown in FIG. 4, the display screen 60 may be divided equally into four areas A to D, or, as shown in FIG. 5, divided unequally into seven areas A to G. The display screen 60 can thus be divided equally or into areas of different sizes, and various other division methods are possible. The TV receiver 50 may be provided with several division layouts so that the user can select the desired one.
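The division-and-lookup step above can be sketched as follows. This is an illustrative assumption, not the patent's implementation; the function names and the 1920x1080 geometry are hypothetical.

```python
# Hypothetical sketch: dividing a screen into regions and mapping a
# gaze point to the region that contains it.

def make_equal_quadrants(width, height):
    """Divide a width x height screen into four equal regions A-D."""
    hw, hh = width // 2, height // 2
    return {
        "A": (0, 0, hw, hh),           # top-left
        "B": (hw, 0, width, hh),       # top-right
        "C": (0, hh, hw, height),      # bottom-left
        "D": (hw, hh, width, height),  # bottom-right
    }

def region_at(regions, x, y):
    """Return the name of the region containing gaze point (x, y)."""
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

regions = make_equal_quadrants(1920, 1080)
print(region_at(regions, 1500, 200))  # a point in the top-right -> "B"
```

An uneven seven-area layout like FIG. 5 would simply register seven rectangles in the same dictionary.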
  • The voice recognition unit 4 generates caption data based on a video's audio signal, and any known method can be applied for this generation.
  • For example, the technique disclosed in JP 2004-80069 A can be used. Specifically, the types of sound in the audio signal are first distinguished, and noise discrimination is performed. This is because the actual soundtrack mixes in car sounds, wind sounds, and other noise depending on the scene, and it is difficult to convert the audio to text unless human voices are separated from that noise. The speech in the audio signal is then converted into characters.
  • A configuration that, when generating the caption data, further discriminates the characters, distinguishes male from female voices, or selects the characters to display is also suitably used in this embodiment.
  • the audio of the video can be converted into text and displayed as subtitles on the display screen.
  • the technology applicable as the processing method of the speech recognition unit 4 according to the present embodiment is not limited to the above-described processing method, and it goes without saying that other technologies can also be applied.
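As a rough sketch of the flow just described, the following pseudo-pipeline separates speech segments from noise before transcribing them. The functions `is_speech` and `transcribe` are placeholders for a real voice-activity detector and speech recognizer; they and the segment data layout are assumptions for illustration, not APIs from the cited patent.

```python
# Hypothetical sketch of the voice recognition unit's subtitle flow:
# discriminate human speech from noise, then convert speech to text.

def is_speech(segment):
    # Placeholder voice-activity check: treat segments tagged "speech"
    # as human voice; car noise, wind, etc. are filtered out here.
    return segment["kind"] == "speech"

def transcribe(segment):
    # Placeholder recognizer: a real system would run ASR on the audio.
    return segment["text"]

def generate_captions(audio_segments):
    """Produce caption strings only from segments judged to be speech."""
    return [transcribe(s) for s in audio_segments if is_speech(s)]

segments = [
    {"kind": "noise", "text": ""},
    {"kind": "speech", "text": "Hello, and welcome."},
    {"kind": "speech", "text": "Today's top story..."},
]
print(generate_captions(segments))
```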
  • The line-of-sight recognition unit 3 determines the position of the user's line of sight, and any known method can be applied for this determination.
  • the technique disclosed in Japanese Patent Laid-Open No. 2005-100366 can be used.
  • In this document, three eye-gaze direction detection methods are disclosed: a direction-specific image correlation method, a black pixel region detection method, and an edge feature point detection method. Each detection method is briefly described below.
  • First, the sections of the display screen blink in sequence, and the position of the user's eyeball as the user's gaze follows them is recorded; this establishes the reference for each direction.
  • In the direction-specific image correlation method, eye-periphery images are registered as the above-mentioned references for determining the direction; the image of the user's eye is matched against the registered eye-periphery images, and the line-of-sight direction is determined from the eye-periphery image that gives the highest correlation.
  • In the black pixel region detection method, an iris-region image including the pupil is registered as the reference for determining the direction; an enlarged image of the position of the user's eye is matched against the registered iris-region images, and the line-of-sight direction is determined from the iris-region image that gives the highest correlation.
  • In the edge feature point detection method, attention is paid to the luminance changes among the iris region, the white of the eye, and the eyelid, and the line-of-sight direction is detected using edge detection. Edge detection is performed with a Sobel filter after image enhancement and smoothing by a median filter.
  • the technology applicable as the processing method of the line-of-sight recognition unit 3 according to the present embodiment is not limited to the detection method described above, and it goes without saying that other technologies can also be applied.
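The smoothing-then-Sobel step of the edge feature point method can be sketched as below. This is a minimal pure-Python illustration (3x3 median filter, then an |gx| + |gy| Sobel gradient magnitude), not the code of the cited patent.

```python
# Illustrative sketch: median smoothing followed by Sobel edge detection,
# as used to find iris / white-of-eye / eyelid boundaries.
from statistics import median

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # Sobel horizontal kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # Sobel vertical kernel

def median3(img):
    """3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = median(
                img[a][b] for a in (i - 1, i, i + 1) for b in (j - 1, j, j + 1)
            )
    return out

def sobel_magnitude(img):
    """Approximate gradient magnitude |gx| + |gy| at interior pixels."""
    h, w = len(img), len(img[0])
    mag = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = gy = 0
            for a in range(3):
                for b in range(3):
                    p = img[i + a - 1][j + b - 1]
                    gx += p * KX[a][b]
                    gy += p * KY[a][b]
            mag[i][j] = abs(gx) + abs(gy)
    return mag
```

In practice the eye image would first be passed through `median3` to suppress sensor noise, and the strong responses of `sobel_magnitude` would mark the iris and eyelid contours used as feature points.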
  • FIG. 6 is an enlarged view of the user's eye 40.
  • FIG. 7 is a diagram showing the display state of the display screen 60 of the TV receiver 50 in the case of FIG. 6. In the following description, it is assumed that the display screen 60 of the TV receiver 50 is divided into four areas.
  • First, the user's eye 40 is extracted from the video recorded by the camera 2. The region within the extracted outline of the eye 40 is then divided into four equal parts, and it is determined which of the four parts the dark (iris) portion occupies most. For example, in the case of FIG. 6, the user's iris mostly occupies the upper left part, so it can be determined that the user is looking toward the upper right of the display screen 60. As a result, as shown in FIG. 7, the TV receiver 50 outputs only the audio of the video displayed in the upper right area, and subtitles are displayed on the videos in the other areas.
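The quadrant heuristic described above can be sketched as follows. The grayscale data layout, the darkness threshold, and the mirror mapping are illustrative assumptions based on the FIG. 6/7 example (a pupil in the upper left of the camera image implies the user is looking at the upper right of the screen).

```python
# Hypothetical sketch of the quadrant heuristic: split the eye's bounding
# box into four parts, find where the dark (iris) pixels cluster, and
# mirror that into a screen region.

def dominant_quadrant(eye, dark_threshold=80):
    """Return the eye-image quadrant holding the most dark pixels."""
    h, w = len(eye), len(eye[0])
    counts = {"UL": 0, "UR": 0, "LL": 0, "LR": 0}
    for i, row in enumerate(eye):
        for j, pixel in enumerate(row):
            if pixel < dark_threshold:
                key = ("U" if i < h // 2 else "L") + ("L" if j < w // 2 else "R")
                counts[key] += 1
    return max(counts, key=counts.get)

# The camera image is mirrored relative to the screen: a pupil in the
# upper-LEFT of the image means the user looks at the upper-RIGHT region.
MIRROR = {"UL": "upper-right", "UR": "upper-left",
          "LL": "lower-right", "LR": "lower-left"}

def gaze_region(eye):
    return MIRROR[dominant_quadrant(eye)]

# Dark pixels (0) clustered in the image's upper-left quadrant:
eye = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
print(gaze_region(eye))  # -> "upper-right"
```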
  • Incidentally, gaze direction detection techniques such as those described above are commonly used for communication by patients with amyotrophic lateral sclerosis (ALS). Such gaze recognition for ALS patients requires high recognition accuracy.
  • (a) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the user moves his or her line of sight.
  • (b) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the user is gazing at one video.
  • (c) in FIG. 8 is a diagram showing the display state of the display screen 60 of the TV receiver 50 when the audio output is switched.
  • the display screen 60 of the TV receiver 50 is divided into four areas (A to D).
  • The line-of-sight recognition unit 3 does not determine that the user's line of sight has moved to another area unless an area other than area A (areas B to D) is watched for a predetermined time or more. That is, while the audio of the video in area A is being output, the audio output does not switch even if the user moves his or her line of sight in the directions of arrows X, Y, and Z in the figure and glances at the videos in areas B to D. Therefore, while listening to the audio of the video in area A, the user can check what kinds of video are displayed in areas B to D.
  • Suppose that, as shown in (b) of FIG. 8, area B is then watched for the predetermined time or more. When the line-of-sight recognition unit 3 detects that the user has watched area B for the predetermined time or more, it determines that the user is now viewing area B.
  • This determination result is output to the image synthesizing unit 18 and the audio output selecting unit 22, and the image synthesizing unit 18 synthesizes, for each of the videos in areas A, C, and D, the video signal of that video with its subtitle data.
  • the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the video in area B, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen. Therefore, as shown in (c) of FIG. 8, the audio of the video in area A, which was previously determined to be viewed by the user, is stopped, and that video is displayed with subtitles. Conversely, subtitles are no longer displayed on the video in area B, which has newly been determined to be viewed by the user, and its audio is output instead.
  • the predetermined time may be any time and is not particularly limited.
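The dwell-time rule can be sketched as a small state machine. The class name, the 2-second default, and the sampling interface are assumptions for illustration; the patent leaves the predetermined time unspecified.

```python
# Hypothetical sketch of the dwell-time switching rule: audio only moves
# to a region after the gaze has stayed there continuously for a
# threshold duration, so brief glances do not switch the audio.

class AudioSwitcher:
    def __init__(self, initial_region, dwell_seconds=2.0):
        self.active = initial_region   # region whose audio currently plays
        self.dwell = dwell_seconds
        self._candidate = None
        self._since = None

    def update(self, gazed_region, now):
        """Feed one gaze sample; return the region whose audio plays."""
        if gazed_region == self.active:
            self._candidate = None          # back on the active region
        elif gazed_region != self._candidate:
            self._candidate = gazed_region  # new candidate: start timing
            self._since = now
        elif now - self._since >= self.dwell:
            self.active = gazed_region      # dwelled long enough: switch
            self._candidate = None
        return self.active

sw = AudioSwitcher("A", dwell_seconds=2.0)
sw.update("B", 0.0)         # glance at B ...
print(sw.update("A", 1.0))  # ... back to A before 2 s: still "A"
sw.update("B", 5.0)
print(sw.update("B", 7.5))  # gazed at B for 2.5 s: now "B"
```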
  • FIG. 10 is a block diagram showing a main configuration of the video / audio output device 1a.
  • The video/audio output device 1a includes a voice recognition unit 4, an image synthesis unit 18, an audio output selection unit 22, a selector 31, and a decoder 35.
  • The video/audio output device 1a includes as many subunits 1a′, each having the voice recognition unit 4, the selector 31, and the decoder 35 as one structural unit, as there are videos to be displayed on the display device 7.
  • the video / audio output device 1a also includes a camera 2 and a line-of-sight recognition unit 3, but these members are not shown in the figure. Further, since members other than the selector 31 are the same members as those in the first embodiment, their functions are not mentioned here.
  • the broadcast signal may include subtitle data (second subtitle data) for subtitle display of the broadcast program created by the broadcast station.
  • the video signal, the audio signal, and the caption data can be obtained by separating the broadcast signal by the decoder 35. Therefore, in the video / audio output device 1a according to the present embodiment, when caption data is included in the received broadcast signal, the decoder 35 separates and generates the video signal, the audio signal, and the caption data, and the caption Data is output to the selector 31.
  • the video signal is output to the image synthesis unit 18, and the audio signal is output to the audio recognition unit 4 and the audio output selection unit 22.
  • Subtitle data generated by the voice recognition unit 4 is also output to the selector 31.
  • The selector 31 is set to preferentially use the caption data obtained from the broadcast signal for image composition. Therefore, when both the caption data obtained from the broadcast signal and the caption data from the voice recognition unit 4 are input, the selector 31 outputs the former to the image composition unit 18. On the other hand, when no caption data is included in the broadcast signal and only the caption data from the voice recognition unit 4 is input, the caption data from the voice recognition unit 4 is output to the image composition unit 18.
  • For each video other than the one determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data sent from the selector 31.
  • the synthesized video signal (synthesized video signal) is sent to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the video determined to be viewed by the user, the video signal is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • Based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and its audio is output.
  • the TV receiver 50 only the audio of the video determined to be viewed by the user is output, and the subtitles are displayed on the other videos.
  • As described above, the selector 31 is configured to preferentially output the caption data obtained from the broadcast signal to the image composition unit 18. That is, in the video/audio output device 1a, when a video other than the one determined to be viewed by the user is a broadcast program and caption data is included in the program's broadcast signal, the caption data created by the broadcast station (more precisely, by the creator of the broadcast program) is used preferentially. This allows the subtitles intended by the creator of the broadcast program to be displayed to the user.
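The selector's priority rule reduces to a simple fallback; a minimal sketch follows (the function name and `None`-means-absent convention are illustrative assumptions).

```python
# Sketch of the selector's priority rule: broadcast-supplied caption data
# (second caption data), when present, wins over captions generated by
# the voice recognition unit from the audio signal.

def select_caption(broadcast_caption, recognized_caption):
    """Prefer the broadcaster's caption data; fall back to recognized text."""
    if broadcast_caption is not None:
        return broadcast_caption
    return recognized_caption

print(select_caption("official caption", "ASR caption"))  # -> "official caption"
print(select_caption(None, "ASR caption"))                # -> "ASR caption"
```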
  • the video / audio output device includes a selector that switches subtitle data to be output to the image synthesis unit, and a subtitle recognition unit that controls subtitle display.
  • the detailed configuration will be described with reference to FIG.
  • FIG. 11 is a block diagram showing a main configuration of the video / audio output device 1b.
  • the video / audio output device 1b includes a voice recognition unit 4, an image synthesis unit 18, an audio output selection unit 22, a selector 31, a caption recognition unit 32, and a decoder 35.
  • The video/audio output device 1b includes as many subunits 1b′, each having the voice recognition unit 4, the selector 31, the subtitle recognition unit 32, and the decoder 35 as one structural unit, as there are videos to be displayed on the display device 7.
  • the video / audio output device 1b also includes a camera 2 and a line-of-sight recognition unit 3, but these members are not shown in the figure. Further, since members other than the caption recognition unit 32 are the same members as those in the first and second embodiments, their functions are not mentioned here.
  • captions may be embedded (included) in the video. That is, subtitles may be part of the video.
  • The caption recognition unit 32 determines whether captions are embedded in the video of each broadcast program to be displayed. When it determines that captions are embedded in a video other than the one the user is viewing, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of that video.
  • The program information of the broadcast program (information such as the program title, genre, program content, and performers), the video signal, and the audio signal (and, if present, the caption data) are sent to the decoder 35.
  • the program information and the video signal are output to the caption recognition unit 32.
  • the video signal is also output to the image synthesis unit 18, and the audio signal is output to the audio recognition unit 4 and the audio output selection unit 22. Note that when the broadcast signal includes caption data, the caption data is output to the selector 31.
  • the subtitle data generated by the voice recognition unit 4 is also output to the selector 31.
  • The selector 31 is set to preferentially use the caption data obtained from the broadcast signal for image composition. Therefore, when both the caption data obtained from the broadcast signal and the caption data from the voice recognition unit 4 are input, the selector 31 outputs the former to the image composition unit 18. On the other hand, when no caption data is included in the broadcast signal and only the caption data from the voice recognition unit 4 is input, the caption data from the voice recognition unit 4 is output to the image composition unit 18.
  • The subtitle recognition unit 32 determines, based on the input program information, whether the genre of the broadcast program is a movie or a drama. At the same time, based on the input video signal, it determines whether a character string is included in the video of the broadcast program. Specifically, as shown in FIG. 12, character strings displayed at the left edge (region P in the figure), the right edge (region Q in the figure), and the bottom edge (region R in the figure) of the video display screen 60 are detected by pattern recognition. If the subtitle recognition unit 32 determines both that the genre of the broadcast program is a movie or a drama and that the video of the broadcast program includes a character string, it concludes that subtitles are embedded in the video of the broadcast program. It therefore instructs the image composition unit 18 not to combine the caption data from the selector 31 with the video signal of that program.
  • Based on the instruction from the subtitle recognition unit 32, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of the broadcast program designated by the subtitle recognition unit 32.
  • The video signal of that program is instead sent as it is, via the display control unit 19, to the display device 7 and displayed on the screen.
  • Similarly, the video signal of the video determined to be viewed by the user is sent as it is to the display device 7 via the display control unit 19 and displayed on the screen.
  • For the other videos, the caption data from the selector 31 is combined with the video signal as usual.
  • The synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. When the broadcast program designated by the caption recognition unit 32 is itself the video determined by the line-of-sight recognition unit 3 to be viewed by the user, it is processed as usual.
  • Based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and its audio is output.
  • the TV receiver 50 only the audio of the video determined to be viewed by the user is output, and the subtitles are displayed on the other videos.
  • In other words, when the video displayed to the user is a broadcast program, its genre is a movie or a drama, and the video of the broadcast program includes a character string, it is determined that subtitles are embedded in the video of the broadcast program.
  • In that case, the video/audio output device 1b does not combine the caption data from the selector 31 with the video signal of the program. Therefore, in the video/audio output device 1b, when a movie or drama broadcast on TV is displayed as a video other than the one determined to be viewed by the user and its video includes a character string, the device can determine that captions are embedded in the video, so the caption data from the selector 31 is not displayed on the display screen. This allows the original subtitles attached to the movie or drama to be displayed.
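The caption recognition unit's rule described above can be sketched as follows. `has_text_in` stands in for a real pattern-recognition routine over the edge regions of the frame, and the genre set and region names (P, Q, R in FIG. 12) are illustrative assumptions.

```python
# Hypothetical sketch of the embedded-caption rule: if the genre is
# movie/drama AND pattern recognition finds a character string at the
# left, right, or bottom edge of the frame, treat the video as having
# captions already embedded and suppress the selector's caption data.

EMBEDDED_GENRES = {"movie", "drama"}
EDGE_REGIONS = ("left", "right", "bottom")  # regions P, Q, R in FIG. 12

def captions_embedded(genre, has_text_in):
    """has_text_in: region name -> bool (pattern-recognition result)."""
    if genre not in EMBEDDED_GENRES:
        return False
    return any(has_text_in(region) for region in EDGE_REGIONS)

def should_compose_selector_captions(genre, has_text_in):
    # The image synthesis unit skips the selector's caption data when
    # captions are judged to be embedded in the video itself.
    return not captions_embedded(genre, has_text_in)

detected = {"left": False, "right": False, "bottom": True}
print(should_compose_selector_captions("movie", detected.get))  # -> False
print(should_compose_selector_captions("news", detected.get))   # -> True
```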
  • In the above, the broadcast program genre is limited to movies and dramas; however, the genre is not limited to these, and other genres may also be included.
  • Further, in the above, subtitles are determined to be embedded when the video displayed to the user is a broadcast program, the genre is a movie or a drama, and the video of the broadcast program includes a character string; however, the means for determining whether subtitles are embedded in a video is not necessarily limited to this.
  • the video / audio output device 1b described above has a configuration in which the caption recognition unit 32 is added to the video / audio output device 1a according to the second embodiment, the present invention is not necessarily limited thereto.
  • a configuration in which the caption recognition unit 32 is added to the video / audio output device 1 according to the first embodiment also falls within the scope of the present invention.
  • the above-described TV receiver 50 can also function as a normal TV receiver that displays only one image on the display screen of the display device 7. Therefore, for example, the configuration may be such that the user can switch between a mode for viewing a plurality of videos and a mode for viewing only one video via the remote controller 5.
  • Although the channel selection method used by the user has not been specifically described, a conventionally known method can be adopted for channel selection. For example, the user can select one or more channels to view via the remote controller 5, or a channel can be selected by another method.
  • The video/audio output device according to an aspect of the present invention preferably further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires the video signal and the audio signal of a broadcast program from the received broadcast wave and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data. When a program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of that program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data instead of the first subtitle data.
  • With the above configuration, when second subtitle data for subtitle display is included in the broadcast wave, the video/audio output device preferentially uses the second subtitle data obtained from the broadcast wave. That is, the device preferentially uses the second subtitle data created by the broadcast station (more precisely, by the creator of the broadcast program). This allows the subtitles intended by the creator of the broadcast program to be displayed to the user.
  • The video/audio output device according to an aspect of the present invention preferably further includes a second determination unit that determines whether subtitles are included in the video displayed in each display area. For a display area whose video is determined to include subtitles, the synthesis unit does not generate a composite video signal combining the video signal of that video with the first subtitle data generated from its audio signal; instead, the output unit outputs the video signal of that video to the display unit as it is.
  • With the above configuration, the video/audio output device does not combine the first subtitle data with the video signal of a video that already contains subtitles, so the original subtitles attached to the video can be displayed.
  • The video/audio output device according to an aspect of the present invention preferably further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of the program being broadcast and program information including the genre of the program. When a program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are included in the video when the genre of the program is a movie or a drama and a character string is included in the video. With the above configuration, the video/audio output device does not combine the first subtitle data or the second subtitle data with the video signal of that video, so the original subtitles attached to the movie or drama can be displayed.
  • In the video/audio output device according to an aspect of the present invention, it is preferable that the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed to that display area for a predetermined time or more.
  • With the above configuration, the first determination unit does not determine that the user's line of sight has moved to another display area unless that area is gazed at for the predetermined time or more. That is, while the audio of the video in one display area is being output, the audio output does not switch even if the user moves his or her line of sight and glances at the video in another display area. Therefore, while listening to the audio of the video in one display area, the user can check what kinds of video and content are displayed in the other display areas.
  • The video/audio output device according to an aspect of the present invention preferably further includes a camera that records the user, and the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing. With this configuration, the display area the user is viewing can be identified from video of the user, in particular of the user's eyeball.
  • The video/audio output device according to an aspect of the present invention preferably further includes a second acquisition unit that acquires the video signal and the audio signal from an externally connected device. With this configuration, video from an externally connected device can also be displayed.
  • The video/audio output device can be applied to, for example, a television receiver, a personal computer, a personal digital assistant (PDA), a mobile phone, or a personal computer with a television function.
  • Reference signs: 1 Video/audio output device; 2 Camera; 3 Line-of-sight recognition unit; 4 Voice recognition unit; 18 Image synthesis unit; 22 Audio output selection unit; 31 Selector; 32 Subtitle recognition unit; 35 Decoder; 40 Eye; 50 Television receiver; 60 Display screen; 100 Content playback device


Abstract

For each video to be displayed, the present invention acquires the video signal and the audio signal of the video from a decoder (35). An audio recognition unit (4) generates caption data on the basis of the audio signal, and a line-of-sight recognition unit (3) determines which video the user is looking at on the basis of video recorded by a camera (2). An image combination unit (18) combines the caption data with the video signals of the videos other than the one being viewed. The combined video signals are sent to a display device, which is not shown. The video signal of the viewed video is sent as-is to the display device. An audio output selection unit (22) outputs only the audio signal of the viewed video to a speaker, which is not shown.

Description

VIDEO/AUDIO OUTPUT DEVICE, VIDEO/AUDIO OUTPUT METHOD, AND TELEVISION RECEIVER HAVING THE VIDEO/AUDIO OUTPUT DEVICE
The present invention relates to a video/audio output device capable of simultaneously viewing a plurality of videos, a video/audio output method, and a television receiver including the video/audio output device.
In recent years, there has been increasing demand among viewers to watch another video on the same screen while watching a broadcast program. Against this background, many television receivers having a function of displaying a plurality of broadcast programs or video images on the same screen have been proposed.
FIG. 13 shows a display example of a screen when a plurality of videos are displayed on the same screen. As shown in FIG. 13, by dividing the display screen into a plurality of areas, a different arbitrary video can be displayed in each area. This figure shows an example in which the display screen is divided into four areas. In this way, the viewer can view a plurality of videos at the same time.
However, most television receivers having such a function output only the audio of one video and mute the audio of the other videos. For example, in the case of FIG. 13, only the audio of the video displayed in the upper left area of the screen is output, and the audio of the videos displayed in the other three areas is muted. In this case, the receiver is set so that the viewer can arbitrarily switch the video whose audio is output. With such usage, although a plurality of videos can be viewed at the same time, the content of the videos other than the one outputting audio is difficult to follow because their sound is muted.
In general, a plurality of videos can be recognized simultaneously, but very few people can recognize a plurality of audio streams at the same time. Therefore, although it is technically easy to output the audio of a plurality of videos displayed on the same screen simultaneously, doing so is not useful for the viewer.
Therefore, Patent Document 1 discloses a scheme for viewing the contents of a plurality of videos displayed on the same screen in an understandable manner. Specifically, this document discloses a configuration in which, when performing two-screen display, audio is output from a built-in speaker for one screen, and subtitles are displayed instead of audio for the other screen. According to this configuration, when two-screen display is performed on a single video display device, the contents of a plurality of programs can be viewed in an understandable manner at the same time without listening to the audio of one program through headphones or an external device.
Japanese Unexamined Patent Application Publication No. 2007-13725 (published January 18, 2007)
However, with the configuration disclosed in Patent Document 1 described above, it cannot be said that a plurality of contents can be sufficiently viewed in an understandable manner at the same time. This is because the configuration disclosed in this document is premised on the broadcast station having created the subtitle data to be displayed on the videos other than the one outputting audio. In Japan, most broadcast stations do not create subtitle data for their broadcast programs, so even with the technique disclosed in Patent Document 1, there are many cases where no subtitles are displayed on the videos that are not outputting audio. If no subtitles are displayed, it is difficult to say that a plurality of contents can be viewed in an understandable manner at the same time. Therefore, at present, the technique disclosed in Patent Document 1 is difficult to apply.
In addition, in the configuration disclosed in Patent Document 1, the user must operate a button or the like to select the video whose audio is output from among the plurality of videos. Such an operation is troublesome for a user who is watching a broadcast program and hinders viewing.
The present invention has been made in view of the above problems, and its object is to provide a video/audio output device that can create subtitles for the videos, among a plurality of videos displayed on the same screen, whose audio is not output, and that can easily switch the video whose audio is output, as well as a video/audio output method and a television receiver including the video/audio output device.
In order to solve the above problems, a video/audio output device according to an aspect of the present invention is a video/audio output device including an output unit that outputs the video signal of a video to a display unit that displays the video and outputs the audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas that display mutually different videos. The device includes: a subtitle generation unit that generates first subtitle data for subtitle display from the audio signal of the video displayed in each display area; a first determination unit that determines which of the plurality of display areas the user is viewing; and a synthesis unit that generates, for each display area other than the display area determined to be viewed by the user, a composite video signal obtained by combining the video signal of the video displayed in that display area with the first subtitle data generated from the audio signal of that video. The output unit outputs the audio signal of the video displayed in the display area determined to be viewed by the user to the audio output unit, and outputs the video signal of the video displayed in that display area and the composite video signals of the other display areas to the display unit.
 According to the above configuration, a composite video signal with captions is generated for each display region the user is not viewing. The composite video signal is sent to the display unit and shown on the screen. For the display region determined to be viewed by the user, in contrast, the video signal of the video displayed in that region is sent to the display unit unchanged and shown on the screen. Furthermore, only the audio signal of the video displayed in the display region determined to be viewed by the user is sent to the audio output unit and reproduced. As a result, only the audio of the video in the viewed display region is output, while captions are displayed on the videos in the other display regions.
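The behavior described in the preceding paragraph can be sketched as a simple routing rule. This is an illustration only (the claims specify behavior, not an implementation, and all names here are hypothetical):

```python
# Sketch of the routing rule: the viewed region's audio goes to the
# speaker, every other region's video is composited with captions.
# All names are hypothetical; signals are modeled as plain strings.

def route_signals(regions, viewed_index):
    """regions: list of dicts with 'video' and 'audio' signal identifiers.
    viewed_index: index of the region the user is judged to be watching.
    Returns (audio_out, video_out): the single audio signal to reproduce
    and, per region, either the plain video or a captioned composite."""
    audio_out = regions[viewed_index]["audio"]
    video_out = []
    for i, region in enumerate(regions):
        if i == viewed_index:
            video_out.append(region["video"])            # shown as-is
        else:
            # non-viewed regions: captions generated from their own audio
            video_out.append(region["video"] + "+captions")
    return audio_out, video_out
```

When `viewed_index` changes, re-running the same rule yields the switched routing, which is what makes the switch "easy" from the user's point of view.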
 As described above, although a plurality of videos are displayed on the display unit, only the audio of the video in the display region determined to be viewed by the user is output, and captions are displayed on the videos in the other display regions. This allows the user to follow a plurality of videos simultaneously. In particular, since the captions displayed on the other videos are generated by the video/audio output device itself from the audio signals of those videos, captions can be displayed for any video whatsoever.
 When the user shifts to viewing a different display region, only the audio of the video in the newly viewed display region is output, and captions are displayed on the videos in the other display regions. Thus, according to the video/audio output device of this aspect of the present invention, the video whose audio is output can be switched easily.
 To solve the above problem, a television receiver according to one aspect of the present invention includes any one of the video/audio output devices described above.
 According to the above configuration, it is possible to provide a television receiver that can generate captions for the videos whose audio is not output among a plurality of videos displayed on the same screen, and that can easily switch which video's audio is output.
 To solve the above problem, a video/audio output method according to one aspect of the present invention outputs a video signal of a video to a display unit that displays the video and outputs an audio signal of the video to an audio output unit that outputs the audio of the video, the display unit including a plurality of display regions that display mutually different videos, the method including: a generation step of generating caption data for caption display from the audio signal of the video displayed in each display region; a determination step of determining which of the plurality of display regions the user is viewing; a synthesis step of generating, for each display region other than the display region determined in the determination step to be viewed by the user, a composite video signal by combining the video signal of the video displayed in that display region with the caption data generated from the audio signal of that video; and an output step of outputting, to the audio output unit, the audio signal of the video displayed in the display region determined in the determination step, and outputting, to the display unit, the video signal of the video displayed in the display region determined in the determination step together with the composite video signals of the other display regions.
 According to the above method, it is possible to generate captions for the videos whose audio is not output among a plurality of videos displayed on the same screen, and to easily switch which video's audio is output.
 Other objects, features, and advantages of the present invention will be fully understood from the description below. The advantages of the present invention will also become apparent from the following description with reference to the accompanying drawings.
 According to the video/audio output device of one aspect of the present invention, although a plurality of videos are displayed on the display unit, only the audio of the video in the display region determined to be viewed by the user is output, and captions are displayed on the videos in the other display regions. This allows the user to follow a plurality of videos simultaneously. In particular, since the captions displayed on the videos in the other display regions are generated by the video/audio output device itself from the audio signals of those videos, captions can be displayed for any video whatsoever. Moreover, since only the audio of the video in the display region determined to be viewed by the user is output, the video whose audio is output can be switched easily.
FIG. 1 is a block diagram showing the main configuration of a video/audio output device according to one embodiment of the present invention.
FIG. 2 is a diagram showing a display example of the screen when a plurality of videos are displayed on the same screen in one embodiment of the present invention.
FIG. 3 is a block diagram showing the detailed main configuration of a TV receiver according to one embodiment of the present invention.
FIG. 4 is a diagram showing an example of dividing a display screen into a plurality of regions.
FIG. 5 is a diagram showing another example of dividing a display screen into a plurality of regions.
FIG. 6 is an enlarged view of a user's eye.
FIG. 7 is a diagram showing the display state of the display screen of the TV receiver in the case of FIG. 6.
FIG. 8(a) is a diagram showing the display state of the display screen of the TV receiver while the user is moving his or her line of sight, FIG. 8(b) is a diagram showing the display state while the user is gazing at one video, and FIG. 8(c) is a diagram showing the display state when the audio output has been switched.
FIG. 9 is a diagram showing an outline of a content playback device including a video/audio output device according to one embodiment of the present invention.
FIG. 10 is a block diagram showing the main configuration of a video/audio output device according to another embodiment of the present invention.
FIG. 11 is a block diagram showing the main configuration of a video/audio output device according to another embodiment of the present invention.
FIG. 12 is a diagram showing a video displayed on a display screen.
FIG. 13 is a diagram showing a display example of the screen when a plurality of videos are displayed on the same screen.
 [First Embodiment]
 A first embodiment according to the present invention will be described below. In the following description, various limitations preferable for carrying out the present invention are imposed, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 As shown in FIG. 2, the video/audio output device according to the present embodiment divides the display screen 60 into a plurality of display regions, making it possible to display a different arbitrary video in each display region. The figure shows an example in which the display screen 60 is divided into four regions. The video/audio output device according to the present embodiment also has a function of outputting only the audio of one of the plurality of displayed videos while displaying captions for the other videos. Such a video/audio output device is applicable to content playback devices such as television receivers, personal computers, and personal computers with a television function. In the following, the case where the video/audio output device according to the present embodiment is applied to a television receiver (hereinafter referred to as a TV receiver) will be described as an example. First, the overall configuration of a TV receiver including the video/audio output device according to the present embodiment will be described, and then the video/audio output device will be described in detail.
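For illustration, assuming equal-sized regions (the actual division may differ; FIGS. 4 and 5 show other division examples), splitting a display screen into a grid of display regions can be sketched as follows. All names are hypothetical:

```python
def split_screen(width, height, cols=2, rows=2):
    """Divide a display of width x height pixels into a grid of equal
    display regions, returned as (x, y, w, h) tuples in row-major order.
    A 2x2 grid corresponds to the four-region example of FIG. 2."""
    w, h = width // cols, height // rows
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]
```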
 (Detailed internal configuration of the TV receiver 50)
 First, the detailed main configuration of the TV receiver 50 will be described with reference to FIG. 3. FIG. 3 is a block diagram showing the detailed main configuration of the TV receiver 50 according to the present embodiment.
 As shown in FIG. 3, the TV receiver 50 has a CPU (Central Processing Unit) 30 and a nonvolatile memory 28 connected to a bus, and the operation of the TV receiver 50 is controlled by the CPU 30 and various control programs stored in the nonvolatile memory 28. That is, the TV receiver 50 is controlled by a computer system including the CPU 30, and a program for operating the TV receiver 50 by means of the computer system is stored in the nonvolatile memory 28.
 The nonvolatile memory 28 is usually constituted by a RAM (Random Access Memory), but may partially include a ROM (Read Only Memory). It may also include a rewritable flash memory or the like. The nonvolatile memory 28 stores an OS (Operating System) and various control software for operating the CPU 30, data related to program information such as electronic program guide (EPG) data received via broadcast waves, OSD image data necessary for performing OSD (On Screen Display) display, and the like. The nonvolatile memory 28 also has a work area that serves as a work memory necessary for various control operations.
 The TV receiver 50 is provided with an analog tuner unit 11 (reception unit) in addition to a digital tuner unit 13 (reception unit), and can therefore also receive analog broadcasts. Various external devices 36 can be connected to the external input unit 6 (second acquisition unit), such as a hard disk drive (HDD), solid-state memory such as an SD card, and disc devices for Blu-ray Disc (BD), DVD (Digital Versatile Disc), Compact Disc (CD), and the like. Furthermore, the TV receiver 50 includes an IP (Internet Protocol) broadcast tuner unit 29 and can also receive IP broadcasts.
 In addition to the above members, the TV receiver 50 includes a camera 2, a line-of-sight recognition unit 3 (first determination unit), a display device 7 (display unit), a speaker 8 (audio output unit), an AV switch unit 12, a digital demodulation unit 14, a separation unit (DEMUX; DeMultiplexer) 15, a video decode/capture unit 16, a video selector unit 17, an image synthesis unit 18 (synthesis unit), a display control unit 19 (output unit), an audio decode unit 20, an audio selector unit 21, an audio output selection unit 22, an audio output control unit 23 (output unit), an EPG/OSD/reservation processing unit 24, a remote control light receiving unit 25, a channel selection unit 26, a communication control unit 27, and an addition circuit 37.
 The analog tuner unit 11 selects an analog television broadcast signal received via the analog broadcast receiving antenna 9, tuning to the channel to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the analog tuner unit 11 is separated into an audio signal and a video signal by the AV switch unit 12 (first acquisition unit); the video signal is input to the video selector unit 17 and the audio signal is input to the audio selector unit 21.
 The digital tuner unit 13 selects a digital television broadcast signal received via the digital broadcast receiving antenna 10, tuning to the channel to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the digital tuner unit 13 is demodulated by the digital demodulation unit 14 and sent to the separation unit (DEMUX) 15 (first acquisition unit).
 The IP broadcast tuner unit 29 selects an IP broadcast signal received via the communication control unit 27 connected to a telephone line, a LAN (Local Area Network), or the like, tuning to the specific IP broadcast to be received in accordance with a channel selection instruction from the channel selection unit 26. The received signal from the IP broadcast tuner unit 29 is output to the separation unit (DEMUX) 15.
 The separation unit (DEMUX) 15 separates the multiplexed video and audio signals input from the digital demodulation unit 14 or the IP broadcast tuner unit 29. It sends the separated video signal to the video decode/capture unit 16 and the separated audio signal to the audio decode unit 20. The separation unit (DEMUX) 15 further extracts data included in the broadcast signal, such as EPG data, and sends it to the EPG/OSD/reservation processing unit 24. The data extracted by the separation unit (DEMUX) 15 is recorded in the nonvolatile memory 28 under write control by the CPU 30 as necessary.
 The video decode/capture unit 16 decodes the video signal separated by the separation unit (DEMUX) 15 and captures video information included in the video signal as still images. The video signal decoded by the video decode/capture unit 16 is sent to the video selector unit 17.
 As already described, the video signal from the analog tuner unit 11 is input to the video selector unit 17, as is the video signal from the external input unit 6. In accordance with a control signal from the CPU 30, the video selector unit 17 selects one video signal from these input video signals and outputs it to the image synthesis unit 18.
 The image synthesis unit 18 combines the input video signal with the caption data (first caption data) for caption display generated by the voice recognition unit 4 (caption generation unit) described later. As will be described in detail later, of the plurality of videos displayed on the display device 7, the image synthesis unit 18 combines caption data with the video signals of the videos whose audio is not output. The image synthesis unit 18 also applies video processing to the combined video signal (composite video signal), such as noise reduction, sharpness adjustment, or contrast adjustment, converting it into a video signal optimal for the display device 7. When there is no caption data to combine (for example, for the video whose audio is output, or when only one video is displayed on the display device 7), the video processing is applied directly to the video signal from the video selector unit 17.
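As a toy illustration of the compositing step only: real compositing operates on pixel buffers, but modeling a frame as a list of text rows shows the pass-through versus overlay behavior described above (all names hypothetical):

```python
def composite_captions(frame_rows, caption):
    """Overlay caption text on the bottom of a frame.

    frame_rows: the frame modeled as a list of equal-width text rows.
    caption: caption text generated from the audio signal, or "" if none.
    Returns a new frame; when there is no caption data the frame passes
    through unchanged, mirroring the behavior of the image synthesis unit.
    """
    if not caption:
        return list(frame_rows)                 # no caption data: pass-through
    out = list(frame_rows)
    out[-1] = caption.center(len(out[-1]))      # caption replaces the bottom row
    return out
```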
 The display control unit 19 controls output of the input video signal to the display device 7 for display. The display control unit 19 can output the video signal together with EPG data or OSD data created by the EPG/OSD/reservation processing unit 24 described later. The display device 7 displays the received video signal on its screen.
 The audio decode unit 20 decodes the audio signal separated by the separation unit (DEMUX) 15 and sends the decoded audio signal to the audio selector unit 21.
 The audio selector unit 21 receives the audio signal from the AV switch unit 12, the audio signal from the external input unit 6, and the audio signal from the audio decode unit 20, and, under control of the CPU 30, selects the audio signal corresponding to the video signal selected by the video selector unit 17. The selected audio signal is output to the voice recognition unit 4 and the audio output selection unit 22. As will be described in detail later, the voice recognition unit 4 generates caption data from the selected audio signal; the generated caption data is output to the image synthesis unit 18 and combined with the corresponding video signal. The audio output selection unit 22, on the other hand, selects the audio signal of the video whose audio is to be output from among the plurality of videos displayed on the display device 7. The selected audio signal is output to the audio output control unit 23, where it is converted into an audio signal optimal for reproduction by the speaker 8 and supplied to the speaker 8. When only one video is displayed on the display device 7, the audio output selection unit 22 need not select the audio to output.
 The EPG/OSD/reservation processing unit 24 creates an electronic program guide based on EPG data that is periodically updated and stored, and renders OSD data stored in advance in the nonvolatile memory 28. The OSD data is data, stored in advance in the nonvolatile memory 28, for rendering various information such as a settings menu screen, a volume gauge, the current time, or the selected channel.
 In accordance with instructions from the CPU 30, the EPG/OSD/reservation processing unit 24 also determines the layout, such as the display position on the screen of the display device 7, of the OSD data to be rendered. The EPG data or OSD data created by the EPG/OSD/reservation processing unit 24 is added by the addition circuit 37 to the video signal output by the image synthesis unit 18 and output to the display device 7. The EPG/OSD/reservation processing unit 24 also performs program reservation processing and the like using the electronic program guide.
 The communication control unit 27 performs control to establish communication via a network such as a telephone line, a LAN, the Internet, or a home network standard such as DLNA (Digital Living Network Alliance). The TV receiver may connect to other devices through a network, or to a video service through the Internet. The connection to such other devices may be either wired or wireless.
 The remote control light receiving unit 25 receives optical signals from the remote controller 5 (hereinafter referred to as the remote control 5) and accepts control signals from the remote control 5. Instructions from the viewer, such as turning the power of the TV receiver 50 on and off, turning the volume up and down, and selecting the viewing channel, are given via the remote control 5.
 The camera 2 records the user (in particular, the eyes 40); as will be described in detail later, the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2.
 The camera 2, the separation unit (DEMUX) 15, the video decode/capture unit 16, the audio decode unit 20, the EPG/OSD/reservation processing unit 24, the remote control light receiving unit 25, the channel selection unit 26, the communication control unit 27, the nonvolatile memory 28, the IP broadcast tuner unit 29, and the CPU 30 are connected via the bus.
 (Configuration of the video/audio output device 1)
 The detailed configuration of the video/audio output device 1 will be described below with reference to FIG. 1. FIG. 1 is a block diagram showing the main configuration of the video/audio output device 1.
 As shown in FIG. 1, the video/audio output device 1 includes the camera 2, the line-of-sight recognition unit 3, the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, and a decoder 35. The decoder 35 separates video and audio signals from the broadcast signal received by the TV receiver 50 or from the external device 36 and decodes them. That is, the decoder 35 encompasses the various members needed in the process of separating and generating, from the broadcast signal or the external device 36, a video signal suitable for video output and an audio signal suitable for audio output, such as the separation unit (DEMUX) 15, the video selector unit 17, and the audio selector unit 21 described above. In FIG. 1, to keep the figure easy to understand, these members are shown collectively as the decoder 35.
 As described above, the video/audio output device 1 divides the display screen into a plurality of display regions, making it possible to display a different arbitrary video in each display region. It is therefore necessary to separate and generate as many video and audio signals from the broadcast signal or the external device 36 as there are videos displayed on the display device 7. The video/audio output device 1 therefore includes as many subunits 1', each consisting of the voice recognition unit 4 and the decoder 35 as one structural unit, as there are videos displayed on the display device 7. That is, when four different videos are displayed on the display device 7, four subunits 1' are provided. In other words, by increasing or decreasing the number of subunits 1', the number of videos that can be viewed simultaneously can be increased or decreased.
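The one-subunit-per-video arrangement can be sketched as follows. The class names are hypothetical, and the decode and speech-recognition stages are stand-in stubs; the point is only that capacity scales with the number of subunits:

```python
class SubUnit:
    """One decoder + caption-generator pair handling a single video source
    (a hypothetical model of the patent's subunit 1')."""
    def __init__(self, source):
        self.source = source

    def decode(self):
        # Stand-ins for the real decode stage: yields the separated
        # video and audio signals for this subunit's source.
        return {"video": f"{self.source}:video", "audio": f"{self.source}:audio"}


class VideoAudioOutput:
    """Holds one subunit per display region; adding or removing subunits
    changes how many videos can be shown at once."""
    def __init__(self, sources):
        self.subunits = [SubUnit(s) for s in sources]

    def capacity(self):
        return len(self.subunits)
```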
 When a plurality of broadcast programs are displayed on the display device 7, it is necessary to receive as many broadcast signals as there are broadcast programs to be displayed. Accordingly, as many analog tuner units 11 and digital tuner units 13 as there are broadcast signals to be received must be provided; in other words, it is preferable to provide the same number of analog tuner units 11 and digital tuner units 13 as subunits 1'. The analog tuner unit 11 and the digital tuner unit 13 may be included in the decoder 35.
 The videos displayed on the display device 7 may be either digital broadcast programs or analog broadcast programs, and may be terrestrial broadcast programs, satellite broadcast programs, or CATV programs. They are not limited to broadcast programs either; video from the external device 36 may also be displayed via the external input unit 6, and there is no particular limitation.
 (Processing flow of the video/audio output device 1)
 Next, the processing of the video/audio output device 1 will be described, again with reference to FIG. 1. First, assume that a plurality of videos, whether broadcast programs selected by the user or videos from the external device 36, are displayed on the display screen of the TV receiver 50.
 As described above, in the video/audio output device 1, the broadcast signal of each broadcast program to be displayed, or the data included in the external device 36, is processed by the decoder 35 of the corresponding subunit 1' to obtain a video signal and an audio signal. As shown in FIG. 1, the obtained video signal and audio signal are output to the image synthesis unit 18 and the audio output selection unit 22, respectively. At this time, the audio signal from the decoder 35 is also output to the voice recognition unit 4. The voice recognition unit 4 generates caption data from the audio signal; more specifically, based on the audio signal, it generates caption data in which the conversation, narration, and the like of the people in the video are converted into text. The generated caption data is output to the image synthesis unit 18.
 Meanwhile, the camera 2 films the user, and the line-of-sight recognition unit 3 determines the position of the user's line of sight based on the video recorded by the camera 2. More specifically, based on the video recorded by the camera 2, it determines the position of the user's line of sight and thereby which of the plurality of videos displayed on the screen of the TV receiver 50 the user is viewing. The determination result is output to the image synthesis unit 18 and the audio output selection unit 22.
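Assuming the screen is divided into rectangular display regions, once a gaze position on the screen has been estimated from the camera video, the judgment "which region is the user viewing" reduces to a point-in-rectangle lookup. A minimal sketch under that assumption (gaze estimation itself is outside this sketch, and all names are hypothetical):

```python
def region_under_gaze(gaze_x, gaze_y, regions):
    """Return the index of the display region containing the gaze point,
    or None if the gaze falls outside every region (e.g. the user is
    looking away from the screen).

    regions: list of (x, y, w, h) rectangles in screen coordinates."""
    for i, (x, y, w, h) in enumerate(regions):
        if x <= gaze_x < x + w and y <= gaze_y < y + h:
            return i
    return None
```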
 On the basis of the determination result from the line-of-sight recognition unit 3, for each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data that the voice recognition unit 4 generated from its audio signal. That is, for each video the user is not viewing, a video signal with captions is generated. The synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen.
 In addition, on the basis of the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and the audio is output. As a result, the TV receiver 50 outputs only the audio of the video determined to be viewed by the user; for the other videos, captions are displayed instead of audio being output.
 Thus, the video/audio output device 1 according to the present embodiment outputs only the audio of the video that the line-of-sight recognition unit 3 has determined the user is viewing. That is, when the user moves his or her line of sight and begins viewing another video, the line-of-sight recognition unit 3 determines the new position of the user's line of sight and identifies the video that the user has newly begun viewing. This information is output to the image synthesis unit 18 and the audio output selection unit 22. For each video other than the one newly determined to be viewed, the image synthesis unit 18 combines the video signal of that video with its caption data; the synthesized video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video newly determined to be viewed, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, the audio output of the video that had previously been determined to be viewed stops, and that video is displayed with captions; conversely, captions are no longer displayed on the newly viewed video, and its audio is output instead.
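The per-region behavior described above can be sketched as follows. This is a minimal illustration only, not the patent's actual implementation; the function and field names (`compose_outputs`, `video`, `audio`, `captions`) are assumptions introduced for the sketch.

```python
# Sketch of the gaze-driven output rule: the viewed region's video
# passes through and its audio is selected; every other region's
# video is composited with its generated caption data instead.

def compose_outputs(regions, viewed_region):
    """Return (frames-per-region, selected-audio) per the gaze result."""
    frames, audio = {}, None
    for name, stream in regions.items():
        if name == viewed_region:
            frames[name] = stream["video"]      # video passed through as-is
            audio = stream["audio"]             # only this audio is output
        else:
            # non-viewed video: overlay its caption text on the frame
            frames[name] = stream["video"] + " [CC: " + stream["captions"] + "]"
    return frames, audio

regions = {
    "A": {"video": "news", "audio": "news-audio", "captions": "breaking..."},
    "B": {"video": "drama", "audio": "drama-audio", "captions": "a line..."},
}
frames, audio = compose_outputs(regions, viewed_region="A")
print(audio)        # news-audio
print(frames["B"])  # drama frame with its captions overlaid
```

Switching the gaze target simply means calling the same routine with a different `viewed_region`, which mirrors how the device re-routes audio and captions when the user's gaze moves.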
 As described above, a plurality of videos are displayed on the TV receiver 50, but only the audio of the video determined to be viewed by the user is output, and captions are displayed on the other videos. This allows the user to follow a plurality of videos simultaneously. In particular, since the video/audio output device 1 itself generates the captions displayed on the other videos from their audio signals, captions can be displayed for any broadcast program or any video from the external device 36.
 When the user changes the video being viewed, only the audio of the newly viewed video is output, and captions are displayed on the other videos. In this way, the video/audio output device 1 according to the present embodiment makes it easy to switch which video's audio is output.
 (Division of the display screen)
 It was stated above that, in order to display a plurality of videos on the display screen of the display device 7, the display screen is divided into a plurality of regions; the division method is not particularly limited. Examples of dividing the display screen into a plurality of regions are shown in FIGS. 4 and 5. FIG. 4 shows an example in which the display screen 60 is divided into four regions, and FIG. 5 shows an example in which it is divided into seven regions.
 As shown in FIG. 4, the display screen 60 may be divided equally into four regions A to D, or, as shown in FIG. 5, divided unequally into seven regions A to G. Thus, the display screen 60 can be divided equally or into regions of different sizes, and various modifications of the division method are possible. The TV receiver 50 may offer a number of division-method variations so that the user can select a desired one.
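A divided layout like that of FIG. 4 can be represented as a set of rectangles, with a hit test that maps a gaze position on the screen to the region containing it. The normalized coordinate system and the names below are assumptions for illustration, not part of the patent.

```python
# Sketch of mapping a gaze point to a screen region. The equal 2x2
# layout mirrors FIG. 4; coordinates are normalized to [0, 1].

LAYOUT_4 = {  # region name -> (left, top, right, bottom)
    "A": (0.0, 0.0, 0.5, 0.5), "B": (0.5, 0.0, 1.0, 0.5),
    "C": (0.0, 0.5, 0.5, 1.0), "D": (0.5, 0.5, 1.0, 1.0),
}

def region_at(layout, x, y):
    """Return the name of the region containing the point (x, y)."""
    for name, (l, t, r, b) in layout.items():
        if l <= x < r and t <= y < b:
            return name
    return None

print(region_at(LAYOUT_4, 0.75, 0.25))  # B (upper right)
```

An unequal seven-region layout like FIG. 5 would use the same hit test with different rectangles, which is why the division method can be varied freely.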
 (Processing of the voice recognition unit 4)
 As described above, the voice recognition unit 4 generates caption data on the basis of the audio signal of a video, and a known method can be used for this generation. For example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2004-80069 can be used. Specifically, the types of sound contained in the audio signal are first classified and noise is discriminated. This is necessary because actual audio, depending on the scene, mixes human speech with car noise, wind noise, and other sounds, and it is difficult to transcribe the speech unless it is distinguished from such noise. The speech in the audio signal is then transcribed and converted into text. As disclosed in that document, configurations that identify the speaking character, distinguish male from female voices, or select particular speakers when generating caption data can also be suitably used in this embodiment.
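The two-stage flow above (noise discrimination, then transcription) can be sketched as follows. `classify_segment` and `transcribe` are placeholders standing in for a real speech recognizer; they, and the segment representation, are assumptions made for this sketch.

```python
# Rough sketch of caption generation: discriminate speech from noise,
# then transcribe only the speech segments into caption text.

def generate_captions(audio_segments, classify_segment, transcribe):
    """Keep segments classified as human speech and join their
    transcriptions into a single caption string."""
    lines = []
    for seg in audio_segments:
        if classify_segment(seg) == "speech":   # noise discrimination step
            lines.append(transcribe(seg))       # text conversion step
    return " ".join(lines)

# toy stand-ins for demonstration only
segments = [("speech", "hello"), ("noise", None), ("speech", "world")]
caption = generate_captions(
    segments,
    classify_segment=lambda s: s[0],
    transcribe=lambda s: s[1],
)
print(caption)  # hello world
```

Speaker identification or male/female discrimination, as mentioned above, would amount to a finer-grained `classify_segment` whose label is attached to each caption line.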
 Using conventionally known techniques such as these, the audio of a video can be converted into text and displayed as captions on the display screen. Needless to say, the techniques applicable as the processing method of the voice recognition unit 4 according to this embodiment are not limited to the methods described above; other techniques can also be applied.
 (Processing of the line-of-sight recognition unit 3)
 As described above, the line-of-sight recognition unit 3 determines the position of the user's line of sight, and a known method can be used for this determination. For example, the technique disclosed in Japanese Unexamined Patent Application Publication No. 2005-100366 can be used. That document discloses three methods for detecting the gaze direction: a direction-specific image correlation method, a black-pixel-region detection method, and an edge-feature-point detection method. Each detection method is briefly described below.
 In the direction-specific image correlation method and the black-pixel-region detection method, sections of the display screen are blinked in sequence, and the positions of the user's eyeballs while the user follows them with his or her eyes are recorded and used as references for determining the gaze direction. In the direction-specific image correlation method, eye-periphery images are registered as these references; the current image of the user's eyes is matched against the registered eye-periphery images, and the gaze direction is determined from the eye-periphery image giving the highest correlation. In the black-pixel-region detection method, images of the iris region including the pupil are registered as the references; an enlarged image of the user's eye position is matched against the registered iris-region images, and the gaze direction is determined from the iris-region image giving the highest correlation.
 In the edge-feature-point detection method, on the other hand, attention is paid to the luminance changes among the iris region, the white of the eye, and the eyelid, and the gaze direction is detected using edge detection. To facilitate edge detection, image enhancement and smoothing with a median filter are performed, and edges are then detected with a Sobel filter.
 Using conventionally known techniques such as these, the direction of the user's line of sight can be detected and the video that the user is viewing can be identified. Needless to say, the techniques applicable as the processing method of the line-of-sight recognition unit 3 according to this embodiment are not limited to the detection methods described above; other techniques can also be applied.
 For example, besides the detection methods described above, there are other methods of determining the user's gaze direction. One such method is described with reference to FIGS. 6 and 7. FIG. 6 is an enlarged view of the user's eye 40, and FIG. 7 shows the display state of the display screen 60 of the TV receiver 50 in the case of FIG. 6. The following description assumes that the display screen 60 of the TV receiver 50 is divided into four regions.
 First, the region of the user's eye 40 is extracted from the video recorded by the camera 2. The white of the eye, within the extracted outline of the eye 40, is then divided into four equal quadrants, and it is determined which of the four quadrants the iris (the dark part of the eye) occupies most. In the case of FIG. 6, for example, the iris mostly occupies the upper-left quadrant; since the camera faces the user, the image is mirrored, so it can be determined that the user is viewing the upper-right direction of the display screen 60. As a result, as shown in FIG. 7, the TV receiver 50 outputs only the audio of the video displayed in the upper-right region, and captions are displayed on the videos in the other regions.
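The quadrant decision above can be sketched as a simple argmax over per-quadrant iris pixel counts, followed by a horizontal mirror. The pixel counts are assumed to come from a prior image-processing step not shown here, and all names are illustrative assumptions.

```python
# Sketch of the quadrant method: pick the quadrant the iris occupies
# most, then mirror left/right because the camera faces the user.

MIRROR = {
    "upper-left": "upper-right", "upper-right": "upper-left",
    "lower-left": "lower-right", "lower-right": "lower-left",
}

def viewed_screen_region(dark_pixel_counts):
    """dark_pixel_counts: dict of quadrant name -> iris pixel count."""
    dominant = max(dark_pixel_counts, key=dark_pixel_counts.get)
    return MIRROR[dominant]

counts = {"upper-left": 820, "upper-right": 120,
          "lower-left": 60, "lower-right": 40}
print(viewed_screen_region(counts))  # upper-right, as in FIGS. 6 and 7
```

With the screen divided into four regions, the returned quadrant maps directly to the region whose audio should be output.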
 Gaze-direction detection techniques such as those described above are generally used to support communication by patients with amyotrophic lateral sclerosis (ALS), and such use requires high recognition accuracy. In contrast, detecting which position on the display screen 60 the user is viewing, as in this embodiment, is easy as long as the display device 7 is large, and malfunctions are few.
 (Operation of the line-of-sight recognition unit 3)
 While viewing the display screen 60 on which a plurality of videos are displayed, the user also looks at videos other than the one whose audio is being output; that is, the user's line of sight is also directed at those other videos. If the audio output were switched every time the user's line of sight moved to another video, the user would be confused, and following the content of a plurality of videos simultaneously would become impossible.
 Therefore, in this embodiment, it is preferable to switch the audio output only when the user has gazed at a video whose audio is not being output for a predetermined time or longer. This configuration is described below with reference to FIG. 8. In FIG. 8, (a) shows the display state of the display screen 60 of the TV receiver 50 while the user is moving his or her line of sight, (b) shows the display state while the user is gazing at one video, and (c) shows the display state after the audio output has been switched. These figures assume that the display screen 60 of the TV receiver 50 is divided into four regions (A to D).
 Suppose, as shown in (a) of FIG. 8, that the audio of the video in region A of the display screen 60 is being output. As described above, unless the user gazes at a region other than region A (regions B to D) for the predetermined time or longer, the line-of-sight recognition unit 3 does not determine that the user's line of sight has moved to another region. That is, the audio output does not switch even if, while the audio of region A continues, the user moves his or her line of sight in the directions of arrows X, Y, and Z in the figure to look at the videos in regions B to D. The user can therefore check what videos are displayed in regions B to D, and what their content is, while still listening to the audio of region A.
 Then, for example, when the user wishes to have the audio of the video in region B output, the user gazes at region B for the predetermined time or longer, as shown in (b) of FIG. 8. When the line-of-sight recognition unit 3 detects that the user has gazed at region B for the predetermined time or longer, it determines that the user has newly begun viewing region B. This information is output to the image synthesis unit 18 and the audio output selection unit 22; the image synthesis unit 18 combines the video signals of regions A, C, and D with their respective caption data, and the synthesized video signals are sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video in region B, newly determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen. Accordingly, as shown in (c) of FIG. 8, the audio output of region A, which had previously been determined to be viewed, stops, and the video in region A is displayed with captions; conversely, captions are no longer displayed on the video in region B, and its audio is output instead. The predetermined time may be of any length and is not particularly limited.
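The dwell-time rule above can be sketched as a small state machine: the audio region changes only after the gaze has stayed on another region for at least a threshold time. The patent leaves the predetermined time open, so the threshold value, the timestamps, and the class name below are all assumptions for illustration.

```python
# Sketch of dwell-time gaze switching: glances shorter than
# DWELL_SECONDS leave the audio output unchanged.

DWELL_SECONDS = 2.0  # stand-in for the "predetermined time"

class GazeSwitcher:
    def __init__(self, initial_region):
        self.audio_region = initial_region
        self._candidate = None
        self._since = None

    def update(self, gazed_region, t):
        """Feed one gaze sample (region name, time in seconds);
        return the region whose audio should currently be output."""
        if gazed_region == self.audio_region:
            self._candidate = None              # glancing back resets dwell
        elif gazed_region != self._candidate:
            self._candidate, self._since = gazed_region, t
        elif t - self._since >= DWELL_SECONDS:
            self.audio_region = gazed_region    # sustained gaze: switch
            self._candidate = None
        return self.audio_region

sw = GazeSwitcher("A")
sw.update("B", 0.0)          # first glance at B: dwell starts
print(sw.update("B", 1.0))   # A (still under the threshold)
print(sw.update("B", 2.5))   # B (dwell reached, audio switches)
```

Moving the gaze across X, Y, and Z as in (a) of FIG. 8 keeps resetting the dwell timer, so brief checks of the other regions never switch the audio.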
 (Applications of the video/audio output device 1)
 The above description gave an example in which the video/audio output device 1 is applied to the TV receiver 50, which is the content reproduction device 100 shown in FIG. 9, but the invention is not necessarily limited to this. For example, a personal computer, a portable information terminal (PDA: Personal Digital Assistant), a mobile phone, a personal computer with a television function, or the like can also serve as the content reproduction device 100.
 [Second Embodiment]
 A second embodiment according to the present invention is described below. As in the first embodiment, the following description includes various limitations that are preferable for carrying out the present invention, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 (Configuration of the video/audio output device 1a)
 The video/audio output device according to this embodiment is characterized by including a selector that switches the caption data output to the image synthesis unit. Its detailed configuration is described with reference to FIG. 10, which is a block diagram showing the main configuration of the video/audio output device 1a.
 As shown in FIG. 10, the video/audio output device 1a includes the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, a selector 31, and the decoder 35. The video/audio output device 1a includes one subunit 1a′, consisting of the voice recognition unit 4, the selector 31, and the decoder 35, for each video to be displayed on the display device 7. The video/audio output device 1a also includes the camera 2 and the line-of-sight recognition unit 3, but these members are omitted from the figure. Since the members other than the selector 31 are the same as in the first embodiment, their functions are not described again here.
 Depending on the broadcast program, the broadcast signal may contain caption data (second caption data) created by the broadcast station for displaying captions for that program. In this case, the video signal, the audio signal, and the caption data can be obtained by demultiplexing the broadcast signal with the decoder 35. In the video/audio output device 1a according to this embodiment, therefore, when the received broadcast signal contains caption data, the decoder 35 separates and generates the video signal, the audio signal, and the caption data, and outputs the caption data to the selector 31. The video signal is output to the image synthesis unit 18, and the audio signal is output to the voice recognition unit 4 and the audio output selection unit 22.
 The caption data generated by the voice recognition unit 4 is also output to the selector 31, but the selector 31 is set so that, when the broadcast signal contains caption data, the caption data obtained from the broadcast signal is used preferentially for image synthesis. Accordingly, when both the caption data obtained from the broadcast signal and the caption data obtained from the voice recognition unit 4 are input, the selector 31 outputs the former to the image synthesis unit 18. When the broadcast signal contains no caption data and only the caption data obtained from the voice recognition unit 4 is input, the selector 31 outputs that caption data to the image synthesis unit 18.
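The selector's priority rule amounts to a simple fallback, which can be sketched as follows. Using `None` to represent "no broadcast caption data supplied" is an assumption made for the sketch.

```python
# Sketch of the selector 31's rule: broadcast-supplied caption data,
# when present, takes precedence over captions generated by the
# voice recognition unit.

def select_captions(broadcast_captions, generated_captions):
    """Return the caption data to pass to the image synthesis unit."""
    if broadcast_captions is not None:
        return broadcast_captions    # broadcaster's captions take priority
    return generated_captions        # fall back to generated captions

print(select_captions("official caption", "asr caption"))  # official caption
print(select_captions(None, "asr caption"))                # asr caption
```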
 On the basis of the determination result from the line-of-sight recognition unit 3, for each video other than the video determined to be viewed by the user, the image synthesis unit 18 combines the video signal of that video with the caption data for that video sent from the selector 31. The synthesized video signal (composite video signal) is sent to the display device 7 via the display control unit 19 and displayed on the screen. In contrast, for the video determined to be viewed by the user, its video signal is sent unmodified to the display device 7 via the display control unit 19 and displayed on the screen.
 In addition, on the basis of the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be viewed by the user to the speaker 8 via the audio output control unit 23, and the audio is output. As a result, the TV receiver 50 outputs only the audio of the video determined to be viewed by the user, and captions are displayed on the other videos.
 Thus, when the broadcast signal of a broadcast program contains caption data, the selector 31 is configured to output the caption data obtained from the broadcast signal to the image synthesis unit 18 preferentially. That is, when a video other than the one determined to be viewed by the user is a broadcast program whose broadcast signal contains caption data, the video/audio output device 1a preferentially uses the caption data generated by the broadcast station (more precisely, by the creator of the broadcast program). This makes it possible to show the user the captions that the creator of the broadcast program intended.
 [Third Embodiment]
 A third embodiment according to the present invention is described below. As in the first embodiment, the following description includes various limitations that are preferable for carrying out the present invention, but the technical scope of the present invention is not limited to the following embodiments and drawings.
 (Configuration of the video/audio output device 1b)
 The video/audio output device according to this embodiment is characterized by including a selector that switches the caption data output to the image synthesis unit, and a caption recognition unit that controls caption display. Its detailed configuration is described with reference to FIG. 11, which is a block diagram showing the main configuration of the video/audio output device 1b.
 As shown in FIG. 11, the video/audio output device 1b includes the voice recognition unit 4, the image synthesis unit 18, the audio output selection unit 22, the selector 31, a caption recognition unit 32, and the decoder 35. The video/audio output device 1b includes one subunit 1b′, consisting of the voice recognition unit 4, the selector 31, the caption recognition unit 32, and the decoder 35, for each video to be displayed on the display device 7. The video/audio output device 1b also includes the camera 2 and the line-of-sight recognition unit 3, but these members are omitted from the figure. Since the members other than the caption recognition unit 32 are the same as in the first and second embodiments, their functions are not described again here.
 Depending on the movie or drama broadcast on TV, captions may already be embedded in (included in) the video; that is, the captions may form part of the video itself. In this case, if the image synthesis unit 18 combines the video signal of such a video with the caption data for that video sent from the selector 31, captions are displayed twice, which makes them hard to read and the content difficult to follow. Therefore, in the video/audio output device 1b according to this embodiment, the caption recognition unit 32 (second determination unit) determines whether captions are embedded in the video of the broadcast program to be displayed. When it is determined that captions are embedded in a video other than the one the user is viewing, the image synthesis unit 18 does not combine the caption data from the selector 31 with the video signal of that video.
 More specifically, the decoder 35 separates and generates, from the received broadcast signal, the program information of the broadcast program (information such as the program title, genre, program content, and performers), the video signal, and the audio signal (and, in some cases, caption data). The program information and the video signal are output to the caption recognition unit 32. The video signal is also output to the image synthesis unit 18, and the audio signal is output to the voice recognition unit 4 and the audio output selection unit 22. When the broadcast signal contains caption data, that caption data is output to the selector 31.
 As described above, the caption data generated by the voice recognition unit 4 is also output to the selector 31, but the selector 31 is set so that, when the broadcast signal contains caption data, the caption data obtained from the broadcast signal is used preferentially for image synthesis. Accordingly, when both the caption data obtained from the broadcast signal and the caption data obtained from the voice recognition unit 4 are input, the selector 31 outputs the former to the image synthesis unit 18. When the broadcast signal contains no caption data and only the caption data obtained from the voice recognition unit 4 is input, the selector 31 outputs that caption data to the image synthesis unit 18.
 Meanwhile, the caption recognition unit 32 determines, on the basis of the input program information, whether the genre of the broadcast program is a movie or a drama. At the same time, it determines, on the basis of the input video signal, whether the video of the broadcast program contains a character string. Specifically, as shown in FIG. 12, it uses pattern recognition to detect whether a character string is present at the left edge (region P in the figure), the right edge (region Q), or the bottom edge (region R) of the video on the display screen 60. When the caption recognition unit 32 determines that the genre of the broadcast program is a movie or a drama and that the video of the program contains a character string, it concludes that captions are embedded in the video of that program, and instructs the image synthesis unit 18 not to combine the caption data from the selector 31 with the video signal of that program.
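The decision logic above combines two tests: the genre must be movie or drama, and text must be detected in at least one of the edge regions P, Q, or R. The sketch below assumes a placeholder `detect_text_in_region` standing in for the pattern-recognition step; all names are illustrative.

```python
# Sketch of the embedded-caption decision of the caption recognition
# unit: genre check AND edge-region text check (regions of FIG. 12).

EDGE_REGIONS = ("P", "Q", "R")  # left, right, and bottom edges

def has_embedded_captions(genre, detect_text_in_region):
    """Return True when caption overlaying should be suppressed."""
    if genre not in ("movie", "drama"):
        return False
    return any(detect_text_in_region(r) for r in EDGE_REGIONS)

# toy detector: pretend text was found only at the bottom edge (R)
found = {"P": False, "Q": False, "R": True}
print(has_embedded_captions("movie", found.get))  # True
print(has_embedded_captions("news", found.get))   # False
```

Requiring both conditions keeps incidental on-screen text in other genres (scoreboards, tickers) from being mistaken for embedded captions.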
 Based on this instruction, for the broadcast program designated by the subtitle recognition unit 32, the image composition unit 18 sends the video signal of the program to the display control unit 19 as-is, without combining the subtitle data from the selector 31 with it. The video signal sent to the display control unit 19 is forwarded to the display device 7 and displayed on the screen. Likewise, for the video determined by the line-of-sight recognition unit 3 to be the one the user is viewing, its video signal is sent as-is to the display device 7 via the display control unit 19 and displayed on the screen. For broadcast programs not designated by the subtitle recognition unit 32, the subtitle data from the selector 31 is combined with the video signal of the program as usual; the composite video signal is sent to the display device 7 via the display control unit 19 and displayed on the screen. When the broadcast program designated by the subtitle recognition unit 32 is itself the video determined by the line-of-sight recognition unit 3 to be the one the user is viewing, it is likewise processed as usual.
 In addition, based on the determination result from the line-of-sight recognition unit 3, the audio output selection unit 22 sends only the audio signal of the video determined to be the one the user is viewing to the speaker 8 via the audio output control unit 23, and that audio is output. As a result, the TV receiver 50 outputs only the audio of the video the user is determined to be viewing, while subtitles are displayed on the other videos.
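The per-area output decision described above (audio for the viewed area, subtitle overlay for the rest) can be summarized as a small routing function. This is an illustrative sketch with hypothetical names, not the patent's implementation:

```python
def route_outputs(display_areas, viewed_area):
    """Decide, per display area, whether to output its audio or to show
    it with generated subtitles instead (and no audio)."""
    decisions = {}
    for area in display_areas:
        if area == viewed_area:
            decisions[area] = "audio"      # sound routed to the speaker
        else:
            decisions[area] = "subtitles"  # video shown muted, with subtitles
    return decisions
```

Exactly one area receives audio; every other area is displayed with its subtitle-composited video signal (subject to the embedded-subtitle exception described above).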
 In this way, when a video to be shown to the user is a broadcast program whose genre is a movie or a drama and whose video contains a character string, it is determined that subtitles are embedded in the video of that program. When a video other than the one the user is determined to be viewing is judged in this way, the video/audio output device 1b is configured not to combine the subtitle data from the selector 31 with the video signal of that program. Therefore, when the video/audio output device 1b displays a TV-broadcast movie or drama in a display area other than the one the user is determined to be viewing and the video of that movie or drama contains a character string, the device can determine that subtitles are embedded in the video, and the subtitle data from the selector 31 is not displayed on the screen. This allows the original subtitles attached to the movie or drama to be shown.
 Although the above description is limited to the case where the genre of the broadcast program is a movie or a drama, the invention is not limited to this: other genres may be added to, or substituted for, these genres. Also, while the above determines that subtitles are embedded in the video of a broadcast program when the program's genre is a movie or a drama and its video contains a character string, the means for determining whether subtitles are embedded in the video is not necessarily limited to this.
 Furthermore, although the video/audio output device 1b described above is configured by adding the subtitle recognition unit 32 to the video/audio output device 1a according to the second embodiment, the configuration is not necessarily limited to this. For example, a configuration in which the subtitle recognition unit 32 is added to the video/audio output device 1 according to the first embodiment also falls within the scope of the present invention.
 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means appropriately modified within the scope of the claims are also included in the technical scope of the present invention.
 Incidentally, it goes without saying that the TV receiver 50 described above can also function as an ordinary TV receiver that displays only a single video on the display screen of the display device 7. Therefore, for example, the device may be configured so that the user can switch, via the remote controller 5, between a mode for viewing multiple videos and a mode for viewing a single video. Although the method by which the user selects channels has not been specifically described, any conventionally known channel selection method can be adopted; for example, the user can select one or more channels to view via the remote controller 5, or can select channels by other means.
 [Summary of Embodiments]
 As described above, the video/audio output device according to one aspect of the present invention further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data. When the program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of the program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data of the program rather than the first subtitle data.
 With the above configuration, when the broadcast wave includes second subtitle data for subtitle display, the video/audio output device preferentially uses the second subtitle data obtained from the broadcast wave. The device thus gives priority to the second subtitle data produced by the broadcaster (more precisely, by the creator of the broadcast program), so the subtitles the program's creator intended can be displayed to the user.
 The video/audio output device according to one aspect of the present invention further includes a second determination unit that determines whether subtitles are already contained in the video displayed in each display area. The synthesis unit does not generate a composite video signal combining the video signal of a video displayed in a display area determined to contain subtitles with the first subtitle data generated from the audio signal of that video; instead, the output unit outputs the video signal of that video to the display unit as-is.
 With the above configuration, when subtitles are already embedded in a video displayed in a display area other than the one the user is determined to be viewing, the video/audio output device does not combine the first subtitle data with the video signal of that video. This allows the original subtitles attached to the video to be displayed.
 The video/audio output device according to one aspect of the present invention further includes a reception unit that receives a broadcast wave, and a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast as well as program information including genre information of the program. When the program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are contained in the video when the genre of the program is a movie or a drama and the video contains a character string.
 With the above configuration, when the genre of the program is a movie or a drama and the video of the program contains a character string, it is determined that subtitles are embedded in the video of the program. As a result, the video/audio output device does not combine the first subtitle data or the second subtitle data with the video signal of that video, so the original subtitles attached to the movie or drama can be displayed.
 In the video/audio output device according to one aspect of the present invention, the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed at that display area for at least a predetermined time.
 With the above configuration, the determination unit does not judge that the user's line of sight has moved to another display area unless the user gazes at that area for at least the predetermined time. That is, while the audio of the video in one display area is being output, the audio output does not switch even if the user moves his or her gaze to look at the videos in other display areas. The user can therefore check what is being shown in the other display areas, and what it is about, while continuing to listen to the audio of the video in one display area.
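The dwell-time behavior can be sketched as a small state machine: the "viewed" area changes only after the gaze has rested on a new area for at least a threshold time. This is an illustrative sketch under stated assumptions; the class name and the threshold value are hypothetical, and the patent does not specify a particular dwell time:

```python
class GazeJudge:
    """Switch the 'viewed' display area only after the gaze has rested on a
    new area for at least `dwell` seconds (hypothetical threshold)."""

    def __init__(self, initial_area, dwell=2.0):
        self.current = initial_area   # area whose audio is being output
        self.dwell = dwell            # required gaze duration in seconds
        self._candidate = None        # area the gaze has moved to
        self._since = None            # time the gaze first landed there

    def update(self, gazed_area, now):
        """Feed one gaze sample; return the area judged to be viewed."""
        if gazed_area == self.current:
            self._candidate = None    # gaze returned; cancel any switch
            return self.current
        if gazed_area != self._candidate:
            self._candidate = gazed_area
            self._since = now         # start timing the new area
        elif now - self._since >= self.dwell:
            self.current = gazed_area # dwell satisfied: switch audio
            self._candidate = None
        return self.current
```

Brief glances at other areas never reach the threshold, so the audio stays with the current area, matching the behavior described above.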
 The video/audio output device according to one aspect of the present invention further includes a camera that records the user, and the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing.
 With the above configuration, the display area the user is viewing can be identified based on video of the user, in particular of the user's eyes.
 The video/audio output device according to one aspect of the present invention further includes a second acquisition unit that acquires the video signal and the audio signal from an externally connected device.
 With the above configuration, video from an externally connected device can also be displayed.
 The specific embodiments and examples given in the detailed description serve only to clarify the technical content of the present invention. The invention should not be interpreted narrowly as being limited to those specific examples; various modifications may be made within the spirit of the invention and the scope of the claims set out below.
 The video/audio output device according to one aspect of the present invention is applicable to, for example, a television receiver, a personal computer, a portable information terminal (PDA: Personal Data Assistant), a mobile phone, or a personal computer with a television function.
1, 1a, 1b  Video/audio output device
2  Camera
3  Line-of-sight recognition unit
4  Speech recognition unit
18  Image composition unit
22  Audio output selection unit
31  Selector
32  Subtitle recognition unit
35  Decoder
40  Eye
50  Television receiver
60  Display screen
100  Content playback device

Claims (9)

  1.  A video/audio output device comprising an output unit that outputs a video signal of a video to a display unit that displays the video and outputs an audio signal of the video to an audio output unit that outputs the audio of the video, wherein
     the display unit includes a plurality of display areas each displaying a different video,
     the device further comprising:
     a subtitle generation unit that generates first subtitle data for subtitle display from the audio signal of the video displayed in each display area;
     a first determination unit that determines which of the plurality of display areas the user is viewing; and
     a synthesis unit that, for each display area other than the display area determined to be viewed by the user, generates a composite video signal by combining the video signal of the video displayed in that display area with the first subtitle data generated from the audio signal of that video,
     wherein the output unit outputs, to the audio output unit, the audio signal of the video displayed in the display area determined to be viewed by the user, and outputs, to the display unit, the video signal of the video displayed in that display area together with the composite video signals for the other display areas.
  2.  The video/audio output device according to claim 1, further comprising:
     a reception unit that receives a broadcast wave; and
     a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast and, when the broadcast wave includes second subtitle data for subtitle display, also acquires the second subtitle data,
     wherein, when the program is displayed in a display area other than the display area determined to be viewed by the user and the first acquisition unit has acquired the second subtitle data of the program, the synthesis unit generates the composite video signal by combining the video signal of the program displayed in that display area with the second subtitle data of the program rather than the first subtitle data.
  3.  The video/audio output device according to claim 1, further comprising a second determination unit that determines whether subtitles are contained in the video displayed in each display area, wherein
     the synthesis unit does not generate a composite video signal combining the video signal of the video displayed in a display area determined to contain subtitles with the first subtitle data generated from the audio signal of that video, and
     the output unit outputs, to the display unit, the video signal of the video displayed in the display area determined to contain subtitles.
  4.  The video/audio output device according to claim 3, further comprising:
     a reception unit that receives a broadcast wave; and
     a first acquisition unit that acquires, from the received broadcast wave, the video signal and the audio signal of a program being broadcast as well as program information including genre information of the program,
     wherein, when the program is displayed in a display area other than the display area determined to be viewed by the user, the second determination unit determines that subtitles are contained in the video when the genre of the program is a movie or a drama and the video contains a character string.
  5.  The video/audio output device according to any one of claims 1 to 4, wherein the first determination unit determines that the user is viewing a display area when it detects that the user's line of sight has been directed at that display area for at least a predetermined time.
  6.  The video/audio output device according to any one of claims 1 to 5, further comprising a camera that records the user,
     wherein the first determination unit determines, based on the video recorded by the camera, which of the plurality of display areas the user is viewing.
  7.  The video/audio output device according to any one of claims 1 to 6, further comprising a second acquisition unit that acquires the video signal and the audio signal from an externally connected device.
  8.  A television receiver comprising the video/audio output device according to any one of claims 1 to 7.
  9.  A video/audio output method for outputting a video signal of a video to a display unit that displays the video and outputting an audio signal of the video to an audio output unit that outputs the audio of the video, wherein the display unit includes a plurality of display areas each displaying a different video, the method comprising:
     a generation step of generating subtitle data for subtitle display from the audio signal of the video displayed in each display area;
     a determination step of determining which of the plurality of display areas the user is viewing;
     a synthesis step of generating, for each display area other than the display area determined in the determination step to be viewed by the user, a composite video signal by combining the video signal of the video displayed in that display area with the subtitle data generated from the audio signal of that video; and
     an output step of outputting, to the audio output unit, the audio signal of the video displayed in the display area determined in the determination step, and outputting, to the display unit, the video signal of the video displayed in that display area together with the composite video signals for the other display areas.
PCT/JP2011/076814 2010-11-26 2011-11-21 Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device WO2012070534A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010-263890 2010-11-26
JP2010263890 2010-11-26

Publications (1)

Publication Number Publication Date
WO2012070534A1

Family

ID=46145876

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/076814 WO2012070534A1 (en) 2010-11-26 2011-11-21 Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device

Country Status (1)

Country Link
WO (1) WO2012070534A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000278626A (en) * 1999-03-29 2000-10-06 Sanyo Electric Co Ltd Multiple screens sound output controller
JP2005203863A (en) * 2004-01-13 2005-07-28 Casio Comput Co Ltd Television broadcast receiving device
JP2005252365A (en) * 2004-03-01 2005-09-15 Sony Corp Image signal processing apparatus, image signal processing method, program and medium recording the same
JP2007013725A (en) * 2005-06-30 2007-01-18 Toshiba Corp Video display device and video display method
JP2010109852A (en) * 2008-10-31 2010-05-13 Hitachi Ltd Video indexing method, video recording and playback device, and video playback device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3886444A4 (en) * 2018-11-27 2022-07-13 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method and apparatus, and electronic device and computer-readable medium
US11418832B2 (en) 2018-11-27 2022-08-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method, electronic device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
JP4796209B1 (en) Display device, control device, television receiver, display device control method, program, and recording medium
WO2011125905A1 (en) Automatic operation-mode setting apparatus for television receiver, television receiver provided with automatic operation-mode setting apparatus, and automatic operation-mode setting method
US8750386B2 (en) Content reproduction device
JP2014072586A (en) Display device, display method, television receiver, program, and recording medium
JP5362834B2 (en) Display device, program, and computer-readable storage medium storing program
JP2007311942A (en) Content display apparatus
US20180241925A1 (en) Reception device, reception method, and program
JPWO2012001905A1 (en) Playback apparatus, voice selection method, and voice selection program
US20120194633A1 (en) Digital Broadcast Receiver
JP2007282077A (en) Broadcast receiver and broadcast receiving method
JP2011166315A (en) Display device, method of controlling the same, program, and recording medium
WO2011118837A1 (en) Display device and control method of same, television, program and storage medium
WO2012070534A1 (en) Video image and audio output device, and video image and audio output method, as well as television image receiver provided with the video image and audio output device
JP2010124429A (en) Video processing apparatus, video processing method and video processing program
KR101798961B1 (en) Broadcasting Signal Receiver and Driving Method thereof
JP5082562B2 (en) Digital broadcast receiving method and apparatus
WO2011013669A1 (en) Display device, program, and computer-readable storage medium having program recorded therein
JP2014150434A (en) Image output device
KR20100072681A (en) Apparatus and method for image displaying in image display device
KR20170106740A (en) Apparatus and method for playing subtitle based on gaze
KR20090002810A (en) Method for storing the broadcast on a data broadcast and a imaging apparatus having the same
KR101124735B1 (en) Apparatus and method for displaying subtitle and digital caption in digital display device
JP2008099091A (en) Image processing method and television receiver
JP2015126384A (en) Electronic apparatus, image display apparatus, and display method of the same
KR20100043581A (en) Apparatus and method for output controlling main screen and sub screen of television

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11843782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11843782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP