WO2006022071A1 - Video display and video displaying method - Google Patents


Info

Publication number
WO2006022071A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
display
video
information
subtitle
Prior art date
Application number
PCT/JP2005/011423
Other languages
French (fr)
Japanese (ja)
Inventor
Tatsuya Nishi
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Publication of WO2006022071A1 publication Critical patent/WO2006022071A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4348 Demultiplexing of additional data and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/44 Receiver circuitry for the reception of television signals according to analogue transmission standards
    • H04N5/60 Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals

Definitions

  • the present invention relates to a video display device and a video display method, and more particularly to a video display device and a video display method for displaying subtitles.
  • Patent Document 1 discloses such a technique, as shown in FIG. 1.
  • Patent Document 2, as shown in FIG. 2, discloses a technique in which a balloon frame corresponding to a person displayed in the image is shown, data obtained by converting speech into characters is associated with the speaker of that speech, and the subtitles (character data) are displayed in the balloon frame.
  • This makes it easy to identify speakers, which is difficult with subtitles alone, and makes the program contents easy to understand even in a silent state.
  • Patent Document 1 Japanese Patent Laid-Open No. 2004-056286
  • Patent Document 2 Japanese Unexamined Patent Application Publication No. 2004-080069
  • An object of the present invention is to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
  • To achieve this, the video display device of the present invention comprises a display processing unit that creates a subtitle display image associating subtitles with speaker information allowing the viewer to recognize the speaker of each subtitle and that synthesizes the created subtitle display image with the video, and a display unit that displays the image synthesized by the display processing unit.
  • FIG. 1 is a diagram showing an image display method disclosed in Patent Document 1.
  • FIG. 2 is a diagram showing an image display method disclosed in Patent Document 2.
  • FIG. 3 is a block diagram showing a configuration of a broadcasting system according to Embodiment 1 of the present invention.
  • FIG. 4 is a conceptual diagram showing the processing of the caption processing unit shown in FIG. 3.
  • FIG. 6 is a conceptual diagram showing the processing of the speaker information extraction unit shown in FIG. 3.
  • FIG. 7 is a conceptual diagram showing the processing of the display processing unit shown in FIG. 3.
  • FIG. 8 is a flowchart showing the processing procedure of the caption processing unit shown in FIG. 3.
  • FIG. 13 is a block diagram showing a configuration of the second embodiment of the present invention.
  • FIG. 14 is a block diagram showing the configuration of the third embodiment of the present invention.
  • FIG. 15 is a block diagram showing the configuration of the third embodiment of the present invention.
  • FIG. 16 is a block diagram showing a configuration of a fourth embodiment of the present invention.
  • FIG. 17 is a block diagram showing the configuration of the fourth embodiment of the present invention.
  • FIG. 18 is a block diagram showing the configuration of the fourth embodiment of the present invention.
  • FIG. 19 is a block diagram showing a configuration of a fifth embodiment of the present invention.
  • FIG. 20 is a block diagram showing a configuration of a sixth embodiment of the present invention.
  • FIG. 21 is a block diagram showing the configuration of the seventh embodiment of the present invention.
  • FIG. 22 is a block diagram showing the configuration of the seventh embodiment of the present invention.
  • FIG. 23 is a block diagram showing a configuration of the eighth embodiment of the present invention.
  • FIG. 24 is a block diagram showing the configuration of the ninth embodiment of the present invention.
  • FIG. 25B is a diagram showing the subtitle display order in descending order.
  • FIG. 3 shows the configuration of the broadcasting system according to Embodiment 1 of the present invention.
  • the input device 101 is a camera, a microphone, a keyboard, or the like, through which caption information, video / audio content, and data content are input.
  • The video encoding unit 102 encodes the video information in the video/audio content using a compression method such as MPEG-2, MPEG-4, or H.264, and outputs the encoded video information to the multiplexing processing unit 104.
  • the audio encoding unit 103 encodes audio information in the video / audio content using a compression method such as AAC, and outputs the encoded audio information to the multiplexing processing unit 104.
  • The multiplexing processing unit 104 multiplexes the video information output from the video encoding unit 102, the audio information output from the audio encoding unit 103, and broadcast contents such as other program information, program identification information, text information, and image information (hereinafter referred to as "other broadcast contents"), and outputs the multiplexed signal to the transmission path coding unit 105.
  • The transmission path coding unit 105 performs transmission processing such as encoding and modulation on the signal output from the multiplexing processing unit 104, and transmits a broadcast wave from the antenna 106.
  • The tuner unit 112 extracts the frequency signal of the channel specified by the user from the broadcast wave received via the antenna 111, and performs demodulation processing on the extracted frequency signal.
  • The demodulated signal is output to the demultiplexing unit 113.
  • The demultiplexing unit 113 separates the signal output from the tuner unit 112 into subtitle information, video information, and other broadcast contents; it outputs the separated subtitle information to the subtitle processing unit 115, the video information to the video processing unit 117, and the other broadcast contents to the speaker information extraction unit 118.
  • The caption information includes speaker identification information, which is information such as an ID for identifying the speaker, together with the caption itself; the other broadcast contents include speaker information, which allows the viewer to recognize the speaker of a caption, together with the corresponding speaker identification information.
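The pairing of the two streams can be sketched as a pair of records (illustrative Python; the class and field names are assumptions, not terms from the disclosure):

```python
from dataclasses import dataclass

@dataclass
class CaptionInfo:
    """Caption information carried in the subtitle stream."""
    speaker_id: str  # speaker identification information, e.g. an ID
    text: str        # the caption itself

@dataclass
class SpeakerInfo:
    """Speaker information carried in the other broadcast contents."""
    speaker_id: str  # the same ID, linking this record to captions
    display: str     # something the viewer can recognize, e.g. a name or icon

caption = CaptionInfo(speaker_id="spk1", text="Hello")
speaker = SpeakerInfo(speaker_id="spk1", display="Alice")
# the shared speaker_id is what lets captions be attributed to speakers
assert caption.speaker_id == speaker.speaker_id
```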
  • the timer 114 measures the current time, and notifies the caption processing unit 115 and the display processing unit 120 of the measured current time.
  • The caption processing unit 115 stores the caption information output from the demultiplexing unit 113 in the caption history storage unit 116 for each speaker, based on the speaker identification information. At this time, the current time notified from the timer 114 is also stored in the caption history storage unit 116 as the display time of the caption information. In addition, the area for displaying the caption of each speaker is set as a speaker frame; the position at which the speaker frame is displayed (hereinafter simply referred to as the "display position") is determined, and the determined display position is also stored in the caption history storage unit 116.
  • FIG. 4 conceptually shows the processing of the caption processing unit 115. In this embodiment, as shown in FIG. 5, three speaker frames are prepared, with display positions "1", "2", and "3" in order from the top; the display position is determined as the lowest number among the vacant display positions.
  • The caption history storage unit 116 manages the speaker identification information, the display time, the display position, and the caption as a set of caption display information in a table, as shown in the figure.
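The table management and the "lowest vacant position" rule described above can be sketched as follows (illustrative Python; the row structure follows the table described above, while the names and types are assumptions):

```python
NUM_FRAMES = 3  # three speaker frames with positions 1..3, top to bottom

class CaptionHistory:
    """Rough model of the caption history storage unit 116: one row per
    speaker holding caption text, display time, and display position."""

    def __init__(self):
        self.rows = {}  # speaker_id -> {"text", "display_time", "position"}

    def store(self, speaker_id, text, now):
        if speaker_id in self.rows:
            # existing speaker: update the caption and its display time
            self.rows[speaker_id].update(text=text, display_time=now)
            return
        used = {r["position"] for r in self.rows.values()}
        vacant = [p for p in range(1, NUM_FRAMES + 1) if p not in used]
        if not vacant:
            raise RuntimeError("no vacant speaker frame")
        # the lowest number among the vacant display positions is chosen
        self.rows[speaker_id] = {"text": text, "display_time": now,
                                 "position": vacant[0]}

h = CaptionHistory()
h.store("A", "hi", now=1.0)
h.store("B", "hello", now=2.0)
assert h.rows["A"]["position"] == 1 and h.rows["B"]["position"] == 2
```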
  • The video processing unit 117 decodes the video stream output from the demultiplexing unit 113, which is encoded with H.264 or the like, and outputs the decoded signal to the display processing unit 120.
  • The speaker information extraction unit 118 extracts the speaker identification information and the speaker information from the data output from the demultiplexing unit 113, and stores the extracted speaker identification information and speaker information as a pair in the speaker information storage unit 119.
  • Figure 6 schematically shows the processing performed by the speaker information extraction unit 118. As shown in FIG. 6, the speaker information storage unit 119 manages the speaker identification information and the speaker information as a set in a table.
  • The display processing unit 120 divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video; the video output from the video processing unit 117 is arranged in the video display area, the subtitle display information stored in the caption history storage unit 116 and the speaker information stored in the speaker information storage unit 119 are arranged in the subtitle display area, and these display images are synthesized.
  • The speaker frames are sorted based on the time notified from the timer 114 and the display times stored in the caption history storage unit 116. Since the display processing unit 120 separates the subtitle display and the video display on the same screen, the video display and the subtitle display do not overlap, preventing either the video or the subtitles from being hidden.
  • the synthesized image is output to the display unit 121.
  • FIG. 7 conceptually shows the processing of the display processing unit 120.
  • The display processing unit 120 dynamically allocates the speaker frames according to the presence or absence of speech, arranging a speaker frame only while its speaker is speaking. When no speaker is speaking, no speaker frame is placed, so that if the aspect ratio of the video differs from that of the video display device, the surplus area can be used effectively as a subtitle display area.
  • the display unit 121 displays the composite image output from the display processing unit 120.
  • In step (hereinafter abbreviated as "ST") 131, subtitles that have been displayed for more than a specified time (for example, 5 seconds) are deleted from the subtitle display information in the caption history storage unit 116, and the process moves to ST132.
  • FIG. 9 shows how subtitles are deleted.
  • In ST132, subtitle display information for which a specified time has passed since its subtitle was deleted, that is, entries from which only the subtitle has already been removed, is deleted, and the process moves to ST133.
  • FIG. 10 shows how the subtitle display information is deleted.
  • The specified deletion time of a caption may be made equal to the specified deletion time of the caption display information.
  • Alternatively, the specified deletion time may be the time from when a subtitle of a second speaker, different from the first speaker, is displayed while the subtitle of the first speaker is being displayed until the subtitle of the first speaker is deleted.
  • In ST133, it is determined whether or not new subtitle information has been acquired from the demultiplexing unit 113. If it is determined that new subtitle information has been acquired (YES), the process proceeds to ST134; if it is determined that it has not been acquired (NO), the process returns to ST131, and ST131 to ST133 are repeated until it is determined that new caption information has been acquired.
  • In ST134, it is determined whether or not there is a vacant speaker frame. Specifically, since there is an upper limit on the number of speaker frames that can be displayed, depending on the screen size and other specifications, it is determined whether or not the number of entries stored in the caption history storage unit 116 has reached that upper limit. For example, if the upper limit is 4, it is determined not to be at the upper limit when the number of speakers is 3 or less, and to be at the upper limit when the number of speakers is 4. That is, if the upper limit has not been reached, it is determined that there is a vacant speaker frame (YES), and the process proceeds to ST136; if it has been reached, it is determined that there is no vacant speaker frame (NO), and the process proceeds to ST135.
  • In ST136, it is determined whether or not the same speaker identification information as that included in the new caption information acquired from the demultiplexing unit 113 exists in the caption history storage unit 116, that is, whether or not it is already stored. If it is determined that it exists (YES), the process proceeds to ST138; if it is determined that it does not exist (NO), the process proceeds to ST137.
  • In ST137, new caption display information is recorded in a free area of the caption history storage unit 116 based on the new caption information.
  • In ST138, for the caption display information stored in the caption history storage unit 116 that includes the same speaker identification information as the newly acquired caption information, it is determined whether or not a subtitle is still present.
  • Subtitles past the specified time have already been deleted in ST131, and in ST132 caption display information still within its specified time has only its subtitle deleted; therefore, it is determined here whether or not only the subtitle has been deleted. If it is determined that a subtitle is present (YES), the process proceeds to ST140; if it is determined that no subtitle is present (NO), the process proceeds to ST139.
  • In ST139, the new caption information is stored in the caption display information (which no longer includes a caption) containing the same speaker identification information in the caption history storage unit 116.
  • In ST140, it is determined whether or not the display position next to the lowest display position corresponding to the same speaker identification information stored in the caption history storage unit 116 is vacant. For example, if the lowest display position of the same speaker identification information stored in the caption history storage unit 116 is the second from the top, it is determined whether or not the next display position, that is, the third from the top, is vacant. If it is determined to be vacant (YES), the process proceeds to ST142; if it is determined not to be vacant (NO), the process proceeds to ST141. If the same speaker identification information is at the lowest display position and there is no next display position, it is determined not to be vacant (NO).
  • In ST141, since it has been determined in ST140 that the display position next to the lowest display position of the same speaker identification information stored in the caption history storage unit 116 is not vacant, a space is created at that display position. Specifically, if the lowest display position of the same speaker identification information is the second from the top and the next display position, that is, the third from the top, is not vacant, the caption display information at the third position is shifted down to the fourth position, and the entries at the fourth position and below are likewise shifted down.
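The deletion and placement steps above can be condensed into a sketch (illustrative Python covering roughly ST131 through ST138; the 5-second expiry and the upper limit of 4 follow the examples in the text, and the position shifting of ST140 to ST142 is omitted):

```python
EXPIRY = 5.0      # ST131 example: delete subtitles shown for more than 5 seconds
MAX_FRAMES = 4    # ST134 example: upper limit of displayable speaker frames

def process_new_caption(history, speaker_id, text, now):
    """One pass of the flowchart. `history` maps
    speaker_id -> {"text", "display_time"}."""
    # ST131/ST132: delete caption entries displayed longer than the limit
    for sid in list(history):
        if now - history[sid]["display_time"] > EXPIRY:
            del history[sid]
    # ST134: check for a vacant speaker frame
    if speaker_id not in history and len(history) >= MAX_FRAMES:
        # no vacancy: delete the oldest displayed frame to make room,
        # as in the aspect where a new speaker replaces an old frame
        oldest = min(history, key=lambda s: history[s]["display_time"])
        del history[oldest]
    # ST136-ST138: reuse the row for a known speaker, else record a new one
    history[speaker_id] = {"text": text, "display_time": now}

h = {}
process_new_caption(h, "A", "first", now=0.0)
process_new_caption(h, "A", "second", now=2.0)
assert h["A"]["text"] == "second"
process_new_caption(h, "B", "late", now=10.0)  # "A" has expired by now
assert "A" not in h and "B" in h
```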
  • the caption processing unit 115 dynamically stores the caption information for each speaker in the caption history storage unit 116, and deletes the caption information stored in the caption history storage unit 116 in order of age.
  • In addition, one image area is divided into a caption display area and a video display area, and information indicating each speaker is associated with the captions indicating the content of that speaker's speech.
  • A plurality of icons, speaker frames, fonts, character colors, character sizes, and so on may be prepared, with display designation information specifying which of them is to be used.
  • In this case, the speaker information extraction unit 118 extracts the speaker identification information together with the display designation information from the other broadcast contents.
  • The subtitle display image is then created according to the extracted display designation information.
  • When the broadcast contents do not contain such information, default information prepared in advance is used.
  • When the display processing unit 120 displays the speaker frames of a plurality of speakers and one speaker's speech continues, the speaker frames of the other speakers may be deleted and the speaker frame of the continuously speaking speaker may be expanded into the freed area. If the speaker frame has been expanded to the maximum and the speech still continues, the subtitles are scrolled. Thereby, a long subtitle can be displayed.
  • The other broadcast contents include time information such as a time control mode (TMD) and a display start time (STM); the case where this time information is used will now be explained.
  • the speaker information extraction processing unit 151 extracts time information from the data output from the demultiplexing unit 113.
  • the extracted time information is output to the caption processing unit 115 and the display processing unit 120.
  • the video display device 150 can omit the timer for measuring the current time, and the device scale can be reduced.
  • The storage device 161 is a DVD (Digital Versatile Disc), an SD card, a hard disk, or the like, in which video/audio content and data content are stored.
  • the video display device 160 can simultaneously display video, speaker information, and subtitles using the video / audio content and data content stored in the storage device 161.
  • The video display device 165 may have a reception recording function in which a broadcast wave is received, and the received signal demodulated by the tuner unit 112 is recorded by the recording processing unit 166 and stored in the storage device 161. In this case, the received broadcast wave may be demodulated and displayed in real time, or stored in the storage device 161 and displayed later.
  • FIG. 16 shows the configuration of a broadcasting system according to Embodiment 4 of the present invention.
  • The communication unit 171 transmits and receives video/audio content and data content to and from the server 180 via a communication network such as the Internet.
  • The communication method of the communication unit 171 may be of any type, wired or wireless, such as a network adapter, wireless LAN (Local Area Network), Bluetooth, or infrared communication.
  • the server 180 inputs speaker information using a camera, a keyboard, or the like as the input device 181, stores the speaker information in the speaker information storage unit 182, and stores the speaker information in the video display device 170 via the communication unit 183. Send.
  • With this configuration, the video display device 170 can acquire the video/audio content and the caption information from the broadcast wave, and the speaker information from the communication network. Thus, when the speaker information of a program is acquired in advance via the network or data broadcasting and the video of that program is later played, the viewer can easily grasp the program contents using the acquired speaker information.
  • the communication unit 171 may acquire the video / audio content, the caption information, and the speaker information from the communication network.
  • subtitle information and speaker information may be acquired from the communication network, and video / audio content may be acquired from the broadcast wave.
  • Thus, video, speaker information, and subtitles can be displayed at the same time even for analog broadcasting, in which such information is not included.
  • FIG. 19 shows the configuration of a broadcasting system according to Embodiment 5 of the present invention.
  • The authentication processing unit 192 acquires the authentication information input by the user from the input device 191, and sends an inquiry about the acquired authentication information to the speaker information distribution device 200 via the communication unit 171.
  • The speaker information distribution device 200 receives the authentication inquiry from the video display device 190 via the communication unit 201, and the authentication processing unit 202 collates the authentication information; if the collation succeeds, a plurality of types of speaker information stored in the speaker information storage unit 203 are transmitted. The speaker information stored in the speaker information storage unit 203 is input from the input device 204.
  • The storage device 193 in the video display device 190 is an SD card or the like having a secure area; it stores the plurality of types of speaker information acquired from the speaker information distribution device 200, program identification information for the programs using that speaker information (program name, broadcast station name, channel, start time, end time, other IDs, etc.), and the authentication information input from the input device 191, and only the authentication processing unit 192 can access it.
  • The authentication processing unit 192 accesses the storage device 193, reads the information stored there, and writes the speaker information to the speaker information storage unit 119.
  • the authentication processing unit 192 deletes the information written in the speaker information storage unit 119.
  • The information handled in this authentication processing is the plurality of types of speaker information.
  • only a video display device that has been successfully authenticated can obtain a plurality of types of speaker information, and can perform rich subtitle display.
  • FIG. 20 shows the configuration of a broadcast system according to Embodiment 6 of the present invention.
  • The video display device 210 includes a first communication unit 171 connected to the speaker information distribution device 200 via a communication network such as the Internet, and a second communication unit 211 that communicates using a non-contact IC card such as Suica (registered trademark), a wireless tag, infrared rays, or the like.
  • When the key distribution device 220 receives a key acquisition request from the video display device 210 via the communication unit 221, it distributes the key (or the key together with the address of the speaker information distribution device) managed by the key distribution management unit 222 to the video display device 210, and notifies the speaker information distribution device 200 of this distribution. The authentication information (key, ID) managed by the key distribution management unit 222 is input from the input device 223.
  • The speaker information distribution device 200 receives the key distribution notification from the key distribution device 220 and adds the information to the authentication information managed by the authentication processing unit 202. Upon receiving an authentication inquiry using a key from the video display device 210, the authentication processing unit 202 performs authentication, and the plurality of types of speaker information stored in the speaker information storage unit 203 are transmitted only to a video display device that has been successfully authenticated.
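The key handshake can be modeled as follows (a toy in-memory sketch in Python; the class names and the direct method calls standing in for network messages are assumptions):

```python
class SpeakerInfoDistributor:
    """Stands in for the speaker information distribution device 200."""
    def __init__(self, speaker_info):
        self.valid_keys = set()        # authentication info kept by unit 202
        self.speaker_info = speaker_info

    def register(self, key):
        # key distribution notification from the key distribution device
        self.valid_keys.add(key)

    def fetch(self, key):
        # only a successfully authenticated display device gets the data
        if key not in self.valid_keys:
            raise PermissionError("authentication failed")
        return self.speaker_info

class KeyDistributor:
    """Stands in for the key distribution device 220."""
    def __init__(self, distributor):
        self.distributor = distributor
        self.count = 0

    def request_key(self):
        self.count += 1
        key = f"key-{self.count}"
        self.distributor.register(key)  # notify the speaker info distributor
        return key

dist = SpeakerInfoDistributor({"spk1": "Alice"})
kd = KeyDistributor(dist)
key = kd.request_key()                 # the display device acquires a key
assert dist.fetch(key) == {"spk1": "Alice"}
```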
  • With this configuration, a video display device that has acquired a key from the key distribution device 220 can obtain rich speaker information and perform subtitle display using a plurality of types of speaker information. Services can therefore be provided such as giving keys to users who have purchased multiple sets of speaker information, or distributing keys at a store when goods related to the program are purchased.
  • In this way, only a video display device that has acquired the key can obtain the plurality of types of speaker information and perform a rich subtitle display using them.
  • The audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113, and outputs the decoded audio stream to the audio analysis unit 232.
  • The audio analysis unit 232 analyzes the audio stream output from the audio processing unit 231 and outputs analysis results such as volume and pitch to the display processing unit 233. Also, by analyzing the characteristics of the voice, it generates information expressing the speaker's emotions, gender information, and information indicating age (for example, baby, child, adult, or elderly person), and outputs these to the display processing unit 233.
  • the display processing unit 233 creates a caption display image using the audio analysis result output from the audio analysis unit 232.
  • For example, the volume is associated with the character size, and the pitch is associated with the character color.
  • Information representing emotions is associated with fonts, and gender is associated with highlight colors.
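A minimal sketch of such a mapping (illustrative Python; the thresholds and style vocabulary are assumptions, since the text only fixes which analysis result controls which attribute):

```python
def decorate(analysis):
    """Map each audio-analysis result to one caption attribute.
    Thresholds and style names are invented for illustration."""
    style = {}
    # volume -> character size
    style["size"] = "large" if analysis.get("volume", 0) > 0.7 else "normal"
    # pitch -> character color
    style["color"] = "red" if analysis.get("pitch", 0) > 300 else "black"
    # emotion -> font
    style["font"] = {"angry": "bold", "happy": "rounded"}.get(
        analysis.get("emotion"), "default")
    # gender -> highlight color
    style["highlight"] = {"male": "blue", "female": "pink"}.get(
        analysis.get("gender"), "none")
    return style

s = decorate({"volume": 0.9, "pitch": 200, "emotion": "angry", "gender": "male"})
assert s == {"size": "large", "color": "black", "font": "bold", "highlight": "blue"}
```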
  • The decoration corresponding to each item of the voice analysis result is not limited to these examples.
  • Thus, in the seventh embodiment, by visually reflecting the result of analyzing the speaker's voice in the caption display image, information other than characters can be conveyed in the captions, and the program contents can be grasped more easily.
  • The video display device 235 may further include a video analysis unit 236; the video analysis unit 236 analyzes the video stream output from the video processing unit 117, and the display processing unit 233 may apply decoration corresponding to analysis results such as the size of the speaker in the image.
  • the display processing unit 233 may perform decoration corresponding to scenes such as morning, noon, night, sea, mountain, and soccer.
  • The audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113, and outputs the decoded audio stream to the speaker analysis unit 241.
  • the speaker analysis unit 241 detects a speaker from video and audio, and extracts an image of the speaker.
  • The extracted image is enlarged or reduced to a specified size and used as the speaker information.
  • The speaker information is stored in the caption history storage unit 116 together with the caption information processed by the caption processing unit 115. The technique for detecting a speaker from video and audio is assumed to use existing technology, for example, the technology described in Patent Document 1.
  • The voice recognition unit 251 performs voice recognition on the audio stream output from the audio processing unit 231, converting it into character information to generate caption information.
  • The generated caption information is stored in the caption history storage unit 116.
  • Thereby, caption display can be performed even when the speaker information and the caption information are not included in the broadcast wave.
  • The subtitle display order may run from the top of the subtitle display area (ascending order) or, as shown in FIG. 25B, from the bottom of the subtitle display area (descending order).
  • Further, the size of the speaker frames may be changed step by step, their colors may be changed step by step from bright to plain, the subtitle character color may be lightened gradually, the font size may be reduced step by step, or the display order may be numbered. Thereby, the user can recognize the display order of the subtitles without reading them. The user may also be allowed to set the display order (ascending or descending) of the captions.
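The two orderings of FIG. 25 can be sketched as one sort plus an optional reversal (illustrative Python):

```python
def display_order(rows, ascending=True):
    """Order speaker frames for rendering, oldest caption first.
    With ascending=True the sequence fills the subtitle area from the
    top; with ascending=False the list is reversed so the same sequence
    fills the area from the bottom, as in FIG. 25B."""
    ordered = sorted(rows, key=lambda r: r["display_time"])
    return ordered if ascending else ordered[::-1]

rows = [{"who": "B", "display_time": 2}, {"who": "A", "display_time": 1}]
assert [r["who"] for r in display_order(rows)] == ["A", "B"]
assert [r["who"] for r in display_order(rows, ascending=False)] == ["B", "A"]
```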
  • As described above, according to the present invention, a subtitle display image is created in which subtitles are associated with speaker information that allows a viewer to recognize the speaker of each subtitle, and the created subtitle display image is synthesized with the video.
  • Since each subtitle can thus be associated with its speaker in the video, the viewer can easily grasp the program contents in a silent state, even while the speaker is displayed in the video.
  • A second aspect of the present invention is a video display device comprising speaker information acquisition means for acquiring speaker information, and speaker information storage means for storing the acquired speaker information.
  • A third aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
  • A fourth aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information along with reception of a program.
  • A fifth aspect of the present invention is a video display device in which the display processing unit divides one image area into a subtitle display area and a video display area different from the subtitle display area, arranges video including a speaker in the video display area, and arranges the subtitles and the speaker information corresponding to that speaker in the subtitle display area.
  • A sixth aspect of the present invention is a video display device in which the display processing unit dynamically arranges a speaker frame, which is a region for displaying the captions of each speaker, according to the presence or absence of speech by that speaker.
  • A seventh aspect of the present invention is a video display device in which the display processing means, after displaying the subtitle of a second speaker different from a first speaker while the subtitle of the first speaker is being displayed, deletes the subtitle of the first speaker.
  • An eighth aspect of the present invention is a video display device in which, when the number of displayed speaker frames has reached its upper limit and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
  • A ninth aspect of the present invention is a video display device in which the display processing means creates a caption display image based on display designation information indicating which of a plurality of types of speaker information, each allowing a viewer to recognize the same speaker, acquired by the speaker information acquisition means is to be used.
  • Rich subtitle display can be performed by designating the speaker information to use from among a plurality of types that allow the viewer to recognize the same speaker.
  • A tenth aspect of the present invention is a video display device in which, when speaker frames of a plurality of speakers are displayed and one speaker's speech continues, the display processing means deletes the speaker frame of another speaker and extends the continuing speaker's frame into the deleted area.
  • By deleting another displayed speaker frame and expanding the frame of the speaker whose speech continues into the deleted area, a long subtitle can be displayed.
  • An eleventh aspect of the present invention is a video display device that, in the above aspect, includes an analysis unit for analyzing the video or audio, the display processing unit decorating the subtitles based on the analysis result of the analysis unit.
  • A twelfth aspect of the present invention is a video display device in which the display processing means associates the display order of the caption display information with the decoration of the captions, and decorates the captions according to the display order of the caption display information.
  • A thirteenth aspect of the present invention is a broadcast system comprising a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device that receives the broadcast wave transmitted from the broadcast wave transmission device and has display processing means for creating a subtitle display image associating the subtitles with the speaker information and combining it with the video, and display means for displaying the image combined by the display processing means.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program content can be easily grasped in a silent state.
  • A fourteenth aspect of the present invention is a video display device comprising a recording device that records video, captions, and speaker information that allows a viewer to recognize the caption speaker; a display processing unit that creates a subtitle display image in which the captions included in the recorded information are associated with the speaker information, and combines the created subtitle display image with the video; and a display unit that displays the image combined by the display processing unit.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program contents can be easily grasped in a silent state.
  • A fifteenth aspect of the present invention is an authentication system comprising an authentication device that performs authentication processing, and a video display device that, once authenticated by the authentication device, acquires a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker, and that comprises a display processing unit for combining a subtitle display image with the video and a display unit for displaying the image combined by the display processing unit.
  • Since a video display device authenticated by the authentication device acquires a plurality of different pieces of speaker information that allow the viewer to recognize the same speaker, the authenticated video display device can perform rich subtitle display.
  • Another aspect of the present invention is a video display method comprising a display processing step of creating a subtitle display image in which subtitles are associated with speaker information that allows a viewer to recognize the subtitle speaker, and combining the created subtitle display image with the video; and a display step of displaying the image combined in the display processing step.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program contents can be easily grasped in a silent state.
  • The video display device and video display method according to the present invention have the effect of allowing the viewer to easily grasp the program contents even in a silent state, and can be applied to mobile phones and other devices with small screen sizes.
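As a loose illustration of the ascending/descending display order and the numbering decoration described in the aspects above, the following Python sketch arranges a chronological list of captions for the subtitle display area. It is hypothetical — the patent specifies no implementation, and every name here is invented:

```python
def layout_captions(captions, order="ascending", number=True):
    """Arrange captions (given oldest-first) for the subtitle display area.

    "ascending" fills the area from the top (FIG. 25A); "descending" fills it
    from the bottom, so rows appear newest-first (FIG. 25B). Numbering each
    caption with its chronological rank lets the user see the display order
    without reading the subtitles.
    """
    rows = [f"{i + 1}. {text}" for i, text in enumerate(captions)] if number else list(captions)
    if order == "descending":
        rows.reverse()
    return rows

print(layout_captions(["Hello", "Hi there", "Good evening"], order="descending"))
# → ['3. Good evening', '2. Hi there', '1. Hello']
```

Other decorations mentioned in the aspects (color fading, step-wise font-size reduction) would attach analogously, keyed on the same chronological rank.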

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Television Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video display that enables the viewer to easily grasp the program content even in a silent state. In the video display, a demultiplexing section (113) outputs data, and the subtitle information in that data, classified by speaker according to speaker identification information, is stored in a subtitle record storage section (116). From the data output by the demultiplexing section (113), a speaker information extracting section (118) extracts the speaker identification information and the speaker information, and each pair of speaker identification information and speaker information is stored in a speaker information storage section (119). A display processing section (120) divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video, allocates the video output from a video processing section (117) to the video display area, allocates the subtitle display information stored in the subtitle record storage section (116) and the speaker information stored in the speaker information storage section (119) to the subtitle display area, and combines these display images. The combined image is displayed on the display section (121).

Description

Specification
Video display device and video display method
Technical field
[0001] The present invention relates to a video display device and a video display method, and more particularly to a video display device and a video display method for displaying subtitles.
Background art
[0002] In recent years, small mobile terminals such as mobile phones capable of receiving television have become widespread, and a user can watch TV without being restricted to a particular location, while moving or at a destination, as long as the terminal can receive the broadcast signal.
[0003] When TV viewing on such a small portable terminal is assumed, viewing in public spaces is also conceivable. In particular, in places where consideration for the surroundings is required, such as on public transportation or while waiting at a hospital, it is necessary to watch in such a way that the TV audio does not reach the people nearby.
[0004] In such places, headphones are generally worn to prevent the sound from reaching the surroundings. However, taking out and putting on headphones takes time and effort, and when only a short viewing time is expected, the use of headphones is not desirable.
[0005] It is also conceivable to watch without headphones, with the sound muted, by using subtitle broadcasting or the like. As such techniques, Patent Document 1 discloses a technique as shown in FIG. 1, and Patent Document 2 discloses, as shown in FIG. 2, a technique of displaying a speech-balloon frame corresponding to a person displayed in the image, associating data obtained by converting speech into text with the speaker of that speech, and displaying the subtitles (character data) in the balloon frame. This makes it easy to identify the speaker, which is difficult with subtitles alone, and makes the program contents easy to grasp even in a silent state.
Patent Document 1: Japanese Patent Application Laid-Open No. 2004-056286
Patent Document 2: Japanese Patent Application Laid-Open No. 2004-080069
Disclosure of the invention
Problems to be solved by the invention

[0006] However, with the techniques disclosed in Patent Document 1 and Patent Document 2, the video is hidden because balloon frames are displayed over it. In particular, on a small portable terminal the display screen is also small, so much of the screen is occupied by balloon frames and important video is hidden. In addition, depending on how the program is produced, the content displayed in the subtitles does not necessarily match the speech of the person displayed in the video; in such a case, the techniques disclosed in Patent Documents 1 and 2 cannot associate the speaker with a balloon frame.
[0007] An object of the present invention is to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
Means for solving the problem
[0008] The video display device of the present invention employs a configuration comprising display processing means for creating a subtitle display image in which subtitles are associated with speaker information that allows the viewer to recognize the subtitle speaker, and for combining the created subtitle display image with the video; and display means for displaying the image combined by the display processing means.
[0009] According to this configuration, since the subtitles can be associated with the speakers in the video, the viewer can recognize even a speaker who is not displayed in the video, and can easily grasp the program contents in a silent state.
Effect of the invention
[0010] According to the present invention, it is possible to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
Brief description of the drawings
[0011]
[FIG. 1] A diagram showing the image display method disclosed in Patent Document 1
[FIG. 2] A diagram showing the image display method disclosed in Patent Document 2
[FIG. 3] A block diagram showing the configuration of a broadcasting system according to Embodiment 1 of the present invention
[FIG. 4] A conceptual diagram showing the processing of the caption processing unit shown in FIG. 3
[FIG. 5] A diagram showing the display positions of the speaker frames
[FIG. 6] A conceptual diagram showing the processing of the speaker information extraction unit shown in FIG. 3
[FIG. 7] A conceptual diagram showing the processing of the display processing unit shown in FIG. 3
[FIG. 8] A flowchart showing the processing procedure of the caption processing unit shown in FIG. 3
[FIG. 9] A conceptual diagram showing a state in which a subtitle has been deleted
[FIG. 10] A conceptual diagram showing a state in which subtitle display information has been deleted
[FIG. 11] A conceptual diagram showing display designation information
[FIG. 12] A conceptual diagram showing how display designation information is selected
[FIG. 13] A block diagram showing the configuration of a video display device according to Embodiment 2 of the present invention
[FIG. 14] A block diagram showing the configuration of a video display device according to Embodiment 3 of the present invention
[FIG. 15] A block diagram showing the configuration of a video display device according to Embodiment 3 of the present invention
[FIG. 16] A block diagram showing the configuration of a broadcasting system according to Embodiment 4 of the present invention
[FIG. 17] A block diagram showing the configuration of a video display device according to Embodiment 4 of the present invention
[FIG. 18] A block diagram showing the configuration of a video display device according to Embodiment 4 of the present invention
[FIG. 19] A block diagram showing the configuration of a broadcasting system according to Embodiment 5 of the present invention
[FIG. 20] A block diagram showing the configuration of a broadcasting system according to Embodiment 6 of the present invention
[FIG. 21] A block diagram showing the configuration of a video display device according to Embodiment 7 of the present invention
[FIG. 22] A block diagram showing the configuration of a video display device according to Embodiment 7 of the present invention
[FIG. 23] A block diagram showing the configuration of a video display device according to Embodiment 8 of the present invention
[FIG. 24] A block diagram showing the configuration of a video display device according to Embodiment 9 of the present invention
[FIG. 25A] A diagram showing a case where the subtitle display order is ascending
[FIG. 25B] A diagram showing a case where the subtitle display order is descending
Best mode for carrying out the invention
[0012] Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the embodiments, components having the same functions are given the same reference numerals, and duplicate descriptions are omitted.
[0013] (Embodiment 1)
FIG. 3 shows the configuration of the broadcasting system according to Embodiment 1 of the present invention. First, the configuration of the broadcast wave transmission device 100 will be described. The input device 101 is a camera, microphone, keyboard, or the like, through which caption information, video/audio content, and data content are input.

[0014] The video encoding unit 102 encodes the video information of the video/audio content using a compression method such as MPEG-2, MPEG-4, or H.264, and outputs the encoded video information to the multiplexing processing unit 104.
[0015] The audio encoding unit 103 encodes the audio information of the video/audio content using a compression method such as AAC, and outputs the encoded audio information to the multiplexing processing unit 104.
[0016] The multiplexing processing unit 104 multiplexes the video information output from the video encoding unit 102, the audio information output from the audio encoding unit 103, and the remaining broadcast contents such as program information, program specific information, text information, and image information (hereinafter, "other broadcast contents"), and outputs the multiplexed signal to the transmission path encoding unit 105.
[0017] The transmission path encoding unit 105 performs transmission processing such as encoding and modulation on the signal output from the multiplexing processing unit 104, and transmits a broadcast wave from the antenna 106.
[0018] Next, the configuration of the video display device 110 will be described. The tuner unit 112 extracts the frequency signal of the channel specified by the user from the broadcast wave received via the antenna 111, and demodulates the extracted frequency signal. The demodulated signal is output to the demultiplexing unit 113.
[0019] The demultiplexing unit 113 separates the signal output from the tuner unit 112 into caption information, video information, and other broadcast contents, outputting the separated caption information to the caption processing unit 115, the video information to the video processing unit 117, and the other broadcast contents to the speaker information extraction unit 118. The caption information includes speaker identification information, such as an ID identifying the speaker, together with the caption itself; the other broadcast contents include speaker information, which allows the user to recognize the caption speaker, and the corresponding speaker identification information.
[0020] The timer 114 measures the current time and notifies the caption processing unit 115 and the display processing unit 120 of the measured current time.
[0021] The caption processing unit 115 stores the caption information output from the demultiplexing unit 113 in the caption history storage unit 116 for each speaker, based on the speaker identification information. At this time, the current time notified by the timer 114 is used as the display time of the caption information, and this display time is also stored in the caption history storage unit 116. In addition, the area in which the captions of each speaker are displayed is treated as a speaker frame; the position at which the speaker frame is displayed (hereinafter simply "display position") is determined, and the determined display position is also stored in the caption history storage unit 116. FIG. 4 conceptually shows the processing of the caption processing unit 115. In this embodiment, as shown in FIG. 5, three speaker frames are prepared, with display positions "1", "2", and "3" in order from the top, and the lowest-numbered vacant display position is selected.
[0022] The caption history storage unit 116 manages caption display information, each entry consisting of speaker identification information, a display time, a display position, and a caption, in a table as shown in FIG. 4.
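The table of FIG. 4 and the rule of paragraph [0021] — assign each new speaker the lowest-numbered vacant display position among "1" to "3" — could be modeled roughly as in the sketch below. This is only an illustration under assumed data structures, not the patented implementation:

```python
NUM_FRAMES = 3  # three speaker frames, display positions 1 (top) to 3 (bottom)

class CaptionHistory:
    """Caption display information: speaker ID, display time, position, caption."""

    def __init__(self):
        self.entries = []

    def lowest_vacant_position(self):
        used = {e["position"] for e in self.entries}
        vacant = [p for p in range(1, NUM_FRAMES + 1) if p not in used]
        return vacant[0] if vacant else None

    def store(self, speaker_id, caption, display_time):
        pos = self.lowest_vacant_position()
        if pos is None:
            raise RuntimeError("no vacant speaker frame")  # handled later in ST135
        self.entries.append({"speaker_id": speaker_id, "display_time": display_time,
                             "position": pos, "caption": caption})
        return pos

history = CaptionHistory()
print(history.store("spk-1", "Hello", 0.0))  # → 1
print(history.store("spk-2", "Hi", 1.5))     # → 2
```

The display time recorded with each entry is what later drives both the frame sorting in paragraph [0025] and the expiry/eviction steps of FIG. 8.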
[0023] The video processing unit 117 decodes the video stream output from the demultiplexing unit 113, which is encoded with H.264 or the like, and outputs the decoded signal to the display processing unit 120.
[0024] The speaker information extraction unit 118 extracts the speaker identification information and the speaker information from the data output from the demultiplexing unit 113, and stores the extracted speaker identification information and speaker information as a pair in the speaker information storage unit 119. FIG. 6 conceptually shows the processing of the speaker information extraction unit 118. As shown in FIG. 6, the speaker information storage unit 119 manages the pairs of speaker identification information and speaker information in a table.
[0025] The display processing unit 120 divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video; it arranges the video output from the video processing unit 117 in the video display area, arranges the caption display information stored in the caption history storage unit 116 and the speaker information stored in the speaker information storage unit 119 in the subtitle display area, and combines these display images. At this time, the speaker frames are sorted based on the time notified by the timer 114 and the display times stored in the caption history storage unit 116. Since the display processing unit 120 keeps the subtitle display and the video display separate on the same screen, the two never overlap, which prevents the video or the subtitles from becoming invisible. The combined image is output to the display unit 121. FIG. 7 conceptually shows the processing of the display processing unit 120.
[0026] Note that the display processing unit 120 arranges the speaker frames dynamically according to whether each speaker is speaking: a speaker frame is arranged when the speaker has speech and is not arranged when the speaker has none. As a result, when the aspect ratio of the video differs from that of the video display device, the surplus area can be used effectively as the subtitle display area.
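The division in paragraphs [0025]–[0026] — a video area plus a leftover strip reused as the subtitle display area when the aspect ratios differ — might look like the following back-of-the-envelope sketch; the screen dimensions and function name are illustrative assumptions, not taken from the patent:

```python
def split_screen(screen_w, screen_h, video_aspect=16 / 9):
    """Fit the video across the top of the screen at its own aspect ratio;
    the surplus strip below becomes the subtitle display area."""
    video_h = min(round(screen_w / video_aspect), screen_h)
    video_area = (0, 0, screen_w, video_h)                     # x, y, width, height
    subtitle_area = (0, video_h, screen_w, screen_h - video_h)
    return video_area, subtitle_area

# A 16:9 video on a 240x320 portrait phone screen leaves a 185-pixel strip
video_area, subtitle_area = split_screen(240, 320)
print(video_area)     # → (0, 0, 240, 135)
print(subtitle_area)  # → (0, 135, 240, 185)
```

When the screen itself is 16:9, the surplus strip shrinks to zero, which matches the patent's point that the subtitle area exploits otherwise wasted space.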
[0027] The display unit 121 displays the composite image output from the display processing unit 120.

[0028] Next, the main operations of the video display device 110 having the above configuration will be described with reference to FIG. 8. In FIG. 8, in step (hereinafter abbreviated as "ST") 131, captions that have been displayed for at least a specified time (for example, 5 seconds) are deleted from the caption display information held in the caption history storage unit 116, and the process proceeds to ST132. FIG. 9 shows a state in which a caption has been deleted.
[0029] In ST132, caption display information whose caption was displayed a specified time ago, or caption display information that occupies two or more frames and whose caption alone has already been deleted, is deleted, and the process proceeds to ST133. FIG. 10 shows a state in which caption display information has been deleted. To delete the caption display information at the same time as the caption, the specified deletion time of the caption and that of the caption display information need only be set equal. Incidentally, the specified deletion time is the time from when a caption of a second speaker, different from a first speaker, is displayed while the first speaker's caption is being displayed, until the first speaker's caption is deleted. As a result, the first speaker's caption remains displayed even after the first speaker has finished speaking, so even when several people speak at the same time, the viewer can easily grasp the contents.
[0030] In ST133, it is determined whether new caption information has been acquired from the demultiplexing unit 113. If new caption information has been acquired (YES), the process proceeds to ST134; if not (NO), the process returns to ST131, and ST131 to ST133 are repeated until it is determined that new caption information has been acquired.
[0031] In ST134, it is determined whether a speaker frame is vacant. Specifically, since the number of speaker frames that can be displayed has an upper limit determined by the screen size and the terminal specifications, it is determined from the information stored in the caption history storage unit 116 whether the number of speaker frames has reached that upper limit. For example, if the upper limit is 4, it is determined that the limit has not been reached when the number of speaker frames is 3 or less, and that it has been reached when the number is 4. That is, if the upper limit has not been reached, it is determined that a speaker frame is vacant (YES) and the process proceeds to ST136; if it has been reached, it is determined that no speaker frame is vacant (NO) and the process proceeds to ST135.
[0032] In ST135, since it was determined in ST134 that no speaker frame is vacant, a speaker frame is secured by deleting the caption display information with the oldest display time from the caption history storage unit 116, and the process proceeds to ST136. As a result, even when there are many characters, the limited subtitle display area can be used effectively, and the viewer can easily grasp the contents.
[0033] In ST136, it is determined whether speaker identification information identical to that included in the new caption information acquired from the demultiplexing unit 113 exists, that is, is stored, in the caption history storage unit 116. If it exists (YES), the process proceeds to ST138; if not (NO), the process proceeds to ST137.
[0034] In ST137, new caption display information is recorded in a vacant area of the caption history storage unit 116 based on the new caption information.
[0035] In ST138, it is determined whether a caption exists (is stored) in the caption display information that is stored in the caption history storage unit 116 and contains the same speaker identification information as the newly acquired caption information. Here, since captions whose specified display time has elapsed are deleted in ST131, while caption display information still within the specified time is not deleted in ST132, only the caption portion of an entry may have been deleted; it is therefore determined whether only the caption has been deleted. If a caption exists (YES), the process proceeds to ST140; if not (NO), the process proceeds to ST139.
[0036] In ST139, the new caption information is stored in the caption display information (containing no caption) that is stored in the caption history storage unit 116 with the same speaker identification information.
[0037] In ST140, it is determined whether the display position following the display position corresponding to the same speaker identification information stored in the subtitle history storage unit 116 is vacant. For example, if, among the entries with the same speaker identification information stored in the subtitle history storage unit 116, the lowest one is displayed at the second position from the top, it is determined whether the next display position, that is, the third position from the top, is vacant. If it is determined to be vacant (YES), the process proceeds to ST142; if it is determined not to be vacant (NO), the process proceeds to ST141. If the entry with the same speaker identification information occupies the lowest display position and no next display position exists, the position is likewise determined not to be vacant (NO).
[0038] In ST141, since it was determined in ST140 that the display position following the lowest-positioned entry with the same speaker identification information stored in the subtitle history storage unit 116 is not vacant, a vacancy is created at that next display position. Specifically, suppose that, among the entries with the same speaker identification information, the lowest one is displayed at the second position from the top and that the next display position, that is, the third position from the top, is not vacant; then the subtitle display information at the third position from the top is shifted to the fourth position, and the entries at the fourth position from the top and below are shifted in the same way.
[0039] In this way, the subtitle processing unit 115 dynamically stores per-speaker subtitle information in the subtitle history storage unit 116, and deletes the subtitle information stored in the subtitle history storage unit 116 in order from the oldest.
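The ST136 through ST142 update flow above can be sketched as follows. This is an illustrative reading of the steps, not code from the patent: the class and method names are assumptions, and so is the policy of discarding entries shifted past the last display row.

```python
class Entry:
    """One piece of subtitle display information in the history storage."""
    def __init__(self, speaker_id, position, subtitle=None):
        self.speaker_id = speaker_id
        self.position = position   # 0 = top row of the subtitle display area
        self.subtitle = subtitle   # may be None when only the text has expired

class SubtitleHistory:
    def __init__(self, max_positions):
        self.max_positions = max_positions
        self.entries = []          # contents of the subtitle history storage unit

    def _used_positions(self):
        return {e.position for e in self.entries}

    def _free_position(self):
        used = self._used_positions()
        for p in range(self.max_positions):
            if p not in used:
                return p
        return None

    def update(self, speaker_id, subtitle):
        same = [e for e in self.entries if e.speaker_id == speaker_id]
        if not same:                                   # ST136: NO -> ST137
            free = self._free_position()
            if free is not None:
                self.entries.append(Entry(speaker_id, free, subtitle))
            return
        lowest = max(same, key=lambda e: e.position)
        if lowest.subtitle is None:                    # ST138: NO -> ST139
            lowest.subtitle = subtitle
            return
        nxt = lowest.position + 1                      # ST140
        if nxt >= self.max_positions:
            return                                     # no next row: treated as not vacant
        if nxt in self._used_positions():              # ST141: shift rows to open a slot
            for e in self.entries:
                if e.position >= nxt:
                    e.position += 1
            self.entries = [e for e in self.entries
                            if e.position < self.max_positions]
        self.entries.append(Entry(speaker_id, nxt, subtitle))  # ST142: record
```

For example, after speakers A and B each speak once, a second utterance by A shifts B's row down by one and places A's new subtitle directly below A's first one.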
[0040] As described above, according to Embodiment 1, one image area is divided into a subtitle display area and a video display area; a subtitle display image, in which information indicating a speaker is associated with the subtitle representing that speaker's utterance, is displayed in the subtitle display area, and the video is displayed in the video display area. This allows the user to recognize the content of a speaker's utterance even in a silent state, without hiding the video, so the content of the program can be grasped easily.
[0041] In the present embodiment, the description assumed one type each of icon, speaker frame, font, character color, character size, and so on as speaker information. However, as shown in FIG. 11, a plurality of icons, speaker frames, fonts, character colors, character sizes, etc. may be prepared, and display designation information may specify which of them is to be used. In this case, if the other broadcast content contains a plurality of pieces of speaker information indicating the same speaker, the speaker information extraction unit 118 extracts the combination of speaker identification information and display designation information from the other broadcast content, and, as shown in FIG. 12, a subtitle display image is created according to the extracted display designation information. If the other broadcast content does not contain a plurality of pieces of speaker information, default information prepared in advance is used.
[0042] The display processing unit 120 in the present embodiment may, while the speaker frames of a plurality of speakers are displayed, delete the speaker frames of the other speakers when one speaker's utterances continue, and expand the speaker frame of the continuously speaking speaker into the deleted area. When the speaker frame has been expanded to its maximum and the utterances still continue, the subtitles are scrolled. This makes it possible to display long subtitles.
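The expand-or-scroll behavior just described can be sketched roughly as below. The row-based layout and the rule for choosing which other frame to delete first are assumptions made for illustration, not details from the patent.

```python
def extend_or_scroll(frames, active, max_rows):
    """frames: dict mapping speaker -> number of rows its frame occupies.

    While `active` keeps speaking: delete another speaker's frame and give
    its rows to `active`; once no other frames remain and the frame is at
    the maximum, fall back to scrolling the subtitles inside the frame.
    """
    others = [s for s in frames if s != active]
    if others:
        victim = others[0]                 # assumed policy: delete the first other frame
        frames[active] = frames.get(active, 0) + frames.pop(victim)
        return "extended"
    if frames.get(active, 0) < max_rows:
        frames[active] = max_rows          # expand into any remaining free rows
        return "extended"
    return "scroll"                        # frame at maximum: scroll the subtitles
```

With two speakers each holding two of four rows, a continuing utterance by the first speaker first absorbs the second speaker's rows; a further utterance then triggers scrolling.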
[0043] (Embodiment 2)
In Embodiment 2 of the present invention, the other broadcast content contains time information such as a time control mode (TMD) and a display start time (STM); the case where this time information is used is described.
[0044] In the video display device 150 according to Embodiment 2 of the present invention, as shown in FIG. 13, a speaker information extraction processing unit 151 extracts the time information from the data output from the demultiplexing unit 113 and outputs the extracted time information to the subtitle processing unit 115 and the display processing unit 120.
[0045] This allows the video display device 150 to omit a timer for measuring the current time, so the scale of the device can be reduced.
[0046] (Embodiment 3)
In the video display device 160 according to Embodiment 3 of the present invention, as shown in FIG. 14, the storage device 161 is a DVD (Digital Versatile Disc), an SD card, a hard disk, or the like, in which video/audio content and data content are stored.
[0047] This allows the video display device 160 to display video, speaker information, and subtitles simultaneously using the video/audio content and data content stored in the storage device 161.
[0048] As shown in FIG. 15, a video display device 165 may have a reception/recording function in which a broadcast wave is received, the received signal demodulated by the tuner unit 112 is recorded by a recording processing unit 166, and the result is stored in the storage device 161. In this case, the received broadcast wave may be demodulated and displayed in real time, or may be stored in the storage device 161 and displayed later.
[0049] (Embodiment 4)
FIG. 16 shows the configuration of a broadcast system according to Embodiment 4 of the present invention. In this figure, in the video display device 170, a communication unit 171 transmits and receives video/audio content and data content to and from a server 180 via a communication network such as the Internet. The communication method of the communication unit 171 may be of any type, wired or wireless, such as a network adapter, wireless LAN (Local Area Network), Bluetooth, or infrared communication.
[0050] Speaker information is input to the server 180 via an input device 181 such as a camera or keyboard; the server 180 stores the speaker information in a speaker information storage unit 182 and transmits it to the video display device 170 via a communication unit 183.
[0051] This allows the video display device 170 to acquire video/audio content and subtitle information from the broadcast wave and to acquire speaker information from the communication network. For example, the speaker information of a program can be acquired in advance via the Internet, data broadcasting, or the like; when the video of that program is later played back, using the acquired speaker information allows the viewer to grasp the content of the program easily.
[0052] As shown in FIG. 17, in a video display device 172, the communication unit 171 may acquire the video/audio content, the subtitle information, and the speaker information from the communication network. Alternatively, as shown in FIG. 18, in a video display device 173, the subtitle information and the speaker information may be acquired from the communication network while the video/audio content is acquired from the broadcast wave; in this case, video, speaker information, and subtitles can be displayed simultaneously even for analog broadcasting that does not include speaker information.
[0053] (Embodiment 5)
FIG. 19 shows the configuration of a broadcast system according to Embodiment 5 of the present invention. In this figure, in a video display device 190, an authentication processing unit 192 acquires authentication information input by the user via an input device 191 and queries a speaker information distribution device 200 with the acquired authentication information via the communication unit 171.
[0054] The speaker information distribution device 200 receives the authentication query from the video display device 190 via a communication unit 201; an authentication processing unit 202 verifies the authentication information and transmits the plural types of speaker information stored in a speaker information storage unit 203 only to video display devices that have been successfully authenticated. The speaker information stored in the speaker information storage unit 203 is input via an input device 204.
[0055] The storage device 193 in the video display device 190 is, for example, an SD card having a secure area. It stores the plural types of speaker information acquired from the speaker information distribution device 200, program identification information (program name, broadcast station name, channel, start time, end time, other IDs, etc.) identifying the program that uses this speaker information, and the authentication information input from the input device 191, and it is accessible only by the authentication processing unit 192.
[0056] When the video display device 190 starts viewing the program specified by the program identification information, the authentication processing unit 192 accesses the storage device 193, reads the stored information, and writes it to the speaker information storage unit 119. When viewing of the program ends, the authentication processing unit 192 deletes the information written to the speaker information storage unit 119. This prevents leakage of the information obtained through the authentication process (here, the plural types of speaker information).
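The Embodiment-5 exchange can be sketched as follows: the distribution device verifies credentials and returns the rich speaker information only on success, and the display device keeps that information in its secure store, exposing it to the display path only while the matching program is being viewed. All class, method, and key names here are illustrative assumptions, not terms from the patent.

```python
class SpeakerInfoDistributor:
    """Speaker information distribution device 200 (simplified)."""
    def __init__(self, credentials, speaker_info):
        self._credentials = credentials    # valid authentication info per user
        self._speaker_info = speaker_info  # plural types of speaker information

    def request(self, user, auth):
        if self._credentials.get(user) != auth:
            return None                    # authentication failed: nothing is sent
        return self._speaker_info

class DisplayDevice:
    """Video display device 190 (simplified)."""
    def __init__(self):
        self._secure = {}         # secure area: only the authentication unit touches it
        self.speaker_info = None  # working store used while viewing (unit 119)

    def fetch(self, distributor, user, auth):
        info = distributor.request(user, auth)
        if info is not None:
            self._secure[auth] = info      # keep rich speaker info in the secure area
        return info is not None

    def start_viewing(self, auth):
        self.speaker_info = self._secure.get(auth)  # load at viewing start

    def stop_viewing(self):
        self.speaker_info = None  # delete at viewing end to prevent leakage
```

A failed query leaves the device with nothing; a successful one makes the speaker information available only between `start_viewing` and `stop_viewing`.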
[0057] In this way, only video display devices that have been successfully authenticated can obtain rich speaker information and display subtitles using the display designation information. It is therefore possible to provide services such as distributing authentication information to users who have purchased rich speaker information, or to users who have answered a questionnaire on, for example, a program's website.
[0058] As described above, according to Embodiment 5, only video display devices that have been successfully authenticated can obtain plural types of speaker information and perform rich subtitle display.
[0059] (Embodiment 6)
FIG. 20 shows the configuration of a broadcast system according to Embodiment 6 of the present invention. In this figure, a video display device 210 has a first communication unit 171 that connects to the speaker information distribution device 200 via a communication network such as the Internet, and a second communication unit 211 that communicates using a contactless IC such as Suica (registered trademark), a wireless tag, infrared, or the like.
[0060] Upon receiving a key acquisition request from the video display device 210 via a communication unit 221, a key distribution device 220 distributes a key managed by a key distribution management unit 222 (or the key together with the address of the speaker information distribution device) to the video display device 210, and notifies the speaker information distribution device 200 to that effect. The authentication information (key, ID) managed by the key distribution management unit 222 is input via an input device 223.
[0061] The speaker information distribution device 200 receives the key distribution notification from the key distribution device 220 and appends that information to the authentication information managed by the authentication processing unit 202. Upon receiving an authentication query using the key from the video display device 210, the authentication processing unit 202 performs authentication and transmits the plural types of speaker information stored in the speaker information storage unit 203 only to video display devices that have been successfully authenticated.
[0062] In this way, only video display devices that have acquired a key from the key distribution device 220 can obtain rich speaker information and display subtitles using plural types of speaker information. It is therefore possible to provide services such as distributing keys to users who have purchased plural types of speaker information, or distributing keys at the store where a user purchased goods related to a program.
[0063] As described above, according to Embodiment 6, only video display devices that have acquired a key can obtain plural types of speaker information and perform rich subtitle display using them.
[0064] (Embodiment 7)
In a video display device 230 according to Embodiment 7 of the present invention, as shown in FIG. 21, an audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113 and outputs the decoded audio stream to an audio analysis unit 232.
[0065] The audio analysis unit 232 analyzes the audio stream output from the audio processing unit 231 and outputs analysis results such as volume and pitch to a display processing unit 233. By analyzing the characteristics of the voice, it also generates information representing emotions such as joy, anger, sorrow, and pleasure, gender information, and information representing age (for example, baby, child, adult, elderly) and outputs these to the display processing unit 233.
[0066] The display processing unit 233 creates a subtitle display image using the audio analysis results output from the audio analysis unit 232. For example, volume is mapped to character size and pitch to character color; information representing emotion is mapped to the font, and gender to the highlight color. The decoration associated with each element of the audio analysis result is, however, not limited to these examples.
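The mapping just described (volume to character size, pitch to character color, emotion to font, gender to highlight color) can be sketched as a small lookup function. The thresholds, style names, and category labels are illustrative assumptions; the patent specifies only the correspondences, not concrete values.

```python
def decorate(volume_db, pitch_hz, emotion, gender):
    """Derive subtitle decoration from audio-analysis results (illustrative)."""
    style = {}
    # Volume -> character size: louder speech is rendered larger.
    style["size_px"] = 18 if volume_db < 50 else 24 if volume_db < 70 else 32
    # Pitch -> character color: low, mid, and high pitch get distinct colors.
    style["color"] = "blue" if pitch_hz < 160 else "green" if pitch_hz < 260 else "red"
    # Emotion -> font; unknown emotions fall back to a default face.
    style["font"] = {"joy": "rounded", "anger": "bold-gothic",
                     "sorrow": "mincho", "pleasure": "script"}.get(emotion, "default")
    # Gender -> highlight color behind the subtitle text.
    style["highlight"] = {"male": "lightblue", "female": "pink"}.get(gender, "white")
    return style
```

A loud, high-pitched, angry male utterance would thus be rendered large, red, in a bold font, on a light-blue highlight.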
[0067] As described above, according to Embodiment 7, by visually displaying the result of analyzing the speaker's voice in the subtitle display image, information other than the text itself can be presented as subtitles, and the content of the program can be grasped even more easily.
[0068] As shown in FIG. 22, a video display device 235 may have a video analysis unit 236; the video analysis unit 236 analyzes the video stream output from the video processing unit 117, and the display processing unit 233 applies decoration corresponding to analysis results such as brightness. The display processing unit 233 may also apply decoration corresponding to scenes such as morning, noon, night, the sea, mountains, or soccer.
[0069] Also, for example, when the speaker's voice is detected, the display may change to an icon (a speaking face) or a speaker frame indicating that the speaker is speaking, and when the speaker's voice is no longer detected, to an icon (a listening face) or a speaker frame indicating that the speaker is not speaking. Incidentally, the speaker may be indicated not only by a still-image icon but also by a simple moving-image icon, such as an animated GIF, as the speaker information.  [0070] (Embodiment 8)
In a video display device 240 according to Embodiment 8 of the present invention, as shown in FIG. 23, the audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113 and outputs the decoded audio stream to a speaker analysis unit 241.
[0071] The speaker analysis unit 241 detects the speaker from the video and audio and extracts an image of the speaker. The extracted image is then scaled to a specified size and used as speaker information. This speaker information is stored in the subtitle history storage unit 116 together with the subtitle information processed by the subtitle processing unit 115. An existing technique, for example the technique described in Patent Document 1, is used to detect the speaker from the video and audio.
[0072] This makes it possible to display video, speaker information, and subtitles simultaneously even for analog broadcasting that does not include speaker information.
[0073] (Embodiment 9)
In a video display device 250 according to Embodiment 9 of the present invention, as shown in FIG. 24, a speech recognition unit 251 performs speech recognition on the audio stream output from the audio processing unit 231, converting it into text information and generating subtitle information. The generated subtitle information is stored in the subtitle history storage unit 116.
[0074] This makes it possible to display subtitles even when neither speaker information nor subtitle information is included in the broadcast wave.
[0075] (Other Embodiments)
As shown in FIG. 25A, subtitles may be displayed in order from the top of the subtitle display area (ascending order), or, as shown in FIG. 25B, in order from the bottom of the subtitle display area (descending order). In accordance with the subtitle display order, the highlight of each speaker frame may be changed stepwise from a bright color to a subdued color, the subtitle character color may be lightened stepwise, the character size may be reduced stepwise, or the display order may be numbered. This allows the user to recognize the display order of the subtitles without reading them. The display order of the subtitles (descending or ascending) may also be set by the user.
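The order-dependent decoration above can be sketched as follows: the newest subtitle row is fully opaque at full size, and older rows fade, shrink, and carry an order number stepwise. The specific step values are assumptions for illustration, not values from the patent.

```python
def order_decoration(index, base_size=24):
    """Decoration for a subtitle row; index 0 = newest row, larger = older."""
    fade = min(index, 3)                    # cap the fading after a few steps
    return {
        "alpha": 1.0 - 0.2 * fade,          # lighten the text color stepwise
        "size_px": base_size - 2 * fade,    # reduce the character size stepwise
        "order_label": str(index + 1),      # optional numbering of the order
    }

# Decorations for a four-row subtitle display area, newest first.
rows = [order_decoration(i) for i in range(4)]
```

The reader can then judge the recency of each row at a glance, without reading the subtitle text itself.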
[0076] A first aspect of the present invention is a video display device comprising: display processing means for creating a subtitle display image in which subtitles are associated with speaker information that allows a viewer to recognize the speaker of each subtitle, and for combining the created subtitle display image with video; and display means for displaying the image combined by the display processing means.
[0077] According to this configuration, subtitles can be associated with the speakers in the video, so the viewer can recognize even a speaker who is not displayed in the video, and can easily grasp the content of the program in a silent state.
[0078] A second aspect of the present invention is the video display device according to the above aspect, further comprising: speaker information acquisition means for acquiring speaker information; and speaker information storage means for storing the acquired speaker information.
[0079] A third aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
[0080] According to these configurations, even when, for example, the speaker information of a program has been acquired in advance via the Internet, data broadcasting, or the like and only the video of that program is played back, the viewer can easily grasp the content of the program by using the acquired speaker information.
[0081] A fourth aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information together with reception of the program.
[0082] According to this configuration, the trouble of acquiring speaker information in advance can be saved, improving convenience for the viewer.
[0083] A fifth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means divides one image area into a subtitle display area for displaying subtitles and a video display area, different from the subtitle display area, for displaying video; arranges the video including a speaker in the video display area; and arranges the subtitles and speaker information corresponding to the speaker in the subtitle display area.
[0084] According to this configuration, the subtitle display and the video display are shown separately on the same screen, so the video display and subtitle display do not overlap, and the video or subtitles are prevented from becoming invisible.
[0085] A sixth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means dynamically arranges speaker frames, each being an area for displaying the subtitles of one speaker, according to whether that speaker has spoken.
[0086] According to this configuration, a speaker frame is arranged when a speaker has an utterance and is not arranged when the speaker has none; when the aspect ratio of the video differs from that of the video display device, the surplus area can thus be used effectively as a subtitle display area.
[0087] A seventh aspect of the present invention is the video display device according to the above aspect, wherein the display processing means, while displaying the subtitle of a first speaker, displays the subtitle of a second speaker different from the first speaker before deleting the subtitle of the first speaker.
[0088] According to this configuration, the subtitle of the first speaker remains displayed even after the first speaker finishes speaking, so the viewer can easily grasp the content even when multiple people speak at the same time.
[0089] An eighth aspect of the present invention is the video display device according to the above aspect, wherein, when speaker frames are displayed up to the maximum displayable number and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
[0090] According to this configuration, even when there are many characters, the limited subtitle display area can be used effectively, and the viewer can easily grasp the content.
[0091] A ninth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means creates the subtitle display image based on display designation information indicating which of the plural types of speaker information, acquired by the speaker information acquisition means and each allowing the viewer to recognize the same speaker, is to be used.
[0092] According to this configuration, rich subtitle display can be performed by designating, from among plural types, the speaker information that allows the viewer to recognize the same speaker.
[0093] A tenth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means, while displaying the speaker frames of a plurality of speakers, deletes the speaker frames of the other speakers when one speaker's utterances continue, and expands the speaker frame of that speaker into the deleted area.
[0094] According to this configuration, long subtitles can be displayed by deleting the other displayed speaker frames and expanding the frame of the continuously speaking speaker into the deleted area.
[0095] An eleventh aspect of the present invention is the video display device according to the above aspect, further comprising analysis means for analyzing video or audio, wherein the display processing means decorates the subtitles based on the analysis result of the analysis means.
[0096] According to this configuration, by visually displaying the result of analyzing the speaker's voice or video, information other than the text itself can be presented as subtitles, and the content of the program can be grasped more easily.
[0097] A twelfth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means associates the display order of the subtitle display information with subtitle decoration, and decorates the subtitles according to the display order of the subtitle display information.
[0098] According to this configuration, the display order of the subtitles can be recognized visually without reading them, so the program content can be easily grasped.
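An order-to-decoration mapping of this kind can be sketched as follows: the newest subtitle is rendered brightest and older ones fade, so the viewing order is apparent at a glance. The function name and color values are assumptions for illustration only.

```python
def decorate_by_order(subtitles):
    """Map display order to a color so viewers can see recency without
    reading: the newest line is brightest, older lines fade to grey."""
    shades = ["#FFFFFF", "#AAAAAA", "#666666"]  # newest -> oldest
    decorated = []
    for age, text in enumerate(reversed(subtitles)):
        color = shades[min(age, len(shades) - 1)]
        decorated.append((text, color))
    return list(reversed(decorated))  # restore original display order


lines = ["first remark", "second remark", "third remark"]
styled = decorate_by_order(lines)
# newest ("third remark") is white, "second remark" grey, "first remark" dark grey
```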
[0099] A thirteenth aspect of the present invention is a broadcast system comprising: a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device having receiving means for receiving the broadcast wave transmitted from the broadcast wave transmission device, display processing means for creating a subtitle display image in which the subtitles contained in the received broadcast wave are associated with the speaker information and for combining the created subtitle display image with the video, and display means for displaying the image combined by the display processing means.
[0100] According to this configuration, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
[0101] A fourteenth aspect of the present invention is a recording and playback system comprising: a recording device that records video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device having display processing means for creating a subtitle display image in which the subtitles contained in the information recorded in the recording device are associated with the speaker information and for combining the created subtitle display image with the video, and display means for displaying the image combined by the display processing means.
[0102] According to this configuration, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
[0103] A fifteenth aspect of the present invention is an authentication system comprising: an authentication device having authentication processing means for performing an authentication process, and transmission means for transmitting, to a video display device authenticated by the authentication processing means, a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker; and a video display device having display processing means for creating a subtitle display image in which a subtitle is associated with any one of the plurality of pieces of speaker information transmitted from the authentication device and for combining the created subtitle display image with video, and display means for displaying the image combined by the display processing means.
[0104] According to this configuration, a video display device authenticated by the authentication device obtains a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker, so the authenticated video display device can provide a rich subtitle display.
[0105] A sixteenth aspect of the present invention is a video display method comprising: a display processing step of creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and combining the created subtitle display image with video; and a display step of displaying the image combined in the display processing step.
[0106] According to this method, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
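The two steps of this method, building a subtitle display image that pairs a caption with its speaker information and then compositing it onto the video, can be sketched as follows. This is a minimal illustration in which plain dictionaries stand in for real bitmaps; the function name and field names are assumptions, not part of the disclosure.

```python
def compose_frame(video_frame: str, subtitle: str, speaker_info: dict) -> dict:
    """Display processing step: create a subtitle display image that
    associates the caption with its speaker, then combine it with the
    video frame for the display step."""
    subtitle_image = {
        "caption": subtitle,
        "speaker_name": speaker_info["name"],
        "speaker_icon": speaker_info["icon"],
    }
    # Compositing is represented here as pairing the overlay with the frame.
    return {"video": video_frame, "overlay": subtitle_image}


frame = compose_frame(
    video_frame="frame_0042",
    subtitle="Hello, everyone.",
    speaker_info={"name": "Speaker A", "icon": "icon_a.png"},
)
```

Because the overlay carries the speaker's name and icon alongside the caption, the viewer can tell who is speaking even when that speaker is not visible in the frame.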
[0107] This specification is based on Japanese Patent Application No. 2004-245734, filed on August 25, 2004, the entire contents of which are incorporated herein by reference.
Industrial Applicability

[0108] The video display device and video display method according to the present invention have the effect of allowing a viewer to easily grasp program content even without sound, and can be applied to mobile phones and other devices having small screen sizes.

Claims

[1] A video display device comprising:
display processing means for creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and for combining the created subtitle display image with video; and
display means for displaying the image combined by the display processing means.
[2] The video display device according to claim 1, further comprising:
speaker information acquisition means for acquiring speaker information; and
speaker information storage means for storing the acquired speaker information.
[3] The video display device according to claim 2, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
[4] The video display device according to claim 2, wherein the speaker information acquisition means acquires the speaker information together with reception of a program.
[5] The video display device according to claim 1, wherein the display processing means divides one image area into a subtitle display area for displaying subtitles and a video display area, distinct from the subtitle display area, for displaying video, places the video containing a speaker in the video display area, and places the subtitle and speaker information corresponding to the speaker in the subtitle display area.
[6] The video display device according to claim 1, wherein the display processing means dynamically arranges speaker frames, each of which is an area for displaying the subtitles of one speaker, according to whether each speaker is speaking.
[7] The video display device according to claim 1, wherein, while displaying a subtitle of a first speaker, the display processing means displays a subtitle of a second speaker different from the first speaker before deleting the subtitle of the first speaker.
[8] The video display device according to claim 1, wherein, when the maximum displayable number of speaker frames is being shown and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
[9] The video display device according to claim 1, wherein the display processing means creates the subtitle display image based on display designation information indicating which of a plurality of types of speaker information, acquired by the speaker information acquisition means and each allowing a viewer to recognize the same speaker, is to be used.
[10] The video display device according to claim 1, wherein, when the remarks of one speaker continue while speaker frames for a plurality of speakers are being displayed, the display processing means deletes the speaker frames of the other speakers and extends the speaker frame of that speaker into the deleted area.
[11] The video display device according to claim 1, further comprising analysis means for analyzing video or audio,
wherein the display processing means decorates the subtitles based on the analysis result of the analysis means.
[12] The video display device according to claim 1, wherein the display processing means associates the display order of subtitle display information with subtitle decorations, and decorates the subtitles according to the display order of the subtitle display information.
[13] A broadcast system comprising:
a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and
a video display device having:
receiving means for receiving the broadcast wave transmitted from the broadcast wave transmission device;
display processing means for creating a subtitle display image in which the subtitles contained in the received broadcast wave are associated with the speaker information, and for combining the created subtitle display image with the video; and
display means for displaying the image combined by the display processing means.
[14] A recording and playback system comprising:
a recording device that records video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and
a video display device having:
display processing means for creating a subtitle display image in which the subtitles contained in the information recorded in the recording device are associated with the speaker information, and for combining the created subtitle display image with the video; and
display means for displaying the image combined by the display processing means.
[15] An authentication system comprising:
an authentication device having:
authentication processing means for performing an authentication process; and
transmission means for transmitting, to a video display device authenticated by the authentication processing means, a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker; and
a video display device having:
display processing means for creating a subtitle display image in which a subtitle is associated with any one of the plurality of pieces of speaker information transmitted from the authentication device, and for combining the created subtitle display image with video; and
display means for displaying the image combined by the display processing means.
[16] A video display method comprising:
a display processing step of creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and combining the created subtitle display image with video; and
a display step of displaying the image combined in the display processing step.
PCT/JP2005/011423 2004-08-25 2005-06-22 Video display and video displaying method WO2006022071A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004-245734 2004-08-25
JP2004245734 2004-08-25

Publications (1)

Publication Number Publication Date
WO2006022071A1 (en) 2006-03-02

Family

ID=35967292

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/011423 WO2006022071A1 (en) 2004-08-25 2005-06-22 Video display and video displaying method

Country Status (1)

Country Link
WO (1) WO2006022071A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002056006A (en) * 2000-08-10 2002-02-20 Nippon Hoso Kyokai <Nhk> Video/voice retrieving device
JP2002232802A (en) * 2001-01-31 2002-08-16 Mitsubishi Electric Corp Video display device
JP2002232798A (en) * 2001-01-30 2002-08-16 Toshiba Corp Broadcast receiver and its control method
JP2002341890A (en) * 2001-05-17 2002-11-29 Matsushita Electric Ind Co Ltd Method for speech recognition and character representation and device for the same
JP2003224842A (en) * 2002-01-31 2003-08-08 Matsushita Electric Ind Co Ltd Contents distribution method


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007142569A (en) * 2005-11-15 2007-06-07 Toshiba Corp Portable caption specification conversion apparatus of digital broadcast and converting method thereof
JP2007142568A (en) * 2005-11-15 2007-06-07 Toshiba Corp Portable caption specification changing apparatus for digital broadcast and changing method thereof
EP2180693A1 (en) * 2008-10-22 2010-04-28 Samsung Electronics Co., Ltd. Display apparatus and control method thereof
JP2010251841A (en) * 2009-04-10 2010-11-04 Nikon Corp Image extraction program and image extraction device
JP2016082355A (en) * 2014-10-15 2016-05-16 富士通株式会社 Input information support device, input information support method, and input information support program
US11450352B2 (en) 2018-05-29 2022-09-20 Sony Corporation Image processing apparatus and image processing method
WO2019230225A1 (en) * 2018-05-29 2019-12-05 ソニー株式会社 Image processing device, image processing method, and program
EP3787285A4 (en) * 2018-05-29 2021-03-03 Sony Corporation Image processing device, image processing method, and program
JPWO2019230225A1 (en) * 2018-05-29 2021-07-15 ソニーグループ株式会社 Image processing device, image processing method, program
JP7272356B2 (en) 2018-05-29 2023-05-12 ソニーグループ株式会社 Image processing device, image processing method, program
JP2020010224A (en) * 2018-07-10 2020-01-16 ヤマハ株式会社 Terminal device, information providing system, operation method of terminal device, and information providing method
JP7087745B2 (en) 2018-07-10 2022-06-21 ヤマハ株式会社 Terminal device, information provision system, operation method of terminal device and information provision method
US20240022682A1 (en) * 2022-07-13 2024-01-18 Sony Interactive Entertainment LLC Systems and methods for communicating audio data

Similar Documents

Publication Publication Date Title
WO2006022071A1 (en) Video display and video displaying method
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
JP2003333445A (en) Caption extractor
CN102348014B (en) For using sound that the apparatus and method of augmented reality service are provided
JP2003333445A5 (en) Title extraction device and system
JP5994968B2 (en) Content utilization apparatus, control method, program, and recording medium
US20050021665A1 (en) Content delivery server, terminal, and program
JPH09506755A (en) Wireless pager with pre-stored images and method and system for use therewith
KR20130005406A (en) Method and apparatus for transmitting message in portable terminnal
US20090096782A1 (en) Message service method supporting three-dimensional image on mobile phone, and mobile phone therefor
JP2002268963A (en) Radio data transmission/reception control method using bluetooth function, radio data transmission/reception system, server and terminal to be used for the same
JP2014006669A (en) Recommended content notification system, control method and control program thereof, and recording medium
JP2007195105A (en) Information acquisition support system and information acquisition method by portable information terminal using sound information
KR101618777B1 (en) A server and method for extracting text after uploading a file to synchronize between video and audio
JP2002288213A (en) Data-forwarding device, data two-way transmission device, data exchange system, data-forwarding method, data-forwarding program, and data two-way transmission program
JP2008113331A (en) Telephone system, telephone set, server device, and program
US7120583B2 (en) Information presentation system, information presentation apparatus, control method thereof and computer readable memory
JP6706591B2 (en) Broadcast receiver, notification method, program, and storage medium
JP2005124169A (en) Video image contents forming apparatus with balloon title, transmitting apparatus, reproducing apparatus, provisioning system, and data structure and record medium used therein
CN1843036A (en) Real-time media dictionary
US20070110397A1 (en) Playback apparatus and bookmark system
CN106657255A (en) File sharing method and device and terminal device
JP2005332404A (en) Content providing system
JP2004253923A (en) Information receiver
JP2005159743A (en) Video display apparatus, video display program, information distribution apparatus, and information communication system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase