WO2006022071A1 - Video display and video displaying method - Google Patents


Info

Publication number
WO2006022071A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
display
video
information
subtitle
Prior art date
Application number
PCT/JP2005/011423
Other languages
French (fr)
Japanese (ja)
Inventor
Tatsuya Nishi
Original Assignee
Matsushita Electric Industrial Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co., Ltd. filed Critical Matsushita Electric Industrial Co., Ltd.
Publication of WO2006022071A1 publication Critical patent/WO2006022071A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4348 Demultiplexing of additional data and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/44 Receiver circuitry for the reception of television signals according to analogue transmission standards
    • H04N5/60 Receiver circuitry for the reception of television signals according to analogue transmission standards for the sound signals

Definitions

  • the present invention relates to a video display device and a video display method, and more particularly to a video display device and a video display method for displaying subtitles.
  • Patent Document 1 discloses such a technique, as shown in FIG. 1.
  • Patent Document 2, as shown in FIG. 2, discloses a technique in which a balloon frame corresponding to a person displayed in the image is shown, data obtained by converting speech into characters is associated with the speaker of that speech, and the subtitles (character data) are displayed in the balloon frame.
  • This makes it easy to identify speakers, which is difficult with subtitles alone, and makes the program contents easy to understand even in a silent state.
  • Patent Document 1 Japanese Patent Laid-Open No. 2004-056286
  • Patent Document 2 Japanese Unexamined Patent Application Publication No. 2004-080069
  • An object of the present invention is to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
  • To achieve this, the video display device of the present invention comprises a display processing unit that creates a subtitle display image associating subtitles with speaker information allowing the viewer to recognize the speaker of each subtitle and that synthesizes the created subtitle display image with the video, and a display unit that displays the image synthesized by the display processing unit.
  • FIG. 1 is a diagram showing an image display method disclosed in Patent Document 1.
  • FIG. 2 is a diagram showing an image display method disclosed in Patent Document 2.
  • FIG. 3 is a block diagram showing a configuration of a broadcasting system according to Embodiment 1 of the present invention.
  • FIG. 4 is a conceptual diagram showing the processing of the caption processing unit shown in FIG. 3.
  • FIG. 6 is a conceptual diagram showing the processing of the speaker information extraction unit shown in FIG. 3.
  • FIG. 7 is a conceptual diagram showing the processing of the display processing unit shown in FIG. 3.
  • FIG. 8 is a flowchart showing the processing procedure of the caption processing unit shown in FIG. 3.
  • FIG. 13 is a block diagram showing a configuration of the second embodiment of the present invention.
  • FIG. 14 is a block diagram showing the configuration of the third embodiment of the present invention.
  • FIG. 15 is a block diagram showing the configuration of the third embodiment of the present invention.
  • FIG. 16 is a block diagram showing a configuration of a fourth embodiment of the present invention.
  • FIG. 17 is a block diagram showing the configuration of the fourth embodiment of the present invention.
  • FIG. 18 is a block diagram showing the configuration of the fourth embodiment of the present invention.
  • FIG. 19 is a block diagram showing a configuration of a fifth embodiment of the present invention.
  • FIG. 20 is a block diagram showing a configuration of a sixth embodiment of the present invention.
  • FIG. 21 is a block diagram showing the configuration of the seventh embodiment of the present invention.
  • FIG. 22 is a block diagram showing the configuration of the seventh embodiment of the present invention.
  • FIG. 23 is a block diagram showing a configuration of the eighth embodiment of the present invention.
  • FIG. 24 is a block diagram showing the configuration of the ninth embodiment of the present invention.
  • FIG. 25B is a diagram showing the subtitle display order in descending order.
  • FIG. 3 shows the configuration of the broadcasting system according to Embodiment 1 of the present invention.
  • the input device 101 is a camera, a microphone, a keyboard, or the like, through which caption information, video / audio content, and data content are input.
  • The video encoding unit 102 encodes the video information in the video/audio content using a compression method such as MPEG-2, MPEG-4, or H.264, and outputs the encoded video information to the multiplexing processing unit 104.
  • the audio encoding unit 103 encodes audio information in the video / audio content using a compression method such as AAC, and outputs the encoded audio information to the multiplexing processing unit 104.
  • The multiplexing processing unit 104 multiplexes the video information output from the video encoding unit 102, the audio information output from the audio encoding unit 103, and broadcast contents such as other program information, program identification information, text information, and image information (hereinafter referred to as "other broadcast contents"), and outputs the multiplexed signal to the transmission path coding unit 105.
  • The transmission path coding unit 105 performs transmission processing such as encoding and modulation on the signal output from the multiplexing processing unit 104, and transmits a broadcast wave from the antenna 106.
  • The tuner unit 112 extracts the frequency signal of the channel specified by the user from the broadcast wave received via the antenna 111, and performs demodulation processing on the extracted frequency signal.
  • The demodulated signal is output to the demultiplexing unit 113.
  • The demultiplexing unit 113 separates the signal output from the tuner unit 112 into subtitle information, video information, and other broadcast contents; it outputs the separated subtitle information to the subtitle processing unit 115, the video information to the video processing unit 117, and the other broadcast contents to the speaker information extraction unit 118.
  • The caption information includes speaker identification information, which is information such as an ID for identifying the speaker, together with the caption itself; the other broadcast contents include speaker information, which allows the viewer to recognize the speaker of a caption, together with the corresponding speaker identification information.
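The pairing of the two streams can be sketched as a pair of records (illustrative Python; the class and field names are assumptions, not terms from the disclosure):

```python
from dataclasses import dataclass

@dataclass
class CaptionInfo:
    """Caption information carried in the subtitle stream."""
    speaker_id: str  # speaker identification information, e.g. an ID
    text: str        # the caption itself

@dataclass
class SpeakerInfo:
    """Speaker information carried in the other broadcast contents."""
    speaker_id: str  # the same ID, linking this record to captions
    display: str     # something the viewer can recognize, e.g. a name or icon

caption = CaptionInfo(speaker_id="spk1", text="Hello")
speaker = SpeakerInfo(speaker_id="spk1", display="Alice")
# the shared speaker_id is what lets captions be attributed to speakers
assert caption.speaker_id == speaker.speaker_id
```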
  • the timer 114 measures the current time, and notifies the caption processing unit 115 and the display processing unit 120 of the measured current time.
  • The caption processing unit 115 stores the caption information output from the demultiplexing unit 113 in the caption history storage unit 116 for each speaker, based on the speaker identification information. At this time, the current time notified from the timer 114 is also stored in the caption history storage unit 116 as the display time of the caption information. In addition, the area for displaying the caption of each speaker is set as a speaker frame; the position at which the speaker frame is displayed (hereinafter simply referred to as the "display position") is determined, and the determined display position is also stored in the caption history storage unit 116.
  • FIG. 4 conceptually shows the processing of the caption processing unit 115. In this embodiment, as shown in FIG. 5, three speaker frames are prepared, with display positions "1", "2", and "3" in order from the top; the display position is determined as the lowest number among the vacant display positions.
  • The caption history storage unit 116 manages the speaker identification information, the display time, the display position, and the caption as a set of caption display information in a table, as shown in the figure.
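The table management and the "lowest vacant position" rule described above can be sketched as follows (illustrative Python; the row structure follows the table described above, while the names and types are assumptions):

```python
NUM_FRAMES = 3  # three speaker frames with positions 1..3, top to bottom

class CaptionHistory:
    """Rough model of the caption history storage unit 116: one row per
    speaker holding caption text, display time, and display position."""

    def __init__(self):
        self.rows = {}  # speaker_id -> {"text", "display_time", "position"}

    def store(self, speaker_id, text, now):
        if speaker_id in self.rows:
            # existing speaker: update the caption and its display time
            self.rows[speaker_id].update(text=text, display_time=now)
            return
        used = {r["position"] for r in self.rows.values()}
        vacant = [p for p in range(1, NUM_FRAMES + 1) if p not in used]
        if not vacant:
            raise RuntimeError("no vacant speaker frame")
        # the lowest number among the vacant display positions is chosen
        self.rows[speaker_id] = {"text": text, "display_time": now,
                                 "position": vacant[0]}

h = CaptionHistory()
h.store("A", "hi", now=1.0)
h.store("B", "hello", now=2.0)
assert h.rows["A"]["position"] == 1 and h.rows["B"]["position"] == 2
```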
  • The video processing unit 117 decodes the video stream output from the demultiplexing unit 113, which is encoded with H.264 or the like, and outputs the decoded signal to the display processing unit 120.
  • The speaker information extraction unit 118 extracts the speaker identification information and the speaker information from the data output from the demultiplexing unit 113, and stores the extracted speaker identification information and speaker information as a pair in the speaker information storage unit 119.
  • Figure 6 schematically shows the processing performed by the speaker information extraction unit 118. As shown in FIG. 6, the speaker information storage unit 119 manages the speaker identification information and the speaker information as a set in a table.
  • The display processing unit 120 divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video; the video output from the video processing unit 117 is arranged in the video display area, the subtitle display information stored in the caption history storage unit 116 and the speaker information stored in the speaker information storage unit 119 are arranged in the subtitle display area, and these display images are synthesized.
  • The speaker frames are sorted based on the time notified from the timer 114 and the display times stored in the caption history storage unit 116. Since the display processing unit 120 separates the subtitle display and the video display on the same screen, the video display and the subtitle display do not overlap, preventing either the video or the subtitles from being hidden.
  • the synthesized image is output to the display unit 121.
  • FIG. 7 conceptually shows the processing of the display processing unit 120.
  • The display processing unit 120 dynamically allocates the speaker frames according to the presence or absence of speech, arranging a speaker frame only while its speaker is speaking. When no speaker is speaking, no speaker frame is placed, so that if the aspect ratio of the video differs from that of the video display device, the surplus area can be used effectively as a subtitle display area.
  • the display unit 121 displays the composite image output from the display processing unit 120.
  • In step (hereinafter abbreviated as "ST") 131, subtitles that have been displayed for more than a specified time (for example, 5 seconds) are deleted from the subtitle display information in the caption history storage unit 116, and the process moves to ST132.
  • FIG. 9 shows how subtitles are deleted.
  • In ST132, subtitle display information for which a specified time has passed since its subtitle was deleted, that is, entries from which only the subtitle has already been removed, is deleted, and the process moves to ST133.
  • FIG. 10 shows how the subtitle display information is deleted.
  • The specified deletion time of a caption may be made equal to the specified deletion time of the caption display information.
  • Alternatively, the specified deletion time may be the time from when a subtitle of a second speaker, different from the first speaker, is displayed while the subtitle of the first speaker is being displayed until the subtitle of the first speaker is deleted.
  • In ST133, it is determined whether or not new subtitle information has been acquired from the demultiplexing unit 113. If it is determined that new subtitle information has been acquired (YES), the process proceeds to ST134; if it is determined that it has not been acquired (NO), the process returns to ST131, and ST131 to ST133 are repeated until it is determined that new caption information has been acquired.
  • In ST134, it is determined whether or not there is a vacant speaker frame. Specifically, since there is an upper limit on the number of speaker frames that can be displayed, depending on the screen size and other specifications, it is determined whether or not the number of entries stored in the caption history storage unit 116 has reached that upper limit. For example, if the upper limit is 4, it is determined not to be at the upper limit when the number of speakers is 3 or less, and to be at the upper limit when the number of speakers is 4. That is, if the upper limit has not been reached, it is determined that there is a vacant speaker frame (YES), and the process proceeds to ST136; if it has been reached, it is determined that there is no vacant speaker frame (NO), and the process proceeds to ST135.
  • In ST136, it is determined whether or not the same speaker identification information as that included in the new caption information acquired from the demultiplexing unit 113 exists in the caption history storage unit 116, that is, whether or not it is already stored. If it is determined that it exists (YES), the process proceeds to ST138; if it is determined that it does not exist (NO), the process proceeds to ST137.
  • In ST137, new caption display information is recorded in a free area of the caption history storage unit 116 based on the new caption information.
  • In ST138, for the caption display information stored in the caption history storage unit 116 that includes the same speaker identification information as the newly acquired caption information, it is determined whether or not a subtitle is still present.
  • Subtitles past the specified time have already been deleted in ST131, and in ST132 caption display information still within its specified time has only its subtitle deleted; therefore, it is determined here whether or not only the subtitle has been deleted. If it is determined that a subtitle is present (YES), the process proceeds to ST140; if it is determined that no subtitle is present (NO), the process proceeds to ST139.
  • In ST139, the new caption information is stored in the caption display information (which no longer includes a caption) containing the same speaker identification information in the caption history storage unit 116.
  • In ST140, it is determined whether or not the display position next to the lowest display position corresponding to the same speaker identification information stored in the caption history storage unit 116 is vacant. For example, if the lowest display position of the same speaker identification information stored in the caption history storage unit 116 is the second from the top, it is determined whether or not the next display position, that is, the third from the top, is vacant. If it is determined to be vacant (YES), the process proceeds to ST142; if it is determined not to be vacant (NO), the process proceeds to ST141. If the same speaker identification information is at the lowest display position and there is no next display position, it is determined not to be vacant (NO).
  • In ST141, since it has been determined in ST140 that the display position next to the lowest display position of the same speaker identification information stored in the caption history storage unit 116 is not vacant, a space is created at that display position. Specifically, if the lowest display position of the same speaker identification information is the second from the top and the next display position, that is, the third from the top, is not vacant, the caption display information at the third position is shifted down to the fourth position, and the entries at the fourth position and below are likewise shifted down.
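The deletion and placement steps above can be condensed into a sketch (illustrative Python covering roughly ST131 through ST138; the 5-second expiry and the upper limit of 4 follow the examples in the text, and the position shifting of ST140 to ST142 is omitted):

```python
EXPIRY = 5.0      # ST131 example: delete subtitles shown for more than 5 seconds
MAX_FRAMES = 4    # ST134 example: upper limit of displayable speaker frames

def process_new_caption(history, speaker_id, text, now):
    """One pass of the flowchart. `history` maps
    speaker_id -> {"text", "display_time"}."""
    # ST131/ST132: delete caption entries displayed longer than the limit
    for sid in list(history):
        if now - history[sid]["display_time"] > EXPIRY:
            del history[sid]
    # ST134: check for a vacant speaker frame
    if speaker_id not in history and len(history) >= MAX_FRAMES:
        # no vacancy: delete the oldest displayed frame to make room,
        # as in the aspect where a new speaker replaces an old frame
        oldest = min(history, key=lambda s: history[s]["display_time"])
        del history[oldest]
    # ST136-ST138: reuse the row for a known speaker, else record a new one
    history[speaker_id] = {"text": text, "display_time": now}

h = {}
process_new_caption(h, "A", "first", now=0.0)
process_new_caption(h, "A", "second", now=2.0)
assert h["A"]["text"] == "second"
process_new_caption(h, "B", "late", now=10.0)  # "A" has expired by now
assert "A" not in h and "B" in h
```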
  • the caption processing unit 115 dynamically stores the caption information for each speaker in the caption history storage unit 116, and deletes the caption information stored in the caption history storage unit 116 in order of age.
  • In addition, one image area is divided into a caption display area and a video display area, and information indicating each speaker is associated with the captions indicating the content of that speaker's speech.
  • A plurality of icons, speaker frames, fonts, character colors, character sizes, and so on may be prepared, with display designation information specifying which of them is to be used.
  • In this case, the speaker information extraction unit 118 extracts the speaker identification information together with the display designation information from the other broadcast contents.
  • The subtitle display image is then created according to the extracted display designation information.
  • When the broadcast contents do not contain such information, default information prepared in advance is used.
  • When the display processing unit 120 displays the speaker frames of a plurality of speakers and one speaker's speech continues, the speaker frames of the other speakers may be deleted and the speaker frame of the continuously speaking speaker may be expanded into the freed area. If the speaker frame has been expanded to the maximum and the speech still continues, the subtitles are scrolled. Thereby, a long subtitle can be displayed.
  • The other broadcast contents include time information such as a time control mode (TMD) and a display start time (STM); the case where this time information is used will now be explained.
  • the speaker information extraction processing unit 151 extracts time information from the data output from the demultiplexing unit 113.
  • the extracted time information is output to the caption processing unit 115 and the display processing unit 120.
  • the video display device 150 can omit the timer for measuring the current time, and the device scale can be reduced.
  • The storage device 161 is a DVD (Digital Versatile Disc), an SD card, a hard disk, or the like, in which video/audio content and data content are stored.
  • the video display device 160 can simultaneously display video, speaker information, and subtitles using the video / audio content and data content stored in the storage device 161.
  • The video display device 165 may have a reception recording function in which a broadcast wave is received, and the received signal demodulated by the tuner unit 112 is recorded by the recording processing unit 166 and stored in the storage device 161. In this case, the received broadcast wave may be demodulated and displayed in real time, or stored in the storage device 161 and displayed later.
  • FIG. 16 shows the configuration of a broadcasting system according to Embodiment 4 of the present invention.
  • The communication unit 171 transmits and receives video/audio content and data content to and from the server 180 via a communication network such as the Internet.
  • The communication method of the communication unit 171 may be of any type, wired or wireless, such as a network adapter, wireless LAN (Local Area Network), Bluetooth, or infrared communication.
  • the server 180 inputs speaker information using a camera, a keyboard, or the like as the input device 181, stores the speaker information in the speaker information storage unit 182, and stores the speaker information in the video display device 170 via the communication unit 183. Send.
  • With this configuration, the video display device 170 can acquire the video/audio content and the caption information from the broadcast wave, and the speaker information from the communication network. Thus, when the speaker information of a program is acquired in advance via the network or data broadcasting and the video of that program is later played, the viewer can easily grasp the program contents using the acquired speaker information.
  • the communication unit 171 may acquire the video / audio content, the caption information, and the speaker information from the communication network.
  • subtitle information and speaker information may be acquired from the communication network, and video / audio content may be acquired from the broadcast wave.
  • Thus, video, speaker information, and subtitles can be displayed at the same time even for analog broadcasting, in which such information is not included.
  • FIG. 19 shows the configuration of a broadcasting system according to Embodiment 5 of the present invention.
  • The authentication processing unit 192 acquires the authentication information input by the user from the input device 191, and sends an inquiry about the acquired authentication information to the speaker information distribution device 200 via the communication unit 171.
  • The speaker information distribution device 200 receives the authentication inquiry from the video display device 190 via the communication unit 201, and the authentication processing unit 202 collates the authentication information; if the collation succeeds, a plurality of types of speaker information stored in the speaker information storage unit 203 are transmitted. The speaker information stored in the speaker information storage unit 203 is input from the input device 204.
  • The storage device 193 in the video display device 190 is an SD card or the like having a secure area; it stores the plurality of types of speaker information acquired from the speaker information distribution device 200, program identification information for the programs using that speaker information (program name, broadcast station name, channel, start time, end time, other IDs, etc.), and the authentication information input from the input device 191, and only the authentication processing unit 192 can access it.
  • The authentication processing unit 192 accesses the storage device 193, reads the information stored there, and writes the speaker information to the speaker information storage unit 119.
  • the authentication processing unit 192 deletes the information written in the speaker information storage unit 119.
  • The information handled in this authentication processing is the plurality of types of speaker information.
  • only a video display device that has been successfully authenticated can obtain a plurality of types of speaker information, and can perform rich subtitle display.
  • FIG. 20 shows the configuration of a broadcast system according to Embodiment 6 of the present invention.
  • The video display device 210 includes a first communication unit 171 connected to the speaker information distribution device 200 via a communication network such as the Internet, and a second communication unit 211 that communicates using a non-contact IC card such as Suica (registered trademark), a wireless tag, infrared rays, or the like.
  • When the key distribution device 220 receives a key acquisition request from the video display device 210 via the communication unit 221, it distributes the key (or the key together with the address of the speaker information distribution device) managed by the key distribution management unit 222 to the video display device 210, and notifies the speaker information distribution device 200 of this distribution. The authentication information (key, ID) managed by the key distribution management unit 222 is input from the input device 223.
  • The speaker information distribution device 200 receives the key distribution notification from the key distribution device 220 and adds the information to the authentication information managed by the authentication processing unit 202. Upon receiving an authentication inquiry using a key from the video display device 210, the authentication processing unit 202 performs authentication, and the plurality of types of speaker information stored in the speaker information storage unit 203 are transmitted only to a video display device that has been successfully authenticated.
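The key handshake can be modeled as follows (a toy in-memory sketch in Python; the class names and the direct method calls standing in for network messages are assumptions):

```python
class SpeakerInfoDistributor:
    """Stands in for the speaker information distribution device 200."""
    def __init__(self, speaker_info):
        self.valid_keys = set()        # authentication info kept by unit 202
        self.speaker_info = speaker_info

    def register(self, key):
        # key distribution notification from the key distribution device
        self.valid_keys.add(key)

    def fetch(self, key):
        # only a successfully authenticated display device gets the data
        if key not in self.valid_keys:
            raise PermissionError("authentication failed")
        return self.speaker_info

class KeyDistributor:
    """Stands in for the key distribution device 220."""
    def __init__(self, distributor):
        self.distributor = distributor
        self.count = 0

    def request_key(self):
        self.count += 1
        key = f"key-{self.count}"
        self.distributor.register(key)  # notify the speaker info distributor
        return key

dist = SpeakerInfoDistributor({"spk1": "Alice"})
kd = KeyDistributor(dist)
key = kd.request_key()                 # the display device acquires a key
assert dist.fetch(key) == {"spk1": "Alice"}
```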
  • With this configuration, a video display device that has acquired a key from the key distribution device 220 can obtain rich speaker information and perform subtitle display using a plurality of types of speaker information. Services can therefore be provided such as giving keys to users who have purchased multiple sets of speaker information, or distributing keys at a store when goods related to the program are purchased.
  • In this way, only a video display device that has acquired the key can obtain the plurality of types of speaker information and perform a rich subtitle display using them.
  • The audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113, and outputs the decoded audio stream to the audio analysis unit 232.
  • The audio analysis unit 232 analyzes the audio stream output from the audio processing unit 231 and outputs analysis results such as volume and pitch to the display processing unit 233. Also, by analyzing the characteristics of the voice, it generates information expressing the speaker's emotions, gender information, and information indicating age (for example, baby, child, adult, or elderly person), and outputs these to the display processing unit 233.
  • the display processing unit 233 creates a caption display image using the audio analysis result output from the audio analysis unit 232.
  • For example, the volume is associated with the character size, and the pitch is associated with the character color.
  • Information representing emotions is associated with fonts, and gender is associated with highlight colors.
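A minimal sketch of such a mapping (illustrative Python; the thresholds and style vocabulary are assumptions, since the text only fixes which analysis result controls which attribute):

```python
def decorate(analysis):
    """Map each audio-analysis result to one caption attribute.
    Thresholds and style names are invented for illustration."""
    style = {}
    # volume -> character size
    style["size"] = "large" if analysis.get("volume", 0) > 0.7 else "normal"
    # pitch -> character color
    style["color"] = "red" if analysis.get("pitch", 0) > 300 else "black"
    # emotion -> font
    style["font"] = {"angry": "bold", "happy": "rounded"}.get(
        analysis.get("emotion"), "default")
    # gender -> highlight color
    style["highlight"] = {"male": "blue", "female": "pink"}.get(
        analysis.get("gender"), "none")
    return style

s = decorate({"volume": 0.9, "pitch": 200, "emotion": "angry", "gender": "male"})
assert s == {"size": "large", "color": "black", "font": "bold", "highlight": "blue"}
```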
  • The decoration corresponding to each item of the voice analysis result is not limited to these examples.
  • Thus, in the seventh embodiment, by visually reflecting the result of analyzing the speaker's voice in the caption display image, information other than characters can be conveyed in the captions, and the program contents can be grasped more easily.
  • The video display device 235 may further include a video analysis unit 236; the video analysis unit 236 analyzes the video stream output from the video processing unit 117, and the display processing unit 233 may apply decoration corresponding to analysis results such as the size of the speaker in the image.
  • the display processing unit 233 may perform decoration corresponding to scenes such as morning, noon, night, sea, mountain, and soccer.
  • The audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113, and outputs the decoded audio stream to the speaker analysis unit 241.
  • the speaker analysis unit 241 detects a speaker from video and audio, and extracts an image of the speaker.
  • The extracted image is enlarged or reduced to a specified size and used as the speaker information.
  • The speaker information is stored in the caption history storage unit 116 together with the caption information processed by the caption processing unit 115. The technique for detecting a speaker from video and audio is assumed to use existing technology, for example, the technology described in Patent Document 1.
  • The voice recognition unit 251 performs voice recognition on the audio stream output from the audio processing unit 231, converting it into character information to generate caption information.
  • The generated caption information is stored in the caption history storage unit 116.
  • Thereby, caption display can be performed even when the speaker information and the caption information are not included in the broadcast wave.
  • The subtitle display order may run from the top of the subtitle display area (ascending order) or, as shown in FIG. 25B, from the bottom of the subtitle display area (descending order).
  • Further, the size of the speaker frames may be changed step by step, their colors may be changed step by step from bright to plain, the subtitle character color may be lightened gradually, the font size may be reduced step by step, or the display order may be numbered. Thereby, the user can recognize the display order of the subtitles without reading them. The user may also be allowed to set the display order (ascending or descending) of the captions.
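The two orderings of FIG. 25 can be sketched as one sort plus an optional reversal (illustrative Python):

```python
def display_order(rows, ascending=True):
    """Order speaker frames for rendering, oldest caption first.
    With ascending=True the sequence fills the subtitle area from the
    top; with ascending=False the list is reversed so the same sequence
    fills the area from the bottom, as in FIG. 25B."""
    ordered = sorted(rows, key=lambda r: r["display_time"])
    return ordered if ascending else ordered[::-1]

rows = [{"who": "B", "display_time": 2}, {"who": "A", "display_time": 1}]
assert [r["who"] for r in display_order(rows)] == ["A", "B"]
assert [r["who"] for r in display_order(rows, ascending=False)] == ["B", "A"]
```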
  • As described above, according to the present invention, a subtitle display image is created in which subtitles are associated with speaker information that allows a viewer to recognize the speaker of each subtitle, and the created subtitle display image is synthesized with the video.
  • Since each subtitle can thus be associated with its speaker in the video, the viewer can easily grasp the program contents in a silent state, even while the speaker is displayed in the video.
  • A second aspect of the present invention is a video display device comprising speaker information acquisition means for acquiring speaker information, and speaker information storage means for storing the acquired speaker information.
  • A third aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
  • A fourth aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information along with reception of a program.
  • A fifth aspect of the present invention is a video display device in which the display processing unit divides one image area into a subtitle display area and a video display area different from the subtitle display area, arranges video including a speaker in the video display area, and arranges the subtitles and the speaker information corresponding to that speaker in the subtitle display area.
  • A sixth aspect of the present invention is a video display device in which the display processing unit dynamically arranges a speaker frame, which is a region for displaying the captions of each speaker, according to the presence or absence of speech by that speaker.
  • A seventh aspect of the present invention is a video display device in which the display processing means, after displaying the subtitle of a second speaker different from a first speaker while the subtitle of the first speaker is being displayed, deletes the subtitle of the first speaker.
  • An eighth aspect of the present invention is a video display device in which, when the number of displayed speaker frames has reached its upper limit and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
  • A ninth aspect of the present invention is a video display device in which the display processing means creates a caption display image based on display designation information indicating which of a plurality of types of speaker information, each allowing a viewer to recognize the same speaker, acquired by the speaker information acquisition means is to be used.
  • Rich subtitle display can be performed by designating the speaker information to use from among a plurality of types that allow the viewer to recognize the same speaker.
  • A tenth aspect of the present invention is a video display device in which, when speaker frames of a plurality of speakers are displayed and one speaker's speech continues, the display processing means deletes the speaker frame of another speaker and extends the continuing speaker's frame into the deleted area.
  • By deleting another displayed speaker frame and expanding the frame of the speaker whose speech continues into the deleted area, a long subtitle can be displayed.
  • An eleventh aspect of the present invention is a video display device that, in the above aspect, includes an analysis unit for analyzing the video or audio, the display processing unit decorating the subtitles based on the analysis result of the analysis unit.
  • A twelfth aspect of the present invention is a video display device in which the display processing means associates the display order of the caption display information with the decoration of the captions, and decorates the captions according to the display order of the caption display information.
  • A thirteenth aspect of the present invention is a broadcast system comprising a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device that receives the broadcast wave transmitted from the broadcast wave transmission device and has display processing means for creating a subtitle display image associating the subtitles with the speaker information and combining it with the video, and display means for displaying the image combined by the display processing means.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program content can be easily grasped in a silent state.
  • A fourteenth aspect of the present invention is a video display device comprising a recording device that records video, captions, and speaker information that allows a viewer to recognize the caption speaker; a display processing unit that creates a subtitle display image in which the captions included in the recorded information are associated with the speaker information, and combines the created subtitle display image with the video; and a display unit that displays the image combined by the display processing unit.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program contents can be easily grasped in a silent state.
  • A fifteenth aspect of the present invention is an authentication system comprising an authentication device that performs authentication processing, and a video display device that, once authenticated by the authentication device, acquires a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker, and that comprises a display processing unit for combining a subtitle display image with the video and a display unit for displaying the image combined by the display processing unit.
  • Since a video display device authenticated by the authentication device acquires a plurality of different pieces of speaker information that allow the viewer to recognize the same speaker, the authenticated video display device can perform rich subtitle display.
  • Another aspect of the present invention is a video display method comprising a display processing step of creating a subtitle display image in which subtitles are associated with speaker information that allows a viewer to recognize the subtitle speaker, and combining the created subtitle display image with the video; and a display step of displaying the image combined in the display processing step.
  • Since the subtitles and the speakers in the video can be associated with each other, the viewer can identify even a speaker who is not shown in the video, and the program contents can be easily grasped in a silent state.
  • The video display device and video display method according to the present invention have the effect of allowing the viewer to easily grasp the program contents even in a silent state, and can be applied to mobile phones and other devices with small screen sizes.
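As a loose illustration of the ascending/descending display order and the numbering decoration described in the aspects above, the following Python sketch arranges a chronological list of captions for the subtitle display area. It is hypothetical — the patent specifies no implementation, and every name here is invented:

```python
def layout_captions(captions, order="ascending", number=True):
    """Arrange captions (given oldest-first) for the subtitle display area.

    "ascending" fills the area from the top (FIG. 25A); "descending" fills it
    from the bottom, so rows appear newest-first (FIG. 25B). Numbering each
    caption with its chronological rank lets the user see the display order
    without reading the subtitles.
    """
    rows = [f"{i + 1}. {text}" for i, text in enumerate(captions)] if number else list(captions)
    if order == "descending":
        rows.reverse()
    return rows

print(layout_captions(["Hello", "Hi there", "Good evening"], order="descending"))
# → ['3. Good evening', '2. Hi there', '1. Hello']
```

Other decorations mentioned in the aspects (color fading, step-wise font-size reduction) would attach analogously, keyed on the same chronological rank.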

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Television Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A video display that enables the viewer to easily grasp the program content even in a silent state. In the video display, a demultiplexing section (113) outputs data, and the subtitle information in that data, classified by speaker according to speaker identification information, is stored in a subtitle record storage section (116). From the data output by the demultiplexing section (113), a speaker information extracting section (118) extracts the speaker identification information and the speaker information, and each pair of speaker identification information and speaker information is stored in a speaker information storage section (119). A display processing section (120) divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video, allocates the video output from a video processing section (117) to the video display area, allocates the subtitle display information stored in the subtitle record storage section (116) and the speaker information stored in the speaker information storage section (119) to the subtitle display area, and combines these display images. The combined image is displayed on the display section (121).

Description

Specification
Video display device and video display method
Technical field
[0001] The present invention relates to a video display device and a video display method, and more particularly to a video display device and a video display method for displaying subtitles.
Background art
[0002] In recent years, small mobile terminals such as mobile phones capable of receiving television have become widespread, and a user can watch TV without being restricted to a particular location, while moving or at a destination, as long as the terminal can receive the broadcast signal.
[0003] When TV viewing on such a small portable terminal is assumed, viewing in public spaces is also conceivable. In particular, in places where consideration for the surroundings is required, such as on public transportation or while waiting at a hospital, it is necessary to watch in such a way that the TV audio does not reach the people nearby.
[0004] In such places, headphones are generally worn to prevent the sound from reaching the surroundings. However, taking out and putting on headphones takes time and effort, and when only a short viewing time is expected, the use of headphones is not desirable.
[0005] It is also conceivable to watch without headphones, with the sound muted, by using subtitle broadcasting or the like. As such techniques, Patent Document 1 discloses a technique as shown in FIG. 1, and Patent Document 2 discloses, as shown in FIG. 2, a technique of displaying a speech-balloon frame corresponding to a person displayed in the image, associating data obtained by converting speech into text with the speaker of that speech, and displaying the subtitles (character data) in the balloon frame. This makes it easy to identify the speaker, which is difficult with subtitles alone, and makes the program contents easy to grasp even in a silent state.
Patent Document 1: Japanese Patent Application Laid-Open No. 2004-056286
Patent Document 2: Japanese Patent Application Laid-Open No. 2004-080069
Disclosure of the invention
Problems to be solved by the invention

[0006] However, with the techniques disclosed in Patent Document 1 and Patent Document 2, the video is hidden because balloon frames are displayed over it. In particular, on a small portable terminal the display screen is also small, so much of the screen is occupied by balloon frames and important video is hidden. In addition, depending on how the program is produced, the content displayed in the subtitles does not necessarily match the speech of the person displayed in the video; in such a case, the techniques disclosed in Patent Documents 1 and 2 cannot associate the speaker with a balloon frame.
[0007] An object of the present invention is to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
Means for solving the problem
[0008] The video display device of the present invention employs a configuration comprising display processing means for creating a subtitle display image in which subtitles are associated with speaker information that allows the viewer to recognize the subtitle speaker, and for combining the created subtitle display image with the video; and display means for displaying the image combined by the display processing means.
[0009] According to this configuration, since the subtitles can be associated with the speakers in the video, the viewer can recognize even a speaker who is not displayed in the video, and can easily grasp the program contents in a silent state.
Effect of the invention
[0010] According to the present invention, it is possible to provide a video display device and a video display method that allow a viewer to easily grasp the contents of a program even in a silent state.
Brief description of the drawings
[0011]
[FIG. 1] A diagram showing the image display method disclosed in Patent Document 1
[FIG. 2] A diagram showing the image display method disclosed in Patent Document 2
[FIG. 3] A block diagram showing the configuration of a broadcasting system according to Embodiment 1 of the present invention
[FIG. 4] A conceptual diagram showing the processing of the caption processing unit shown in FIG. 3
[FIG. 5] A diagram showing the display positions of the speaker frames
[FIG. 6] A conceptual diagram showing the processing of the speaker information extraction unit shown in FIG. 3
[FIG. 7] A conceptual diagram showing the processing of the display processing unit shown in FIG. 3
[FIG. 8] A flowchart showing the processing procedure of the caption processing unit shown in FIG. 3
[FIG. 9] A conceptual diagram showing a state in which a subtitle has been deleted
[FIG. 10] A conceptual diagram showing a state in which subtitle display information has been deleted
[FIG. 11] A conceptual diagram showing display designation information
[FIG. 12] A conceptual diagram showing how display designation information is selected
[FIG. 13] A block diagram showing the configuration of a video display device according to Embodiment 2 of the present invention
[FIG. 14] A block diagram showing the configuration of a video display device according to Embodiment 3 of the present invention
[FIG. 15] A block diagram showing the configuration of a video display device according to Embodiment 3 of the present invention
[FIG. 16] A block diagram showing the configuration of a broadcasting system according to Embodiment 4 of the present invention
[FIG. 17] A block diagram showing the configuration of a video display device according to Embodiment 4 of the present invention
[FIG. 18] A block diagram showing the configuration of a video display device according to Embodiment 4 of the present invention
[FIG. 19] A block diagram showing the configuration of a broadcasting system according to Embodiment 5 of the present invention
[FIG. 20] A block diagram showing the configuration of a broadcasting system according to Embodiment 6 of the present invention
[FIG. 21] A block diagram showing the configuration of a video display device according to Embodiment 7 of the present invention
[FIG. 22] A block diagram showing the configuration of a video display device according to Embodiment 7 of the present invention
[FIG. 23] A block diagram showing the configuration of a video display device according to Embodiment 8 of the present invention
[FIG. 24] A block diagram showing the configuration of a video display device according to Embodiment 9 of the present invention
[FIG. 25A] A diagram showing a case where the subtitle display order is ascending
[FIG. 25B] A diagram showing a case where the subtitle display order is descending
Best mode for carrying out the invention
[0012] Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the embodiments, components having the same functions are given the same reference numerals, and duplicate descriptions are omitted.
[0013] (Embodiment 1)
FIG. 3 shows the configuration of the broadcasting system according to Embodiment 1 of the present invention. First, the configuration of the broadcast wave transmission device 100 will be described. The input device 101 is a camera, microphone, keyboard, or the like, through which caption information, video/audio content, and data content are input.

[0014] The video encoding unit 102 encodes the video information of the video/audio content using a compression method such as MPEG-2, MPEG-4, or H.264, and outputs the encoded video information to the multiplexing processing unit 104.
[0015] The audio encoding unit 103 encodes the audio information of the video/audio content using a compression method such as AAC, and outputs the encoded audio information to the multiplexing processing unit 104.
[0016] The multiplexing processing unit 104 multiplexes the video information output from the video encoding unit 102, the audio information output from the audio encoding unit 103, and the remaining broadcast contents such as program information, program specific information, text information, and image information (hereinafter, "other broadcast contents"), and outputs the multiplexed signal to the transmission path encoding unit 105.
[0017] The transmission path encoding unit 105 performs transmission processing such as encoding and modulation on the signal output from the multiplexing processing unit 104, and transmits a broadcast wave from the antenna 106.
[0018] Next, the configuration of the video display device 110 will be described. The tuner unit 112 extracts the frequency signal of the channel specified by the user from the broadcast wave received via the antenna 111, and demodulates the extracted frequency signal. The demodulated signal is output to the demultiplexing unit 113.
[0019] The demultiplexing unit 113 separates the signal output from the tuner unit 112 into caption information, video information, and other broadcast contents, outputting the separated caption information to the caption processing unit 115, the video information to the video processing unit 117, and the other broadcast contents to the speaker information extraction unit 118. The caption information includes speaker identification information, such as an ID identifying the speaker, together with the caption itself; the other broadcast contents include speaker information, which allows the user to recognize the caption speaker, and the corresponding speaker identification information.
[0020] The timer 114 measures the current time and notifies the caption processing unit 115 and the display processing unit 120 of the measured current time.
[0021] The caption processing unit 115 stores the caption information output from the demultiplexing unit 113 in the caption history storage unit 116 for each speaker, based on the speaker identification information. At this time, the current time notified by the timer 114 is used as the display time of the caption information, and this display time is also stored in the caption history storage unit 116. In addition, the area in which the captions of each speaker are displayed is treated as a speaker frame; the position at which the speaker frame is displayed (hereinafter simply "display position") is determined, and the determined display position is also stored in the caption history storage unit 116. FIG. 4 conceptually shows the processing of the caption processing unit 115. In this embodiment, as shown in FIG. 5, three speaker frames are prepared, with display positions "1", "2", and "3" in order from the top, and the lowest-numbered vacant display position is selected.
[0022] The caption history storage unit 116 manages caption display information, each entry consisting of speaker identification information, a display time, a display position, and a caption, in a table as shown in FIG. 4.
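The table of FIG. 4 and the rule of paragraph [0021] — assign each new speaker the lowest-numbered vacant display position among "1" to "3" — could be modeled roughly as in the sketch below. This is only an illustration under assumed data structures, not the patented implementation:

```python
NUM_FRAMES = 3  # three speaker frames, display positions 1 (top) to 3 (bottom)

class CaptionHistory:
    """Caption display information: speaker ID, display time, position, caption."""

    def __init__(self):
        self.entries = []

    def lowest_vacant_position(self):
        used = {e["position"] for e in self.entries}
        vacant = [p for p in range(1, NUM_FRAMES + 1) if p not in used]
        return vacant[0] if vacant else None

    def store(self, speaker_id, caption, display_time):
        pos = self.lowest_vacant_position()
        if pos is None:
            raise RuntimeError("no vacant speaker frame")  # handled later in ST135
        self.entries.append({"speaker_id": speaker_id, "display_time": display_time,
                             "position": pos, "caption": caption})
        return pos

history = CaptionHistory()
print(history.store("spk-1", "Hello", 0.0))  # → 1
print(history.store("spk-2", "Hi", 1.5))     # → 2
```

The display time recorded with each entry is what later drives both the frame sorting in paragraph [0025] and the expiry/eviction steps of FIG. 8.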
[0023] The video processing unit 117 decodes the video stream output from the demultiplexing unit 113, which is encoded with H.264 or the like, and outputs the decoded signal to the display processing unit 120.
[0024] The speaker information extraction unit 118 extracts the speaker identification information and the speaker information from the data output from the demultiplexing unit 113, and stores the extracted speaker identification information and speaker information as a pair in the speaker information storage unit 119. FIG. 6 conceptually shows the processing of the speaker information extraction unit 118. As shown in FIG. 6, the speaker information storage unit 119 manages the pairs of speaker identification information and speaker information in a table.
[0025] The display processing unit 120 divides one image area into a subtitle display area for displaying subtitles and a video display area for displaying video; it arranges the video output from the video processing unit 117 in the video display area, arranges the caption display information stored in the caption history storage unit 116 and the speaker information stored in the speaker information storage unit 119 in the subtitle display area, and combines these display images. At this time, the speaker frames are sorted based on the time notified by the timer 114 and the display times stored in the caption history storage unit 116. Since the display processing unit 120 keeps the subtitle display and the video display separate on the same screen, the two never overlap, which prevents the video or the subtitles from becoming invisible. The combined image is output to the display unit 121. FIG. 7 conceptually shows the processing of the display processing unit 120.
[0026] Note that the display processing unit 120 arranges the speaker frames dynamically according to whether each speaker is speaking: a speaker frame is arranged when the speaker has speech and is not arranged when the speaker has none. As a result, when the aspect ratio of the video differs from that of the video display device, the surplus area can be used effectively as the subtitle display area.
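The division in paragraphs [0025]–[0026] — a video area plus a leftover strip reused as the subtitle display area when the aspect ratios differ — might look like the following back-of-the-envelope sketch; the screen dimensions and function name are illustrative assumptions, not taken from the patent:

```python
def split_screen(screen_w, screen_h, video_aspect=16 / 9):
    """Fit the video across the top of the screen at its own aspect ratio;
    the surplus strip below becomes the subtitle display area."""
    video_h = min(round(screen_w / video_aspect), screen_h)
    video_area = (0, 0, screen_w, video_h)                     # x, y, width, height
    subtitle_area = (0, video_h, screen_w, screen_h - video_h)
    return video_area, subtitle_area

# A 16:9 video on a 240x320 portrait phone screen leaves a 185-pixel strip
video_area, subtitle_area = split_screen(240, 320)
print(video_area)     # → (0, 0, 240, 135)
print(subtitle_area)  # → (0, 135, 240, 185)
```

When the screen itself is 16:9, the surplus strip shrinks to zero, which matches the patent's point that the subtitle area exploits otherwise wasted space.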
[0027] The display unit 121 displays the composite image output from the display processing unit 120.

[0028] Next, the main operations of the video display device 110 having the above configuration will be described with reference to FIG. 8. In FIG. 8, in step (hereinafter abbreviated as "ST") 131, captions that have been displayed for at least a specified time (for example, 5 seconds) are deleted from the caption display information held in the caption history storage unit 116, and the process proceeds to ST132. FIG. 9 shows a state in which a caption has been deleted.
[0029] In ST132, caption display information whose caption was displayed a specified time ago, or caption display information that occupies two or more frames and whose caption alone has already been deleted, is deleted, and the process proceeds to ST133. FIG. 10 shows a state in which caption display information has been deleted. To delete the caption display information at the same time as the caption, the specified deletion time of the caption and that of the caption display information need only be set equal. Incidentally, the specified deletion time is the time from when a caption of a second speaker, different from a first speaker, is displayed while the first speaker's caption is being displayed, until the first speaker's caption is deleted. As a result, the first speaker's caption remains displayed even after the first speaker has finished speaking, so even when several people speak at the same time, the viewer can easily grasp the contents.
[0030] In ST133, it is determined whether new caption information has been acquired from the demultiplexing unit 113. If new caption information has been acquired (YES), the process proceeds to ST134; if not (NO), the process returns to ST131, and ST131 to ST133 are repeated until it is determined that new caption information has been acquired.
[0031] In ST134, it is determined whether a speaker frame is vacant. Specifically, since the number of speaker frames that can be displayed has an upper limit determined by the screen size and the terminal specifications, it is determined from the information stored in the caption history storage unit 116 whether the number of speaker frames has reached that upper limit. For example, if the upper limit is 4, it is determined that the limit has not been reached when the number of speaker frames is 3 or less, and that it has been reached when the number is 4. That is, if the upper limit has not been reached, it is determined that a speaker frame is vacant (YES) and the process proceeds to ST136; if it has been reached, it is determined that no speaker frame is vacant (NO) and the process proceeds to ST135.
[0032] In ST135, since it was determined in ST134 that no speaker frame is vacant, a speaker frame is secured by deleting the caption display information with the oldest display time from the caption history storage unit 116, and the process proceeds to ST136. As a result, even when there are many characters, the limited subtitle display area can be used effectively, and the viewer can easily grasp the contents.
[0033] In ST136, it is determined whether speaker identification information identical to that included in the new caption information acquired from the demultiplexing unit 113 exists, that is, is stored, in the caption history storage unit 116. If it exists (YES), the process proceeds to ST138; if not (NO), the process proceeds to ST137.
[0034] In ST137, new caption display information is recorded in a vacant area of the caption history storage unit 116 based on the new caption information.
[0035] In ST138, it is determined whether a caption exists (is stored) in the caption display information that is stored in the caption history storage unit 116 and contains the same speaker identification information as the newly acquired caption information. Here, since captions whose specified display time has elapsed are deleted in ST131, while caption display information still within the specified time is not deleted in ST132, only the caption portion of an entry may have been deleted; it is therefore determined whether only the caption has been deleted. If a caption exists (YES), the process proceeds to ST140; if not (NO), the process proceeds to ST139.
[0036] In ST139, the new caption information is stored in the caption display information (containing no caption) that is stored in the caption history storage unit 116 with the same speaker identification information.
[0037] In ST140, it is determined whether the display position following the display position corresponding to the same speaker identification information stored in the subtitle history storage unit 116 is vacant. For example, if, among the entries with the same speaker identification information stored in the subtitle history storage unit 116, the lowest one is displayed at the second position from the top, it is determined whether the next display position, that is, the third position from the top, is vacant. If it is determined to be vacant (YES), the process proceeds to ST142; if it is determined not to be vacant (NO), the process proceeds to ST141. If the entry with the same speaker identification information occupies the lowest display position and no next display position exists, the position is likewise determined not to be vacant (NO).
[0038] In ST141, since it was determined in ST140 that the display position following the lowest-positioned entry with the same speaker identification information stored in the subtitle history storage unit 116 is not vacant, a vacancy is created at that next display position. Specifically, suppose that, among the entries with the same speaker identification information, the lowest one is displayed at the second position from the top and that the next display position, that is, the third position from the top, is not vacant; then the subtitle display information at the third position from the top is shifted to the fourth position, and the entries at the fourth position from the top and below are shifted in the same way.
[0039] In this way, the subtitle processing unit 115 dynamically stores per-speaker subtitle information in the subtitle history storage unit 116, and deletes the subtitle information stored in the subtitle history storage unit 116 in order from the oldest.
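The ST136 through ST142 update flow above can be sketched as follows. This is an illustrative reading of the steps, not code from the patent: the class and method names are assumptions, and so is the policy of discarding entries shifted past the last display row.

```python
class Entry:
    """One piece of subtitle display information in the history storage."""
    def __init__(self, speaker_id, position, subtitle=None):
        self.speaker_id = speaker_id
        self.position = position   # 0 = top row of the subtitle display area
        self.subtitle = subtitle   # may be None when only the text has expired

class SubtitleHistory:
    def __init__(self, max_positions):
        self.max_positions = max_positions
        self.entries = []          # contents of the subtitle history storage unit

    def _used_positions(self):
        return {e.position for e in self.entries}

    def _free_position(self):
        used = self._used_positions()
        for p in range(self.max_positions):
            if p not in used:
                return p
        return None

    def update(self, speaker_id, subtitle):
        same = [e for e in self.entries if e.speaker_id == speaker_id]
        if not same:                                   # ST136: NO -> ST137
            free = self._free_position()
            if free is not None:
                self.entries.append(Entry(speaker_id, free, subtitle))
            return
        lowest = max(same, key=lambda e: e.position)
        if lowest.subtitle is None:                    # ST138: NO -> ST139
            lowest.subtitle = subtitle
            return
        nxt = lowest.position + 1                      # ST140
        if nxt >= self.max_positions:
            return                                     # no next row: treated as not vacant
        if nxt in self._used_positions():              # ST141: shift rows to open a slot
            for e in self.entries:
                if e.position >= nxt:
                    e.position += 1
            self.entries = [e for e in self.entries
                            if e.position < self.max_positions]
        self.entries.append(Entry(speaker_id, nxt, subtitle))  # ST142: record
```

For example, after speakers A and B each speak once, a second utterance by A shifts B's row down by one and places A's new subtitle directly below A's first one.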
[0040] As described above, according to Embodiment 1, one image area is divided into a subtitle display area and a video display area; a subtitle display image, in which information indicating a speaker is associated with the subtitle representing that speaker's utterance, is displayed in the subtitle display area, and the video is displayed in the video display area. This allows the user to recognize the content of a speaker's utterance even in a silent state, without hiding the video, so the content of the program can be grasped easily.
[0041] In the present embodiment, the description assumed one type each of icon, speaker frame, font, character color, character size, and so on as speaker information. However, as shown in FIG. 11, a plurality of icons, speaker frames, fonts, character colors, character sizes, etc. may be prepared, and display designation information may specify which of them is to be used. In this case, if the other broadcast content contains a plurality of pieces of speaker information indicating the same speaker, the speaker information extraction unit 118 extracts the combination of speaker identification information and display designation information from the other broadcast content, and, as shown in FIG. 12, a subtitle display image is created according to the extracted display designation information. If the other broadcast content does not contain a plurality of pieces of speaker information, default information prepared in advance is used.
[0042] The display processing unit 120 in the present embodiment may, while the speaker frames of a plurality of speakers are displayed, delete the speaker frames of the other speakers when one speaker's utterances continue, and expand the speaker frame of the continuously speaking speaker into the deleted area. When the speaker frame has been expanded to its maximum and the utterances still continue, the subtitles are scrolled. This makes it possible to display long subtitles.
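The expand-or-scroll behavior just described can be sketched roughly as below. The row-based layout and the rule for choosing which other frame to delete first are assumptions made for illustration, not details from the patent.

```python
def extend_or_scroll(frames, active, max_rows):
    """frames: dict mapping speaker -> number of rows its frame occupies.

    While `active` keeps speaking: delete another speaker's frame and give
    its rows to `active`; once no other frames remain and the frame is at
    the maximum, fall back to scrolling the subtitles inside the frame.
    """
    others = [s for s in frames if s != active]
    if others:
        victim = others[0]                 # assumed policy: delete the first other frame
        frames[active] = frames.get(active, 0) + frames.pop(victim)
        return "extended"
    if frames.get(active, 0) < max_rows:
        frames[active] = max_rows          # expand into any remaining free rows
        return "extended"
    return "scroll"                        # frame at maximum: scroll the subtitles
```

With two speakers each holding two of four rows, a continuing utterance by the first speaker first absorbs the second speaker's rows; a further utterance then triggers scrolling.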
[0043] (Embodiment 2)
In Embodiment 2 of the present invention, the other broadcast content contains time information such as a time control mode (TMD) and a display start time (STM); the case where this time information is used is described.
[0044] In the video display device 150 according to Embodiment 2 of the present invention, as shown in FIG. 13, a speaker information extraction processing unit 151 extracts the time information from the data output from the demultiplexing unit 113 and outputs the extracted time information to the subtitle processing unit 115 and the display processing unit 120.
[0045] This allows the video display device 150 to omit a timer for measuring the current time, so the scale of the device can be reduced.
[0046] (Embodiment 3)
In the video display device 160 according to Embodiment 3 of the present invention, as shown in FIG. 14, the storage device 161 is a DVD (Digital Versatile Disc), an SD card, a hard disk, or the like, in which video/audio content and data content are stored.
[0047] This allows the video display device 160 to display video, speaker information, and subtitles simultaneously using the video/audio content and data content stored in the storage device 161.
[0048] As shown in FIG. 15, a video display device 165 may have a reception/recording function in which a broadcast wave is received, the received signal demodulated by the tuner unit 112 is recorded by a recording processing unit 166, and the result is stored in the storage device 161. In this case, the received broadcast wave may be demodulated and displayed in real time, or may be stored in the storage device 161 and displayed later.
[0049] (Embodiment 4)
FIG. 16 shows the configuration of a broadcast system according to Embodiment 4 of the present invention. In this figure, in the video display device 170, a communication unit 171 transmits and receives video/audio content and data content to and from a server 180 via a communication network such as the Internet. The communication method of the communication unit 171 may be of any type, wired or wireless, such as a network adapter, wireless LAN (Local Area Network), Bluetooth, or infrared communication.
[0050] Speaker information is input to the server 180 via an input device 181 such as a camera or keyboard; the server 180 stores the speaker information in a speaker information storage unit 182 and transmits it to the video display device 170 via a communication unit 183.
[0051] This allows the video display device 170 to acquire video/audio content and subtitle information from the broadcast wave and to acquire speaker information from the communication network. For example, the speaker information of a program can be acquired in advance via the Internet, data broadcasting, or the like; when the video of that program is later played back, using the acquired speaker information allows the viewer to grasp the content of the program easily.
[0052] As shown in FIG. 17, in a video display device 172, the communication unit 171 may acquire the video/audio content, the subtitle information, and the speaker information from the communication network. Alternatively, as shown in FIG. 18, in a video display device 173, the subtitle information and the speaker information may be acquired from the communication network while the video/audio content is acquired from the broadcast wave; in this case, video, speaker information, and subtitles can be displayed simultaneously even for analog broadcasting that does not include speaker information.
[0053] (Embodiment 5)
FIG. 19 shows the configuration of a broadcast system according to Embodiment 5 of the present invention. In this figure, in a video display device 190, an authentication processing unit 192 acquires authentication information input by the user via an input device 191 and queries a speaker information distribution device 200 with the acquired authentication information via the communication unit 171.
[0054] The speaker information distribution device 200 receives the authentication query from the video display device 190 via a communication unit 201; an authentication processing unit 202 verifies the authentication information and transmits the plural types of speaker information stored in a speaker information storage unit 203 only to video display devices that have been successfully authenticated. The speaker information stored in the speaker information storage unit 203 is input via an input device 204.
[0055] The storage device 193 in the video display device 190 is, for example, an SD card having a secure area. It stores the plural types of speaker information acquired from the speaker information distribution device 200, program identification information (program name, broadcast station name, channel, start time, end time, other IDs, etc.) identifying the program that uses this speaker information, and the authentication information input from the input device 191, and it is accessible only by the authentication processing unit 192.
[0056] When the video display device 190 starts viewing the program specified by the program identification information, the authentication processing unit 192 accesses the storage device 193, reads the stored information, and writes it to the speaker information storage unit 119. When viewing of the program ends, the authentication processing unit 192 deletes the information written to the speaker information storage unit 119. This prevents leakage of the information obtained through the authentication process (here, the plural types of speaker information).
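The Embodiment-5 exchange can be sketched as follows: the distribution device verifies credentials and returns the rich speaker information only on success, and the display device keeps that information in its secure store, exposing it to the display path only while the matching program is being viewed. All class, method, and key names here are illustrative assumptions, not terms from the patent.

```python
class SpeakerInfoDistributor:
    """Speaker information distribution device 200 (simplified)."""
    def __init__(self, credentials, speaker_info):
        self._credentials = credentials    # valid authentication info per user
        self._speaker_info = speaker_info  # plural types of speaker information

    def request(self, user, auth):
        if self._credentials.get(user) != auth:
            return None                    # authentication failed: nothing is sent
        return self._speaker_info

class DisplayDevice:
    """Video display device 190 (simplified)."""
    def __init__(self):
        self._secure = {}         # secure area: only the authentication unit touches it
        self.speaker_info = None  # working store used while viewing (unit 119)

    def fetch(self, distributor, user, auth):
        info = distributor.request(user, auth)
        if info is not None:
            self._secure[auth] = info      # keep rich speaker info in the secure area
        return info is not None

    def start_viewing(self, auth):
        self.speaker_info = self._secure.get(auth)  # load at viewing start

    def stop_viewing(self):
        self.speaker_info = None  # delete at viewing end to prevent leakage
```

A failed query leaves the device with nothing; a successful one makes the speaker information available only between `start_viewing` and `stop_viewing`.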
[0057] In this way, only video display devices that have been successfully authenticated can obtain rich speaker information and display subtitles using the display designation information. It is therefore possible to provide services such as distributing authentication information to users who have purchased rich speaker information, or to users who have answered a questionnaire on, for example, a program's website.
[0058] As described above, according to Embodiment 5, only video display devices that have been successfully authenticated can obtain plural types of speaker information and perform rich subtitle display.
[0059] (Embodiment 6)
FIG. 20 shows the configuration of a broadcast system according to Embodiment 6 of the present invention. In this figure, a video display device 210 has a first communication unit 171 that connects to the speaker information distribution device 200 via a communication network such as the Internet, and a second communication unit 211 that communicates using a contactless IC such as Suica (registered trademark), a wireless tag, infrared, or the like.
[0060] Upon receiving a key acquisition request from the video display device 210 via a communication unit 221, a key distribution device 220 distributes a key managed by a key distribution management unit 222 (or the key together with the address of the speaker information distribution device) to the video display device 210, and notifies the speaker information distribution device 200 to that effect. The authentication information (key, ID) managed by the key distribution management unit 222 is input via an input device 223.
[0061] The speaker information distribution device 200 receives the key distribution notification from the key distribution device 220 and appends that information to the authentication information managed by the authentication processing unit 202. Upon receiving an authentication query using the key from the video display device 210, the authentication processing unit 202 performs authentication and transmits the plural types of speaker information stored in the speaker information storage unit 203 only to video display devices that have been successfully authenticated.
[0062] In this way, only video display devices that have acquired a key from the key distribution device 220 can obtain rich speaker information and display subtitles using plural types of speaker information. It is therefore possible to provide services such as distributing keys to users who have purchased plural types of speaker information, or distributing keys at the store where a user purchased goods related to a program.
[0063] As described above, according to Embodiment 6, only video display devices that have acquired a key can obtain plural types of speaker information and perform rich subtitle display using them.
[0064] (Embodiment 7)
In a video display device 230 according to Embodiment 7 of the present invention, as shown in FIG. 21, an audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113 and outputs the decoded audio stream to an audio analysis unit 232.
[0065] The audio analysis unit 232 analyzes the audio stream output from the audio processing unit 231 and outputs analysis results such as volume and pitch to a display processing unit 233. By analyzing the characteristics of the voice, it also generates information representing emotions such as joy, anger, sorrow, and pleasure, gender information, and information representing age (for example, baby, child, adult, elderly) and outputs these to the display processing unit 233.
[0066] The display processing unit 233 creates a subtitle display image using the audio analysis results output from the audio analysis unit 232. For example, volume is mapped to character size and pitch to character color; information representing emotion is mapped to the font, and gender to the highlight color. The decoration associated with each element of the audio analysis result is, however, not limited to these examples.
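The mapping just described (volume to character size, pitch to character color, emotion to font, gender to highlight color) can be sketched as a small lookup function. The thresholds, style names, and category labels are illustrative assumptions; the patent specifies only the correspondences, not concrete values.

```python
def decorate(volume_db, pitch_hz, emotion, gender):
    """Derive subtitle decoration from audio-analysis results (illustrative)."""
    style = {}
    # Volume -> character size: louder speech is rendered larger.
    style["size_px"] = 18 if volume_db < 50 else 24 if volume_db < 70 else 32
    # Pitch -> character color: low, mid, and high pitch get distinct colors.
    style["color"] = "blue" if pitch_hz < 160 else "green" if pitch_hz < 260 else "red"
    # Emotion -> font; unknown emotions fall back to a default face.
    style["font"] = {"joy": "rounded", "anger": "bold-gothic",
                     "sorrow": "mincho", "pleasure": "script"}.get(emotion, "default")
    # Gender -> highlight color behind the subtitle text.
    style["highlight"] = {"male": "lightblue", "female": "pink"}.get(gender, "white")
    return style
```

A loud, high-pitched, angry male utterance would thus be rendered large, red, in a bold font, on a light-blue highlight.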
[0067] As described above, according to Embodiment 7, by visually displaying the result of analyzing the speaker's voice in the subtitle display image, information other than the text itself can be presented as subtitles, and the content of the program can be grasped even more easily.
[0068] As shown in FIG. 22, a video display device 235 may have a video analysis unit 236; the video analysis unit 236 analyzes the video stream output from the video processing unit 117, and the display processing unit 233 applies decoration corresponding to analysis results such as brightness. The display processing unit 233 may also apply decoration corresponding to scenes such as morning, noon, night, the sea, mountains, or soccer.
[0069] Also, for example, when the speaker's voice is detected, the display may change to an icon (a speaking face) or a speaker frame indicating that the speaker is speaking, and when the speaker's voice is no longer detected, to an icon (a listening face) or a speaker frame indicating that the speaker is not speaking. Incidentally, the speaker may be indicated not only by a still-image icon but also by a simple moving-image icon, such as an animated GIF, as the speaker information.  [0070] (Embodiment 8)
In a video display device 240 according to Embodiment 8 of the present invention, as shown in FIG. 23, the audio processing unit 231 decodes the audio stream output from the demultiplexing unit 113 and outputs the decoded audio stream to a speaker analysis unit 241.
[0071] The speaker analysis unit 241 detects the speaker from the video and audio and extracts an image of the speaker. The extracted image is then scaled to a specified size and used as speaker information. This speaker information is stored in the subtitle history storage unit 116 together with the subtitle information processed by the subtitle processing unit 115. An existing technique, for example the technique described in Patent Document 1, is used to detect the speaker from the video and audio.
[0072] This makes it possible to display video, speaker information, and subtitles simultaneously even for analog broadcasting that does not include speaker information.
[0073] (Embodiment 9)
In a video display device 250 according to Embodiment 9 of the present invention, as shown in FIG. 24, a speech recognition unit 251 performs speech recognition on the audio stream output from the audio processing unit 231, converting it into text information and generating subtitle information. The generated subtitle information is stored in the subtitle history storage unit 116.
[0074] This makes it possible to display subtitles even when neither speaker information nor subtitle information is included in the broadcast wave.
[0075] (Other Embodiments)
As shown in FIG. 25A, subtitles may be displayed in order from the top of the subtitle display area (ascending order), or, as shown in FIG. 25B, in order from the bottom of the subtitle display area (descending order). In accordance with the subtitle display order, the highlight of each speaker frame may be changed stepwise from a bright color to a subdued color, the subtitle character color may be lightened stepwise, the character size may be reduced stepwise, or the display order may be numbered. This allows the user to recognize the display order of the subtitles without reading them. The display order of the subtitles (descending or ascending) may also be set by the user.
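The order-dependent decoration above can be sketched as follows: the newest subtitle row is fully opaque at full size, and older rows fade, shrink, and carry an order number stepwise. The specific step values are assumptions for illustration, not values from the patent.

```python
def order_decoration(index, base_size=24):
    """Decoration for a subtitle row; index 0 = newest row, larger = older."""
    fade = min(index, 3)                    # cap the fading after a few steps
    return {
        "alpha": 1.0 - 0.2 * fade,          # lighten the text color stepwise
        "size_px": base_size - 2 * fade,    # reduce the character size stepwise
        "order_label": str(index + 1),      # optional numbering of the order
    }

# Decorations for a four-row subtitle display area, newest first.
rows = [order_decoration(i) for i in range(4)]
```

The reader can then judge the recency of each row at a glance, without reading the subtitle text itself.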
[0076] A first aspect of the present invention is a video display device comprising: display processing means for creating a subtitle display image in which subtitles are associated with speaker information that allows a viewer to recognize the speaker of each subtitle, and for combining the created subtitle display image with video; and display means for displaying the image combined by the display processing means.
[0077] According to this configuration, subtitles can be associated with the speakers in the video, so the viewer can recognize even a speaker who is not displayed in the video, and can easily grasp the content of the program in a silent state.
[0078] A second aspect of the present invention is the video display device according to the above aspect, further comprising: speaker information acquisition means for acquiring speaker information; and speaker information storage means for storing the acquired speaker information.
[0079] A third aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
[0080] According to these configurations, even when, for example, the speaker information of a program has been acquired in advance via the Internet, data broadcasting, or the like and only the video of that program is played back, the viewer can easily grasp the content of the program by using the acquired speaker information.
[0081] A fourth aspect of the present invention is the video display device according to the above aspect, wherein the speaker information acquisition means acquires the speaker information together with reception of the program.
[0082] According to this configuration, the trouble of acquiring speaker information in advance can be saved, improving convenience for the viewer.
[0083] A fifth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means divides one image area into a subtitle display area for displaying subtitles and a video display area, different from the subtitle display area, for displaying video; arranges the video including a speaker in the video display area; and arranges the subtitles and speaker information corresponding to the speaker in the subtitle display area.
[0084] According to this configuration, the subtitle display and the video display are shown separately on the same screen, so the video display and subtitle display do not overlap, and the video or subtitles are prevented from becoming invisible.
[0085] A sixth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means dynamically arranges speaker frames, each being an area for displaying the subtitles of one speaker, according to whether that speaker has spoken.
[0086] According to this configuration, a speaker frame is arranged when a speaker has an utterance and is not arranged when the speaker has none; when the aspect ratio of the video differs from that of the video display device, the surplus area can thus be used effectively as a subtitle display area.
[0087] A seventh aspect of the present invention is the video display device according to the above aspect, wherein the display processing means, while displaying the subtitle of a first speaker, displays the subtitle of a second speaker different from the first speaker before deleting the subtitle of the first speaker.
[0088] According to this configuration, the subtitle of the first speaker remains displayed even after the first speaker finishes speaking, so the viewer can easily grasp the content even when multiple people speak at the same time.
[0089] An eighth aspect of the present invention is the video display device according to the above aspect, wherein, when speaker frames are displayed up to the maximum displayable number and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
[0090] According to this configuration, even when there are many characters, the limited subtitle display area can be used effectively, and the viewer can easily grasp the content.
[0091] A ninth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means creates the subtitle display image based on display designation information indicating which of the plural types of speaker information, acquired by the speaker information acquisition means and each allowing the viewer to recognize the same speaker, is to be used.
[0092] According to this configuration, rich subtitle display can be performed by designating, from among plural types, the speaker information that allows the viewer to recognize the same speaker.
[0093] A tenth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means, while displaying the speaker frames of a plurality of speakers, deletes the speaker frames of the other speakers when one speaker's utterances continue, and expands the speaker frame of that speaker into the deleted area.
[0094] According to this configuration, long subtitles can be displayed by deleting the other displayed speaker frames and expanding the frame of the continuously speaking speaker into the deleted area.
[0095] An eleventh aspect of the present invention is the video display device according to the above aspect, further comprising analysis means for analyzing video or audio, wherein the display processing means decorates the subtitles based on the analysis result of the analysis means.
[0096] According to this configuration, by visually displaying the result of analyzing the speaker's voice or video, information other than the text itself can be presented as subtitles, and the content of the program can be grasped more easily.
[0097] A twelfth aspect of the present invention is the video display device according to the above aspect, wherein the display processing means associates the display order of the subtitle display information with subtitle decoration, and decorates the subtitles according to the display order of the subtitle display information.
[0098] According to this configuration, the display order of the subtitles can be recognized visually without reading them, so the program content can be easily grasped.
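An order-to-decoration mapping of this kind can be sketched as follows: the newest subtitle is rendered brightest and older ones fade, so the viewing order is apparent at a glance. The function name and color values are assumptions for illustration only.

```python
def decorate_by_order(subtitles):
    """Map display order to a color so viewers can see recency without
    reading: the newest line is brightest, older lines fade to grey."""
    shades = ["#FFFFFF", "#AAAAAA", "#666666"]  # newest -> oldest
    decorated = []
    for age, text in enumerate(reversed(subtitles)):
        color = shades[min(age, len(shades) - 1)]
        decorated.append((text, color))
    return list(reversed(decorated))  # restore original display order


lines = ["first remark", "second remark", "third remark"]
styled = decorate_by_order(lines)
# newest ("third remark") is white, "second remark" grey, "first remark" dark grey
```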
[0099] A thirteenth aspect of the present invention is a broadcast system comprising: a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device having receiving means for receiving the broadcast wave transmitted from the broadcast wave transmission device, display processing means for creating a subtitle display image in which the subtitles contained in the received broadcast wave are associated with the speaker information and for combining the created subtitle display image with the video, and display means for displaying the image combined by the display processing means.
[0100] According to this configuration, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
[0101] A fourteenth aspect of the present invention is a recording and playback system comprising: a recording device that records video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and a video display device having display processing means for creating a subtitle display image in which the subtitles contained in the information recorded in the recording device are associated with the speaker information and for combining the created subtitle display image with the video, and display means for displaying the image combined by the display processing means.
[0102] According to this configuration, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
[0103] A fifteenth aspect of the present invention is an authentication system comprising: an authentication device having authentication processing means for performing an authentication process, and transmission means for transmitting, to a video display device authenticated by the authentication processing means, a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker; and a video display device having display processing means for creating a subtitle display image in which a subtitle is associated with any one of the plurality of pieces of speaker information transmitted from the authentication device and for combining the created subtitle display image with video, and display means for displaying the image combined by the display processing means.
[0104] According to this configuration, a video display device authenticated by the authentication device obtains a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker, so the authenticated video display device can provide a rich subtitle display.
[0105] A sixteenth aspect of the present invention is a video display method comprising: a display processing step of creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and combining the created subtitle display image with video; and a display step of displaying the image combined in the display processing step.
[0106] According to this method, the subtitles can be associated with the speakers in the video, so the viewer can identify even a speaker who is not shown in the video and can easily grasp the program content without sound.
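The two steps of this method, building a subtitle display image that pairs a caption with its speaker information and then compositing it onto the video, can be sketched as follows. This is a minimal illustration in which plain dictionaries stand in for real bitmaps; the function name and field names are assumptions, not part of the disclosure.

```python
def compose_frame(video_frame: str, subtitle: str, speaker_info: dict) -> dict:
    """Display processing step: create a subtitle display image that
    associates the caption with its speaker, then combine it with the
    video frame for the display step."""
    subtitle_image = {
        "caption": subtitle,
        "speaker_name": speaker_info["name"],
        "speaker_icon": speaker_info["icon"],
    }
    # Compositing is represented here as pairing the overlay with the frame.
    return {"video": video_frame, "overlay": subtitle_image}


frame = compose_frame(
    video_frame="frame_0042",
    subtitle="Hello, everyone.",
    speaker_info={"name": "Speaker A", "icon": "icon_a.png"},
)
```

Because the overlay carries the speaker's name and icon alongside the caption, the viewer can tell who is speaking even when that speaker is not visible in the frame.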
[0107] This specification is based on Japanese Patent Application No. 2004-245734, filed on August 25, 2004, the entire contents of which are incorporated herein by reference.
Industrial Applicability

[0108] The video display device and video display method according to the present invention have the effect of allowing a viewer to easily grasp program content even without sound, and can be applied to mobile phones and other devices having small screen sizes.

Claims

[1] A video display device comprising:
display processing means for creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and for combining the created subtitle display image with video; and
display means for displaying the image combined by the display processing means.
[2] The video display device according to claim 1, further comprising:
speaker information acquisition means for acquiring speaker information; and
speaker information storage means for storing the acquired speaker information.
[3] The video display device according to claim 2, wherein the speaker information acquisition means acquires the speaker information in advance, before reception of a program starts.
[4] The video display device according to claim 2, wherein the speaker information acquisition means acquires the speaker information together with reception of a program.
[5] The video display device according to claim 1, wherein the display processing means divides one image area into a subtitle display area for displaying subtitles and a video display area, distinct from the subtitle display area, for displaying video, places the video containing a speaker in the video display area, and places the subtitle and speaker information corresponding to the speaker in the subtitle display area.
[6] The video display device according to claim 1, wherein the display processing means dynamically arranges speaker frames, each of which is an area for displaying the subtitles of one speaker, according to whether each speaker is speaking.
[7] The video display device according to claim 1, wherein, while displaying a subtitle of a first speaker, the display processing means displays a subtitle of a second speaker different from the first speaker before deleting the subtitle of the first speaker.
[8] The video display device according to claim 1, wherein, when the maximum displayable number of speaker frames is being shown and a new speaker appears, the display processing means deletes a displayed speaker frame and arranges a speaker frame corresponding to the new speaker.
[9] The video display device according to claim 1, wherein the display processing means creates the subtitle display image based on display designation information indicating which of a plurality of types of speaker information, acquired by the speaker information acquisition means and each allowing a viewer to recognize the same speaker, is to be used.
[10] The video display device according to claim 1, wherein, when the remarks of one speaker continue while speaker frames for a plurality of speakers are being displayed, the display processing means deletes the speaker frames of the other speakers and extends the speaker frame of that speaker into the deleted area.
[11] The video display device according to claim 1, further comprising analysis means for analyzing video or audio,
wherein the display processing means decorates the subtitles based on the analysis result of the analysis means.
[12] The video display device according to claim 1, wherein the display processing means associates the display order of subtitle display information with subtitle decorations, and decorates the subtitles according to the display order of the subtitle display information.
[13] A broadcast system comprising:
a broadcast wave transmission device that transmits, as a broadcast wave, video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and
a video display device having:
receiving means for receiving the broadcast wave transmitted from the broadcast wave transmission device;
display processing means for creating a subtitle display image in which the subtitles contained in the received broadcast wave are associated with the speaker information, and for combining the created subtitle display image with the video; and
display means for displaying the image combined by the display processing means.
[14] A recording and playback system comprising:
a recording device that records video, subtitles, and speaker information that allows a viewer to recognize the speaker of the subtitles; and
a video display device having:
display processing means for creating a subtitle display image in which the subtitles contained in the information recorded in the recording device are associated with the speaker information, and for combining the created subtitle display image with the video; and
display means for displaying the image combined by the display processing means.
[15] An authentication system comprising:
an authentication device having:
authentication processing means for performing an authentication process; and
transmission means for transmitting, to a video display device authenticated by the authentication processing means, a plurality of different pieces of speaker information that allow a viewer to recognize the same speaker; and
a video display device having:
display processing means for creating a subtitle display image in which a subtitle is associated with any one of the plurality of pieces of speaker information transmitted from the authentication device, and for combining the created subtitle display image with video; and
display means for displaying the image combined by the display processing means.
[16] A video display method comprising:
a display processing step of creating a subtitle display image in which a subtitle is associated with speaker information that allows a viewer to recognize the speaker of the subtitle, and combining the created subtitle display image with video; and
a display step of displaying the image combined in the display processing step.
PCT/JP2005/011423 2004-08-25 2005-06-22 Video display and video displaying method WO2006022071A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2004-245734 2004-08-25
JP2004245734 2004-08-25

Publications (1)

Publication Number Publication Date
WO2006022071A1 (en) 2006-03-02

Family

ID=35967292

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/011423 WO2006022071A1 (en) 2004-08-25 2005-06-22 Video display and video displaying method

Country Status (1)

Country Link
WO (1) WO2006022071A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002056006A (en) * 2000-08-10 2002-02-20 Nippon Hoso Kyokai <Nhk> Video/voice retrieving device
JP2002232802A (en) * 2001-01-31 2002-08-16 Mitsubishi Electric Corp Video display device
JP2002232798A (en) * 2001-01-30 2002-08-16 Toshiba Corp Broadcast receiver and its control method
JP2002341890A (en) * 2001-05-17 2002-11-29 Matsushita Electric Ind Co Ltd Method for speech recognition and character representation and device for the same
JP2003224842A (en) * 2002-01-31 2003-08-08 Matsushita Electric Ind Co Ltd Contents distribution method


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007142569A (en) * 2005-11-15 2007-06-07 Toshiba Corp Portable caption specification conversion apparatus of digital broadcast and converting method thereof
JP2007142568A (en) * 2005-11-15 2007-06-07 Toshiba Corp Portable caption specification changing apparatus for digital broadcast and changing method thereof
EP2180693A1 (en) * 2008-10-22 2010-04-28 Samsung Electronics Co., Ltd. Display apparatus and control method thereof
JP2010251841A (en) * 2009-04-10 2010-11-04 Nikon Corp Image extraction program and image extraction device
JP2016082355A (en) * 2014-10-15 2016-05-16 富士通株式会社 Input information support device, input information support method, and input information support program
US11450352B2 (en) 2018-05-29 2022-09-20 Sony Corporation Image processing apparatus and image processing method
WO2019230225A1 (en) * 2018-05-29 2019-12-05 ソニー株式会社 Image processing device, image processing method, and program
EP3787285A4 (en) * 2018-05-29 2021-03-03 Sony Corporation Image processing device, image processing method, and program
JPWO2019230225A1 (en) * 2018-05-29 2021-07-15 ソニーグループ株式会社 Image processing device, image processing method, program
JP7272356B2 (en) 2018-05-29 2023-05-12 ソニーグループ株式会社 Image processing device, image processing method, program
JP2020010224A (en) * 2018-07-10 2020-01-16 ヤマハ株式会社 Terminal device, information providing system, operation method of terminal device, and information providing method
JP7087745B2 (en) 2018-07-10 2022-06-21 ヤマハ株式会社 Terminal device, information provision system, operation method of terminal device and information provision method
US20240022682A1 (en) * 2022-07-13 2024-01-18 Sony Interactive Entertainment LLC Systems and methods for communicating audio data

Similar Documents

Publication Publication Date Title
WO2006022071A1 (en) Video display and video displaying method
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
JP2003333445A (en) Caption extractor
CN102348014B (en) For using sound that the apparatus and method of augmented reality service are provided
JP2003333445A5 (en) Title extraction device and system
JP5994968B2 (en) Content utilization apparatus, control method, program, and recording medium
US20050021665A1 (en) Content delivery server, terminal, and program
JPH09506755A (en) Wireless pager with pre-stored images and method and system for use therewith
KR20130005406A (en) Method and apparatus for transmitting message in portable terminnal
US20090096782A1 (en) Message service method supporting three-dimensional image on mobile phone, and mobile phone therefor
JP2002268963A (en) Radio data transmission/reception control method using bluetooth function, radio data transmission/reception system, server and terminal to be used for the same
JP2014006669A (en) Recommended content notification system, control method and control program thereof, and recording medium
JP2007195105A (en) Information acquisition support system and information acquisition method by portable information terminal using sound information
KR101618777B1 (en) A server and method for extracting text after uploading a file to synchronize between video and audio
JP2002288213A (en) Data-forwarding device, data two-way transmission device, data exchange system, data-forwarding method, data-forwarding program, and data two-way transmission program
JP2008113331A (en) Telephone system, telephone set, server device, and program
US7120583B2 (en) Information presentation system, information presentation apparatus, control method thereof and computer readable memory
JP6706591B2 (en) Broadcast receiver, notification method, program, and storage medium
JP2005124169A (en) Video image contents forming apparatus with balloon title, transmitting apparatus, reproducing apparatus, provisioning system, and data structure and record medium used therein
CN1843036A (en) Real-time media dictionary
US20070110397A1 (en) Playback apparatus and bookmark system
CN106657255A (en) File sharing method and device and terminal device
JP2005332404A (en) Content providing system
JP2004253923A (en) Information receiver
JP2005159743A (en) Video display apparatus, video display program, information distribution apparatus, and information communication system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase