KR20180122040A

KR20180122040A - Reception device and reception method

Info

Publication number: KR20180122040A
Application number: KR1020187031616A
Authority: KR
Inventors: Taketoshi Yamane; Yasuaki Yamagishi
Original assignee: Sony Corporation
Priority date: 2014-07-14
Filing date: 2015-07-01
Publication date: 2018-11-09
Also published as: ZA201608004B; US20200053412A1; RU2017100076A; KR102307330B1; EP3171610A4; JPWO2016009834A1; EP3171610A1; KR20170033273A; RU2017100076A3; US11197048B2; MX368686B; US10491934B2; MY188845A; SG11201700130VA; MX2017000281A; US20170134782A1; WO2016009834A1; BR112017000101A2; RU2686663C2; EP3171610B1

Abstract

The present technology relates to a transmission device, a transmission method, a reception device, and a reception method that make it possible to improve accessibility for the visually impaired. Provided is a transmission device including: a speech utterance metadata generation unit that generates speech utterance metadata about the speech utterance intended by the creator for display information; an electronic program information generation unit that generates electronic program information including the speech utterance metadata; and a transmission unit that transmits the electronic program information to a reception device capable of displaying the display information. The present technology can be applied to, for example, a transmitter capable of transmitting digital broadcast signals.

Description

{RECEPTION DEVICE AND RECEPTION METHOD}

The present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and more particularly to a transmission device, a transmission method, a reception device, and a reception method that make it possible to improve accessibility for the visually impaired.

In the field of digital broadcasting, accessibility for the visually impaired is in demand (see, for example, Patent Document 1).

In particular, in the United States, the so-called 21st Century Communications and Video Accessibility Act of 2010 (CVAA) has been enacted, and the Federal Communications Commission (FCC) has published various regulations, based on this act, concerning the accessibility of video programming.

Patent Document 1: Japanese Patent Application Laid-Open No. 2009-204711

When a user interface (UI) such as program information is presented to a visually impaired person, accessibility is generally improved by having a TTS (Text To Speech) engine read aloud the text information, such as the program information.

However, there is no certainty that a TTS engine will read the text information aloud as the creator of the program information or other content intended, so there is no guarantee that visually impaired people can obtain information equivalent to that available to sighted people. There has therefore been a demand for a technique that reliably produces the utterance the creator intended, so that visually impaired people can obtain information equivalent to that available to sighted people.

The present technology has been made in view of these circumstances, and makes it possible to improve accessibility for the visually impaired by reliably producing the utterance the creator intended.

A transmission device according to a first aspect of the present technology includes: a metadata generation unit that generates metadata about the speech utterance intended by the creator for display information; an electronic program information generation unit that generates electronic program information including the metadata; and a transmission unit that transmits the electronic program information to a reception device capable of displaying the display information.

The metadata may include information about the utterance of a character string whose reading is not uniquely determined, or of a character string that is difficult to pronounce.

The display information may include information about content, or an icon.

The transmission device may further include a content acquisition unit that acquires the content, and the transmission unit may transmit the electronic program information together with the content as a digital broadcast signal.

The electronic program information may conform to the ESG (Electronic Service Guide) defined by OMA-BCAST (Open Mobile Alliance - Mobile Broadcast Services Enabler Suite), and the metadata may be described in SSML (Speech Synthesis Markup Language) format. A predetermined fragment constituting the ESG may then contain either address information indicating where a file of the SSML-format metadata can be acquired, or the content of the SSML-format metadata itself.
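As a rough illustration of the two placement options, the Python sketch below builds a simplified ESG Content fragment that carries the speech utterance metadata either as address information or inline. The element names PhoneticInfoURI and PhoneticInfo follow the figure list of this document, but the fragment layout, the Name child, the function name, and the example URL are hypothetical simplifications, not the OMA-BCAST schema or the extended-ESG schema of this patent.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_content_fragment(title: str,
                           ssml_uri: Optional[str] = None,
                           ssml_inline: Optional[str] = None) -> ET.Element:
    """Build a simplified, HYPOTHETICAL ESG Content fragment.

    Exactly one of ssml_uri (address information) or ssml_inline
    (the SSML document itself) must be given.
    """
    if (ssml_uri is None) == (ssml_inline is None):
        raise ValueError("give exactly one of ssml_uri or ssml_inline")

    content = ET.Element("Content")
    name = ET.SubElement(content, "Name")
    name.text = title  # the display text that the TTS engine will read

    if ssml_uri is not None:
        # By reference: the receiver fetches the SSML file from this address.
        uri = ET.SubElement(content, "PhoneticInfoURI")
        uri.text = ssml_uri
    else:
        # Inline: the SSML text is carried in the fragment (escaped on output).
        info = ET.SubElement(content, "PhoneticInfo")
        info.text = ssml_inline
    return content
```

Carrying a URI keeps the ESG small and lets the SSML file be shared; carrying the SSML inline avoids a second acquisition step at the receiver. Which trade-off applies in practice is a system design choice the sketch does not settle.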

The transmission device may be an independent device or an internal block constituting part of a single device.

A transmission method according to the first aspect of the present technology is a transmission method corresponding to the transmission device according to the first aspect described above.

In the transmission device and the transmission method according to the first aspect of the present technology, metadata about the speech utterance intended by the creator for display information is generated, electronic program information including the metadata is generated, and the electronic program information is transmitted to a reception device capable of displaying the display information.

A reception device according to a second aspect of the present technology includes: a reception unit that receives electronic program information transmitted from a transmission device, the electronic program information including metadata about the speech utterance intended by the creator for display information; a metadata acquisition unit that acquires the metadata included in the electronic program information; and a speech reading unit that reads the display information aloud on the basis of the metadata.

The reception unit may receive the electronic program information, transmitted together with the content, as a digital broadcast signal.

The electronic program information may conform to the ESG defined by OMA-BCAST, and the metadata may be described in SSML format, with a predetermined fragment constituting the ESG containing either address information indicating where a file of the SSML-format metadata can be acquired, or the content of the SSML-format metadata itself. The metadata acquisition unit may then acquire the metadata file in accordance with the address information, or acquire the metadata from the fragment.
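The receiver-side branch described above can be sketched as follows: use inline SSML if the fragment carries it, otherwise resolve the address with some fetch mechanism. The fragment layout and function name are hypothetical simplifications, and the fetch callable is an assumed stand-in for however the receiver actually obtains the file (for example from the broadcast stream or over a network).

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def acquire_speech_metadata(fragment_xml: str,
                            fetch: Callable[[str], str]) -> Optional[str]:
    """Return the SSML for one simplified, HYPOTHETICAL ESG fragment.

    Inline PhoneticInfo is used directly; otherwise PhoneticInfoURI is
    resolved with the caller-supplied fetch function. Returns None when
    the fragment carries no speech utterance metadata.
    """
    fragment = ET.fromstring(fragment_xml)

    inline = fragment.find("PhoneticInfo")
    if inline is not None and inline.text:
        # Parsing un-escapes the stored text back into an SSML document.
        return inline.text

    uri = fragment.find("PhoneticInfoURI")
    if uri is not None and uri.text:
        # Address information: acquire the metadata file from elsewhere.
        return fetch(uri.text.strip())

    return None
```

Injecting fetch keeps the parsing logic independent of the transport, which also makes it easy to exercise with an in-memory dictionary in place of a real file source.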

The reception device may be an independent device or an internal block constituting part of a single device.

A reception method according to the second aspect of the present technology is a reception method corresponding to the reception device according to the second aspect described above.

In the reception device and the reception method according to the second aspect of the present technology, electronic program information transmitted from a transmission device and including metadata about the speech utterance intended by the creator for display information is received, the metadata included in the electronic program information is acquired, and the display information is read aloud on the basis of the metadata.

According to the first and second aspects of the present technology, accessibility for the visually impaired can be improved.

Note that the effects described here are not necessarily limiting, and may be any of the effects described in the present disclosure.

Fig. 1 is a diagram showing an example of reading program information and titles aloud.
Fig. 2 is a diagram showing an example of reading an icon aloud.
Fig. 3 is a diagram explaining an example of text information read aloud by a conventional TTS engine.
Fig. 4 is a diagram explaining an example of text information read aloud by a conventional TTS engine.
Fig. 5 is a diagram explaining an example of text information read aloud by a TTS engine to which the present technology is applied.
Fig. 6 is a diagram explaining an example of text information read aloud by a TTS engine to which the present technology is applied.
Fig. 7 is a diagram showing a configuration example of a broadcasting system to which the present technology is applied.
Fig. 8 is a diagram showing a configuration example of a transmission device to which the present technology is applied.
Fig. 9 is a diagram showing a configuration example of a reception device to which the present technology is applied.
Fig. 10 is a diagram showing an example of the structure of the ESG.
Fig. 11 is a diagram showing a configuration example of a service fragment of the ESG.
Fig. 12 is a diagram showing a configuration example of a content fragment of the ESG.
Fig. 13 is a diagram showing a configuration example of an extended ESG.
Fig. 14 is a diagram showing another configuration example of the extended ESG.
Fig. 15 is a diagram showing the detailed configuration of the PhoneticInfoURI element.
Fig. 16 is a diagram showing the detailed configuration of the PhoneticInfo element.
Fig. 17 is a diagram showing a description example of the sub element in SSML format.
Fig. 18 is a diagram showing a description example of the phoneme element in SSML format.
Fig. 19 is a diagram showing a description example of the audio element in SSML format.
Fig. 20 is a flowchart explaining transmission processing.
Fig. 21 is a flowchart explaining reception processing.
Fig. 22 is a diagram showing a configuration example of a computer.

Hereinafter, embodiments of the present technology will be described with reference to the drawings. The description will proceed in the following order.

1. Overview of speech utterance metadata in the present technology

2. System configuration

3. Placement of speech utterance metadata by extending the ESG

4. Description examples of speech utterance metadata

5. Flow of processing executed by each device

6. Modifications

7. Computer configuration

<1. Overview of Speech Utterance Metadata in the Present Technology>

Among the FCC regulations related to the CVAA, those concerning user interfaces (FCC Report & Order (FCC 13-138), released October 31, 2013; C.F.R. Title 47 §§79.107, 79.108) require that receivers capable of displaying video programming, such as television receivers, make their user interfaces accessible to the visually impaired.

Specifically, as shown in Fig. 1, when an Electronic Service Guide (ESG) screen is displayed, reading aloud, for example, titles and program information can give visually impaired users the information they need to select a broadcast program.

Likewise, as shown in Fig. 2, when a menu screen is displayed, reading aloud the content of the service represented by each icon can convey that content to visually impaired users.

By reading aloud information about the user interface displayed on the receiver in this way, visually impaired users can obtain that information and perform various operations on the receiver.

Channel information and program information for selecting a broadcast program are provided as ESG information from a transmitter, such as one operated by a broadcasting station, to the receiver. This ESG information mainly consists of text information, logo data, and the like. On the basis of the ESG information, the receiver generates and displays an ESG screen for selecting a broadcast program.

As described above, when the ESG screen is displayed, its user interface must be made accessible to the visually impaired, for example by reading titles and program information aloud. This is generally done by having a TTS (Text To Speech) engine read the text information, such as titles and program information, aloud. A TTS engine is a speech synthesizer (text-to-speech synthesizer) that can artificially generate human speech from text information.

However, there is no certainty that the TTS engine will read the text information aloud as the creator of the user interface intended, so there is no guarantee that visually impaired users can obtain information equivalent to that available to sighted users.

Specifically, as shown in Fig. 3, the text information "AAA" can be read either as "triple A" or as "A A A", so its reading is not uniquely determined. The TTS engine cannot decide how it should be read aloud, and as a result the text information may not be read aloud as the creator intended.

Similarly, as shown in Fig. 4, the text information "Caius College" is a proper noun whose pronunciation is difficult, so the TTS engine cannot decide how to read it aloud, and the text information may not be read aloud as the creator intended.

Thus, when the reading of text information is not uniquely determined, or when it is a proper noun that is difficult to pronounce, the text information may not be read aloud as the creator intended. There has therefore been a demand for a technique that reliably produces the utterance the creator intended, so that visually impaired users can obtain information equivalent to that available to sighted users.

Accordingly, in the present technology, to ensure that display information such as a user interface is spoken as its creator intended, information about the speech utterance intended by the creator (hereinafter referred to as "speech utterance metadata") is provided to the TTS engine, enabling the TTS engine to produce the intended speech. The speech utterance metadata can be provided as part of the ESG information.

Specifically, as shown in Fig. 5, for the text information "AAA", the string "triple A", indicating how it should be read, is provided to the TTS engine as speech utterance metadata, so the TTS engine can read it aloud as "triple A" on the basis of that metadata.

That is, in Fig. 3, when the text information "AAA" was input, the TTS engine could not determine whether "triple A" or "A A A" was the correct reading. In Fig. 5, by contrast, "triple A" is input as speech utterance metadata, so the TTS engine reads "triple A" aloud in accordance with that metadata and the speech the creator intended is produced.
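A reading such as "triple A" maps naturally onto the SSML sub element, whose alias attribute gives the text to be spoken in place of the element's content. The sketch below, using only Python's standard xml.etree.ElementTree, wraps a display string in such an element; the function name and the default language tag are illustrative assumptions, not part of the patent.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def ssml_with_alias(text: str, alias: str, lang: str = "en-US") -> str:
    """Wrap display text in an SSML <sub> so a TTS engine speaks `alias`
    instead of guessing how `text` should be read."""
    # Serialize the SSML namespace as the default namespace (no prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: lang})
    sub = ET.SubElement(speak, f"{{{SSML_NS}}}sub", {"alias": alias})
    sub.text = text  # what is displayed; the alias is what is spoken
    return ET.tostring(speak, encoding="unicode")


ssml = ssml_with_alias("AAA", "triple A")
```

The displayed string stays "AAA", so the same metadata can sit alongside the on-screen text without altering it; only an SSML-aware engine changes its output.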

Similarly, as shown in Fig. 6, for the text information "Caius College", its phoneme information is provided to the TTS engine as speech utterance metadata, so the TTS engine can read it aloud as "keys college" on the basis of that metadata.

That is, in Fig. 4, when the text information "Caius College" was input, the TTS engine could not determine the correct way to read it aloud because it is a proper noun that is difficult to pronounce. In Fig. 6, by contrast, phoneme information is input as speech utterance metadata, so the TTS engine reads it aloud as "keys college" in accordance with that metadata and the speech the creator intended is produced.
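Phoneme information of this kind corresponds to the SSML phoneme element, whose alphabet and ph attributes give the pronunciation. In the sketch below, only "Caius" is wrapped, with the IPA string "kiːz" (rhyming with "keys"), and " College" is left as ordinary text through the element's tail. The function name and the choice of IPA over another phonetic alphabet are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def ssml_with_phoneme(word: str, ipa: str, rest: str = "",
                      lang: str = "en-GB") -> str:
    """Give `word` an explicit IPA pronunciation; `rest` follows as plain text."""
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.1", XML_LANG: lang})
    ph = ET.SubElement(speak, f"{{{SSML_NS}}}phoneme",
                       {"alphabet": "ipa", "ph": ipa})
    ph.text = word   # the hard-to-pronounce proper noun
    ph.tail = rest   # remaining words, left to the engine's normal rules
    return ET.tostring(speak, encoding="unicode")


ssml = ssml_with_phoneme("Caius", "kiːz", " College")
```

Marking up only the difficult word keeps the metadata small and leaves ordinary words such as "College" to the engine's default pronunciation.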

By providing speech utterance metadata to the TTS engine in this way, text information is reliably read aloud as the creator intended even when, for example, its reading is not uniquely determined or it is a proper noun that is difficult to pronounce, so visually impaired users can obtain information equivalent to that available to sighted users.

<2. System Configuration>

(Configuration Example of the Broadcasting System)

Fig. 7 is a diagram showing a configuration example of a broadcasting system to which the present technology is applied.

The broadcasting system 1 is a system that provides content such as broadcast programs and can make display information such as user interfaces accessible to the visually impaired. The broadcasting system 1 includes a transmission device 10 and a reception device 20.

The transmission device 10 is operated, for example, by a broadcasting station that provides a terrestrial digital broadcasting service. The transmission device 10 transmits content such as broadcast programs as a digital broadcast signal. The transmission device 10 also generates ESG information including speech utterance metadata and transmits it as part of the digital broadcast signal.

The reception device 20 is, for example, a television receiver or a set-top box, and is installed in a user's home or similar location. The reception device 20 receives the digital broadcast signal transmitted from the transmission device 10 and outputs the video and audio of content such as broadcast programs.

The reception device 20 also has a TTS engine and, when displaying display information such as a user interface, reads that display information aloud on the basis of the speech utterance metadata included in the ESG information.

Because the TTS engine reads display information such as text information aloud in accordance with the speech utterance metadata, the information is reliably read aloud as the creator intended even when, for example, the reading of the text is not uniquely determined or it is a proper noun that is difficult to pronounce.
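The behavior described here can be pictured as a small front end in front of the TTS engine: if creator-supplied metadata exists for the string being displayed, the SSML is passed on; otherwise the engine receives plain text and must guess. In the sketch below, keying the metadata by display text, the function name, and the two callables standing in for the TTS engine are all hypothetical; the patent does not specify this interface.

```python
from typing import Callable, Dict


def read_aloud(display_text: str,
               speech_metadata: Dict[str, str],
               speak_ssml: Callable[[str], None],
               speak_text: Callable[[str], None]) -> str:
    """Send one UI string to a (hypothetical) TTS engine, preferring SSML.

    Returns "ssml" or "text" to show which path was taken.
    """
    ssml = speech_metadata.get(display_text)
    if ssml is not None:
        # The creator's intent is explicit, so the engine need not guess.
        speak_ssml(ssml)
        return "ssml"
    # No metadata: fall back to plain text and the engine's own rules.
    speak_text(display_text)
    return "text"
```

Keeping the fallback path means strings without metadata are still spoken, just without the guarantee the metadata provides.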

Although only one transmission device 10 is shown in the broadcasting system 1 of Fig. 7, in practice a transmission device 10 is installed at each of a plurality of broadcasting stations. Similarly, although only one reception device 20 is shown in the broadcasting system 1 of Fig. 7, in practice a reception device 20 is installed in each of a plurality of users' homes.

(Configuration Example of the Transmission Device)

Fig. 8 is a diagram showing a configuration example of the transmission device of Fig. 7.

In Fig. 8, the transmission device 10 includes a content acquisition unit 111, a speech utterance metadata generation unit 112, an ESG information generation unit 113, a stream generation unit 114, and a transmission unit 115.

The content acquisition unit 111 acquires content such as broadcast programs and supplies it to the stream generation unit 114. The content acquisition unit 111 can also process the content, for example by encoding it or converting its format.

The content is acquired, for example, from a storage location of pre-recorded content in accordance with the broadcast time slot, or as live content from a studio or an outside location.

The speech utterance metadata generation unit 112 generates speech utterance metadata, for example in accordance with instructions from the creator of the user interface, and supplies it to the ESG information generation unit 113. As speech utterance metadata, for example, information indicating how a text is to be read is generated when its reading is not uniquely determined, and phoneme information is generated when it is a proper noun or similar term that is difficult to pronounce.

Speech utterance metadata stored in the ESG information comes in two types: one describes address information for acquiring the speech utterance metadata, and the other describes the content of the speech utterance metadata itself. When address information is described in the speech utterance metadata, the content of the speech utterance metadata is described in a file acquired in accordance with that address information (hereinafter referred to as a "speech utterance metadata file").

That is, when the speech utterance metadata generation unit 112 generates speech utterance metadata containing address information and supplies it to the ESG information generation unit 113, it also generates the speech utterance metadata file to be acquired in accordance with that address information and supplies the file to the stream generation unit 114. On the other hand, when the speech utterance metadata contains its content directly, the speech utterance metadata generation unit 112 does not need to generate a speech utterance metadata file, and therefore supplies only the speech utterance metadata to the ESG information generation unit 113.
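The two outputs of the speech utterance metadata generation unit 112 can be sketched as a function that always returns the metadata destined for the ESG, plus a file only in the by-reference case. The class names, field names, and function name below are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpeechUtteranceMetadata:
    """Placed in the ESG; exactly one of the two fields is set (hypothetical)."""
    address: Optional[str] = None  # address information, e.g. a URI
    ssml: Optional[str] = None     # the SSML content itself


@dataclass
class SpeechUtteranceMetadataFile:
    """Delivered separately when the ESG only holds an address (hypothetical)."""
    address: str
    body: str


def generate_speech_metadata(
        ssml: str, address: Optional[str] = None
) -> Tuple[SpeechUtteranceMetadata, Optional[SpeechUtteranceMetadataFile]]:
    if address is not None:
        # By reference: the ESG gets the address; the SSML becomes a file
        # that must also be handed to stream generation.
        return (SpeechUtteranceMetadata(address=address),
                SpeechUtteranceMetadataFile(address, ssml))
    # Inline: no file is needed; only the ESG-bound metadata is produced.
    return SpeechUtteranceMetadata(ssml=ssml), None
```

Returning None for the file in the inline case mirrors the text: the downstream stream generation step only has extra work to do when address information is used.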

The ESG information generation unit 113 generates ESG information as channel information for selecting content such as broadcast programs. The ESG information generation unit 113 also stores (places) the speech utterance metadata supplied from the speech utterance metadata generation unit 112 in the ESG information, and supplies the ESG information including the speech utterance metadata to the stream generation unit 114.

The stream generation unit 114 generates a stream conforming to a predetermined standard on the basis of the content data supplied from the content acquisition unit 111 and the ESG information supplied from the ESG information generation unit 113, and supplies the stream to the transmission unit 115.

When the speech utterance metadata included in the ESG information supplied from the ESG information generation unit 113 contains address information, the speech utterance metadata file is also supplied from the speech utterance metadata generation unit 112 to the stream generation unit 114. In this case, the stream generation unit 114 generates a stream conforming to the predetermined standard on the basis of the content data from the content acquisition unit 111, the speech utterance metadata file from the speech utterance metadata generation unit 112, and the ESG information from the ESG information generation unit 113.
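The conditional inclusion of the metadata file can be sketched with a toy multiplexer. The record layout below (a 1-byte type code, a 4-byte big-endian length, then the payload), the type codes, and the function names are invented purely for illustration and are not the framing of any real broadcast standard; an actual stream generation unit 114 would follow whatever predetermined standard the system uses.

```python
import struct
from typing import Optional

# HYPOTHETICAL component type codes for this toy framing only.
CONTENT, ESG, SSML_FILE = 0, 1, 2


def _record(kind: int, payload: bytes) -> bytes:
    # 1-byte type code, 4-byte big-endian length, then the payload bytes.
    return struct.pack(">BI", kind, len(payload)) + payload


def generate_stream(content: bytes, esg: bytes,
                    ssml_file: Optional[bytes] = None) -> bytes:
    """Multiplex content data and ESG information, adding the speech
    utterance metadata file only when the ESG refers to it by address."""
    stream = _record(CONTENT, content) + _record(ESG, esg)
    if ssml_file is not None:
        stream += _record(SSML_FILE, ssml_file)
    return stream
```

The point of the sketch is the branch, not the framing: the metadata file travels in the stream only when the ESG carries address information instead of inline SSML.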

The transmission unit 115 performs processing such as digital modulation on the stream supplied from the stream generation unit 114 and transmits it as a digital broadcast signal via an antenna 116.

In the transmission device 10 of Fig. 8, the functional blocks need not all be located in a single device; at least some of them may be configured as devices independent of the others. For example, the speech utterance metadata generation unit 112 and the ESG information generation unit 113 may be provided as functions of a server on the Internet. In that case, the transmission device 10 acquires and processes the speech utterance metadata and ESG information provided by that server.

(Configuration Example of the Reception Device)

Fig. 9 is a diagram showing a configuration example of the reception device of Fig. 7.

In Fig. 9, the reception device 20 includes a reception unit 212, a stream separation unit 213, a playback unit 214, a display unit 215, a speaker 216, an ESG information acquisition unit 217, a speech utterance metadata acquisition unit 218, and a TTS engine 219.

The reception unit 212 performs demodulation and other processing on the digital broadcast signal received by the antenna 211 and supplies the resulting stream to the stream separation unit 213.

The stream separation unit 213 separates content data and ESG information from the stream supplied from the reception unit 212, supplying the content data to the playback unit 214 and the ESG information to the ESG information acquisition unit 217.
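To make the data flow concrete, the following Python sketch models the stream separation unit 213 as a simple router. It assumes a hypothetical packet model (the `Packet` and `SeparatedStream` types and the "content"/"esg"/"file" tags are illustrative, not part of any broadcast standard) and also keeps file objects such as speech utterance metadata files, because the speech utterance metadata acquisition unit 218 later fetches them from the separated stream.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    kind: str            # hypothetical tag: "content", "esg", or "file"
    payload: bytes
    location: str = ""   # for "file" packets: the address the object is delivered under


@dataclass
class SeparatedStream:
    content: list[bytes] = field(default_factory=list)   # to the playback unit
    esg: list[bytes] = field(default_factory=list)       # to the ESG information acquisition unit
    files: dict[str, bytes] = field(default_factory=dict)  # e.g. speech utterance metadata files


def separate(stream: list[Packet]) -> SeparatedStream:
    """Route each packet to the playback path, the ESG path, or the file store."""
    out = SeparatedStream()
    for pkt in stream:
        if pkt.kind == "content":
            out.content.append(pkt.payload)
        elif pkt.kind == "esg":
            out.esg.append(pkt.payload)
        elif pkt.kind == "file":
            out.files[pkt.location] = pkt.payload
        else:
            raise ValueError(f"unknown packet kind: {pkt.kind}")
    return out
```

Keeping delivered files in a dictionary keyed by address lets a later stage resolve address-type metadata with a simple lookup.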

On the basis of the content data supplied from the stream separation unit 213, the playback unit 214 displays the video of the content on the display unit 215 and outputs the audio of the content from the speaker 216. Content such as a broadcast program is thereby played back.

The ESG information acquisition unit 217 acquires the ESG information supplied from the stream separation unit 213. When, for example, the user instructs display of the ESG screen, the ESG information acquisition unit 217 supplies the ESG information to the playback unit 214. The playback unit 214 generates an ESG screen on the basis of that ESG information and displays it on the display unit 215.

The ESG information acquisition unit 217 also supplies the speech utterance metadata included in the ESG information to the speech utterance metadata acquisition unit 218, which acquires it.

As described above, there are two kinds of speech utterance metadata: one that describes address information for acquiring the speech utterance metadata, and one that describes the content of the speech utterance metadata itself.

That is, when the speech utterance metadata contains address information, the speech utterance metadata acquisition unit 218 uses that address information to acquire the speech utterance metadata file from the stream separated by the stream separation unit 213, and supplies speech utterance metadata containing the content obtained from that file to the TTS engine 219. When the speech utterance metadata contains its content directly, the speech utterance metadata acquisition unit 218 supplies it to the TTS engine 219 as is.
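The two cases handled by the speech utterance metadata acquisition unit 218 can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions: `PhoneticInfoObject` and `resolve_for_tts` are hypothetical names, files from the separated stream are modeled as a dictionary keyed by address, and inline content is preferred when both forms are present, a choice the description leaves open.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class PhoneticInfoObject:
    uri: Optional[str] = None      # address form (PhoneticInfoURI)
    content: Optional[str] = None  # inline form (PhoneticInfo), e.g. SSML text
    format_uri: str = ""           # encoding-format identifier from the type attribute


def resolve_for_tts(obj: PhoneticInfoObject, files: Mapping[str, bytes]) -> str:
    """Return the SSML text to hand to the TTS engine."""
    if obj.content is not None:
        # Inline case: pass the metadata through unchanged.
        return obj.content
    if obj.uri is not None:
        # Address case: fetch the metadata file from the separated stream.
        try:
            return files[obj.uri].decode("utf-8")
        except KeyError:
            raise LookupError(f"speech utterance metadata file not found: {obj.uri}") from None
    raise ValueError("PhoneticInfo object carries neither an address nor content")
```

Raising on a missing file, rather than silently falling back to plain TTS, is a design choice here; a real receiver might instead read the on-screen text with default pronunciation.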

On the basis of the speech utterance metadata supplied from the speech utterance metadata acquisition unit 218, the TTS engine 219 reads aloud display information such as a user interface and outputs the resulting speech from the speaker 216.

For example, when the ESG screen is displayed on the display unit 215 and a title, program information, or the like is read aloud to make it accessible to visually impaired users, the reading of the text may not be uniquely determined. In such cases, the TTS engine 219 follows the speech utterance metadata so that the text is read aloud as the producer intended. This allows visually impaired users to obtain the same information as sighted users.

Although FIG. 9 shows a configuration in which the display unit 215 and the speaker 216 are built into the reception device 20, they may instead be provided as separate external devices.

<3. Placement of speech utterance metadata by extending the ESG>

Next, the ESG information in which the speech utterance metadata is stored is described in detail. The ESG (Electronic Service Guide) is specified by the OMA (Open Mobile Alliance), an organization that develops mobile phone standards, and the ESG information that stores the speech utterance metadata also conforms to the ESG defined in OMA-BCAST (OMA Mobile Broadcast Services Enabler Suite).

(Structure of the ESG)

FIG. 10 is a diagram illustrating an example of the structure of the ESG. In FIG. 10, each line connecting fragments represents a cross-reference between the connected fragments.

In FIG. 10, the ESG consists of fragments, each with its own purpose, and is divided by usage into four groups: Administrative, Provisioning, Core, and Access.

The Administrative group provides the basic information needed to receive ESG information. It consists of the service guide delivery descriptor (ServiceGuideDeliveryDescriptor). The service guide delivery descriptor provides the reception device 20 with information on the channels over which multiple service guide fragments can be received, scheduling information for those channels, and update information. This allows the reception device 20 to receive only the ESG information it needs, at the appropriate time.

The Provisioning group provides fee information for service reception. It consists of Purchase Item, Purchase Data, and Purchase Channel. Purchase Item provides fee information for a service or service bundle. Purchase Data provides information on the methods by which the user can pay. Purchase Channel provides information on the system through which the user can actually purchase the service.

The Purchase Item, Purchase Data, and Purchase Channel fragments can each store speech utterance metadata or address information indicating where to acquire it. The method of storing speech utterance metadata in these fragments is described later with reference to FIG. 13.

The Core group provides information about the service itself. It consists of Service, Schedule, and Content. Service provides metadata including the content of a channel service and related control information. Schedule provides metadata including the content delivery schedule and related control information. Content provides metadata including the details of the content that makes up the service and related control information.

The Service and Content fragments can also store speech utterance metadata or address information indicating where to acquire it. FIG. 11 shows a configuration example of the Service Fragment, and FIG. 12 shows a configuration example of the Content Fragment. The method of storing speech utterance metadata in these fragments is described later with reference to FIG. 13.

The Access group provides service access information indicating how to receive the services of the Core group, together with specific information on the sessions in which the content making up a service is transmitted, enabling the reception device 20 to access the service. It consists of Access and Session Description.

The Access fragment in the Access group provides the reception device 20 with multiple access methods for a single service, thereby offering a way to access several additional services based on that one service. Session Description provides session information for the service transmitted by the service access defined in one Access Fragment.

In addition to these four groups, there are Preview Data and Interactivity Data. Preview Data provides previews, icons, and the like for services and content. Interactivity Data provides metadata about applications related to services and content.

The Preview Data Fragment can also store speech utterance metadata or address information indicating where to acquire it. The method of storing speech utterance metadata in the Preview Data Fragment is described later with reference to FIG. 14.

(Configuration example of the extended ESG)

FIG. 13 is a diagram illustrating a configuration example of an ESG extended to store speech utterance metadata or address information indicating where to acquire it. The extension in FIG. 13 targets the following fragments of the ESG: the Service Fragment, Content Fragment, Purchase Item Fragment, Purchase Data Fragment, and Purchase Channel Fragment.

Because these fragments contain a Name element and a Description element, they are extended by adding a PhoneticInfoURI element or a PhoneticInfo element to the Name and Description elements. Alternatively, the PhoneticInfoURI or PhoneticInfo element may be added to the PrivateExt element of these fragments.

In FIG. 13, the Name element specifies the name of the content fragment. The Name element contains the PhoneticInfoURI element, the PhoneticInfo element, and the Type attribute as children.

The PhoneticInfoURI element specifies address information for acquiring the speech utterance metadata. The Type attribute, used in a pair with the PhoneticInfoURI element, specifies type information indicating the type of the speech utterance metadata.

A URI (Uniform Resource Identifier), for example, is specified as this address information. When the speech utterance metadata file is transmitted in a FLUTE (File Delivery over Unidirectional Transport) session, for instance, address information for acquiring the file transmitted in that FLUTE session is specified. The speech utterance metadata can be written in SSML (Speech Synthesis Markup Language), a speech synthesis markup language.

The PhoneticInfo element describes the content of the speech utterance metadata itself, written in SSML, for example. The Type attribute, used in a pair with the PhoneticInfo element, specifies type information indicating the type of the speech utterance metadata.
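As an illustration, the Python sketch below parses a hypothetical extended Content fragment in which the Name element carries a PhoneticInfoURI child and the Description element carries a PhoneticInfo child, following the FIG. 15 and FIG. 16 convention that the type identifier is an attribute of each element. The fragment text, the `urn:example:format:ssml` type value, and the function name are assumptions, and OMA BCAST namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical extended Content fragment (OMA BCAST namespaces omitted).
FRAGMENT = """\
<Content id="urn:example:content:1">
  <Name xml:lang="en">W3C Special
    <PhoneticInfoURI type="urn:example:format:ssml">speech/w3c-special.ssml</PhoneticInfoURI>
  </Name>
  <Description xml:lang="en">A program about web standards.
    <PhoneticInfo type="urn:example:format:ssml">&lt;speak&gt;A program about web standards.&lt;/speak&gt;</PhoneticInfo>
  </Description>
</Content>
"""


def phonetic_entries(fragment_xml: str) -> list[tuple[str, str, str, str, str]]:
    """Return (parent, displayed text, phonetic element, type, value) tuples."""
    root = ET.fromstring(fragment_xml)
    found = []
    for parent in root:
        if parent.tag not in ("Name", "Description"):
            continue
        displayed = (parent.text or "").strip()  # the text shown on the ESG screen
        for child in parent:
            if child.tag in ("PhoneticInfoURI", "PhoneticInfo"):
                found.append((parent.tag, displayed, child.tag,
                              child.get("type", ""), (child.text or "").strip()))
    return found
```

Note that the inline SSML arrives as ordinary element text, so the parser returns it already unescaped and ready for a TTS engine.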

The cardinality notation in FIG. 13 is as follows: "1..N" means the element or attribute is specified one or more times; "0..N" means it may be omitted or specified one or more times; and "0..1" means it may be omitted or specified once.

Accordingly, the PhoneticInfoURI element, PhoneticInfo element, and Type attribute that are children of the Name element are optional, and either one of the PhoneticInfoURI and PhoneticInfo elements or both may be placed.
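The cardinality notation can be checked mechanically. The sketch below assumes that both PhoneticInfoURI and PhoneticInfo are "0..N" under Name, which matches the optionality stated above but is not copied from FIG. 13; the helper names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def parse_cardinality(spec: str) -> tuple[int, float]:
    """Turn "0..1", "0..N", or "1..N" into numeric (min, max) bounds."""
    low, high = spec.split("..")
    return int(low), (float("inf") if high == "N" else int(high))


def check_children(elem: ET.Element, rules: dict[str, str]) -> list[str]:
    """Report child elements whose occurrence count violates the given rules."""
    counts = Counter(child.tag for child in elem)
    errors = []
    for tag, spec in rules.items():
        low, high = parse_cardinality(spec)
        if not low <= counts[tag] <= high:
            errors.append(f"{tag}: found {counts[tag]}, expected {spec}")
    return errors


# Assumed rules: both phonetic children optional and repeatable under Name.
NAME_RULES = {"PhoneticInfoURI": "0..N", "PhoneticInfo": "0..N"}
```

Because both children are optional under these rules, an empty Name passes; the separate requirement that a targeted fragment carry at least one of the two elements would need an additional check on their combined count.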

In FIG. 13, the Description element likewise contains the PhoneticInfoURI element, the PhoneticInfo element, and the Type attribute as children; that is, the children of the Description element are the same as those of the Name element described above.

Specifically, the PhoneticInfoURI element specifies address information for acquiring the speech utterance metadata, and the PhoneticInfo element describes the content of the speech utterance metadata itself. In each case, the Type attribute is used in a pair with the element and specifies type information indicating the type of the speech utterance metadata.

For the PhoneticInfoURI and PhoneticInfo elements that are children of the Description element, too, either one or both may be placed.

FIG. 14 is a diagram illustrating another configuration example of an ESG extended to store speech utterance metadata or address information indicating where to acquire it. The extension in FIG. 14 targets the Preview Data Fragment among the fragments of the ESG.

Because the Preview Data Fragment contains a Picture element, it is extended by adding a PhoneticInfoURI element or a PhoneticInfo element to the Picture element, alongside its relativePreference attribute. Alternatively, the PhoneticInfoURI or PhoneticInfo element may be added to the PrivateExt element of the Preview Data Fragment.

In FIG. 14, the Picture element defines previews, icons, and the like for services and content. The Picture element contains the PhoneticInfoURI element, the PhoneticInfo element, and the Type attribute as children; that is, its children are the same as those of the Name and Description elements described above.

For the PhoneticInfoURI and PhoneticInfo elements that are children of the Picture element as well, either one or both may be placed.
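Because the phonetic elements may sit under Name, Description, or Picture, or alternatively under PrivateExt, a receiver-side search that walks the whole fragment avoids hard-coding the placement. A minimal sketch, assuming a hypothetical Preview Data fragment (namespaces omitted, attribute values and the function name illustrative):

```python
import xml.etree.ElementTree as ET

PHONETIC_TAGS = ("PhoneticInfoURI", "PhoneticInfo")

# Hypothetical extended Preview Data fragment (OMA BCAST namespaces omitted).
PREVIEW_FRAGMENT = """\
<PreviewData id="urn:example:preview:1">
  <Picture relativePreference="1">
    <PhoneticInfo type="urn:example:format:ssml">&lt;speak&gt;Evening news icon&lt;/speak&gt;</PhoneticInfo>
  </Picture>
  <PrivateExt>
    <PhoneticInfoURI type="urn:example:format:ssml">speech/preview.ssml</PhoneticInfoURI>
  </PrivateExt>
</PreviewData>
"""


def find_phonetic_info(fragment_xml: str) -> list[tuple[str, str, str]]:
    """Return (host element, phonetic element, value) for every PhoneticInfo object.

    The whole fragment is searched in document order, so the result does not
    depend on whether the extension sits under Picture or under PrivateExt.
    """
    root = ET.fromstring(fragment_xml)
    found = []
    for host in root.iter():
        for child in host:
            if child.tag in PHONETIC_TAGS:
                found.append((host.tag, child.tag, (child.text or "").strip()))
    return found
```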

(Configuration of the PhoneticInfoURI element)

FIG. 15 is a diagram illustrating the detailed configuration of the PhoneticInfoURI element in the extended ESG.

In FIG. 15, the PhoneticInfoURI element is described as a child of the Name or Description element of a fragment such as the Service Fragment, or of the Picture element of the Preview Data Fragment. The PhoneticInfoURI element specifies address information for acquiring the speech utterance metadata.

The identification URI of the encoding format of the speech utterance metadata is specified as the type attribute of the PhoneticInfoURI element.

When the speech utterance metadata file is transmitted in a FLUTE session, for example, the PhoneticInfoURI element describes address information for acquiring that file from the FLUTE session.

(Configuration of the PhoneticInfo element)

FIG. 16 is a diagram illustrating the detailed configuration of the PhoneticInfo element in the extended ESG.

In FIG. 16, the PhoneticInfo element is described as a child of the Name or Description element of a fragment such as the Service Fragment, or of the Picture element of the Preview Data Fragment. The PhoneticInfo element describes the content of the speech utterance metadata itself.

The identification URI of the encoding format of the speech utterance metadata is specified as the type attribute of the PhoneticInfo element.

For example, the content of the speech utterance metadata is written in SSML, a speech synthesis markup language, and is contained as text between the start tag and the end tag of the PhoneticInfo element.
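Carrying SSML "as text" means the markup has to be escaped inside the PhoneticInfo element so that it is not parsed as ESG structure. The sketch below uses Python's standard `xml.etree.ElementTree` with a hypothetical type identifier and helper name; it shows that serializing the SSML as element text escapes it on the wire, and that parsing returns the original SSML string unchanged.

```python
import xml.etree.ElementTree as ET


def make_phonetic_info(ssml: str, format_uri: str) -> ET.Element:
    """Wrap an SSML document as the text content of a PhoneticInfo element."""
    elem = ET.Element("PhoneticInfo", {"type": format_uri})
    elem.text = ssml  # plain character data; ElementTree escapes <, > and & on output
    return elem


SSML = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<sub alias="World Wide Web Consortium">W3C</sub>'
    "</speak>"
)

# Transmitter side: serialize the element into the ESG.
wire = ET.tostring(make_phonetic_info(SSML, "urn:example:format:ssml"), encoding="unicode")
# Receiver side: parsing the element yields the SSML back as a string.
recovered = ET.fromstring(wire).text
```

Escaping is only one option; a CDATA section would serve the same purpose, but ElementTree does not emit CDATA, so escaped text keeps this sketch within the standard library.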

At least one of the PhoneticInfoURI and PhoneticInfo elements is described in each fragment targeted by the ESG extension. Because the speech utterance metadata is specified by the PhoneticInfoURI or PhoneticInfo element, it is sometimes called a "PhoneticInfo object".

<4. Description examples of speech utterance metadata>

As described above, the speech utterance metadata can be written in SSML, a speech synthesis markup language, for example. SSML is a W3C (World Wide Web Consortium) Recommendation intended to make higher-quality speech synthesis available. Using SSML makes it possible to control the elements needed for speech synthesis, such as pronunciation, volume, and tone, finely and appropriately. FIGS. 17 to 19 show description examples of SSML documents.

(sub element)

FIG. 17 is a diagram illustrating a description example of the sub element in SSML.

The sub element is used to replace text with other text. The alias attribute specifies the text to be spoken. For example, in FIG. 17, the text "W3C" is converted into the spoken text "World Wide Web Consortium" and read aloud.
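The effect of sub can be shown with a toy text flattener that substitutes the alias for the enclosed text. This is not a TTS engine, only a sketch of how a processor might treat sub when building the string to speak; it assumes the SSML 1.1 namespace, the function names are hypothetical, and no other element receives special handling.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

SAMPLE = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<sub alias="World Wide Web Consortium">W3C</sub> is an international standards body.'
    "</speak>"
)


def _collect(elem: ET.Element, parts: list[str]) -> None:
    if elem.tag == SSML_NS + "sub":
        # Speak the alias instead of the written text inside <sub>.
        parts.append(elem.get("alias", ""))
        return
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        _collect(child, parts)
        if child.tail:  # text after a child belongs to the parent's flow
            parts.append(child.tail)


def spoken_text(ssml: str) -> str:
    """Flatten SSML into the string a TTS engine would speak, honoring <sub>."""
    parts: list[str] = []
    _collect(ET.fromstring(ssml), parts)
    return " ".join("".join(parts).split())  # normalize whitespace
```

Here `spoken_text(SAMPLE)` gives "World Wide Web Consortium is an international standards body.", while the ESG screen would still display "W3C".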

Using the sub element makes it possible, for example, to specify how text should be read when its reading is not uniquely determined.

(phoneme element)

FIG. 18 is a diagram illustrating a description example of the phoneme element in SSML.

The phoneme element is used to give a phonemic/phonetic pronunciation to the enclosed text. It can specify an alphabet attribute and a ph attribute: the alphabet attribute specifies the phonemic/phonetic alphabet, and the ph attribute specifies the phonemic/phonetic string. For example, in FIG. 18, the reading of the text "La vita e bella" is specified by the ph attribute, and "ipa" in the alphabet attribute indicates that the string uses the symbols of the International Phonetic Alphabet (IPA).
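A receiver could read the pronunciation hints out of the metadata as follows. This sketch assumes the SSML 1.1 namespace and a hypothetical helper name; the IPA string is an illustrative transcription supplied here, not the value shown in FIG. 18.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

# Illustrative only: this IPA string is not taken from FIG. 18.
SAMPLE = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="it-IT">'
    '<phoneme alphabet="ipa" ph="la ˈviːta ɛ ˈbɛlla">La vita e bella</phoneme>'
    "</speak>"
)


def pronunciation_hints(ssml: str) -> list[tuple[str, str, str]]:
    """List (written text, alphabet, pronunciation) for every phoneme element."""
    root = ET.fromstring(ssml)
    return [
        ((p.text or "").strip(), p.get("alphabet", ""), p.get("ph", ""))
        for p in root.iter(SSML_NS + "phoneme")
    ]
```

A TTS engine would synthesize from the ph string and ignore its own letter-to-sound rules for the enclosed text, which is what makes this useful for proper nouns.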

Using the phoneme element makes it possible, for example, to specify phoneme information for proper nouns and other words that are difficult to pronounce.

(audio element)

FIG. 19 is a diagram illustrating a description example of the audio element in SSML.

The audio element is used to output the recorded audio of an audio file, or synthesized speech. It can specify a src attribute, which gives the URI (Uniform Resource Identifier) of the audio file. For example, in FIG. 19, the text "What city do you want to fly from?" is read aloud by playing back the audio file "prompt.au" specified in the src attribute.
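In SSML, the content of an audio element is rendered instead when the referenced file cannot be played. The hypothetical `render_plan` function below sketches that fallback by producing a list of play/speak actions; modeling the playable files as a plain set is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

SAMPLE = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<audio src="prompt.au">What city do you want to fly from?</audio>'
    "</speak>"
)


def render_plan(ssml: str, available_audio: set[str]) -> list[tuple[str, str]]:
    """Return ("play", src) or ("speak", text) actions in document order."""
    plan: list[tuple[str, str]] = []

    def walk(elem: ET.Element) -> None:
        if elem.text and elem.text.strip():
            plan.append(("speak", elem.text.strip()))
        for child in elem:
            src = child.get("src", "")
            if child.tag == SSML_NS + "audio" and src in available_audio:
                plan.append(("play", src))  # recorded audio replaces the fallback text
            else:
                walk(child)  # ordinary element, or audio whose file is unavailable
            if child.tail and child.tail.strip():
                plan.append(("speak", child.tail.strip()))

    walk(ssml_root := ET.fromstring(ssml))
    return plan
```

The fallback keeps the prompt accessible even when the recorded file is missing, at the cost of losing the producer's recorded delivery.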

Using the audio element makes it possible, for example, to play back a pre-recorded audio file, so that audio information exactly as intended by the producer of the user interface can be provided to visually impaired users.

The sub, phoneme, and audio elements above are examples of how speech utterance metadata can be described in SSML; other SSML elements and attributes may also be used. The speech utterance metadata may also be written in a markup language other than SSML.

<5. Flow of processing executed by each device>

Next, the flow of processing executed by the transmission device 10 and the reception device 20 that make up the broadcast system 1 of FIG. 7 is described.

(Transmission processing)

First, the flow of transmission processing executed by the transmission device 10 of FIG. 7 is described with reference to the flowchart of FIG. 20.

In step S111, the content acquisition unit 111 acquires content such as a broadcast program and supplies it to the stream generation unit 114.

In step S112, the speech utterance metadata generation unit 112 generates speech utterance metadata, for example in accordance with instructions from the producer of the user interface, and supplies it to the ESG information generation unit 113.

When the speech utterance metadata generation unit 112 generates speech utterance metadata containing address information and supplies it to the ESG information generation unit 113, it also generates the speech utterance metadata file to be acquired via that address information and supplies the file to the stream generation unit 114.

In step S113, the ESG information generation unit 113 generates ESG information on the basis of the speech utterance metadata supplied from the speech utterance metadata generation unit 112 and supplies it to the stream generation unit 114.

In step S114, the stream generation unit 114 generates a stream conforming to a predetermined standard on the basis of the content data supplied from the content acquisition unit 111 and the ESG information supplied from the ESG information generation unit 113, and supplies the stream to the transmission unit 115.

When the speech utterance metadata included in the ESG information supplied from the ESG information generation unit 113 contains address information, the stream generation unit 114 generates the stream from the speech utterance metadata file supplied from the speech utterance metadata generation unit 112 in addition to the content data and the ESG information, and supplies it to the transmission unit 115.
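Steps S112 to S114 can be summarized in a short sketch that shows when a speech utterance metadata file is multiplexed into the stream. Everything here is hypothetical and simplified: the function and type names are illustrative, the ESG is reduced to a bare Service fragment, the default address `speech/title.ssml` is made up, and modulation (step S115) is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeechMetadata:
    uri: Optional[str] = None   # address form (PhoneticInfoURI)
    ssml: Optional[str] = None  # inline form (PhoneticInfo)


@dataclass
class Stream:
    content: bytes
    esg: str
    files: dict[str, bytes] = field(default_factory=dict)


def generate_metadata(ssml: str, by_address: bool, uri: str = "speech/title.ssml"):
    """S112: return the metadata for the ESG plus any file to multiplex."""
    if by_address:
        return SpeechMetadata(uri=uri), {uri: ssml.encode("utf-8")}
    return SpeechMetadata(ssml=ssml), {}


def generate_esg(title: str, meta: SpeechMetadata) -> str:
    """S113: embed the metadata in a greatly simplified Service fragment."""
    service = ET.Element("Service")
    name = ET.SubElement(service, "Name")
    name.text = title
    if meta.uri is not None:
        ET.SubElement(name, "PhoneticInfoURI").text = meta.uri
    if meta.ssml is not None:
        ET.SubElement(name, "PhoneticInfo").text = meta.ssml  # escaped on output
    return ET.tostring(service, encoding="unicode")


def transmit(content: bytes, title: str, ssml: str, by_address: bool) -> Stream:
    meta, files = generate_metadata(ssml, by_address)
    esg = generate_esg(title, meta)
    # S114: the metadata file travels in the stream only in the address case.
    return Stream(content=content, esg=esg, files=dict(files))
```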

In step S115, the transmission unit 115 performs processing such as digital modulation on the stream supplied from the stream generation unit 114 and transmits it as a digital broadcast signal via the antenna 116.

This concludes the description of the transmission processing. In the transmission processing, speech utterance metadata describing how the producer intends display information such as a user interface to be spoken is generated, ESG information containing that speech utterance metadata is generated, and the ESG information is transmitted together with the content.

As a result, on the reception device 20 side, the TTS engine 219 reads the display information aloud on the basis of the speech utterance metadata, so the text is reliably read aloud as the producer intended even when, for example, its reading is not uniquely determined or it is a proper noun that is difficult to pronounce. Consequently, visually impaired users can obtain the same information as sighted users.

(Reception processing)

Next, the flow of reception processing executed by the reception device 20 of FIG. 7 is described with reference to the flowchart of FIG. 21.

In step S211, the reception unit 212 receives, via the antenna 211, the digital broadcast signal transmitted from the transmission device 10. The reception unit 212 performs demodulation and other processing on the digital broadcast signal and supplies the resulting stream to the stream separation unit 213.

In step S212, the stream separation unit 213 separates content data and ESG information from the stream supplied from the reception unit 212, and supplies the content data to the playback unit 214 and the ESG information to the ESG information acquisition unit 217.

In step S213, the ESG information acquisition unit 217 acquires the ESG information supplied from the stream separation unit 213. When, for example, the user instructs display of the ESG screen, the ESG information acquisition unit 217 supplies the ESG information to the playback unit 214. The ESG information acquisition unit 217 also supplies the speech utterance metadata included in the ESG information to the speech utterance metadata acquisition unit 218.

In step S214, the playback unit 214 generates an ESG screen on the basis of the ESG information supplied from the ESG information acquisition unit 217 and causes the display unit 215 to display it.

In step S215, the speech utterance metadata acquisition unit 218 acquires the speech utterance metadata supplied from the ESG information acquisition unit 217.

Here, when the speech utterance metadata contains address information, the speech utterance metadata acquisition unit 218 acquires the speech utterance metadata file, on the basis of that address information, from the stream separated by the stream separation unit 213, and supplies speech utterance metadata containing the content obtained from that file to the TTS engine 219. On the other hand, when the speech utterance metadata already contains the content itself, the speech utterance metadata acquisition unit 218 supplies that speech utterance metadata to the TTS engine 219 as it is.
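The branching performed by the speech utterance metadata acquisition unit 218 can be sketched as follows. This is a minimal Python illustration under assumptions: the fragment layout and the element names `SpeechInfo` (inline content) and `SpeechInfoURI` (address information) are hypothetical, and the `fetch_file` callback stands in for obtaining the file from the separated stream, for example from files already extracted from a FLUTE session.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def acquire_speech_metadata(fragment_xml: str,
                            fetch_file: Callable[[str], str]) -> Optional[str]:
    """Return the SSML for one ESG fragment, or None if it carries none.

    SpeechInfo / SpeechInfoURI are hypothetical element names. fetch_file
    stands in for obtaining the metadata file from the separated stream."""
    root = ET.fromstring(fragment_xml)
    inline = root.find("SpeechInfo")
    if inline is not None and inline.text:
        return inline.text  # content already present: pass it on unchanged
    ref = root.find("SpeechInfoURI")
    if ref is not None and ref.text:
        return fetch_file(ref.text.strip())  # address information: fetch the file
    return None


# Usage: files received in the broadcast stream, keyed by their (hypothetical) address.
received = {"speech/001.ssml": "<speak>...</speak>"}
fragment = "<Service><Name>A</Name><SpeechInfoURI>speech/001.ssml</SpeechInfoURI></Service>"
metadata = acquire_speech_metadata(fragment, received.__getitem__)
```

Passing the file lookup in as a callback keeps the branching independent of where the file actually comes from, which also fits the Internet-server variant described in the modifications.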

In step S216, the TTS engine 219 reads aloud display information such as a user interface on the basis of the speech utterance metadata supplied from the speech utterance metadata acquisition unit 218, and outputs the resulting speech from the speaker 216.

Here, when the ESG screen is displayed on the display unit 215 by the processing of step S214 and its title, program information, and the like are read aloud to make them accessible to visually impaired persons, the TTS engine 219 follows the speech utterance metadata so that the text information is read aloud as the producer intends, for example when the reading of that text information is not uniquely determined.
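To make the effect concrete, the following toy Python function shows how SSML metadata changes what is spoken: a `<sub alias>` substitution makes an abbreviation such as "St." come out as the producer's intended "Saint" rather than a guessed "Street". It is only an assumed, simplified model of a TTS front end (a real engine would also handle `<phoneme>`, `<say-as>`, prosody, and so on) and is not an implementation of any particular engine.

```python
import xml.etree.ElementTree as ET
from typing import List

SSML = "{http://www.w3.org/2001/10/synthesis}"


def spoken_text(ssml: str) -> str:
    """Toy stand-in for a TTS front end: return the words a synthesizer
    would speak for an SSML document, substituting <sub alias="...">.
    All other elements are treated as plain containers of text."""

    def walk(elem: ET.Element) -> List[str]:
        parts: List[str] = []
        if elem.tag == SSML + "sub" and elem.get("alias") is not None:
            parts.append(elem.get("alias"))  # speak the alias, not the written text
        else:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                parts.extend(walk(child))
        if elem.tail:
            parts.append(elem.tail)  # text following this element, in document order
        return parts

    # Collapse runs of whitespace left over from the markup.
    return " ".join("".join(walk(ET.fromstring(ssml))).split())


doc = ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
       'Now showing: <sub alias="Saint Louis">St. Louis</sub> Live</speak>')
print(spoken_text(doc))  # Now showing: Saint Louis Live
```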

The reception processing has been described above. In this reception processing, ESG information that is transmitted from the transmitting apparatus 10 and includes speech utterance metadata concerning the speech utterance intended by the producer for display information is received, the speech utterance metadata included in the ESG information is acquired, and display information such as a user interface is read aloud on the basis of the speech utterance metadata.

Thus, because the TTS engine 219 reads the display information aloud on the basis of the speech utterance metadata, the text information is reliably read aloud as the producer intends even when, for example, the reading of the text information is not uniquely determined or the text is a proper noun that is difficult to pronounce. As a result, visually impaired persons can obtain the same information as sighted persons.

<6. Modifications>

In the above description, when the speech utterance metadata contains address information, the speech utterance metadata file transmitted in a FLUTE session is acquired in accordance with that address information. However, the speech utterance metadata file may instead be distributed from a server on the Internet. In this case, for example, the URL (Uniform Resource Locator) of the server is specified as the address information.
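This modification can be sketched as a small dispatch on the URL scheme of the address information. The sketch assumes that `http`/`https` addresses point to a server on the Internet and that any other address names a file already received in the FLUTE session (modelled as a dict); the function names, this split, and the UTF-8 encoding are illustrative assumptions. Only standard-library calls (`urllib.parse.urlparse`, `urllib.request.urlopen`) are used, and the HTTP fetcher is injectable so the logic can be exercised offline.

```python
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.request import urlopen


def http_get(url: str) -> bytes:
    """Download a speech utterance metadata file from a server on the Internet."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_speech_metadata(address: str, flute_files: Dict[str, bytes],
                          get: Callable[[str], bytes] = http_get) -> str:
    """Resolve address information to SSML text (assumed UTF-8 encoded)."""
    if urlparse(address).scheme.lower() in ("http", "https"):
        data = get(address)  # modification: file served from an Internet server
    else:
        data = flute_files[address]  # original case: file carried in the FLUTE session
    return data.decode("utf-8")
```

For example, `fetch_speech_metadata("https://example.com/speech/001.ssml", {})` would download the file from the (placeholder) server, while `fetch_speech_metadata("speech/001.ssml", files)` would look it up among broadcast-delivered files.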

Also, in the above description, the ESG defined by OMA-BCAST has been described as the electronic program information, but the present technology can also be applied to, for example, an EPG (Electronic Program Guide) or other electronic program information. Furthermore, electronic program information such as ESG information may be distributed from a server on the Internet and received by the receiving apparatus 20.

<7. Computer Configuration>

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, a program constituting the software is installed on a computer. Fig. 22 is a diagram showing a configuration example of the hardware of a computer that executes the above-described series of processes by means of a program.

In the computer 900, a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to one another by a bus 904. An input/output interface 905 is also connected to the bus 904. An input unit 906, an output unit 907, a recording unit 908, a communication unit 909, and a drive 910 are connected to the input/output interface 905.

The input unit 906 includes a keyboard, a mouse, a microphone, and the like. The output unit 907 includes a display, a speaker, and the like. The recording unit 908 includes a hard disk, a nonvolatile memory, and the like. The communication unit 909 includes a network interface and the like. The drive 910 drives a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer 900 configured as described above, the CPU 901 performs the above-described series of processes by loading a program stored in the ROM 902 or the recording unit 908 into the RAM 903 via the input/output interface 905 and the bus 904, and executing it.

The program executed by the computer 900 (CPU 901) can be provided by, for example, recording it on the removable medium 911 as a package medium or the like. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer 900, the program can be installed in the recording unit 908 via the input/output interface 905 by mounting the removable medium 911 in the drive 910. The program can also be received by the communication unit 909 via a wired or wireless transmission medium and installed in the recording unit 908. Alternatively, the program can be preinstalled in the ROM 902 or the recording unit 908.

In this specification, the processing that the computer performs in accordance with the program does not necessarily have to be performed in time series in the order described in the flowcharts. That is, the processing that the computer performs in accordance with the program also includes processing executed in parallel or individually (for example, parallel processing or object-based processing). The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers.

Embodiments of the present technology are not limited to the embodiments described above, and various modifications are possible without departing from the gist of the present technology.

The present technology may also have the following configurations.

(1)

A transmitting apparatus including:

a metadata generation unit that generates metadata concerning the speech utterance intended by the producer for display information;

an electronic program information generation unit that generates electronic program information including the metadata; and

a transmission unit that transmits the electronic program information to a receiving apparatus capable of displaying the display information.

(2)

The transmitting apparatus according to (1), wherein the metadata includes information concerning the utterance of a character string whose reading is not uniquely determined, or of a character string that is difficult to pronounce.

(3)

The transmitting apparatus according to (1) or (2), wherein the display information includes information concerning content, or an icon.

(4)

The transmitting apparatus according to (3), further including a content acquisition unit that acquires the content,

wherein the transmission unit transmits the electronic program information together with the content as a digital broadcast signal.

(5)

The transmitting apparatus according to any one of (1) to (4), wherein:

the electronic program information conforms to the ESG (Electronic Service Guide) defined by OMA-BCAST (Open Mobile Alliance - Mobile Broadcast Services Enabler Suite);

the metadata is described in SSML (Speech Synthesis Markup Language) format; and

a predetermined fragment constituting the ESG includes either address information indicating where a file of the metadata described in the SSML format is to be acquired, or the content of the metadata described in the SSML format itself.

(6)

A transmission method of a transmitting apparatus, the method including the steps, performed by the transmitting apparatus, of:

generating metadata concerning the speech utterance intended by the producer for display information;

generating electronic program information including the metadata; and

transmitting the electronic program information to a receiving apparatus capable of displaying the display information.

(7)

A receiving apparatus including:

a reception unit that receives electronic program information transmitted from a transmitting apparatus, the electronic program information including metadata concerning the speech utterance intended by the producer for display information;

a metadata acquisition unit that acquires the metadata included in the electronic program information; and

a speech reading unit that reads the display information aloud on the basis of the metadata.

(8)

The receiving apparatus according to (7).

(9)

The receiving apparatus according to (7) or (8).

(10)

The receiving apparatus according to (9), wherein the reception unit receives, as a digital broadcast signal, the electronic program information transmitted together with the content.

(11)

The receiving apparatus according to any one of (7) to (10), wherein:

the electronic program information conforms to the ESG defined by OMA-BCAST;

the metadata is described in SSML format;

a predetermined fragment constituting the ESG includes either address information indicating where a file of the metadata described in the SSML format is to be acquired, or the content of the metadata described in the SSML format itself; and

the metadata acquisition unit acquires the file of the metadata in accordance with the address information, or acquires the metadata from the fragment.

(12)

A reception method of a receiving apparatus, the method including the steps, performed by the receiving apparatus, of:

receiving electronic program information transmitted from a transmitting apparatus, the electronic program information including metadata concerning the speech utterance intended by the producer for display information;

acquiring the metadata included in the electronic program information; and

reading the display information aloud on the basis of the metadata.

1: broadcasting system
10: transmitting apparatus
20: receiving apparatus
111: content acquisition unit
112: speech utterance metadata generation unit
113: ESG information generation unit
114: stream generation unit
115: transmission unit
212: reception unit
213: stream separation unit
214: playback unit
215: display unit
216: speaker
217: ESG information acquisition unit
218: speech utterance metadata acquisition unit
219: TTS engine
900: computer
901: CPU

Claims

A receiving apparatus comprising:
a receiving unit that receives electronic program information transmitted from a transmitting apparatus, the electronic program information including metadata concerning the speech utterance intended by the producer for display information; and
a control unit that controls a process of acquiring the metadata included in the electronic program information and a process of reading the display information aloud on the basis of the metadata,
wherein the display information includes information on content or an icon, and
the control unit performs control so that the content of the service corresponding to the icon is read aloud on the basis of the metadata.

The receiving apparatus according to claim 1,
wherein the metadata includes information concerning the utterance of a character string whose reading is not uniquely determined, or of a proper noun.

The receiving apparatus according to claim 1,
wherein the receiving unit receives, as a digital broadcast signal, the electronic program information transmitted together with the content.

The receiving apparatus according to claim 1,
wherein the electronic program information includes either address information indicating where a file of the metadata is to be acquired, or the content of the metadata itself, and
the control unit controls processing of acquiring the file of the metadata in accordance with the address information, or of acquiring the metadata from the electronic program information.

5. The receiving apparatus according to claim 4,
wherein a URL of a server on the Internet is specified in the address information.

A reception method of a receiving apparatus, the method comprising the steps, performed by the receiving apparatus, of:
receiving electronic program information transmitted from a transmitting apparatus, the electronic program information including metadata concerning the speech utterance intended by the producer for display information; and
controlling a process of acquiring the metadata included in the electronic program information and a process of reading the display information aloud on the basis of the metadata,
wherein the display information includes information on content or an icon, and
the controlling step includes performing control so that the content of the service corresponding to the icon is read aloud on the basis of the metadata.