KR101961704B1

KR101961704B1 - Apparatus for receiving iptv service, and control method thereof

Info

Publication number: KR101961704B1
Application number: KR1020170132949A
Authority: KR
Inventors: 이광원
Original assignee: SK Broadband Co., Ltd. (에스케이브로드밴드주식회사)
Priority date: 2017-10-13
Filing date: 2017-10-13
Publication date: 2019-07-17

Abstract

The present invention relates to a voice bridge device, a voice recognition processing device, and an operating method thereof. According to the present invention, the voice bridge device can extend the recognition distance of a voice signal and improve the accuracy of the recognition result when a predetermined operation is performed according to that voice recognition result.

Description

APPARATUS FOR RECEIVING IPTV SERVICE, AND CONTROL METHOD THEREOF

The present invention relates to a scheme for generating and providing a caption signal matched with the video signal in an IPTV service signal, in which a service receiving apparatus (set-top box) works in conjunction with a separate speech recognition processing apparatus that recognizes voice signals.

As digital television technology has recently been commercialized, it has become possible to provide a wide variety of content over networks, such as the Internet, that reach each home. IPTV (Internet Protocol Television) service is a representative example.

IPTV service refers to a service that delivers various offerings, such as broadcast content, Internet content, and VoD (Video on Demand), through a service receiving apparatus such as a set-top box.

Such an IPTV service may be provided as follows: the service receiving apparatus relays an IPTV service signal in which a video signal and an audio signal are combined, and a reproducing apparatus (IPTV) receives that IPTV service signal from the service receiving apparatus and plays it.

However, because the IPTV service signal relayed by the service receiving apparatus combines a video signal and an audio signal, a hearing-impaired viewer can only watch the video signal in the IPTV service signal and cannot hear the audio signal that matches it.

The present invention has been made in view of the above circumstances. Its object is to generate and provide a caption signal matched with the video signal in an IPTV service signal by having the service receiving apparatus (e.g., a set-top box) that relays the IPTV service signal work in conjunction with a separate speech recognition processing apparatus that recognizes voice signals.

To achieve the above object, a service receiving apparatus according to an embodiment of the present invention includes: a signal separator that separates, from an IPTV service signal, a video signal and an audio signal matched with the video signal; an audio signal processor that transmits the audio signal to a speech recognition processing apparatus so that the speech recognition processing apparatus can recognize the voice signal within the audio signal; a composite video signal generator that receives, from the speech recognition processing apparatus, a caption signal resulting from recognition of the voice signal and generates a composite video signal by combining the caption signal with the video signal; and a composite video signal processor that transmits the composite video signal to the reproducing apparatus so that the video signal and the caption signal are displayed together as the reproducing apparatus plays the composite video signal.

Specifically, the audio signal processor may convert the audio signal into a communication signal that the speech recognition processing apparatus can receive and transmit the communication signal to the speech recognition processing apparatus, so that the speech recognition processing apparatus performs speech recognition only on the audio signal recovered from the communication signal.

Specifically, the synthesis processor (that is, the composite video signal generator) may generate the composite video signal based on a caption signal waiting time, which is the time from when the audio signal is transmitted to when the caption signal is received, and a caption signal generation time, which is the time from when the speech recognition processing apparatus starts recognizing the voice signal to when it finishes generating the caption signal.

Specifically, the composite video signal generator may designate, as a synthesis section, the specific frame of the video signal located at the point where a time equal to the difference between the caption signal waiting time and the caption signal generation time has elapsed from the start frame of the video signal, and generate the composite video signal by inserting the caption signal into that synthesis section.

Specifically, the caption signal may be received in natural-language corpus units contained in the voice signal, and when the difference in recognition time between natural-language corpus units is less than a threshold time, the synthesis section may be extended from the specific frame to the neighboring next frame so that caption signals adjacent in order are displayed together.

Specifically, the caption signal generation time may be received from the speech recognition processing apparatus each time a caption signal for a natural-language corpus unit is received from it.

To achieve the above object, a method of operating a service receiving apparatus according to an embodiment of the present invention includes: a signal separation step of separating, from an IPTV service signal, a video signal and an audio signal matched with the video signal; an audio signal processing step of delivering the audio signal to a speech recognition processing apparatus so that the speech recognition processing apparatus can recognize the voice signal within the audio signal; a composite video signal generation step of receiving, from the speech recognition processing apparatus, a caption signal resulting from recognition of the voice signal and generating a composite video signal by combining the caption signal with the video signal; and a composite video signal processing step of transmitting the composite video signal to the reproducing apparatus so that the video signal and the caption signal are displayed together as the reproducing apparatus plays the composite video signal.
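The four method steps can be pictured as a simple pipeline. The following is a minimal Python sketch, not the patented implementation: it assumes the IPTV service signal has already been demultiplexed into a hypothetical `ServiceChunk`, and that the remote recognizer and the playback path are passed in as callables. All names are illustrative, and the caption is naively attached to every frame of the chunk, leaving the timing alignment described later aside.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ServiceChunk:
    """Hypothetical unit of an already-demultiplexed IPTV service signal."""
    video_frames: List[str]  # placeholder frame payloads
    audio: bytes             # placeholder audio payload matched with the frames


@dataclass
class CompositeFrame:
    frame: str
    caption: str = ""


def separate(chunk: ServiceChunk) -> Tuple[List[str], bytes]:
    # Signal separation step: video signal and the audio signal matched with it.
    return chunk.video_frames, chunk.audio


def operate(chunk: ServiceChunk,
            recognize: Callable[[bytes], str],
            play: Callable[[List[CompositeFrame]], None]) -> None:
    video, audio = separate(chunk)                           # signal separation step
    caption = recognize(audio)                               # audio signal processing step (remote recognizer)
    composite = [CompositeFrame(f, caption) for f in video]  # composite video signal generation step
    play(composite)                                          # composite video signal processing step
```

For example, `operate(ServiceChunk(["f0", "f1"], b"pcm"), lambda a: "hello", print)` would hand two captioned frames to the stand-in player.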

Specifically, the audio signal processing step may convert the audio signal into a communication signal that the speech recognition processing apparatus can receive and transmit the converted communication signal to the speech recognition processing apparatus, so that the speech recognition processing apparatus performs speech recognition only on the audio signal recovered from the communication signal.

Specifically, the composite video signal generation step may generate the composite video signal based on a caption signal waiting time, which is the time from when the audio signal is transmitted to when the caption signal is received, and a caption signal generation time, which is the time from when the speech recognition processing apparatus starts recognizing the voice signal to when it finishes generating the caption signal.

Specifically, the composite video signal generation step may designate, as a synthesis section, the specific frame of the video signal located at the point where a time equal to the difference between the caption signal waiting time and the caption signal generation time has elapsed from the start frame of the video signal, and generate the composite video signal by inserting the caption signal into that synthesis section.

Accordingly, with the voice bridge device, the speech recognition processing device, and the operating method according to the present invention, a service receiving apparatus (e.g., a set-top box) that relays an IPTV service signal works in conjunction with a separate speech recognition processing apparatus that recognizes voice signals to generate and provide a caption signal matched with the video signal in the IPTV service signal. This can improve satisfaction with the IPTV service for hearing-impaired viewers as well.

FIG. 1 is a schematic configuration diagram of a speech recognition processing system according to an embodiment of the present invention.
FIG. 2 is a schematic configuration diagram of a voice bridge apparatus according to an embodiment of the present invention.
FIG. 3 is a schematic configuration diagram of a speech recognition processing apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating the operation flow in a voice bridge apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the operation flow in a speech recognition processing apparatus according to an embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 1 shows a schematic configuration of an IPTV service system according to an embodiment of the present invention.

As shown in FIG. 1, the IPTV service system according to an embodiment of the present invention may include a service receiving apparatus 10, a speech recognition processing apparatus 20, and a reproducing apparatus 30.

The service receiving apparatus 10 refers to a set-top box installed at a specific location (e.g., a home) to relay an IPTV service. Through an IPTV service signal provided by a service provider (not shown) via a router (not shown), it can receive, for example, broadcast content, Internet content, and VoD (Video on Demand) services.

For reference, the service provider may include a broadcast server, a VoD server, a metadata server, and a subscriber management unit.

The speech recognition processing apparatus 20 refers to a kind of artificial intelligence device that recognizes a voice signal and performs a predetermined operation according to the recognition result.

For example, the speech recognition processing apparatus 20 may be implemented as an in-home speaker (or lighting device) connected to the service receiving apparatus 10 and the reproducing apparatus 30 through a router (not shown) or through short-range communication (e.g., Bluetooth); its form is not particularly limited.

The voice recognition technique used by the speech recognition processing apparatus 20 may be a known technique based on, for example, deep learning.

The reproducing apparatus 30 refers to a device that plays the IPTV service signal received from the service receiving apparatus 10. A digital TV is one example, but the reproducing apparatus 30 is not limited thereto and may be any device capable of reproducing the video signal and audio signal contained in the IPTV service signal.

Meanwhile, the IPTV service signal relayed by the service receiving apparatus 10 according to an embodiment of the present invention can be divided into a video signal and an audio signal matched with that video signal.

As a result, a hearing-impaired viewer can only watch the video signal in the IPTV service signal and cannot hear the audio signal matched with it.

Admittedly, some broadcast services provide captions, or caption-like information, for the video signal for hearing-impaired viewers. However, this information is produced by real-time sign language interpretation or stenography, so separate personnel are needed to generate it.

Accordingly, to overcome this limitation, an embodiment of the present invention proposes a new scheme for generating and providing a caption signal matched with the video signal in an IPTV service signal. The configuration of the service receiving apparatus 10 that realizes this scheme is described in more detail below.

FIG. 2 shows a schematic configuration of the service receiving apparatus 10 according to an embodiment of the present invention.

As shown in FIG. 2, the service receiving apparatus 10 according to an embodiment of the present invention may include a signal separator 11, an audio signal processor 12, a composite video signal generator 13, and a composite video signal processor 14.

All or at least some of the components of the service receiving apparatus 10 may be implemented as hardware modules, as software modules, or as a combination of the two.

Here, a software module may be understood as instructions executed by a processor that performs computation in the service receiving apparatus 10; such instructions may be loaded in memory in the service receiving apparatus 10.

The service receiving apparatus 10 according to an embodiment of the present invention may further include a communication unit 15, an RF module responsible for the actual communication with the speech recognition processing apparatus 20 and the reproducing apparatus 30.

The communication unit 15 includes, for example, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, and memory. It is not limited to these and may include any known circuit that performs this function.

In short, the service receiving apparatus 10 according to an embodiment of the present invention can provide a caption signal matched with the video signal through the configuration described above. Each of its components is described in more detail below.

The signal separator 11 separates the IPTV service signal.

More specifically, the signal separator 11 separates the IPTV service signal received from the service provider (not shown) into a video signal and an audio signal matched with that video signal.

The audio signal contains the voice signal that is the object of recognition according to an embodiment of the present invention.

The signal separator 11 temporarily stores the IPTV service signal received from the service provider (not shown) in a buffer (not shown). Then, at every set period, it extracts a fixed-size (unit-size) portion of the IPTV service signal from the buffer and separates it into a video signal and the audio signal matched with it.

Here, the fixed size (unit size) may be understood as a certain number of frames of the video signal.
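The buffering and per-period extraction can be sketched as follows. This is a hedged Python illustration under simplifying assumptions: each incoming unit is modeled as one video frame paired with its audio bytes (real MPEG-TS demultiplexing is out of scope), and `SignalSeparator`, `receive`, and `on_period` are hypothetical names. `on_period` stands for the work done once per set period and yields nothing until a full unit of `unit_frames` frames is buffered.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class SignalSeparator:
    """Hypothetical model of signal separator 11."""

    def __init__(self, unit_frames: int):
        self.unit_frames = unit_frames  # the "fixed size (unit size)" in frames
        self.buffer: Deque[Tuple[str, bytes]] = deque()

    def receive(self, frame: str, audio: bytes) -> None:
        # Temporarily store the incoming service signal (one frame plus its audio).
        self.buffer.append((frame, audio))

    def on_period(self) -> Optional[Tuple[List[str], bytes]]:
        # Called once per set period; returns None until a full unit is buffered.
        if len(self.buffer) < self.unit_frames:
            return None
        units = [self.buffer.popleft() for _ in range(self.unit_frames)]
        video = [frame for frame, _ in units]
        audio = b"".join(chunk for _, chunk in units)  # audio matched with those frames
        return video, audio
```

Keeping video and audio paired in the buffer is a design choice of this sketch: it guarantees the extracted audio always corresponds to the extracted frames.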

The audio signal processor 12 handles transmission of the audio signal.

More specifically, once the video signal and the audio signal have been separated from the IPTV service signal, the audio signal processor 12 transmits the separated audio signal to the speech recognition processing apparatus 20 so that the speech recognition processing apparatus 20 can recognize the voice signal contained in the audio signal.

The audio signal processor 12 may convert the audio signal separated from the IPTV service signal into a communication signal that the speech recognition processing apparatus 20 can receive, and transmit the converted communication signal to the speech recognition processing apparatus 20 to request recognition of the voice signal in the audio signal.

In turn, the speech recognition processing apparatus 20 converts the communication signal received from the service receiving apparatus 10 back into an audio signal, recognizes the voice signal in the recovered audio signal, and generates a caption signal as the recognition result.
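The patent does not specify the communication signal format. As one possible illustration, the round trip (audio into a communication signal on the set-top box, back into audio on the recognizer) might look like this Python sketch, which assumes a hypothetical packet layout: a 6-byte network-order header holding a chunk sequence number and payload length, followed by raw audio bytes, with packets no larger than an assumed `mtu`.

```python
import struct
from typing import List

# Hypothetical header: chunk sequence number (uint32), payload length (uint16).
HEADER = struct.Struct("!IH")


def to_communication_signal(seq: int, audio: bytes, mtu: int = 512) -> List[bytes]:
    """Set-top box side: split audio into length-prefixed packets of at most mtu bytes."""
    payload_max = mtu - HEADER.size
    packets = []
    for i in range(0, len(audio), payload_max):
        chunk = audio[i:i + payload_max]
        packets.append(HEADER.pack(seq, len(chunk)) + chunk)
    return packets


def from_communication_signal(packets: List[bytes]) -> bytes:
    """Recognizer side: strip the headers and reassemble the audio signal."""
    out = bytearray()
    for packet in packets:
        _seq, length = HEADER.unpack_from(packet)
        out += packet[HEADER.size:HEADER.size + length]
    return bytes(out)
```

Carrying the sequence number in every packet lets the recognizer tag each caption with the chunk it came from, which the synchronization step needs.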

To improve focus, that is, to avoid recognizing other voice signals picked up from the surroundings, the speech recognition processing apparatus 20 performs voice recognition only on the audio signal recovered from the communication signal while it is receiving a communication signal from the service receiving apparatus 10.
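The input gating described above can be sketched in Python as follows. This is an assumption-laden illustration: inputs are labeled with a hypothetical `source` of either `"comm"` (audio recovered from the set-top box's communication signal) or `"mic"` (ambient microphone audio), and an explicit `end_session` call marks when captioning stops. None of these names come from the patent.

```python
from typing import List, Optional


class RecognitionGate:
    """Hypothetical input selector for speech recognition processing apparatus 20."""

    def __init__(self) -> None:
        self.session_active = False
        self.accepted: List[str] = []  # sources passed on to the recognizer, for inspection

    def on_input(self, source: str, audio: bytes) -> Optional[bytes]:
        # Audio arriving as a communication signal opens (or continues) a captioning session.
        if source == "comm":
            self.session_active = True
        elif self.session_active:
            return None  # ambient speech is ignored while a session is active
        self.accepted.append(source)
        return audio

    def end_session(self) -> None:
        self.session_active = False
```

Outside a captioning session, the device behaves as an ordinary voice assistant and accepts microphone input again.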

To synchronize the video signal with the caption signal, the audio signal processor 12 records and manages the time at which it transmitted the audio signal to the speech recognition processing apparatus 20. This transmission is repeated at every set period at which an IPTV service signal unit is extracted from the buffer and separated.
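A minimal Python sketch of this bookkeeping, assuming each transmitted audio unit is identified by a hypothetical sequence number and that a monotonic clock is used (`time.monotonic` by default; the clock is injectable so it can be tested). The send time is kept after lookup because several corpus-unit captions may arrive for one audio unit.

```python
import time
from typing import Callable, Dict


class TransmitLog:
    """Hypothetical send-time record kept by audio signal processor 12."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.sent_at: Dict[int, float] = {}

    def mark_sent(self, seq: int) -> None:
        # Record when the audio of unit `seq` was transmitted (once per set period).
        self.sent_at[seq] = self.clock()

    def waiting_time(self, seq: int) -> float:
        # Caption signal waiting time: from transmission of unit `seq` until now.
        # The entry is not removed, since more captions for the same unit may follow.
        return self.clock() - self.sent_at[seq]
```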

The composite video signal generator 13 generates a composite video signal by combining the caption signal and the video signal.

More specifically, when the composite video signal generator 13 receives, from the speech recognition processing apparatus 20, a caption signal resulting from recognition of the voice signal, it generates a composite video signal by combining the received caption signal with the video signal.

The composite video signal generator 13 generates the composite video signal using two times: the caption signal waiting time, which runs from when the audio signal was delivered to when the caption signal is received, and the caption signal generation time, which runs from when the speech recognition processing apparatus 20 starts recognizing the voice signal to when it finishes generating the caption signal.

The caption signal generation time is received from the speech recognition processing apparatus 20 each time a caption signal is received from it, and the caption signals themselves are received in natural-language corpus units contained in the voice signal.
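The patent does not define a message format for these per-unit captions. Purely as an illustration, a caption message could be a small JSON object; the field names below (`seq`, `text`, `gen_ms`, `recognized_ms`) are hypothetical. The sketch assumes each message carries the caption text for one natural-language corpus unit, the caption signal generation time in milliseconds, and the recognizer's own timestamp for that unit, which the threshold comparison between adjacent units needs.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionSegment:
    seq: int            # audio unit this caption belongs to
    text: str           # one natural-language corpus unit
    gen_ms: int         # caption signal generation time reported by apparatus 20
    recognized_ms: int  # when apparatus 20 recognized this unit (its own clock)


def parse_caption_message(raw: str) -> CaptionSegment:
    """Decode one hypothetical JSON caption message sent per corpus unit."""
    msg = json.loads(raw)
    return CaptionSegment(int(msg["seq"]), str(msg["text"]),
                          int(msg["gen_ms"]), int(msg["recognized_ms"]))
```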

That is, as shown in FIG. 3, the composite video signal generator 13 designates as the synthesis section the specific frame (a) located where a time equal to the difference between the caption signal waiting time and the caption signal generation time has elapsed from the start frame of the video signal matched with the transmitted audio signal. It then generates the composite video signal by inserting the caption signal into the designated synthesis section.
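The frame arithmetic can be made concrete with a short Python sketch. It assumes times in integer milliseconds and a known integer frame rate (both assumptions, used to avoid floating-point rounding). Truncating to a whole frame and clamping a negative difference to zero are choices of this sketch, not of the patent. For example, with a waiting time of 2500 ms, a generation time of 1700 ms, and 30 fps, the offset is 800 ms, which lands on frame 24 after the start frame.

```python
from typing import List, Optional


def synthesis_frame(start_frame: int, wait_ms: int, gen_ms: int, fps: int) -> int:
    """Frame (a): start frame plus (waiting time - generation time), in whole frames."""
    offset_ms = max(0, wait_ms - gen_ms)  # clamp is an assumption of this sketch
    return start_frame + offset_ms * fps // 1000


def insert_caption(captions: List[Optional[str]], frame: int, text: str) -> None:
    """Attach caption text to the designated frame of a per-frame caption track.
    Frames outside the current unit are ignored in this sketch."""
    if 0 <= frame < len(captions):
        captions[frame] = text
```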

When designating the synthesis section for a specific frame of the video signal matched with the transmitted audio signal, the composite video signal generator 13 may extend the synthesis section from the specific frame (a) to the neighboring next frame if the difference in recognition time between natural-language corpus units is less than a threshold time, so that caption signals adjacent in order are displayed together.

In other words, if the recognition-time difference between natural-language corpus units stays below the threshold time over several consecutive units, the frame interval of the video signal designated as the synthesis section can be extended by a corresponding length.
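The chained extension can be sketched in Python as a single pass that groups consecutive corpus units whose recognition times are less than `threshold_ms` apart. Each group would then be shown together over one synthesis section running from the first unit's frame (a) through the frames of the later units. The function name and the space-joined text are illustrative assumptions.

```python
from typing import List, Tuple


def synthesis_sections(recognized_ms: List[int], texts: List[str],
                       threshold_ms: int) -> List[Tuple[int, int, str]]:
    """Group consecutive corpus units recognized less than threshold_ms apart.
    Returns (first_index, last_index, joined_text) for each group."""
    groups: List[Tuple[int, int, str]] = []
    for i, text in enumerate(texts):
        if groups and recognized_ms[i] - recognized_ms[i - 1] < threshold_ms:
            first, _, joined = groups[-1]
            groups[-1] = (first, i, joined + " " + text)  # extend the current section
        else:
            groups.append((i, i, text))  # start a new section
    return groups
```

For recognition times of 0, 300, 500, and 2000 ms with a 400 ms threshold, the first three units form one extended section and the fourth starts a new one.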

The synthesis section is extended in this way, so that adjacent caption signals are displayed together when the recognition-time difference between natural-language corpus units is below the threshold time, in order to keep the exposure (display) of the caption signals continuous and thereby convey meaning effectively.

The composite video signal processor 14 transmits the composite video signal.

More specifically, once a composite video signal combining the video signal and the caption signal has been generated, the composite video signal processor 14 transmits it to the reproducing apparatus 30 so that the video signal and the caption signal are displayed together as the reproducing apparatus 30 plays the composite video signal.

To provide caption synthesis smoothly, the composite video signal may be transmitted to the reproducing apparatus 30 once the amount of video signal already synthesized into the composite video signal reaches at least a certain size (a certain number of frames). However, the larger this size, the longer the delay compared with playing the IPTV service signal received from the service provider (not shown) in real time.
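This size-versus-delay trade-off can be sketched in Python with a hypothetical `CompositeSender` that holds composite frames until at least `min_frames` are ready and then forwards them as one batch (the `sent` list stands in for transmission to the reproducing apparatus 30). The batch duration `min_frames / fps` is used here only as a rough proxy for the extra buffering delay; it is an assumption of this sketch, not a figure from the patent.

```python
from typing import List


class CompositeSender:
    """Hypothetical model of composite video signal processor 14."""

    def __init__(self, min_frames: int, fps: int):
        self.min_frames = min_frames
        self.fps = fps
        self.pending: List[str] = []
        self.sent: List[str] = []  # stand-in for frames delivered to apparatus 30

    def push(self, frame: str) -> bool:
        """Queue one synthesized frame; flush and return True once a batch is full."""
        self.pending.append(frame)
        if len(self.pending) >= self.min_frames:
            self.sent.extend(self.pending)
            self.pending.clear()
            return True
        return False

    def added_delay_s(self) -> float:
        """Batch duration in seconds, a rough proxy for the added buffering delay."""
        return self.min_frames / self.fps
```

Raising `min_frames` gives the caption timing more slack but grows `added_delay_s()` proportionally, mirroring the trade-off described above.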

The operation flow of the service receiving apparatus 10 according to an embodiment of the present invention is described below with reference to FIG. 4.

FIG. 4 is a schematic flowchart illustrating the operation flow of the service receiving apparatus 10 according to an embodiment of the present invention.

First, the signal separator 11 separates the IPTV service signal received from the service provider (not shown) into a video signal and an audio signal matched with that video signal (S11-S13).

The signal separator 11 may temporarily store the IPTV service signal received from the service provider (not shown) in a buffer (not shown), and then, at every set period, extract a fixed-size (unit-size) portion of the IPTV service signal from the buffer and separate it into a video signal and the audio signal matched with it.

Then, once the video signal and the audio signal have been separated from the IPTV service signal, the audio signal processor 12 transmits the separated audio signal to the speech recognition processing apparatus 20 so that the speech recognition processing apparatus 20 can recognize the voice signal contained in the audio signal (S14-S15).

Next, when the composite video signal generator 13 receives, from the speech recognition processing apparatus 20, a caption signal resulting from recognition of the voice signal, it generates a composite video signal by combining the received caption signal with the video signal (S16-S18).

That is, the composite video signal generator 13 designates as the synthesis section the specific frame located where a time equal to the difference between the caption signal waiting time and the caption signal generation time has elapsed from the start frame of the video signal matched with the transmitted audio signal, and generates the composite video signal by inserting the caption signal into the designated synthesis section.

When designating the synthesis section for a specific frame of the video signal matched with the transmitted audio signal, the composite video signal generator 13 may extend the synthesis section from that frame to the neighboring next frame if the difference in recognition time between natural-language corpus units is less than a threshold time, so that caption signals adjacent in order are displayed together.

Thereafter, once the composite video signal combining the video signal and the caption signal has been generated, the composite video signal processor 14 transmits it to the reproducing apparatus 30 so that the video signal and the caption signal are displayed together as the reproducing apparatus 30 plays the composite video signal (S19).

To provide caption synthesis smoothly, the composite video signal may be transmitted to the reproducing apparatus 30 once the amount of video signal already synthesized into the composite video signal reaches at least a certain size (a certain number of frames). However, the larger this size, the longer the delay compared with playing the IPTV service signal received from the service provider (not shown) in real time.

As described above, according to the service receiving apparatus 10 and its operation flow in an embodiment of the present invention, the service receiving apparatus 10 that relays the IPTV service signal works in conjunction with a separate speech recognition processing apparatus 20 that recognizes voice signals to generate and provide a caption signal matched with the video signal in the IPTV service signal. This can improve satisfaction with the IPTV service for hearing-impaired viewers as well.

Meanwhile, implementations of the functional operations and subject matter described in this specification may be realized in digital electronic circuitry, in computer software, firmware, or hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of these. Implementations of the subject matter described in this specification may be realized as one or more computer program products, that is, as one or more modules of computer program instructions encoded on a tangible program storage medium for execution by, or to control the operation of, a control system.

The computer-readable medium may be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter affecting a machine-readable propagated signal, or a combination of one or more of these.

As used in this specification, the terms "system" and "device" encompass all apparatus, devices, and machines for processing data, including, for example, a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the control system may include code that creates an execution environment for the computer program in question, for example code constituting processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of these.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a single file dedicated to the program in question, in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code), or in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document). A computer program may be deployed to run on one computer or on multiple computers located at one site or distributed across multiple sites and interconnected by a communication network.

Meanwhile, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks or external disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Implementations of the subject matter described in this specification may be realized in a computing system that includes a back-end component, such as a data server; a middleware component, such as an application server; a front-end component, such as a client computer having a web browser or graphical user interface through which a user can interact with an implementation of the subject matter described in this specification; or any combination of one or more such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, such as a communication network.

Although this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Likewise, certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described as acting in certain combinations and may even be initially claimed as such, one or more features from a claimed combination may in some cases be excluded from that combination, and the claimed combination may be changed to a subcombination or a variation of a subcombination.

Further, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Accordingly, this specification is not intended to limit the invention to the specific terms presented. Thus, although the invention has been described in detail with reference to the examples above, those skilled in the art may make adaptations, changes, and modifications to these examples without departing from the scope of the invention. The scope of the invention is defined by the appended claims rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the invention.

According to the service receiving apparatus and its operation method in an embodiment of the present invention, a service receiving apparatus (set-top box) can generate and provide a subtitle signal matched to the video signal in an IPTV service signal by cooperating with a separate speech recognition processing device that recognizes speech signals. Because this overcomes the limitations of existing technology, the invention offers sufficient potential not only for use of the related technology but also for the marketing or sale of devices to which it is applied, and it can clearly be practiced in reality; it is therefore industrially applicable.

10: Service receiving device
11: Signal separator 12: Acoustic signal processor
13: Composite video signal generator 14: Composite video signal processor
20: Speech recognition processing device
30: Playback device

Claims

A signal separator for separating, from an IPTV service signal, a video signal and an acoustic signal matching the video signal;
An acoustic signal processor for transmitting the acoustic signal to a speech recognition processing device so that the speech recognition processing device recognizes a speech signal in the acoustic signal;
A composite video signal generator for receiving, from the speech recognition processing device, a subtitle signal resulting from recognition of the speech signal, and generating a composite video signal by synthesizing the subtitle signal with the video signal; and
A composite video signal processor for transmitting the composite video signal to a playback device so that the playback device displays the video signal and the subtitle signal together as it reproduces the composite video signal,
Wherein the composite video signal generator
Generates the composite video signal based on a subtitle signal waiting time, which is the time from when the acoustic signal is transmitted until the subtitle signal is received, and a subtitle signal generation time, which is the time from when the speech recognition processing device starts recognizing the speech signal until the subtitle signal is generated.

The apparatus according to claim 1,
Wherein the acoustic signal processor
Converts the acoustic signal into a communication signal receivable by the speech recognition processing device and transmits the communication signal to the speech recognition processing device, so that the speech recognition processing device performs speech recognition only on the signal received from the service receiving apparatus.

delete

The apparatus according to claim 1,
Wherein the composite video signal generator
Designates, as a synthesis section, a specific frame of the video signal corresponding to the time point at which a time equal to the difference between the subtitle signal waiting time and the subtitle signal generation time has elapsed from a start frame of the video signal, and generates the composite video signal by inserting the subtitle signal into the synthesis section.

The apparatus according to claim 4,
Wherein the subtitle signal
Is received in units of the natural language corpora included in the speech signal, and
Wherein the synthesis section,
When the difference in recognition time between natural language corpora is less than a threshold time, is extended from the specific frame to the adjacent next frame so that adjacent subtitle signals can be displayed together.

The apparatus according to claim 1,
Wherein the subtitle signal generation time
Is received from the speech recognition processing device each time the subtitle signal for a natural language corpus unit is received from the speech recognition processing device.

A signal separation step of separating, from an IPTV service signal, a video signal and an acoustic signal matching the video signal;
An acoustic signal processing step of transmitting the acoustic signal to a speech recognition processing device so that the speech recognition processing device recognizes a speech signal in the acoustic signal;
A composite video signal generation step of receiving, from the speech recognition processing device, a subtitle signal resulting from recognition of the speech signal, and generating a composite video signal by synthesizing the subtitle signal with the video signal; and
A composite video signal processing step of transmitting the composite video signal to a playback device so that the playback device displays the video signal and the subtitle signal together as it reproduces the composite video signal,
Wherein the composite video signal generation step
Generates the composite video signal based on a subtitle signal waiting time, which is the time from when the acoustic signal is transmitted until the subtitle signal is received, and a subtitle signal generation time, which is the time from when the speech recognition processing device starts recognizing the speech signal until the subtitle signal is generated.

The method according to claim 7,
Wherein the acoustic signal processing step
Converts the acoustic signal into a communication signal receivable by the speech recognition processing device and transmits the converted communication signal to the speech recognition processing device, so that the speech recognition processing device performs speech recognition only on the signal received from the service receiving apparatus.

delete

The method according to claim 7,
Wherein the composite video signal generation step
Designates, as a synthesis section, a specific frame of the video signal corresponding to the time point at which a time equal to the difference between the subtitle signal waiting time and the subtitle signal generation time has elapsed from a start frame of the video signal, and generates the composite video signal by inserting the subtitle signal into the synthesis section.

The method according to claim 10,
Wherein the subtitle signal
Is received in units of the natural language corpora included in the speech signal, and
Wherein the synthesis section,
When the difference in recognition time between natural language corpora is less than the threshold time, is extended from the specific frame to the adjacent next frame so that adjacent subtitle signals can be displayed together.

The method according to claim 7,
Wherein the subtitle signal generation time
Is received from the speech recognition processing device each time the subtitle signal for a natural language corpus unit is received from the speech recognition processing device.