KR20000008162A

KR20000008162A - Voice-character converting apparatus

Info

Publication number: KR20000008162A
Application number: KR1019980027874A
Authority: KR
Inventors: 김선우
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1998-07-10
Filing date: 1998-07-10
Publication date: 2000-02-07

Abstract

PURPOSE: The apparatus recognizes the voice of the other party and displays it as text in a video conference attended by a deaf person. CONSTITUTION: The apparatus includes: a video coder-decoder (210) that compresses an image signal from the outside or restores it to the original image signal; an audio coder-decoder (230) that compresses a voice signal from the outside or restores it to the original voice signal; a multiplexer/demultiplexer (250) that multiplexes the compressed image and voice signals or demultiplexes the restored image and voice signals; a voice-character conversion part that converts the demultiplexed voice signal into a character signal; and a network connection part (260) that transmits and receives the multiplexed/demultiplexed signal to and from the outside. The system enables the deaf person to communicate with the other party.

Description

Speech-to-text converter

The present invention relates to a display device and, more particularly, to a speech-to-text converter that recognizes the other party's voice in a video conferencing system (VCS) and displays it as text.

A video conferencing system, commonly abbreviated as VCS, lets people at distant locations use computers connected over a communication network to exchange video, audio, and other meeting information in real time, so that a meeting can proceed as if the participants were sitting face to face.

FIG. 1 is a block diagram showing the configuration of a conventional speech-to-text converter.

In a conventional video conferencing apparatus, image and voice signals are input through a video interface 100 and an audio interface 110. A video coder-decoder (CODEC: COder-DECoder) 120 and an audio coder-decoder 130 compress the image and voice signals input through the video interface 100 and the audio interface 110. A multiplexer/demultiplexer 140 multiplexes the compressed signals into a single data frame, which is transmitted to the other party through a network interface 150. The receiving side demultiplexes the received signal, separating it into an image signal and a voice signal, and restores each to its original form, which makes image and voice data transmission possible. A system controller 160 controls the combining or separation of the image and voice signals input to or output from the multiplexer/demultiplexer during multiplexing or demultiplexing. In a video conference held with this apparatus, video and audio devices receive the other party's image and voice data, so the participants can confer by voice while viewing the image.

However, the conventional video conferencing apparatus has a problem: when a hearing-impaired user wants to take part in a video conference, the user cannot perceive the other party's voice, so the parties cannot communicate with each other.

The technical problem addressed by the present invention is to provide a speech-to-text converter that recognizes the other party's voice and displays it as text in a video conference attended by a hearing-impaired person.

FIG. 2 is a block diagram showing the configuration of a speech-to-text converter according to the present invention.

To solve this technical problem, a speech-to-text converter in a video conferencing system according to the present invention preferably includes: a video encoder-decoder that compresses an image signal input from the outside or restores it to the original image signal; an audio encoder-decoder that compresses a voice signal input from the outside or restores it to the original voice signal; a multiplexer/demultiplexer that multiplexes the image and voice signals compressed by the video encoder-decoder and the audio encoder-decoder, or demultiplexes the restored image and voice signals; a speech-to-text conversion unit that converts the voice signal demultiplexed by the multiplexer/demultiplexer into a text signal; and a network connection unit that transmits and receives the multiplexed or demultiplexed signal of the multiplexer/demultiplexer to and from the outside.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

The apparatus shown in FIG. 2 comprises: a video interface 200; a video encoder-decoder 210 that compresses the image signal input through the video interface 200 or restores it to the original image signal; an audio interface 220; an audio encoder-decoder 230 that compresses the voice signal input through the audio interface 220 or restores it to the original voice signal; an external input interface 240 connected to an external input device such as a keyboard or a touch panel; a multiplexer/demultiplexer 250 that multiplexes or demultiplexes the image signal of the video encoder-decoder 210, the voice signal of the audio encoder-decoder 230, and the external input signal from the external input interface 240; a network interface 260 for transmitting and receiving the signal multiplexed or demultiplexed by the multiplexer/demultiplexer 250 to and from the outside; a speech-to-text conversion unit 240 that converts the voice signal demultiplexed by the multiplexer/demultiplexer 250 into a text signal; and a system controller 280 that controls the multiplexing/demultiplexing of the multiplexer/demultiplexer 250.

Next, the present invention is described in detail with reference to FIG. 2.

The video encoder-decoder 210 compresses the image signal input through the video interface 200 or restores it to the original image signal. When transmitting, it compresses the image signal input from the video interface 200; when receiving, it restores the image signal demultiplexed by the multiplexer/demultiplexer 250 to the original signal.

The audio encoder-decoder 230 compresses the voice signal input through the audio interface 220 or restores it to the original voice signal. When transmitting, it compresses the voice signal input from the audio interface 220; when receiving, it restores the voice signal demultiplexed by the multiplexer/demultiplexer 250 to the original signal.
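The two encoder-decoders share the same compress-on-send, restore-on-receive role, which can be sketched as follows. This is only an illustration: the class name StandInCodec is hypothetical, and zlib (a lossless general-purpose compressor) is an assumed stand-in for whatever video or audio codec a real system would use, since the source names none.

```python
import zlib


class StandInCodec:
    """Hypothetical codec; zlib stands in for a real video/audio codec."""

    def compress(self, raw: bytes) -> bytes:
        # Transmit path: shrink the signal arriving from the interface.
        return zlib.compress(raw)

    def restore(self, compressed: bytes) -> bytes:
        # Receive path: recover the original signal after demultiplexing.
        return zlib.decompress(compressed)


video_codec = StandInCodec()  # plays the role of video encoder-decoder 210
audio_codec = StandInCodec()  # plays the role of audio encoder-decoder 230

raw_audio = b"\x00\x01" * 800  # made-up sample data, not real audio
sent = audio_codec.compress(raw_audio)
received = audio_codec.restore(sent)
```

Real conferencing codecs are usually lossy, so the restored signal would approximate the input rather than match it byte for byte as it does here.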

The external input interface 240 is an interface for connecting an external input device, such as a computer keyboard or a touch panel, to the apparatus. When an external input device is connected to the external input interface 240, the signal entered through that device is transmitted together with the image and voice signals.

The multiplexer/demultiplexer 250 multiplexes or demultiplexes the image signal of the video encoder-decoder 210, the voice signal of the audio encoder-decoder 230, and the external input signal from the external input interface 240. When transmitting, it receives the image signal compressed by the video encoder-decoder 210, the voice signal compressed by the audio encoder-decoder 230, and the external device input signal applied from the external input interface 240, and multiplexes them under the control signal of the system controller 280. When receiving, the multiplexer/demultiplexer 250 demultiplexes the incoming image and voice signals, that is, separates them, and applies them to the video encoder-decoder 210 and the audio encoder-decoder 230, which restore the original signals.
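One way to picture the multiplexing and demultiplexing described above is a tagged, length-prefixed frame. The sketch below is an assumption for illustration only: the source does not define a frame layout, and the channel tags, header format, and function names are all hypothetical.

```python
import struct

# Hypothetical channel tags; the source does not define a frame layout.
VIDEO, AUDIO, EXT_INPUT = 1, 2, 3
_HEADER = struct.Struct("!BI")  # 1-byte channel tag, 4-byte payload length


def multiplex(channels: dict[int, bytes]) -> bytes:
    """Pack each channel's payload into one data frame."""
    frame = bytearray()
    for tag, payload in channels.items():
        frame += _HEADER.pack(tag, len(payload))
        frame += payload
    return bytes(frame)


def demultiplex(frame: bytes) -> dict[int, bytes]:
    """Split a frame back into its per-channel payloads."""
    channels: dict[int, bytes] = {}
    offset = 0
    while offset < len(frame):
        tag, length = _HEADER.unpack_from(frame, offset)
        offset += _HEADER.size
        channels[tag] = frame[offset:offset + length]
        offset += length
    return channels


frame = multiplex({
    VIDEO: b"compressed-video",
    AUDIO: b"compressed-audio",
    EXT_INPUT: b"typed text",
})
parts = demultiplex(frame)
```

After demultiplexing, parts[VIDEO] and parts[AUDIO] would go to the respective encoder-decoders for restoration, mirroring the receive path in the text.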

The network interface 260 is an interface for transmitting and receiving the multiplexed or demultiplexed signal of the multiplexer/demultiplexer 250 to and from the outside, and is connected to a transmission medium such as a public switched telephone network (PSTN) or an integrated services digital network (ISDN).

The speech-to-text conversion unit 240 converts the voice signal demultiplexed by the multiplexer/demultiplexer 250 into text and applies it to the video interface 200, so that a display device (not shown) presents the text in a region separate from the image signal. The speech-to-text conversion unit 240 may also have a function of recognizing a foreign language and translating it into the user's native language. For example, when an American and a Korean hold a video conference, communication problems can arise even if neither participant is hearing-impaired. In that case, exchanging translated text (that is, English into Korean and Korean into English) resolves the communication problem.
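The conversion path described above (recognize the voice, optionally translate it, then show the text apart from the image) might be sketched as below. The source does not specify a recognition or translation method, so the recognizer and translator are passed in as plain callables; every name here, including the toy stand-in functions, is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Recognizer = Callable[[bytes], str]  # restored voice signal -> recognized text
Translator = Callable[[str], str]    # foreign-language text -> native-language text


@dataclass
class DisplayFrame:
    image: bytes   # restored image signal
    caption: str   # text shown in a region separate from the image


@dataclass
class SpeechToTextUnit:
    recognize: Recognizer
    translate: Optional[Translator] = None  # optional translation function

    def convert(self, voice: bytes) -> str:
        text = self.recognize(voice)
        if self.translate is not None:
            text = self.translate(text)
        return text


def compose(image: bytes, voice: bytes, unit: SpeechToTextUnit) -> DisplayFrame:
    """Pair the image with the converted text for display."""
    return DisplayFrame(image=image, caption=unit.convert(voice))


# Toy stand-ins so the sketch runs without a speech engine.
def fake_recognizer(voice: bytes) -> str:
    return voice.decode("utf-8")


def fake_translator(text: str) -> str:
    return {"hello": "안녕하세요"}.get(text, text)


unit = SpeechToTextUnit(fake_recognizer, fake_translator)
shown = compose(b"image-bytes", b"hello", unit)
```

Keeping the caption as a separate field, rather than drawing it into the image, reflects the statement that the text appears in a region separated from the image signal.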

The system controller 280 controls the multiplexing/demultiplexing of the multiplexer/demultiplexer 250 according to the data transmission and reception of the video conference. When data must be transmitted, the system controller 280 operates the multiplexer/demultiplexer 250 as a multiplexer to multiplex the image and voice signals; when data must be received, it operates the multiplexer/demultiplexer 250 as a demultiplexer to demultiplex the image and voice signals.
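The controller's role reduces to selecting which of two modes the multiplexer/demultiplexer runs in. A minimal sketch, assuming a simple boolean "sending" indication (the source does not say how the controller learns the transfer direction) and hypothetical class names:

```python
from enum import Enum


class Mode(Enum):
    MULTIPLEX = "multiplex"      # used when transmitting
    DEMULTIPLEX = "demultiplex"  # used when receiving


class MuxDemux:
    """Stand-in for multiplexer/demultiplexer 250; it only tracks its mode."""

    def __init__(self) -> None:
        self.mode = Mode.DEMULTIPLEX


class SystemController:
    """Stand-in for system controller 280."""

    def __init__(self, mux: MuxDemux) -> None:
        self.mux = mux

    def on_transfer(self, sending: bool) -> Mode:
        # Pick the operating mode from the direction of the data transfer.
        self.mux.mode = Mode.MULTIPLEX if sending else Mode.DEMULTIPLEX
        return self.mux.mode


mux = MuxDemux()
controller = SystemController(mux)
controller.on_transfer(sending=True)
```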

The present invention is not limited to the embodiment described above, and those skilled in the art may of course modify it within the spirit of the invention.

As described above, according to the present invention, the other party's voice is recognized during a video conference, converted into text, and displayed, so a hearing-impaired user can communicate with the other party.

Claims

A speech-to-text converter in a video conferencing system, comprising:

a video encoder-decoder for compressing an image signal input from the outside or restoring it to the original image signal;

an audio encoder-decoder for compressing a voice signal input from the outside or restoring it to the original voice signal;

a multiplexer/demultiplexer for multiplexing the image and voice signals compressed by the video encoder-decoder and the audio encoder-decoder, or demultiplexing the restored image and voice signals;

a speech-to-text conversion unit for converting the voice signal demultiplexed by the multiplexer/demultiplexer into a text signal; and

a network connection unit for transmitting and receiving the multiplexed or demultiplexed signal of the multiplexer/demultiplexer to and from the outside.

The speech-to-text converter of claim 1, wherein the multiplexer/demultiplexer

multiplexes or demultiplexes an external input signal, such as one from a keyboard or a touch panel, together with the image signal and the voice signal.

The speech-to-text converter of claim 1, wherein the speech-to-text conversion unit

translates a foreign language in the voice signal into a native language.