KR20080086793A

KR20080086793A - Audio data reproducing mobile device

Info

Publication number: KR20080086793A
Application number: KR1020070028924A
Authority: KR
Inventors: 남시욱
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2007-03-23
Filing date: 2007-03-23
Publication date: 2008-09-26

Abstract

An audio data reproducing mobile device automatically generates captions using a sound/text conversion part that converts an audio signal into text based on voice recognition. The device comprises a sound/text conversion part (50), a caption processing part, and a storage part (70). The sound/text conversion part converts an audio signal into text based on voice recognition; the caption processing part outputs the converted text as caption information for the corresponding audio signal; and the storage part stores the generated caption information. The audio signal is taken from one of video content, language-learning content, music content, and mobile digital broadcast content.

Description

Mobile device capable of audio playback {AUDIO DATA REPRODUCING MOBILE DEVICE}

FIG. 1 is a block diagram of a portable device according to an embodiment of the present invention.

FIG. 2 illustrates an example of audio data and subtitle data formats according to an embodiment of the present invention.

FIG. 3 is a flowchart of a method for generating and displaying subtitles based on speech recognition in a portable device according to an embodiment of the present invention.

The present invention relates to a portable device that applies speech recognition to an audio signal being reproduced and, based on the recognition result, automatically generates and displays subtitles for that audio signal.

Portable devices such as mobile phones, PDAs (Personal Digital Assistants), PMPs (Portable Multimedia Players), MP3 players, and mobile digital broadcast receivers offer a wide range of functions, individually or in combination: voice, video, and data communication, mobile digital broadcast reception, playback of audio data such as music and language-learning content, and video file playback.

For video content and language-learning content, techniques exist for displaying subtitles alongside video and/or audio playback. These techniques, however, work only when subtitle data is supplied in advance together with the audio data. Video content, language-learning content, music files, and the like that come without subtitle data are therefore played back without subtitles. Likewise, when a mobile digital broadcast is received, the receiving terminal can present the broadcast program only without subtitles unless the broadcast system itself provides them.

The present invention provides a portable device that automatically converts an audio signal into subtitles based on speech recognition and displays them.

The present invention also provides a portable device that automatically converts an audio signal into subtitles based on speech recognition and stores the resulting subtitle information together with the corresponding content, so that the user can later play that content with the automatically added subtitles.

The present invention also provides a portable device that, when playing video content that includes audio, playing audio content such as language-learning content or music files, or receiving a mobile digital broadcast, converts the audio signal into text based on speech recognition and thereby generates subtitles automatically.

The present invention further provides a portable device that, when playing video content that includes audio, playing audio content such as language-learning content or music files, or receiving a mobile digital broadcast, converts the audio signal into text based on speech recognition to generate subtitles automatically, and then either stores the generated subtitle information with the corresponding content for later use or displays it in synchronization with that content.

According to one aspect of the present invention, a portable device comprises: a voice/text converter that converts an audio signal into text based on speech recognition; and a caption processing unit that outputs the converted text information as caption information for the corresponding audio signal.

According to another aspect, a portable device comprises: a decoder for decoding an audio signal; an audio output unit that outputs the decoded audio signal; a voice recognition unit that receives the decoded audio signal and performs speech recognition on it; a voice/text converter that converts the recognized speech into text to generate caption information; a display unit that outputs the converted text as subtitles for the audio signal; a controller that synchronizes the output audio signal with the subtitles; and a storage unit for storing the generated caption information.

A portable device according to an embodiment of the present invention, together with the subtitle generation, subtitle storage, and subtitle/audio synchronization control performed in it, is described below with reference to the accompanying drawings.

FIG. 1 shows the configuration of a portable device capable of audio reproduction. Examples of such devices include mobile phones with music file playback, video players such as PMPs, digital data playback and processing devices such as PDAs, and audio players such as MP3 players. The invention also applies to many other kinds of portable devices that can reproduce audio data, for example a mobile digital broadcast receiving terminal, which receives broadcast programs and reproduces and outputs their audio and/or video.

Referring to FIG. 1, the portable device according to an embodiment of the present invention includes: a user interface unit (10) for operating the device; an audio decoder (20) for decoding an input audio signal (Ain); an audio output unit (30) for outputting the decoded audio signal; a voice recognition unit (40) that receives the decoded audio signal and performs speech recognition; a voice/text converter (50) that converts the recognized speech signal into the corresponding text to generate subtitle information; a display unit (60) that outputs the converted text information as subtitles for the corresponding audio signal; a storage unit (70) for storing the generated subtitle information; and a controller (80) that outputs the generated or stored subtitle information in synchronization with the corresponding audio signal.

The user interface unit (10) may be a keypad. When a command to play audio data and a command to generate subtitles for that audio data are entered through the user interface unit (10), the portable device plays the audio signal, generates the subtitle information, and displays the generated subtitles.

First, the audio decoder (20) decodes the input audio signal (Ain). The audio signal (Ain) can come from various sources: for example, from language-learning content, music files, or video files stored on a storage medium, or from the audio portion of received digital broadcast data. Because most audio data is supplied in compressed, encoded form, the input audio signal (Ain) must first be decoded by the audio decoder (20).

The audio signal decoded by the audio decoder (20) is passed to the audio output unit (30), which converts the decoded digital audio data into an analog audio signal and outputs it to an output device such as a speaker, earphones, or headphones.

The decoded audio signal is also passed to the voice recognition unit (40), which performs speech recognition on it. Various speech recognition techniques could be used in the voice recognition unit (40), but most extract acoustic feature information from the input audio signal and recognize the speech from the extracted features. Features extracted for speech recognition include, for example, the pitch and energy of the speech signal. These features are compared against the features stored in a database built in advance, and based on their similarity the recognizer decides which speech sound the signal represents and outputs the result.
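The feature-extraction-and-matching idea above can be illustrated with a deliberately tiny sketch. It assumes decoded audio arrives as a list of float samples, uses a crude autocorrelation pitch estimate plus mean energy as the two features the text mentions, and picks the nearest entry in a hypothetical in-memory "database" by Euclidean distance. This is a toy stand-in, not the recognizer the patent uses; a practical recognizer would use richer, normalized features and statistical models.

```python
import math
from typing import Dict, Sequence, Tuple

Feature = Tuple[float, float]  # (pitch in Hz, mean energy)

def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one analysis frame."""
    return sum(x * x for x in frame) / len(frame)

def frame_pitch(frame: Sequence[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Rough pitch estimate: the lag in [fmin, fmax] with the largest autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

def extract_features(frame: Sequence[float], sample_rate: int) -> Feature:
    """Feature vector for one frame, as in the text: pitch and energy."""
    return (frame_pitch(frame, sample_rate), frame_energy(frame))

def recognize(feature: Feature, database: Dict[str, Feature]) -> str:
    """Return the label of the stored feature closest to `feature` (Euclidean distance)."""
    return min(database, key=lambda label: math.dist(feature, database[label]))
```

Because pitch (hundreds of Hz) and energy (below 1.0) are on very different scales, pitch dominates the distance here; scaling the features is one of the first things a real implementation would add.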

The recognition result of the voice recognition unit (40) is passed to the voice/text converter (50). Various speech-to-text conversion techniques could be used in the voice/text converter (50); the conversion can be done simply by mapping the recognized speech to its corresponding text and generating the matching text codes.
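The mapping step can be sketched as a lookup from recognizer output labels to text, followed by encoding that text into code values for the display unit. The lexicon contents are hypothetical, and UTF-8 as the "text code" is an assumption (a device of this era might equally have used a legacy Korean encoding).

```python
from typing import Dict, List

# Hypothetical lexicon: recognizer labels -> display text.
LEXICON: Dict[str, str] = {"HELLO": "hello", "WORLD": "world"}

def labels_to_text(labels: List[str], lexicon: Dict[str, str] = LEXICON,
                   unknown: str = "?") -> str:
    """Map each recognized label to its text; labels missing from the lexicon become `unknown`."""
    return " ".join(lexicon.get(label, unknown) for label in labels)

def text_to_codes(text: str) -> bytes:
    """Encode the caption text as code values for the display unit (UTF-8 assumed)."""
    return text.encode("utf-8")
```

For example, `labels_to_text(["HELLO", "WORLD"])` gives `"hello world"`, and an unrecognized label is shown as `"?"` instead of being silently dropped.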

The text information produced by the voice/text converter (50) becomes the subtitle information for that audio signal and is passed to the display unit (60). Based on the received subtitle information (that is, the subtitle code values), the display unit (60) renders the corresponding character images on the display device.

At the same time, the controller (80) synchronizes the timing of the audio output with that of the subtitle output. It does so by using the output time information of the audio signal to control timing so that the subtitle information generated for a given point in time is output at that time.
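One way to read "output the subtitle generated for that time" is a lookup keyed on the audio playback clock: each caption carries the audio timestamp it was generated for, and at any playback time the controller shows the most recent caption whose timestamp has been reached. The `Caption` type and millisecond timestamps are assumptions for illustration; captions are assumed to be sorted by start time.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Caption:
    start_ms: int  # audio timestamp the caption was generated for (assumed unit: ms)
    text: str

def caption_at(captions: List[Caption], audio_time_ms: int) -> Optional[str]:
    """Return the latest caption whose start time is <= the current audio output time."""
    starts = [c.start_ms for c in captions]  # captions assumed sorted by start_ms
    i = bisect.bisect_right(starts, audio_time_ms)
    return captions[i - 1].text if i else None
```

Using `bisect_right` means a caption becomes visible exactly when playback reaches its start time, and nothing is shown before the first caption.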

The subtitle information generated by the voice/text converter (50) may also be stored in the storage unit (70). This allows subtitles generated through speech recognition to be added to video content, language-learning content, music files, and the like that lack subtitles. When the content is later played, the stored subtitle information is read along with it and output in synchronization with the video or audio signal, as described above.

Because the present invention generates subtitles automatically through speech recognition, a portable device that can play compressed audio files, such as an MP3 player or PMP, can display subtitles automatically and in real time when playing music or language-learning content. Likewise, when a user watches video content such as news, movies, or dramas on a mobile digital broadcast receiving terminal, that content can be displayed with subtitles.

Besides generating subtitles for the decoded audio signal in real time and displaying them in sync, the portable device can also be operated by driving the voice recognition unit (40) and the voice/text converter (50) in advance to generate and store subtitle information for specific content, and then reading and outputting that subtitle information when the content is later played.

FIG. 2 shows an example of audio data and subtitle data formats according to an embodiment of the present invention. The format of FIG. 2 is given only as an illustration to aid understanding and may be used when storing data in the storage unit (70); the data format according to the invention is not limited to it.

Referring to FIG. 2, the audio data can be divided into a header (110) and an audio data portion (120), and the subtitle data can be divided into a subtitle index (210) and a subtitle data portion (220). The audio data header (110) contains subtitle index information that points to the subtitle data linked to that audio data. When the audio data is played, the corresponding subtitle data can be read using the subtitle index information in the header.
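The header-to-subtitle link of FIG. 2 can be sketched with Python's `struct` module. The concrete byte layout here is entirely hypothetical (the patent does not specify one): the audio header holds a magic tag, the subtitle index, and the payload length; the subtitle record holds the same index, a caption count, and `(start_ms, UTF-8 text)` entries.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical layouts, little-endian, no padding.
AUDIO_HEADER = struct.Struct("<4sII")  # magic, subtitle index, audio payload length
SUB_HEADER = struct.Struct("<IH")      # subtitle index, caption count
SUB_ENTRY = struct.Struct("<IH")       # caption start (ms), text length in bytes

def pack_audio(subtitle_index: int, payload: bytes) -> bytes:
    return AUDIO_HEADER.pack(b"AUD0", subtitle_index, len(payload)) + payload

def unpack_audio(blob: bytes) -> Tuple[int, bytes]:
    magic, index, length = AUDIO_HEADER.unpack_from(blob, 0)
    if magic != b"AUD0":
        raise ValueError("not an audio record")
    return index, blob[AUDIO_HEADER.size:AUDIO_HEADER.size + length]

def pack_subtitles(index: int, captions: List[Tuple[int, str]]) -> bytes:
    parts = [SUB_HEADER.pack(index, len(captions))]
    for start_ms, text in captions:
        data = text.encode("utf-8")
        parts.append(SUB_ENTRY.pack(start_ms, len(data)) + data)
    return b"".join(parts)

def unpack_subtitles(blob: bytes) -> Tuple[int, List[Tuple[int, str]]]:
    index, count = SUB_HEADER.unpack_from(blob, 0)
    offset = SUB_HEADER.size
    captions = []
    for _ in range(count):
        start_ms, n = SUB_ENTRY.unpack_from(blob, offset)
        offset += SUB_ENTRY.size
        captions.append((start_ms, blob[offset:offset + n].decode("utf-8")))
        offset += n
    return index, captions

def load_subtitles_for(audio_blob: bytes, store: Dict[int, bytes]) -> List[Tuple[int, str]]:
    """Follow the subtitle index in the audio header to the linked subtitle record."""
    index, _ = unpack_audio(audio_blob)
    return unpack_subtitles(store[index])[1]
```

Keeping the subtitles in a separate record addressed by index, rather than inside the audio file, mirrors the figure and lets captions be regenerated without rewriting the audio data.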

FIG. 3 is a flowchart of a method for generating and displaying subtitles based on speech recognition in a portable device according to an embodiment of the present invention.

The first step (S10) is audio decoding by the audio decoder (20). The decoded audio signal is passed to both the audio output unit (30) and the voice recognition unit (40).

The second step (S20) performs speech recognition on the decoded audio signal. Recognition is performed by the voice recognition unit (40), and the result is passed to the voice/text converter (50).

The third step (S30) converts the recognized speech signal into the corresponding text: the voice/text converter (50) converts the speech signal into text that represents that speech, and this text becomes the subtitle information for the speech signal. The generated subtitle information is passed to the display unit (60) and, if a subtitle generation and storage command has been entered through the user interface unit (10), it is also linked to the audio file and stored in the storage unit (70) (S60).

The fourth step (S40) is synchronization control of speech and subtitles by the controller (80), that is, aligning the output timing of the audio signal sent to the audio output unit (30) with that of the subtitle information sent to the display unit (60). A simple way to synchronize them is to delay the audio output by a fixed amount of time that accounts for the time speech recognition takes. For more precise synchronization, time information (time stamps) attached to the audio output and the subtitles can be used to align their output times with each other.
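The simple fixed-delay method can be sketched in two parts: choosing a delay large enough to cover the worst observed recognition latency, and holding audio in a FIFO delay line for that many chunks before output. The `(audio_timestamp_ms, caption_ready_ms)` measurements, the chunk size, and the silence padding are illustrative assumptions, not values from the patent.

```python
import math
from collections import deque
from typing import Deque, Iterable, Tuple

def min_fixed_delay_ms(segments: Iterable[Tuple[int, int]]) -> int:
    """Smallest fixed audio delay covering the worst observed recognition latency.

    segments: (audio_timestamp_ms, caption_ready_ms) pairs.
    """
    return max(0, max((ready - ts for ts, ready in segments), default=0))

def delay_in_chunks(delay_ms: float, chunk_ms: float) -> int:
    """Round the delay up to a whole number of audio chunks."""
    return math.ceil(delay_ms / chunk_ms)

class DelayLine:
    """Fixed-length FIFO: each pushed chunk is released `delay_chunks` pushes later."""

    def __init__(self, delay_chunks: int, silence: bytes) -> None:
        # Pre-fill with silence so the first outputs are quiet while captions catch up.
        self._buf: Deque[bytes] = deque([silence] * delay_chunks)

    def push(self, chunk: bytes) -> bytes:
        self._buf.append(chunk)
        return self._buf.popleft()
```

For example, if captions were ready 180, 240, and 60 ms after their audio, a 240 ms delay covers all three; with 100 ms chunks that rounds up to a 3-chunk delay line. The trade-off is that every caption then waits for the slowest case, which is why the text points to time stamps for finer alignment.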

The fifth step (S50) outputs the speech and subtitles: the audio output unit (30) and the display unit (60) output the audio signal and the subtitles, respectively, in synchronization with each other.
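The flow of FIG. 3 can be summarized as a small pipeline in which each stage is a pluggable function. The `CaptionPipeline` class and its callables are hypothetical names; the decoder, recognizer, and converter are stubs supplied by the caller. The sketch only shows the order of S10 to S30, the conditional storage of S60, and the shared timestamp the controller would use for S40/S50.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class CaptionPipeline:
    decode: Callable[[bytes], List[float]]          # S10: audio decoder (20)
    recognize: Callable[[List[float]], List[str]]   # S20: voice recognition unit (40)
    to_text: Callable[[List[str]], str]             # S30: voice/text converter (50)
    store_captions: bool = False                    # storage command from the user interface (10)
    storage: List[Tuple[int, str]] = field(default_factory=list)  # storage unit (70)

    def process(self, timestamp_ms: int, encoded: bytes) -> Tuple[int, List[float], str]:
        pcm = self.decode(encoded)                   # S10: decode once, feed both paths
        labels = self.recognize(pcm)                 # S20: recognize speech
        caption = self.to_text(labels)               # S30: convert to caption text
        if self.store_captions:
            self.storage.append((timestamp_ms, caption))  # S60: store linked to the audio
        # S40/S50: the same timestamp tags both outputs so the controller (80)
        # can present the audio and the caption together.
        return timestamp_ms, pcm, caption
```

With stub stages, `CaptionPipeline(decode=lambda b: [float(x) for x in b], recognize=lambda pcm: ["HELLO"], to_text=lambda ls: " ".join(s.lower() for s in ls)).process(0, b"\x01\x02")` returns `(0, [1.0, 2.0], "hello")`.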

The description so far has covered generating and displaying subtitle information on a portable device using speech recognition built into the device. If the device cannot perform speech recognition and speech-to-text conversion in real time, however, it can instead generate and store the subtitles in advance, and then read and display the stored subtitle information when the corresponding content is played.

Alternatively, a speech recognition engine can be installed on a computer system other than the portable device, such as a PC (Personal Computer). That engine performs speech recognition and speech-to-text conversion to generate subtitles, and the generated subtitles and the corresponding content are then downloaded to the portable device, stored, and used there.

The portable device according to the present invention automatically generates subtitles for audio based on speech recognition, and either stores them or outputs the generated or stored subtitles in synchronization with the corresponding content.

Because the portable device according to the present invention generates subtitles for audio content by itself, the user can view subtitles for language-learning content or a digital broadcast program in real time while playing that content or watching that broadcast.

Claims

1. A mobile device comprising: a voice/text converter for converting an audio signal into text based on speech recognition; and a caption processing unit for outputting the converted text information as caption information for the corresponding audio signal.

2. The mobile device of claim 1, further comprising a storage unit for storing the generated caption information.

3. The mobile device of claim 1, wherein the audio signal is selected from at least one of video content, language-learning content, music content, and mobile digital broadcast content.

4. A mobile device comprising: a decoder for decoding an audio signal; an audio output unit configured to output the decoded audio signal; a voice recognition unit configured to receive the decoded audio signal and perform voice recognition; a voice/text converter configured to convert the recognized voice into text to generate caption information; a display unit for outputting the converted text as captions for the audio signal; a controller for synchronizing the output audio signal with the captions; and a storage unit for storing the generated caption information.

5. The mobile device of claim 4, wherein the audio signal is selected from at least one of video content, language-learning content, music content, and mobile digital broadcast content.