KR20200024484A

KR20200024484A - Speech to text recording apparatus and speech-text converting system using the same

Info

Publication number: KR20200024484A
Application number: KR1020180101216A
Authority: KR
Inventors: 이지혜
Original assignee: 주식회사 나무엔
Priority date: 2018-08-28
Filing date: 2018-08-28
Publication date: 2020-03-09

Abstract

The present invention relates to an STT (speech-to-text) recording device that records voice using a small device, converts the voice into text, and shares the voice and text information with various terminals, and to a voice-to-text conversion system using the same. A small dedicated recording device records and stores voice at optimal sound quality, and receives from an external server, and stores, both the text information and interworking information that includes synchronization information between the voice and the text. Utilization can be maximized by transmitting the voice file's related information, including the interworking information and the text file, together to another terminal.

Description

Speech to text recording apparatus and speech-text converting system using the same

The present invention relates to an STT (speech-to-text) recording apparatus and a voice-to-text conversion system using the same, and more particularly to an STT recording apparatus that records voice using a small device, converts it into text, and allows the voice and text information to be shared with various terminals, and to a voice-to-text conversion system using the same.

As the information society, in which knowledge and information are important, has matured, various methods of information exchange and learning have come into everyday use. In particular, with the rapid development of portable devices and communication networks, information exchange and learning methods that use multimedia content remotely are also becoming familiar.

However, these methods still cannot compare with directly meeting the person with whom information is actually exchanged, as in face-to-face meetings, consultations, and lectures.

In face-to-face information exchange or learning, the desired information can be obtained through attentive participation, and memory can be refreshed through memos or note-taking. Recording the voice in the face-to-face situation is also actively used in order to capture the information faithfully and realistically.

Various dedicated voice recording devices exist for such recording, and because the smartphones and portable terminals that most people own have a voice recording function, voice in a face-to-face situation can be recorded whenever needed.

Dedicated voice recording devices can record voice effectively with optimized sound quality and compression ratio, and store and manage it as files. As these recordings accumulate, however, the user ends up with numerous voice files whose contents cannot be grasped quickly, so the files are difficult to manage and identifying their contents takes a long time. In particular, because such devices have a limited interface, it is not easy to select the voice file that contains the desired content or to find and listen to the part that contains it.

Meanwhile, the voice recording function (application) of a smartphone or portable terminal with a convenient user interface can record voice when needed, because the device is always carried. However, because the basic function of such a device is communication, including voice calls, an incoming call or a contact through various applications may arrive while voice is being recorded, which makes it difficult to use as a dedicated recorder. Because the device is not optimized for voice recording, sound quality is limited. In addition, storage space, which is always scarce owing to the device's many uses, is occupied by recordings, so there is the inconvenience of having to delete recorded voice or move it elsewhere.

That is, a dedicated voice recording device can store a large amount of voice information and play it back when needed or transfer it to another terminal for use, but it is limited in that so much voice information is difficult to manage or search in connection with its contents. When another terminal that has recording only as an auxiliary function is used, the recording quality is poor, the space occupied by recordings makes it difficult to accumulate, keep, and use them, and here too it is difficult to check and manage the contents or to search for desired content.

Therefore, there is a need for a new dedicated recording device, and a voice-to-text conversion system using the same, that can carry a vast accumulation of voice recorded at optimal sound quality while allowing the contents to be searched easily so that a desired voice file can be played, and that increases utilization by automatically converting voice data, whose usability is limited, into highly usable text.

Meanwhile, with the development of STT (speech-to-text) technology, the function of converting voice into text is being used in various ways, and converting recorded voice into text can be done without difficulty. However, even a human listener frequently cannot make out the content of speech accurately because of pronunciation or the surrounding environment, and some utterances sound similar but correspond to different written words, so STT results cannot be fully trusted. Consequently, merely converting voice into text can supply unreliable information and actually lower the reliability of the data.

In the end, it is not enough to record and manage voice at optimal sound quality and convert it into text to provide additional information; reliable information must also be provided in various ways. A technology that satisfies all of these requirements has not yet appeared.

Korean Patent Laid-Open Publication No. 10-2017-0046958, [Electronic device and method for executing a function using speech recognition thereof]
Korean Patent Laid-Open Publication No. 10-2014-0006503, [Method and apparatus for recording and reproducing user voice in a portable terminal]

An object of embodiments of the present invention, for remedying the problems described above, is to provide an STT recording apparatus and a voice-to-text conversion system using the same, in which a small dedicated recording device that is easy to carry and can easily be pointed at a sound source records and stores voice at optimal sound quality, transmits it to a server on a communication network, and then receives the STT result and stores it matched to the recorded voice, together with interworking information that includes synchronization information between the voice and the text. A voice file containing desired content can thereby be easily searched on the basis of the text and played from the identified position, and the voice file's related information, including the interworking information and the text file, can be delivered together to another terminal.

Another object of embodiments of the present invention is to provide an STT recording apparatus and a voice-to-text conversion system using the same, in which the server that receives recorded voice from the recording device generates the STT result together with interworking information that includes synchronization information between the voice and the text and reliability information on the converted text portions, and delivers it to the recording device. When the recording device, or another terminal provided with the related information, uses the text file corresponding to a voice file, the interworking information makes it easy to play back the voice for portions of low reliability.

Still another object of embodiments of the present invention is to provide an STT recording apparatus and a voice-to-text conversion system using the same, in which, when the storage capacity of the recording device is insufficient or the user so requests, a voice file to be deleted is selected, a new voice file containing only the voice regions of low reliability is generated on the basis of the interworking information corresponding to that voice file, and the original voice file is deleted, thereby reducing the storage used for that voice file. The text file corresponding to the voice file is then played back through a TTS (text-to-speech) function while the separately kept voice file is used for the portions of low reliability, so that both capacity and reliability requirements are satisfied.

To achieve the above objects, an STT recording apparatus according to an embodiment of the present invention includes: a voice input unit that collects a voice signal through a microphone and converts it into a digital signal; a wireless communication unit that communicates with an external server; a wired communication unit that receives charging power through a connector and exchanges information with an external terminal; a user interface unit that receives user input and notifies the user of status; a storage unit that stores voice information based on the digital signal converted by the voice input unit; a voice-text management unit that provides the voice information stored in the storage unit to the external server through the wireless communication unit, and receives from the external server, and stores in the storage unit, text information corresponding to the voice information and interworking information including synchronization information between the voice information and the text information; and a control unit that receives user input through the user interface unit and controls it to display the corresponding status, controls the wireless communication unit and the storage unit at the request of the voice-text management unit, and, through the wired communication unit, provides information stored in the storage unit to the external terminal or stores information received from the external terminal in the storage unit.

As an example, the interworking information may include, for the text converted from the voice information, synchronization information on a playback-time reference position or a data reference position, and reliability information on the converted text.

As an example, the interworking information may be provided separately from the text information or mixed into the text information in the form of tags.
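The two delivery forms described above can be illustrated with a minimal Python sketch. The patent does not specify a data format, so the `Segment` fields, the `to_separate` and `to_tagged` helpers, and the `<seg ...>` tag syntax are all hypothetical names chosen only for illustration; the sketch also uses playback-time positions only, although a data (byte) position would work the same way.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    """One converted text span with its interworking information (hypothetical layout)."""
    text: str          # converted text for this span
    start_ms: int      # playback-time reference position (start)
    end_ms: int        # playback-time reference position (end)
    confidence: float  # reliability of the conversion, 0.0 to 1.0


def to_separate(segments: List[Segment]) -> Tuple[str, List[Dict]]:
    """Form 1: plain text plus interworking information kept apart from it."""
    plain = " ".join(s.text for s in segments)
    link = [
        {"start_ms": s.start_ms, "end_ms": s.end_ms, "confidence": s.confidence}
        for s in segments
    ]
    return plain, link


def to_tagged(segments: List[Segment]) -> str:
    """Form 2: interworking information mixed into the text as inline tags."""
    return " ".join(
        f"<seg t={s.start_ms}-{s.end_ms} c={s.confidence:.2f}>{s.text}</seg>"
        for s in segments
    )


segments = [Segment("hello", 0, 500, 0.95), Segment("world", 500, 1100, 0.4)]
plain_text, link_info = to_separate(segments)
tagged_text = to_tagged(segments)
```

Keeping the information separate leaves the text file readable by any tool, while the tagged form keeps text and positions in a single file; which is preferable depends on the receiving terminal.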

As an example, the control unit may select voice information to be deleted, either in response to user input through the user interface unit or when the storage unit runs short of storage space according to a preset criterion, separate the voice information regions that correspond to text regions whose reliability is at or below a reference in the interworking information corresponding to that voice information, generate separate supplementary voice information from them, and then delete the original voice information.

As an example, the control unit may select voice information to be deleted, either in response to user input through the user interface unit or when the storage unit runs short of storage space according to a preset criterion, and provide the selected voice information to the external server through the wireless communication unit. The control unit may then receive supplementary voice information, generated separately by isolating the voice information regions that correspond to text regions whose reliability is at or below a reference in the corresponding interworking information, and replace the voice information with the supplementary voice information.

As an example, when a voice information region corresponding to a text region whose reliability is at or below the reference is separated, the region may be extended to include a preset number of words or sentences before and after it.
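The region selection and extension described above could be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the word tuple layout, the function name `supplementary_ranges`, the threshold of 0.6, and the default context of one word are all hypothetical. The function returns the time ranges that would be kept as supplementary voice information; everything outside them could be discarded.

```python
from typing import List, Tuple

# (text, start_ms, end_ms, confidence): hypothetical per-word interworking record
Word = Tuple[str, int, int, float]


def supplementary_ranges(
    words: List[Word], threshold: float = 0.6, context: int = 1
) -> List[Tuple[int, int]]:
    """Return merged (start_ms, end_ms) ranges of audio to keep.

    A word is selected when its confidence is at or below `threshold`.
    Each selected region is extended by `context` words on both sides,
    and ranges that touch or overlap are merged into one.
    """
    ranges: List[Tuple[int, int]] = []
    for i, (_, _, _, conf) in enumerate(words):
        if conf > threshold:
            continue
        lo = max(0, i - context)               # extend backwards
        hi = min(len(words) - 1, i + context)  # extend forwards
        start, end = words[lo][1], words[hi][2]
        if ranges and start <= ranges[-1][1]:
            # Overlaps or touches the previous range: merge them.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


words = [
    ("a", 0, 100, 0.9), ("b", 100, 200, 0.3), ("c", 200, 300, 0.9),
    ("d", 300, 400, 0.9), ("e", 400, 500, 0.9), ("f", 500, 600, 0.2),
]
keep = supplementary_ranges(words)  # [(0, 300), (400, 600)]
```

Extending by whole words rather than by a fixed number of milliseconds keeps the retained audio aligned with word boundaries, so a listener hears the uncertain word in its spoken context.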

As an example, the apparatus may further include a playback unit that plays voice information selected and provided from the storage unit by the control unit according to input from the user interface unit. The playback unit may play the voice information itself, or provide it by a TTS (text-to-speech) method using the text corresponding to the voice information. When the TTS method is used and a low-reliability converted portion is detected through the interworking information corresponding to the voice information, the playback unit may play the corresponding voice information region instead of the text.
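The mixed playback described above can be sketched as a playback plan that alternates between synthesized text and retained audio. The segment layout, the function name `playback_plan`, the `"tts"` and `"audio"` labels, and the 0.6 threshold are assumptions made for illustration; actual speech synthesis and audio output are outside the scope of the sketch.

```python
from typing import List, Tuple

# (text, start_ms, end_ms, confidence): hypothetical segment record
Segment = Tuple[str, int, int, float]


def playback_plan(segments: List[Segment], threshold: float = 0.6) -> List[tuple]:
    """Build an ordered list of playback steps.

    Segments with confidence above `threshold` are read out by TTS;
    consecutive TTS segments are joined into one step. Segments at or
    below the threshold are played from the recorded audio instead.
    """
    plan: List[tuple] = []
    for text, start_ms, end_ms, conf in segments:
        if conf <= threshold:
            # Low reliability: fall back to the stored voice region.
            plan.append(("audio", start_ms, end_ms))
        elif plan and plan[-1][0] == "tts":
            # Extend the previous TTS step rather than starting a new one.
            plan[-1] = ("tts", plan[-1][1] + " " + text)
        else:
            plan.append(("tts", text))
    return plan


segments = [
    ("good", 0, 400, 0.9),
    ("morning", 400, 900, 0.8),
    ("mumble", 900, 1300, 0.2),
    ("everyone", 1300, 1900, 0.95),
]
plan = playback_plan(segments)
# [("tts", "good morning"), ("audio", 900, 1300), ("tts", "everyone")]
```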

As an example, upon a search request containing a search term received through the user interface unit, the wireless communication unit, or the wired communication unit, the control unit may search and select the voice information stored in the storage unit on the basis of the contents of the corresponding text.

A voice-to-text conversion system using an STT recording apparatus according to another embodiment of the present invention includes: an STT recording apparatus that stores collected voice information, transmits the voice information to a service server, stores in a storage unit, matched to the voice information, the corresponding text information and the interworking information (including synchronization information and reliability information) received in return, and provides the voice signal, the corresponding text information, and the interworking information to a connected external terminal; and a service server that converts the voice information received from the STT recording apparatus into text while generating interworking information that includes synchronization information on a playback-time reference position or a data reference position and reliability information on the converted text, and transmits the text information and the interworking information to the STT recording apparatus.

As an example, in response to user input or when the storage unit runs short of storage space according to a preset criterion, the STT recording apparatus may select voice information to be deleted and replace it in storage with supplementary voice information generated separately by isolating the voice information regions that correspond to text regions whose reliability is at or below a reference in the interworking information. The supplementary voice information may be generated by the STT recording apparatus itself, or the service server may generate it for the voice information selected by the STT recording apparatus and provide it to the STT recording apparatus.

The STT recording apparatus and the voice-to-text conversion system using the same according to embodiments of the present invention record and store voice at optimal sound quality through a small dedicated recording device, and receive from an external server, and store, the text information together with interworking information that includes synchronization information between the voice and the text. The voice file's related information, including the interworking information and the text file, can thus be delivered together to another terminal, which maximizes utilization.

In the STT recording apparatus and the voice-to-text conversion system using the same according to embodiments of the present invention, the server also generates interworking information that includes synchronization information between the voice and the text and reliability information on the converted text portions, and delivers it to the recording device. When the recording device, or another terminal provided with the related information, uses the text file corresponding to a voice file, the interworking information makes it easy to play back the voice for portions of low reliability, which improves reliability.

In the STT recording apparatus and the voice-to-text conversion system using the same according to embodiments of the present invention, when the storage capacity of the recording device is insufficient or the user so directs, a voice file to be deleted is selected, a new voice file containing only the voice regions of low reliability is generated on the basis of the interworking information corresponding to that voice file, and the original voice file is deleted, which reduces the storage used for that voice file. The text file corresponding to the voice file is then played back through a TTS (text-to-speech) function while the separately kept voice file is used for the portions of low reliability, so that both capacity and reliability requirements are satisfied.

FIG. 1 is a block diagram showing a voice-to-text conversion system using an STT recording apparatus according to an embodiment of the present invention.
FIG. 2 shows an example of an STT recording apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram of an STT recording apparatus according to an embodiment of the present invention.
FIG. 4 is a block diagram of a service server according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram illustrating the characteristic operation of the voice-to-text conversion system according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of an STT recording apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of a service server according to an embodiment of the present invention.
FIG. 8 is a conceptual diagram illustrating a capacity reduction method according to an embodiment of the present invention.
FIG. 9 is a flowchart showing the operation of the capacity reduction method according to an embodiment of the present invention.

The present invention described above will now be explained in detail with reference to the accompanying drawings and embodiments.

It should be noted that the technical terms used in the present invention are used only to describe particular embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in the present invention, the technical terms used herein should be interpreted as having the meanings generally understood by a person of ordinary skill in the art to which the present invention belongs, and should not be interpreted in an excessively broad or excessively narrow sense. Where a technical term used in the present invention is an incorrect term that fails to express the spirit of the present invention accurately, it should be replaced by, and understood as, a technical term that a person skilled in the art can understand correctly. General terms used in the present invention should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an excessively narrow sense.

Singular expressions used in the present invention include the plural unless the context clearly indicates otherwise. In the present invention, terms such as "consists of" or "includes" should not be interpreted as necessarily including all of the various components or steps described; some of those components or steps may be absent, and additional components or steps may be included.

Terms that include ordinal numbers, such as first and second, may be used in the present invention to describe components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant description of them is omitted.

In describing the present invention, detailed description of related known technology is omitted where it is judged that it could obscure the gist of the present invention. It should also be noted that the accompanying drawings are provided only to make the spirit of the present invention easy to understand, and the spirit of the present invention should not be interpreted as limited by them.

In particular, the other terminals described in the present invention include various terminals equipped with a wired communication function, such as smartphones, portable terminals, mobile terminals, personal digital assistants (PDAs), portable multimedia player (PMP) terminals, personal computers, notebook computers, and tablet PCs. The service server includes a physical or virtual server, including a web server that provides a web page interface and a web application server that performs set functions through an application.

Hereinafter, embodiments of the present invention are described with reference to FIGS. 1 to 9.

FIG. 1 is a block diagram showing a voice-to-text conversion system using an STT recording apparatus according to an embodiment of the present invention. As shown, the system includes an STT recording apparatus (100), which is small and portable, stores the voice of a speaker (1) at optimal sound quality, transmits the voice information to a service server (200), and stores the corresponding text information that it receives matched to the voice information; and the service server (200), which is connected to a communication network (20), works with the STT recording apparatus (100) to perform text conversion on the recorded voice information, and provides the result, including the converted text, to the STT recording apparatus (100).

The STT recording apparatus (100) may be connected by wire to other terminals (31, 32) and transmit the stored voice information and the corresponding text information to them. The other terminals (31, 32) may use the received voice information and text information together or separately, and may also receive the related information after connecting to the service server (200) through member authentication.

That is, the service server (200) may store the received voice information and the corresponding converted text information in association with a member, and then provide that information to other terminals (31, 32) that connect through member authentication. To this end, the service server (200) may include a member management unit (230) that handles member registration and member authentication, and may hold service-related information for each member, including the received voice information and the converted text information, and provide it to the authenticated member.

Meanwhile, in embodiments of the present invention, the STT recording apparatus (100) and the service server (200) do not merely deliver voice information and return the result of converting it into text with known speech recognition technology; they may additionally generate and use the following additional information.

First, while performing text conversion on the voice information through its internal STT conversion unit 240, the service server 200 may generate synchronization information that relates the converted text to a playback-time-based position or a data-based position in the voice information, and provide it to the STT recording apparatus 100 as linked information.

Based on the text information and the linked information, the STT recording apparatus 100 can readily identify which voice information contains an input keyword and where in it the keyword occurs. It can therefore select and present, from among a large number of recordings, the voice information containing the desired content, and start playback directly at the position where the keyword occurs.
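The keyword lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Word` record, its field names, and the `find_keyword` helper are hypothetical, and per-word start times in seconds are assumed as the form of the synchronization information.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str          # recognized word
    start: float       # playback position in seconds (assumed sync format)
    confidence: float  # recognition reliability, 0.0 to 1.0 (assumed scale)

def find_keyword(recordings, keyword):
    """Return (recording name, start time) for every word equal to the keyword."""
    hits = []
    for name, words in recordings.items():
        for w in words:
            if w.text.lower() == keyword.lower():
                hits.append((name, w.start))
    return hits

# Hypothetical stored recordings with their linked information.
recordings = {
    "lecture_01": [Word("neural", 12.4, 0.93), Word("network", 12.9, 0.95)],
    "meeting_02": [Word("budget", 3.1, 0.88), Word("network", 47.0, 0.61)],
}
```

Each hit gives the recording to open and the offset at which playback can begin.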

The linked information may be provided separately from the text information or mixed into the text information in the form of tags. It allows the corresponding text to be displayed like subtitles while the voice information is played, or to be displayed in step with the voice information being played, like song lyrics. Because the other terminals 31 and 32 that receive the information can do this with an ordinary multimedia playback application, it is highly convenient.
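As an illustration of linked information mixed into the text as tags, the sketch below prefixes each text segment with a lyrics-style time tag of the form `[mm:ss.cc]`. The tag layout and the function names are assumptions chosen for the example; the source does not specify a tag format.

```python
def to_time_tag(seconds):
    """Format a playback position as a [mm:ss.cc] tag (assumed layout)."""
    total_cs = round(seconds * 100)            # work in whole centiseconds
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"

def tag_lines(segments):
    """segments: list of (start_seconds, text). Returns one tagged line per segment."""
    return "\n".join(f"{to_time_tag(start)}{text}" for start, text in segments)
```

A player that understands such tags can highlight each line as playback reaches its time stamp.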

As further additional information, while performing text conversion on the voice information through the STT conversion unit 240, the service server 200 may generate reliability information for each converted text portion and provide it as linked information. The reliability information may be set according to the recognition rate among the results obtained during conversion, and its structure may vary.

For example, in the same manner as the synchronization information, information expressing the reliability of each recognized word as a numerical value may be generated for the converted text; several tiers of criteria may be defined and the text rendered differently according to them (bold, italic, underline, size, and so on); or the reliability may be indicated as tag information.
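The tiered rendering can be sketched as below. The two thresholds (0.5 and 0.8) and the choice of underline for the lowest tier and italic for the middle tier are assumptions made for illustration; the source leaves the tiers and styles open.

```python
def style_word(text, confidence, low=0.5, high=0.8):
    """Wrap a word in markup according to its reliability tier (assumed tiers)."""
    if confidence < low:
        return f"<u>{text}</u>"   # lowest tier: underline
    if confidence < high:
        return f"<i>{text}</i>"   # middle tier: italic
    return text                   # highest tier: plain text

def render(words):
    """words: list of (text, confidence). Returns the styled sentence."""
    return " ".join(style_word(text, conf) for text, conf in words)
```

With these assumed tiers, `render([("the", 0.95), ("quarterly", 0.7), ("revenu", 0.3)])` yields `the <i>quarterly</i> <u>revenu</u>`, which draws the reader's eye to the words most likely to need correction.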

Because this reliability information is generated together with the synchronization information, portions with low reliability can easily be picked out and the corresponding voice information quickly selected and played. When the text output is to be used, the user can thus quickly check the low-reliability portions and correct any errors.

Meanwhile, the reliability information and the corresponding voice information serve as an important means of raising the reliability of the converted text. In addition, when storage space runs short or files need to be tidied up, there is no need to keep all of the voice information, which is long in playback time and large in size. Instead, only the voice portions whose reliability is so low that correct conversion cannot be confirmed from the text alone are separated out to generate supplementary voice information, which then replaces the original voice information. This reduces the size while maintaining reliability.

In this case, the voice information positions referenced in the reliability or synchronization information may be changed to refer to the supplementary voice information. Alternatively, the supplementary voice information may be given the same playback length as the original voice information, with the voice data deleted in the unselected (high-reliability) regions so that the compression ratio increases; the playback time then stays the same while the storage size is significantly reduced.

Hereinafter, the operation of the present invention is described through a more detailed configuration of these embodiments.

FIG. 2 shows an example of an STT recording apparatus according to an embodiment of the present invention, and FIG. 3 is a block diagram of the STT recording apparatus according to an embodiment of the present invention.

As shown in FIG. 2, the STT recording apparatus 100 is small enough to be carried easily and is equipped with a directional microphone 121. It also has an input interface 131 for receiving user input, a storage interface 181 into which a removable storage unit can be inserted, a connector 141 for connecting to the wired communication unit, a connector 191 for connecting a speaker or earphones/headphones for playback, and an output interface 132 for indicating status.

Although not shown, a display unit capable of showing characters or graphics may additionally be provided as the output interface 132. In addition to the illustrated wired communication connector 141, the apparatus may have a connection terminal (for example, any of the various types of USB terminal) that plugs directly into the wired communication connector of another terminal, like a memory stick.

FIG. 3 is a block diagram showing the internal configuration of the STT recording apparatus 100. As shown, the apparatus includes: a voice input unit 120 that collects a voice signal through a microphone and converts it into a digital signal; a wireless communication unit 110 that communicates with the service server, an external server that performs the STT function; a wired communication unit 140 that receives charging power through a connector and exchanges information with an external terminal; a user interface unit 130 that receives user input and informs the user of the status; a storage unit 180 in which voice information based on the digital signal converted by the voice input unit 120 is stored; a voice-text management unit 150 that provides the voice information stored in the storage unit 180 to the service server through the wireless communication unit 110, and receives from the service server the text information corresponding to that voice information together with linked information including synchronization information between the voice information and the text information, storing both in the storage unit 180; a playback unit 190 that plays the voice information stored in the storage unit 180; a power management unit 170 that charges a battery using charging power supplied through the wired communication unit 140 or a separate external power source and manages the charged power; and a control unit 160. The control unit 160 converts the voice information digitized by the voice input unit 120 into a file of a predetermined format and stores it in the storage unit 180, controls the user interface unit 130 to receive user input and display the resulting status, controls the wireless communication unit 110 and the storage unit 180 at the request of the voice-text management unit 150, and, through the wired communication unit 140, provides information stored in the storage unit 180 to an external terminal or stores information received from an external terminal in the storage unit 180.

Of course, the control unit 160 may also perform basic functions such as checking the power state through the power management unit 170 and notifying the user through the user interface unit 130, or selecting voice information or text information stored in the storage unit 180 according to input from the user interface unit 130, playing it through the playback unit 190, and displaying the selection or playback process through the user interface unit 130.

The voice input unit 120 may include a built-in microphone, an amplifier, a filter, an analog-to-digital converter, and the like, and may optionally use a wireless microphone linked via short-range wireless communication, which may be provided in the wireless communication unit 110.

The user interface unit 130 may be implemented with at least one of various input means, including a plurality of input switches, touch switches, a jog shuttle, a slide switch, a rotary switch, and a volume control, and at least one of various output means, including a speaker or a display such as a light-emitting diode (LED), liquid crystal display (LCD), thin film transistor liquid crystal display (TFT LCD), organic light-emitting diode (OLED), flexible display, three-dimensional (3D) display, or electronic ink (e-ink) display.

The wireless communication unit 110 may support various broadband wireless communication methods for exchanging information with the service server on the communication network, short-range wireless communication methods for Internet connection, and short-range wireless communication methods for connecting to a wireless microphone or the like. Examples include wireless LAN (WLAN), Wireless Broadband (WiBro), Worldwide Interoperability for Microwave Access (WiMAX), High Speed Downlink Packet Access (HSDPA), IEEE 802.16, Long Term Evolution (LTE), Wireless Mobile Broadband Service (WMBS), and Long Range (LoRa), a long-range IoT communication method. Short-range communication technologies may include Bluetooth, Wi-Fi, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, Near Field Communication (NFC), Ultra Sound Communication (USC), Visible Light Communication (VLC), and Bluetooth Low Energy (BLE).

The wired communication unit 140 may use various communication methods such as power line communication (PLC), USB, Ethernet, IEEE 1394, and serial communication, but USB is preferable because it is widely compatible and can also deliver power.

The voice-text management unit 150 may, for example, provide the voice information stored in the storage unit 180 to the service server through the wireless communication unit 110 once a Wi-Fi connection becomes available. It receives from the service server the text information corresponding to that voice information and the linked information including synchronization information between the voice information and the text information, and stores them in the storage unit 180. The linked information may further include reliability information in addition to the synchronization information.

In response to user input through the user interface unit 130, or when the storage unit 180 runs short of space according to a preset criterion, the control unit 160 may select voice information to be deleted and replace it with separately generated supplementary voice information, obtained by separating out the voice information regions that correspond to text regions whose reliability in the linked information is at or below a threshold. That is, rather than simply deleting large voice information, storage space is saved by keeping in its place supplementary voice information containing only the voice portions with low text conversion reliability.

The supplementary voice information may be generated directly by the control unit 160 or obtained from the service server.

For example, the control unit 160 may select the voice information to be deleted, separate out the voice information regions corresponding to text regions whose reliability in the associated linked information is at or below the threshold to generate separate supplementary voice information, and then delete the original voice information and store the generated supplementary voice information in the storage unit 180.

As another example, the control unit 160 may provide the voice information to be deleted to the service server through the wireless communication unit 110, receive supplementary voice information that the service server generates by separating out the voice information regions corresponding to text regions whose reliability in the associated linked information is at or below the threshold, and then replace the voice information with the supplementary voice information.

In either case, because the original voice information has been replaced with supplementary voice information, the linked information containing the synchronization information may be modified to match the supplementary voice information. Alternatively, the supplementary voice information may be given the same playback length as the original voice information, with actual data present only in the low-reliability voice regions and no data elsewhere, so that the voice information shrinks while the linked information is kept unchanged.
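The two options described here can be sketched side by side. Both functions are hypothetical illustrations: `remap_time` assumes the supplementary audio is the retained spans concatenated in order, and `silence_outside` assumes audio is held as a plain list of samples at a known sample rate, with zeros standing in for "no data".

```python
def remap_time(spans, t):
    """Option 1: map an original playback time t (seconds) to its position in
    supplementary audio built by concatenating the retained (start, end) spans.
    Returns None when t falls in a discarded region."""
    offset = 0.0
    for start, end in spans:
        if start <= t < end:
            return offset + (t - start)
        offset += end - start
    return None

def silence_outside(samples, spans, rate):
    """Option 2: keep the original length but zero every sample outside the
    retained spans, so existing synchronization information stays valid."""
    out = [0] * len(samples)
    for start, end in spans:
        a, b = int(start * rate), int(end * rate)
        out[a:b] = samples[a:b]
    return out
```

Option 1 gives the smallest file but requires rewriting every stored time reference; option 2 leaves the linked information untouched and relies on the encoder to compress the silent stretches efficiently.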

In addition, if only the low-reliability voice region itself is separated out, a person listening to it is quite likely to be unable to make it out accurately. The voice region is therefore extended to include a preset number of words or sentences before and after the corresponding region, so that the low-reliability portion can be confirmed or inferred from the surrounding context.
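A minimal sketch of this context extension follows, assuming word-level linked information of the form (text, start, end, confidence). The threshold of 0.6 and the default of two context words are assumed values, and spans that overlap after extension are merged.

```python
def low_confidence_spans(words, threshold=0.6, context=2):
    """words: list of (text, start, end, confidence) in playback order.
    Returns merged (start, end) spans covering each low-confidence word
    plus `context` words on either side."""
    spans = []
    for i, (_, _, _, conf) in enumerate(words):
        if conf > threshold:
            continue                       # reliable word, nothing to keep
        first = max(0, i - context)
        last = min(len(words) - 1, i + context)
        start, end = words[first][1], words[last][2]
        if spans and start <= spans[-1][1]:
            # Overlaps or touches the previous span: extend it instead.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans

# Ten one-second words; words 4, 5 and 9 were recognized poorly.
words = [(f"w{i}", float(i), float(i + 1), 0.9) for i in range(10)]
for i, conf in ((4, 0.3), (5, 0.4), (9, 0.2)):
    words[i] = (words[i][0], words[i][1], words[i][2], conf)
```

The resulting spans are the regions that would be cut from the original recording to form the supplementary voice information.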

Meanwhile, under the control of the control unit 160, the playback unit 190 may play the voice information stored in the storage unit 180, or may render the text corresponding to that voice information as speech using text-to-speech (TTS). When speech is provided by TTS and a low-reliability conversion portion is detected in the linked information for that voice information, the corresponding voice information region may be played in place of the text. That is, TTS lets the user listen to the text corresponding to the stored voice information at a desired speed and tone, while for low-reliability text portions the corresponding recorded voice is played instead, which prevents a loss of reliability caused by incorrect conversion. This works both when the original voice information is retained in full and when it has been deleted to save space and only the supplementary voice information remains. In other words, reliable speech can be played even when the original voice information has been deleted.
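The mixed playback can be sketched as a plan of alternating steps. The step tuples `("tts", text)` and `("audio", start, end)`, and the 0.6 threshold, are assumptions for illustration; actual speech synthesis and audio output are outside the scope of the sketch.

```python
def playback_plan(words, threshold=0.6):
    """words: list of (text, start, end, confidence) in playback order.
    Returns steps that are either ("tts", text) for reliable text or
    ("audio", start, end) for a recorded region to play instead."""
    plan = []
    for text, start, end, conf in words:
        if conf <= threshold:
            if plan and plan[-1][0] == "audio":
                plan[-1] = ("audio", plan[-1][1], end)   # extend recorded region
            else:
                plan.append(("audio", start, end))
        elif plan and plan[-1][0] == "tts":
            plan[-1] = ("tts", plan[-1][1] + " " + text)  # extend TTS phrase
        else:
            plan.append(("tts", text))
    return plan
```

Adjacent words of the same kind are merged so that the player switches between synthesized and recorded audio as few times as possible.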

When a search request containing a search term is received through the user interface unit 130, the wireless communication unit 110, or the wired communication unit 140, the control unit 160 may search and select the voice information stored in the storage unit 180 based on the content of the corresponding text. Voice information containing the content the user wants can thus be provided quickly and easily.

As a result, this configuration allows a vast amount of voice information to be stored in a limited storage unit. For example, a full semester of lectures or a full year of meetings can be stored as voice and text information in a limited storage unit, the desired content can be quickly searched and checked, and only the required information can be selected and provided to a desired terminal.

FIG. 4 is a block diagram of a service server according to an embodiment of the present invention.

As shown, the service server 200 includes: a communication unit 210 for communicating with the STT recording apparatus 100 through the communication network; a voice information receiving unit 220 for receiving the voice information transmitted by the STT recording apparatus 100; a member management unit 230 that performs member registration and authentication of the corresponding member (by a conventional ID/password method (web access) or by using device identification information of the STT recording apparatus 100), and authorizes use of the STT conversion function or access to previously converted history and results; an STT conversion unit 240 having a text generation unit that recognizes the voice received through the voice information receiving unit 220 and converts it to text, and a linked information generation unit that, as the text is generated, generates linked information including synchronization information based on playback time or data-size position and conversion reliability information; an STT result generation unit 250 that provides the text converted by the STT conversion unit 240 and the linked information to the STT recording apparatus 100 as result information; and a per-member voice information storage unit 260 that stores, for each member, the voice information received by the voice information receiving unit 220 and the text and linked information generated by the STT result generation unit 250.

In addition, the service server 200 may receive from the STT recording apparatus either the voice information to be deleted itself or a file name or identification information for it, look up the corresponding linked information in the per-member voice information storage unit 260, generate supplementary voice information by separating out the voice information regions corresponding to text regions whose reliability in that linked information is at or below the threshold, and provide it back to the STT recording apparatus.

FIG. 5 is a conceptual diagram illustrating the characteristic operation of the voice-to-text conversion system according to an embodiment of the present invention. As shown, the STT recording apparatus 100 provides recorded voice information to the service server 200 through the voice-text management unit 150. The service server 200 converts the voice information to text through the STT conversion unit 240 while generating linked information including synchronization information and reliability information, and provides this information to the voice-text management unit 150 of the STT recording apparatus 100 through the STT result generation unit 250. The control unit 160 stores the received text information and linked information in the storage unit 180.
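One possible shape for the result information returned by the server is sketched below. The JSON layout and every key name (`recording_id`, `text`, `linked_info`, `word_index`, `start`, `confidence`) are hypothetical; the source states only that the result contains text plus linked information with synchronization and reliability data.

```python
import json

def build_result(recording_id, words):
    """words: list of dicts with 'text', 'start' (seconds) and 'confidence'.
    Returns a dict holding the text and, separately, the linked information."""
    return {
        "recording_id": recording_id,
        "text": " ".join(w["text"] for w in words),
        "linked_info": [
            {"word_index": i, "start": w["start"], "confidence": w["confidence"]}
            for i, w in enumerate(words)
        ],
    }

result = build_result("rec-001", [
    {"text": "hello", "start": 0.0, "confidence": 0.97},
    {"text": "world", "start": 0.6, "confidence": 0.42},
])
payload = json.dumps(result)  # what the recorder would store alongside the audio
```

Keeping the linked information in its own list corresponds to the "separate from the text" variant; a tag-based variant would instead embed the same values inside the text string.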

The service server 200 may store the received voice information and the generated text and linked information in the per-member voice information storage unit 260 and, when needed, provide them again to the STT recording apparatus 100 or to another terminal of the registered user.

In some cases, after receiving the voice information, the service server 200 may also generate and provide supplementary voice information in which only the low-reliability voice regions are separated out. Depending on its settings, the STT recording apparatus 100 may store only the supplementary voice information, together with the text and linked information, in place of the original voice information.

FIG. 6 is a flowchart showing the operation of the STT recording apparatus according to an embodiment of the present invention. As shown, the STT recording apparatus receives a voice signal through the built-in microphone or a connected wireless microphone, converts it to digital form, stores it in the storage unit as voice information, and transmits the voice information to a web server (the service server) on the web.

It then receives the corresponding text information and linked information from the web server (the service server) and stores them linked to the previously stored voice. The stored voice information and text information can be used through the built-in playback unit, and desired information can be selected and delivered to a connected other terminal.

The other terminal may use the voice information, text information, and linked information individually, as multimedia content with subtitles, or as an audio source with visible lyrics.

FIG. 7 is a flowchart showing the operation of the service server according to an embodiment of the present invention. As shown, the service server receives voice information from the STT recording apparatus, performs STT on it to generate text information, and computes synchronization information and reliability information to generate linked information.

The generated text information and linked information are provided back to the STT recording apparatus, and the received voice is stored matched with the generated text and linked information so that it can later be provided to the authenticated member. The stored voice, text, and linked data may also be provided to another member designated by the authenticated member, or sent to a desired destination (e-mail, phone number, P2P, FTP, and so on).

FIG. 8 is a conceptual diagram illustrating a capacity reduction scheme according to an embodiment of the present invention. As shown, voice information and the corresponding text information and linked information are stored in matched form in the storage unit of the STT recording apparatus. The text information and linked information may exist separately, or the linked information may be integrated with the text in a single file.

When lectures or meetings are recorded as in FIG. 8A, fairly long voice information is stored (ranging from 20 minutes 40 seconds to 59 minutes 21 seconds in the illustrated example), and it occupies considerable space even when compressed with voice-optimized encoding.

Therefore, when the storage unit runs short of space or the user requests compression beyond a certain capacity, a subset of the recordings is selected (in the illustrated example, the three longest). The control unit of the STT recording apparatus or the service server checks the reliability information in the linked information for the selected voice information and separates out the voice information regions corresponding to text regions whose reliability is at or below the threshold to generate supplementary voice information, and the STT recording apparatus stores this supplementary voice information in the storage unit in place of the corresponding voice information.
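The selection step can be sketched as follows. The 90 percent fill limit and the function names are assumptions; only the choice of the three longest recordings follows the illustrated example, and the sample durations reuse its shortest (20 min 40 s) and longest (59 min 21 s) values with invented values in between.

```python
def needs_compaction(used_bytes, capacity_bytes, limit_ratio=0.9):
    """True when used space reaches the preset share of capacity (assumed 90%)."""
    return used_bytes >= capacity_bytes * limit_ratio

def pick_for_compaction(durations, count=3):
    """Return the names of the `count` longest recordings, longest first."""
    return sorted(durations, key=durations.get, reverse=True)[:count]

# Durations in seconds; the recording names are hypothetical.
durations = {"rec_a": 1240, "rec_b": 3561, "rec_c": 2000, "rec_d": 2500, "rec_e": 1500}
```

Each selected recording would then have its low-reliability regions extracted into supplementary voice information before the original is removed.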

As shown in FIG. 8B, the supplementary voice information is generated by extracting, from the existing voice information, the voice regions with low text conversion reliability (or regions expanded to include some portion before and after each such voice region), and its size is therefore greatly reduced.
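A rough sketch of the expansion and the resulting size reduction follows. It assumes the expansion is a fixed time padding in seconds; the claims describe expansion by a number of words or sentences, so the padding here is a simplification, and the function names are hypothetical.

```python
from typing import List, Tuple


def expand_and_merge(regions: List[Tuple[int, int]], pad_s: int, total_s: int) -> List[Tuple[int, int]]:
    """Pad each low-reliability range on both sides, clamp it to the recording,
    and merge ranges that overlap after padding."""
    expanded = sorted(
        (max(0, start - pad_s), min(total_s, end + pad_s)) for start, end in regions
    )
    merged: List[Tuple[int, int]] = []
    for start, end in expanded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def kept_fraction(merged: List[Tuple[int, int]], total_s: int) -> float:
    """Fraction of the original duration retained in the supplementary audio."""
    return sum(end - start for start, end in merged) / total_s
```

With three short low-reliability ranges in a 3561 second recording and 5 seconds of padding, only about 50 seconds of audio would be retained, which is the kind of reduction FIG. 8B depicts.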

FIG. 9 is a flowchart showing the operation of the capacity reduction method according to an embodiment of the present invention. As shown, when the STT recording apparatus selects a voice file to be deleted, it analyzes the interworking information for that voice file, identifies the voice regions with low text conversion reliability, and then generates supplementary voice information.

The supplementary voice information may be generated by the STT recording apparatus itself, or the STT recording apparatus may request the service server to generate it and then receive the result.

The STT recording apparatus deletes the corresponding voice information and stores the supplementary voice information in its place and, if necessary, modifies the interworking information to match the supplementary voice information.
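One way the interworking information might be adjusted is to remap each text span's position onto the timeline of the shorter supplementary audio. The sketch below assumes the kept ranges are concatenated in order and that spans are stored as dictionaries with `start_s` and `end_s` keys; both are illustrative assumptions, as are the `audio_start_s` and `audio_end_s` field names.

```python
from typing import Dict, List, Optional, Tuple


def remap_linkage(segments: List[Dict], kept: List[Tuple[int, int]]) -> List[Dict]:
    """Attach positions in the supplementary audio to each text span.

    `kept` lists the (start_s, end_s) ranges of the original recording that were
    retained, sorted and non-overlapping. Spans outside every kept range get
    None, meaning only their text remains available.
    """
    # Compute where each kept range begins inside the concatenated audio.
    offsets = []
    cursor = 0
    for start, end in kept:
        offsets.append((start, end, cursor))
        cursor += end - start

    remapped = []
    for seg in segments:
        new_start: Optional[int] = None
        new_end: Optional[int] = None
        for start, end, base in offsets:
            if start <= seg["start_s"] and seg["end_s"] <= end:
                new_start = base + (seg["start_s"] - start)
                new_end = base + (seg["end_s"] - start)
                break
        remapped.append({**seg, "audio_start_s": new_start, "audio_end_s": new_end})
    return remapped
```

The original positions are left in place so that the text still reflects when each span was spoken, while the added fields point into the supplementary audio where audio is still held.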

Through this process, the STT recording apparatus can store a large amount of voice information, text information, and interworking information within limited storage capacity, can quickly and easily select a desired item, and can play it back or deliver the desired information to another terminal as needed.

Preferred embodiments of the present invention have been shown and described above. However, the present invention is not limited to the embodiments described above, and anyone with ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the present invention as set forth in the appended claims.

100: STT recording device 110: wireless communication unit
120: voice input unit 121: microphone
130: user interface unit 131: input interface
132: output interface 140: wired communication unit
141: connector for wired communication 150: voice-text management unit
160: control unit 170: power management unit
180: storage unit 181: storage interface
190: playback unit 191: earphone / headphone connector
200: service server 210: communication unit
220: voice information receiving unit 230: member management unit
240: STT conversion unit 250: STT result generation unit
260: voice information storage unit for each member

Claims

An STT recording apparatus comprising:
A voice input unit for collecting a voice signal through a microphone and converting the voice signal into a digital signal;
A wireless communication unit for communicating with an external server;
A wired communication unit for receiving charging power through a connector and exchanging information with an external terminal;
A user interface unit for receiving a user input and informing the user of a status;
A storage unit for storing voice information based on the digital signal converted by the voice input unit;
A voice-text management unit for providing the voice information stored in the storage unit to the external server through the wireless communication unit, receiving from the external server text information corresponding to the voice information and interworking information including synchronization information between the voice information and the text information, and storing them in the storage unit; and
A control unit for receiving the user's input through the user interface unit and controlling display of the corresponding status, controlling the wireless communication unit and the storage unit according to requests from the voice-text management unit, and providing information stored in the storage unit to the external terminal through the wired communication unit or storing information received from the external terminal in the storage unit.

The STT recording apparatus of claim 1, wherein the interworking information includes, for the converted text corresponding to the voice information, synchronization information based on a playback time reference position or a data reference position, and reliability information of the converted text.

The STT recording apparatus of claim 1, wherein the interworking information is provided separately from the text information or is embedded in the text information in tag form.

The STT recording apparatus of claim 1, wherein the control unit selects voice information to be deleted, either according to a user input through the user interface unit or according to a predetermined criterion when the storage unit lacks storage space, separates the voice information regions corresponding to text regions whose reliability, in the interworking information corresponding to that voice information, is less than or equal to a reference, generates separate supplementary voice information from them, and then deletes the corresponding voice information.

The STT recording apparatus of claim 1, wherein the control unit selects voice information to be deleted, either according to a user input through the user interface unit or according to a predetermined criterion when the storage unit lacks storage space, provides the voice information to be deleted to the external server through the wireless communication unit, receives supplementary voice information that is separately generated by separating the voice information regions corresponding to text regions whose reliability, in the interworking information corresponding to that voice information, is lower than a reference, and replaces the corresponding voice information with the supplementary voice information.

The STT recording apparatus of claim 4 or 5, wherein, when a voice information region corresponding to a text region whose reliability is lower than the reference is separated, the region is expanded, relative to that voice information region, to include up to a predetermined number of words or sentences before and after it.

The STT recording apparatus of claim 1, further comprising a playback unit for reproducing voice information when the control unit selects and provides the voice information according to an input through the user interface unit,
wherein the playback unit reproduces the voice information itself or provides it in a text-to-speech (TTS) manner using the text corresponding to the voice information, and, when providing it in the TTS manner, reproduces the corresponding voice information region instead of the text when a converted portion with low reliability is identified through the interworking information.

The STT recording apparatus of claim 1, wherein, upon a search request including a search word received through the user interface unit, the wireless communication unit, or the wired communication unit, the control unit searches for and selects voice information stored in the storage unit based on the content of the corresponding text.

A voice-to-text conversion system using an STT recording apparatus, comprising:
An STT recording apparatus for storing collected voice information, transmitting the voice information to a service server, receiving in return text information and interworking information including synchronization information and reliability information, storing them in a storage unit matched with the voice information, and providing the voice information, the corresponding text information, and the interworking information to an external terminal; and
A service server for converting the voice information received from the STT recording apparatus into text while generating interworking information including synchronization information based on a playback time reference position or a data reference position and reliability information of the converted text, and transmitting the text information and the interworking information to the STT recording apparatus.

10. The voice-to-text conversion system of claim 9, wherein the STT recording apparatus selects voice information to be deleted, either according to a user input or according to a predetermined criterion when the storage unit lacks storage space, and replaces it with separately generated supplementary voice information obtained by separating the voice information regions corresponding to text regions whose reliability in the interworking information is lower than a reference,
wherein the supplementary voice information is generated by the STT recording apparatus, or the service server generates the supplementary voice information for the voice information selected by the STT recording apparatus and provides it to the STT recording apparatus.