KR100560915B1

KR100560915B1 - Method for storing using a voice and recognizing result value

Info

Publication number: KR100560915B1
Application number: KR1020010038944A
Authority: KR
Inventors: 김희경; 김문식; 김재인; 최성우; 송호은
Original assignee: 주식회사 케이티
Priority date: 2001-06-30
Filing date: 2001-06-30
Publication date: 2006-03-14
Also published as: KR20030002194A

Abstract

An object of the present invention is to provide a method of storing recognized speech using speech recognition result values, in which the recognition result value and the speech are stored together by various methods so that the recognition result can be checked immediately in an actual operating environment, and the recognition result and related file information can be obtained without re-recognizing the recognized speech.

The present invention provides methods of storing recognized voice data together with its recognition result value, so that the recognition result associated with the voice data can be checked as soon as the file is read. Three storage methods are presented: writing the result value inside the voice file, writing the result value into the voice file name, and writing the recognition result value to a separate log file that is stored in association with the file holding the voice data. This makes analysis of recognized speech very easy.

Applying the present invention improves device adaptability in an actual operating environment by storing the recognition result value and the speech together, and because the recognition result can be viewed at the same time as the corresponding voice data, the analysis can be performed very easily.

Description

METHOD FOR STORING USING A VOICE AND RECOGNIZING RESULT VALUE

FIGS. 1A and 1B are views showing the structure of a general voice file.

FIG. 2 is a view showing the structure of a voice file to which a method of storing recognized speech using a speech recognition result value according to an embodiment of the present invention is applied.

* Description of reference numerals for the main parts of the drawings *

100: voice file structure
101: file size part
102: data size part
200: voice file structure
201: recognition result part

The present invention relates to a method of storing recognized speech using a speech recognition result value, and more particularly to such a method that stores the recognition result value together with the speech in order to improve device adaptability in an actual operating environment.

As is well known, a speech recognition system receives human speech as input and derives a recognition result through a recognizer. Various services using this recognition technology have been developed and commercialized.

Speech input to a speech recognition system is stored for future improvement of recognizer performance and service quality. Current storage methods differ only in the storage format (Wave, vox, and so on). The voice file itself carries no recognition-related information beyond the fact that it was used as input data for speech recognition, so it is of no help in analyzing recognition results.

In general, a speech recognition system is tested in a laboratory environment and put into commercial service once it reaches a certain level of recognition performance. The test process consists of collecting speech, classifying what each collected utterance actually is (the labeling process), and then running a simulation through the speech recognizer to obtain recognition results.

However, an actual commercial system may encounter situations that never appeared in the laboratory environment, and the recognition rate is strongly affected by factors such as applying the recognizer to the varied speech of many different users and the influence of the service system in which the recognizer is installed. It is therefore very important to store the speech that has passed through the speech recognition system in the real environment and to examine the recognition performance.

At present, the recognition rate of a commercial speech recognition service is investigated by storing the recognized speech and repeating the laboratory test process described above. That is, the stored speech is classified as to what was actually said and then passed through a speech recognition simulation. Because speech that has already passed through the recognizer must go through the same recognition simulation again, the process is inefficient. The same test is repeated on a speech recognition system that has already been tested through the recognizer, so it consumes time and cost while yielding very little benefit.

The present invention has been made in view of the above circumstances of the prior art. Its object is to provide a method of storing recognized speech using speech recognition result values, in which the recognition result value and the speech are stored together by various methods so that the recognition result can be checked immediately in an actual operating environment, and the recognition result and related file information can be obtained without re-recognizing the recognized speech.

To achieve the above object, according to a preferred embodiment of the present invention, there is provided, in a speech recognition system that recognizes input voice data and stores the recognized voice data, a method of storing recognized speech using a speech recognition result value, characterized in that the recognized voice data is stored together with its recognition result value so that the recognition result can be checked immediately in an actual operating environment.

Preferably, there is provided a method of storing recognized speech using a speech recognition result value, characterized in that, when the recognized voice data is stored as a wave file, the recognition result value is stored at the position of the file length item.

More preferably, there is provided a method of storing recognized speech using a speech recognition result value, characterized in that the recognition result value is stored in the file name of the recognized voice data file.

There is also provided a method of storing recognized speech using a speech recognition result value, characterized in that a separate log file is attached to and stored with the recognized voice data file.

Meanwhile, there is provided a method of storing recognized speech using a speech recognition result value, characterized in that the speech recognition result value is information related to the recognized speech.

Hereinafter, the present invention will be described in detail with reference to the drawings.

The present invention proposes three methods of storing a speech recognition result value in addition to the voice data.

The first is a method in which, when the voice file is stored as a wave file, the speech recognition result value is included inside the voice file. The second is a method of inserting the result value into the voice file name. The third is a method of generating a separate log file and storing it together with the voice data.

First, the method of including the speech recognition result value inside the voice file when the voice file is stored as a wave file will be described.

To explain this method, the structure of a general voice file (for example, a wave file) is described with reference to FIGS. 1A and 1B.

The present invention composes or plays back various kinds of voice files by using the header contents in the structure of a general voice file. The structure of a general voice file (100) can be divided into a header part (the first 44 bytes; H), which indicates the type and characteristics of the speech, and a data part (D), which contains the actual voice data. The contents of the header (H) govern how the voice data (D) is stored or played back, so they must be specified correctly for the speech to be stored and played back correctly.

In the header part (H) of the voice file (100), the file size (101) is stored in the second item (4 bytes) and the data size (102) is stored in the 13th item (4 bytes).

Here, the data size (102) is the size of the stored voice data itself, and the file size (101) is the size of the file from the current position to the end, obtained by adding the header length (44 bytes) to the actual data size (102) and subtracting the 4 bytes that store the leading "RIFF".

In other words, the values stored in the file size (101) and data size (102) items are closely related, and knowing either one is enough to derive the other. In fact, many commercially available audio players (for example CoolEditor and Windows Media Player) play back speech using only the data size (102).
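The relationship between the two size fields can be checked with a short Python sketch. It assumes the canonical 44-byte PCM header that Python's standard `wave` module writes, with the file size field at byte offset 4 and the data size field at byte offset 40; these offsets and the helper names `make_wav` and `read_size_fields` are illustrative assumptions, not taken from the patent. Note that in the canonical RIFF layout the field at offset 4 holds the total file length minus 8 bytes, that is, the data size plus 36 for a 44-byte header, which differs slightly from the arithmetic given in the description.

```python
import io
import struct
import wave


def make_wav(samples: bytes, rate: int = 8000) -> bytes:
    """Build a mono 16-bit PCM WAV in memory using the standard library."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples)
    return buf.getvalue()


def read_size_fields(wav: bytes) -> tuple[int, int]:
    """Return (file size field, data size field) of a canonical 44-byte header."""
    if wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE" or wav[36:40] != b"data":
        raise ValueError("not a canonical 44-byte PCM WAV header")
    (file_size,) = struct.unpack_from("<I", wav, 4)   # second item of the header
    (data_size,) = struct.unpack_from("<I", wav, 40)  # last item of the header
    return file_size, data_size


# 100 silent 16-bit samples, i.e. 200 bytes of voice data.
wav_bytes = make_wav(b"\x00\x00" * 100)
file_size, data_size = read_size_fields(wav_bytes)
```

Because one field can be derived from the other, a reader that relies only on the data size loses nothing when the file size field holds something else, which is the property the embodiment below depends on.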

Hereinafter, an embodiment of the present invention will be described in more detail with reference to FIG. 2.

FIG. 2 is a view showing the structure of a voice file to which a method of storing recognized speech using a speech recognition result value according to an embodiment of the present invention is applied.

Referring to FIG. 2, in the voice file (200) according to the present invention, the speech recognition result value (201) is stored in the file size part (101) in place of the data indicating the file size, and the actual speech is stored in the data part (D). This improves the speech recognition rate and removes the need for a separate test when checking the recognition rate.

This storage method requires no new field for storing the speech recognition result, and it is easy to code.

In addition, the speech can still be played back easily with a commonly used audio player, and because the voice file itself contains the recognition result, the recognition information can be obtained with only simple programming.
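A minimal Python sketch of this first method follows. It assumes a canonical 44-byte PCM header with the file size field at byte offset 4, and it assumes the recognition result is a single unsigned 32-bit recognition code, since only 4 bytes are available in that field. The helpers `minimal_wav`, `embed_result`, `extract_result`, and `restore_file_size` are hypothetical names introduced for illustration.

```python
import struct

FILE_SIZE_OFFSET = 4   # assumed position of the file size part (101)
DATA_SIZE_OFFSET = 40  # assumed position of the data size part (102)


def minimal_wav(data: bytes, rate: int = 8000) -> bytes:
    """Build a canonical 44-byte header (mono, 16-bit PCM) followed by the data."""
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(data), b"WAVE",
        b"fmt ", 16, 1, 1, rate, rate * 2, 2, 16,
        b"data", len(data),
    )
    return header + data


def embed_result(wav: bytes, code: int) -> bytes:
    """Overwrite the file size field with a 32-bit recognition code."""
    out = bytearray(wav)
    struct.pack_into("<I", out, FILE_SIZE_OFFSET, code)
    return bytes(out)


def extract_result(wav: bytes) -> int:
    """Read the recognition code back from the file size field."""
    return struct.unpack_from("<I", wav, FILE_SIZE_OFFSET)[0]


def restore_file_size(wav: bytes) -> bytes:
    """Recompute a conventional size field from the data size field."""
    out = bytearray(wav)
    (data_size,) = struct.unpack_from("<I", out, DATA_SIZE_OFFSET)
    struct.pack_into("<I", out, FILE_SIZE_OFFSET, data_size + 36)
    return bytes(out)


original = minimal_wav(b"\x01\x00" * 50)
tagged = embed_result(original, 1234)
```

Only the four bytes of the size field change, so the voice data is untouched. Players that validate the size field strictly may reject a tagged file; `restore_file_size` shows how a conventional value could be rebuilt from the data size in that case.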

The recognition result value (201) can be stored in various forms and may carry various information related to the recognition result, such as a recognition code value indicating the result, the Viterbi value of the result, and the recognized name. The Viterbi value is computed during the recognition process and can be used to determine which name was recognized.

Because running the same recognizer on the same recognized speech yields the same Viterbi value, the value can also be used to find errors in the recognizer or in the recognition system in which the recognizer is installed.

As another embodiment, the present invention presents a method of inserting the result value into the voice file name.

In more detail, under DOS a file name was limited to the 8.3 form (8-character name, 3-character extension). With the move to the Windows environment that limit disappeared, so file names can be long and can include Korean characters as well as special characters such as "-" and ".". Taking advantage of this, the recognition result can be stored in the voice file name.

홍길동.-3824.2345.wav

In the example file name above, the voice file has "홍길동" (Hong Gil-dong) as its recognition result and "-3824.2345" as its Viterbi value.
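The naming scheme in this example can be sketched in Python as follows. The functions `build_name` and `parse_name` are hypothetical, and the sketch assumes the layout "recognized name, dot, Viterbi value, extension" seen in the example, with no dot inside the recognized name itself.

```python
from pathlib import PurePath


def build_name(recognized: str, viterbi: float, ext: str = ".wav") -> str:
    """Compose a file name of the form <recognized>.<viterbi><ext>."""
    return f"{recognized}.{viterbi:.4f}{ext}"


def parse_name(filename: str) -> tuple[str, float]:
    """Split a file name built by build_name back into its two parts."""
    stem = PurePath(filename).stem          # drops only the final extension
    recognized, score = stem.split(".", 1)  # split at the first dot only
    return recognized, float(score)


name = build_name("홍길동", -3824.2345)
recognized, viterbi = parse_name("홍길동.-3824.2345.wav")
```

Splitting at the first dot keeps the decimal point of the Viterbi value intact. If a recognition code were also carried in the name, an extra separator convention would have to be agreed on.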

Accordingly, the file name of recognized speech can carry recognition-related content such as the recognized name, the recognition code, and the Viterbi value. For speech stored this way, the recognition result can be read directly from the file name, so whether recognition succeeded can be known in real time. The distribution of recognized names can also be tallied, which is effective for understanding users' usage tendencies.

Meanwhile, as yet another embodiment, the present invention presents a method of generating a separate log file and storing it together with the voice data.

In this method, when the voice data is stored, voice file information such as the file name of the voice data, the recognized name, the code, and the Viterbi value, together with the recognition result value, is written to a separate log file that is stored in association with the voice data.

Therefore, when the recognized speech is analyzed or the recognized voice file is read, the voice data and its result values (voice file information such as the file name of the voice data, the recognized name, the code, and the Viterbi value) can be checked immediately.
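One possible shape for such a log file is a CSV file with one row per stored utterance, keyed by the voice file name. The sketch below uses Python's standard `csv` module. The column names, the CSV format, and the helpers `append_entry` and `load_log` are assumptions chosen for illustration; the patent does not specify a log format.

```python
import csv
import io

FIELDS = ["file", "name", "code", "viterbi"]  # assumed column layout


def append_entry(log, file: str, name: str, code: int, viterbi: float) -> None:
    """Write one recognition record to an open text-mode log."""
    writer = csv.DictWriter(log, fieldnames=FIELDS)
    writer.writerow({"file": file, "name": name, "code": code, "viterbi": viterbi})


def load_log(log) -> dict:
    """Index all records by voice file name so a result can be looked up directly."""
    reader = csv.DictReader(log, fieldnames=FIELDS)
    return {row["file"]: row for row in reader}


# An in-memory log stands in for a file on disk in this sketch.
log = io.StringIO()
append_entry(log, "utt0001.wav", "홍길동", 17, -3824.2345)
append_entry(log, "utt0002.wav", "홍길동", 17, -4010.5)
log.seek(0)
entries = load_log(log)
```

Looking up `entries["utt0001.wav"]` then returns the recognized name, code, and Viterbi value for that voice file without running the recognizer again. Values read back from CSV are strings, so numeric fields need converting before use.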

In addition, when recognized speech is analyzed using voice files stored in the above structure, it is very easy to determine whether the recognition result is correct, and the trouble of re-verifying already stored voice data by passing it through the recognizer again is reduced.

With the storage methods according to the three embodiments described above, analysis of recognized speech becomes easier.

Meanwhile, the method of storing recognized speech using a speech recognition result value according to the embodiments of the present invention is not limited to the embodiments described above, and various modifications are possible without departing from its technical gist.

As described above, the method of storing recognized speech using a speech recognition result value according to the present invention stores the recognition result value and the speech together, which improves device adaptability in an actual operating environment and enables accurate speech recognition. In addition, when the method is applied to an apparatus for analyzing the speech recognition rate, the recognition results can be analyzed accurately, so re-verification of the speech recognition is not required.

Claims

In the speech recognition system that recognizes the input voice data, and stores the recognized voice data,

A method of storing a recognized speech using a speech recognition result value, characterized by storing the recognized speech data and the recognition result value together in the same wave file to immediately confirm the speech recognition result in a real operating environment.

The method of claim 1, wherein when the recognized voice data is stored in the wave, the recognition result value is stored at a location of a file length item.

The method of claim 1, wherein the recognition result value is stored in a file name of the recognized voice data file.

The method of claim 1, wherein a separate log file is attached to and stored in the recognized voice data file.

The method of claim 1, wherein the speech recognition result value is information related to the recognized speech.