KR101888059B1

KR101888059B1 - The apparatus and method for managing context based speech model

Info

Publication number: KR101888059B1
Application number: KR1020180016663A
Authority: KR
Inventors: 이태훈
Original assignee: 주식회사 공훈
Priority date: 2018-02-12
Filing date: 2018-02-12
Publication date: 2018-09-10

Abstract

According to an embodiment of the present invention, a context-based speech model management apparatus can be interlocked with a context-presenting speaker identification system. The context-based speech model management apparatus comprises: a storage unit for storing individual speech data generated whenever a speech is received from a speaker; a similarity estimating unit for extracting individual speech data from the storage unit and estimating the similarity between the individual speech data when a plurality of individual speech data are stored in the storage unit; a speech model generating unit for generating a first speech model of the speaker according to at least one individual speech data selected based on the similarity estimated by the similarity estimating unit; a determination unit for determining whether a comparative speech model corresponding to the first speech model exists in the storage unit of the context-presenting speaker identification system, if it does not exist, providing the first speech model to be stored in the storage unit of the context-presenting speaker identification system, and if it exists, allowing the similarity estimating unit to estimate the similarity between the first speech model and the comparative speech model; and a speech model editing unit for replacing the comparative speech model with a first speech model if the comparative similarity which is the estimation result in the similarity estimating unit through the determination unit is equal to or larger than a predetermined reference value and combining the first speech model and the comparative speech model to generate a second speech model if the comparative similarity is below the predetermined reference value.

Description

The present invention relates to a context-based speech model management apparatus and a method of operating the apparatus. More particularly, it relates to an apparatus, and a method of operating it, for managing a speech model usable in a voice authentication system by updating the model according to the speaker's context-based utterance characteristics and at preset time intervals.

A speaker's voice is not permanent. It changes temporarily, or continuously over the long term, owing to various factors such as aging of the vocal muscles over time, changes in living environment (for example, region or workplace), and changes in health (for example, catching a cold).

Because the permanence of a speaker's voice is not guaranteed, verifying or identifying a speaker by voice requires that the speech model used for speaker verification or authentication be updated as the speaker's voice changes.

Conventionally, to reflect this diversity in how users speak, methods such as distinguishing a specific user by detecting the user's accent have been studied. However, such conventional speech recognition methods cannot effectively track and manage a user's voice as it changes with time or environment. In other words, conventional methods of speech recognition or speech model management went no further than changing the speaker's speech model by analyzing the speaker's voice characteristics alone, without considering the environment in which the speaker is placed.

With the emergence and spread of voice control for various electronic devices, up-to-date speech models must be managed so that a user's voice is accurately recognized (identified) and the appropriate operation (for example, user authentication) is performed.

1. Korean Patent Application Publication No. 10-2017-0035905 (publication date: March 31, 2017)

The present invention has been devised as one response to the problems described above. It aims to provide a method and an apparatus for updating a user's context (word) speech model in a matrix DB containing the user's context (word) speech models, which can be used in a context (word)-presenting system, one implementation of a voice authentication system. The update takes into account whether the speech input from the speaker for a given context (word) has changed and the degree of that change.

As an embodiment of the present invention, a context-based speech model management apparatus and a method of operating the apparatus can be provided.

A context-based speech model management apparatus according to an embodiment of the present invention can be interlocked with a context-presenting speaker identification system. The apparatus includes: a storage unit that stores individual speech data generated each time speech is received from a speaker; a similarity estimating unit that, when a plurality of individual speech data are stored in the storage unit, extracts each individual speech data from the storage unit and estimates the similarity among them; a speech model generating unit that generates a first speech model of the speaker from at least one individual speech data selected on the basis of the similarity estimated by the similarity estimating unit; a determination unit that determines whether a comparative speech model corresponding to the first speech model exists in the storage unit of the context-presenting speaker identification system, provides the first speech model to that storage unit for storage if none exists, and, if one exists, has the similarity estimating unit estimate the comparative similarity between the first speech model and the comparative speech model; and a speech model editing unit that replaces the comparative speech model with the first speech model if the comparative similarity is equal to or greater than a predetermined reference value, and combines the first speech model and the comparative speech model to generate a second speech model if the comparative similarity is below the reference value. The second speech model can be provided to the determination unit and the speech model editing unit.

In addition, the context-presenting speaker identification system includes: a speech receiving unit that receives speech from the speaker; a speech characteristic extracting unit that extracts speech characteristics from the received speech; a context speech model generating unit that generates a speech model from the extracted speech characteristics; a storage unit in which the generated speech models are stored in matrix form; a random number generating unit that generates a random number used to identify the speaker; a speech model extracting unit that extracts the speech model at the position corresponding to the generated random number in the matrix-form speech model DB of the storage unit; an utterance requesting unit that requests a predetermined utterance from the speaker on the basis of the extracted speech model; and a speaker identifying unit that identifies the speaker by comparing the speech uttered by the speaker with the extracted speech model. The predetermined utterance can be a reading aloud of the word or sentence preset at the position in the matrix-form DB of the storage unit that corresponds to the generated random number.

Individual speech data according to an embodiment of the present invention includes at least one of the frequency, pitch, formants, utterance duration, and speaking rate of the speech for each utterance by the speaker. The similarity estimating unit of the context-based speech model management apparatus can evaluate the similarity among the individual speech data for the speaker's respective utterances.
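As a purely illustrative sketch, and not part of the disclosed embodiment, the individual speech data described above could be held in a small record, with similarity computed over whichever features two records share. The class name `IndividualSpeechData`, its field names, and the particular similarity measure (mean relative difference mapped through 1 / (1 + d)) are assumptions made for this example; the specification does not state how similarity is computed.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IndividualSpeechData:
    """Features of one utterance; any field may be absent (None).

    Field names and units are hypothetical, chosen only for illustration.
    """
    frequency_hz: Optional[float] = None
    pitch_hz: Optional[float] = None
    first_formant_hz: Optional[float] = None
    duration_s: Optional[float] = None
    rate_syllables_per_s: Optional[float] = None


def similarity(a: IndividualSpeechData, b: IndividualSpeechData) -> float:
    """Return a score in (0, 1]; 1.0 means identical on all shared features.

    Assumed measure: mean relative difference over the features that both
    records carry, mapped through 1 / (1 + d).
    """
    diffs = []
    for f in fields(IndividualSpeechData):
        x, y = getattr(a, f.name), getattr(b, f.name)
        if x is None or y is None:
            continue  # compare only features present in both records
        scale = max(abs(x), abs(y))
        diffs.append(0.0 if scale == 0 else abs(x - y) / scale)
    if not diffs:
        raise ValueError("no shared features to compare")
    return 1.0 / (1.0 + sum(diffs) / len(diffs))
```

Making every field optional mirrors the "at least one of" wording: two utterances are compared only on the features that both of them actually carry.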

In addition, the apparatus according to an embodiment of the present invention further includes a period setting unit for setting a management period for the speech models. If all speech models have been updated within the set management period, the speech model editing unit keeps the existing matrix-form speech model DB in the storage unit of the context-presenting speaker identification system. If at least one speech model has not been updated within the set management period, the speech model editing unit can delete or keep part of the existing matrix-form speech model DB on the basis of a new first speech model related to the speaker.

If no new first speech model related to the speaker exists, the speech model editing unit deletes the at least one non-updated speech model from the matrix-form speech model DB. If a new first speech model exists, the editing unit compares the non-updated speech model with the new first speech model. If the difference found by the comparison falls within a predetermined range, the editing unit keeps the existing matrix-form speech model DB in the storage unit of the context-presenting speaker identification system; if the difference falls outside the range, it can delete the non-updated speech model from the matrix-form speech model DB.
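The keep-or-delete rule for the management period can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `prune_stale_models`, the representation of a model as a tuple of floats, the keying of new first speech models by matrix cell, and the caller-supplied `difference` function are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Tuple

Model = Tuple[float, ...]   # hypothetical model representation
Cell = Tuple[int, int]      # (row, column) position in the matrix DB


def prune_stale_models(
    db: Dict[Cell, Model],
    last_updated: Dict[Cell, datetime],
    new_models: Dict[Cell, Model],
    now: datetime,
    period: timedelta,
    difference: Callable[[Model, Model], float],
    max_difference: float,
) -> Dict[Cell, Model]:
    """Return a copy of `db` with stale, unconfirmed models removed.

    A model updated within `period` is kept. A stale model is kept only if
    a new first speech model exists for its cell and differs from it by at
    most `max_difference`; otherwise it is deleted.
    """
    result = dict(db)
    for cell, model in db.items():
        if now - last_updated[cell] <= period:
            continue  # updated within the management period: keep
        candidate = new_models.get(cell)
        if candidate is None or difference(model, candidate) > max_difference:
            del result[cell]  # no new model, or difference out of range
    return result
```

If every model was updated within the period, the loop deletes nothing and the DB is returned unchanged, which corresponds to the first case in the text.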

A method of managing a speech model using the context-based speech model management apparatus according to an embodiment of the present invention can include the steps of: (a) generating and storing individual speech data each time speech is received from a speaker; (b) when a plurality of individual speech data are stored, extracting each individual speech data and estimating the similarity among them; (c) generating a first speech model of the speaker from at least one individual speech data selected on the basis of the estimated similarity; (d) determining whether a comparative speech model corresponding to the first speech model exists in the storage unit of the context-presenting speaker identification system, providing the first speech model to that storage unit for storage if none exists, and, if one exists, having the similarity estimating unit estimate the comparative similarity between the first speech model and the comparative speech model; and (e) replacing the comparative speech model with the first speech model if the comparative similarity is equal to or greater than a predetermined reference value, and combining the first speech model and the comparative speech model to generate a second speech model if the comparative similarity is below the reference value. In addition, steps (d) and (e) can be repeated for the second speech model.
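Steps (d) and (e), including their repetition for the second speech model, can be sketched as below. This is a hedged illustration only: the specification does not say how two models are combined or when the repetition stops, so the element-wise mean in `combine`, the `max_rounds` limit, and the fallback of storing the last combined model are assumptions, as are all function and parameter names.

```python
from typing import Callable, Dict, Hashable, Tuple

Model = Tuple[float, ...]  # hypothetical model representation


def combine(a: Model, b: Model) -> Model:
    """Assumed combination rule: element-wise mean of the two models."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def update_model(
    store: Dict[Hashable, Model],
    key: Hashable,
    first_model: Model,
    similarity: Callable[[Model, Model], float],
    reference: float,
    max_rounds: int = 10,
) -> Model:
    """Apply steps (d) and (e) to `store[key]`; return the model stored."""
    candidate = first_model
    for _ in range(max_rounds):
        existing = store.get(key)
        # (d) no comparative model: store the candidate as it is.
        # (e) similarity at or above the reference value: replace.
        if existing is None or similarity(candidate, existing) >= reference:
            store[key] = candidate
            return candidate
        # (e) below the reference value: combine into a second model and
        # run steps (d) and (e) again with it.
        candidate = combine(candidate, existing)
    # Assumed stopping rule: after max_rounds, store the last combination.
    store[key] = candidate
    return candidate
```

With the mean as the combination rule, each round moves the candidate halfway toward the stored model, so a strongly deviating new model ends up shifting the stored model only slightly.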

In addition, the method according to an embodiment of the present invention further includes setting a management period for the speech models by means of the period setting unit of the apparatus described above. If all speech models have been updated within the set management period, the speech model editing unit of the apparatus keeps the existing matrix-form speech model DB in the storage unit of the context-presenting speaker identification system. If at least one speech model has not been updated within the set management period, the speech model editing unit can delete or keep part of the existing matrix-form speech model DB on the basis of a new first speech model related to the speaker.

If no new first speech model related to the speaker exists, the speech model editing unit deletes the at least one non-updated speech model from the matrix-form speech model DB. If a new first speech model exists, the editing unit compares the non-updated speech model with the new first speech model. If the difference found by the comparison falls within a predetermined range, the editing unit keeps the existing matrix-form speech model DB in the storage unit of the context-presenting speaker identification system; if the difference falls outside the range, it can delete the non-updated speech model from the matrix-form speech model DB.

Meanwhile, as an embodiment of the present invention, a computer-readable recording medium on which a program for executing the above-described method on a computer is recorded can be provided.

According to an embodiment of the present invention, a speech model usable in a speaker identification (or voice authentication) system is updated according to the speaker's context-based utterance characteristics and at preset time intervals, so that the speech model can be kept up to date.

In addition, various electronic devices can be controlled efficiently through each user's voice.

In addition, the influence of the user's utterance state (temporal or environmental factors) is minimized, so that user authentication in electronic commerce and the like can be performed quickly and accurately.

FIG. 1 is a block diagram of a context-based speech model management apparatus according to an embodiment of the present invention.
FIG. 2 shows block diagrams of a context-based speech model management apparatus according to an embodiment of the present invention and of a context-presenting speaker identification system that can be interlocked with it.
FIG. 3 shows an operation example of the context-presenting speaker identification system.
FIG. 4 is a flowchart showing an operation example of a context-based speech model management apparatus according to an embodiment of the present invention.
FIG. 5 shows an operation example of a context-based speech model management apparatus according to another embodiment of the present invention.
FIG. 6 is a flowchart showing a speech model management method using a context-based speech model management apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention can, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

The terms used in this specification are explained briefly, and then the present invention is described in detail.

The terms used in the present invention were selected, as far as possible, from general terms currently in wide use, taking into account their function in the invention. These terms may, however, vary with the intent of those working in the field, with precedent, with the emergence of new technology, and so on. In certain cases a term has been chosen arbitrarily by the applicant, in which case its meaning is set out in detail in the relevant part of the description. Accordingly, the terms used in the present invention should be defined on the basis of their meaning and the overall content of the invention, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it can further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "... unit" and "module" in the specification denote a unit that handles at least one function or operation, which can be implemented in hardware, in software, or in a combination of the two. Also, when a part is said to be "connected" to another part, this covers not only the case where it is "directly connected" but also the case where it is connected "with another element in between".

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

Depending on the user's emotions, surroundings (for example, noise), or health (for example, an illness such as a cold), the tone of voice in which the user ordinarily speaks, that is, its frequency and pitch, can differ from the user's normal utterance even for the same context (word). In other words, a speaker's voice can change temporarily, or continuously over a period of time, owing to temporal factors (for example, aging) or environmental factors (for example, a concert hall). It is therefore necessary to monitor this possibility of change while continuously collecting and updating the changed speech information, so that the speaker can be identified quickly and accurately from speech information that fully reflects the speaker's current state.

As described above, a user's uttered speech can change in particular environments and states. Identifying the user's voice with a fixed speech model, as in existing approaches, takes no account of the possibility that the voice varies with the user's living environment and the like, so the reliability of speech recognition can fall seriously.

With the apparatus and its operating method according to an embodiment of the present invention, stable and reliable voice identification (authentication) of the speaker is possible, compared with the prior art, regardless of temporal and environmental factors affecting the speaker.

FIG. 1 is a block diagram of a context-based speech model management apparatus according to an embodiment of the present invention, FIG. 2 shows block diagrams of the apparatus and of a context-presenting speaker identification system that can be interlocked with it, and FIG. 3 shows an operation example of the context-presenting speaker identification system. FIG. 4 is a flowchart showing an operation example of the context-based speech model management apparatus according to an embodiment of the present invention, and FIG. 5 shows an operation example of the apparatus according to another embodiment of the present invention.

The context-based speech model management apparatus 1000 according to an embodiment of the present invention can be interlocked with the context-presenting speaker identification system 2000. The apparatus 1000 includes: a storage unit 1100 that stores individual speech data generated each time speech is received from a speaker; a similarity estimating unit 1200 that, when a plurality of individual speech data are stored in the storage unit 1100, extracts each individual speech data from the storage unit 1100 and estimates the similarity among them; a speech model generating unit 1300 that generates a first speech model of the speaker from at least one individual speech data selected on the basis of the similarity estimated by the similarity estimating unit 1200; a determination unit 1400 that determines whether a comparative speech model corresponding to the first speech model exists in the storage unit 2400 of the context-presenting speaker identification system 2000, provides the first speech model to the storage unit 2400 for storage if none exists, and, if one exists, has the similarity estimating unit 1200 estimate the comparative similarity between the first speech model and the comparative speech model; and a speech model editing unit 1500 that replaces the comparative speech model with the first speech model if the comparative similarity is equal to or greater than a predetermined reference value, and combines the first speech model and the comparative speech model to generate a second speech model if the comparative similarity is below the reference value. The second speech model can be provided again to the determination unit 1400 and the speech model editing unit 1500.

In addition, the context-presenting speaker identification system 2000 includes: a speech receiving unit 2100 that receives speech from the speaker; a speech characteristic extracting unit 2200 that extracts speech characteristics from the received speech; a context speech model generating unit 2300 that generates a speech model from the extracted speech characteristics; a storage unit 2400 in which the generated speech models are stored in matrix form; a random number generating unit 2500 that generates a random number used to identify the speaker; a speech model extracting unit 2600 that extracts the speech model at the position corresponding to the generated random number in the matrix-form speech model DB of the storage unit; an utterance requesting unit 2700 that requests a predetermined utterance from the speaker on the basis of the extracted speech model; and a speaker identifying unit 2800 that identifies the speaker by comparing the speech uttered by the speaker with the extracted speech model. The predetermined utterance can be a reading aloud of the word or sentence preset at the position in the matrix-form DB of the storage unit that corresponds to the generated random number.

For example, suppose the word "bank" and a speech model of the utterance of that word are stored in advance in the matrix DB of the storage unit 2400, and the user must utter the word "bank" for voice-based user identification (verification). The utterance requesting unit 2700 can then ask the user to "Pronounce the word bank." The request can be presented to the user as speech, a picture, a message, or the like. A speech model according to an embodiment of the present invention refers to a data set containing a context and utterance pattern information, such as the speaker's manner of pronunciation, for that context. A context refers not only to a specific word (for example, "bank") but also to a series of sentences containing that word.

The word "bank" and the speech model of its utterance can be stored at a predetermined row-and-column position in the matrix DB. When user voice identification is required, the random number generating unit 2500 generates a random number, and the word at the matrix DB position corresponding to that random number can be presented to the user as the word to be uttered.

The context-presenting speech model matrix DB according to an embodiment of the present invention can be configured in N×M form (N and M being positive integers that may be equal or different). For example, as in FIGS. 3 to 5, the context-presenting speech models can be built into a DB as a 20×5 matrix.
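The selection of a word from the matrix DB by random number can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed design: the specification does not say how a random number maps to a matrix position, so the row-major mapping via `divmod`, the function name `pick_challenge`, and the placeholder word list are hypothetical. The 20×5 shape is the one given in the example above.

```python
import secrets
from typing import List, Tuple

ROWS, COLS = 20, 5  # matrix shape from the 20x5 example


def pick_challenge(matrix: List[List[str]]) -> Tuple[int, int, str]:
    """Draw a random number and map it to one cell of the matrix DB.

    Assumed mapping: a number n in [0, rows * cols) addresses the cell at
    row n // cols and column n % cols (row-major order).
    """
    rows, cols = len(matrix), len(matrix[0])
    n = secrets.randbelow(rows * cols)
    row, col = divmod(n, cols)
    return row, col, matrix[row][col]


# Placeholder vocabulary; a real DB would hold the enrolled words.
example_matrix = [[f"word-{r}-{c}" for c in range(COLS)] for r in range(ROWS)]
row, col, word = pick_challenge(example_matrix)
print(f"Please pronounce the word at ({row}, {col}): {word}")
```

`secrets.randbelow` is used here because the prompt should be hard to predict; any uniform random source would serve to show the mapping.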

The context-based speech model management apparatus 1000 can communicate, through the communication unit 1700, with other electronic devices in a network over which communication is possible. For example, the apparatus 1000 can exchange data with the communication unit 2900 of the context-presenting speaker identification system 2000. For convenience of description, FIG. 2 draws the context-based speech model management apparatus 1000 separately from the context-presenting speaker identification system 2000, but the apparatus 1000 can also be implemented as part of the system 2000. The communication units 1700 and 2900 can be a Bluetooth communication module, a BLE (Bluetooth Low Energy) communication module, a near field communication unit, a Wi-Fi communication module, a Zigbee communication module, an infrared (IrDA, Infrared Data Association) communication module, a WFD (Wi-Fi Direct) communication module, a UWB (ultra-wideband) communication module, an Ant+ communication module, or the like, but are not limited to these.

The individual speech data according to an embodiment of the present invention include at least one of the frequency, pitch, formants, utterance duration, and speech rate of each of the speaker's utterances, and the similarity estimation unit 1200 of the context-based speech model management apparatus 1000 can evaluate the similarity between the individual speech data of the respective utterances. Pitch refers to the height of a sound. Voiced speech consists of the fundamental frequency component of vocal fold vibration and its harmonic components. Every vibrating source has its own vibration characteristics (for example, resonance characteristics). The human articulatory organs (for example, the vocal cords) likewise have resonance characteristics at each instant, which change with articulation, and the glottal wave is filtered by these resonance characteristics and expressed as sound. The frequency spectrum of a particular sound (for example, a vowel) shows that several resonance bands are present when the resonance characteristics are expressed. These resonance frequency bands are referred to as formants.
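The description does not say how the similarity between two items of individual speech data is computed. One possible sketch, assuming each utterance is reduced to a numeric feature vector and compared by cosine similarity, is shown below. The `IndividualSpeechData` fields, their units, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IndividualSpeechData:
    """Hypothetical per-utterance record; field names and units are assumed."""
    pitch_hz: float
    formants_hz: List[float]
    duration_s: float
    rate_syllables_per_s: float

    def as_vector(self) -> List[float]:
        # Flatten the features into one vector for comparison.
        return [self.pitch_hz, *self.formants_hz,
                self.duration_s, self.rate_syllables_per_s]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In practice the raw features differ greatly in scale (hertz versus seconds), so a real comparison would likely normalize each feature first; that step is omitted here to keep the sketch short.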

For example, as shown in FIG. 3, when a particular speaker (for example, user B in FIG. 3) utters a predetermined word (for example, "은행" (bank)), the uttered speech is received by the speech receiving unit 2100 and its speech characteristics can be extracted. The extracted speech characteristics can be organized as individual speech data. Referring to FIG. 4, the similarity estimation unit 1200 of the context-based speech model management apparatus 1000 can evaluate the similarity between the individual speech data of the speaker's respective utterances (for example, utterances of "bank" made two weeks ago, one week ago, and yesterday). Based on at least one item of individual speech data selected according to the similarity estimated by the similarity estimation unit 1200 (for example, the data for the utterances of "bank" made one week ago and yesterday), the speech model generation unit 1300 can generate a first speech model of the speaker (for example, user B in FIG. 3).
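The selection and model generation in this example could be sketched as follows. The selection rule (keep utterances sufficiently similar to the most recent one), the 0.75 cut-off, the toy two-dimensional feature vectors, and the use of an element-wise mean as the "first speech model" are assumptions for illustration only; the description states just that data are selected based on similarity.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def select_utterances(utterances: Sequence[Sequence[float]],
                      min_similarity: float = 0.75) -> List[int]:
    """Assumed rule: keep indices whose similarity to the newest utterance is high enough.

    `utterances` is ordered oldest to newest.
    """
    newest = utterances[-1]
    return [i for i, u in enumerate(utterances)
            if cosine_similarity(u, newest) >= min_similarity]


def build_first_model(utterances: Sequence[Sequence[float]],
                      indices: Sequence[int]) -> List[float]:
    """Assumed model: element-wise mean of the selected feature vectors."""
    if not indices:
        raise ValueError("at least one utterance must be selected")
    dim = len(utterances[indices[0]])
    return [sum(utterances[i][d] for i in indices) / len(indices)
            for d in range(dim)]


# Toy vectors for "bank" uttered two weeks ago, one week ago, and yesterday.
utterances = [[1.0, 0.1], [0.1, 1.0], [0.0, 1.0]]
selected = select_utterances(utterances)
first_model = build_first_model(utterances, selected)
```

With these toy vectors the two-week-old utterance is dropped and the model is built from the two recent ones, mirroring the example in the text.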

Referring to FIGS. 1, 2, and 4, the determination unit 1400 determines whether a comparative speech model corresponding to the first speech model exists in the storage unit 2400 of the context-presenting speaker identification system 2000. If no such model exists, the determination unit provides the first speech model to the storage unit 2400 of the context-presenting speaker identification system 2000 to be stored there. If one exists, the determination unit causes the comparative similarity between the first speech model and the comparative speech model to be estimated through the similarity estimation unit 1200.

When the comparative similarity, which is the result estimated by the similarity estimation unit 1200 at the request of the determination unit 1400, is equal to or greater than a predetermined reference value, the speech model editing unit 1500 replaces the comparative speech model with the first speech model. When the comparative similarity is less than the predetermined reference value, the speech model editing unit 1500 can generate a second speech model by combining the first speech model and the comparative speech model. The predetermined reference value may be 51% (or 0.51) or more, and preferably 75% (or 0.75) or more. At or above such a reference value, the speech model can be edited (replaced) reliably.
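The store, replace, or combine decision can be sketched as below. The reference value of 0.75 is the preferred value given in the text; cosine similarity as the measure and element-wise averaging as the way of "combining" two models are assumptions, as is the function name `edit_model`.

```python
import math
from typing import List, Optional, Sequence, Tuple

REFERENCE_VALUE = 0.75  # preferred reference value from the description


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def edit_model(first: Sequence[float],
               comparative: Optional[Sequence[float]],
               reference: float = REFERENCE_VALUE) -> Tuple[str, List[float]]:
    """Return the action taken and the resulting model.

    - no comparative model      -> store the first model
    - similarity >= reference   -> replace the comparative model with the first
    - similarity <  reference   -> combine both into a second model (assumed: mean)
    """
    if comparative is None:
        return "store", list(first)
    if cosine_similarity(first, comparative) >= reference:
        return "replace", list(first)
    second = [(x + y) / 2 for x, y in zip(first, comparative)]
    return "combine", second
```

For instance, `edit_model([1.0, 0.0], [0.0, 1.0])` takes the combine branch because the two toy vectors are orthogonal.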

The second speech model can be provided again to the determination unit 1400 and the speech model editing unit 1500. The determination unit 1400 determines whether a comparative speech model corresponding to the second speech model (the newly regenerated speech model) exists in the storage unit 2400 of the context-presenting speaker identification system 2000. If none exists, the second speech model is provided to the storage unit 2400 of the context-presenting speaker identification system 2000 to be stored; if one exists, the comparative similarity between the second speech model and the comparative speech model is estimated through the similarity estimation unit 1200. This process can be performed repeatedly. Through such repetition, a speech model optimized for the speaker's current voice condition can be stored and managed in the matrix DB.

In addition, the apparatus according to an embodiment of the present invention further includes a period setting unit 1600 for setting a management period of the speech models. When all speech models have been updated within the set management period, the speech model editing unit 1500 maintains the existing matrix-form speech model DB on the storage unit 2400 of the context-presenting speaker identification system 2000. When at least one speech model has not been updated within the set management period, the speech model editing unit 1500 can cause a part of the existing matrix-form speech model DB to be deleted or maintained based on a new first speech model related to the speaker. The management period according to an embodiment of the present invention may be one day, one week, or one month, and may be set individually according to the user's intention. For example, the management period may be set so that the speech model for a particular word ("bank") is managed at one-week intervals, and management periods may also be set per user, so that one user has a one-day management period and another user has a one-month management period.
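Per-word and per-user management periods could be represented as in the sketch below. The `PeriodSettings` class, the one-week default, the rule that a word-specific period takes precedence over a user-specific one, and the approximation of "one month" as 30 days are all assumptions; the description says only that periods may be set individually.

```python
from datetime import datetime, timedelta
from typing import Dict

DEFAULT_PERIOD = timedelta(weeks=1)  # assumed default


class PeriodSettings:
    """Hypothetical holder for management periods set per word or per user."""

    def __init__(self) -> None:
        self._by_word: Dict[str, timedelta] = {}
        self._by_user: Dict[str, timedelta] = {}

    def set_for_word(self, word: str, period: timedelta) -> None:
        self._by_word[word] = period

    def set_for_user(self, user: str, period: timedelta) -> None:
        self._by_user[user] = period

    def period_for(self, user: str, word: str) -> timedelta:
        # Assumed precedence: word-specific, then user-specific, then default.
        if word in self._by_word:
            return self._by_word[word]
        return self._by_user.get(user, DEFAULT_PERIOD)


def is_not_updated(last_updated: datetime, now: datetime,
                   period: timedelta) -> bool:
    """True if the model was not updated within the management period."""
    return now - last_updated > period


settings = PeriodSettings()
settings.set_for_word("bank", timedelta(weeks=1))
settings.set_for_user("user_A", timedelta(days=1))
settings.set_for_user("user_B", timedelta(days=30))  # "one month", approximated
```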

If no new first speech model related to the speaker exists, the speech model editing unit 1500 deletes the at least one non-updated speech model from the matrix-form speech model DB. If a new first speech model exists, the speech model editing unit 1500 compares the at least one non-updated speech model with the new first speech model. If the difference resulting from the comparison falls within a predetermined range, the speech model editing unit 1500 maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system; if the difference is outside that range, the at least one non-updated speech model can be deleted from the matrix-form speech model DB. The allowable range of the difference value may be greater than 0 and up to 15% (or 0.15), and a particular speech model in the existing matrix-form speech model DB (for example, speech model 8 in FIG. 5) is maintained or deleted depending on whether the difference value lies within that range. If the comparison of the new first speech model with the at least one non-updated speech model yields a difference of 40% (or 0.4), the at least one non-updated speech model (for example, speech model 8 in FIG. 5) is deleted from the matrix-form speech model DB.
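The keep-or-delete rule for a non-updated speech model can be sketched as a single predicate. The function name and the convention of passing `None` when no new first speech model exists are assumptions, as is treating the upper bound of 0.15 as inclusive; how the difference value itself is computed is not specified in the description.

```python
from typing import Optional

MAX_DIFFERENCE = 0.15  # upper end of the allowable range from the description


def keep_non_updated_model(difference: Optional[float],
                           max_difference: float = MAX_DIFFERENCE) -> bool:
    """Decide whether a non-updated speech model stays in the matrix DB.

    `difference` is the difference between the non-updated model and the new
    first speech model, or None if no new first speech model exists.
    """
    if difference is None:
        return False  # no new first model: delete the non-updated model
    # Range read literally as "greater than 0, up to 15%".
    return 0.0 < difference <= max_difference
```

With the figures from the text, a difference of 0.4 gives `False` (speech model 8 is deleted), while a difference of 0.1 gives `True` (the DB is kept as is).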

FIG. 6 is a flowchart illustrating a speech model management method using the context-based speech model management apparatus according to an embodiment of the present invention.

A method of managing speech models using the context-based speech model management apparatus according to an embodiment of the present invention may include: (a) generating and storing individual speech data each time speech is received from a speaker (S100); (b) when a plurality of items of individual speech data are stored, extracting each item and estimating the similarity between the individual speech data (S200); (c) generating a first speech model of the speaker according to at least one item of individual speech data selected based on the estimated similarity (S300); (d) determining whether a comparative speech model corresponding to the first speech model exists in the storage unit of the context-presenting speaker identification system, providing the first speech model to the storage unit of the context-presenting speaker identification system to be stored if no such model exists, and causing the comparative similarity between the first speech model and the comparative speech model to be estimated through the similarity estimation unit if one exists (S400); and (e) replacing the comparative speech model with the first speech model when the comparative similarity is equal to or greater than a predetermined reference value, and generating a second speech model by combining the first speech model and the comparative speech model when the comparative similarity is less than the predetermined reference value (S500). In addition, steps (d) S400 and (e) S500 may be performed again, repeatedly, for the second speech model.
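The repetition of steps S400 and S500 for the second speech model can be sketched as a loop. The dictionary standing in for the storage unit, the key format, the `max_rounds` safety cap, cosine similarity as the measure, and averaging as the combination method are all assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def manage_model(store: Dict[str, List[float]], key: str,
                 first: Sequence[float], reference: float = 0.75,
                 max_rounds: int = 10) -> int:
    """Repeat S400/S500 until a model is stored; return the number of combine rounds."""
    candidate = list(first)
    for rounds in range(max_rounds + 1):
        comparative = store.get(key)  # S400: look for a comparative model
        if (comparative is None
                or cosine_similarity(candidate, comparative) >= reference):
            store[key] = candidate    # store a new model, or replace (S500)
            return rounds
        # S500: combine into a second model and feed it back into S400.
        candidate = [(x + y) / 2 for x, y in zip(candidate, comparative)]
    raise RuntimeError("no model accepted within max_rounds")


store = {"user_B/bank": [0.0, 1.0]}
rounds = manage_model(store, "user_B/bank", [1.0, 0.0])
```

Because each averaging step moves the candidate toward the stored comparative model, the similarity rises on every round in this sketch, so the loop ends once it reaches the reference value.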

In addition, the method according to an embodiment of the present invention may further include setting a management period of the speech models by the period setting unit of the apparatus described above (S10). The step of setting the management period may be performed before S100, or may be performed at any time so that the user sets the management period at an arbitrary point.

In addition, when all speech models have been updated within the set management period, the speech model editing unit 1500 of the apparatus 1000 maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system 2000. When at least one speech model has not been updated within the set management period, the speech model editing unit 1500 can cause a part of the existing matrix-form speech model DB to be deleted or maintained based on a new first speech model related to the speaker.

If no new first speech model related to the speaker exists, the speech model editing unit 1500 deletes the at least one non-updated speech model from the matrix-form speech model DB. If a new first speech model exists, the speech model editing unit 1500 compares the at least one non-updated speech model with the new first speech model; if the difference resulting from the comparison falls within a predetermined range, it maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system 2000, and if the difference is outside the range, it can delete the at least one non-updated speech model from the matrix-form speech model DB.

The description of the apparatus given above also applies to the operating method according to an embodiment of the present invention. Accordingly, for the method, explanations that would duplicate the description of the apparatus have been omitted.

Meanwhile, the method described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable medium. The structure of the data used in the method can also be recorded on a computer-readable medium by various means. A recording medium that records an executable computer program or code for performing the various methods of the present invention should not be understood to include transitory objects such as carrier waves or signals. The computer-readable medium may include storage media such as magnetic storage media (for example, ROM, floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and DVDs).

The foregoing description of the present invention is given by way of example, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

1000: context-based speech model management apparatus
1100, 2400: storage unit
1200: similarity estimation unit
1300: speech model generation unit
1400: determination unit
1500: speech model editing unit
1600: period setting unit
1700, 2900: communication unit
2000: context-presenting speaker identification system
2100: speech receiving unit
2200: speech characteristic extraction unit
2300: contextual speech model generation unit
2500: random number generation unit
2600: speech model extraction unit
2700: speech utterance request unit
2800: speaker identification unit

Claims

1. A context-based speech model management apparatus,
Wherein the apparatus is capable of interworking with a context-presenting speaker identification system, the apparatus comprising:
A storage unit for storing individual speech data generated each time speech is received from a speaker;
A similarity estimation unit for extracting each item of individual speech data from the storage unit and estimating the similarity between the individual speech data when a plurality of items of individual speech data are stored in the storage unit;
A speech model generation unit for generating a first speech model of the speaker according to at least one item of individual speech data selected based on the similarity estimated by the similarity estimation unit;
A determination unit for determining whether a comparative speech model corresponding to the first speech model exists in a storage unit of the context-presenting speaker identification system, providing the first speech model to the storage unit of the context-presenting speaker identification system to be stored if the comparative speech model does not exist, and causing a comparative similarity between the first speech model and the comparative speech model to be estimated through the similarity estimation unit if the comparative speech model exists;
A speech model editing unit for replacing the comparative speech model with the first speech model when the comparative similarity, which is the result estimated by the similarity estimation unit at the request of the determination unit, is equal to or greater than a predetermined reference value, and for generating a second speech model by combining the first speech model and the comparative speech model when the comparative similarity is less than the predetermined reference value; and
A period setting unit for setting a management period of the speech models,
Wherein, when all the speech models have been updated within the set management period, the speech model editing unit maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system, and when at least one speech model has not been updated within the set management period, the speech model editing unit causes a part of the existing matrix-form speech model DB to be deleted or maintained based on a new first speech model related to the speaker, and
Wherein the second speech model is provided to the determination unit and the speech model editing unit.

2. The apparatus according to claim 1,
Wherein the context-presenting speaker identification system comprises:
A speech receiving unit for receiving speech from a speaker;
A speech characteristic extraction unit for extracting speech characteristics from the received speech;
A contextual speech model generation unit for generating a speech model based on the extracted speech characteristics;
A storage unit for storing the generated speech models in matrix form;
A random number generation unit for generating a random number to be used for identifying the speaker;
A speech model extraction unit for extracting the speech model at the position corresponding to the generated random number on the matrix-form speech model DB of the storage unit;
A speech utterance request unit for requesting a predetermined speech utterance from the speaker based on the extracted speech model; and
A speaker identification unit for identifying the speaker by comparing the speech uttered by the speaker with the extracted speech model,
Wherein the predetermined speech utterance is an utterance of a word or sentence set in advance at the position, on the matrix-form DB of the storage unit, corresponding to the generated random number.

3. The apparatus according to claim 1,
Wherein the individual speech data include at least one of the frequency, pitch, formants, utterance duration, and speech rate of each of the speaker's utterances, and
Wherein the similarity estimation unit of the context-based speech model management apparatus evaluates the similarity between the individual speech data of the speaker's respective utterances.

4. (Deleted)

5. The apparatus according to claim 1,
Wherein the speech model editing unit deletes the at least one non-updated speech model from the matrix-form speech model DB if no new first speech model related to the speaker exists, and
If the new first speech model exists, compares the at least one non-updated speech model with the new first speech model, maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system if the difference resulting from the comparison falls within a predetermined range, and deletes the at least one non-updated speech model from the matrix-form speech model DB if the difference is outside the range.

6. A method of managing speech models using a context-based speech model management apparatus,
Wherein the apparatus is capable of interworking with a context-presenting speaker identification system, the method comprising:
(a) generating and storing individual speech data each time speech is received from a speaker;
(b) when a plurality of items of individual speech data are stored, extracting each item of individual speech data and estimating the similarity between the individual speech data;
(c) generating a first speech model of the speaker according to at least one item of individual speech data selected based on the estimated similarity;
(d) determining whether a comparative speech model corresponding to the first speech model exists in a storage unit of the context-presenting speaker identification system, providing the first speech model to the storage unit of the context-presenting speaker identification system to be stored if the comparative speech model does not exist, and causing a comparative similarity between the first speech model and the comparative speech model to be estimated through a similarity estimation unit if the comparative speech model exists;
(e) replacing the comparative speech model with the first speech model when the comparative similarity is equal to or greater than a predetermined reference value, and generating a second speech model by combining the first speech model and the comparative speech model when the comparative similarity is less than the predetermined reference value; and
Setting a management period of the speech models by a period setting unit of the apparatus,
Wherein, when all the speech models have been updated within the set management period, a speech model editing unit of the apparatus maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system, and when at least one speech model has not been updated within the set management period, the speech model editing unit causes a part of the existing matrix-form speech model DB to be deleted or maintained based on a new first speech model related to the speaker, and
Wherein steps (d) and (e) are performed again for the second speech model.

7. (Deleted)

8. The method according to claim 6,
Wherein the speech model editing unit deletes the at least one non-updated speech model from the matrix-form speech model DB if no new first speech model related to the speaker exists, and
If the new first speech model exists, compares the at least one non-updated speech model with the new first speech model, maintains the existing matrix-form speech model DB on the storage unit of the context-presenting speaker identification system if the difference resulting from the comparison falls within a predetermined range, and deletes the at least one non-updated speech model from the matrix-form speech model DB if the difference is outside the range.

9. A computer-readable recording medium on which a program for implementing the method of claim 6 or 8 is recorded.