KR20010069650A

KR20010069650A - Method of Recognizing Vocabulary Comprised by Number and Simultaneously Identifying Speaker and System Therefor

Info

Publication number: KR20010069650A
Application number: KR1020010022225A
Authority: KR
Inventors: 이윤근
Original assignee: 백종관; (주) 보이스웨어
Priority date: 2001-04-25
Filing date: 2001-04-25
Publication date: 2001-07-25

Abstract

PURPOSE: A system and method for recognizing a number string and authenticating the speaker at the same time are provided, improving the speech recognition rate by combining speech recognition with speaker recognition. CONSTITUTION: A memory stores a number string that is transmitted over a communication network and used for authentication, together with the voiceprint information corresponding to that number string. A speech recognizer containing a speech model database selects number strings similar to the utterance according to the utterance characteristics of the words, independently of the speaker's voice. A speaker recognizer compares the stored voiceprint information associated with the selected number strings with the voiceprint information extracted from the speaker's utterance.

Description

Method of Recognizing Vocabulary Comprised by Number and Simultaneously Identifying Speaker and System Therefor

Field of the Invention

The present invention relates to a method of recognizing a number string while simultaneously authenticating the speaker. More specifically, it relates to a method of selecting a plurality of speech recognition candidates for a number string composed of digits, and then evaluating the voiceprint information of the selected candidates to choose the single number string with the highest similarity.

Background of the Invention

The present invention relates to a method of recognizing a spoken number string and authenticating the speaker through it. Number-string speech recognition is a very useful technique, given how widely numbers are used in forms such as resident registration numbers and account numbers.

However, Korean digits are all monosyllabic (for example, 영 (0), 일 (1), 이 (2), 삼 (3)), and they include words with similar sounds, such as 일 (1) and 칠 (7), or 오 (5) and 구 (9). This has lowered recognition accuracy for spoken number strings. As a result, despite its many uses, number-string recognition has been slow to reach practical use in current speech recognition technology.

Moreover, even if a single digit is recognized with high probability, the probability that every digit in a sequence is recognized correctly is in practice much lower. For example, if the recognition rate for one digit is 95%, the probability of correctly recognizing a string of ten consecutive digits is only about 60%, and it falls further as the string gets longer.
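The 60% figure can be checked with a short calculation. The sketch below assumes that each digit is recognized independently with the same accuracy, which the text implies but does not state; the function name `string_accuracy` is illustrative only.

```python
# Probability that every digit of a number string is recognized correctly,
# ASSUMING independent per-digit errors with a constant per-digit accuracy.
def string_accuracy(per_digit_accuracy: float, length: int) -> float:
    return per_digit_accuracy ** length


# Ten digits is the example from the text; 13 digits is the length of a
# Korean resident registration number.
for n in (1, 10, 13):
    print(n, round(string_accuracy(0.95, n), 3))
```

For ten digits this gives roughly 0.599, which matches the approximate 60% stated above, and the value keeps falling as the string gets longer.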

The speaker authentication method of the present invention can be used to recognize account numbers, resident registration numbers, passwords, and the like in phone-banking services, or to enter a resident registration number or password when civil-service tasks of public institutions are handled through an automatic response service.

All of these applications require user authentication to verify the user's identity. With current technology, a voice-based speaker verification algorithm can authenticate users over the telephone network with an accuracy of 98% or higher.

The present invention combines these two elements, number-string speech recognition and speaker authentication, to propose a method and system for authenticating a speaker from a number string.

An object of the present invention is to provide an authentication method that verifies a user's identity from a number string, composed of digits, that is used for speaker authentication.

Another object of the present invention is to provide an authentication method that verifies a user's identity through a number string composed of digits, such as a resident registration number, which is basic information used in an individual's social life.

Another object of the present invention is to provide an authentication method that verifies a user's identity through a number string used for speaker authentication together with the speaker's unique voiceprint information.

Still another object of the present invention is to provide a number-string speaker authentication method that achieves a considerably higher recognition rate by adding the speaker's voiceprint information to number-string speech recognition, which has conventionally shown a low recognition rate.

Still another object of the present invention is to provide an authentication method that can be used to recognize account numbers, resident registration numbers, passwords, and the like in phone-banking services, that can readily be used in various other industries such as civil-service work at public institutions, and that ensures a high recognition rate in those cases as well.

Still another object of the present invention is to provide a more reliable authentication method for confirming that a user is the genuine person, using a number string that indicates an individual's identity, such as a resident registration number, together with the voiceprint, which is information that uniquely identifies the speaker.

The above and other objects of the present invention can all be achieved by the present invention described in detail below.

FIG. 1 is a block diagram schematically illustrating an authentication system constructed in accordance with the present invention.

FIG. 2 is a flowchart illustrating the operation process according to an embodiment of the present invention.

* Brief description of the main reference numerals in the drawings *

100: terminal 200: authentication server

210: speech model database 230: number string database

250: voiceprint database 270: speaker recognizer

290: speech recognizer 300: operator server

Summary of the Invention

The present invention relates to a method of selecting a plurality of speech recognition candidates for a number string composed of digits, and then evaluating the voiceprint information of the selected candidates to choose the single number string with the highest similarity.

The speaker authentication method according to the present invention comprises a registration process, in which a number string used for speech recognition and authentication is registered, and a process of recognizing a spoken number string and authenticating the speaker on the basis of the registered number strings.

The registration process comprises the steps of: entering, through a terminal, a number string used for authentication, such as the speaker's resident registration number, account number, or password, and transmitting it to an authentication server over a communication network; and extracting the speaker's voiceprint information, delivered over the communication network, on the basis of the number string entered through the terminal, building a number string database and a voiceprint database in correspondence with the number string entered by the speaker, and storing them in a memory.

The process of recognizing the number string and authenticating the speaker comprises the steps of: extracting, by a speech recognizer, feature parameters from the number-string speech transmitted over the communication network, and selecting a plurality of recognition candidate number strings from a speech model database included in the speech recognizer; comparing the candidate number strings selected by the speech recognizer with the number strings that speakers stored in the number string database through the registration process, and selecting the number strings that match; retrieving from the voiceprint database the voiceprint information corresponding to the selected number strings, comparing it with the voiceprint information extracted from the transmitted number-string speech, and having a speaker recognizer select the single number string with the highest similarity and output it as the recognition result; calculating a threshold value for the selected number string and determining whether it is higher than a preset value; and, if the calculated threshold value is higher than the preset value, authenticating the speaker as a registered person.

An authentication server according to another aspect of the present invention comprises: a memory that stores, in correspondence with each other, a number string used for authentication, such as an individual's resident registration number, account number, or password transmitted over a communication network, and the individual's voiceprint information generated for that number string; a speech recognizer that includes a speech information database, which is the basic information for recognizing the corresponding number string from the speech uttered by a speaker, and that selects from the speech information database a plurality of number strings similar to the speaker's pronunciation according to the utterance characteristics of the words, independently of the speaker's voice; and a speaker recognizer that retrieves the individual's voiceprint information associated with the number strings stored in the memory that correspond to the plurality of number strings selected by the speech recognizer, extracts the speaker's voiceprint information, compares the two, and outputs the result as the authentication result if the comparison value is higher than a preset threshold.

An authentication system according to yet another aspect of the present invention connects a terminal and an authentication server through a communication network and authenticates a speaker by means of a number string delivered from the terminal.

The authentication server comprises: a memory that stores, in correspondence with each other, a number string used for authentication, such as an individual's resident registration number, account number, or password transmitted over the communication network, and the individual's voiceprint information generated for that number string; a speech recognizer that includes a speech information database, which is the basic information for recognizing the corresponding number string from the speech uttered by a speaker, and that selects from the speech information database a plurality of number strings similar to the speaker's pronunciation according to the utterance characteristics of the words, independently of the speaker's voice; and a speaker recognizer that retrieves the individual's voiceprint information associated with the number strings stored in the memory that correspond to the plurality of number strings selected by the speech recognizer, extracts the speaker's voiceprint information, compares the two, and outputs the result as the authentication result if the comparison value is higher than a preset threshold.

The terminal comprises: means for entering a number string used for authentication, such as an individual's resident registration number, account number, or password, to be delivered to the authentication server over the communication network; and means for receiving the individual's voice and transmitting it to the authentication server.

Detailed Description of the Invention

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. As used in this specification, a server includes any device that is connected to a communication network and provides a specific service to users; the term is not limited to a computer that provides information or peripheral devices in order to process user commands in a computer network.

FIG. 1 is a block diagram schematically illustrating a system that implements the present invention. The system comprises a terminal (100), an authentication server (200), and a service server (300). The service server (300) is a device that confirms user authentication through the authentication process of the present invention and provides the corresponding service.

The terminal (100) covers any person or device that receives the service provided by the service server (300) after the authentication process, and the type of terminal varies with the communication network (400). For example, when the public switched telephone network is used as the communication network, an ordinary landline telephone serves as the terminal; when the individual uses the Internet, a personal computer serves as the terminal.

The terminal basically includes a device for entering a number string, made up of the combination of digits needed to authenticate the speaker, and a transmission device for delivering the speaker's voice. For example, the input device may be a keypad with number keys, and the transmission device may be a modem with a microphone that receives the voice. The invention is not limited to these, however, and any conventional device may be used within the technical scope of the invention.

The authentication server (200) is the device that implements the present invention. It comprises an engine (290) for recognizing the speaker's speech (hereinafter the 'speech recognizer') and an engine (270) for extracting and verifying the voiceprint from the speaker's voice (hereinafter the 'speaker recognizer').

The speech recognizer (290) selects candidate number strings for number-string speech recognition from its speech model database (210), and the speaker recognizer (270) extracts the speaker's feature parameters for the candidate number strings selected by the speech recognizer (290) and performs speaker authentication.

In addition, the authentication server (200) stores and manages, as a database (230), the number strings used for the speaker's authentication and for use of the service, such as an individual's resident registration number, account number, or password, and includes a voiceprint database (250) holding the speaker's feature parameters for each number string, that is, the voiceprint information.
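As a rough illustration of how the number string database (230) and the voiceprint database (250) might be kept in correspondence, the following sketch stores both under a shared speaker key. The class name `EnrollmentStore`, the speaker identifiers, and the use of a plain list of floats as a voiceprint are all assumptions made for illustration; the specification does not prescribe a storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EnrollmentStore:
    """Hypothetical stand-in for the server memory holding databases 230 and 250."""
    number_strings: Dict[str, str] = field(default_factory=dict)       # database 230
    voiceprints: Dict[str, List[float]] = field(default_factory=dict)  # database 250

    def register(self, speaker_id: str, number_string: str,
                 voiceprint: List[float]) -> None:
        # Both entries share the same key so that they stay in correspondence.
        if not number_string.isdigit():
            raise ValueError("number string must contain digits only")
        self.number_strings[speaker_id] = number_string
        self.voiceprints[speaker_id] = list(voiceprint)

    def speakers_with(self, number_string: str) -> List[str]:
        # Several speakers may have registered the same number string.
        return [sid for sid, s in self.number_strings.items() if s == number_string]


store = EnrollmentStore()
store.register("hong", "123456", [0.2, 0.7, 0.1])
store.register("yi", "123456", [0.6, 0.1, 0.3])
print(store.speakers_with("123456"))
```

Keying both tables by speaker rather than by number string is one possible design choice; it allows two speakers to share a number string without overwriting each other's voiceprint.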

Reference numeral 400 denotes the communication network, which provides communication between the terminal (100) and the authentication server (200). Any conventional network may be used as long as it does not depart from the technical idea of the invention. For example, when the invention is used for phone banking, the communication network will be the public switched telephone network; when it is used for Internet banking, it will be the Internet.

FIG. 2 is a flowchart illustrating the number-string speaker authentication process according to an embodiment of the present invention. First, the authentication server (200) receives from the terminal (100) a number string used with the service, such as an individual's resident registration number, account number, or password (S100). As described in detail below, the number string is stored in a designated memory together with its corresponding voiceprint information, and these are managed as the number string database (230) and the voiceprint database (250).

Any communication method may be used between the server (200) and the terminal (100). Examples include a wired Internet network based on TCP/IP, a wireless Internet network based on WAP, a mobile phone network, or the public switched telephone network used in phone-banking services.

Based on the number string transmitted to the server (200), the authentication server (200) then extracts the speaker's voice characteristics, that is, the feature parameters corresponding to the voiceprint (S200). The speaker pronounces the transmitted number string several times, and the authentication server (200) extracts the speaker's feature parameters from these utterances.

Some feature parameters show the same speech characteristics for the same letter or digit regardless of the speaker, but others differ from speaker to speaker even for the same word because of differences in oral structure, pronunciation habits, timbre, and so on. The feature parameters extracted in this step (S200) are the speech information that varies by speaker (hereinafter 'voiceprint information'); the step serves to extract the information that constitutes the voiceprint.

The extracted voiceprint information and the information about the corresponding number string, that is, the text of the digits the speaker entered in step S100, are stored in correspondence with each other in a designated memory of the server, namely the number string database (230) and the voiceprint database (250).
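The registration steps S100 and S200 can be sketched as follows. The specification says only that the speaker pronounces the number string several times and that feature parameters are extracted; it does not say how the utterances are combined. Averaging the per-utterance feature vectors, as done here, is an assumption, and `enroll_voiceprint` is a hypothetical name. Feature extraction itself is outside the sketch, so the vectors are taken as given.

```python
from typing import List


def enroll_voiceprint(utterance_features: List[List[float]]) -> List[float]:
    """Combine feature vectors from repeated utterances into one voiceprint.

    ASSUMPTION: the voiceprint is the element-wise mean of the vectors.
    """
    if not utterance_features:
        raise ValueError("at least one utterance is required")
    dim = len(utterance_features[0])
    if any(len(vec) != dim for vec in utterance_features):
        raise ValueError("all feature vectors must have the same length")
    count = len(utterance_features)
    return [sum(vec[i] for vec in utterance_features) / count for i in range(dim)]


# Two repetitions of the same number string by the same speaker (made-up values).
voiceprint = enroll_voiceprint([[1.0, 2.0], [3.0, 4.0]])
print(voiceprint)
```

The resulting vector would then be stored alongside the text of the number string entered in step S100.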

The steps up to this point register the speaker's voiceprint information together with the corresponding number string for the subsequent authentication process. The voiceprint information obtained in these steps, and the number string information that corresponds to it, are used later to recognize the spoken number string and to authenticate the speaker.

To authenticate by voice, the speaker utters the number string registered in step S100, such as the resident registration number, account number, or password (S300). The speaker's voice is input to the terminal, transmitted to the authentication server over the communication network, and processed there in the following steps.

Next, the speech recognizer (290) of the authentication server (200) extracts feature parameters from the speaker's utterance of the number string (S400). The feature parameters extracted in this step are not the voiceprint information extracted in the step above; they are the characteristics that are specific to each word or digit regardless of the speaker. Using these feature parameters, the speech recognizer selects a plurality of number strings similar to the uttered pronunciation. This guards against the limited accuracy of the speech recognizer by raising the probability that the number string actually spoken is among the selected strings, so that the later speaker recognition step can use voiceprint information to pick one of them. It also covers the case where several users have registered the same password: the speech recognizer selects all of them, and the speaker recognition step then uses voiceprint information to pick a single number string.

For example, a speaker named 'Hong Gil Dong' may register '123456' as a password, and 'Yi Sun Shin' may register the same password '123456'.
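The candidate selection described above can be sketched as a filter over the registered number strings. The sketch assumes the speech recognizer has already produced a list of candidate strings (an N-best list); the function name `match_candidates`, the speaker labels, and the sample strings are hypothetical.

```python
from typing import Dict, List, Tuple


def match_candidates(n_best: List[str],
                     registered: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (speaker, number string) pairs whose registered string appears
    among the recognizer's candidate strings."""
    wanted = set(n_best)
    return [(speaker, s) for speaker, s in registered.items() if s in wanted]


registered = {
    "Hong Gil Dong": "123456",
    "Yi Sun Shin": "123456",   # same password as Hong Gil Dong
    "third speaker": "555555",
}
# Candidates from the recognizer; '1' and '7' sound alike in Korean, so both
# readings of the first digit are kept.
n_best = ["123456", "723456"]
print(match_candidates(n_best, registered))
```

Both speakers who registered '123456' survive the filter, which is why the voiceprint comparison in the next step is needed to choose between them.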

Through the operation of the speech recognizer (290), a plurality of recognition candidate number strings are selected from the speech model database (210), and the speaker authentication process is then performed on the selected candidates.

Each number string selected by the speech recognizer (290) has associated with it the feature parameters of the speaker registered in step S200, that is, the voiceprint information. Using this, the speaker recognizer, operating alongside the speech recognizer (290), extracts the speaker's voiceprint information from the voice transmitted to the authentication server in step S400. It then retrieves from the voiceprint database the voiceprint information corresponding to the number strings selected by the speech recognizer (290), compares the two sets of voiceprint information, that is, the voiceprint extracted from the speaker and the voiceprint retrieved from the voiceprint database, and selects the single number string with the highest similarity.
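The comparison step can be sketched as a nearest-match search over the candidates' stored voiceprints. The specification does not name a similarity measure; cosine similarity is used here purely as an assumption, and `pick_best`, the speaker labels, and the vectors are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best(probe: List[float],
              candidates: Dict[str, Tuple[str, List[float]]]
              ) -> Optional[Tuple[str, str, float]]:
    """candidates maps speaker -> (registered number string, stored voiceprint).

    Returns (speaker, number string, similarity) for the closest stored
    voiceprint, or None when there are no candidates.
    """
    best: Optional[Tuple[str, str, float]] = None
    for speaker, (number_string, stored) in candidates.items():
        score = cosine_similarity(probe, stored)
        if best is None or score > best[2]:
            best = (speaker, number_string, score)
    return best


candidates = {
    "Hong Gil Dong": ("123456", [0.9, 0.1]),
    "Yi Sun Shin": ("123456", [0.0, 1.0]),
}
probe = [1.0, 0.0]  # voiceprint extracted from the incoming utterance (made up)
print(pick_best(probe, candidates))
```

With these made-up vectors the probe is closest to the first speaker's stored voiceprint, so that speaker's number string is the one selected.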

The number string selected in this step (S500) is not output immediately as the result of speaker authentication, because even after the speaker authentication process above, the recognition result value can be lowered by a number of factors.

Accordingly, the speaker recognizer (270) makes a final determination of whether the value for the selected number string is higher than a preset threshold, and if it is, accepts the number string selected in the step above as the authentication result.

If the value is lower than the threshold, the process may return to the step of uttering the number string for authentication (S300) and repeat steps S400 to S600, or it may be configured to perform another step, such as verifying the password registered by the speaker using the terminal's keypad.
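The final decision can be sketched as a loop over repeated attempts. The specification offers retrying and falling back to keypad entry as alternatives; combining them with a retry limit, as below, is an assumption, as are the names `decide` and `max_attempts` and the result labels.

```python
from typing import Iterable, Tuple


def decide(best_scores: Iterable[float], threshold: float,
           max_attempts: int = 3) -> Tuple[str, int]:
    """best_scores yields the top similarity value of each attempt, that is,
    one value per repetition of steps S300 to S600.

    Returns ("accepted", n) once a value exceeds the threshold, or
    ("fallback", n) after max_attempts failures, where the caller would switch
    to another check such as keypad entry of the registered password.
    """
    attempts = 0
    for score in best_scores:
        attempts += 1
        if score > threshold:
            return ("accepted", attempts)
        if attempts == max_attempts:
            break
    return ("fallback", attempts)


print(decide([0.4, 0.9], threshold=0.8))
print(decide([0.1, 0.2, 0.3, 0.95], threshold=0.8))
```

In the first call the second attempt exceeds the threshold and is accepted; in the second call the limit of three attempts is reached before the high-scoring fourth attempt, so the fallback path is taken.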

According to the present invention, the recognition rate for spoken digits, which has conventionally been low, improves dramatically, and the authentication method of the present invention, which combines speech recognition with speaker recognition, provides excellent results across a wide range of industrial applications.

Simple modifications and variations of the present invention can readily be made by those of ordinary skill in the art, and all such modifications and variations are considered to fall within the scope of the present invention.

Claims

A speaker authentication method comprising a process of registering a number string used for speech recognition and authentication, and a process of recognizing a spoken number string and authenticating the speaker on the basis of the registered number string, wherein the registration process comprises the steps of:

entering, through a terminal, a number string used for authentication, such as the speaker's resident registration number, account number, or password, and transmitting it to an authentication server over a communication network; and

extracting voiceprint information of the speaker, delivered over the communication network, on the basis of the number string entered through the terminal, creating a number string database and a voiceprint database in correspondence with the number string entered by the speaker in the preceding step, and storing them in a memory;

and wherein the process of recognizing the number string and authenticating the speaker comprises the steps of:

extracting, by a speech recognizer, feature parameters from the number-string speech transmitted over the communication network, and selecting a plurality of recognition candidate number strings from a speech model database included in the speech recognizer;

comparing the candidate number strings selected by the speech recognizer with the number strings that speakers stored in the number string database through the registration process, selecting the number strings that match, retrieving from the voiceprint database the voiceprint information corresponding to the selected number strings, comparing it with the voiceprint information extracted from the number-string speech transmitted over the communication network, and selecting and outputting, by a speaker recognizer, the single number string with the highest similarity as the recognition result;

calculating a threshold value for the selected number string and determining whether it is higher than a preset value; and

if the calculated threshold value is higher than the preset value, authenticating that the speaker is a registered person.

The method of claim 1, wherein, when the calculated threshold value is lower than the preset value, the method returns to the step of selecting the recognition candidate number strings.

An authentication server comprising:

a memory that stores, in correspondence with each other, a number string used for authentication, such as an individual's resident registration number, account number, or password transmitted over a communication network, and the individual's voiceprint information generated for that number string;

a speech recognizer that includes a speech model database, which is the basic information for recognizing the corresponding number string from the speech uttered by a speaker, and that selects from the speech model database a plurality of number strings similar to the speaker's pronunciation according to the utterance characteristics of the words, independently of the speaker's voice; and

a speaker recognizer that retrieves the individual's voiceprint information associated with the number strings stored in the memory that correspond to the plurality of number strings selected by the speech recognizer from the speech model database, extracts the speaker's voiceprint information, compares the two, and outputs the result as an authentication result when the comparison value is higher than a preset threshold.

The authentication server of claim 3, wherein, when the result value is lower than the threshold, the speaker recognizer does not output an authentication result and transmits the determination result to the speech recognizer so that recognition candidate number strings are selected again.

An authentication system that connects a terminal and an authentication server through a communication network and performs speaker authentication by means of a number string transmitted from the terminal, wherein the authentication server comprises:

a memory that stores, in correspondence with each other, a number string used for authentication, such as an individual's resident registration number, account number, or password transmitted over the communication network, and the individual's voiceprint information generated for that number string; and

a speaker recognizer that retrieves the individual's voiceprint information associated with the number strings stored in the memory that correspond to the plurality of number strings selected by the speech recognizer from the speech model database, extracts the speaker's voiceprint information, compares the two, and outputs the result as an authentication result when the comparison value is higher than a preset threshold;

and wherein the terminal comprises:

means for entering a number string used for authentication, such as the speaker's resident registration number, account number, or password, to be transmitted over the communication network; and

means for receiving the individual's voice and transmitting it to the authentication server.

The authentication system of claim 5, wherein, when the result value is lower than the threshold, the speaker recognizer does not output an authentication result and transmits the determination result to the speech recognizer so that recognition candidate number strings are selected again.