KR20200066030A

KR20200066030A - Method for voice registration and certification using voice id system

Info

Publication number: KR20200066030A
Application number: KR1020180153059A
Authority: KR
Inventors: 조태영; 최문철
Original assignee: (주)아틀라스랩스
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-09

Abstract

Disclosed is a method for voice registration and certification using a voice ID system capable of recognizing a voice of a user. The method comprises the steps of: obtaining voice data of a user; generating a unique identifier corresponding to the obtained voice data; obtaining a voice input of the user; and verifying the voice input based on the unique identifier.

Description

Voice registration and authentication method using voice ID system {METHOD FOR VOICE REGISTRATION AND CERTIFICATION USING VOICE ID SYSTEM}

The present invention relates to a voice registration and authentication method using a voice ID system.

Verifying a user's utterance with a speech recognizer is very effective at filtering out meaningless data, but relying on a speech recognizer also introduces limitations of its own.

When the content of a user's utterance has not been learned by the speech recognizer's language model, the recognizer tends to recognize it as the most similar content it has learned. This can cause problems when building acoustic model training data intended to capture accurate pronunciation.

For example, in Korean, when a user speaks in a regional dialect or uses newly coined Internet terms, the actual audio and the recognized text differ, so the result cannot be used as training data.

Therefore, to overcome these limitations and improve the quality of crowdsourced data, a method is needed that has the user first express what they intend to say as text and, when the user utters that self-written text, quantifies how phonetically similar the sound recognized by the recognizer is to the text the user actually intended.

The problem to be solved by the present invention is to provide a voice registration and authentication method using a voice ID system.

The problems to be solved by the present invention are not limited to the problem mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

A voice registration and authentication method using a voice ID system according to one aspect of the present invention for solving the above problem includes: obtaining voice data of a user; generating a unique identifier corresponding to the obtained voice data; obtaining a voice input of the user; and verifying the voice input based on the unique identifier.

In addition, generating the unique identifier may include: extracting a voice matrix from the voice data using an integrated voice model generated based on a reference model; and registering a voice ID of the user using the voice matrix and an ID of the user.

In addition, performing the verification may include: extracting a voice matrix from the voice input using the integrated voice model; and performing a search and scoring based on the voice matrix extracted from the voice input and the user ID, wherein the scoring is performed using a scoring model generated based on the reference model.

Other specific details of the present invention are included in the detailed description and the drawings.

According to the disclosed embodiment, it is possible to provide a system that registers in advance, through voice ID registration, the voices of users who each have unique characteristics, and that recognizes a user's voice on that basis.

On this basis, the system has the advantage of combating computer fraud, such as audio generated with a text-to-speech system, and of authenticating users on behalf of third-party voice applications.

The effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram illustrating a voice registration and authentication method using a voice ID system according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those skilled in the art to which the present invention pertains are fully informed of its scope; the present invention is defined only by the scope of the claims.

The terminology used in this specification is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless specifically stated otherwise. As used in this specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those mentioned. Throughout the specification, the same reference numerals refer to the same components, and "and/or" includes each of the mentioned components and every combination of one or more of them. Although "first", "second", and so on are used to describe various components, these components are not limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those skilled in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

As used in this specification, the term "unit" or "module" refers to software or to a hardware component such as an FPGA or ASIC, and a "unit" or "module" performs certain roles. However, "unit" or "module" is not limited to software or hardware. A "unit" or "module" may be configured to reside in an addressable storage medium or may be configured to execute on one or more processors. Thus, as an example, a "unit" or "module" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and "units" or "modules" may be combined into a smaller number of components and "units" or "modules", or further separated into additional components and "units" or "modules".

In this specification, a computer means any kind of hardware device that includes at least one processor and, depending on the embodiment, may also be understood to encompass the software configuration operating on that hardware device. For example, a computer may be understood to include, without limitation, smartphones, tablet PCs, desktops, laptops, and the user clients and applications running on each device.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Each method and step described in this specification is described as being performed by a computer, but the entity performing each step is not limited thereto; depending on the embodiment, at least some of the steps may be performed by different devices.

The following discloses a voice authentication method that performs authentication using a user's voice, together with a data collection and training method for carrying it out.

Referring to FIG. 1, the voice ID system broadly consists of three processes: a model training process, a voice registration process, and a voice authentication process.

In one embodiment, a process of generating an integrated voice model and a scoring model based on reference model training may be performed.

For example, the reference model is trained on all of the collected voice data so that an integrated voice model can be generated. The integrated voice model is used to extract a voice matrix from a user's voice, and the scoring model may be used to calculate a similarity score between registered voice data and received voice data.

In one embodiment, a voice model trainer may generate the integrated voice model and the scoring model by training on each speaker's ID and an audio corpus.
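The training process described above can be illustrated with a deliberately small sketch. The source does not specify the model architecture, so everything here is an assumption made for illustration: the reference model is reduced to corpus-wide mean and standard deviation, the "integrated voice model" is a function that normalizes an utterance against those statistics and summarizes it as a 2 x dim matrix, and the "scoring model" is cosine similarity. The names `ReferenceModel`, `train_reference_model`, `extract_matrix`, and `score` are hypothetical, as are the precomputed per-frame feature vectors.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List

Frame = List[float]        # hypothetical per-frame acoustic feature vector
Utterance = List[Frame]    # one recording = a sequence of frames
Matrix = List[List[float]]


@dataclass
class ReferenceModel:
    """Corpus-wide statistics shared by the extractor and the scorer."""
    mean: List[float]
    std: List[float]


def train_reference_model(corpus: Dict[str, List[Utterance]]) -> ReferenceModel:
    """Fit per-dimension mean/std over every frame of every speaker."""
    frames = [f for utts in corpus.values() for u in utts for f in u]
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Fall back to 1.0 when a dimension has zero spread, to avoid dividing by 0.
    std = [sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n) or 1.0
           for d in range(dim)]
    return ReferenceModel(mean, std)


def extract_matrix(ref: ReferenceModel, utt: Utterance) -> Matrix:
    """Stand-in 'integrated voice model': normalize frames against the
    reference model, then summarize (row 0: means, row 1: spreads)."""
    dim = len(ref.mean)
    z = [[(f[d] - ref.mean[d]) / ref.std[d] for d in range(dim)] for f in utt]
    mu = [sum(row[d] for row in z) / len(z) for d in range(dim)]
    sd = [sqrt(sum((row[d] - mu[d]) ** 2 for row in z) / len(z))
          for d in range(dim)]
    return [mu, sd]


def score(a: Matrix, b: Matrix) -> float:
    """Stand-in 'scoring model': cosine similarity of the flattened matrices."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    dot = sum(p * q for p, q in zip(x, y))
    norm = sqrt(sum(p * p for p in x)) * sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


# Speaker ID -> utterances; the numbers are made-up illustrative features.
corpus = {
    "speaker_a": [[[1.0, 5.0], [1.2, 5.1]], [[0.9, 4.8], [1.1, 5.2]]],
    "speaker_b": [[[3.0, 1.0], [3.2, 0.9]], [[2.9, 1.1], [3.1, 1.2]]],
}
ref = train_reference_model(corpus)
a1, a2 = (extract_matrix(ref, u) for u in corpus["speaker_a"])
b1 = extract_matrix(ref, corpus["speaker_b"][0])
same, different = score(a1, a2), score(a1, b1)
print(f"same speaker: {same:.3f}, different speakers: {different:.3f}")
```

In this toy setup, two utterances from the same speaker score higher than utterances from different speakers, which is the property the scoring model is meant to provide. A production system would presumably use learned speaker embeddings rather than summary statistics.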

In one embodiment, a voice registration process may be performed.

For example, a user may submit approximately five minutes of voice data for voice registration, and the computer may use it to generate a unique identifier that serves as a voice ID. Because each person's voice is unique owing to its pitch, formants, and other characteristics, a unique identifier can be generated for each person.

When the user's voice data is obtained, the computer uses a voice matrix extractor to generate a unique voice matrix based on that data. The resulting matrix may be homomorphically encrypted and stored, together with the user ID, by means of a smart contract. Homomorphic encryption keeps the voice matrix secure while still allowing it to be compared against other IDs.
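To illustrate why homomorphic encryption permits comparison without decryption, the following is a toy sketch. The source does not name an encryption scheme; the Paillier cryptosystem (additively homomorphic) is used here purely as an assumption. The key size is far too small to be secure, the voice matrix is assumed to be quantized to small non-negative integers, and the function names `keygen`, `encrypt`, `decrypt`, and `encrypted_dot` are hypothetical.

```python
import math
import secrets


def keygen():
    """Toy Paillier key pair from two fixed Mersenne primes (NOT secure)."""
    p, q = (1 << 31) - 1, (1 << 61) - 1
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # valid because g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """E(m) = (1 + n)^m * r^n mod n^2, with a fresh random r per call."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n: int, priv, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encrypted_dot(n: int, enc_vec, plain_vec) -> int:
    """Compute E(sum(t_i * k_i)) from ciphertexts E(t_i) and plaintext k_i.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to
    the power k multiplies its plaintext by k.
    """
    n2 = n * n
    acc = encrypt(n, 0)
    for c, k in zip(enc_vec, plain_vec):
        acc = acc * pow(c, k, n2) % n2
    return acc


n, priv = keygen()
template = [3, 1, 4, 1, 5]   # quantized enrolled features (made-up values)
probe = [2, 7, 1, 8, 2]      # quantized features of a new input (made-up values)
enc_template = [encrypt(n, t) for t in template]
enc_dot = encrypted_dot(n, enc_template, probe)
result = decrypt(n, priv, enc_dot)
print(result == sum(t * k for t, k in zip(template, probe)))
```

The party that stores `enc_template` computes the similarity term without ever seeing the template in the clear; only the key holder can decrypt the final value. Whether the described system compares matrices this way is not stated in the source.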

In other words, the computer may train a reference voice model on all of the voice data stored in the database and generate the user's unique voice matrix by comparing the user's voice data with that reference model. The voice matrix generated in this way may be used as the voice ID, which may be encrypted and stored by means of a smart contract.
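The registration flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the source: `toy_extractor` is a hypothetical stand-in for the voice matrix extractor, `VoiceRegistry` is an in-memory stand-in for the smart-contract-backed store, encryption is omitted, and returning a SHA-256 digest as a compact voice ID handle is an assumption added for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List

Matrix = List[List[float]]
Extractor = Callable[[List[float]], Matrix]


def toy_extractor(samples: List[float]) -> Matrix:
    """Hypothetical stand-in for the voice matrix extractor:
    keeps only the mean and the peak magnitude of the samples."""
    mean = sum(samples) / len(samples)
    peak = max(abs(s) for s in samples)
    return [[round(mean, 6), round(peak, 6)]]


class VoiceRegistry:
    """In-memory stand-in for the storage layer. The source stores records
    by means of a smart contract; that part is not modelled here."""

    def __init__(self, extractor: Extractor) -> None:
        self._extractor = extractor
        self._records: Dict[str, Matrix] = {}

    def register(self, user_id: str, samples: List[float]) -> str:
        """Extract the voice matrix, store it under user_id, return a handle."""
        if user_id in self._records:
            raise ValueError(f"user {user_id!r} is already registered")
        matrix = self._extractor(samples)
        self._records[user_id] = matrix
        # Assumption: a digest of (user ID, matrix) serves as the voice ID handle.
        payload = json.dumps({"user": user_id, "matrix": matrix}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def lookup(self, user_id: str) -> Matrix:
        """Return the enrolled voice matrix for user_id."""
        return self._records[user_id]


registry = VoiceRegistry(toy_extractor)
voice_id = registry.register("user-001", [0.1, -0.4, 0.3, 0.2])
print(voice_id[:16], registry.lookup("user-001"))
```

Passing the extractor in as a callable keeps the registry independent of whichever model actually produces the voice matrix.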

In one embodiment, a voice authentication process may be performed.

When an input voice is received from a user and there is a request for data verification or user authentication, the computer may feed the received voice into the voice matrix extractor to generate a voice matrix.

The resulting matrix is likewise encrypted and then compared with the stored voice matrices to search for a match.

Specifically, the computer extracts a voice matrix from the input voice using the voice matrix extractor, and the extracted voice matrix is used for search and scoring by the scoring model based on the user ID being compared. The scoring produces a similarity score, and authentication may be performed based on that score.
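The search-and-score step above can be sketched as follows. The source does not define the scoring function or the decision rule, so both are assumptions here: similarity is computed as 1 / (1 + Frobenius distance), and the acceptance threshold of 0.8 is arbitrary. The names `similarity` and `verify` and the example matrices are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def similarity(a: Matrix, b: Matrix) -> float:
    """Assumed scoring rule: 1 / (1 + Frobenius distance), in (0, 1]."""
    dist = sqrt(sum((x - y) ** 2
                    for row_a, row_b in zip(a, b)
                    for x, y in zip(row_a, row_b)))
    return 1.0 / (1.0 + dist)


def verify(store: Dict[str, Matrix], user_id: str, probe: Matrix,
           threshold: float = 0.8) -> Tuple[bool, float]:
    """Look up the enrolled matrix for the claimed user ID (search),
    score the probe against it, and accept if the score meets the threshold."""
    enrolled = store.get(user_id)
    if enrolled is None:
        return False, 0.0          # unknown user: reject without scoring
    s = similarity(enrolled, probe)
    return s >= threshold, s


# Enrolled voice matrices keyed by user ID (made-up values).
store = {"user-001": [[0.9, 0.1], [0.2, 0.7]]}
genuine = [[0.88, 0.12], [0.21, 0.69]]   # close to the enrolled matrix
impostor = [[0.1, 0.9], [0.8, 0.1]]      # far from the enrolled matrix

ok_genuine, s_genuine = verify(store, "user-001", genuine)
ok_impostor, s_impostor = verify(store, "user-001", impostor)
ok_unknown, s_unknown = verify(store, "user-999", genuine)
print(ok_genuine, ok_impostor, ok_unknown)
```

In practice the threshold would be tuned on held-out data to balance false accepts against false rejects; the value used here only serves the example.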

By using the voice ID system according to the disclosed embodiment, ownership of received voice data can be confirmed, computer fraud carried out by means such as generating audio with a text conversion system (for example, text-to-speech) can be combated, and user authentication can be performed on behalf of third-party voice applications.

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

The components of the present invention may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a computer, which is hardware. The components of the present invention may be implemented through software programming or software elements; similarly, embodiments may be implemented in a programming or scripting language such as C, C++, Java, or assembler, and may include various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms running on one or more processors.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those skilled in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

Claims

A computer-implemented voice registration and authentication method using a voice ID system, the method comprising:
obtaining voice data of a user;
generating a unique identifier corresponding to the obtained voice data;
obtaining a voice input of the user; and
performing verification of the voice input based on the unique identifier.

The voice registration and authentication method using a voice ID system according to claim 1,
wherein generating the unique identifier comprises:
extracting a voice matrix from the voice data using an integrated voice model generated based on a reference model; and
registering a voice ID of the user using the voice matrix and an ID of the user.

The voice registration and authentication method using a voice ID system according to claim 2,
wherein performing the verification comprises:
extracting a voice matrix from the voice input using the integrated voice model; and
performing a search and scoring based on the voice matrix extracted from the voice input and the user ID, wherein the scoring is performed using a scoring model generated based on the reference model.