KR100309219B1

KR100309219B1 - Network-based speaker learning and verification method and system thereof

Info

Publication number: KR100309219B1
Application number: KR1020000018326A
Authority: KR
Inventors: 이재희
Original assignee: 이상건; (주)브레이크마인드
Priority date: 2000-04-07
Filing date: 2000-04-07
Publication date: 2001-11-03
Also published as: KR20000037106A

Abstract

The present invention relates to a network-based method and apparatus for speaker learning and speaker verification, and in particular to text-prompted speaker learning and speaker verification using a word group composed of words that represent the speaker's individual characteristics well.

To log in to a site that provides information over the Internet, no separate password is used; instead, the voice characteristics of each individual, one of the traits unique to each person, are used.

In addition, to raise the speaker recognition rate and to guard against impersonation by recording, a word group is used that consists of words chosen, according to the speaker's gender, age, and the like, to represent the speaker's individual characteristics well.

Because the number of simultaneous users on the Internet cannot be predicted, the system has a distributed design that prevents the load from concentrating in one place even when many users access the system at the same time.

A plurality of words are learned, and at each speaker verification one of the learned words can be selected, taking into account the time, the speaker's condition, the connection location, and the like, and used to verify the speaker.

Description

Network-based speaker learning and speaker verification method and apparatus {NETWORK-BASED SPEAKER LEARNING AND VERIFICATION METHOD AND SYSTEM THEREOF}

The present invention relates to a network-based method and apparatus for speaker learning and speaker verification.

With the development of computers, society has grown increasingly complex and has come to require the rapid exchange of information. In particular, with the advent of the Internet, information and communication technologies are increasingly valued only when they work together with the Internet. Recently, a growing number of sites that provide information over the Internet operate on a membership basis and provide information only to members, in order to deliver high-quality information to specific people. On a membership site, a user can typically use the site after entering a member number (member identifier) and a password. However, this login method has the following problems.

First, a password is easily forgotten, or can become known to others through hacking and the like, so it is vulnerable to misuse by others.

Second, because an individual these days holds roughly three to five member numbers and passwords, users often cannot log in to a site because they have forgotten or confused a password, and must re-register or ask the administrator for the password, which hinders active use of the site.

There are methods that authenticate a user by the user's biological characteristics, for example speaker recognition, iris recognition, and fingerprint recognition. In particular, user authentication by speaker recognition has recently appeared. However, conventional user authentication by speaker recognition recognizes the user by having the user pronounce only a single word. Because only one word is pronounced, another person can easily misappropriate the pronunciation features of that one word.

Conventional user authentication by pronunciation of a single word also cannot respond appropriately to the user's condition. Specifically, when the user pronounces a single word (for example, 'my house') while healthy, the pronunciation features of that word are learned and stored in advance. If the user later has a sore throat and pronounces the same word ('my house'), authentication may be refused even though the user is genuine.

In general, in user authentication by speaker recognition, the speaker learning process, the speaker recognition process, and/or the process of extracting speaker components require a large amount of processing. In conventional methods, the server alone handled every process that requires this heavy processing. As a result, load concentrates on the server, and the user authentication process cannot run smoothly because of server failures and the like.

Furthermore, in conventional electronic commerce, after a customer places an order and a contract is formed, there has been no adequate countermeasure when the customer claims ignorance of, or denies, the formation of the contract or the placing of the order.

Accordingly, the present invention has been made in view of the above problems.

A specific object of the present invention is to provide a user authentication method and apparatus, including speaker learning and speaker verification, that verify a user by the voice characteristics of each individual, one of the traits unique to each person, without requiring any equipment beyond a basic computer environment.

Another object of the present invention is to provide a user authentication method and apparatus, including speaker learning and speaker verification, that use a word group composed of a plurality of words in order to raise the speaker recognition rate and guard against impersonation by recording.

Yet another object of the present invention is to provide a user authentication method and apparatus, including text-prompted speaker learning and speaker verification, that use a word group composed of words chosen, according to the speaker's gender, age, and the like, to represent the speaker's individual characteristics well, in order to raise the speaker recognition rate and guard against impersonation by recording.

Yet another object of the present invention is to provide a user authentication method and apparatus, including speaker learning and speaker verification, with a distributed design that prevents the load from concentrating in one place even when many users access the system at the same time, given that the number of simultaneous users on the Internet cannot be predicted.

Yet another object of the present invention is to improve the reliability of electronic commerce. When a customer places an order and a contract is formed, a voice signal corresponding to the order is received, processed, and stored, so that if the customer later claims ignorance of, or denies, the contract or the order, the voice signal can be restored and played back to the customer.

FIG. 1 is a schematic configuration diagram of an apparatus to which the present invention can be applied.

FIG. 2a is another schematic configuration diagram of an apparatus to which the present invention can be applied.

FIG. 2b is yet another schematic configuration diagram of an apparatus to which the present invention can be applied.

FIG. 3 is a flowchart showing a speaker learning method according to a preferred embodiment of the present invention.

FIG. 4 is a schematic flowchart showing a preprocessing process according to a preferred embodiment of the present invention.

FIG. 5 is a diagram illustrating a distributed processing scheme according to a preferred embodiment of the present invention.

FIG. 6 is a flowchart schematically showing the learning process according to a preferred embodiment of the present invention.

FIG. 7 is a flowchart showing a speaker learning method according to another preferred embodiment of the present invention.

FIG. 8 is a flowchart showing a speaker verification method according to a preferred embodiment of the present invention.

FIG. 9 is a schematic flowchart showing a speaker verification process according to a preferred embodiment of the present invention.

FIG. 10 is a flowchart showing a speaker verification method according to another preferred embodiment of the present invention.

FIG. 11 is a flowchart showing a speaker learning method according to yet another preferred embodiment of the present invention.

FIG. 12 is a flowchart showing a speaker learning method according to yet another preferred embodiment of the present invention.

FIG. 13 is a flowchart showing a speaker verification method according to yet another preferred embodiment of the present invention.

FIG. 14 is a flowchart showing a speaker verification method according to yet another preferred embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

101: server
103: client
105: website database
107: voice signal database
201: network server
203: speaker recognition server

To achieve the above objects, according to one aspect of the present invention, there can be provided a user authentication method at a server using a network-based word group or sentence group, together with a corresponding apparatus and system, for network-based user authentication in which a server authenticates a user who connects using a microphone. The method registers the user by storing user voice characteristic information, obtained by analyzing the user's voice signal, together with the user's personal information; receives a user authentication request from an authentication requester, directly or indirectly, through the network; selects, from a pre-built word group or sentence group, a word or sentence corresponding to the user's condition and transmits it to the authentication requester; receives through the network the authentication requester's voice signal for the word or sentence, or the authentication requester's voice characteristic information extracted from that voice signal; and uses the registered user voice characteristic information to determine whether the voice signal, or the voice characteristic information extracted from it, was input by the genuine user.

According to another preferred aspect of the present invention, there can be provided a speaker learning method at a server using a network-based word group for user authentication, together with a corresponding apparatus and system, for network-based speaker learning used to authenticate a user who connects using a microphone. The server receives from a client predetermined input items including at least a member identifier and stores them in a website database; determines one word group among a plurality of word groups pre-built in the website database (where each word group consists of a plurality of words); stores the pre-built word group identifier corresponding to that word group in the website database in association with the input items; embeds in a first document, and transmits to the client, the plurality of words belonging to the word group, the pre-built word identifier corresponding to each word, and component software that, at the client, preprocesses the voice signals of the user of that client pronouncing the words, extracts the speaker components corresponding to each word, and transmits them to the server; receives from the client the word identifiers and speaker components corresponding to each word; extracts reference pattern data corresponding to each word by running a learning process on the speaker components; generates a reference pattern identifier corresponding to each item of reference pattern data; and then stores the member identifier, the reference pattern data, the reference pattern identifiers, and the word identifiers in a voice information database.
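
The storage step at the end of this aspect can be sketched as follows. This is only an illustration under stated assumptions: the text above does not say how the learning process derives reference pattern data, so a plain per-dimension average of the feature vectors stands in for it, and the names `ReferencePattern`, `learn_reference_pattern`, and `enroll`, as well as the dictionary used as the voice information database, are hypothetical.

```python
import uuid
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class ReferencePattern:
    """One record of the (hypothetical) voice information database."""
    member_id: str
    word_id: str
    pattern_id: str               # generated reference pattern identifier
    pattern: tuple[float, ...]    # reference pattern data for this word


def learn_reference_pattern(member_id, word_id, speaker_components):
    """Stand-in learning process: average equal-length feature vectors.

    Averaging is an assumption made for illustration; the source does not
    specify the learning algorithm in this passage.
    """
    if not speaker_components:
        raise ValueError("no speaker components to learn from")
    length = len(speaker_components[0])
    if any(len(vector) != length for vector in speaker_components):
        raise ValueError("feature vectors must have equal length")
    pattern = tuple(fmean(column) for column in zip(*speaker_components))
    return ReferencePattern(member_id, word_id, uuid.uuid4().hex, pattern)


def enroll(voice_db, member_id, components_by_word):
    """Learn one reference pattern per word identifier and store it."""
    for word_id, components in components_by_word.items():
        record = learn_reference_pattern(member_id, word_id, components)
        voice_db[(member_id, word_id)] = record
    return voice_db
```

Keying the store by member identifier and word identifier mirrors the later lookup, in which reference pattern data is retrieved by word identifier for a given member.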

In a preferred embodiment, the speaker learning method at the server using a network-based word group for user authentication may further include: receiving a connection request signal from the client and transmitting to the client a second document, in the form of a form, in which the user can enter the predetermined input items; receiving the predetermined input items from the client and storing them in the website database; determining whether any speaker component remains for which the learning process has not been performed; if such a speaker component remains, proceeding to the step of extracting the reference pattern data; and if no such speaker component remains, transmitting a completion notification to the client.

In another preferred embodiment, the predetermined input items are a member identifier, name, gender, address, resident registration number, and e-mail address. Here, the one word group is determined according to at least one of the gender, the address, and the e-mail address.
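
A minimal sketch of determining the word group from a registration attribute, here gender only. The rule table, the group identifiers (`WG-F`, `WG-M`, `WG-DEFAULT`), and the function name `register_member` are hypothetical; the text only states that the group is chosen according to at least one of gender, address, and e-mail address.

```python
# Hypothetical rule table: registration attribute value -> word group identifier.
GENDER_RULES = {"F": "WG-F", "M": "WG-M"}


def register_member(website_db, profile, rules=GENDER_RULES, default="WG-DEFAULT"):
    """Pick a word group for the member and store it with the input items.

    `website_db` is a plain dict standing in for the website database.
    Members whose gender has no rule fall back to `default`.
    """
    group_id = rules.get(profile.get("gender"), default)
    website_db[profile["member_id"]] = {**profile, "word_group_id": group_id}
    return group_id
```

Storing the word group identifier alongside the input items is what lets the verification side look it up later from the member identifier alone.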

According to yet another aspect of the present invention, there can be provided a recording medium for user authentication at a client: a recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker learning method at a client using a network-based word group, the client-side counterpart of the network-based speaker learning method by which a server authenticates a user who connects using a microphone. The speaker learning method at the client includes: receiving from the server a first document in which are embedded the words belonging to a specific word group, the pre-built word identifier corresponding to each word, and component software that, at the client, preprocesses the voice signals of the user of that client pronouncing the words, extracts the speaker components corresponding to each word, and transmits them to the server; displaying the first document on a display device; receiving from the user a voice signal corresponding to each word; extracting a speaker component from the voice signal corresponding to each word; and transmitting the word identifier and speaker components corresponding to each word to the server.
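
The client-side extraction and transmission steps can be illustrated with a small sketch. The passage does not name the features that make up a "speaker component", so per-frame log energy and zero-crossing rate are used here purely as stand-ins; the frame sizes, the payload layout, and the names `frame_features` and `build_payload` are likewise assumptions.

```python
import math


def frame_features(samples, frame_len=400, hop=160):
    """Split a digitized signal into frames and compute two simple features.

    Returns one (log_energy, zero_crossing_rate) pair per full frame. These
    features are illustrative stand-ins, not the source's speaker components.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        # A small constant keeps log() defined for silent frames.
        features.append((math.log(energy + 1e-12), crossings / (frame_len - 1)))
    return features


def build_payload(word_id, samples, **frame_options):
    """Pair a word identifier with the features extracted for that word."""
    return {
        "word_id": word_id,
        "speaker_components": frame_features(samples, **frame_options),
    }
```

Extracting features on the client means only compact feature vectors cross the network, which is consistent with the stated aim of keeping heavy processing off the server.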

In a preferred embodiment, the speaker learning method at the client using a network-based word group may further include: transmitting to the server a connection request signal entered by the user through an input means; receiving from the server a second document, in the form of a form, in which the user can enter the predetermined input items, and displaying it on the display device; receiving the predetermined input items from the user and transmitting them to the server; and receiving a speaker learning completion notification from the server and displaying it on the display device.

In another preferred embodiment, the predetermined input items are a member identifier, name, gender, address, resident registration number, and e-mail address.

In yet another preferred embodiment, the speaker learning method at the client using a network-based word group may further include converting the voice signal into a digital signal (analog-to-digital conversion).
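
In practice the conversion itself is done by the client's audio hardware. The quantization half of that conversion can nonetheless be illustrated in software: the sketch below maps already-sampled amplitudes in the range -1.0 to 1.0 onto signed 16-bit integers. The 16-bit depth and the function name `quantize_pcm16` are assumptions; the text does not state a sample format.

```python
def quantize_pcm16(samples):
    """Clip each sample to [-1.0, 1.0] and map it to a signed 16-bit integer."""
    quantized = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        quantized.append(int(round(clipped * 32767)))
    return quantized
```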

According to yet another aspect of the present invention, there can be provided a speaker learning method at a server using a network-based word group for user authentication, together with a corresponding apparatus and system, for network-based speaker learning used to authenticate a user who connects using a microphone. The server receives from a client predetermined input items including at least a member identifier and stores them in a website database; determines one word group among a plurality of word groups pre-built in the website database (where each word group consists of a plurality of words); stores the pre-built word group identifier corresponding to that word group in the website database in association with the input items; embeds in a first document, and transmits to the client, the plurality of words belonging to the word group and the pre-built word identifier corresponding to each word; receives from the client the word identifiers and voice signals corresponding to each word; preprocesses the voice signals to extract the speaker components corresponding to each word; extracts reference pattern data corresponding to each word by running a learning process on the speaker components; generates a reference pattern identifier corresponding to each item of reference pattern data; and then stores the member identifier, the reference pattern data, the reference pattern identifiers, and the word identifiers in a voice information database.

In a preferred embodiment, the predetermined input items are a member identifier, name, gender, address, resident registration number, and e-mail address. Here, the one word group is determined according to at least one of the gender, the address, and the e-mail address.

According to yet another aspect of the present invention, there can be provided a recording medium for user authentication at a client: a recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker learning method at a client using a network-based word group, the client-side counterpart of the network-based speaker learning method by which a server authenticates a user who connects using a microphone. The speaker learning method at the client includes: receiving from the server a first document in which are embedded the words belonging to a specific word group and the pre-built word identifier corresponding to each word; displaying the first document on a display device; receiving from the user a voice signal corresponding to each word; and transmitting the word identifier and voice signals corresponding to each word to the server.

In another preferred embodiment, the predetermined input items are a member identifier, name, gender, address, resident registration number, and e-mail address. Here, the speaker learning method at the client using a network-based word group may further include converting the voice signal into a digital signal (analog-to-digital conversion).

According to yet another aspect of the present invention, there can be provided a speaker verification method at a server using a network-based word group for user authentication, together with a corresponding apparatus and system, for network-based speaker verification used to authenticate a user who connects using a microphone. The server receives a member identifier from a client; looks up the word group identifier stored in advance in a website database for that member identifier; selects one word among the plurality of words belonging to the word group corresponding to the word group identifier; embeds in a first document, and transmits to the client, the word, the word identifier corresponding to the word, and component software that preprocesses the voice signal of the user pronouncing the word, extracts the voice signal corresponding to the word, and transmits it to the server; receives from the client the voice signal corresponding to the word; extracts, with the voice signal as input, voice pattern data corresponding to the word identifier; retrieves the reference pattern data stored in a voice information database for the word identifier; determines whether the voice pattern data matches the reference pattern data; transmits a permission notification to the client if they match; and transmits a rejection notification to the client if they do not match.
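
The match decision at the end of this aspect can be sketched as below. The passage only says the server determines whether the voice pattern data "matches" the reference pattern data; treating a match as a Euclidean distance at or below a threshold is an assumption, as are the threshold value, the dictionary standing in for the voice information database, and the name `verify_speaker`.

```python
import math


def verify_speaker(voice_db, member_id, word_id, voice_pattern, threshold=1.0):
    """Return "permit" or "deny" for one verification attempt.

    `voice_db` maps (member_id, word_id) to a stored reference pattern
    (a sequence of floats). Unknown members or words are denied.
    """
    reference = voice_db.get((member_id, word_id))
    if reference is None or len(reference) != len(voice_pattern):
        return "deny"
    distance = math.dist(voice_pattern, reference)
    return "permit" if distance <= threshold else "deny"
```

A real system would tune the threshold against false-accept and false-reject rates; the fixed value here exists only to make the sketch runnable.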

In a preferred embodiment, the speaker verification method at the server using a network-based word group for user authentication may further include receiving a connection request signal from the client and transmitting to the client a second document, in the form of a form, in which the user can enter a member identifier.

In another preferred embodiment, the one word is selected at random.

In yet another preferred embodiment, the one word is selected from the plurality of words belonging to the word group in consideration of the time of the speaker verification, the speaker's condition, and the connection location.
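
Word selection can be sketched as a random draw from the member's word group. As one hypothetical example of narrowing the candidates by context, the sketch skips the word used at the previous verification when another word is available; that rule, the `(word_id, word)` tuple layout, and the name `select_prompt_word` are assumptions, and the text does not say how time, condition, or location would be weighed.

```python
import secrets


def select_prompt_word(word_group, last_word_id=None):
    """Pick one (word_id, word) pair from the member's word group.

    The previously prompted word is skipped when any other word exists,
    so that a recording of the last session is less likely to be reusable.
    """
    if not word_group:
        raise ValueError("word group is empty")
    candidates = [entry for entry in word_group if entry[0] != last_word_id]
    if not candidates:
        # Only the previously used word exists; fall back to the full group.
        candidates = list(word_group)
    return secrets.choice(candidates)
```

`secrets.choice` is used instead of `random.choice` because the prompt acts as a challenge that an impersonator should not be able to predict.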

According to yet another aspect of the present invention, there can be provided a recording medium for user authentication at a client: a recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker verification method at a client using a network-based word group, the client-side counterpart of the network-based speaker verification method by which a server authenticates a user who connects using a microphone. The speaker verification method at the client includes: transmitting to the server a connection request signal entered by the user through an input means; receiving from the server a second document, in the form of a form, in which the user can enter a member identifier, and displaying it on a display device; transmitting to the server the member identifier entered by the user through the input means; receiving from the server a first document in which are embedded a word corresponding to the member identifier, the word identifier corresponding to the word, and component software that preprocesses the voice signal of the user pronouncing the word, extracts the speaker component corresponding to the word, and transmits it to the server; displaying the first document on the display device; receiving from the user a voice signal corresponding to the word; extracting a speaker component from the voice signal corresponding to the word; transmitting the speaker component to the server; displaying a permission notification on the display device when one is received from the server; and displaying a rejection notification on the display device when one is received from the server.

In a preferred embodiment, the speaker verification method using a network-based word group at the client may further comprise converting the voice signal into a digital signal (analog-to-digital conversion).
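
The quantization half of analog-to-digital conversion can be illustrated with a minimal sketch. In practice the client's sound hardware performs the conversion; the code below only shows how sampled amplitudes could be mapped to 16-bit integers. The function names, the 16-bit depth, and the 8 kHz sample rate are assumptions for illustration and do not come from the patent.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n):
    """Return n samples of a unit-amplitude sine wave (stand-in for a microphone signal)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers, clipping out-of-range values."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip to the representable range
        out.append(int(round(x * 32767)))   # scale to the 16-bit integer range
    return out


# Hypothetical usage: 8 samples of a 440 Hz tone at an assumed 8 kHz sample rate.
pcm = quantize_16bit(sample_sine(440, 8000, 8))
```

The resulting integer sequence is the kind of digital signal that a later preprocessing step would consume.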

According to yet another aspect of the present invention, there may be provided, for network-based speaker verification at a server to authenticate a user who connects using a microphone, a speaker verification method using a network-based word group for user authentication at the server, and an apparatus and a system corresponding to the method, in which the server: receives a member identifier from a client; retrieves the word group identifier stored in a web site database that corresponds to the member identifier; selects one word from among the plurality of words belonging to the word group corresponding to the word group identifier; embeds the word and the word identifier corresponding to the word in a first document and transmits the first document to the client; receives from the client a voice signal corresponding to the word; preprocesses the voice signal to extract the speaker component corresponding to the word; extracts voice pattern data corresponding to the word using the speaker component as input; retrieves the reference pattern data stored in a voice information database that corresponds to the word identifier; determines whether the voice pattern data and the reference pattern data match; transmits a permission notification to the client if they match; and transmits a rejection notification to the client if they do not match.
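
The server-side verification flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory dictionaries stand in for the web site database and the voice information database, the feature vectors are made up, and the match rule (Euclidean distance under a threshold) is an assumption, since the text only states that the server determines whether the two patterns match.

```python
import math
import random

# Hypothetical stand-ins for the web site database and the voice
# information database; schema and values are assumptions.
WORD_GROUPS = {"wg1": {"w1": "apple", "w2": "river", "w3": "window"}}
MEMBER_WORD_GROUP = {"member42": "wg1"}
REFERENCE_PATTERNS = {
    ("member42", "w1"): [0.1, 0.9, 0.3],
    ("member42", "w2"): [0.7, 0.2, 0.5],
    ("member42", "w3"): [0.4, 0.4, 0.8],
}


def select_word(member_id, rng=random):
    """Look up the member's word group and pick one word to prompt."""
    group_id = MEMBER_WORD_GROUP[member_id]
    word_id = rng.choice(sorted(WORD_GROUPS[group_id]))
    return word_id, WORD_GROUPS[group_id][word_id]


def patterns_match(voice_pattern, reference_pattern, threshold=0.25):
    """Assumed match rule: Euclidean distance at or below a threshold."""
    return math.dist(voice_pattern, reference_pattern) <= threshold


def verify(member_id, word_id, voice_pattern):
    """Return 'permit' or 'reject' for the pattern extracted from the spoken word."""
    reference = REFERENCE_PATTERNS.get((member_id, word_id))
    if reference is None:
        return "reject"  # no reference pattern was learned for this word
    return "permit" if patterns_match(voice_pattern, reference) else "reject"
```

Because a different word may be selected on each login attempt, a recording of one earlier utterance would be of limited use to an impostor, which is the motivation the abstract gives for using a word group.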

In a preferred embodiment, the speaker verification method using a network-based word group for user authentication at the server may further comprise receiving a connection request signal from the client and transmitting to the client a form-type second document into which the user can enter a member identifier.

According to yet another aspect of the present invention, there may be provided a recording medium for user authentication at a client, readable by a digital processing device and tangibly embodying a program of instructions executable by the digital processing device to perform a speaker verification method using a network-based word group at the client, the method being the client-side counterpart of a network-based speaker verification method performed at a server to authenticate a user who connects using a microphone, wherein the speaker verification method at the client comprises: transmitting to the server a connection request signal input by the user through an input means; receiving from the server a form-type second document into which the user can enter a member identifier, and displaying it on a display device; transmitting to the server the member identifier input by the user through the input means; receiving from the server a first document in which are embedded a word corresponding to the member identifier and a word identifier corresponding to the word; displaying the first document on the display device; receiving from the user a voice signal corresponding to the word; transmitting the voice signal to the server; displaying a permission notification on the display device when the permission notification is received from the server; and displaying a rejection notification on the display device when the rejection notification is received from the server.

According to yet another aspect of the present invention, there may be provided, for network-based speaker learning at a network server system among server systems to authenticate a user who connects using a microphone, a speaker learning method using a network-based word group for user authentication at the network server system, and an apparatus and a system corresponding to the method, in which the network server system: receives from a client predetermined input items including at least a member identifier and stores them in a web site database; determines one word group from among a plurality of word groups pre-built in the web site database, each word group consisting of a plurality of words; stores the pre-built word group identifier corresponding to that word group in the web site database in association with the input items; embeds in a first document the plurality of words belonging to the word group, the pre-built word identifier corresponding to each word, and component software that, at the client, preprocesses the voice signals of the user pronouncing the words, extracts the speaker component corresponding to each word, and transmits the speaker components to the server, and transmits the first document to the client; receives from the client the member identifier, the word identifier corresponding to each word, and the speaker component corresponding to each word, and transmits them to a speaker recognition server system; and receives a learning completion notification from the speaker recognition server system and transmits it to the client.

In a preferred embodiment, the speaker learning method using a network-based word group for user authentication at the network server system may further comprise: receiving a connection request signal from the client and transmitting to the client a form-type second document into which the user can enter the predetermined input items; and receiving the predetermined input items from the client and storing them in the web site database.

In another preferred embodiment, the predetermined input items are a member identifier, name, gender, address, resident registration number, and e-mail address. Here, the one word group is determined according to at least one of the gender, the address, and the e-mail address.
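
As a minimal sketch of this embodiment, the word group could be chosen from the registration input and stored alongside it. The mapping from gender to word group, the group identifiers, and the field names below are all assumptions made for illustration; the text only states that the choice depends on at least one of gender, address, and e-mail address.

```python
# Hypothetical lookup table; keys and word group identifiers are assumptions.
WORD_GROUP_BY_GENDER = {"F": "wg_female", "M": "wg_male"}
DEFAULT_WORD_GROUP = "wg_default"


def choose_word_group(input_items):
    """Pick a word group identifier from the registration input (here: gender only)."""
    return WORD_GROUP_BY_GENDER.get(input_items.get("gender"), DEFAULT_WORD_GROUP)


def register_member(site_db, input_items):
    """Store the input items together with the chosen word group identifier."""
    record = dict(input_items)
    record["word_group_id"] = choose_word_group(input_items)
    site_db[input_items["member_id"]] = record
    return record["word_group_id"]
```

Storing the word group identifier in the same record as the member identifier lets the verification step later retrieve the word group from the member identifier alone.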

According to yet another aspect of the present invention, there may be provided, for network-based speaker learning at a speaker recognition server system among server systems to authenticate a user who connects using a microphone, a speaker learning method using a network-based word group for user authentication at the speaker recognition server system, and an apparatus and a system corresponding to the method, in which the speaker recognition server system: receives, directly from a client or via a network server system, a member identifier, the word identifiers corresponding to each word, and the speaker components; extracts reference pattern data corresponding to each word by performing a learning process on the speaker components; generates a reference pattern identifier corresponding to each item of reference pattern data; and then stores the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.

In a preferred embodiment, the speaker learning method using a network-based word group for user authentication at the speaker recognition server system may further comprise: determining whether any speaker component remains on which the learning process has not been performed; if such a speaker component remains, proceeding to the step of extracting the reference pattern data; and if no such speaker component remains, transmitting a completion notification to the client or to the network server system.
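
The learning loop of this embodiment can be sketched as below. The learning step itself is not specified in this section, so the element-wise mean of feature frames is used purely as a placeholder assumption; the identifier format and the dictionary standing in for the voice information database are likewise hypothetical.

```python
import itertools

# Hypothetical generator of reference pattern identifiers.
_pattern_ids = itertools.count(1)


def learn_reference_pattern(frames):
    """Placeholder learning step (assumption): element-wise mean of the feature frames."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def run_learning(member_id, components, voice_db):
    """Learn one reference pattern per word, then report completion.

    components maps each word identifier to its speaker component,
    represented here as a list of equal-length feature frames.
    """
    pending = dict(components)
    while pending:  # is there a speaker component not yet learned?
        word_id, frames = pending.popitem()
        voice_db[(member_id, word_id)] = {
            "pattern_id": f"rp{next(_pattern_ids)}",
            "pattern": learn_reference_pattern(frames),
        }
    return "learning_complete"  # sent to the client or the network server system
```

The loop mirrors the described check: as long as an unprocessed speaker component exists, control returns to the extraction step, and the completion notification is produced only once none remain.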

According to yet another aspect of the present invention, there may be provided a recording medium for user authentication at a client, readable by a digital processing device and tangibly embodying a program of instructions executable by the digital processing device to perform a speaker learning method using a network-based word group at the client, the method being the client-side counterpart of a network-based speaker learning method performed at a network server system and a speaker recognition server system among server systems to authenticate a user who connects using a microphone, wherein the speaker learning method at the client comprises: receiving from the network server system a first document in which are embedded the words belonging to a specific word group, the pre-built word identifier corresponding to each word, and component software that, at the client, preprocesses the voice signals of the user pronouncing the words, extracts the speaker component corresponding to each word, and transmits the speaker components to the server; displaying the first document on a display device; receiving from the user a voice signal corresponding to each word; extracting a speaker component from the voice signal corresponding to each word; and transmitting the member identifier, the word identifiers corresponding to each word, and the speaker components to the network server system or the speaker recognition server system.

In a preferred embodiment, the speaker learning method using a network-based word group at the client may further comprise: transmitting to the network server system a connection request signal input by the user through an input means; receiving from the network server system a form-type second document into which the user can enter predetermined input items, and displaying it on the display device; receiving the predetermined input items from the user and transmitting them to the network server system; and receiving a speaker learning completion notification from the network server system or the speaker recognition server system and displaying it on the display device.

In yet another preferred embodiment, the speaker learning method using a network-based word group at the client may further comprise converting the voice signal into a digital signal (analog-to-digital conversion).

According to yet another aspect of the present invention, there may be provided, for network-based speaker learning at a network server system among server systems to authenticate a user who connects using a microphone, a speaker learning method using a network-based word group for user authentication at the network server system, and an apparatus and a system corresponding to the method, in which the network server system: receives from a client predetermined input items including at least a member identifier and stores them in a web site database; determines one word group from among a plurality of word groups pre-built in the web site database, each word group consisting of a plurality of words; stores the pre-built word group identifier corresponding to that word group in the web site database in association with the input items; embeds in a first document the plurality of words belonging to the word group and the pre-built word identifier corresponding to each word, and transmits the first document to the client; receives from the client the member identifier, the word identifier corresponding to each word, and the voice signal corresponding to each word, and transmits them to a speaker recognition server system; and receives a learning completion notification from the speaker recognition server system and transmits it to the client.

According to yet another aspect of the present invention, there may be provided, for network-based speaker learning at a speaker recognition server system among server systems to authenticate a user who connects using a microphone, a speaker learning method using a network-based word group for user authentication at the speaker recognition server system, and an apparatus and a system corresponding to the method, in which the speaker recognition server system: receives, directly from a client or via a network server system, a member identifier, the word identifiers corresponding to each word, and the voice signals; preprocesses the voice signals to extract the speaker components corresponding to each word; extracts reference pattern data corresponding to each word by performing a learning process on the speaker components; generates a reference pattern identifier corresponding to each item of reference pattern data; and then stores the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.

In a preferred embodiment, the speaker learning method using a network-based word group for user authentication at the speaker recognition server system may further comprise: determining whether any speaker component remains on which the learning process has not been performed; if such a speaker component remains, proceeding to the step of extracting the reference pattern data; and if no such speaker component remains, transmitting a completion notification to the client or to the network server system.

According to yet another aspect of the present invention, there may be provided a recording medium for user authentication at a client, readable by a digital processing device and tangibly embodying a program of instructions executable by the digital processing device to perform a speaker learning method using a network-based word group at the client, the method being the client-side counterpart of a network-based speaker learning method performed at a network server system and a speaker recognition server system among server systems to authenticate a user who connects using a microphone, wherein the speaker learning method at the client comprises: receiving from the network server system a first document in which are embedded the words belonging to a specific word group and the pre-built word identifier corresponding to each word; displaying the first document on a display device; receiving from the user a voice signal corresponding to each word; and transmitting the member identifier, the word identifiers corresponding to each word, and the voice signals to the network server system or the speaker recognition server system.

According to yet another aspect of the present invention, there may be provided, for network-based speaker verification at a network server system among server systems to authenticate a user who connects using a microphone, a speaker verification method using a network-based word group for user authentication at the network server system, and an apparatus and a system corresponding to the method, in which the network server system: receives a member identifier from a client; retrieves the word group identifier stored in a web site database that corresponds to the member identifier; selects one word from among the plurality of words belonging to the word group; embeds in a first document the word, the word identifier corresponding to the word, and component software that, at the client, preprocesses the voice signal of the user pronouncing the word, extracts the speaker component corresponding to the word, and transmits it to the network server system, and transmits the first document to the client; receives the speaker component from the client and transmits the speaker component, the member identifier, and the word identifier to a speaker recognition server system; on receiving a permission notification from the speaker recognition server system, transmits it to the client; and on receiving a rejection notification from the speaker recognition server system, transmits it to the client.

In a preferred embodiment, the speaker verification method using a network-based word group for user authentication at the network server system may further comprise receiving a connection request signal from the client and transmitting to the client a form-type second document into which the user can enter a member identifier.

In yet another embodiment of the present invention, there may be provided, for network-based speaker verification at a speaker recognition server system among server systems to authenticate a user who connects using a microphone, a speaker verification method using a network-based word group for user authentication at the speaker recognition server system, and an apparatus and a system corresponding to the method, in which the speaker recognition server system: receives, directly from a client or via a network server system, a member identifier, a word identifier, and a speaker component; extracts voice pattern data corresponding to the word identifier using the speaker component as input; retrieves the reference pattern data stored in a voice information database that corresponds to the word identifier; determines whether the voice pattern data and the reference pattern data match; transmits a permission notification to the client or the network server system if they match; and transmits a rejection notification to the client or the network server system if they do not match.

According to yet another aspect of the present invention, in a recording medium readable by a digital processing device and tangibly embodying a program of instructions executable by the digital processing device to perform a speaker verification method using a network-based word group at a client, the method being the client-side counterpart of a network-based speaker verification method performed at a network server system and a speaker recognition server system among server systems to authenticate a user who connects using a microphone, the speaker verification method at the client comprises: transmitting to the network server system a connection request signal input by the user through an input means; receiving from the network server system a form-type second document into which the user can enter a member identifier, and displaying it on a display device; transmitting to the network server system the member identifier input by the user through the input means; receiving from the network server system a first document in which are embedded a word corresponding to the member identifier, a word identifier corresponding to the word, and component software that preprocesses the voice signal of the user pronouncing the word, extracts the speaker component corresponding to the word, and transmits it to the network server system or the speaker recognition server system; displaying the first document on the display device; receiving from the user a voice signal corresponding to the word; extracting a speaker component from the voice signal corresponding to the word; transmitting the speaker component to the network server system or the speaker recognition server system; displaying a permission notification on the display device when the permission notification is received from the network server system or the speaker recognition server system; and displaying a rejection notification on the display device when the rejection notification is received from the network server system or the speaker recognition server system.

According to yet another aspect of the present invention, there may be provided, for network-based speaker verification at a network server system among server systems to authenticate a user who connects using a microphone, a speaker verification method using a network-based word group for user authentication at the network server system, and an apparatus and a system corresponding to the method, in which the network server system: receives a member identifier from a client; retrieves the word group identifier stored in a web site database that corresponds to the member identifier; selects one word from among the plurality of words belonging to the word group corresponding to the word group identifier; embeds the word and the word identifier corresponding to the word in a first document and transmits the first document to the client; receives a voice signal from the client and transmits the voice signal, the member identifier, and the word identifier to a speaker recognition server system; on receiving a permission notification from the speaker recognition server system, transmits it to the client; and on receiving a rejection notification from the speaker recognition server system, transmits it to the client.

According to yet another aspect of the present invention, there may be provided, for network-based speaker verification at a speaker recognition server system among server systems to authenticate a user who connects using a microphone, a speaker verification method using a network-based word group for user authentication at the speaker recognition server system, and an apparatus and a system corresponding to the method, in which the speaker recognition server system: receives, directly from a client or via a network server system, a member identifier, a word identifier, and a voice signal; preprocesses the voice signal to extract the speaker component corresponding to the word; extracts voice pattern data corresponding to the word using the speaker component as input; retrieves the reference pattern data stored in a voice information database that corresponds to the word identifier; determines whether the voice pattern data and the reference pattern data match; transmits a permission notification to the client or the network server system if they match; and transmits a rejection notification to the client or the network server system if they do not match.

According to yet another aspect of the present invention, in a recording medium readable by a digital processing device and tangibly embodying a program of instructions executable by the digital processing device to perform a speaker verification method using a network-based word group at a client, the method being the client-side counterpart of a network-based speaker verification method performed at a network server system and a speaker recognition server system among server systems to authenticate a user who connects using a microphone, the speaker verification method at the client comprises: transmitting to the network server system a connection request signal input by the user through an input means; receiving from the network server system a form-type second document into which the user can enter a member identifier, and displaying it on a display device; transmitting to the network server system the member identifier input by the user through the input means; receiving from the network server system a first document in which are embedded a word corresponding to the member identifier and a word identifier corresponding to the word; displaying the first document on the display device; receiving from the user a voice signal corresponding to the word; transmitting the voice signal to the network server system or the speaker recognition server system; displaying a permission notification on the display device when the permission notification is received from the network server system or the speaker recognition server system; and displaying a rejection notification on the display device when the rejection notification is received from the network server system or the speaker recognition server system.

In a preferred embodiment, the client-side network-based speaker verification method using a word group further comprises converting the voice signal into a digital signal (analog-to-digital conversion).

According to still another aspect of the present invention, there is provided a voice information restoration method using a word group, for preventing a customer in electronic commerce from claiming to be unaware of, or denying, the formation of a contract or the placing of an order after the customer has placed at least one order or offer and a contract has been formed. The method comprises: transmitting to a client a word or sentence, taken from a previously stored word group, that corresponds to each order or offer; receiving from the client a purchase order voice signal corresponding to the word or sentence; and storing voice pattern data corresponding to the purchase order voice signal in a voice information database.

In a preferred embodiment, the voice information restoration method using a word group may further comprise restoring the voice pattern data into the purchase order voice signal when the customer claims to be unaware of, or denies, the formation of the contract or the placing of the order.

In another preferred embodiment, the voice information restoration method using a word group may further comprise: receiving the purchase order voice signal from the client; preprocessing the purchase order voice signal to extract a speaker component corresponding to the purchase order voice signal; and extracting, from the speaker component, voice pattern data corresponding to the purchase order voice signal.

In still another preferred embodiment, the voice information restoration method using a word group may further comprise: receiving from the client a speaker component obtained by preprocessing the purchase order voice signal; and extracting, from the speaker component, the voice pattern data corresponding to the purchase order voice signal.

In still another preferred embodiment, the voice information restoration method using a word group may further comprise: receiving the purchase order voice signal from the client; preprocessing the purchase order voice signal to extract a speaker component corresponding to the purchase order voice signal; extracting, from the speaker component, voice pattern data corresponding to the purchase order voice signal; and outputting the restored purchase order voice signal to the customer's output device.

In still another preferred embodiment, the voice information restoration method using a word group may further comprise: receiving from the client a speaker component obtained by preprocessing the purchase order voice signal; extracting, from the speaker component, the voice pattern data corresponding to the purchase order voice signal; and outputting the restored purchase order voice signal to the customer's output device.

Next, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic configuration diagram of an apparatus to which the present invention can be applied.

Referring to FIG. 1, a network-based speaker learning and speaker verification apparatus according to a preferred embodiment of the present invention broadly comprises: a client 103 that extracts the voice signal input by a user or the speaker component, which is a voice feature element; a server 101 that performs general network services as well as speaker verification, speaker recognition, and management coordination; a web site database 105 that stores members' personal information and general site information; and a voice database 107 that stores members' voice information.

FIG. 2a is another schematic configuration diagram of an apparatus to which the present invention can be applied.

FIG. 2b is still another schematic configuration diagram of an apparatus to which the present invention can be applied.

Referring to FIGS. 2a and 2b, a network-based speaker learning and speaker verification apparatus according to another preferred embodiment of the present invention broadly comprises: a client (209a, 209b) that extracts the voice signal input by a user or the speaker component, which is a voice feature element; a network server (201a, 201b) that performs general network services; a speaker recognition server (203a, 203b) that performs speaker verification, speaker recognition, and other management coordination; a web site database (205a, 205b) that stores members' personal information and general site information; and a voice database (207a, 207b) that stores members' voice information.

Comparing the apparatus illustrated in FIG. 1 with the apparatuses illustrated in FIGS. 2a and 2b, the difference is that the server 101 of FIG. 1 is split into the network server (201a, 201b) and the speaker recognition server (203a, 203b) in FIGS. 2a and 2b. The present invention may be carried out in an integrated manner, by hardware or software, as in FIG. 1, or in a separated manner as in FIGS. 2a and 2b. Accordingly, the embodiments below are described with reference to the apparatus illustrated in FIG. 1 and the apparatus illustrated in FIG. 2b.

First, an embodiment of the present invention is described with reference to the apparatus illustrated in FIG. 1.

FIG. 3 is a flowchart illustrating a speaker learning method according to a preferred embodiment of the present invention.

When the user transmits a connection request signal from the client 103 through an input device (step 301), the server 101 receives the connection request signal (step 303) and transmits a first HTML document to the client 103 (step 305).

Here, the first HTML document is an HTML document for performing general network services in which a form-type HTML document is embedded. The form lets the user enter predetermined input items, for example, a member identifier, name, gender, address, resident registration number, and e-mail address.

The client 103 receives the first HTML document and displays it on a display device (step 307). The client then receives the predetermined input items, for example, a member identifier, name, gender, address, resident registration number, and e-mail address, from the user through the input device and transmits them to the server 101 (step 309).

The server 101 receives the predetermined input items and stores them in the web site database 105 (step 311). According to a preferred embodiment of the present invention, the web site database 105 contains at least a member table, a word group table, and a word table (not shown). The member table has a member identifier field, a name field, a resident registration number field, a gender field, an address field, an e-mail address field, and a word group identifier field. The word group table has a word group identifier field and a plurality of word identifier fields. The word table has a word identifier field and a word field.
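The table layout described above can be sketched with an in-memory SQLite database. This is only an illustration: the table names, column names, and column types are assumptions (the specification names the fields but not their types), and the "plurality of word identifier fields" is modelled here as a separate membership table rather than as repeated columns.

```python
import sqlite3

def create_site_db():
    """Create an in-memory database with member, word group, and word tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL);
        CREATE TABLE word_group (group_id INTEGER PRIMARY KEY);
        -- Assumption: one row per (group, word) instead of several word-id columns.
        CREATE TABLE word_group_member (
            group_id INTEGER REFERENCES word_group(group_id),
            word_id  INTEGER REFERENCES word(word_id),
            PRIMARY KEY (group_id, word_id)
        );
        CREATE TABLE member (
            member_id   TEXT PRIMARY KEY,
            name        TEXT,
            resident_no TEXT,
            gender      TEXT,
            address     TEXT,
            email       TEXT,
            group_id    INTEGER REFERENCES word_group(group_id)
        );
    """)
    return conn

def words_for_member(conn, member_id):
    """Return (word_id, word) pairs of the word group assigned to a member."""
    return conn.execute(
        """SELECT w.word_id, w.word
           FROM member m
           JOIN word_group_member g ON g.group_id = m.group_id
           JOIN word w ON w.word_id = g.word_id
           WHERE m.member_id = ?
           ORDER BY w.word_id""",
        (member_id,),
    ).fetchall()

# Hypothetical sample data, for illustration only.
conn = create_site_db()
conn.executemany("INSERT INTO word VALUES (?, ?)",
                 [(1, "한강"), (2, "남산"), (3, "바다")])
conn.execute("INSERT INTO word_group VALUES (10)")
conn.executemany("INSERT INTO word_group_member VALUES (?, ?)", [(10, 1), (10, 2)])
conn.execute("INSERT INTO member VALUES (?, ?, ?, ?, ?, ?, ?)",
             ("user01", "Hong Gildong", "000000-0000000", "M", "Seoul",
              "user01@example.com", 10))
print(words_for_member(conn, "user01"))
```

Storing only the word group identifier on the member row keeps the lookup at verification time to a single join from member to word.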

A speaker recognition system in which the sentence to be uttered by the speaker is fixed is called text-dependent, and one in which no sentence is set and the speaker may utter freely is called text-independent. The text-dependent type has the drawback that the speaker recognition system can easily be deceived using previously recorded speech. To overcome this drawback, the network-based speaker learning and speaker verification method and apparatus of the present invention adopt a text-prompted speaker recognition scheme.

Thereafter, the server 101 uses the predetermined input items, for example, the member identifier, name, gender, address, resident registration number, and e-mail address, to determine the word group to be learned from among a plurality of word group tables built in advance (step 313). Each of the pre-built word groups consists of a plurality of words. As an example of the determination, if the user's address is Seoul, the word group containing words related to Seoul is chosen from among the pre-built word groups. Accordingly, the word group to be learned is preferably determined according to at least one of the predetermined input items, for example, gender, address, or e-mail address.
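As a rough illustration of step 313, the sketch below picks a word group by matching profile attributes against tags attached to each pre-built group. The tag structure, the order in which attributes are tried, and the fallback group are all assumptions; the specification only gives the Seoul example and does not state a matching rule.

```python
def choose_word_group(profile, groups, default_group_id):
    """Return the id of the first group whose tag matches a profile attribute.

    profile: dict of user input items, e.g. {"address": "Seoul", "gender": "F"}
    groups:  dict mapping group id -> dict of tags (hypothetical structure)
    """
    # Assumption: address is tried first, then gender, then e-mail address.
    for field in ("address", "gender", "email"):
        value = profile.get(field)
        if value is None:
            continue
        for group_id, tags in groups.items():
            if tags.get(field) == value:
                return group_id
    return default_group_id

# Hypothetical pre-built groups.
groups = {
    10: {"address": "Seoul"},
    20: {"address": "Busan"},
    30: {"gender": "F"},
}
print(choose_word_group({"address": "Seoul", "gender": "F"}, groups, 0))
```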

The server 101 then transmits a second HTML document to the client 103 (step 315). Embedded in the second HTML document are the words belonging to the word group to be learned, as determined from the predetermined input items, the pre-built word identifiers corresponding to those words, and component software. At the client 103, the component software preprocesses the voice signals of the user pronouncing the words, extracts the speaker component corresponding to each word, and transmits the speaker components to the server.

The client 103 receives the second HTML document and displays it on the display device (step 317) and, according to a preferred embodiment of the present invention, receives from the user through a microphone a voice signal corresponding to each word (step 319). For speaker learning, the user preferably pronounces each word four or more times. The client 103 then performs A/D conversion of the voice signal (not shown) and carries out the preprocessing process with the converted signal as input.

FIG. 4 is a schematic flowchart illustrating a preprocessing process according to a preferred embodiment of the present invention.

In the preprocessing process, the question is which voice feature parameter, that is, which speaker component, should be selected. A voice signal contains various kinds of information mixed with noise. This includes information about the identity of the speaker who produced the utterance, as well as other information. From the standpoint of speaker verification, all information other than that concerning the speaker's identity is a kind of noise, and such noise lowers the recognition rate. Therefore, the preprocessing process must extract a speaker component that retains only the information about the speaker's identity and suppresses all other information.

Referring to FIG. 4, the preprocessing process according to a preferred embodiment of the present invention includes a DC component removal step (step 401), an energy normalization step (step 403), a silent interval removal step (step 405), and a step of extracting the feature parameters, that is, the speaker component (step 407).
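The four steps of FIG. 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the frame length, the silence threshold, and in particular the per-frame features (log energy and zero-crossing rate) are placeholders, since the specification does not say which feature parameters make up the speaker component.

```python
import math

def remove_dc(samples):
    """Step 401: subtract the mean so the signal has no DC offset."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def normalize_energy(samples):
    """Step 403: scale the signal to unit RMS (left unchanged if all zero)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return list(samples) if rms == 0 else [s / rms for s in samples]

def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping a short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def remove_silence(frames, threshold):
    """Step 405: keep only frames whose mean energy reaches the threshold."""
    return [f for f in frames if sum(s * s for s in f) / len(f) >= threshold]

def frame_features(frame):
    """Step 407 (placeholder features): log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))

def preprocess(samples, frame_len=8, silence_threshold=0.05):
    """Run steps 401 to 407 and return one feature tuple per voiced frame."""
    x = normalize_energy(remove_dc(samples))
    frames = remove_silence(split_frames(x, frame_len), silence_threshold)
    return [frame_features(f) for f in frames]

# Synthetic input: 16 silent samples, then two periods of a sine, all with a 0.5 offset.
signal = [0.5] * 16 + [0.5 + math.sin(2 * math.pi * n / 8) for n in range(16)]
features = preprocess(signal)
print(len(features), "voiced frames")
```

In a real system the placeholder in `frame_features` would be replaced by whatever spectral features the recognizer is trained on; the surrounding steps would stay the same.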

The client 103 preprocesses the A/D-converted voice signal to extract the speaker component (step 321). It then transmits the member identifier, the word identifier, and the speaker component to the server 101 (step 323).

FIG. 5 is a diagram illustrating a distributed processing scheme according to a preferred embodiment of the present invention.

Referring to FIG. 5, the speaker learning system according to a preferred embodiment of the present invention broadly consists of a client component 503 and a server component 501. The client component 503 consists of two interfaces: one that extracts the speaker component from the digitized voice signal, and one that is notified of the recognition result by the server side. A proxy object performs a marshalling process that packetizes the extracted speaker component and then transmits the packets to the server over a channel. On the server, a stub object performs an unmarshalling process that restores the speaker component from the received packets and passes the speaker component to the recognition process interface in the server's recognition component. After the recognition process is performed, the result is reported to the result action interface of the client by following the transmission procedure in reverse order.
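The marshalling and unmarshalling round trip described above can be illustrated with Python's standard `struct` module. The packet layout (a word identifier, a frame count, and a per-frame dimension followed by the values as network-order doubles) is an assumption made for this sketch; the specification does not define a wire format, and the actual proxy and stub objects would be supplied by the component framework in use.

```python
import struct

# Hypothetical header: word id (uint32), frame count (uint16), values per frame (uint16).
_HEADER = struct.Struct("!IHH")

def marshal(word_id, frames):
    """Client-side proxy role: pack a speaker component into a byte packet."""
    dim = len(frames[0]) if frames else 0
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same number of values")
    flat = [v for frame in frames for v in frame]
    return _HEADER.pack(word_id, len(frames), dim) + struct.pack(f"!{len(flat)}d", *flat)

def unmarshal(packet):
    """Server-side stub role: restore the word id and speaker component."""
    word_id, n_frames, dim = _HEADER.unpack_from(packet, 0)
    flat = struct.unpack_from(f"!{n_frames * dim}d", packet, _HEADER.size)
    frames = [tuple(flat[i * dim:(i + 1) * dim]) for i in range(n_frames)]
    return word_id, frames

component = [(0.5, -1.25), (3.0, 4.0)]
packet = marshal(7, component)
print(len(packet), unmarshal(packet))
```

Sending the already extracted speaker component rather than raw audio is what lets the client carry the preprocessing load, which is the point of the distributed design.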

With the distributed processing scheme described above, even when many users access the speaker verification system over the Internet at the same time, the load of the overall system is prevented from being concentrated on a few systems.

The server 101 receives the words, the word identifiers, and the speaker components from the client (step 325). It then extracts reference pattern data by running a learning process on the speaker components (step 327). The reference pattern data is the value that serves as the basis of comparison whenever speaker verification is required.

FIG. 6 is a flowchart schematically illustrating the learning process according to a preferred embodiment of the present invention.

Referring to FIG. 6, the learning process includes a K-means clustering step (step 601), a vector quantization step (step 603), a probability value initialization step (step 605), a forward-backward algorithm step (step 607), and a Baum-Welch estimation step (step 609).
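The first two steps of FIG. 6 can be sketched as below: K-means clustering builds a codebook from the feature vectors, and vector quantization maps each vector to the index of its nearest codeword. The sketch deliberately stops there; the HMM steps (probability initialization, forward-backward, Baum-Welch) are not shown. The deterministic initialization from the first `k` vectors and the iteration limit are assumptions made so the example is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vector):
    """Index of the codeword closest to the vector (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], vector))

def kmeans(vectors, k, iterations=20):
    """Step 601: build a k-entry codebook with plain K-means."""
    codebook = [tuple(v) for v in vectors[:k]]  # assumption: seed with the first k vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else codebook[i]
            for i, members in enumerate(clusters)
        ]
        if updated == codebook:  # assignments are stable, so stop early
            break
        codebook = updated
    return codebook

def quantize(codebook, vectors):
    """Step 603: replace each vector by the index of its nearest codeword."""
    return [nearest(codebook, v) for v in vectors]

vectors = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
codebook = kmeans(vectors, k=2)
print(codebook, quantize(codebook, vectors))
```

The resulting index sequence is the discrete observation sequence on which a discrete HMM would then be trained.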

The server 101 stores the extracted reference pattern data in the voice information database 107 (step 329). According to a preferred embodiment of the present invention, the voice information database 107 contains at least a reference pattern data table, a member-reference pattern identifier table, and a word table (not shown). The reference pattern data table has a reference pattern identifier field and a reference pattern data field; the member-reference pattern identifier table has a member identifier field and a plurality of reference pattern identifier fields; and the word table has a word identifier field and a reference pattern identifier field.

Thereafter, the server 101 determines whether any speaker component remains on which the learning process has not been performed (step 331). If so, the process returns to the reference pattern data extraction step (step 327); if not, the server transmits a completion notification to the client 103 (step 333).

The client receives the completion notification from the server 101 and displays it on the display device (step 335).

FIG. 7 is a flowchart illustrating a speaker learning method according to another preferred embodiment of the present invention.

Referring to FIG. 7, this speaker learning method is the same overall as the speaker learning method shown in FIG. 3, so a similar description applies.

When the user transmits a connection request signal from the client 103 through an input device (step 701), the server 101 receives the connection request signal (step 703) and transmits a third HTML document to the client 103 (step 705).

Here, the third HTML document is an HTML document for performing general network services in which a form-type HTML document is embedded. The form lets the user enter predetermined input items, for example, a member identifier, name, gender, address, resident registration number, and e-mail address.

The client 103 receives the third HTML document and displays it on the display device (step 707). The client then receives the predetermined input items, for example, a member identifier, name, gender, address, resident registration number, and e-mail address, from the user through the input device and transmits them to the server 101 (step 709).

The server 101 receives the predetermined input items and stores them in the web site database 105 (step 711). According to a preferred embodiment of the present invention, the web site database 105 contains at least a member table, a word group table, and a word table (not shown). The member table has a member identifier field, a name field, a resident registration number field, a gender field, an address field, an e-mail address field, and a word group identifier field. The word group table has a word group identifier field and a plurality of word identifier fields. The word table has a word identifier field and a word field.

Thereafter, the server 101 uses the predetermined input items, for example, gender, name, and e-mail address, to determine the word group to be learned from among a plurality of word group tables built in advance (step 713). Each of the pre-built word groups consists of a plurality of words. As an example of the determination, if the user's address is Seoul, the word group containing words related to Seoul is chosen from among the pre-built word groups. Accordingly, the word group to be learned is preferably determined according to at least one of the predetermined input items, for example, gender, name, or e-mail address.

The server 101 then transmits a fourth HTML document to the client 103 (step 715). Embedded in the fourth HTML document are the words belonging to the word group to be learned, as determined from the predetermined input items, and the pre-built word identifiers corresponding to those words.

The client 103 receives the fourth HTML document and displays it on the display device (step 717) and, according to another preferred embodiment of the present invention, receives from the user through a microphone a voice signal corresponding to each word (step 719). For speaker learning, the user preferably pronounces each word four or more times. The client 103 then performs A/D conversion of the voice signal (not shown) and transmits the word identifier and the voice signal to the server 101 (step 721).

The server 101 receives the word identifier and the voice signal from the client 103 (step 723) and extracts the speaker component through the preprocessing process (step 725). It then extracts reference pattern data by running the learning process on the speaker components (step 727).

The server 101 stores the extracted reference pattern data in the voice information database 107 (step 729). According to a preferred embodiment of the present invention, the voice information database 107 contains at least a reference pattern data table, a member-reference pattern identifier table, and a word table (not shown). The reference pattern data table has a reference pattern identifier field and a reference pattern data field; the member-reference pattern identifier table has a member identifier field and a plurality of reference pattern identifier fields; and the word table has a word identifier field and a reference pattern identifier field.

Thereafter, the server 101 determines whether any speaker component remains on which the learning process has not been performed (step 731). If so, the process returns to the reference pattern data extraction step (step 727); if not, the server transmits a completion notification to the client 103 (step 733).

The client receives the completion notification from the server 101 and displays it on the display device (step 735).

FIG. 8 is a flowchart illustrating a speaker verification method according to a preferred embodiment of the present invention.

When the user transmits a connection request signal from the client 103 through an input device (step 801), the server 101 receives the connection request signal (step 803) and transmits a fifth HTML document to the client 103 (step 805).

Here, the fifth HTML document is an HTML document for performing general network services in which a form-type HTML document is embedded. The form lets the user enter a member identifier.

The client 103 receives the fifth HTML document and displays it on the display device (step 807). The client then receives the member identifier from the user through the input device and transmits it to the server 101 (step 809).

The server 101 receives the member identifier (step 811) and retrieves the word group identifier that is stored in the web site database 105 and corresponds to the member identifier (step 813).

The word group is built in advance, and the server 101 randomly selects one of the words belonging to the word group (step 815). Alternatively, at each verification, one of the words belonging to the word group may be selected in consideration of the time, the speaker's condition, the place of connection, and so on. For example, if the user has a head cold, words whose pronunciation contains nasal sounds are excluded (the word '엄마' ("mom"), for instance, is excluded because it contains nasals), and a word is selected from the remainder.
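The head-cold example in step 815 can be sketched as follows. The nasal test is a simplification chosen for this illustration: it decomposes each precomposed Hangul syllable arithmetically and flags the written initial consonants ㄴ and ㅁ and the written finals ㄴ, ㅁ, ㅇ (plus clusters containing ㄴ or ㅁ). It ignores nasalization that arises only in pronunciation, and the `has_cold` flag is a hypothetical stand-in for however the speaker's condition would be obtained.

```python
import random

_HANGUL_BASE = 0xAC00           # first precomposed Hangul syllable
_HANGUL_COUNT = 11172           # number of precomposed syllables
_NASAL_INITIALS = {2, 6}        # initial-consonant indices of ㄴ, ㅁ
_NASAL_FINALS = {4, 5, 6, 10, 16, 21}  # final indices of ㄴ, ㄵ, ㄶ, ㄻ, ㅁ, ㅇ

def has_nasal(word):
    """Return True if any syllable is written with a nasal consonant."""
    for ch in word:
        offset = ord(ch) - _HANGUL_BASE
        if 0 <= offset < _HANGUL_COUNT:
            initial = offset // 588   # 588 = 21 vowels * 28 finals
            final = offset % 28
            if initial in _NASAL_INITIALS or final in _NASAL_FINALS:
                return True
    return False

def select_prompt_word(words, has_cold=False, rng=random):
    """Pick one prompt word at random, skipping nasal words for a user with a cold."""
    candidates = [w for w in words if not (has_cold and has_nasal(w))]
    if not candidates:          # assumption: fall back to the full group
        candidates = list(words)
    return rng.choice(candidates)

word_group = ["엄마", "바다", "나무"]
print(select_prompt_word(word_group, has_cold=True))
```

Because the prompt changes from one login to the next, a recording of an earlier session is unlikely to contain the word being asked for.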

A speaker recognition system in which the sentence to be uttered by the speaker is fixed is called text-dependent, and one in which no sentence is set and the speaker may utter freely is called text-independent. The text-dependent type has the drawback that the speaker recognition system can easily be deceived using previously recorded speech. To overcome this drawback, the network-based speaker learning and speaker verification method and apparatus of the present invention adopt a text-prompted speaker recognition scheme.

The server 101 then transmits a sixth HTML document to the client 103 (step 817). Embedded in the sixth HTML document are the pre-built word identifier corresponding to the selected word and component software. At the client 103, the component software preprocesses the voice signal of the user pronouncing the word, extracts the speaker component corresponding to the word, and transmits it to the server.

The client 103 receives the sixth HTML document and displays it on the display device (step 819) and, according to a preferred embodiment of the present invention, receives from the user through a microphone the voice signal corresponding to the word (step 821). The client 103 then performs A/D conversion of the voice signal (not shown) and carries out the preprocessing process with the converted signal as input.

FIG. 9 is a schematic flowchart of a speaker verification process according to another preferred embodiment of the present invention.

Referring to FIG. 9, the process takes the speaker component as input, extracts voice pattern data through a VQ indexing step (step 901) and an observation parameter extraction step (step 903), compares the voice pattern data with the corresponding reference pattern data in a Viterbi computation step (step 905), and determines the result (step 907).
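The chain of steps 901 to 907 can be illustrated with a minimal sketch. It assumes, beyond what the text states, that the reference pattern is a discrete hidden Markov model, that the observation parameters of step 903 are simply the VQ index sequence, and that the decision of step 907 is a threshold on the per-frame best-path log score. The names `vq_index`, `viterbi_log_score`, `verify`, and `threshold` are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def vq_index(frames, codebook):
    """Step 901 (assumed form): map each feature vector to its nearest codeword index."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(f, codebook[k]))
            for f in frames]

def viterbi_log_score(obs, pi, a, b):
    """Step 905 (assumed form): log-probability of the best state path of a discrete HMM.

    pi: initial state probabilities, a: transition matrix, b: emission matrix.
    """
    n = len(pi)
    delta = [safe_log(pi[s]) + safe_log(b[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        delta = [max(delta[p] + safe_log(a[p][s]) for p in range(n))
                 + safe_log(b[s][o]) for s in range(n)]
    return max(delta)

def verify(frames, codebook, hmm, threshold):
    """Step 907 (assumed rule): accept when the per-frame score reaches the threshold."""
    obs = vq_index(frames, codebook)
    return viterbi_log_score(obs, *hmm) / len(obs) >= threshold
```

Normalizing the score by the number of frames is a design choice of this sketch, made so that one threshold can serve utterances of different lengths.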

Here too, the voice signal is converted into a speaker component through the preprocessing process shown in FIG. 4.

The client 103 preprocesses the A/D-converted voice signal to extract the speaker component (step 823) and then transmits the speaker component to the server 101 (step 825).

Referring to FIG. 5, the speaker learning system according to a preferred embodiment of the present invention consists broadly of a client component 503 and a server component 501. The client component 503 comprises two interfaces: one that extracts the speaker component from the digitized voice signal, and one that is notified of the recognition result by the server. The preprocessing interface passes the extracted speaker component to a proxy object. The proxy object performs marshalling, which packetizes the extracted speaker component, and transmits the packets to the server over a channel. On the server side, a stub object performs unmarshalling, which restores the speaker component from the received packets, and passes it to the recognition interface in the server's recognition component. After recognition is performed, the result is reported to the client's result-handling interface by following the transmission procedure in reverse order.
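The marshalling and unmarshalling performed by the proxy and stub objects can be sketched as follows. The packet layout (a network-byte-order header holding a word identifier, frame count, and vector dimension, followed by 32-bit floats) and the function names are assumptions made for illustration; the text does not specify a wire format.

```python
import struct

HEADER = "!IHH"  # assumed layout: word id (uint32), frame count (uint16), dimension (uint16)

def marshal(word_id, features):
    """Proxy side: pack a word identifier and equal-length float vectors into bytes."""
    n_frames = len(features)
    dim = len(features[0]) if features else 0
    flat = [x for frame in features for x in frame]
    return struct.pack(HEADER, word_id, n_frames, dim) + struct.pack(f"!{len(flat)}f", *flat)

def unmarshal(packet):
    """Stub side: restore the word identifier and the feature vectors from the packet."""
    word_id, n_frames, dim = struct.unpack_from(HEADER, packet, 0)
    flat = struct.unpack_from(f"!{n_frames * dim}f", packet, struct.calcsize(HEADER))
    return word_id, [list(flat[i * dim:(i + 1) * dim]) for i in range(n_frames)]
```

Because the values travel as 32-bit floats, a round trip reproduces the input exactly only for values representable in single precision, such as `0.5` or `-1.25`.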

The server 101 receives the speaker component from the client (step 827). The server 101 then extracts the voice pattern data corresponding to the word identifier in the manner shown in FIG. 9 (step 828), and retrieves the reference pattern data stored in the voice information database 107 that corresponds to the word identifier and the voice pattern data (step 829).

The server 101 then determines whether the voice pattern data matches the reference pattern data (step 831). If the result is affirmative, it transmits a permission notification to the client 103 (step 833); if the result is negative, it transmits a rejection notification to the client 103 (step 835).

The client receives the permission notification from the server 101 and displays it on the display device (step 837), or receives the rejection notification and displays it on the display device (step 839).

FIG. 10 is a flowchart of a speaker verification method according to another preferred embodiment of the present invention.

Referring to FIG. 10, this speaker verification method is largely the same as the speaker verification method shown in FIG. 8, so the corresponding description applies wherever the two coincide.

When the user transmits a connection request signal from the client 103 through an input device (step 1001), the server 101 receives the connection request signal (step 1003) and transmits a seventh HTML document to the client 103 (step 1005).

Here, the seventh HTML document is an HTML document for providing a general network service in which a form-type HTML document is embedded so that the user can enter a member identifier.

The client 103 receives the seventh HTML document and displays it on the display device (step 1007). It then receives a member identifier from the user through the input device and transmits it to the server 101 (step 1009).

The server 101 receives the member identifier (step 1011) and retrieves the word group identifier stored in the web site database 105 that corresponds to the member identifier (step 1013).

The word group is built in advance, and the server 101 randomly selects one of the words belonging to the word group (step 1015). Alternatively, at each verification the word may be selected from the word group in consideration of the time, the speaker's condition, the connection location, and the like. For example, if the user has a head cold, words containing nasal sounds are excluded (the word '엄마' ('mom'), for instance, is excluded because it contains nasals) and the word is chosen from the remainder.
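The word selection of step 1015, including the head-cold example, can be sketched as below. The word list, the `has_nasal` flag stored with each word, and the fallback to the full group when every word would be excluded are assumptions of this sketch, not details given in the text.

```python
import random

# Hypothetical pre-built word group; each entry carries a flag marking nasal sounds.
WORD_GROUP = [
    {"word_id": 1, "text": "엄마", "has_nasal": True},
    {"word_id": 2, "text": "학교", "has_nasal": False},
    {"word_id": 3, "text": "바다", "has_nasal": False},
]

def select_prompt_word(word_group, has_cold=False, rng=random):
    """Pick one word at random, skipping nasal words when the speaker has a cold."""
    candidates = [w for w in word_group if not (has_cold and w["has_nasal"])]
    if not candidates:
        # Assumed fallback: if the filter removes everything, use the whole group.
        candidates = list(word_group)
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible for testing, while the default keeps each prompt unpredictable, which is what makes a replayed recording unlikely to match.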

The server 101 then transmits an eighth HTML document to the client 103 (step 1017). The eighth HTML document has embedded in it the pre-built word identifier corresponding to the word.

The client 103 receives the eighth HTML document and displays it on the display device (step 1019) and, according to a preferred embodiment of the present invention, receives from the user through a microphone the voice signal corresponding to each word (step 1021). The client 103 then A/D-converts the voice signal (not shown) and transmits it to the server 101 (step 1023).

The server 101 receives the voice signal from the client (step 1025). It then extracts the voice pattern data corresponding to the word identifier through the process shown in FIG. 9 (step 1027).

FIG. 9 is a schematic flowchart of a speaker verification process according to a preferred embodiment of the present invention.

The server 101 then retrieves the reference pattern data stored in the voice information database 107 that corresponds to the word identifier and the voice pattern data (step 1029).

The server 101 then determines whether the voice pattern data matches the reference pattern data (step 1031). If the result is affirmative, it transmits a permission notification to the client 103 (step 1033); if the result is negative, it transmits a rejection notification to the client 103 (step 1035).

The client receives the permission notification from the server 101 and displays it on the display device (step 1037), or receives the rejection notification and displays it on the display device (step 1039).

Another embodiment of the present invention, using the apparatus illustrated in FIG. 2b, is described below.

FIG. 11 is a flowchart of a speaker learning method according to yet another preferred embodiment of the present invention.

When the user transmits a connection request signal from the client 209b through an input device (step 1101), the network server system 201b receives the connection request signal (step 1103) and transmits a ninth HTML document to the client 209b (step 1105).

Here, the ninth HTML document is an HTML document for providing a general network service in which a form-type HTML document is embedded so that the user can enter predetermined input items such as a member identifier, name, gender, address, resident registration number, and e-mail address.

The client 209b receives the ninth HTML document and displays it on the display device (step 1107). It then receives the predetermined input items (for example, member identifier, name, gender, address, resident registration number, and e-mail address) from the user through the input device and transmits them to the network server system 201b (step 1109).

The network server system 201b receives the predetermined input items and stores them in the web site database 205b (step 1111). The data structure of the web site database 205b is as described above.

The network server system 201b then uses the predetermined input items (for example, gender, address, and e-mail address) to determine the word group to be learned from a table of multiple pre-built word groups (step 1113). Each pre-built word group consists of a plurality of words. The method of determining the word group is as described above.
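As an illustration of step 1113, the sketch below looks up a word group identifier in a pre-built table. The actual determination method is described elsewhere in the specification; the use of gender and an age band as keys, the band boundary, the identifiers, and the default group are all hypothetical choices made only to show the table-lookup idea.

```python
# Hypothetical pre-built table: (gender, age band) -> word group identifier.
WORD_GROUP_TABLE = {
    ("F", "child"): "WG-F-CHILD",
    ("F", "adult"): "WG-F-ADULT",
    ("M", "child"): "WG-M-CHILD",
    ("M", "adult"): "WG-M-ADULT",
}
DEFAULT_WORD_GROUP = "WG-DEFAULT"  # assumed fallback for inputs the table does not cover

def age_band(age):
    """Assumed two-band split; the boundary of 13 is illustrative only."""
    return "child" if age < 13 else "adult"

def determine_word_group(gender, age):
    """Return the word group identifier to be stored with the member record."""
    return WORD_GROUP_TABLE.get((gender.upper(), age_band(age)), DEFAULT_WORD_GROUP)
```

Storing only the identifier with the member record matches the later verification flow, in which the word group identifier is retrieved by member identifier.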

The network server system 201b then transmits a tenth HTML document to the client 209b (step 1115). The tenth HTML document has embedded in it the words belonging to the word group to be learned, as determined from the predetermined input items, the corresponding pre-built word identifiers, and component software that, at the client 209b, preprocesses the voice signals of the user pronouncing the words, extracts the speaker component corresponding to each word, and transmits the speaker components to the server.

The client 209b receives the tenth HTML document and displays it on the display device (step 1117) and, according to a preferred embodiment of the present invention, receives from the user through a microphone the voice signal corresponding to each word (step 1119). For speaker learning, it is desirable that the user pronounce each word at least four times. The client 209b then A/D-converts the voice signal (not shown) and performs preprocessing with the converted signal as input.

The preprocessing process according to yet another preferred embodiment of the present invention is as described above.

The client 209b preprocesses the A/D-converted voice signal to extract the speaker component (step 1121). It then transmits the member identifier, the word identifier, and the speaker component to the network server system 201b (step 1123). Alternatively, the client 209b may transmit the extracted speaker component directly to the speaker recognition server 203b.

The distributed processing scheme according to yet another preferred embodiment of the present invention is as described above.

The network server system 201b receives the words, the word identifiers, and the speaker components from the client (step 1125). It then transmits the member identifier, the word identifier, and the speaker component to the speaker recognition server system 203b (step 1125).

The speaker recognition server system 203b receives the member identifier, the word identifier, and the speaker component (step 1127). In another embodiment of the present invention, it may of course receive them directly from the client 209b. It then extracts reference pattern data from the speaker components by performing the learning process (step 1129).

The learning process according to yet another preferred embodiment of the present invention is as described above.

The speaker recognition server system 203b stores the extracted reference pattern data in the voice information database 207b (step 1131). The voice information database 207b according to yet another preferred embodiment of the present invention is as described above.

The speaker recognition server system 203b then determines whether any speaker component remains on which the learning process has not been performed (step 1133). If so, it returns to the reference pattern data extraction step (step 1129); if not, it transmits a completion notification to the network server system 201b (step 1135). In yet another embodiment of the present invention, the completion notification may be transmitted directly to the client 209b.
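The loop formed by steps 1129 to 1135 can be sketched as follows. The learning process itself is described elsewhere in the specification; here `train_reference_pattern` is a stand-in that merely averages the repeated utterances component-wise, and the in-memory dictionary keyed by `(member_id, word_id)` stands in for the voice information database. Both are assumptions of this sketch.

```python
def train_reference_pattern(samples):
    """Stand-in for the learning process: component-wise mean of the sample vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def run_learning(member_id, pending, database):
    """Train one reference pattern per word until no speaker component is left.

    pending: dict mapping word_id -> list of speaker-component vectors, one per utterance.
    database: dict standing in for the voice information database.
    """
    pending = dict(pending)              # work on a copy of the untrained components
    while pending:                       # step 1133: anything still untrained?
        word_id, samples = pending.popitem()
        pattern = train_reference_pattern(samples)   # step 1129
        database[(member_id, word_id)] = pattern     # step 1131
    return "COMPLETE"                    # step 1135: completion notification
```

For example, two utterances `[1.0, 2.0]` and `[3.0, 4.0]` of one word yield the stored pattern `[2.0, 3.0]`.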

The network server system 201b receives the completion notification and transmits it to the client 209b (step 1137).

The client 209b receives the completion notification from the network server system 201b and displays it on the display device (step 1139).

FIG. 12 is a flowchart of a speaker learning method according to yet another preferred embodiment of the present invention.

Referring to FIG. 12, this speaker learning method is largely the same as the speaker learning method shown in FIG. 11, so the corresponding description applies wherever the two coincide.

When the user transmits a connection request signal from the client 209b through an input device (step 1201), the network server system 201b receives the connection request signal (step 1203) and transmits an eleventh HTML document to the client 209b (step 1205).

Here, the eleventh HTML document is an HTML document for providing a general network service in which a form-type HTML document is embedded so that the user can enter predetermined input items such as a member identifier, name, gender, address, resident registration number, and e-mail address.

The client 209b receives the eleventh HTML document and displays it on the display device (step 1207). It then receives the predetermined input items (for example, member identifier, name, gender, address, resident registration number, and e-mail address) from the user through the input device and transmits them to the network server system 201b (step 1209).

The network server system 201b receives the predetermined input items and stores them in the web site database 205b (step 1211). The web site database 205b according to yet another preferred embodiment of the present invention is as described above.

The network server system 201b then uses the predetermined input items (for example, gender, address, and e-mail address) to determine the word group to be learned from a table of multiple pre-built word groups (step 1213). Each pre-built word group consists of a plurality of words. The determination method is as described above.

The network server system 201b then transmits a twelfth HTML document to the client 209b (step 1215). The twelfth HTML document has embedded in it the words belonging to the word group to be learned, as determined from the predetermined input items, and the corresponding pre-built word identifiers.

The client 209b receives the twelfth HTML document and displays it on the display device (step 1217) and, according to yet another preferred embodiment of the present invention, receives from the user through a microphone the voice signal corresponding to each word (step 1219). For speaker learning, it is desirable that the user pronounce each word at least four times. The client 209b then A/D-converts the voice signal (not shown) and transmits the member identifier, the word identifier, and the voice signal to the network server system 201b (step 1221). In yet another embodiment of the present invention, the client 209b may instead transmit the member identifier, the word identifier, and the voice signal to the speaker recognition server system 203b.

The network server system 201b receives the member identifier, the word identifier, and the voice signal from the client 209b and transmits them to the speaker recognition server system 203b (step 1223). The speaker recognition server system 203b receives the member identifier, the word identifier, and the voice signal (step 1225) and extracts the speaker component through the preprocessing process (step 1227). It then extracts reference pattern data from the speaker components by performing the learning process (step 1229).

The speaker recognition server system 203b stores the extracted reference pattern data in the voice information database 207b (step 1231). The voice information database 207b according to yet another preferred embodiment of the present invention is as described above.

The speaker recognition server system 203b then determines whether any speaker component remains on which the learning process has not been performed (step 1233). If so, it returns to the reference pattern data extraction step (step 1229); if not, it transmits a completion notification to the network server system 201b (step 1235). According to yet another embodiment of the present invention, the speaker recognition server system 203b transmits the completion notification directly to the client 209b.

The network server system 201b transmits the completion notification to the client 209b (step 1237).

The client 209b receives the completion notification from the network server system 201b and displays it on the display device (step 1239).

FIG. 13 is a flowchart of a speaker verification method according to yet another preferred embodiment of the present invention.

When the user transmits a connection request signal from the client 209b through an input device (step 1301), the network server system 201b receives the connection request signal (step 1303) and transmits a thirteenth HTML document to the client 209b (step 1305).

Here, the thirteenth HTML document is an HTML document for providing a general network service in which a form-type HTML document is embedded so that the user can enter a member identifier.

The client 209b receives the thirteenth HTML document and displays it on the display device (step 1307). It then receives a member identifier from the user through the input device and transmits it to the network server system 201b (step 1309).

The network server system 201b receives the member identifier (step 1311) and retrieves the word group identifier stored in the web site database 205b that corresponds to the member identifier (step 1313).

The word group is built in advance, and the network server system 201b randomly selects one of the words belonging to the word group (step 1315). Alternatively, at each verification the word may be selected from the word group in consideration of the time, the speaker's condition, the connection location, and the like. For example, if the user has a head cold, words containing nasal sounds are excluded (the word '엄마' ('mom'), for instance, is excluded because it contains nasals) and the word is chosen from the remainder.

The network server system 201b then transmits a fourteenth HTML document to the client 209b (step 1317). The fourteenth HTML document has embedded in it the pre-built word identifier corresponding to the word, together with component software that, at the client 209b, preprocesses the voice signal of the user pronouncing the word, extracts the speaker component corresponding to the word, and transmits it to the server.

The client 209b receives the fourteenth HTML document and displays it on the display device (step 1319) and, according to yet another preferred embodiment of the present invention, receives from the user through a microphone the voice signal corresponding to each word (step 1321). The client 209b then A/D-converts the voice signal (not shown) and performs preprocessing with the converted signal as input.

With the voice signal as input, the speaker component is extracted through the preprocessing process (step 1323).

The speaker component is then transmitted to the network server system 201b (step 1325). In yet another embodiment of the present invention, the client 209b transmits the speaker component directly to the speaker recognition server system 203b.

The distributed processing scheme according to yet another preferred embodiment of the present invention is as described above.

The network server system 201b receives the speaker component from the client 209b and transmits it, together with the member identifier and the word identifier, to the speaker recognition server system 203b (step 1327).

The speaker recognition server system 203b then receives the member identifier, the word identifier, and the speaker component (step 1329) and, with the speaker component as input, extracts voice pattern data through the speaker verification process shown in FIG. 9 (step 1330).

Next, the reference pattern data stored in the voice information database 207b that corresponds to the word identifier and the voice pattern data is retrieved (step 1331).

The speaker recognition server system 203b then determines whether the voice pattern data matches the reference pattern data (step 1333). If the result is affirmative, it transmits a permission notification to the network server system 201b (step 1335); if the result is negative, it transmits a rejection notification to the network server system 201b (step 1337). According to yet another embodiment of the present invention, the speaker recognition server system 203b transmits the permission notification or the rejection notification to the client 209b.

The network server system 201b receives the permission notification and transmits it to the client 209b (step 1339), or receives the rejection notification and transmits it to the client 209b (step 1341).

The client 209b receives the permission notification from the network server system 201b and displays it on the display device (step 1343), or receives the rejection notification and displays it on the display device (step 1345).

FIG. 14 is a flowchart of a speaker verification method according to yet another preferred embodiment of the present invention.

Referring to FIG. 14, this speaker verification method is largely the same as the speaker verification method shown in FIG. 13, so the corresponding description applies wherever the two coincide.

When the user transmits a connection request signal from the client 209b through an input device (step 1401), the network server system 201b receives the connection request signal (step 1403) and transmits a fifteenth HTML document to the client 209b (step 1405).

Here, the fifteenth HTML document is an HTML document for providing a general network service in which a form-type HTML document is embedded so that the user can enter a member identifier.

The client 209b receives the fifteenth HTML document and displays it on the display device (step 1407). It then receives a member identifier from the user through the input device and sends it to the network server system 201b (step 1409).

The network server system 201b receives the member identifier (step 1411) and retrieves the word group identifier stored in the website database 205b that corresponds to the member identifier (step 1413).

The word group is built in advance, and the network server system 201b randomly selects one of the plurality of words belonging to the word group (step 1415). Alternatively, at each verification, one of the words belonging to the word group may be selected in consideration of the time, the speaker's condition, the connection location, and the like. For example, when the user has a head cold, words whose pronunciation includes nasal sounds are excluded (for example, the word '엄마' ('mom') is excluded because it contains nasal sounds), and a word is selected from the remainder.
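The condition-aware selection in step 1415 can be illustrated with a minimal Python sketch. The word list, the `nasal` flag, the romanized spellings, and the function name `select_word` are assumptions made for illustration; the patent does not specify how words are tagged or how the speaker's condition is detected.

```python
import random

# Hypothetical pre-built word group. Each entry carries a flag saying whether
# the word contains nasal sounds ("eomma" is the romanized form of the
# patent's example word for "mom").
WORD_GROUP = [
    {"word_id": 1, "text": "eomma", "nasal": True},
    {"word_id": 2, "text": "hakgyo", "nasal": False},
    {"word_id": 3, "text": "bada", "nasal": False},
]


def select_word(word_group, has_head_cold=False, rng=random):
    """Pick one word at random, skipping nasal words when the user has a cold."""
    candidates = [
        w for w in word_group if not (has_head_cold and w["nasal"])
    ]
    if not candidates:
        # Assumed fallback: if every word is nasal, use the full group.
        candidates = list(word_group)
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests, while the default keeps the selection unpredictable, which matters for the anti-recording goal described in the abstract.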

The network server system 201b then sends a sixteenth HTML document to the client 209b (step 1417). The sixteenth HTML document has embedded in it the pre-built word identifier corresponding to the selected word.

The client 209b receives the sixteenth HTML document and displays it on the display device (step 1419) and, according to this preferred embodiment, receives a voice signal corresponding to the word from the user through a microphone (step 1421). The client 209b then A/D-converts the voice signal (not shown) and sends it to the network server system 201b (step 1423). According to another embodiment of the present invention, the client 209b sends the voice signal directly to the speaker recognition server system 203b.

The network server system 201b receives the voice signal from the client 209b and sends it, together with the member identifier and the word identifier, to the speaker recognition server system 203b (step 1425).
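As a rough illustration of step 1425, the three items forwarded to the speaker recognition server could be bundled into one message. The JSON layout, the field names, and the base64 encoding of the digitized audio are assumptions for this sketch; the patent does not define a wire format.

```python
import base64
import json


def build_verification_request(member_id, word_id, pcm_bytes):
    """Bundle member identifier, word identifier, and digitized voice signal."""
    return json.dumps(
        {
            "member_id": member_id,
            "word_id": word_id,
            # Raw bytes are not valid JSON, so the audio is base64-encoded.
            "voice_signal": base64.b64encode(pcm_bytes).decode("ascii"),
        }
    )


def parse_verification_request(message):
    """Inverse of build_verification_request, as the receiving server would use it."""
    data = json.loads(message)
    return (
        data["member_id"],
        data["word_id"],
        base64.b64decode(data["voice_signal"]),
    )
```

Keeping the member identifier and word identifier next to the audio lets the receiving side look up the matching reference pattern without a second round trip.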

The speaker recognition server system 203b receives the member identifier, the word identifier, and the voice signal (step 1427), extracts the speaker component corresponding to the word identifier through the preprocessing process, and then, taking the speaker component as input, extracts the voice pattern data corresponding to the word identifier through the speaker verification process shown in FIG. 9 (step 1429).

Thereafter, the speaker recognition server system 203b retrieves the reference pattern data stored in the voice information database 207b that corresponds to the word identifier and the voice pattern data (step 1431).

Thereafter, the speaker recognition server system 203b determines whether the voice pattern data matches the reference pattern data (step 1433). If the result is affirmative, it sends a permission notification to the network server system 201b (step 1435); if the result is negative, it sends a rejection notification to the network server system 201b (step 1437). According to another embodiment of the present invention, the speaker recognition server system 203b sends the permission notification or the rejection notification to the client 209b.
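The match decision in steps 1433 to 1437 can be sketched as follows. The patent defers the actual comparison to the verification process of FIG. 9; the Euclidean distance and the `threshold` value used here are stand-in assumptions chosen only to make the permit/reject branching concrete.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("pattern vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(voice_pattern, reference_pattern, threshold=1.0):
    """Return "permit" when the patterns are close enough, otherwise "reject".

    The threshold is a hypothetical tuning parameter, not a value from the patent.
    """
    distance = euclidean_distance(voice_pattern, reference_pattern)
    return "permit" if distance <= threshold else "reject"
```

For example, `verify([0.0, 0.0], [0.3, 0.4])` yields a distance of 0.5 and is permitted under the default threshold, while `verify([0.0, 0.0], [3.0, 4.0])` yields a distance of 5.0 and is rejected.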

The network server system 201b receives the permission notification and sends it to the client 209b (step 1439), or receives the rejection notification and sends it to the client 209b (step 1441).

The client 209b receives the permission notification from the network server system 201b and displays it on the display device (step 1443), or receives the rejection notification and displays it on the display device (step 1445).

A network-based speaker learning apparatus (not shown) according to a preferred embodiment of the present invention includes means for receiving, from the client 103, predetermined input items including at least a member identifier and storing them in a website database. The apparatus also includes means for determining one word group among a plurality of word groups pre-built in the website database, and for storing in the website database the pre-built word group identifier corresponding to that word group in combination with the input items. Here, each word group consists of a plurality of words.
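The word-group determination described above can be sketched in a few lines. The abstract says word groups reflect the speaker's gender, age, and similar traits, but the patent does not give the mapping; the table, the age bands, and the group identifiers below are purely hypothetical.

```python
# Hypothetical pre-built word group identifiers keyed by (gender, age band).
WORD_GROUP_IDS = {
    ("female", "child"): 101,
    ("female", "adult"): 102,
    ("male", "child"): 103,
    ("male", "adult"): 104,
}

DEFAULT_WORD_GROUP_ID = 100  # assumed fallback when no entry matches


def age_band(age):
    """Collapse an age in years into a coarse band (assumed cutoff of 13)."""
    return "child" if age < 13 else "adult"


def assign_word_group(member_record):
    """Attach a word group identifier to the member's input items."""
    key = (member_record["gender"], age_band(member_record["age"]))
    group_id = WORD_GROUP_IDS.get(key, DEFAULT_WORD_GROUP_ID)
    # Return a new record so the stored row combines input items and group id.
    return {**member_record, "word_group_id": group_id}
```

The returned record corresponds to what the apparatus stores in the website database: the member's input items combined with the chosen word group identifier.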

The apparatus further includes means for embedding in a second HTML document, and sending to the client 103, the plurality of words belonging to the word group, the pre-built word identifier corresponding to each word, and component software that, at the client 103, preprocesses the voice signals in which the user pronounces the words, extracts the speaker components corresponding to each word, and sends them to the server 101. The preprocessing process is as described above.

The apparatus also includes means for receiving, from the client 103, the word identifiers and speaker components corresponding to each word; means for extracting reference pattern data corresponding to each word by performing a learning process on the speaker components; and means for generating a reference pattern identifier corresponding to each item of reference pattern data and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.
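The four fields stored in the voice information database suggest a simple record layout. The sketch below uses an in-memory dictionary and a sequential counter for the reference pattern identifier; both are assumptions, since the patent names the fields but not the storage engine or the identifier scheme.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ReferenceRecord:
    member_id: str
    word_id: int
    reference_pattern_id: int
    reference_pattern: tuple


class VoiceInfoDB:
    """In-memory stand-in for the voice information database."""

    def __init__(self):
        self._records = {}
        self._next_id = count(1)  # assumed: sequential pattern identifiers

    def store(self, member_id, word_id, reference_pattern):
        """Generate a reference pattern identifier and save the record."""
        record = ReferenceRecord(
            member_id, word_id, next(self._next_id), tuple(reference_pattern)
        )
        self._records[(member_id, word_id)] = record
        return record

    def lookup(self, member_id, word_id):
        """Fetch the reference pattern for one member and one word, if any."""
        return self._records.get((member_id, word_id))
```

Keying records by member identifier and word identifier mirrors the later verification step, which retrieves the reference pattern for the word the user was prompted to speak.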

According to another preferred embodiment of the present invention, a network-based speaker learning apparatus includes means for receiving, from the client 103, predetermined input items including at least a member identifier and storing them in a website database, and means for determining one word group among a plurality of word groups pre-built in the website database and storing in the website database the pre-built word group identifier corresponding to that word group in combination with the input items. Here, each word group consists of a plurality of words.

The apparatus also includes means for embedding in a fourth HTML document, and sending to the client 103, the plurality of words belonging to the word group and the pre-built word identifier corresponding to each word; means for receiving, from the client 103, the word identifiers and voice signals corresponding to each word; and means for preprocessing the voice signals to extract the speaker components corresponding to each word. The preprocessing process is as described above.

The apparatus further includes means for extracting reference pattern data corresponding to each word by performing a learning process on the speaker components, and means for generating a reference pattern identifier corresponding to each item of reference pattern data and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in the voice information database 207b.

According to yet another preferred embodiment of the present invention, a network-based speaker verification apparatus (not shown) includes means for receiving a member identifier from the client 103, means for retrieving the word group identifier stored in the website database that corresponds to the member identifier, and means for selecting one of the plurality of words belonging to the word group.

The apparatus also includes means for embedding in a sixth HTML document, and sending to the client 103, the word, the word identifier corresponding to the word, and component software that preprocesses the voice signal in which the user pronounces the word, extracts the voice pattern data corresponding to the word, and sends it to the server 101; means for receiving, from the client 103, the voice pattern data corresponding to the word; means for retrieving the reference pattern data stored in the voice information database 207b that corresponds to the word identifier; means for determining whether the voice pattern data matches the reference pattern data; means for sending a permission notification to the client 103 if the result is affirmative; and means for sending a rejection notification to the client 103 if the result is negative.

According to yet another preferred embodiment of the present invention, a network-based speaker verification apparatus (not shown) includes means for receiving a member identifier from the client 103; means for retrieving the word group identifier stored in the website database that corresponds to the member identifier; means for selecting one of the plurality of words belonging to the word group; means for embedding the word and the word identifier corresponding to the word in an eighth HTML document and sending it to the client 103; means for receiving, from the client 103, the voice signal corresponding to the word; means for preprocessing the voice signal to extract the voice pattern data corresponding to the word; means for retrieving the reference pattern data stored in the voice information database 207b that corresponds to the word identifier; means for determining whether the voice pattern data matches the reference pattern data; means for sending a permission notification to the client 103 if the result is affirmative; and means for sending a rejection notification to the client 103 if the result is negative.

A network-based speaker learning apparatus (not shown) in the network server system 201b, according to yet another preferred embodiment of the present invention, includes means for receiving, from the client 209b, predetermined input items including at least a member identifier and storing them in a website database, and means for determining one word group among a plurality of word groups pre-built in the website database and storing in the website database the pre-built word group identifier corresponding to that word group in combination with the input items. Here, each word group consists of a plurality of words.

The apparatus also includes means for embedding in a tenth HTML document, and sending to the client 209b, the plurality of words belonging to the word group, the pre-built word identifier corresponding to each word, and component software that, at the client 209b, preprocesses the voice signals in which the user pronounces the words, extracts the speaker components corresponding to each word, and sends them to the server; means for receiving, from the client 209b, the member identifier, the word identifier corresponding to each word, and the speaker component corresponding to each word, and sending them to the speaker recognition server system 203b; and means for receiving a learning completion notification from the speaker recognition server system 203b and sending it to the client 209b.

A network-based speaker learning apparatus (not shown) in the speaker recognition system, according to yet another preferred embodiment of the present invention, includes means for receiving the member identifier, the word identifiers corresponding to each word, and the speaker components, input from the client 209b either directly or via the network server system 201b; means for extracting reference pattern data corresponding to each word by performing a learning process on the speaker components; and means for generating a reference pattern identifier corresponding to each item of reference pattern data and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in the voice information database 207b.

According to yet another preferred embodiment of the present invention, a network-based speaker learning apparatus (not shown) in the network server system 201b includes means for receiving, from the client 209b, predetermined input items including at least a member identifier and storing them in a website database, and means for determining one word group among a plurality of word groups pre-built in the website database and storing in the website database the pre-built word group identifier corresponding to that word group in combination with the input items. Here, each word group consists of a plurality of words.

The apparatus also includes means for embedding in a twelfth HTML document, and sending to the client 209b, the plurality of words belonging to the word group and the pre-built word identifier corresponding to each word; means for receiving, from the client 209b, the member identifier, the word identifier corresponding to each word, and the voice signal corresponding to each word, and sending them to the speaker recognition server system 203b; and means for receiving a learning completion notification from the speaker recognition server system 203b and sending it to the client 209b.

A network-based speaker learning apparatus (not shown) in the speaker recognition system, according to yet another preferred embodiment of the present invention, includes means for receiving the member identifier, the word identifiers corresponding to each word, and the voice signals, input from the client 209b either directly or via the network server system 201b; means for preprocessing the voice signals to extract the speaker components corresponding to each word; means for extracting reference pattern data corresponding to each word by performing a learning process on the speaker components; and means for generating a reference pattern identifier corresponding to each item of reference pattern data and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in the voice information database 207b.

A network-based speaker verification apparatus (not shown) in the network server system 201b, according to yet another preferred embodiment of the present invention, includes means for receiving a member identifier from the client 209b; means for retrieving the word group identifier stored in the website database that corresponds to the member identifier; means for selecting one of the plurality of words belonging to the word group; means for embedding in a fourteenth HTML document, and sending to the client 209b, the word, the word identifier corresponding to the word, and component software that, at the client 209b, preprocesses the voice signal in which the user pronounces the word, extracts the voice pattern data corresponding to the word, and sends it to the network server system 201b; means for receiving the voice pattern data from the client 209b and sending the voice pattern data, the member identifier, and the word identifier to the speaker recognition server system 203b; means for receiving a permission notification from the speaker recognition server system 203b and sending it to the client 209b; and means for receiving a rejection notification from the speaker recognition server system 203b and sending it to the client 209b.

A network-based speaker verification apparatus (not shown) in the speaker recognition system, according to yet another preferred embodiment of the present invention, includes means for receiving the member identifier, the word identifier, and the voice pattern data, input from the client 209b either directly or via the network server system 201b; means for retrieving the reference pattern data stored in the voice information database 207b that corresponds to the word identifier; means for determining whether the voice pattern data matches the reference pattern data; means for sending a permission notification to the client 209b or the network server system 201b if the result is affirmative; and means for sending a rejection notification to the client 209b or the network server system 201b if the result is negative.

A network-based speaker verification apparatus (not shown) in the network server system 201b, according to yet another preferred embodiment of the present invention, includes means for receiving a member identifier from the client 209b; means for retrieving the word group identifier stored in the website database that corresponds to the member identifier; means for selecting one of the plurality of words belonging to the word group; means for embedding the word and the word identifier corresponding to the word in a sixteenth HTML document and sending it to the client 209b; means for receiving a voice signal from the client 209b and sending the voice signal, the member identifier, and the word identifier to the speaker recognition server system 203b; means for receiving a permission notification from the speaker recognition server system 203b and sending it to the client 209b; and means for receiving a rejection notification from the speaker recognition server system 203b and sending it to the client 209b.

A network-based speaker verification apparatus (not shown) in the speaker recognition system, according to yet another preferred embodiment of the present invention, includes means for receiving the member identifier, the word identifier, and the voice signal, input from the client 209b either directly or via the network server system 201b; means for preprocessing the voice signal to extract the voice pattern data corresponding to the word; means for retrieving the reference pattern data stored in the voice information database 207b that corresponds to the word identifier; means for determining whether the voice pattern data matches the reference pattern data; means for sending a permission notification to the client 209b or the network server system 201b if the result is affirmative; and means for sending a rejection notification to the client 209b or the network server system 201b if the result is negative.

In the case of electronic commerce payment, when the user's identity is confirmed by the speaker verification method according to the present invention, voice information corresponding to the pronunciation component is stored in the voice information data. If the user later disputes the transaction, the pronunciation component is restored and played back to the user, which removes a factor that inhibits the growth of such transactions. Here, when the user places a plurality of orders, a pronunciation component corresponds to each order.

The network-based speaker learning method and speaker verification method according to the present invention can be implemented by a system that includes a memory storing a website database, a voice signal database, and a program, and a processor coupled to the memory that executes the program. Here, the processor, under control of the program, can execute a method comprising the respective steps.

The present invention is not limited to the above embodiments, and it goes without saying that many modifications are possible by those of ordinary skill in the art within the spirit of the present invention.

As described above, the present invention relates to a network-based speaker learning and speaker verification method and apparatus, and in particular can provide a text-prompted speaker learning and speaker verification method and apparatus using a word group composed of words that well represent the speaker's individual characteristics.

According to the present invention, it is possible to provide a speaker learning and speaker verification method and apparatus that, instead of using a separate password to log in to a site that provides information over the Internet, uses each individual's voice characteristics, one of the unique characteristics each individual possesses, without requiring any equipment beyond a basic computer environment.

According to the present invention, it is possible to provide a text-prompted speaker learning and speaker verification method and apparatus that, in order to raise the speaker recognition rate and guard against impersonation by recording, uses a word group composed of words that well represent the speaker's individual characteristics according to the speaker's gender, age, and the like.

Also, according to the present invention, because the number of simultaneous users on the Internet cannot be predicted, it is possible to provide a speaker learning and speaker verification method and apparatus with a distributed design that prevents the load from concentrating in one place even when many users access the system at the same time.

And, according to the present invention, it is possible to provide a speaker learning and speaker verification method and apparatus that learns a plurality of words and, at each verification, selects one of the learned words in consideration of the time, the speaker's condition, the connection location, and the like, and uses it to verify the speaker.

Further, according to the present invention, it is possible to provide a voice restoration method that can promote the stability of electronic commerce: when a customer places an order in electronic commerce and a contract is formed, the voice signal corresponding to the order is received, processed, and stored, and if the customer is unaware of or denies having formed the contract or placed the order, the voice signal is restored and played back to the customer.

Claims

In a network-based user authentication method for authenticating a user connected using a microphone at a server,

Registering the user by storing user identification information together with user voice characteristic information obtained by analyzing the voice signal of the user;

Receiving a user authentication request from an authentication requester directly or indirectly through the network;

Selecting a word or sentence corresponding to the user's state from a pre-built word group or sentence group and transmitting it to the authentication requester;

Receiving, through the network, a voice signal of the authentication requester for the word or sentence, or voice characteristic information of the authentication requester extracted from the voice signal; And

Determining, using the registered user voice characteristic information, whether the voice signal, or the voice characteristic information of the authentication requester extracted from the voice signal, was input by the user.

A user authentication method using a network-based word group or sentence group in the server, comprising the above steps.
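By way of non-limiting illustration, the registration, prompt selection, and determination steps recited above can be sketched as follows. The class name `AuthServer`, the use of plain feature vectors as "voice characteristic information", the cosine-similarity comparison, and the threshold value are all assumptions made for this sketch; the claim does not specify a feature type or a matching rule.

```python
import math
import secrets


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class AuthServer:
    """Hypothetical server holding registered voice characteristics."""

    def __init__(self, word_groups, threshold=0.9):
        self.word_groups = word_groups  # user state -> list of prompt words
        self.registered = {}            # user id -> registered feature vector
        self.threshold = threshold      # assumed acceptance threshold

    def register(self, user_id, features):
        # Store identification information together with voice characteristics.
        self.registered[user_id] = list(features)

    def select_prompt(self, user_state):
        # Pick one word from the group that corresponds to the user's state.
        return secrets.choice(self.word_groups[user_state])

    def verify(self, user_id, features):
        # Compare the requester's features with the registered ones.
        reference = self.registered.get(user_id)
        if reference is None:
            return False
        return cosine_similarity(reference, features) >= self.threshold
```

In this sketch an unknown user identifier is simply rejected; a real system would also need to bind each verification attempt to the prompt that was actually issued.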

In the network-based speaker learning method for authenticating a user connected using a microphone in a server,

Receiving, from the client, predetermined input items including at least a member identifier and storing them in a web site database;

Determining one word group among a plurality of word groups pre-built in the web site database, wherein each word group consists of a plurality of words, and storing a pre-built word group identifier corresponding to the word group in the web site database in combination with the predetermined input items;

Embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that, in the client, preprocesses the voice signals in which the user corresponding to the client pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server, and transmitting the first document to the client;

Receiving word identifiers and speaker components corresponding to each word from the client;

Extracting reference pattern data corresponding to each word by performing a learning process on the speaker components; And

Generating a reference pattern identifier corresponding to each of the reference pattern data, and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.

A speaker learning method using a network-based word group for user authentication in a server, comprising the above steps.
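As a minimal sketch of the last two steps, the learning process and the storage of reference patterns might look like the following. The claim does not say what the learning process is; element-wise averaging of the speaker components is used here purely as a stand-in. The function name, the dictionary used as a "voice information database", and the UUID-based reference pattern identifier are likewise assumptions.

```python
import uuid


def learn_reference_patterns(member_id, components_by_word, voice_db):
    """Derive one reference pattern per word and store it.

    components_by_word maps a word identifier to a list of equal-length
    speaker-component vectors received from the client.
    """
    pattern_ids = []
    for word_id, vectors in components_by_word.items():
        if not vectors:
            continue  # nothing to learn for this word
        count = len(vectors)
        # Stand-in "learning process": element-wise mean of the vectors.
        reference = [sum(column) / count for column in zip(*vectors)]
        pattern_id = uuid.uuid4().hex  # assumed identifier scheme
        voice_db[(member_id, word_id)] = {
            "pattern_id": pattern_id,
            "reference": reference,
        }
        pattern_ids.append(pattern_id)
    return pattern_ids
```

Keying the store by member identifier and word identifier lets the verification side look up exactly one reference pattern for the word it prompted.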

The method of claim 3,

Speaker learning method using a network-based word group for user authentication in the server

Receiving a connection request signal from the client, and transmitting a second document in a form of a form for inputting a predetermined input item to the user to the client;

Receiving the predetermined input from the client and storing it in the web site database;

Determining whether there is a speaker component for which a learning process is not performed;

If there is a speaker component for which a learning process is not performed as a result of the determination, shifting to extracting the reference pattern data; And

Transmitting a completion notification to the client if, as a result of the determination, there is no speaker component for which the learning process has not been performed; the speaker learning method using a network-based word group for user authentication in a server further comprising the above steps.

The method of claim 3,

The speaker learning method using a network-based word group for user authentication in a server, wherein the predetermined input items are a member identifier, a name, a gender, an address, a social security number, and an e-mail address.

The method of claim 5,

The speaker learning method using a network-based word group for user authentication in a server, wherein the one word group is determined according to at least one of the gender, the address, and the e-mail address.
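For illustration only, determining a word group from the registered input items could be sketched as a first-match lookup. The criteria format, the group identifiers, and the fallback `default_group_id` are hypothetical; the claims state only that the determination corresponds to at least one of gender, address, and e-mail address.

```python
def determine_word_group(profile, group_criteria, default_group_id):
    """Return the first word group whose criteria all match the profile.

    profile is a dict of registered input items, for example
    {"gender": "F", "address": "Seoul"}. group_criteria maps a word group
    identifier to the field values that group requires.
    """
    for group_id, criteria in group_criteria.items():
        if all(profile.get(field) == value for field, value in criteria.items()):
            return group_id
    return default_group_id
```

A group with several criteria must match on all of them in this sketch, and a member who matches no group falls back to the default group.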

A recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker learning method using a network-based word group in a client, the method corresponding to a network-based speaker learning method performed in a server for authenticating a user connected using a microphone,

The speaker learning method using a network-based word group in the client comprising:

Receiving, from the server, a first document in which are embedded words belonging to a specific word group, a pre-built word identifier corresponding to each word, and component software that preprocesses the voice signals in which the user corresponding to the client pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server;

Displaying the first document on a display device;

Receiving a voice signal corresponding to each word from the user;

Extracting a speaker component from the speech signal corresponding to each word; And

And sending the word identifier and speaker components corresponding to each word to the server.

The method of claim 7, wherein

Speaker learning method using the network-based word group in the client

Transmitting a connection request signal input by the user by an input means to the server;

Receiving a second document in the form of a form from which the user can input a predetermined input item from the server and displaying the second document on the display device;

Receiving the predetermined input item from the user and transmitting the predetermined input item to the server; And

And receiving the speaker learning completion notification from the server and displaying the speaker learning completion notice on the display device.

The method of claim 7, wherein

And the predetermined input item is a member identifier, a name, a gender, an address, a social security number, and an e-mail address.

The method of claim 7, wherein

Speaker learning method using the network-based word group in the client

Further comprising converting the voice signal into a digital signal.

Embedding a plurality of words belonging to the word group and a pre-built word identifier corresponding to each word into a first document and transmitting the same to the client;

Receiving word identifiers and voice signals corresponding to the respective words from the client;

Preprocessing the speech signals to extract speaker components corresponding to each word;

The method of claim 11,

The predetermined input items are a member identifier, a name, a gender, an address, a social security number, and an e-mail address.

The method of claim 13,

The speaker learning method using a network-based word group for user authentication in the server, wherein the one word group is determined according to at least one of the gender, the address, and the e-mail address.

A recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker learning method using a network-based word group in a client, the method corresponding to a network-based speaker learning method performed in a server for authenticating a user connected using a microphone,

Speaker learning method using a network-based word group in the client,

Receiving, from the server, a first document in which words belonging to a specific word group and a pre-built word identifier corresponding to each word are embedded;

Displaying the first document on a display device;

Receiving a voice signal corresponding to each word from a user; And

And transmitting word identifiers and voice signals corresponding to the respective words to the server.

The method of claim 15,

Speaker learning method using the network-based word group in the client

The method of claim 15,

Speaker learning method using the network-based word group in the client

In the network-based speaker identification method for authenticating a user connected using a microphone in the server,

Receiving a member identifier from a client;

Retrieving a word group identifier previously stored in a web site database corresponding to the member identifier;

Selecting one word among a plurality of words belonging to a word group corresponding to the word group identifier;

Embedding, in a first document, the word, the word identifier corresponding to the word, and component software that preprocesses the voice signal in which the user pronounces the word, extracts a speaker component corresponding to the word, and transmits it to the server, and sending the first document to the client;

Receiving a speaker component corresponding to the word from the client;

Extracting voice pattern data corresponding to the word identifier by using the speaker component as an input;

Retrieving reference pattern data stored in a voice information database corresponding to the word identifier;

Determining whether the voice pattern data and the reference pattern data match;

Sending a permission notification to the client if the voice pattern data and the reference pattern data match as a result of the determination; And

Transmitting a rejection notification to the client if, as a result of the determination, the voice pattern data and the reference pattern data do not match.

A speaker verification method using a network-based word group for user authentication in the server, comprising the above steps.
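The server-side verification steps above can be sketched as two small functions. This is a hedged illustration, not the claimed implementation: the dictionary layouts for the web site database and the voice information database, the Euclidean distance test, and the `max_distance` threshold are all assumptions, since the claim requires only a determination of whether the voice pattern data and the reference pattern data "match".

```python
import math
import secrets


def choose_word(member_id, website_db, word_groups):
    """Look up the member's word group and pick one word from it."""
    group_id = website_db[member_id]["word_group_id"]
    words = word_groups[group_id]  # word identifier -> word text
    word_id = secrets.choice(sorted(words))
    return word_id, words[word_id]


def verify_speaker(member_id, word_id, voice_pattern, voice_db, max_distance=1.0):
    """Return "permission" or "rejection" for the submitted voice pattern."""
    entry = voice_db.get((member_id, word_id))
    if entry is None:
        return "rejection"  # no reference pattern was learned for this word
    distance = math.dist(entry["reference"], voice_pattern)
    return "permission" if distance <= max_distance else "rejection"
```

The returned string stands in for the permission or rejection notification that the server transmits to the client.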

The method of claim 19,

Speaker verification method using a network-based word group for user authentication in the server

Further comprising receiving a connection request signal from the client and transmitting, to the client, a second document in the form of a form through which the user can input the member identifier.

The method of claim 19,

The speaker verification method using a network-based word group for user authentication in the server, wherein the one word is selected randomly.

The method of claim 19,

The speaker verification method using a network-based word group for user authentication in the server, wherein the selecting of the one word comprises selecting one of the plurality of words belonging to the word group in consideration of the time of speaker verification, the speaker's state, and the connection location.
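The claims do not state how time, speaker state, and connection location influence the choice of word. One possible policy, offered only as an assumption, is to avoid re-prompting a word that was recently used from the same location, which makes a recorded replay of the previous utterance less useful. The function name, the history format, and the one-hour window below are hypothetical, and the speaker's state is not modeled in this sketch.

```python
import secrets


def select_word(words, history, now, location, min_gap_seconds=3600):
    """Pick a word, avoiding ones recently used from the same location.

    history is a list of (word, timestamp, location) tuples describing
    earlier verification attempts; timestamps and now are in seconds.
    """
    recent = {
        word
        for (word, timestamp, place) in history
        if place == location and now - timestamp < min_gap_seconds
    }
    # Fall back to the full list if every word was used recently.
    candidates = [word for word in words if word not in recent] or list(words)
    return secrets.choice(candidates)
```

With `secrets.choice` the final pick remains unpredictable to an observer, in line with the random selection recited in the preceding dependent claim.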

A recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker verification method using a network-based word group in a client, the method corresponding to a network-based speaker verification method performed in a server for authenticating a user connected using a microphone,

Speaker verification method using a network-based word group in the client,

Transmitting a connection request signal input by a user by an input means to the server;

Receiving, from the server, a second document in the form of a form for inputting a member identifier, and displaying the second document on the display device;

Transmitting the member identifier input by the user by an input means to the server;

Receiving, from the server, a first document in which are embedded a word corresponding to the member identifier, a word identifier corresponding to the word, and component software that preprocesses the voice signal in which the user pronounces the word, extracts a speaker component corresponding to the word, and transmits it to the server;

Displaying the first document on the display device;

Receiving a voice signal corresponding to the word from the user;

Extracting a speaker component from a speech signal corresponding to the word;

Sending the speaker component to the server;

If the permission notification is received from the server, displaying the permission notification on the display device; And

And displaying the rejection notice on the display device when receiving the rejection notice from the server.

The method of claim 23, wherein

Speaker verification method using the network-based word group in the client

Receiving a member identifier from a client;

Retrieving a word group identifier stored in a web site database corresponding to the member identifier;

Embedding the word and a word identifier corresponding to the word in a first document and transmitting it to the client;

Receiving a voice signal corresponding to the word from the client;

Preprocessing the voice signal to extract a speaker component corresponding to the word;

Extracting voice pattern data corresponding to the word by using the speaker component as an input;

The method of claim 25,

Further comprising receiving a connection request signal from the client and transmitting, to the client, a second document in the form of a form through which the user can input a member identifier.

The method of claim 25,

The network-based speaker identification method using the word group in the client,

Transmitting the member identifier input by the user by the input means to the server;

Receiving a first document in which the word corresponding to the member identifier and the word identifier corresponding to the word are embedded from the server;

Displaying the first document on a display device;

Receiving a voice signal corresponding to the word from the user;

Transmitting the voice signal to the server;

The method of claim 29,

Speaker verification method using the network-based word group in the client

In the network-based speaker learning method for authenticating a user connected by using a microphone of a network server system among server systems,

Embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that, in the client, preprocesses the voice signals in which the user pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server, and transmitting the first document to the client;

Receiving a member identifier, a word identifier corresponding to each word, and a speaker component corresponding to each word from the client and transmitting them to a speaker recognition server system; And

Receiving a learning completion notification from the speaker recognition server system and transmitting the notification to the client.

A speaker learning method using a network-based word group for user authentication in a network server system among the server systems, comprising the above steps.

The method of claim 31, wherein

Speaker learning method using a network-based word group for user authentication in a network server system of the server system

Receiving a connection request signal from the client and transmitting a second document in a form of a form capable of inputting a predetermined input item to the user to the client; And

Receiving the predetermined input items from the client and storing them in the web site database; the speaker learning method using a network-based word group for user authentication in a network server system among the server systems further comprising the above steps.

The method of claim 31, wherein

The speaker learning method using a network-based word group for user authentication in a network server system among the server systems, wherein the predetermined input items are a member identifier, a name, a gender, an address, a social security number, and an e-mail address.

The method of claim 33, wherein

The speaker learning method using a network-based word group for user authentication in a network server system among the server systems, wherein the one word group is determined according to at least one of the gender, the address, and the e-mail address.

In the network-based speaker learning method for authenticating a connected user using a microphone in a speaker recognition system among server systems,

Receiving a member identifier, word identifiers corresponding to each word, and speaker components entered directly from the client or via a network server system;

Generating a reference pattern identifier corresponding to each of the reference pattern data, and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.

A speaker learning method using a network-based word group for user authentication in a speaker recognition server system among the server systems, comprising the above steps.

36. The method of claim 35 wherein

Speaker learning method using a network-based word group for user authentication in a speaker recognition server system of the server system

Transmitting a completion notification to the client or the network server system if, as a result of the determination, there is no speaker component for which the learning process has not been performed; the speaker learning method using a network-based word group for user authentication further comprising the above step.

A recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker learning method using a network-based word group in a client, the method corresponding to a network-based speaker learning method performed in a network server system and a speaker recognition server system among server systems for authenticating a user connected using a microphone,

Speaker learning method using a network-based word group in the client,

Receiving, from the network server system, a first document in which are embedded words belonging to a specific word group, a pre-built word identifier corresponding to each word, and component software that, in the client, preprocesses the voice signals in which the user pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server;

Displaying the first document on a display device;

Receiving a voice signal corresponding to each word from the user;

Transmitting the member identifier, word identifiers corresponding to each word, and speaker components to the network server system or the speaker recognition server system.

The method of claim 37,

Speaker learning method using the network-based word group in the client

Transmitting a connection request signal input by the user by an input means to the network server system;

Receiving a second document in the form of a form from which the user can input predetermined input from the network server system and displaying the second document on the display device;

Receiving the predetermined input from the user and transmitting the predetermined input to the network server system; And

And receiving the speaker learning completion notification from the network server system or the speaker recognition server system and displaying the speaker learning completion notification on the display device.

The method of claim 37,

Speaker learning method using the network-based word group in the client

Receiving a member identifier, a word identifier corresponding to each word, and a voice signal corresponding to each word from the client and transmitting the same to the speaker recognition server system; And

The method of claim 41, wherein

The method of claim 43,

Receiving a member identifier, word identifiers corresponding to each word, and voice signals input directly from the client or via a network server system;

Preprocessing the voice signals to extract speaker components corresponding to each word;

The method of claim 45,

Speaker learning method using a network-based word group in the client,

Receiving, from the network server system, a first document in which words belonging to a specific word group and a pre-built word identifier corresponding to each word are embedded;

Displaying the first document on a display device;

Receiving a voice signal corresponding to each word from a user; And

Transmitting the member identifier, word identifiers corresponding to each word, and voice signals to the network server system or the speaker recognition server system.

The method of claim 47,

Speaker learning method using the network-based word group in the client

The method of claim 47,

Speaker learning method using the network-based word group in the client

In the network-based speaker identification method for authenticating a user connected by using the microphone of the network server system of the server system,

Receiving a member identifier from a client;

Selecting one word among a plurality of words belonging to the word group;

Embedding, in a first document, the word, a word identifier corresponding to the word, and component software that, in the client, preprocesses the voice signal in which the user pronounces the word, extracts a speaker component corresponding to the word, and transmits it to the network server system, and transmitting the first document to the client;

Receiving a speaker component from the client, and transmitting the speaker component, the member identifier, and the word identifier to a speaker recognition server system;

Receiving a permission notification from the speaker recognition server system and transmitting it to the client; And

And receiving the rejection notification from the speaker recognition server system and transmitting the same to the client.
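The division of work between the network server system and the speaker recognition server system can be sketched as a simple relay. The round-robin choice among several recognition servers is an assumption added to illustrate load distribution; the claim itself recites only forwarding the speaker component, member identifier, and word identifier, and relaying the resulting notification. Class names, the distance test, and the threshold are hypothetical.

```python
import math
from itertools import cycle


class SpeakerRecognitionServer:
    """Hypothetical back-end that compares components with stored references."""

    def __init__(self, voice_db, max_distance=1.0):
        self.voice_db = voice_db  # (member id, word id) -> reference vector
        self.max_distance = max_distance
        self.handled = 0          # number of requests served

    def verify(self, member_id, word_id, speaker_component):
        self.handled += 1
        reference = self.voice_db.get((member_id, word_id))
        if reference is None:
            return "rejection"
        matched = math.dist(reference, speaker_component) <= self.max_distance
        return "permission" if matched else "rejection"


class NetworkServer:
    """Hypothetical front-end that relays requests and notifications."""

    def __init__(self, recognition_servers):
        # Assumed policy: rotate through the recognition servers in turn.
        self._servers = cycle(recognition_servers)

    def relay(self, member_id, word_id, speaker_component):
        server = next(self._servers)
        # The returned notification is passed back to the client unchanged.
        return server.verify(member_id, word_id, speaker_component)
```

Because the front-end holds no voice data, additional recognition servers can be added behind it without changing what the client sends.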

The method of claim 51,

Speaker verification method using a network-based word group for user authentication in a network server system

The speaker verification method using a network-based word group for user authentication in a network server system among the server systems, further comprising receiving a connection request signal from the client and transmitting a second document in the form of a form to the client.

The method of claim 51,

Wherein the one word is selected randomly.

The method of claim 51,

Speaker verification method using network based word group for user authentication in network server system among server systems.

In the network-based speaker identification method for authenticating a connected user using a microphone in a speaker recognition system of the server system,

Receiving a member identifier, a word identifier and a speaker component entered directly from a client or via a network server system;

Extracting speech pattern data corresponding to the word identifier by inputting the speaker component;

Transmitting a permission notification to the client or the network server system when the voice pattern data and the reference pattern data match as a result of the determination; And

Transmitting a rejection notification to the client or the network server system if, as a result of the determination, the voice pattern data and the reference pattern data do not match.

A speaker verification method using a network-based word group, comprising the above steps.

A recording medium readable by a digital processing apparatus, tangibly embodying a program of instructions executable by the digital processing apparatus to perform a speaker verification method using a network-based word group in a client, the method corresponding to a network-based speaker verification method performed in a network server system and a speaker recognition server system among server systems for authenticating a user connected using a microphone,

Speaker verification method using a network-based word group in the client,

Transmitting a connection request signal input by a user by an input means to the network server system;

Receiving a second document in a form of a form capable of inputting a member identifier from the network server system and displaying the second document on a display device;

Transmitting the member identifier input by the user by the input means to the network server system;

Receiving, from the network server system, a first document in which are embedded the word corresponding to the member identifier, the word identifier corresponding to the word, and component software that preprocesses the voice signal in which the user pronounces the word, extracts the speaker component corresponding to the word, and transmits it to the network server system or the speaker recognition server system;

Displaying the first document on the display device;

Receiving a voice signal corresponding to the word from the user;

Extracting a speaker component from a speech signal corresponding to the word;

Sending the speaker component to the network server system or the speaker recognition server system;

Displaying the permission notification on the display device when receiving the permission notification from the network server system or the speaker recognition server system; And

And displaying the rejection notification on the display device when receiving the rejection notification from the network server system or the speaker recognition server system.

The method of claim 56, wherein

Speaker verification method using the network-based word group in the client

Receiving a member identifier from a client;

Receiving a voice signal from the client and transmitting the voice signal, the member identifier and the word identifier to a speaker recognition server system;

The method of claim 58,

Wherein the one word is selected randomly.

The method of claim 58,

The speaker verification method using a network-based word group for user authentication, wherein the selecting of the one word comprises selecting one of the plurality of words belonging to the word group in consideration of the time of speaker verification, the speaker's state, and the connection location.

Receiving a member identifier, a word identifier and a voice signal input directly from the client or via a network server system;

Transmitting a permission notification to the client or the network server system if the voice pattern data and the reference pattern data match as a result of the determination; And

Transmitting a rejection notification to the client or the network server system if, as a result of the determination, the voice pattern data and the reference pattern data do not match.

A speaker verification method using a network-based word group, comprising the above steps.

Speaker verification method using a network-based word group in the client,

Receiving a first document in which the word corresponding to the member identifier and the word identifier corresponding to the word are embedded from the network server system;

Displaying the first document on the display device;

Receiving a voice signal corresponding to the word from the user;

Transmitting the voice signal to the network server system or the speaker recognition server system;

The method of claim 63, wherein

Speaker verification method using the network-based word group in the client

In the network-based user authentication device for authenticating a user connected using a microphone,

Means for registering the user by storing the user identification information together with the user voice characteristic information obtained by analyzing the user's voice signal;

Means for receiving a user authentication request from an authentication requester directly or indirectly through the network;

Means for selecting a word or sentence corresponding to the user's state from a pre-built word group or sentence group and transmitting the selected word or sentence to the authentication requester;

Means for receiving, through the network, a voice signal of the authentication requester for the word or sentence, or voice characteristic information of the authentication requester extracted from the voice signal; And

Means for determining, using the registered user voice characteristic information, whether the voice signal, or the voice characteristic information of the authentication requester extracted from the voice signal, was input by the user.

A user authentication device using a network-based word group or sentence group, comprising the above means.

66. The method of claim 65,

And the word is selected based on personal information of the user registered in advance corresponding to the authentication requestor.

In the network-based speaker learning apparatus for authenticating a user connected using a microphone,

Means for receiving from the client predetermined input including at least a member identifier and storing it in a website database;

Means for determining one word group among a plurality of word groups pre-built in the web site database, wherein each word group consists of a plurality of words, and storing a pre-built word group identifier corresponding to the word group in the web site database in combination with the predetermined input items;

Means for embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that, in the client, preprocesses the voice signals in which the user corresponding to the client pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server, and for transmitting the first document to the client;

Means for receiving word identifiers and speaker components corresponding to each word from the client;

Means for extracting reference pattern data corresponding to each word by performing a learning process on the speaker components; And

Means for generating a reference pattern identifier corresponding to each of the reference pattern data and then storing the member identifier, the reference pattern data, the reference pattern identifier, and the word identifier in a voice information database.

A speaker learning device using a network-based word group, comprising the above means.

Means for embedding a plurality of words belonging to the word group and a pre-built word identifier corresponding to each word into a first document and transmitting the same to the client;

Means for receiving word identifiers and voice signals corresponding to each word from the client;

Means for preprocessing the speech signals to extract speaker components corresponding to each word;

In the network-based speaker identification device for authenticating a user connected using a microphone,

Means for receiving a member identifier from a client;

Means for retrieving a word-group identifier stored in a web site database corresponding to the member identifier;

Means for embedding, in a first document, the word, the word identifier corresponding to the word, and component software that preprocesses the voice signal in which the user pronounces the word, extracts a speaker component corresponding to the word, and transmits it to a server, and for sending the first document to the client;

Means for receiving a speaker component corresponding to the word from the client;

Means for extracting speech pattern data corresponding to the word by inputting the speaker component;

Means for retrieving reference pattern data stored in a voice information database corresponding to the word identifier;

Means for determining whether the speech pattern data and the reference pattern data match;

Means for sending a permission notification to the client if the voice pattern data and the reference pattern data match as a result of the determination; And

And a means for transmitting a rejection notification to the client if the voice pattern data and the reference pattern data do not match, as a result of the determination.

Means for receiving a member identifier from a client;

Means for selecting one word among a plurality of words belonging to a word group corresponding to the word group identifier;

Means for embedding the word and a word identifier corresponding to the word in a first document to transmit to the client;

Means for receiving a voice signal corresponding to the word from the client;

Means for preprocessing the speech signal to extract a speaker component corresponding to the word;

A network-based speaker learning apparatus for authenticating a user who is connected by using a microphone of a network server system among server systems,

Means for embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that, in the client, preprocesses the voice signals in which the user pronounces the words, extracts the speaker components corresponding to the respective words, and transmits them to the server, and for transmitting the first document to the client;

Means for receiving a member identifier, a word identifier corresponding to each word, and a speaker component corresponding to each word from the client and transmitting them to a speaker recognition server system; And

Means for receiving a learning completion notification from the speaker recognition server system and transmitting the learning completion notification to the client.

A speaker learning apparatus using a network-based word group for user authentication in a network server system, comprising the above means.

In a network-based speaker learning apparatus of a speaker recognition server system, among server systems for authenticating a user who connects using a microphone,

Means for receiving a member identifier, word identifiers corresponding to each word, and speaker components input directly from the client or via a network server system;

Means for generating a reference pattern identifier corresponding to each of the reference pattern data and storing the member identifier, the reference pattern data, the reference pattern identifier, and a word identifier in a voice information database. A speaker learning apparatus using a network-based word group for user authentication in a speaker recognition server system.
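The storage element above (one reference pattern identifier per reference pattern, saved together with the member identifier and word identifier) can be sketched as follows. The record layout, the `ref-000001` identifier format, and the names `voice_info_db` and `store_reference_patterns` are assumptions made for illustration. How reference pattern data is derived from the speaker components is not shown.

```python
import itertools

# Hypothetical stand-in for the voice information database.
voice_info_db = []

# Assumed identifier scheme: a simple increasing counter.
_next_ref = itertools.count(1)


def store_reference_patterns(member_id, patterns_by_word):
    """Store one record per word and return the generated identifiers.

    `patterns_by_word` maps a word identifier to its reference pattern data.
    """
    ref_ids = []
    for word_id, pattern in patterns_by_word.items():
        ref_id = f"ref-{next(_next_ref):06d}"
        voice_info_db.append(
            {
                "member_id": member_id,
                "word_id": word_id,
                "ref_id": ref_id,
                "pattern": list(pattern),
            }
        )
        ref_ids.append(ref_id)
    return ref_ids
```

Keeping the word identifier in each record is what later allows a single word to be chosen for verification and its reference pattern to be retrieved.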

Means for receiving a member identifier, a word identifier corresponding to each word, and a voice signal corresponding to each word from the client and transmitting them to a speaker recognition server system; and

Means for receiving a member identifier, word identifiers corresponding to each word, and voice signals input directly from the client or via a network server system;

In a network-based speaker identification apparatus of a network server system, among server systems for authenticating a user who connects using a microphone,

Means for receiving a member identifier from a client;

Means for embedding, in a first document, the word, a word identifier corresponding to the word, and component software that preprocesses a voice signal in which the user pronounces the word at the client, extracts a speaker component corresponding to the word, and transmits it to the network server system, and for sending the first document to the client;

Means for receiving the speaker component from the client and transmitting the speaker component, the member identifier and the word identifier to a speaker recognition server system;

Means for receiving a permission notification from the speaker recognition server system and transmitting it to the client; and

Means for receiving a rejection notification from the speaker recognition server system and transmitting the rejection notification to the client.
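The relaying role of the network server system in this claim can be sketched as follows. The function name `relay_verification` and the two callables passed to it are hypothetical; they stand in for whatever transport connects the network server system to the speaker recognition server system and to the client, which the claim does not specify.

```python
from typing import Callable, Sequence


def relay_verification(
    member_id: str,
    word_id: str,
    speaker_component: Sequence[float],
    recognize: Callable[[str, str, Sequence[float]], str],
    notify_client: Callable[[str], None],
) -> str:
    """Forward one attempt to the recognition server and relay its answer.

    `recognize` is assumed to return "permission" or "rejection".
    """
    result = recognize(member_id, word_id, speaker_component)
    if result not in ("permission", "rejection"):
        raise ValueError(f"unexpected notification: {result!r}")
    notify_client(result)
    return result
```

Passing the transport in as callables keeps the sketch independent of any particular network protocol.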

In a network-based speaker identification apparatus of a speaker recognition server system, among server systems for authenticating a user who connects using a microphone,

Means for receiving a member identifier, word identifier and speaker component entered directly from a client or via a network server system;

Means for sending a permission notification to the client or the network server system if the voice pattern data and the reference pattern data match as a result of the determination; and

Means for transmitting a rejection notification to the client or the network server system if the voice pattern data and the reference pattern data do not match as a result of the determination. A speaker verification apparatus using a network-based word group for user authentication.

Means for receiving a member identifier from a client;

Means for receiving a voice signal from the client and transmitting the voice signal, the member identifier and the word identifier to a speaker recognition server system;

Means for receiving a member identifier, a word identifier and a voice signal input directly from the client or via a network server system;

A memory storing a web site database, a voice information database, and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Registering the user by storing user image information together with user voice characteristic information obtained by analyzing a voice signal of the user,

Receiving a user authentication request from an authentication requester directly or indirectly through the network,

Selecting a word or sentence corresponding to the state of the user from a pre-built word group or sentence group and transmitting the same to the authentication requester;

Receiving, through the network, the authentication requester's voice signal for the word or sentence, or voice characteristic information of the authentication requester extracted from the voice signal; and

A user authentication system using a network-based word group, characterized in that the processor executes the above operations.
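The selection of a challenge word from a pre-built word group can be sketched as follows. The claim selects a word "corresponding to the state of the user" but this section does not say how that state is evaluated, so the sketch swaps in a simpler rule as an explicit assumption: pick at random while avoiding the word used in the previous attempt, which makes a replayed recording less likely to fit. The names `select_challenge_word` and `last_word_id` are hypothetical.

```python
import random


def select_challenge_word(word_group, last_word_id=None, rng=random):
    """Pick one (word identifier, word) pair from a learned word group.

    Assumption: random choice that skips the previously used word.
    The state-based selection in the claim is not modeled here.
    Raises IndexError if `word_group` is empty.
    """
    candidates = [
        (word_id, word)
        for word_id, word in word_group.items()
        if word_id != last_word_id
    ]
    if not candidates:
        # Only one word was learned, so it has to be reused.
        candidates = list(word_group.items())
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.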

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving from the client some input including at least a member identifier and storing it in a website database,

Determining a word group among a plurality of word groups pre-built in the web site database, each word group consisting of a plurality of words, and storing a pre-built word group identifier corresponding to the determined word group in the web site database in association with the input information,

Embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that preprocesses the voice signals in which the user corresponding to the client pronounces the words at the client, extracts a speaker component corresponding to each word, and transmits it to the server, and transmitting the first document to the client;

Receiving word identifiers and speaker components corresponding to each word from the client,

Generating a reference pattern identifier corresponding to each of the reference pattern data, and then storing the member identifier, the reference pattern data, the reference pattern identifier, and a word identifier in a voice information database. Speaker learning system using network-based word group for authentication.

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Embedding a plurality of words belonging to the word group and a pre-built word identifier corresponding to each word in the first document and transmitting the same to the client;

Receiving word identifiers and voice signals corresponding to each word from the client,

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier from a client,

Selecting one word among a plurality of words belonging to a word group corresponding to the word group identifier,

Embedding, in a first document, the word, the word identifier corresponding to the word, and component software that preprocesses a voice signal in which the user pronounces the word, extracts a speaker component corresponding to the word, and transmits it to a server, and sending the first document to the client,

Receiving a speaker component corresponding to the word from the client,

Extracting voice pattern data corresponding to the word from the input speaker component;

Determining whether the voice pattern data and the reference pattern data match each other;

Transmitting a permission notification to the client if the voice pattern data and the reference pattern data match as a result of the determination; and

Transmitting a rejection notification to the client if the voice pattern data and the reference pattern data do not match as a result of the determination.

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier from a client,

Embedding the word and a word identifier corresponding to the word in a first document and transmitting the same to the client;

Receiving a voice signal corresponding to the word from the client,

Retrieving reference pattern data stored in a voice information database corresponding to the word identifier,

Transmitting a rejection notification to the client if the voice pattern data and the reference pattern data do not match as a result of the determination.
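Embedding the word and its word identifier in a first document for the client can be sketched as follows. The claims do not fix the document format. The sketch assumes an HTML page with the word shown as a prompt and the word identifier carried in a hidden form field, and the names `build_first_document`, `word_id`, and the `/verify` action path are hypothetical.

```python
from html import escape


def build_first_document(word, word_id, action="/verify"):
    """Return an HTML page that prompts for `word` and carries `word_id`.

    Assumption: the "first document" is HTML and the word identifier
    travels back to the server as a hidden form field.
    """
    return (
        "<html><body>\n"
        f'<form method="post" action="{escape(action, quote=True)}">\n'
        f"<p>Please say: {escape(word)}</p>\n"
        f'<input type="hidden" name="word_id" value="{escape(word_id, quote=True)}">\n'
        "</form>\n"
        "</body></html>"
    )
```

Escaping both values keeps a word or identifier that contains markup characters from altering the page structure.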

A memory storing a web site database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Embedding, in a first document, a plurality of words belonging to the word group, a pre-built word identifier corresponding to each word, and component software that preprocesses the voice signals in which the user corresponding to the client pronounces the words at the client, extracts a speaker component corresponding to each word, and transmits it to the server, and transmitting the first document to the client;

Receiving a member identifier, a word identifier corresponding to each word, and a speaker component corresponding to each word from the client and transmitting them to a speaker recognition server system; and

Receiving a learning completion notification from the speaker recognition server system and transmitting the learning completion notification to the client. A speaker learning network server system using a network-based word group for user authentication.

A memory storing a voice information database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier, word identifiers corresponding to each word, and speaker components entered directly from a client or via a network server system,

Generating a reference pattern identifier corresponding to each of the reference pattern data, and storing the member identifier, the reference pattern data, the reference pattern identifier, and a word identifier in a voice information database. A speaker recognition server system for speaker learning using a network-based word group for user authentication.

A memory storing a web site database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier, a word identifier corresponding to each word, and a voice signal corresponding to each word from the client and transmitting them to a speaker recognition server system; and

A memory storing a voice information database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier, word identifiers corresponding to each word and voice signals input directly from the client or via a network server system,

A memory storing a web site database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier from a client,

Embedding, in a first document, the word, a word identifier corresponding to the word, and component software that preprocesses a voice signal in which the user corresponding to the client pronounces the word at the client, extracts a speaker component corresponding to the word, and transmits it to the network server system, and transmitting the first document to the client,

Receiving a speaker component from the client and transmitting the speaker component, the member identifier and the word identifier to a speaker recognition server system;

Receiving a permission notification from the speaker recognition server system and sending it to the client; and

Receiving a rejection notification from the speaker recognition server system and transmitting the rejection notification to the client.

A memory storing a voice information database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier, word identifier and speaker component entered directly from a client or via a network server system,

Transmitting a permission notification to the client or the network server system if the voice pattern data and the reference pattern data match as a result of the determination; and

Transmitting a rejection notification to the client or the network server system if the voice pattern data and the reference pattern data do not match as a result of the determination. A speaker recognition server system for speaker verification using a network-based word group for user authentication.

A memory storing a web site database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier from a client;

A memory storing a voice information database and a program;

A processor coupled to the memory to execute the program,

Wherein the processor, under control of the program, performs:

Receiving a member identifier, a word identifier and a voice signal input directly from the client or via a network server system,

A voice information restoration method for preventing a customer from reversing or denying a contract or an order after the customer has concluded the contract through at least one order or subscription in e-commerce,

Transmitting a word or sentence corresponding to each order or subscription from the pre-stored word group to the client;

Receiving a purchase order voice signal corresponding to the word or sentence from the client;

Storing voice pattern data corresponding to the purchase order voice signal in a voice information database;

A voice information restoration method using a word group, comprising the above steps.
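The storing step of this method (keeping voice pattern data for each purchase order so that it can be produced later) can be sketched as follows. The record fields, the rule that an order's record may not be overwritten, and the names `OrderVoiceRecord` and `OrderVoiceStore` are assumptions for illustration. Restoring an audible voice signal from the stored pattern data is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OrderVoiceRecord:
    order_id: str          # hypothetical key identifying the order or subscription
    prompt: str            # word or sentence that was sent to the client
    voice_pattern: tuple   # voice pattern data from the purchase order voice signal
    stored_at: datetime


class OrderVoiceStore:
    """Hypothetical in-memory stand-in for the voice information database."""

    def __init__(self):
        self._records = {}

    def store(self, order_id, prompt, voice_pattern):
        # Assumption: a stored record is evidence and must not be overwritten.
        if order_id in self._records:
            raise ValueError(f"order {order_id!r} already has a voice record")
        record = OrderVoiceRecord(
            order_id, prompt, tuple(voice_pattern), datetime.now(timezone.utc)
        )
        self._records[order_id] = record
        return record

    def lookup(self, order_id):
        """Return the stored record, or None if the order is unknown."""
        return self._records.get(order_id)
```

A frozen dataclass and a tuple are used so that a record cannot be modified in place after it has been stored.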

The method of claim 92, wherein the voice information restoration method using the word group further comprises:

Restoring the voice pattern data to the purchase order voice signal when the customer reverses or denies the fact that the contract was concluded or the order was placed.

The method of claim 92, wherein the voice information restoration method using the word group further comprises:

Receiving the purchase order voice signal from a client;

Preprocessing the purchase order voice signal to extract a speaker component corresponding to the purchase order voice signal; and

Extracting voice pattern data corresponding to the purchase order voice signal from the speaker component