KR20020060975A - System and method of templating specific human voices

Info

Publication number: KR20020060975A
Application number: KR1020027006630A
Authority: KR
Inventors: 스티븐 제이. 커우; 캐서린 에이셔 커우
Original assignee: 스티븐 제이. 커우; 캐서린 에이셔 커우
Priority date: 1999-11-23
Filing date: 2000-11-23
Publication date: 2002-07-19
Also published as: CA2392436A1; NO20022406L; NO20022406D0; EA200200587A1; BR0015773A; IL149813A0; ZA200204036B; WO2001039180A1; EP1252620A1; CN1391690A; AU2048001A; AP2002002524A0; EA004079B1; JP2003515768A

Abstract

A system and method are disclosed for capturing (103) an enabling portion of a voice and then generating (127) a voice template or profile signal that can later be combined with noise of other origin to reconstruct the original voice. The reconstructed voice may be used to utter any form or content provided via digital input, including content that the original voice never spoke in that manner. Products and processes for online use, such as specific business methods and industrial applications, are also disclosed.

Description

SYSTEM AND METHOD OF TEMPLATING SPECIFIC HUMAN VOICES

Since earliest times, mammals and other creatures have communicated in some manner by voice or similar noise. Such noises are typically quite distinctive, even within a species, because of morphological differences among individual creatures. A creature's distinctiveness includes highly unique elements of voice pattern and tone. Unfortunately, when a person dies or otherwise ceases contact with a listener, the pleasure of hearing that person speak in the particular voice of interest is lost.

At present, only very basic forms of media capture exist by which a voice may be stored. For example, when a tape or digital recording device is used to record a person's voice, the voice is retained for later listening and may be replayed as originally recorded or, if desired, in part. Such voice recording devices and methods also encompass a range of computer-generated artificial voices, which serve a wide variety of functions including, for example, automated telephone guidance and confirmation, very simple conversation between a toy or appliance and a user, and synthetic voices for the film and entertainment industries. In some applications, such artificial voices are pre-programmed with a limited set of responses to particular inputs. Although in some instances this is more responsive than merely recording an actual voice, such artificial voices are nevertheless simplistic compared with the robust voice capabilities of the present invention. Indeed, certain embodiments of the present invention include elements that differ greatly from the systems described above, or that use technology far beyond what prior discoveries or innovations have considered or suggested.

Various publications worldwide disclose different kinds of artificial vocalization. Similarly, some documents disclose systems and techniques for using and generating artificial voices. However, none of them discloses the concept of the present invention.

The present invention relates to systems, methods, and products for storing and modulating sound, particularly the human voice.

FIG. 1 is a flow diagram of one embodiment of the system operation of the present invention.

FIG. 2 is a schematic diagram of one embodiment of a voice capture subsystem.

FIG. 3 is a schematic diagram of one embodiment of a voice analysis subsystem.

FIG. 4 is a schematic diagram of one embodiment of a voice characterization subsystem.

FIG. 5 is a schematic diagram of one embodiment of a voice template subsystem.

FIG. 6 is a schematic diagram of one embodiment of a voice template signal bundler subsystem.

FIG. 7 is a schematic diagram of one embodiment of the system of the present invention in which remote information download and upload options are used.

FIG. 8 is a plan view of one embodiment of the present invention implemented in a mobile compact component.

FIG. 9 is a plan view of one embodiment of the present invention in which a visual media source is used.

SUMMARY OF THE INVENTION

Systems and methods are provided for recording or otherwise capturing an enabling amount of a human voice to form a voice pattern template. The template is then useful as a tool for building new speech that sounds like the precise voice, so that the new speech sounds identical in every respect to the person's real voice even though the person could not, or did not, ever actually say those things in that exact context or sentence. The enabling portion is designed to capture the elements of the actual voice needed to reconstruct it. Where insufficient enabling speech is available, a confidence rating is available to predict the extent to which the voice can be reconstructed or re-created. The new voice or voices may be used with databases, historical data, and adaptive or artificial intelligence modules relating to the subject, to enable new discussions with the user as though the originator of the templated voice were present. Such systems and methods may be combined with other media in the form of software files, chip-embedded tools, or other forms. A unit module may itself comprise an entire embodiment of the invention, for example a chip or electronic module configured to capture a voice and enable its use in the manner disclosed herein.
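The source does not say how the confidence rating is computed. As a minimal sketch, assuming the rating is simply the share of required voice characteristics that the captured sample covers, it might look like the following. The feature names and the threshold are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: the source does not define how a confidence rating is computed.
# REQUIRED_FEATURES is an assumed, illustrative list of voice characteristics.
REQUIRED_FEATURES = frozenset({"pitch", "tone", "cadence", "nasal_resonance"})


def confidence_rating(captured_features):
    """Return the fraction (0.0 to 1.0) of required features present in the capture."""
    covered = REQUIRED_FEATURES & set(captured_features)
    return len(covered) / len(REQUIRED_FEATURES)


def is_enabling(captured_features, threshold=1.0):
    """Treat the capture as an enabling portion once the rating reaches the threshold."""
    return confidence_rating(captured_features) >= threshold
```

Under these assumptions, a capture covering only pitch and tone would rate 0.5 and would not yet count as enabling at the default threshold.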

The template is useful, for example, as a tool for capturing a voice and creating new dialogue with a person who is no longer immediately available, who is deceased, or who has agreed to have his or her voice templated and used in this manner. Another example is application to media such as film, photographs, or other depictions of the actual speaker, to create on-demand virtual dialogue with that speaker.

Voice is a sound arising from a distinctive capability among mammals. A mother's voice is recognized by a child even before birth and soothes it, and a grandfather's voice can calm the fears of even an adult. Other voices may inspire complete strangers, or draw out memories of loved ones and of events from long ago. These are just a few examples of the great gift of distinctiveness that humans and other species possess, and of the ability of each creature's highly unique vocal sound to affect others (and itself). In humans, for example, the particularity of a person's voice derives from the genetic contributions of the parents, which produce the shape, size, and position of the various body components that influence how a person sounds when speaking or otherwise communicating by voice or through oral and nasal sounds. Other influences exist as well. It will be appreciated that wide differences exist among people, even within the same family. Indeed, the same person may sound slightly different depending on transient influences such as health, stress level, emotional state, fatigue, ambient temperature, or other factors.

It is generally agreed that the qualities of a human voice represent a highly unique combination that is identifiable to those who have heard the voice before. The human capacity for association through the senses is remarkable, particularly as it relates to identification and association by voice. Significant and ordinary events in life are often recalled years or decades later through a remembered remark or tone of voice. Such is the enduring and moving power of voice.

Of course, capturing and reproducing the human voice through various media and machines is well known. Basic manipulation of recorded human voice has been performed, intentionally or unintentionally, on tape and digital media for decades. However, such manipulation has generally been limited to what a person actually said rather than what the person could have said. For example, segments of actual human statements have been replayed, edited, mixed, and sometimes even played back at different speeds. Other examples of human voice use include playback of intentionally distorted voice segments, as may be used in cartoons, animation, or other audio associated with certain music. Animated media have also used artificial voices that were not necessarily created from a real voice. An example is the computer-generated "voice" operator used by some telephone and communication systems.

One method of synthesizing voice and sound is known as concatenative synthesis and relies on waveform data samples, or recordings, of actual human speech. The method breaks pre-recorded original human speech down into segments and generates utterances by linking those segments to build syllables, words, or phrases. The size of the segments varies. Another method of human speech synthesis is known as parametric. In this method, mathematical models are used to reproduce a desired speech sound: for each desired sound, a mathematical model or function is used to generate it. Parametric methods thus generally do not have human sound as an element. There are a few well-known types of parametric speech synthesizers. One is the articulatory synthesizer, which mathematically models the physical aspects of the human lungs, larynx, vocal tract, and nasal tract. Another is the formant synthesizer, which mathematically models the acoustic aspects of the human vocal tract.
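The contrast between the two prior-art approaches can be illustrated with a toy sketch. It assumes waveforms are plain lists of samples, and it stands in for a formant synthesizer with a simple sum of sinusoids at chosen formant frequencies. A real formant synthesizer would filter an excitation source through resonators, so this is only an illustration of "sound from a mathematical model"; all names and values here are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative value)


def concatenative(segments, order):
    """Join pre-recorded waveform segments (lists of samples) in the requested order."""
    out = []
    for name in order:
        out.extend(segments[name])
    return out


def parametric(formants_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Generate a sound purely from a mathematical model: the average of
    sinusoids at the given frequencies. No recorded human sound is used."""
    n = round(duration_s * sample_rate)
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in formants_hz)
        / len(formants_hz)
        for t in range(n)
    ]


# Concatenative: the output consists only of previously recorded material.
recorded = {"he": [0.1, 0.2], "llo": [0.3]}
joined = concatenative(recorded, ["he", "llo"])

# Parametric: the output is computed from frequencies alone.
vowel_like = parametric([730, 1090, 2440], duration_s=0.01)
```

The concatenative output can only rearrange what was recorded, while the parametric output contains no human sound at all, which is the limitation of each approach that the passage describes.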

Other systems include means for recognizing a particular voice once the system in use has become familiar with that voice. Examples include the various speech recognition systems used to capture spoken language and then translate those sounds into text, such as dictation systems. Other voice-related systems belong to the field of biometrics and use certain spoken words as security codes or passwords. None of the systems, methods, means, or other approaches described above recognizes or describes the various inventions set out herein, or even recognizes the need for such technical innovation.

A long-felt need has existed for systems and methods for preserving the voices of other beings in a dynamic and adaptive manner for later use by the originator or others. A further need has existed for systems and methods for achieving and using voice capture or a voice profile in a seamless and clearly articulated manner. A further need has existed for systems and methods for achieving and using genuine vocalization or speech in an original person's voice in ways that otherwise would never have been contemplated by that person. Systems and methods that achieve this offer the particular added advantage of being readily usable by people of virtually any skill level, culture, or language. A further need has existed for new business methods, techniques, and models, together with devices and other implementing means, to facilitate the creation of and access to particular voice templates, and to facilitate use of those templates for human needs or wants, whether related to business or pleasure. Whatever has been accomplished in the field of voice technology, past efforts have neither contemplated the present invention, nor addressed its novelty, nor recognized the need for it.

FIG. 1 is a schematic diagram of one embodiment of a system 10 for capturing an enabling portion of a particular voice sufficient for use as a template in later use of the voice characteristics. System 10 may be part of a handheld device, such as an electronic handheld device, or part of a laptop, notebook, or desktop-sized computing device. Alternatively, system 10 may be merely part of a circuit board within another device, or an electronics component or element designed for temporary or permanent placement and use in another electronic element, circuit, or system. System 10 may also consist, in whole or in part, of computer-readable code or merely of the logic or functional circuitry of a neural system. System 10 may also be formed as some other apparatus or product, such as a distributed network-style system.

In one embodiment, system 10 includes input or capture means 15 for capturing or receiving a portion of a voice for processing and for construction of voice algorithm or template means 19. The algorithm or template means 19 may be formed as a data stream, a data package, a telecommunication signal, or software code means for defining and re-creating a plurality of voice characteristics, organized for application as a template to some other organization of sound or noise suitable for arranging that sound or noise into the particular voice, or an unmistakable likeness of the originator's voice. Other means for formatting computer-readable program code means, or other means for artificially generating a voice using certain identifying voice characteristics, are also contemplated within the invention. The logic or rules of the algorithm or template means 19 are preferably formed from minimal voice input, although various aspects of voice and other data may be required to form a satisfactory data set for a particular voice.

In one embodiment of the invention, it is necessary to capture an enabling portion of a human voice, for example from a small amount of analog or digital recording, or from real-time live input, of the voice to be templated. Indeed, a prescribed group of words may be devised to optimize data capture of the person's most relevant voice characteristics so as to enable accurate replication of the voice. Analysis means are contemplated for determining most efficiently what form of enabling portion is best for a particular person. Whether from a single data input or a series of inputs, the voice data is captured and stored in at least one portion of storage means 22.

Analysis of the voice data is performed in processor means 25 to identify characteristics useful in creating a template of the particular user's voice. It will be understood that the voice data may be routed directly to the processor means and need not first go to storage means 22. An exemplary description of the interaction among the processor means, storage means, and template means is provided below in connection with FIGS. 2 through 8. In one embodiment, after suitable voice data has been analyzed, the template of the voice is stored until called by processor means 25. For example, voice AA is captured and analyzed, its enabling portion is templated (hereinafter AA_t), and the template is then stored in storage means 22 (which may reside near the other components or may be located remotely or in distributed fashion at one or more locations) until a request occurs. One example of a request is a user of system 10 submitting a request, via representative input means 29, to use voice AA template AA_t in a newly created conversation in which voice AA participates as a generated voice rather than through actual live use of voice AA. This may occur in connection with, or using, one or more of various databases, a few of which are represented by situation database 33 and personal database 36. Once formed, voice AA template AA_t is called upon and serves as a shaping mechanism which, with certain other noise, creates a new conversational voice AA¹ that sounds exactly like the original voice AA of the original input data. Although new voice AA¹ sounds like original voice AA in every respect, it is in fact an artificially generated voice built with template AA_t, which provides a matching key to voice AA much like a genetic code. In this way, an enabling portion of an actual voice can encode system 10, by way of the template, to enable re-creation and unlimited use of the captured voice in virtually any manner the user requires. This is not merely a composite of bits of previous utterances of voice AA mixed together electronically by concatenative or formant techniques, but an entirely new voice designed, manufactured, and assembled or constructed using the voice data characteristics of voice AA (that is, the voice template or profile) and possibly other characteristics associated with the originator of voice AA.
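The capture, analysis, templating, storage, and request sequence described for voice AA can be sketched as a small data flow. This is only an illustrative outline: the class and method names are hypothetical, the "analysis" is a toy statistic rather than real voice characterization, and no audio is actually generated.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceTemplate:
    voice_id: str          # e.g. "AA"
    characteristics: dict  # analyzed voice characteristics (placeholder content)

    @property
    def label(self):
        return f"{self.voice_id}_t"  # e.g. "AA_t"


@dataclass
class TemplatingSystem:
    # Plays the role of storage means 22 (hypothetical in-memory stand-in).
    storage: dict = field(default_factory=dict)

    def capture(self, voice_id, samples):
        """Role of input or capture means 15: accept the enabling portion of a voice."""
        return {"voice_id": voice_id, "samples": list(samples)}

    def analyze(self, captured):
        """Role of processor means 25: derive characteristics (toy statistics only)."""
        s = captured["samples"]
        return {"mean": sum(s) / len(s), "peak": max(abs(x) for x in s)}

    def make_template(self, voice_id, samples):
        """Capture, analyze, template, then store the template until requested."""
        captured = self.capture(voice_id, samples)
        template = VoiceTemplate(voice_id, self.analyze(captured))
        self.storage[template.label] = template
        return template

    def request(self, voice_id, text):
        """Role of input means 29: ask for new speech in the templated voice."""
        template = self.storage[f"{voice_id}_t"]
        return {"voice": f"{voice_id}¹", "template": template.label, "text": text}


system = TemplatingSystem()
system.make_template("AA", [0.1, -0.4, 0.3])
reply = system.request("AA", "A sentence voice AA never actually spoke.")
```

The sketch keeps the template (AA_t) separate from the generated voice (AA¹), mirroring the distinction the description draws between the stored profile and the new voice built from it.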

Of course, the implications of this technology are vast, and it is recognized that safeguards are needed to maintain appropriate use of the templated voice technology. Indeed, the technology may require authentication means so that only authorized users can access and use the voice template technology and data. Means may further be needed for verifying whether a voice being heard is real or templated, to prevent deceptive or unauthorized use of generated voices. In addition to licenses, contracts, and other mechanisms already existing in most countries, legal mechanisms may need to be created to recognize technology of this scope.
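The source names the need for authentication and verification means but does not describe a mechanism. As one conventional building block, swapped in here purely for illustration, a keyed hash (HMAC) from the Python standard library could tag template data so that a holder of the secret key can later check that the data came from an authorized source and was not altered. The function names and the key handling are assumptions.

```python
import hashlib
import hmac


def sign_template(template_bytes, secret_key):
    """Return a hex tag binding the template data to the secret key."""
    return hmac.new(secret_key, template_bytes, hashlib.sha256).hexdigest()


def is_authentic(template_bytes, signature, secret_key):
    """Check the tag in constant time; False means altered data or a wrong key."""
    expected = sign_template(template_bytes, secret_key)
    return hmac.compare_digest(expected, signature)


key = b"example-key-for-illustration-only"  # hypothetical; never hard-code real keys
tag = sign_template(b"AA_t template data", key)
```

A tag of this kind addresses tampering and provenance only. Deciding whether a heard voice is real or templated would need additional means that the source leaves open.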

In FIG. 1, connection means 41 represent pathways for energy or data flow among the system components, which may be actual leads, optical channels, or other electronic, biological, or other activation pathways. In one embodiment, power means 44 are shown within system 10, although they may be remote if desired.

In another embodiment of system 10, a wholly or partially generated algorithm, signal, code means, or template may be returned to storage means 22, template means 19, or another system component or architecture for storage or refinement. This capability allows and facilitates improvement or adaptation of a particular voice template at the direction of a developer or other user. This may be accomplished, for example, where multiple data sets of the same human voice can be input over time, or where the voice originator undergoes changes of age, development, physiology, or temperament. Indeed, a template voice may be trained to recall the context of prior engagements and to incorporate that awareness in later operations. In such instances, it may be useful to retrieve voice AA¹ template (AA¹_t), select a refinement mode to refine the voice or template by comparison, and update it using analysis means 22 or input means 29. Another example involves locating a person having a voice BB that includes one or more voice characteristics similar to voice AA, the originator for voice template AA¹_t. In that case, it may be useful to input one or more similar characteristics from voice BB as limited inputs or as general refinement inputs to voice AA¹ or voice template AA¹_t. Voice BB may then also be retained, and a voice BB¹ and voice template BB¹_t may be created that may be useful later. Another example involves creating a database of variously refined voices for a single voice originator, to be selected by the system or user as useful or appropriate on demand depending on the situation presented. In yet another example, a service may be provided that supplies suitable refinement tools, such as naturally or artificially generated waveforms or other acoustic or signal elements, for refining voice templates according to voice match and the user's wishes.
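Refinement of a stored template with later data can be sketched as a weighted blend of characteristic values. The source does not specify any update rule, so the blending formula, the `weight` parameter, and the `pitch_hz` feature below are assumptions made for illustration.

```python
def refine_template(template, new_characteristics, weight=0.5):
    """Blend new measurements into an existing template without modifying it.

    weight is the share given to the new data: 0.0 keeps the old value,
    1.0 replaces it. Characteristics not yet in the template are added as-is.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    refined = dict(template)
    for name, value in new_characteristics.items():
        if name in refined:
            refined[name] = (1 - weight) * refined[name] + weight * value
        else:
            refined[name] = value
    return refined


aa_t = {"pitch_hz": 120.0}

# A later recording of the same voice AA, weighted equally with the stored data.
aa_t_later = refine_template(aa_t, {"pitch_hz": 130.0})

# A similar characteristic borrowed from voice BB, used only as a limited input.
aa_t_with_bb = refine_template(aa_t, {"pitch_hz": 140.0}, weight=0.25)
```

Returning a new dictionary instead of updating in place makes it easy to keep several refined variants for one originator, in the spirit of the database of refined voices mentioned above.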

Before describing further embodiments of system 10 or related systems and methods, it is useful to examine possible applications of the technology. In general, the applications are too numerous to list exhaustively. However, any use of voice-like noise generated from data supplied to, or resulting from, a template or coding tool for creating voice-like noise is considered to fall within the principles of the invention, particularly where the coding tool is used with other noise or sound generating means to reproduce, as needed, a voice sound virtually identical to the originator's actual voice. Use of the generated voice in entirely new sentences or other language structures also falls within the principles of the invention. The ability to provide machines, components, or computer-readable code means as a signal forming or transmitting part of the voice templating process or product facilitates use of the technology. Means for combining voice templating and voice generation technology with streaming or other forms of data, or for activating its use, permit virtual dialogue that may be not only informative or reactive but also adaptive and intelligent, with such dialogue or conversation conducted in voices of the user's choosing. It is also recognized that the technology described herein may be used with visual images as well as audible sound.

It is also contemplated that a voice template described herein may be generated using data that does not include an actual enabling portion of the originator's voice, with the enabling portion nonetheless used, alongside the other data, to check how accurately the originator's voice has been replicated. In this way, the enabling portion of a voice may be used in the voice template itself or, in other cases, only in checking the accuracy of the templated voice. A templated or replicated voice may also be used to interact with or prompt users of computers or other machines and systems. The user may select a template voice from the user's own template voice library or from another template voice source. Alternatively, the user may simply generate a new voice. For example, template voice AA¹ may be selected by the user for voicemail prompts, text reading, or another communication interface, while template voice CC may be selected for use in connection with interactive entertainment. Troubleshooting notices, problems lurking in the user's machine, or warning signals for the device user may be identified or dismissed by the user while operating with template voice DD. These are merely examples of how the technology enables improved user interfaces and lets the user associate template voices with functions, tasks, modes, or other features. Template selection and use, and generation and use of the resulting voice, may be accomplished within the user's machine or device, partly within it, or outside it. Such uses may be temporary uses of one or more devices, as in a hotel room, a visiting office, or another temporary scenario, while still providing the functions described above in the manner described above. For example, a traveler may wish to carry or access particular voices for the traveler's accessories on an aircraft or in a hotel room. The invention may also be useful in hospital rooms, hospice rooms, or other locations. Such uses are enabled by one or more embodiments herein. Interestingly, such a system may also be used by some individuals for their own voices and provided to others as a legacy. Various other uses are within the scope of this disclosure.
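
By way of illustration only, the association of template voices with device functions described above might be sketched as a simple lookup. The class name, method names, and function labels below are assumptions made for this sketch and do not appear in the specification; only the voice labels AA¹ (written "AA1"), CC, and DD are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceAssignments:
    """Hypothetical mapping from device functions to user-selected template voices."""

    default_voice: str
    by_function: dict = field(default_factory=dict)

    def assign(self, function: str, voice_id: str) -> None:
        """Associate a template voice with one function, task, or mode."""
        self.by_function[function] = voice_id

    def voice_for(self, function: str) -> str:
        """Return the voice chosen for a function, falling back to the default."""
        return self.by_function.get(function, self.default_voice)


assignments = VoiceAssignments(default_voice="AA1")
assignments.assign("voicemail_prompt", "AA1")
assignments.assign("interactive_entertainment", "CC")
assignments.assign("troubleshooting_alert", "DD")
```

A device would then consult `assignments.voice_for(...)` before generating speech for a given function; whether the lookup runs on the device, partly on it, or remotely is left open, as in the text.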

Other uses of the invention described herein include educational purposes, such as teaching children and others about historical events using a selected template voice. For example, if a parent wishes a child to learn about race relations in the United States in the 1960s in the voice of one of the child's deceased grandparents, the selected grandparent's template (if available) is designed, produced, and selected for use. System 10 accesses one or more databases to gather information and knowledge on the selected topic and provides that information to one or more databases within system 10, such as situation database 33, for use as needed. The grandparent's template voice EE¹ is used to access the desired information, and the request is fulfilled by template voice EE¹, beginning with a discussion of the selected topic when desired. The discussion may be saved for later use within system 10 or at a remote location, if desired. Alternatively, the discussion may be interactive between the "grandparent", i.e., the template voice, and the child. This capability is possible using a speech recognition module that knows the identity of the child's voice before the discussion and that includes neural recognition of the various question combinations likely to arise from the appropriate vocabulary and variations. A bridge is also provided from the input and speech recognition module to the template voice portion of system 10, enabling a response by the template. Various speech recognition tools can be envisioned for use in this manner when configured according to the novel uses described herein. Of course, such a configuration requires means for rapidly searching for the answer to a question and formulating a response suited to the listening child. Clearly, this example shows the extraordinary potential of the technology, particularly when combined with suitable data, system power, and system speed.

Alternatively, using an optional speech recognition module, the listener to the template voice may be able to stop or resume the generated voice, or may use only a limited set of functions in which particular commands enable particular other features. This is a limited form of interactive mode suited to some, but not all, types of use. Even if the user, instead of selecting the optional features, selects only the features configured for a story or discussion in the grandparent's voice, the effect and utility for other types of use are enormous.
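
The limited interactive mode described above might be sketched, purely as an illustration, as a small state transition driven by recognized command words. The command vocabulary ("pause", "continue") and all names below are assumptions; the specification states only that the listener can stop or continue the generated voice and that particular commands can enable particular other functions.

```python
from enum import Enum


class PlaybackState(Enum):
    PLAYING = "playing"
    PAUSED = "paused"


# Hypothetical command vocabulary for the limited interactive mode.
COMMANDS = {
    "pause": PlaybackState.PAUSED,
    "continue": PlaybackState.PLAYING,
}


def handle_command(state: PlaybackState, recognized_word: str) -> PlaybackState:
    """Return the next playback state; unrecognized words leave the state unchanged."""
    return COMMANDS.get(recognized_word.strip().lower(), state)
```

Ignoring unrecognized words keeps the mode deliberately limited: anything outside the small command set has no effect on the generated voice.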

If the user wishes to use only a template consistent with the education and life experiences of the voice's originator, this is possible through various filters or modifiers. For example, the template voice may be the selected grandparent's template voice described above (template voice EE¹), and a DATA DATES filter is applied with the selected date "BEFORE DECEMBER 1963" to the discussion of race relations in the United States in the 1960s. The result is a discussion that includes no information arising after the specified date. In this example, the "grandparent" could not discuss the nation's Voting Rights Act of 1965 or the urban riots of the late 1960s. In a similar manner, a plurality of different aspects of the data or of the template voice itself may be adjusted using, for example, the particular types of data shown in FIG. 4. Other adjustments are also possible and contemplated within the principles of the invention, and the examples above are recognized as merely representative of the capabilities of the technology.
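
As a minimal sketch of the DATA DATES filter described above, the following assumes that each piece of source information carries a date and that "BEFORE DECEMBER 1963" means strictly before 1 December 1963. The type `KnowledgeItem`, the function `filter_before`, and that reading of the cutoff are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class KnowledgeItem:
    """Hypothetical unit of source information available to the template voice."""

    topic: str
    occurred_on: date
    text: str


def filter_before(items, cutoff):
    """Keep only items dated strictly before the cutoff date."""
    return [item for item in items if item.occurred_on < cutoff]


items = [
    KnowledgeItem("race relations", date(1963, 8, 28), "March on Washington"),
    KnowledgeItem("race relations", date(1965, 8, 6), "Voting Rights Act signed"),
]

# Assumed reading of the filter value "BEFORE DECEMBER 1963".
visible = filter_before(items, date(1963, 12, 1))
```

With this cutoff only the 1963 item remains available, which matches the text's point that the "grandparent" could not discuss the Voting Rights Act of 1965.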

In another embodiment of the systems and methods described herein, the user may have the template voice of a deceased family member or another person read to the user. In this example, people of all ages can have books read to them in the voice of a family member, or another person known to the user, who is absent or deceased. When combined with various arrangements of suitably configured media and computer readable code means for implementing data links, this invention alone provides users with multiple advantages. This type of use applies broadly, well beyond the specific example given. Indeed, broader use of the technology in this manner makes useful a database of authorized, templated voices that others could access and use for a fee or other form of compensation. The technology has similarly profound implications when used for music, particularly if template voices of famous singers past and present can be accessed (the voices of multiple singers may be used for templates). Clearly, the technology enables new industries for manufacturing, leasing, or purchasing voice templates and the related means, techniques, and methods, or for conducting business with them.

The invention may also be useful in the medical treatment of certain psychological conditions, whether major or minor. Appropriate use of template voice therapy may considerably alleviate such a condition or even cure it. Yet another possible use of the technology is to create a newly designed voice that nevertheless has, as its basis or precursor, one or more template voices of actual mammalian origin. Ownership and other use of the newly created voice may be controlled by various means, such as licenses or royalties, or by legal enforcement. Of course, such voices may also be kept as private property for limited use by the creator. One can imagine the nature of the libraries that may be created. Such voices would represent the creative aspirations of the creator, yet each voice would have, as its basis, a component or strain of an actual mammalian voice, using a template tool or code that is analogous to a component of tissue DNA but applicable to a particular voice. This type of combination represents a powerful new communication capability and relationship based on voices and other sounds produced by mammals.

Systems according to the invention may be palm-sized or of other sizes. The systems may be embedded in other systems or may operate standalone. The systems and methods herein may have some or all of their elements on a distributed network or other associated remote system. They may use data that is downloadable or remotely accessible, and may be used to control various other systems, methods, or processes. Embodiments of the invention include exposed interface routines for requesting and implementing the methods and operations described herein, which may be executed in whole or in part by other operating or application systems. The template process and the use of template voices may be accomplished by mammals or by artificial machines or processes. For example, a bot or other intelligent assistant may generate or use one or more template voices of this type. Such an assistant may also be used to search for voices automatically according to certain general or restrictive criteria, and may then generate virtual or physical template voices from the voice factors. In this way, a large database of template voices may be generated efficiently. In this or similar system uses, it may be desirable to create data or other types of tags and identification techniques and apply them to one or more portions of the actual voice used in generating the template voice.

The following are examples of applications that use the techniques described herein. They are not limiting; they are offered as representative uses in addition to those enabled and suggested elsewhere in this specification.

Example 1

A template process using elements of the embodiments herein produces a voice coding signal containing the logical structure of the characteristics of a particular voice that are essential to accurately replicating the sound of that voice.

Example 2

A personal computer prompter and updater, status reporter, or mate that uses one or more selected voices by means of the technology herein.

Example 3

A home energy monitor, reporter, or mate that uses one or more selected voices by means of the technology herein.

Example 4

A hotel room assistant or automated assistant that prompts the user as desired, such as a hotel wake-up call in a voice selected by the user. In a similar manner, a vehicle operator may receive information in a voice selected by the user.

Example 5

Use of one or more selected voices, by means of the technology herein, in a personal digital assistant, handheld personal computing device, or other electronic device or component, at any time, for voice capture or as a mate, alerter, or the like.

Example 6

Creation or management of one or more selected voices or voice templates by computer or electronic chip logic, instructions, or code means for implementing the business and technical methods and articles of manufacture described herein.

Example 7

Use of voice template technology in combination with other visual media, such as photographs, digital video, or laser photographic images.

Example 8

Use of the technology herein by means of a flash-memory-based profile card that plugs into any device capable of recording, playing back, or reconstructing a voice.

Example 9

Use of the technology herein by a personal device that scans for and updates downloadable information for the user, as desired, in the voice or voices selected by the user. For example, this may be useful in organizing operations that can be performed by bots, such as an info-bot and interface that searches in the background while the user is unavailable and then reports status to the user in one or more designated voices using the technology herein.

Example 10

Use of the technology herein in combination with one or more components of a vehicle or other transportation system.

Example 11

Use of the technology herein by one or more components of an aircraft for in-flight guidance.

Example 12

Use of the technology herein as a safety signal when used by one or more components of workplace gear or equipment, such as a personal computer posture monitor, electrical equipment, or hazardous equipment.

Example 13

Use of the technology herein as an add-on to other voice-activated systems, such as dictation devices, prompters, guides, or text readers.

Example 14

Use of the technology herein as a social-awareness or control mechanism for road rage or other forms of anger and frustration, which may be activated by the driver, automatically, or by other means.

Example 15

Use of the technology herein as an educational tool at home, at school, or at work.

Example 16

Use of the technology herein for inspirational reading.

Example 17

Use of the technology herein as a tool that functions as a family history machine.

Example 18

Use of the technology herein as the MusicMatch (trademark) brand of technology for sourcing voices and matching the best or desired voice for singers.

Example 19

Use of the technology herein as the VoiceSelect (trademark) brand of film or video matching technology, which uses preferred voices to template an entertainment script already performed by the original performer or created for use in combination with voice template technology.

Example 20

Use of the technology herein as an "alter ego" device, such as a handheld unit that engages a "SelectVoice (trademark)" brand or "VoiceX (trademark)" brand mode or modes of operation and that, similarly to Example 7, has a database of images of the persons matching the voices as well as selectable anonymous models.

Example 21

Use of the technology herein to generate a profile of a profile or template voice.

Example 22

Use of the technology herein as a bedtime reader or night mate in a residence, for monitoring and interactive security.

FIG. 2 is a flow diagram of an embodiment of a voice capture subsystem, which may include computer readable code means or methods for capturing, analyzing, and using a voice AA designated for templating. FIG. 3 is an embodiment of a voice analysis system, which may include logic or method means for efficiently determining the routing of voice data characteristics. In these embodiments, voice AA is captured at acquisition module or step 103 and then routed through the template process by logic steps and data-conducting paths such as path 106. Capture may be accomplished by digital or analog methods and components. A signal representing the captured voice AA is routed through analysis means or method 111 to determine whether an existing voice profile or template matches voice AA. This may be accomplished, for example, by comparing one or more characteristics (shown in voice characteristic subsystem 113 of FIG. 4) determined by acquisition module 103 or analysis means 111 with known voice profiles or templates available for access, as in analysis step 111. A representative feedback and initial analysis loop 114 facilitates these steps, as does path 116. The comparison may include querying a voice profile database or other storage medium, locally or remotely. Analysis step 111 of the analysis module and voice characteristic subsystem 113 may be iterated according to algorithmic, statistical, or other techniques to determine whether the voice being analyzed relates to or matches an existing voice profile or data file. FIG. 4 shows voice characteristic subsystem 113 in more detail.
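
The comparison of a captured voice with known profiles at analysis step 111 might be sketched as follows. This is an illustration under stated assumptions only: the specification does not say how characteristics are represented or compared, so the numeric feature vectors, the use of cosine similarity, the 0.95 threshold, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_matching_profile(captured, known_profiles, threshold=0.95):
    """Return the id of the best profile at or above the threshold, else None.

    known_profiles maps a profile id to its stored feature vector. A result
    of None corresponds to routing the signal on for full characterization.
    """
    best_id, best_score = None, threshold
    for profile_id, features in known_profiles.items():
        score = cosine_similarity(captured, features)
        if score >= best_score:
            best_id, best_score = profile_id, score
    return best_id
```

The same function could be called against a local store or the result of a remote database query, since it only needs a mapping from profile ids to feature vectors.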

Referring again to FIG. 2, if the signal corresponding to voice AA does not match, and is not identified by, the set of existing voice profiles, the signal is routed to the voice characteristic subsystem for comprehensive characterization. If, however, an existing voice profile data file matches the profile signal of voice AA, template creation at module/step 127 may be unnecessary. In that situation, the signal may still be analyzed and/or characterized for possible creation of a revised profile or template, which may then itself be stored or applied. This may occur, for example, when additional characteristic data not previously available (the size of the enabling portion, the presence or absence of stress, or other factors) becomes available. A particular voice data file may therefore contain a plurality of templates. This is a validation process, with logic steps and system components shown generally in validation subsystem 133 of FIGS. 2 and 3. It is emphasized that the figures are generally schematic as to the relative positions of subsystems and components. As also shown in FIG. 3, after it is determined that a voice profile data file exists (step 137), validation logic optionally occurs at step 139. If revision of the existing template is warranted, the revision is created at step 142. Alternatively, logic step 145 indicates that no revision of the existing template is made. Following step 143 or step 145, either the newly revised voice profile or template or the previous voice profile or template is stored or used at step 155.
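
The validation flow above (create a template when none exists, revise it when newly available characteristic data warrants, otherwise keep it) might be sketched as follows. The dictionary-based store, the rule that "new data" means characteristic names absent from the latest template, and the function name are assumptions made for illustration; the step numbers in the comments are those cited in the text.

```python
def route_voice(voice_id, features, store):
    """Create, revise, or keep a template; return the action taken.

    store maps a voice id to a list of templates, since one voice data file
    may contain a plurality of templates. features maps characteristic names
    to values.
    """
    existing = store.get(voice_id)
    if existing is None:
        # No matching data file: full characterization and creation (step 127).
        store[voice_id] = [dict(features)]
        return "created"

    # A data file exists (step 137); validation logic (step 139) checks
    # whether previously unavailable characteristic data has arrived.
    latest = existing[-1]
    new_keys = set(features) - set(latest)
    if new_keys:
        revised = dict(latest)
        revised.update({key: features[key] for key in new_keys})
        existing.append(revised)  # revised template kept alongside the old one
        return "revised"

    return "unchanged"  # no revision made (step 145)
```

Whichever branch is taken, the last entry in `store[voice_id]` is the template that would be stored or used next, corresponding to step 155.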

Template creation module/step 127 of FIG. 2 includes using the voice characteristic subsystem to create a unique identifier, preferably a digital identifier, for the particular voice being templated or profiled. This data is analogous, in the abstract, to genetic codes, gene sequence codes, or bar codes, which are likewise identifiers of highly unique objects, entities, or phenomena. Accordingly, the applicants refer to the voice profile or template as "Voice DNA (trademark)" or "VDNA (trademark)" and "Voice Sequence Codes (trademark)" or "Voice Sequence Coding (trademark)", as well as "Voice Template Technology (trademark)". The terms "profile", "profiles", and "profiling", and their derivatives, may be replaced by these trademarks or by other reference terms for this new technology. Following completion of template creation, the voice template may be stored (shown at storage module or step 161) or applied in use (shown at module or step 164).
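
One way to picture a digital identifier derived from characteristic data is sketched below. The specification does not prescribe how the identifier is computed; hashing a canonical JSON encoding with SHA-256 is an assumption chosen only because it yields the same identifier for the same characteristic data regardless of ordering. The function name and the sample characteristic names are likewise hypothetical.

```python
import hashlib
import json


def voice_identifier(characteristics):
    """Derive a deterministic hex identifier from characteristic data.

    Assumption for illustration: the identifier is the SHA-256 digest of a
    canonical (sorted-key, compact) JSON encoding of the characteristics.
    """
    canonical = json.dumps(characteristics, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


identifier = voice_identifier({"language": "en", "pitch_hz": 118.0})
```

A digest of this kind only labels a template; it does not by itself carry the data needed to reconstruct the voice, which the template or profile signal must still supply.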

FIG. 4 is a schematic diagram of the voice characteristic subsystem. This description includes at least one embodiment of means for determining and characterizing the salient data used to define a voice through characteristic data and voice templating or profiling as described herein. As shown, various types of data may be used for comparison in formulating the characteristic data. The characteristic data is used to create a voice template or profile according to coding criteria. Although the data in FIG. 4 appears to be organized into discrete modules, an open comparison process may be preferable, in which any data may be accessed for comparison in any of a variety of sequences or weighted priorities. In any case, as shown in the figure, the data may include language, gender, dialect, region, or accent categories (shown as "voice characteristic" output signal VC₀ at module or step 201); frequency, pitch, tone, duration, or amplitude (shown as output signal VC₁ at module or step 203); age, health, pronunciation, vocabulary, or physiology (shown as output signal VC₂ at module or step 205); pattern, syntax, volume, transition, or voice time (shown as output signal VC₃ at module or step 207); education, career, experience, recurring phrases, or grammar (shown as output signal VC₄ at module or step 209); occupation, nationality, ethnicity, customs, or environment (shown as output signal VC₅ at module or step 211); context, inconsistency, rules/models, or enabling portion type, size, or number (shown as output signal VC₆ at module or step 213); speed, emotion, consonant combinations, similarity, or auditory model (shown as output signal VC₇ at module or step 215); mathematical model, processing model, signal model, sound-alike model, or shared model (shown as output signal VC₈ at module or step 217); vector model, adaptive data, classification, phonetics, or articulation (shown as output signal VC₉ at module or step 219); segment, syllable, combination, self-learning, or silence (shown as output signal VC₁₀ at module or step 221); packet, breath rate, timbre, resonance, or reproduction model (shown as VC₁₁ at module or step 223); harmony, synthesis model, syllable division, fidelity, or other characteristics (shown as output signal VC₁₂ at module or step 225); or various other techniques for uniquely identifying a portion of a voice (fractional or whole). The data may further include, for example, digital or analog voice signatures, modulation, synthesized input data, or other data formed or useful for this purpose, all shown as output signal VC_x at module or step 227.

It is recognized that one or more data types from any one or more of the modules or steps may contribute values to a voice template. Further, for purposes of the invention, VC_x includes any known categorization technique, whether or not mentioned herein, provided it is useful in defining a unique voice profile or template for a particular voice and is used according to the novel techniques herein. It is recognized that the combined data of the voice characteristic files and output signals VC₀, VC₁, VC₂, VC₃, VC₄, VC₅, VC₆, VC₇, VC₈, VC₉, VC₁₀, VC₁₁, VC₁₂, and VC_x may be prioritized and combined in various ways to analyze and characterize a voice accurately and efficiently, with VC_x representing still other techniques incorporated herein by reference.
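
As one illustration of prioritizing and combining characteristic outputs, the sketch below takes a weighted average of per-characteristic scores. The specification leaves the combination method open, so the assumption that each VC output can be reduced to a numeric score, the weighting scheme, and the function name are all hypothetical.

```python
def combine_characteristics(scores, weights, default_weight=1.0):
    """Weighted average of the characteristic scores that are present.

    scores maps output names such as "VC1" to numeric scores; weights maps
    the same names to priorities. Names without a weight use default_weight.
    """
    total_weight = sum(weights.get(name, default_weight) for name in scores)
    if total_weight == 0:
        raise ValueError("at least one characteristic needs a nonzero weight")
    weighted_sum = sum(
        value * weights.get(name, default_weight) for name, value in scores.items()
    )
    return weighted_sum / total_weight


# Pitch-related data (VC1) is given three times the priority of VC7 here.
combined = combine_characteristics({"VC1": 0.8, "VC7": 0.4}, {"VC1": 3.0, "VC7": 1.0})
```

Changing the weights changes which characteristics dominate the result, which is one simple reading of the "weighted priorities" mentioned in connection with FIG. 4.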

FIGS. 5 and 6 show an example signal bundler suitable for receiving various voice characteristic data, such as digital or coded data, that represents information considered relevant and forms the voice being templated. Signal bundler 316 combines the output of signal content module or step 332 with values or scores from one or more of signals VC₀ through VC_x, and at module or step 343 formats the signal or code for proper transmission to and use by various potential user interfaces, devices, or transmission means, producing an output voice template, code, or signal VT_x. It is recognized that various ways of creating a unique identifier to describe the various voice characteristics are possible, and that these possibilities fall, to some degree, within the broader context and scope of the invention regardless of the particular component methodology.
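
The bundling and formatting performed by signal bundler 316 might be sketched as follows. The choice of JSON as the output format, the record layout, and the function name are assumptions made for illustration; the specification says only that content output and VC values or scores are combined and then formatted for transmission and use.

```python
import json


def bundle_template(content, characteristics):
    """Combine content-module output with VC values and format one record.

    content stands in for the output of signal content module or step 332;
    characteristics maps names such as "VC0" to values or scores. The return
    value is a compact JSON string standing in for output VT_x.
    """
    record = {"content": content, "characteristics": dict(characteristics)}
    # Sorted keys make the formatted output independent of input ordering.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


vt_x = bundle_template({"enabling_portion": "sample-1"}, {"VC1": 0.8, "VC0": "en"})
```

A receiving device could recover the fields with `json.loads(vt_x)`; a different transport would simply swap the formatting step while leaving the combining step unchanged.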

Figure 7 shows a representative configuration and method for electronic inquiry and transfer between a voice template generation or storage device 404 and remote users. In this figure, enabling portions may be transmitted to the remote voice template generation or storage device 404 by any number of different users 410, 413, 416. Device 404 then generates or retrieves a voice template data file and generates or retrieves a voice template signal. The template signal is then transmitted or downloaded to the user or a designee, as shown at step 437. At download, or later following a user request 441, the template signal is formatted for suitable use by the destination device, including activation commands and protocols, as shown at step 457.
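The generate-or-retrieve flow of Figure 7 might be sketched as follows. This is a simplified illustration under stated assumptions: the class `TemplateDevice`, the function `analyze`, the identifiers such as "voice-JJ", and the shape of the formatted package are all hypothetical, and a hash digest merely stands in for the real analysis, which is not reproduced here.

```python
import hashlib
from dataclasses import dataclass, field


def analyze(enabling_portion):
    """Placeholder analysis: a digest stands in for real characteristic data."""
    return hashlib.sha256(enabling_portion).hexdigest()


@dataclass
class TemplateDevice:
    """Hypothetical stand-in for generation or storage device 404."""
    stored: dict = field(default_factory=dict)

    def generate_or_retrieve(self, voice_id, enabling_portion):
        """Return the stored template for voice_id, generating it on first request."""
        if voice_id not in self.stored:
            self.stored[voice_id] = analyze(enabling_portion)
        return self.stored[voice_id]

    def format_for_destination(self, voice_id, destination):
        """Wrap a stored template with activation data (role of step 457)."""
        return {
            "destination": destination,
            "activation": "enable-voice",  # assumed activation command
            "template": self.stored[voice_id],
        }


device = TemplateDevice()
first = device.generate_or_retrieve("voice-JJ", b"sample enabling portion")
again = device.generate_or_retrieve("voice-JJ", b"ignored on retrieval")
package = device.format_for_destination("voice-JJ", "hotel-room-card")
print(first == again, package["destination"])
```

The second request returns the stored template rather than generating a new one, mirroring the "generates or retrieves" alternative in the text.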

Figure 8 is a schematic diagram of a portable medium, such as a card, disk, or chip, carrying the components necessary to use voice template technology according to the user's mode and needs. For example, using Figures 7 and 8, a hotel room door card 477 may be provided when a traveler checks in at the hotel. In addition to the usual on-site security code programming and circuitry 479 applied to the card, however, additional functions incorporating aspects of the invention may usefully be achieved. The schematic of optional functions within such a card includes means 481 for receiving and using a voice template for the voice or voices the traveler has selected, for various purposes during the traveler's stay at the hotel. As shown, these functions may include template receiving and storage device 501, noise generator or generating circuit 506, central processing unit 511, input/output circuitry 515, digital-to-analog/analog-to-digital element 518, and clock means 521. Various other elements may be used, such as voice compression or expansion means as known in the cellular phone industry, or other components that enable the card to operate as desired. The user may thus enjoy dialogue or interfaces with inanimate devices in the hotel in the voice or voices the traveler selected. Indeed, the traveler profile may suitably retain such voice preference information, and certain additional advertising or benefits may arise from using the invention. It is recognized that the invention can be used in a wide range of applications and articles, and that the examples of Figures 8 and 9 are not limiting.
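How a stored template and a noise generator on such a card could cooperate might be sketched as follows. This is an illustration only: it assumes, purely for the sketch, that a template can be represented as a short list of filter coefficients applied to white noise (a simple FIR filter), which is a stand-in technique and not the method of this specification. The function names and the 8-bit output are likewise hypothetical.

```python
import random


def generate_noise(count, seed=0):
    """White noise samples in [-1, 1] (role of noise generator 506)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def apply_template(noise, coefficients):
    """Shape noise with template coefficients: a weighted sum of recent samples."""
    shaped = []
    for i in range(len(noise)):
        total = 0.0
        for k, coefficient in enumerate(coefficients):
            if i - k >= 0:
                total += coefficient * noise[i - k]
        shaped.append(total)
    return shaped


def to_dac_codes(samples, bits=8):
    """Quantize samples to unsigned integer codes (role of element 518)."""
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    top = (1 << bits) - 1
    return [round((s / peak + 1.0) / 2.0 * top) for s in samples]


template = [0.5, 0.3, 0.2]  # assumed contents of storage device 501
codes = to_dac_codes(apply_template(generate_noise(16), template))
print(codes)
```

The point of the sketch is the division of labor: the noise source carries no identity of its own, and whatever makes the output specific to one voice resides in the stored template.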

Figure 9 illustrates a photographic plaque 602 configured for interactive use of voice template technology, in which voice JJ is attributed to figure F_JJ and voice KK is attributed to figure F_KK. Means for interfacing the subjects or objects of the photograph (or other medium) with the appropriate voice templates, so as to recreate dialogue that may have occurred or could occur as the user wishes, are combined with frame 610 or other structure, whether those means are computer readable code means or simple three-dimensional material.

It is recognized that various means and methods exist for capturing, analyzing, and synthesizing real and artificial voice components. For example, the following U.S. patents, together with the references cited or listed in them, illustrate a few of the means for capturing, synthesizing, translating, recognizing, characterizing, or analyzing voices, and are incorporated herein by reference in their entirety for purposes of explanation: 4,493,050; 4,710,959; 5,930,755; 5,307,444; 5,890,117; 5,030,101; 4,257,304; 5,794,193; 5,774,837; 5,634,085; 5,704,007; 5,280,527; 5,465,290; 5,428,707; 5,231,670; 4,914,703; 4,083,729; 5,850,627; 5,765,132; 5,715,367; 4,829,578; 4,903,305; 4,805,218; 5,915,236; 5,920,836; 5,909,666; 5,920,837; 4,907,279; 5,859,913; 5,978,765; 5,475,796; 5,483,579; 4,122,742; 5,278,943; 4,833,718; 4,757,737; 4,754,485; 4,975,957; 4,912,768; 4,907,279; 4,888,806; 4,682,292; 4,415,767; 4,181,821; 3,932,070; and 4,884,972. None of the above references represents the contribution of the invention claimed or described herein. Rather, the patents listed above represent tools that may be useful, rather than necessary, in realizing one or more embodiments of the invention. It is therefore recognized that various systems, products, means, methods, processes, data formats, data-related storage and transmission media, data content, and other aspects are contemplated within the invention to achieve the novel and non-obvious innovations, advantages, products, and applications of the technology described herein. Accordingly, the foregoing description is to be regarded as illustrative rather than limiting, and the claims give the scope to which this pioneering technology is entitled, without limitation as to the development and usefulness of embodying the techniques.

Claims

1. A system for capturing an enabling portion of a particular voice sufficient for use as a template in subsequent use of the voice, comprising:

a. means for capturing an enabling portion of a voice in a form useful for analysis of voice characteristics;

b. analysis means for receiving and analyzing the captured voice and characterizing the elements of the captured voice as characteristic data;

c. storage means for receiving the characteristic data for the particular voice from the analysis means; and

d. retrieval means for retrieving the analysis and characteristic data for further use.

2. The system of claim 1,

wherein the voice capturing means comprises digital recording means.

3. The system of claim 1,

wherein the voice capturing means comprises a flash memory card.

4. The system of claim 1,

wherein the voice capturing means comprises analog recording means.

5. The system of claim 1,

wherein the voice capturing means comprises input means for receiving a live voice and transmitting the live voice to the analysis means.

6. The system of claim 1,

wherein the analysis means comprises digital data storage means.

7. The system of claim 1,

wherein the analysis means comprises means for identifying specific patterns, phrases, frequency, pitch, and tone of the voice in the captured voice data.

8. The system of claim 1,

wherein the analysis means comprises means for identifying a specific vocabulary, pronunciation, or accent that is unique to the captured voice.

9. The system of claim 1,

wherein the analysis means comprises means for identifying specific features inherent in the captured voice that derive predominantly from the specific anatomical structure of the speaker of the voice.

10. The system of claim 1,

wherein the analysis means comprises means for determining the vocabulary of the speaker of the captured voice.

11. The system of claim 10,

wherein the analysis means comprises means for establishing the vocabulary as characteristic data for use in forming a subsequent templated voice.

12. The system of claim 1,

wherein the analysis means comprises a digital processing device for digitally processing impulse data in the form of a voice, or a recorded voice represented in digital form.

13. The system of claim 1,

wherein the analysis means comprises second input means for receiving additional physiologically relevant data of the speaker of the voice.

14. The system of claim 13,

wherein the analysis means comprises digital signal processor means adapted to selectively receive audio or other data, including morphologically relevant visualization information of the speaker of the voice.

15. The system of claim 1,

wherein the analysis means comprises comparison means for comparing input voice data sets with stored data including age data, language data, education data, gender data, occupation data, accent data, nationality data, race data, voice type data, customs data, and environment data.

16. The system of claim 1,

wherein the analysis means comprises third input means for receiving data about the speaker of the voice, including age data, education data, gender data, occupation data, accent data, nationality data, race data, voice type data, customs data, and language data.

17. A method for generating voice-like noise that sounds the same as a particular human voice, comprising:

a. capturing an enabling portion of a particular human voice for storage and use;

b. storing the enabling portion of the particular human voice;

c. analyzing the enabling portion to identify the major components or characteristics of the captured voice; and

d. using the identified major components or characteristics to generate a new voice that sounds the same to a listener with normal auditory discrimination when data from one or more database means is designated to be voiced.

18. The method of claim 17,

wherein the analyzing step is performed on the captured enabling portion of the particular human voice with respect to at least one component selected from frequency, tone, pitch, volume, accent, gender, harmonic structure, hearing, and voice or timing accent.

19. The method of claim 18,

wherein capturing the enabling portion of the particular human voice for storage and use comprises capturing the laryngeal or turbulence-generated noise of the particular human voice.

20. A method for accurately reproducing a human voice, comprising:

a. identifying a minimum-size data set comprising a combination of words, notes, or phrases that must be uttered by the speaker of the voice to be reproduced;

b. capturing on a medium the utterance of the combination of words, notes, or phrases by the speaker of the voice to be reproduced; and

c. analyzing the captured utterances to identify voice characteristics of the speaker sufficient to enable artificial generation of a voice using the identified characteristics, wherein the artificially generated voice is substantially identical in all respects to the actual voice of the speaker when a listener with normal auditory discrimination hears the generated voice using predetermined language components not included in the captured utterances.

21. An article of manufacture, comprising:

a. a computer usable medium having computer readable program code means embodied therein for replicating a human voice,

the computer readable program code means in the article of manufacture comprising:

b. computer readable program code means for causing a computer to analyze a captured enabling portion of the voice of a speaker to identify voice characteristic data sufficient to enable artificial generation of the voice; and

c. computer readable program code means for causing the identified voice characteristic data to be used to artificially generate a voice, wherein the artificially generated voice is substantially identical in sound and usage to the actual voice of the speaker when a listener hears the generated voice using predetermined language components not included in the captured voice.

22. The article of manufacture of claim 21,

further comprising computer readable program code means for storing the generated voice for future use.

23. The article of manufacture of claim 21,

further comprising computer readable program code means for using the voice characteristic data to generate a voice profile of the speaker of the voice.

24. The article of manufacture of claim 21,

further comprising computer readable program code means for accessing database means that store data including age data, education data, gender data, occupation data, accent data, language data, nationality data, race data, voice type data, and customs data.

25. A computer program product for use with an auditory output device, comprising:

a. a computer usable medium having computer readable program code means embodied therein for replicating a human voice through the auditory output device;

c. computer readable program code means for using the identified voice characteristic data to artificially generate a voice and output it through the auditory output device, wherein the sound and usage of the artificially generated voice are substantially the same to a listener when the listener hears the generated voice using language components.

26. A computer program product for use with a display device, comprising:

a. a computer usable medium having computer readable program code means embodied therein for replicating a human voice and verifying, as displayed on the display device, the accuracy of the replicated voice;

d. computer readable program code means for causing a computer to analyze a captured enabling portion of the voice of a speaker to identify voice characteristic data sufficient to enable artificial generation of the voice; and

e. computer readable program code means for artificially generating a voice using the identified voice characteristic data and for comparing, on the display device, the characteristics of the speaker's voice with those of the generated voice, wherein the sound is substantially the same to a listener when the listener actually hears the generated voice using predetermined language components not included in the captured utterances of the actual voice, and when the display device so indicates.

27. A computer program product for use with an auditory output device, comprising:

a. a computer usable medium having computer readable program code means embodied therein for initiating replication of a human voice through the auditory output device;

b. computer readable program code means for causing a computer to receive and activate a voice characteristic data file unique to a particular voice and sufficient to enable artificial generation of the voice; and

c. computer readable program code means for causing the computer to use the identified voice characteristic data to artificially generate a voice and output it through the auditory output device, wherein the artificially generated voice is substantially the same in sound as the captured voice to a listener hearing the utterances.

28. A computer program product for use with an electronic device, comprising:

a. a computer usable medium having computer readable program code means embodied therein for initiating replication of a human voice;

b. computer readable program code means for receiving and activating a voice characteristic data file unique to a particular voice and sufficient to enable artificial generation of the voice; and

c. computer readable program code means for causing a computer to use the identified voice characteristic data file and the sound output of noise generating means to artificially generate a voice, wherein the artificially generated voice is substantially identical in sound to the actual voice of the speaker.

29. A memory for storing data for access by an application program executing on a data processing subsystem, comprising:

a. a data structure stored in the memory, the data structure including information resident in a database used by the application program,

the data structure comprising:

b. at least one voice enabling portion data file stored in the memory, each voice enabling portion data file set comprising information substantially different from that of any other voice enabling portion data file set;

c. a plurality of voice characteristic data files comprising different reference information for a plurality of voice characteristics; and

d. a plurality of voice profile sets, each having at least one voice profile data file containing data unique to that voice profile data file,

wherein the data structure permits access to the voice characteristic data files and the voice profile data files to perform a comparison operation with at least one voice enabling portion data file.
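By way of illustration only, the three kinds of data files recited above and the comparison operation might be laid out in memory as in the following sketch. Every name here is hypothetical: the class `VoiceDataStructure`, the characteristic names `pitch_hz` and `rate`, and the idea that the reference information is a per-characteristic tolerance are assumptions made for the sketch and form no part of the claim.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceDataStructure:
    """Hypothetical in-memory layout for the three kinds of data files."""
    # enabling portion file name -> {characteristic: measured value}
    enabling_portions: dict = field(default_factory=dict)
    # characteristic -> allowed deviation (assumed form of reference information)
    characteristic_refs: dict = field(default_factory=dict)
    # profile set name -> {profile file name: {characteristic: value}}
    profile_sets: dict = field(default_factory=dict)

    def compare(self, portion_name, set_name, profile_name):
        """Return the characteristics on which a portion matches a profile within tolerance."""
        measured = self.enabling_portions[portion_name]
        profile = self.profile_sets[set_name][profile_name]
        matches = []
        for name, tolerance in self.characteristic_refs.items():
            if name in measured and name in profile:
                if abs(measured[name] - profile[name]) <= tolerance:
                    matches.append(name)
        return matches


db = VoiceDataStructure(
    enabling_portions={"portion-1": {"pitch_hz": 120.0, "rate": 4.0}},
    characteristic_refs={"pitch_hz": 5.0, "rate": 0.5},
    profile_sets={"set-A": {"profile-1": {"pitch_hz": 118.0, "rate": 5.5}}},
)
print(db.compare("portion-1", "set-A", "profile-1"))
```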

30. A data processing system executing an application program and containing a database used by the application program, comprising:

a. CPU means for processing the application program; and

b. memory means for holding a data structure for access by the application program,

the data structure comprising information resident in a database used by the application program and including:

at least one voice enabling portion data file stored in the memory, each voice enabling portion data file set comprising information substantially different from that of any other voice enabling portion data file set;

a plurality of voice characteristic data files comprising different reference information for a plurality of voice characteristics; and

a plurality of voice profile sets, each having at least one voice profile data file containing data unique to that voice profile data file,

c. wherein the data processing system permits access to the voice characteristic data files and the voice profile data files to perform a comparison operation with at least one voice enabling portion data file.

31. A computer data signal embodied in a transmission medium, comprising:

a. an encrypted source code for a unique voice profile template useful for keying additional electronic noise to generate a specific generated voice; and

b. a carrier medium suitable for delivering the encrypted source code to a location, the encrypted source code being configured to be removable from the carrier medium so that it can be applied as a key for generating the generated voice.

32. A method of using a selected voice as a personal voice assistant with an electronic device, comprising:

a. activating electronic means for accessing a remote database;

b. transmitting a signal portion to the remote database, which has a voice database comprising a plurality of voice profile sets, each having at least one voice profile data file containing data unique to that voice profile data file and identifiable by a unique identifier;

c. transmitting a signal portion to the remote database to uniquely identify a desired data file and thereafter transfer the data file content to a user-designated electronic device location; and

d. implementing use of the selected and transferred data file as a voice template in combination with suitable noise generated by the electronic device or by other noise generating means, so that the sound of the selected voice can be received from the electronic device.

33. The method of claim 32,

wherein the data file comprises data characteristic of the selected voice, configured as computer readable program code means for causing identified voice characteristic data to be used to artificially generate a voice template.

34. The method of claim 32,

wherein the implementing step comprises applying authentication means that allow only authenticated users to access and use the voice template technology and data.

35. The method of claim 32,

wherein the implementing step comprises applying selectively accessible verification means to verify whether an audible voice is real or a generated template.

36. A method of doing business using a system for capturing an enabling portion of a particular voice sufficient for use as a template in subsequent use of the voice, comprising:

a. capturing an enabling portion of a voice in a form useful for analysis of voice characteristics;

b. inputting the enabling portion into an analysis module to characterize the elements of the captured voice as characteristic data;

c. receiving the characteristic data for the particular voice from the analysis module; and

d. storing the characteristic data for later use.

37. The method of claim 36,

wherein the voice capturing means comprises digital input means.

38. The method of claim 36,

wherein the enabling portion of the voice is received electronically.

39. The method of claim 36,

wherein the characteristic data is bundled to form a voice template signal useful for combining with generated noise to produce a templated voice that sounds similar to the original particular voice.

40. The method of claim 36,

wherein the templated voice is controlled by received voice input commands so as to produce, in the templated voice, new words that were not input by the particular voice.

41. An automated machine for capturing an enabling portion of a particular voice and using the enabling portion as a template useful in subsequent use of the templated voice, comprising:

a. a capture module for obtaining an enabling portion of a voice in a form useful for analysis of voice characteristics;

b. an analysis module for receiving and analyzing the captured voice and characterizing the elements of the captured voice as characteristic data; and

c. a template generation module for automatically generating a voice template signal as a unique identifier of the captured particular voice.

42. The automated machine of claim 41,

further comprising communication means for communicating with storage means for receiving characteristic data from a database.

43. The automated machine of claim 41,

further comprising communication means for communicating with storage means for storing the generated template until requested.

44. An on-line method for creating a voice template and generating revenue for such creation, comprising:

a. capturing an enabling portion of a particular voice;

b. analyzing the enabling portion of the particular voice to generate a data profile that defines the characteristics of the captured voice in a manner that can be reconstructed for future use;

c. generating a voice template signal as a unique identifier of the captured particular voice; and

d. providing at least one generated data profile for commercial use by another person.

45. A method of operating a machine for creating a voice template and generating revenue for such creation, comprising:

a. capturing an enabling portion of a particular voice;

c. generating a voice template signal, using the data profile, as a unique identifier of the captured particular voice; and

d. providing at least one voice template signal for commercial use.

46. A business method for creating a voice template, comprising:

a. capturing an enabling portion of a particular voice or templated voice;

b. analyzing the enabling portion of the voice, using computer means, to generate a data profile that defines the characteristics of the captured voice in a manner that can be reconstructed for future use;

c. electronically generating or retrieving a voice template signal as a unique identifier of the captured particular voice; and

d. providing at least one voice template signal for commercial use.

47. The method of claim 46,

wherein the providing step is accomplished by an exchange of electronic data.

48. A method for generating a voice template from a plurality of voices, comprising:

a. capturing enabling portions of a plurality of particular voices or templated voices;

b. analyzing the enabling portions of the voices, using computer means, to generate a data profile that defines the characteristics of the captured voices in a manner that can be bundled as a single voice signal suitable for reconstruction for future use; and

c. electronically generating a voice template signal as a unique identifier of the newly generated voice.