KR20200011796A

KR20200011796A - Voice recognition system

Info

Publication number: KR20200011796A
Application number: KR1020180086695A
Authority: KR
Inventors: 김재홍; 이태호; 정한길
Original assignee: 엘지전자 주식회사
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2020-02-04
Also published as: WO2020022545A1; DE112018007850B4; US20210287665A1; CN112437956B; DE112018007850T5; CN112437956A

Abstract

Disclosed is a voice recognition system capable of learning a voice of a user. According to an embodiment of the present invention, the voice recognition system comprises: a voice recognition agent receiving voice data from a user and transmitting the voice data to an artificial intelligence server; and an artificial intelligence server inputting the voice data into a voice recognition model, transmitting a recognition result for the voice data to the voice recognition agent, and learning the voice data. The voice recognition agent requests the user for additional data to learn the voice of the user when the voice recognition rate for the voice data is lower than a previously set criterion.

Description

Voice Recognition System

The present invention relates to a voice recognition system in which the user participates directly in training a voice recognition model, so that voice data or text is obtained from the user and the user's voice is learned from the obtained data.

Artificial intelligence is a field of computer science and information technology that studies how to enable computers to perform the thinking, learning, and self-development that human intelligence is capable of. It refers to enabling computers to imitate intelligent human behavior.

In addition, artificial intelligence does not exist in isolation but is related, directly and indirectly, to many other fields of computer science. In recent years in particular, there have been very active attempts to introduce artificial intelligence elements into various fields of information technology and to use them to solve problems in those fields.

Meanwhile, context awareness technology, which uses artificial intelligence to recognize the user's situation and provide the information the user wants in the desired form, has been actively studied.

As the context awareness technology described above advances, demand is growing for systems that can perform functions suited to the user's situation.

Meanwhile, a growing number of voice recognition systems combine recognition of the user's voice with context awareness technology to provide the user with various operations and functions through voice recognition.

Voice recognition means converting a voice signal into a character string, or identifying its linguistic meaning, by analyzing the voice signal and matching it against a database of patterns.

In voice recognition technology, a voice recognition model analyzes the input voice data, extracts features, measures their similarity to a previously collected voice model database, and converts the most similar entry into text or a command.
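The matching step described above can be sketched as a nearest-pattern lookup. This is only an illustration under stated assumptions: the feature vectors, the labels, and the function name `recognize` are hypothetical, and Euclidean distance stands in for whatever similarity measure an actual recognizer uses.

```python
import math
from typing import Dict, Sequence


def recognize(features: Sequence[float],
              reference_patterns: Dict[str, Sequence[float]]) -> str:
    """Return the label whose reference pattern is closest to the input features.

    A smaller Euclidean distance is treated as a higher similarity (assumption).
    """
    return min(
        reference_patterns,
        key=lambda label: math.dist(features, reference_patterns[label]),
    )


# Hypothetical two-dimensional feature vectors for two commands.
patterns = {"turn on": [1.0, 0.0], "turn off": [0.0, 1.0]}
best = recognize([0.9, 0.2], patterns)
```

Real systems extract far richer features and use statistical or neural models, so the sketch only mirrors the "measure similarity, pick the most similar" structure of the description.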

Voice recognition is a kind of pattern recognition process. Because voice, pronunciation, and intonation differ from person to person, conventional voice recognition technology collected voice data from as many people as possible, extracted common characteristics from it, and generated reference patterns.

However, because the learning model behind these reference patterns is built from training data created in a laboratory environment, the patterns are not optimized for the voice or tone of an actual user.

Therefore, additional adaptive learning is needed so that the voice recognition model is personalized to the user who actually uses the voice recognition device.

Accordingly, the present invention proposes a method for improving the accuracy and efficiency of adaptive learning.

An object of the present invention is to provide a voice recognition system in which the user participates directly in training a voice recognition model, so that voice data or text data is obtained from the user and the user's voice is learned from the obtained data.

A voice recognition system according to an embodiment of the present invention includes a voice recognition agent that receives voice data from a user and transmits the voice data to an artificial intelligence server, and an artificial intelligence server that inputs the voice data into a voice recognition model, transmits a recognition result for the voice data to the voice recognition agent, and learns the voice data. When the voice recognition rate for the voice data is lower than a preset criterion, the voice recognition agent requests from the user additional data for learning the user's voice.
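The agent/server exchange above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names, the `recognition_rate` field as a score between 0 and 1, and the threshold value of 0.8 are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RecognitionResult:
    text: str
    recognition_rate: float  # assumed to be a score in [0, 1]


@dataclass
class AIServer:
    # Hypothetical recognition model: maps raw voice data to a result.
    model: Callable[[bytes], RecognitionResult]
    learned: List[bytes] = field(default_factory=list)

    def recognize(self, voice_data: bytes) -> RecognitionResult:
        result = self.model(voice_data)
        self.learned.append(voice_data)  # stand-in for "learning the voice data"
        return result


@dataclass
class VoiceRecognitionAgent:
    server: AIServer
    threshold: float = 0.8  # hypothetical preset criterion

    def handle(self, voice_data: bytes) -> Tuple[RecognitionResult, bool]:
        """Forward voice data to the server and decide whether to ask for more data."""
        result = self.server.recognize(voice_data)
        request_additional_data = result.recognition_rate < self.threshold
        return result, request_additional_data


# Usage with a stub model that always reports a low recognition rate.
server = AIServer(model=lambda v: RecognitionResult("turn on the light", 0.55))
agent = VoiceRecognitionAgent(server)
result, ask_user = agent.handle(b"\x00\x01")
```

Here `ask_user` being true corresponds to the agent requesting additional data from the user; how that request is presented is left open.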

In this case, the voice recognition agent may provide a specific sentence to the user and, when second voice data corresponding to the specific sentence is received, transmit the second voice data to the artificial intelligence server, and the artificial intelligence server may learn the second voice data corresponding to the specific sentence.

In this case, the artificial intelligence server may transmit to the voice recognition agent, based on the characteristics of the voice data, the specific sentence among a plurality of sentences that corresponds to those characteristics.

In this case, the plurality of sentences may be classified into categories including at least one of product function, country, region, age, dialect, gender, and loanwords, and the artificial intelligence server may, based on the characteristics of the voice data, transmit to the voice recognition agent the specific sentence belonging to the category, among the plurality of categories, in which the user requires additional learning.
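One way to picture the category-based selection is shown below. The per-category recognition rates, the threshold, the example sentences, and the rule of choosing the weakest category are all assumptions made for illustration; the source only states that the sentence comes from a category in which additional learning is required.

```python
from typing import Dict, List, Optional, Tuple


def pick_training_sentence(
    category_rates: Dict[str, float],
    sentences_by_category: Dict[str, List[str]],
    threshold: float = 0.8,  # hypothetical criterion for "needs additional learning"
) -> Optional[Tuple[str, str]]:
    """Return (category, sentence) for the weakest category below the threshold."""
    candidates = [
        category
        for category, rate in category_rates.items()
        if rate < threshold and sentences_by_category.get(category)
    ]
    if not candidates:
        return None  # no category needs additional learning
    weakest = min(candidates, key=lambda category: category_rates[category])
    return weakest, sentences_by_category[weakest][0]


# Hypothetical per-category recognition rates for one user.
rates = {"product function": 0.92, "dialect": 0.61, "loanwords": 0.45}
sentences = {
    "product function": ["Turn on the air purifier"],
    "dialect": ["Set the temperature a little lower"],
    "loanwords": ["Play the jazz playlist"],
}
choice = pick_training_sentence(rates, sentences)
```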

Meanwhile, the specific sentence may include a command corresponding to a function of the voice recognition agent.

Meanwhile, the voice recognition system may further include a mobile terminal, the voice recognition agent may transmit the specific sentence to the user's mobile terminal, and the mobile terminal may display text corresponding to the specific sentence.

Meanwhile, when the voice recognition rate is lower than the preset criterion, the voice recognition agent may request the user to input text corresponding to the voice data.

In this case, the artificial intelligence server may store the voice data, the voice recognition agent may, when text corresponding to the voice data is input, transmit that text to the artificial intelligence server, and the artificial intelligence server may learn the stored voice data corresponding to the text.

In this case, the artificial intelligence server may convert the text into voice data, determine the stored voice data to be valid data based on the similarity between the converted voice data and the stored voice data, and learn the voice data determined to be valid data.
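The validity check above can be sketched as a similarity test between two feature vectors. The use of cosine similarity, the 0.7 cutoff, and the idea that both the stored voice data and the text-to-speech output have already been reduced to feature vectors are assumptions for this example; the source does not specify the similarity measure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_valid_sample(
    stored_features: Sequence[float],
    synthesized_features: Sequence[float],
    min_similarity: float = 0.7,  # hypothetical cutoff
) -> bool:
    """Treat stored voice data as valid if it resembles speech synthesized from the text."""
    return cosine_similarity(stored_features, synthesized_features) >= min_similarity


# Stored utterance features vs. features of speech synthesized from the user's text.
keep = is_valid_sample([1.0, 0.0], [0.9, 0.1])
discard = not is_valid_sample([1.0, 0.0], [0.0, 1.0])
```

The check guards against learning from a mismatched pair, for example when the text the user typed does not actually correspond to the stored utterance.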

Meanwhile, the voice recognition system may further include a mobile terminal that receives input of text corresponding to the voice data and transmits the text corresponding to the voice data to the voice recognition agent.

Meanwhile, when the user inputs specific text and third voice data corresponding to the specific text, the voice recognition agent may transmit the specific text and the third voice data to the artificial intelligence server, and the artificial intelligence server may learn the third voice data corresponding to the specific text.

Meanwhile, the voice recognition agent may provide a first option of repeating a presented voice, a second option of reading a presented sentence aloud, and a third option of writing a sentence directly and reading it aloud, and may request the additional data using whichever of the first to third options has the highest voice recognition rate.
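Selecting among the three options can be sketched as a simple maximum over per-option recognition rates. The enum names and the rate values are hypothetical; the source does not say how the per-option rates are measured.

```python
from enum import Enum
from typing import Dict


class Option(Enum):
    REPEAT_PRESENTED_VOICE = 1     # first option: repeat a presented voice
    READ_PRESENTED_SENTENCE = 2    # second option: read a presented sentence aloud
    WRITE_AND_READ_SENTENCE = 3    # third option: write a sentence and read it aloud


def choose_option(recognition_rates: Dict[Option, float]) -> Option:
    """Return the option with the highest recognition rate."""
    return max(recognition_rates, key=lambda option: recognition_rates[option])


# Hypothetical recognition rates observed for each option.
rates_by_option = {
    Option.REPEAT_PRESENTED_VOICE: 0.70,
    Option.READ_PRESENTED_SENTENCE: 0.85,
    Option.WRITE_AND_READ_SENTENCE: 0.60,
}
selected = choose_option(rates_by_option)
```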

Meanwhile, the artificial intelligence server may learn the additional data and transmit to the voice recognition agent the voice recognition rate as changed by learning the additional data.

Meanwhile, a voice recognition apparatus according to an embodiment of the present invention includes an input unit that receives voice data from a user, and an artificial intelligence unit that inputs the voice data into a voice recognition model, obtains a recognition result for the voice data, and learns the voice data. When the voice recognition rate for the voice data is lower than a preset criterion, the artificial intelligence unit may request from the user additional data for learning the user's voice.

Unlike the conventional approach of passively collecting and learning the user's voice, the present invention presents sentences that best reveal the user's speech habits and requests voice input for them, or asks the user to enter directly, as text, the sentence the user has spoken. According to the present invention, learning performance can therefore be greatly improved and rapid personalization becomes possible.

FIG. 1 is a diagram illustrating a voice recognition system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a voice recognition agent related to the present invention.
FIG. 3 is a block diagram showing the configuration of the artificial intelligence server 200 according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a problem that may occur in a voice recognition system.
FIG. 5 is a diagram illustrating a method of requesting additional data for additional learning from a user according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an operating method when option 1 or option 2 is selected according to an embodiment of the present invention.
FIG. 7 is a diagram showing the word-level recognition rate of an uttered sentence.
FIG. 8 is a diagram illustrating the operation when option 1 is selected.
FIG. 9 is a diagram illustrating the operation when option 2 is selected.
FIG. 10 is a diagram illustrating the operation when option 3 is selected.
FIG. 11 is a diagram illustrating a method of requesting additional data for additional learning from a user according to another embodiment of the present invention.
FIG. 12 is a diagram illustrating the operation when text input is requested.
FIG. 13 is a diagram illustrating the operation of a voice recognition system according to an embodiment of the present invention.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not in themselves have distinct meanings or roles. In describing the embodiments disclosed herein, detailed descriptions of related known technology are omitted where they could obscure the gist of the embodiments. The accompanying drawings are provided only to facilitate understanding of the embodiments disclosed herein; they do not limit the technical idea disclosed in this specification, which should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The mobile terminal described in this specification may include a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, and a wearable device (for example, a watch-type terminal (smartwatch), a glasses-type terminal (smart glass), or a head mounted display (HMD)).

FIG. 1 is a diagram illustrating a voice recognition system according to an embodiment of the present invention.

The voice recognition system 10 according to an embodiment of the present invention may include a voice recognition agent 100, an artificial intelligence server 200, and a mobile terminal 300.

The voice recognition agent 100 may communicate with the artificial intelligence server 200. Specifically, the voice recognition agent 100 may provide an interface for connecting the voice recognition agent 100 to a wired/wireless network including the Internet. The voice recognition agent 100 may then transmit data to or receive data from the server through the connected network or another network linked to the connected network.

The voice recognition agent 100 may also communicate with the mobile terminal 300. Specifically, the voice recognition agent 100 may provide an interface for connecting the voice recognition agent 100 to a wired/wireless network including the Internet. The voice recognition agent 100 may then transmit data to or receive data from the mobile terminal 300 through the connected network or another network linked to the connected network.

In addition, the voice recognition agent 100 may communicate with the mobile terminal 300 through the short-range communication described with reference to FIG. 2.

Meanwhile, the voice recognition agent 100 may learn voice data, or perform a function corresponding to voice data, in various ways.

For example, the voice recognition model may be installed on the artificial intelligence server 200. When the voice recognition agent 100 receives voice data and transmits it to the artificial intelligence server 200, the artificial intelligence server 200 learns the voice data or outputs a recognition result for the voice data and transmits it to the voice recognition agent 100, and the voice recognition agent 100 generates a control command corresponding to the recognition result and performs control.

As another example, the voice recognition model may be installed on the artificial intelligence server 200. When the voice recognition agent 100 receives voice data and transmits it to the artificial intelligence server 200, the artificial intelligence server 200 learns the voice data or outputs a recognition result for the voice data, and transmits a control command corresponding to the recognition result to the voice recognition agent 100.

As another example, the recognition model may be installed on the voice recognition agent 100. The voice recognition agent 100 receives voice data and learns it, or outputs a recognition result for the voice data and transmits the result to the artificial intelligence server 200, and the artificial intelligence server 200 transmits a control command corresponding to the recognition result to the voice recognition agent 100.

The voice recognition agent 100 may also perform artificial intelligence functions on its own, independently of the artificial intelligence server 200.

For example, the voice recognition model may be installed on the voice recognition agent 100, and the voice recognition agent 100 may receive voice data, learn the voice data or output a recognition result for it, and generate a control command corresponding to the recognition result.
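The four arrangements described above differ in where the recognition model runs and which side produces the control command. The summary below is a hedged sketch: the `Deployment` type, its field names, and the string values are illustrative labels that do not appear in the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Deployment:
    model_host: str      # where the voice recognition model is installed
    command_source: str  # which side generates the control command


# The four arrangements, in the order they are described in the text.
DEPLOYMENTS: List[Deployment] = [
    Deployment(model_host="server", command_source="agent"),
    Deployment(model_host="server", command_source="server"),
    Deployment(model_host="agent", command_source="server"),
    Deployment(model_host="agent", command_source="agent"),  # standalone agent
]


def needs_server(deployment: Deployment) -> bool:
    """True if the arrangement depends on the artificial intelligence server."""
    return "server" in (deployment.model_host, deployment.command_source)


server_dependency = [needs_server(d) for d in DEPLOYMENTS]
```

Only the last arrangement runs without the server, which matches the statement that the agent may perform artificial intelligence functions independently.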

FIG. 2 is a block diagram illustrating a voice recognition agent related to the present invention.

The voice recognition agent 100 may include a wireless communication unit 110, an input unit 120, an artificial intelligence unit 130, a sensing unit 140, an output unit 150, an interface unit 160, a memory 170, a controller 180, a power supply unit 190, and the like.

The components shown in FIG. 2 are not essential for implementing a voice recognition agent, so the voice recognition agent described in this specification may have more or fewer components than those listed above.

More specifically, among these components, the wireless communication unit 110 may include one or more modules that enable wireless communication between the voice recognition agent 100 and a wireless communication system, between the voice recognition agent 100 and another voice recognition agent 100, or between the voice recognition agent 100 and an external server. The wireless communication unit 110 may also include one or more modules that connect the voice recognition agent 100 to one or more networks.

The wireless communication unit 110 may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless Internet module 113, a short-range communication module 114, and a location information module 115.

The input unit 120 may include a camera 121 or image input unit for inputting an image signal, a microphone 122 or audio input unit for inputting an audio signal, and a user input unit 123 (for example, a touch key or a mechanical push key) for receiving information from the user. Voice data or image data collected by the input unit 120 may be analyzed and processed as a control command of the user.

The artificial intelligence unit 130 processes information based on artificial intelligence technology and may include one or more modules that perform at least one of learning information, inferring information, perceiving information, and processing natural language.

Using machine learning technology, the artificial intelligence unit 130 may perform at least one of learning, inference, and processing on a vast amount of information (big data), such as information stored in the voice recognition agent, environmental information around the voice recognition agent, and information stored in an external storage with which it can communicate. Using the information learned through machine learning, the artificial intelligence unit 130 may predict (or infer) at least one executable operation of the voice recognition agent and control the voice recognition agent so that the most feasible of the predicted operations is executed.

Machine learning technology collects and learns large amounts of information based on at least one algorithm, and judges and predicts information based on what has been learned. Learning information means identifying the characteristics, rules, and judgment criteria of the information, quantifying the relationships between pieces of information, and predicting new data using the quantified patterns.

The algorithms used by machine learning technology may be statistics-based algorithms, for example: a decision tree, which uses a tree structure as a predictive model; an artificial neural network, which imitates the structure and function of biological neural networks; genetic programming, which is based on biological evolution; clustering, which distributes observed examples into subsets called clusters; and the Monte Carlo method, which computes function values probabilistically from randomly drawn numbers.

Deep learning, a branch of machine learning, is a technology that performs at least one of learning, judging, and processing information using artificial neural network algorithms. An artificial neural network may have a structure that connects layers to one another and passes data between them. Deep learning can learn a vast amount of information through an artificial neural network by using a graphics processing unit (GPU) optimized for parallel computation.

Meanwhile, to gather the vast amount of information needed to apply machine learning, the artificial intelligence unit 130 may collect (sense, monitor, extract, detect, receive) signals, data, and information that are input to or output from the components of the voice recognition agent. The artificial intelligence unit 130 may also collect (sense, monitor, extract, detect, receive) data and information stored in an external storage (for example, a cloud server) connected through communication. More specifically, collecting information may be understood as a term that covers sensing information through a sensor, extracting information stored in the memory 170, and receiving information from an external storage through communication.

Through the sensing unit 140, the artificial intelligence unit 130 may sense information within the voice recognition agent, information about the environment surrounding the voice recognition agent, and user information. Through the wireless communication unit 110, the artificial intelligence unit 130 may receive broadcast signals and/or broadcast-related information, wireless signals, wireless data, and the like. The artificial intelligence unit 130 may also receive, from the input unit, image information (or signals), audio information (or signals), data, or information entered by the user.

The artificial intelligence unit 130 may collect a large amount of information in real time in the background, learn from it, and store the information processed into an appropriate form (for example, a knowledge graph, a command policy, a personalization database, a conversation engine, and the like) in the memory 170.

When an operation of the voice recognition agent is predicted based on the information learned using machine learning technology, the artificial intelligence unit 130 may control the components of the voice recognition agent to execute the predicted operation, or may transmit a control command for executing the predicted operation to the controller 180. The controller 180 may execute the predicted operation by controlling the voice recognition agent based on the control command.

Meanwhile, when a specific operation is performed, the artificial intelligence unit 130 may use machine learning technology to analyze history information indicating that the operation was performed, and may update the previously learned information based on the result of that analysis. In this way, the artificial intelligence unit 130 may improve the accuracy of its predictions.

Meanwhile, in the present specification, the artificial intelligence unit 130 and the controller 180 may be understood as the same component. In this case, a function described herein as performed by the controller 180 may be described as performed by the artificial intelligence unit 130, and the controller 180 may be referred to as the artificial intelligence unit 130 or, conversely, the artificial intelligence unit 130 may be referred to as the controller 180.

Alternatively, in the present specification, the artificial intelligence unit 130 and the controller 180 may be understood as separate components. In this case, the artificial intelligence unit 130 and the controller 180 may perform various controls on the voice recognition agent by exchanging data with each other. Based on the result derived by the artificial intelligence unit 130, the controller 180 may perform at least one function on the voice recognition agent or control at least one of its components. Furthermore, the artificial intelligence unit 130 may itself operate under the control of the controller 180.

The sensing unit 140 may include one or more sensors for sensing at least one of information in the voice recognition agent, information on the environment surrounding the voice recognition agent, and user information.

For example, the sensing unit 140 may include at least one of a proximity sensor 141, an illumination sensor 142, a touch sensor, an acceleration sensor, a magnetic sensor, a gravity sensor (G-sensor), a gyroscope sensor, a motion sensor, an RGB sensor, an infrared (IR) sensor, a fingerprint (finger scan) sensor, an ultrasonic sensor, an optical sensor (for example, a camera; see 121), a microphone (see 122), a battery gauge, an environmental sensor (for example, a barometer, a hygrometer, a thermometer, a radiation sensor, a heat sensor, or a gas sensor), and a chemical sensor (for example, an electronic nose, a healthcare sensor, or a biometric sensor). Meanwhile, the voice recognition agent disclosed in the present specification may combine and use information sensed by two or more of these sensors.

The output unit 150 generates output related to sight, hearing, or touch, and may include at least one of a display unit 151, an audio output unit 152, a haptic module 153, and an optical output unit 154. The display unit 151 may form a layered structure with a touch sensor, or be formed integrally with it, to implement a touch screen. Such a touch screen may function as the user input unit 123, which provides an input interface between the voice recognition agent 100 and the user, and at the same time provide an output interface between the voice recognition agent 100 and the user.

The interface unit 160 serves as a path to the various types of external devices connected to the voice recognition agent 100. The interface unit 160 may include at least one of a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, a port for connecting a device equipped with an identification module, an audio input/output (I/O) port, a video I/O port, and an earphone port. When an external device is connected to the interface unit 160, the voice recognition agent 100 may perform appropriate control related to the connected external device.

The memory 170 stores data supporting the various functions of the voice recognition agent 100. The memory 170 may store a plurality of application programs (or applications) that run on the voice recognition agent 100, data and instructions for the operation of the voice recognition agent 100, and data for the operation of the artificial intelligence unit 130 (for example, information on at least one algorithm for machine learning). At least some of these application programs may be downloaded from an external server through wireless communication. At least some of them may also be present on the voice recognition agent 100 from the time of shipment to provide its basic functions (for example, receiving and placing calls, and receiving and sending messages). An application program may be stored in the memory 170, installed on the voice recognition agent 100, and run by the controller 180 to perform an operation (or function) of the voice recognition agent.

In addition to operations related to the application programs, the controller 180 typically controls the overall operation of the voice recognition agent 100. The controller 180 may provide or process information or functions appropriate to the user by processing the signals, data, information, and the like that are input or output through the components described above, or by running an application program stored in the memory 170.

The controller 180 may also control at least some of the components described with reference to FIG. 1A in order to run an application program stored in the memory 170. Furthermore, the controller 180 may operate two or more of the components included in the voice recognition agent 100 in combination with each other in order to run the application program.

Under the control of the controller 180, the power supply unit 190 receives external power or internal power and supplies it to each component included in the voice recognition agent 100. The power supply unit 190 includes a battery, which may be a built-in battery or a replaceable battery.

Before turning to the various embodiments implemented through the voice recognition agent 100 described above, the components listed above are described in more detail below with reference to FIG. 2.

First, regarding the wireless communication unit 110, the broadcast receiving module 111 of the wireless communication unit 110 receives broadcast signals and/or broadcast-related information from an external broadcast management server through a broadcast channel. The broadcast channel may include a satellite channel and a terrestrial channel. Two or more broadcast receiving modules may be provided in the voice recognition agent 100 for simultaneous reception of at least two broadcast channels or for switching between broadcast channels.

The broadcast management server may be a server that generates and transmits broadcast signals and/or broadcast-related information, or a server that receives previously generated broadcast signals and/or broadcast-related information and transmits them to a terminal. The broadcast signal may include not only a TV broadcast signal, a radio broadcast signal, and a data broadcast signal, but also a broadcast signal in which a data broadcast signal is combined with a TV or radio broadcast signal.

The broadcast signal may be encoded according to at least one of the technical standards (or broadcast methods, for example, ISO, IEC, DVB, ATSC, and the like) for transmitting and receiving digital broadcast signals, and the broadcast receiving module 111 may receive the digital broadcast signal using a method that conforms to the specifications defined by those standards.

The broadcast-related information may be information related to a broadcast channel, a broadcast program, or a broadcast service provider. The broadcast-related information may also be provided through a mobile communication network, in which case it may be received by the mobile communication module 112.

The broadcast-related information may exist in various forms, such as an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB) or an Electronic Service Guide (ESG) of Digital Video Broadcast-Handheld (DVB-H). Broadcast signals and/or broadcast-related information received through the broadcast receiving module 111 may be stored in the memory 170.

The mobile communication module 112 transmits and receives wireless signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network built according to technical standards or communication methods for mobile communication (for example, Global System for Mobile communication (GSM), Code Division Multiple Access (CDMA), CDMA2000, Evolution-Data Optimized (EV-DO), Wideband CDMA (WCDMA), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), Long Term Evolution (LTE), LTE-Advanced (LTE-A), and the like).

The wireless signal may include a voice call signal, a video call signal, or various types of data associated with sending and receiving text/multimedia messages.

The wireless Internet module 113 is a module for wireless Internet access and may be built into or externally attached to the voice recognition agent 100. The wireless Internet module 113 is configured to transmit and receive wireless signals over a communication network based on wireless Internet technologies.

Examples of wireless Internet technologies include Wireless LAN (WLAN), Wireless Fidelity (Wi-Fi), Wi-Fi Direct, Digital Living Network Alliance (DLNA), Wireless Broadband (WiBro), Worldwide Interoperability for Microwave Access (WiMAX), High Speed Downlink Packet Access (HSDPA), High Speed Uplink Packet Access (HSUPA), Long Term Evolution (LTE), and LTE-Advanced (LTE-A). The wireless Internet module 113 transmits and receives data according to at least one wireless Internet technology, including Internet technologies not listed above.

Because wireless Internet access via WiBro, HSDPA, HSUPA, GSM, CDMA, WCDMA, LTE, LTE-A, and the like takes place over a mobile communication network, the wireless Internet module 113 that performs wireless Internet access through the mobile communication network may also be understood as a kind of mobile communication module 112.

The short-range communication module 114 is for short-range communication and may support it using at least one of Bluetooth™, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee, Near Field Communication (NFC), Wireless Fidelity (Wi-Fi), Wi-Fi Direct, and Wireless Universal Serial Bus (Wireless USB) technologies. Through wireless area networks, the short-range communication module 114 may support wireless communication between the voice recognition agent 100 and a wireless communication system, between the voice recognition agent 100 and another voice recognition agent 100, or between the voice recognition agent 100 and a network in which another voice recognition agent 100 (or an external server) is located. The wireless area networks may be wireless personal area networks.

Here, the other voice recognition agent 100 may be a wearable device (for example, a smartwatch, smart glasses, or a head mounted display (HMD)) capable of exchanging data with (or interworking with) the voice recognition agent 100 according to the present invention. The short-range communication module 114 may sense (or recognize) a wearable device near the voice recognition agent 100 that is capable of communicating with it. Furthermore, when the sensed wearable device is a device authenticated to communicate with the voice recognition agent 100 according to the present invention, the controller 180 may transmit at least part of the data processed by the voice recognition agent 100 to the wearable device through the short-range communication module 114. The user of the wearable device can therefore use the data processed by the voice recognition agent 100 through the wearable device. For example, when a call is received by the voice recognition agent 100, the user can take the call through the wearable device, and when a message is received by the voice recognition agent 100, the user can read the message through the wearable device.

The location information module 115 is a module for obtaining the location (or current location) of the voice recognition agent; representative examples are a Global Positioning System (GPS) module and a Wireless Fidelity (Wi-Fi) module. For example, using the GPS module, the voice recognition agent may obtain its location from signals sent by GPS satellites.

As another example, using the Wi-Fi module, the voice recognition agent may obtain its location based on information about the wireless access point (AP) that transmits wireless signals to or receives them from the Wi-Fi module. If necessary, the location information module 115 may alternatively or additionally perform a function of any other module of the wireless communication unit 110 to obtain data on the location of the voice recognition agent. The location information module 115 is a module used to obtain the location (or current location) of the voice recognition agent and is not limited to a module that directly calculates or obtains that location.
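The specification does not say how AP information is turned into a location. One common approach, shown here only as a minimal sketch, is a weighted centroid over APs whose positions are already known. The `AccessPoint` type, its fields, and the use of received signal strength as the weight are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class AccessPoint:
    """Hypothetical record for one visible AP (not defined in the patent)."""
    x: float          # surveyed AP position, metres
    y: float
    rssi_dbm: float   # received signal strength at the agent


def estimate_position(aps):
    """Weighted centroid: APs heard more strongly pull the estimate closer."""
    if not aps:
        raise ValueError("at least one access point is required")
    # Convert dBm to linear power (mW) so that every weight is positive.
    weights = [10 ** (ap.rssi_dbm / 10.0) for ap in aps]
    total = sum(weights)
    x = sum(w * ap.x for w, ap in zip(weights, aps)) / total
    y = sum(w * ap.y for w, ap in zip(weights, aps)) / total
    return x, y
```

With two equally strong APs the estimate falls midway between them; a stronger AP shifts the estimate toward itself.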

Next, the input unit 120 is for inputting image information (or signals), audio information (or signals), data, or information entered by the user. For inputting image information, the voice recognition agent 100 may include one or more cameras 121. The camera 121 processes image frames, such as still images or video, obtained by an image sensor in a video call mode or a photographing mode. The processed image frames may be displayed on the display unit 151 or stored in the memory 170. Meanwhile, a plurality of cameras 121 provided in the voice recognition agent 100 may be arranged in a matrix structure, and through cameras 121 arranged in this way, a plurality of pieces of image information with various angles or focal points may be input to the voice recognition agent 100. The plurality of cameras 121 may also be arranged in a stereo structure to acquire the left and right images used to produce a stereoscopic image.

The microphone 122 processes external sound signals into electrical voice data. The processed voice data may be used in various ways depending on the function being performed (or the application program being run) on the voice recognition agent 100. Meanwhile, various noise removal algorithms may be implemented in the microphone 122 to remove noise generated while receiving external sound signals.

The user input unit 123 is for receiving information from the user. When information is input through the user input unit 123, the controller 180 may control the operation of the voice recognition agent 100 to correspond to the input information. The user input unit 123 may include mechanical input means (or mechanical keys, for example, buttons located on the front, rear, or side of the voice recognition agent 100, a dome switch, a jog wheel, a jog switch, and the like) and touch-type input means. As an example, the touch-type input means may consist of a virtual key, a soft key, or a visual key displayed on the touch screen through software processing, or of a touch key arranged on a portion other than the touch screen. Meanwhile, the virtual key or visual key may be displayed on the touch screen in various forms, for example, as a graphic, text, an icon, a video, or a combination of these.

Meanwhile, the sensing unit 140 senses at least one of information in the voice recognition agent, information on the environment surrounding the voice recognition agent, and user information, and generates a corresponding sensing signal. Based on this sensing signal, the controller 180 may control the driving or operation of the voice recognition agent 100, or perform data processing, functions, or operations related to an application program installed on the voice recognition agent 100. Representative sensors among the various sensors that may be included in the sensing unit 140 are described in more detail below.

First, the proximity sensor 141 is a sensor that detects, without mechanical contact, the presence or absence of an object approaching a predetermined detection surface or an object present nearby, using the force of an electromagnetic field, infrared rays, or the like. The proximity sensor 141 may be disposed in an inner region of the voice recognition agent that is covered by the touch screen described above, or near the touch screen.

Examples of the proximity sensor 141 include a transmissive photoelectric sensor, a direct reflective photoelectric sensor, a mirror reflective photoelectric sensor, a high-frequency oscillation proximity sensor, a capacitive proximity sensor, a magnetic proximity sensor, and an infrared proximity sensor. When the touch screen is capacitive, the proximity sensor 141 may be configured to detect the approach of a conductive object from the change in the electric field caused by that approach. In this case, the touch screen (or touch sensor) itself may be classified as a proximity sensor.

Meanwhile, for convenience of description, the act of bringing an object close to the touch screen without touching it, so that the object is recognized as being positioned over the touch screen, is called a "proximity touch," and the act of an object actually touching the touch screen is called a "contact touch." The position of a proximity touch on the touch screen is the position at which the object corresponds perpendicularly to the touch screen during the proximity touch. The proximity sensor 141 may sense a proximity touch and a proximity touch pattern (for example, proximity touch distance, direction, speed, time, position, and movement state).

Meanwhile, the controller 180 processes data (or information) corresponding to the proximity touch operation and proximity touch pattern sensed through the proximity sensor 141 as described above, and may output visual information corresponding to the processed data on the touch screen. Furthermore, the controller 180 may control the voice recognition agent 100 so that different operations or data (or information) are processed depending on whether a touch on the same point of the touch screen is a proximity touch or a contact touch.
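As a minimal sketch of handling the same point differently for a proximity touch and a contact touch, the following dispatches on the sensed distance between the object and the screen. The 20 mm hover range, the function names, and the "preview"/"select" actions are illustrative assumptions; the specification does not fix any of them.

```python
from enum import Enum


class TouchKind(Enum):
    PROXIMITY = "proximity touch"
    CONTACT = "contact touch"


def classify_touch(distance_mm, hover_range_mm=20.0):
    """Return the touch kind for a sensed object distance, or None if too far.

    hover_range_mm is an assumed threshold, not a value from the patent.
    """
    if distance_mm <= 0.0:
        return TouchKind.CONTACT
    if distance_mm <= hover_range_mm:
        return TouchKind.PROXIMITY
    return None


def handle_touch(point, distance_mm):
    """Run a different (hypothetical) action at the same point per touch kind."""
    kind = classify_touch(distance_mm)
    if kind is TouchKind.CONTACT:
        return f"select item at {point}"
    if kind is TouchKind.PROXIMITY:
        return f"preview item at {point}"
    return "ignore"
```

Keeping classification separate from the action makes it straightforward to swap in other per-application behaviours.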

The touch sensor senses a touch (or touch input) applied to the touch screen (or the display unit 151) using at least one of various touch methods, such as resistive, capacitive, infrared, ultrasonic, and magnetic field methods.

As an example, the touch sensor may be configured to convert a change in the pressure applied to a specific portion of the touch screen, or in the capacitance generated at a specific portion, into an electrical input signal. The touch sensor may be configured to detect the position and area at which a touch object touches the touch sensor, the pressure of the touch, the capacitance at the time of the touch, and the like. Here, the touch object is an object that applies a touch to the touch sensor, and may be, for example, a finger, a touch pen, a stylus pen, or a pointer.

When there is a touch input to the touch sensor, the corresponding signal(s) are sent to a touch controller. The touch controller processes the signal(s) and then transmits the corresponding data to the controller 180. The controller 180 can thereby determine, among other things, which area of the display unit 151 was touched. Here, the touch controller may be a component separate from the controller 180, or may be the controller 180 itself.
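The step in which the controller determines which area of the display was touched can be sketched as a simple hit test over rectangular regions. The `Region` type, the region names, and the pixel coordinates below are hypothetical; the specification does not describe the data exchanged between the touch controller and the controller 180.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    """Hypothetical rectangular display area in pixel coordinates."""
    name: str
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive


def find_touched_region(regions: Sequence[Region], x: int, y: int) -> Optional[str]:
    """Return the name of the first region containing (x, y), or None."""
    for r in regions:
        if r.left <= x < r.right and r.top <= y < r.bottom:
            return r.name
    return None


# Example layout (assumed): a status bar above a content area.
LAYOUT = [
    Region("status_bar", 0, 0, 1080, 100),
    Region("content", 0, 100, 1080, 1920),
]
```

For example, `find_touched_region(LAYOUT, 500, 50)` resolves to the status bar, while a point outside every region yields `None`.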

Meanwhile, the controller 180 may perform different control or the same control depending on the type of touch object that touches the touch screen (or a touch key provided apart from the touch screen). Whether to perform different control or the same control depending on the type of touch object may be determined according to the current operating state of the voice recognition agent 100 or the application program being executed.

Meanwhile, the touch sensor and the proximity sensor described above may, independently or in combination, sense various types of touch on the touch screen, such as a short (or tap) touch, a long touch, a multi touch, a drag touch, a flick touch, a pinch-in touch, a pinch-out touch, a swipe touch, and a hovering touch.

The ultrasonic sensor may recognize position information of a sensing object using ultrasonic waves. Meanwhile, the controller 180 can calculate the position of a wave source from information sensed by an optical sensor and a plurality of ultrasonic sensors. The position of the wave source may be calculated using the fact that light is much faster than ultrasonic waves, that is, that light reaches the optical sensor much sooner than the ultrasonic wave reaches the ultrasonic sensor. More specifically, the position of the wave source may be calculated from the difference between the arrival time of the ultrasonic wave and the arrival time of the light, with the light serving as a reference signal.
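The time-difference calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the speed of sound (343 m/s), the treatment of the light arrival as the emission instant, the use of exactly three sensors in a 2D plane, and all function names are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def distance_from_time_difference(t_light_s, t_ultrasound_s,
                                  speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Estimate the source-to-sensor distance from the arrival-time difference.

    The light arrival is used as the reference signal; its own travel time
    is neglected because light is far faster than ultrasound.
    """
    delay = t_ultrasound_s - t_light_s
    if delay < 0:
        raise ValueError("ultrasound cannot arrive before the light reference")
    return speed_of_sound * delay


def locate_source_2d(sensors, distances):
    """Trilaterate a 2D position from three sensor positions and distances.

    Subtracting the first circle equation from the other two gives a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = sensors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sensors must not lie on a single line")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

Each ultrasonic sensor yields one distance, so several sensors are needed before a position can be fixed, which is consistent with the plurality of ultrasonic sensors mentioned in the text.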

Meanwhile, the camera 121, described above as a component of the input unit 120, includes at least one of a camera sensor (for example, CCD or CMOS), a photo sensor (or image sensor), and a laser sensor.

The camera 121 and the laser sensor may be combined with each other to sense a touch of a sensing object on a 3D stereoscopic image. The photo sensor may be stacked on the display element and is configured to scan the movement of a sensing object close to the touch screen. More specifically, the photo sensor has photo diodes and transistors (TRs) mounted in rows and columns, and scans the content placed on the photo sensor using an electrical signal that changes according to the amount of light applied to the photo diodes. That is, the photo sensor calculates the coordinates of the sensing object according to the change in the amount of light, and the position information of the sensing object can be obtained in this way.

The display unit 151 displays (outputs) information processed by the voice recognition agent 100. For example, the display unit 151 may display execution screen information of an application program running on the voice recognition agent 100, or user interface (UI) and graphic user interface (GUI) information according to that execution screen information.

In addition, the display unit 151 may be configured as a stereoscopic display unit that displays a stereoscopic image. A three-dimensional display method such as a stereoscopic method (glasses method), an auto-stereoscopic method (glasses-free method), or a projection method (holographic method) may be applied to the stereoscopic display unit.

In general, a 3D stereoscopic image consists of a left image (an image for the left eye) and a right image (an image for the right eye). Depending on how the left and right images are combined into a 3D stereoscopic image, the methods are divided into a top-down method in which the left and right images are arranged one above the other in a single frame, an L-to-R (left-to-right, side by side) method in which the left and right images are arranged side by side in a single frame, a checker board method in which pieces of the left and right images are arranged in a tile pattern, an interlaced method in which the left and right images are arranged alternately by column or by row, and a time sequential (frame by frame) method in which the left and right images are displayed alternately over time.
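As a rough illustration of three of the packing methods listed above, the sketch below combines two equally sized images represented as lists of pixel rows. The representation, the function names, and the choice of which image goes first (left on top, left on even rows) are assumptions made for the example; the sketch also does no downscaling, which a real display pipeline would typically perform.

```python
def pack_top_down(left, right):
    """Place the left image above the right image in one frame."""
    return left + right


def pack_side_by_side(left, right):
    """Place the left and right images next to each other, row by row."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def pack_row_interlaced(left, right):
    """Alternate rows: even rows from the left image, odd rows from the right."""
    return [
        l_row if i % 2 == 0 else r_row
        for i, (l_row, r_row) in enumerate(zip(left, right))
    ]


# Two tiny 2x2 example "images" whose pixels are just labels.
left_image = [["L", "L"], ["L", "L"]]
right_image = [["R", "R"], ["R", "R"]]
```

For these inputs, `pack_side_by_side(left_image, right_image)` gives two rows of `["L", "L", "R", "R"]`, while `pack_row_interlaced` keeps the frame size and alternates whole rows.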

In addition, a 3D thumbnail image may be produced by generating a left image thumbnail and a right image thumbnail from the left image and the right image of the original image frame, respectively, and combining them into a single image. In general, a thumbnail refers to a reduced image or a reduced still image. The left image thumbnail and the right image thumbnail generated in this way are displayed on the screen with a horizontal offset corresponding to the depth given by the parallax between the left and right images, and can thereby convey a sense of three-dimensional space.

The left image and the right image needed to implement a 3D stereoscopic image may be displayed on the stereoscopic display unit by a stereoscopic processing unit. The stereoscopic processing unit is configured to receive a 3D image (an image at a reference viewpoint and an image at an extended viewpoint) and set the left and right images from it, or to receive a 2D image and convert it into a left image and a right image.

The sound output unit 152 may output audio data received from the wireless communication unit 110 or stored in the memory 170 in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, and the like. The sound output unit 152 also outputs sound signals related to functions performed by the voice recognition agent 100 (for example, a call signal reception sound or a message reception sound). The sound output unit 152 may include a receiver, a speaker, a buzzer, and the like.

The haptic module 153 generates various tactile effects that the user can feel. A representative example of a tactile effect generated by the haptic module 153 is vibration. The intensity, pattern, and other properties of the vibration generated by the haptic module 153 may be controlled by the user's selection or by a setting of the controller. For example, the haptic module 153 may synthesize different vibrations and output the result, or may output them sequentially.

In addition to vibration, the haptic module 153 may generate various tactile effects, including effects of stimulation such as a pin array moving perpendicular to the contacted skin surface, the jetting or suction force of air through a jet port or suction port, brushing against the skin surface, contact with an electrode, and electrostatic force, as well as effects that reproduce a sensation of cold or warmth using an element capable of absorbing or generating heat.

The haptic module 153 may not only deliver a tactile effect through direct contact, but may also be implemented so that the user can feel the tactile effect through the muscle sense of a finger, an arm, or the like. Two or more haptic modules 153 may be provided depending on the configuration of the voice recognition agent 100.

The light output unit 154 outputs a signal for notifying the occurrence of an event using light from a light source of the voice recognition agent 100. Examples of events occurring in the voice recognition agent 100 include message reception, call signal reception, a missed call, an alarm, a schedule notification, email reception, and information reception through an application.

The signal output by the light output unit 154 is implemented by the voice recognition agent emitting light of a single color or multiple colors toward the front or the rear. The signal output may be terminated when the voice recognition agent detects that the user has checked the event.

The interface unit 160 serves as a path to all external devices connected to the voice recognition agent 100. The interface unit 160 receives data from an external device, receives power and delivers it to each component inside the voice recognition agent 100, or allows data inside the voice recognition agent 100 to be transmitted to an external device. For example, the interface unit 160 may include a wired/wireless headset port, an external charger port, a wired/wireless data port, a memory card port, a port for connecting a device equipped with an identification module, an audio input/output (I/O) port, a video input/output (I/O) port, and an earphone port.

Meanwhile, the identification module is a chip that stores various kinds of information for authenticating the right to use the voice recognition agent 100, and may include a user identity module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM), and the like. A device equipped with an identification module (hereinafter, an "identification device") may be manufactured in the form of a smart card. Accordingly, the identification device may be connected to the terminal 100 through the interface unit 160.

In addition, when the voice recognition agent 100 is connected to an external cradle, the interface unit 160 may serve as a path through which power from the cradle is supplied to the voice recognition agent 100, or as a path through which various command signals input by the user at the cradle are delivered to the voice recognition agent 100. The various command signals or the power input from the cradle may operate as a signal for recognizing that the voice recognition agent 100 has been correctly mounted on the cradle.

The memory 170 may store a program for the operation of the controller 180 and may temporarily store input/output data (for example, a phone book, messages, still images, and videos). The memory 170 may store data on the various patterns of vibration and sound that are output when a touch is input on the touch screen.

The memory 170 may include at least one type of storage medium among a flash memory type, a hard disk type, a solid state disk (SSD) type, a silicon disk drive (SDD) type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. The voice recognition agent 100 may also operate in connection with web storage that performs the storage function of the memory 170 over the Internet.

Meanwhile, as described above, the controller 180 controls operations related to application programs and, typically, the overall operation of the voice recognition agent 100. For example, when the state of the voice recognition agent satisfies a set condition, the controller 180 may enter or release a lock state that restricts the input of the user's control commands to applications.

In addition, the controller 180 may perform control and processing related to voice calls, data communication, video calls, and the like, or may perform pattern recognition processing that recognizes handwriting input and drawing input made on the touch screen as text and images, respectively. Furthermore, the controller 180 may control any one of the components described above, or a combination of several of them, in order to implement the various embodiments described below on the voice recognition agent 100 according to the present invention.

Under the control of the controller 180, the power supply unit 190 receives external power or internal power and supplies the power needed for the operation of each component. The power supply unit 190 includes a battery. The battery may be a built-in rechargeable battery, and may be detachably coupled to the terminal body for charging or the like.

In addition, the power supply unit 190 may include a connection port, and the connection port may be configured as an example of the interface unit 160 to which an external charger that supplies power for charging the battery is electrically connected.

As another example, the power supply unit 190 may be configured to charge the battery wirelessly without using the connection port. In this case, the power supply unit 190 may receive power from an external wireless power transmitter using at least one of an inductive coupling method based on magnetic induction and a magnetic resonance coupling method based on electromagnetic resonance.

Meanwhile, the various embodiments described below may be implemented in a recording medium readable by a computer or a similar device using, for example, software, hardware, or a combination thereof.

Meanwhile, the description of the voice recognition agent 100 given with reference to FIG. 2 may be applied as it is to the mobile terminal 300.

In this specification, the term memory 170 may be used interchangeably with the term storage unit 170.

Meanwhile, the controller 180 may control the operation of each component of the mobile terminal 100 under the control of the artificial intelligence unit 130.

Meanwhile, the input unit 120 of the mobile terminal 100 may include the sensing unit 140 and may perform all functions performed by the sensing unit 140. For example, the input unit 120 may sense a user's touch input.

FIG. 3 is a block diagram showing the configuration of the artificial intelligence server 200 according to an embodiment of the present invention.

The communication unit 210 may communicate with an external device.

Specifically, the communication unit 210 may be connected to the voice recognition agent 100 and may transmit and receive data to and from the voice recognition agent 100 under the control of the artificial intelligence unit 220.

The communication unit 210 may also be connected to the mobile terminal 300 and may transmit and receive data to and from the mobile terminal 300 under the control of the artificial intelligence unit 220.

In this specification, when data transmitted from the artificial intelligence server 200 is ultimately delivered to the mobile terminal 300, the data may be transmitted via the voice recognition agent 100 or may be transmitted directly to the mobile terminal 300 without passing through the voice recognition agent 100.

Likewise, when data transmitted from the mobile terminal 300 is ultimately delivered to the artificial intelligence server 200, the data may be transmitted via the voice recognition agent 100 or may be transmitted directly to the artificial intelligence server 200 without passing through the voice recognition agent 100.

The artificial intelligence unit 220 may receive voice data from the voice recognition agent 100 through the communication unit 210.

In addition, the voice recognition unit 222 included in the artificial intelligence unit 220 may output a recognition result for the voice data using a voice recognition model, and may transmit to the voice recognition agent either the recognition result itself or a control command corresponding to the recognition result.

In addition, the voice recognition unit 222 included in the artificial intelligence unit 220 may perform adaptation learning on the voice data and store the learning result in the voice data database 232 in the storage unit 230.

In addition, the voice recognition unit 222 included in the artificial intelligence unit 220 may label the voice data with a sentence or a word and store it in the voice data database 232.

Meanwhile, the artificial intelligence unit 220 may analyze a voice signal using the voice recognition model, extract features, and derive a recognition result. Here, the recognition result may indicate whether the received voice signal is a command or a non-command, or which of a plurality of commands it represents.

Here, a command may be something registered in advance so that the voice recognition agent, or another device connected to the voice recognition agent, performs a specific function, and a non-command may be something unrelated to the performance of a specific function.

Meanwhile, the sentence recommendation unit 221 included in the artificial intelligence unit 220 may analyze the characteristics of the voice data using a voice feature analysis model.

Meanwhile, the sentence database 231 in the storage unit 230 may hold a plurality of categorized sentences.

Based on the characteristics of the voice data, the sentence recommendation unit 221 included in the artificial intelligence unit 220 may search the plurality of sentences held in the sentence database 231 for a specific sentence corresponding to those characteristics, and may transmit the retrieved sentence to the voice recognition agent.

Meanwhile, although this drawing describes the sentence recommendation unit 221, the voice recognition unit 222, the sentence database 231, and the voice database 232 as constituting a single server, the configuration is not limited to this, and various combinations are possible.

For example, the sentence recommendation unit 221 and the sentence database 231 may constitute a first server, and the voice recognition unit 222 and the voice database 232 may constitute a second server. In this case, the first server and the second server may transmit and receive data to and from each other.

FIG. 4 is a diagram for explaining a problem that may occur in a voice recognition system.

Existing products improve the performance of a voice recognition model by collecting data from many users, retraining the model on the big data gathered in the cloud, and upgrading the voice recognition software.

However, because human voices and timbres vary so widely, raising the recognition rate requires training the voice recognition model so that it is optimized for a specific user.

Without such an optimization process, recognition failures repeat as shown in FIG. 4, which can have a negative effect on the product and the brand.

It is therefore necessary for the user of the voice recognition agent to train it on his or her own voice directly.

FIG. 5 is a diagram for explaining a method of requesting additional data for additional learning from a user, according to an embodiment of the present invention.

The voice recognition agent 100 may receive voice data from the user (S505).

The voice recognition agent 100 may then transmit the received voice data to the artificial intelligence server (S510).

Meanwhile, the artificial intelligence server 200 may receive the voice data, input it into the voice recognition model, and output at least one of a voice recognition rate and a recognition result for the voice data (S515).

Here, the voice recognition rate may be measured by comparing confidence scores for the voice.

Specifically, the artificial intelligence server 200 may calculate the confidence score of the user's voice data relative to the average of the confidence scores extracted from test data learned during manufacturing or from currently personalized voice data.

For example, if the average confidence score of previously learned voice data for a specific command or wake-up word is 70.02, and the confidence score of the voice data uttered by a specific user is 52.13, the recognition rate may be calculated as approximately 74 percent.
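The arithmetic in this example can be reproduced with a short sketch. It assumes the recognition rate is simply the user's confidence score divided by the mean reference score, which matches the figures in the text; the function name and signature are hypothetical.

```python
def recognition_rate(user_score, reference_scores):
    """Ratio of the user's confidence score to the mean reference score."""
    if not reference_scores:
        raise ValueError("at least one reference score is required")
    mean_reference = sum(reference_scores) / len(reference_scores)
    if mean_reference == 0:
        raise ValueError("mean reference score must be non-zero")
    return user_score / mean_reference


# Figures from the text: reference mean 70.02, user score 52.13.
rate = recognition_rate(52.13, [70.02])
rate_percent = round(rate * 100)  # rounds to 74
```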

The recognition rate may also be obtained by comparing errors against samples and then taking the average value.

For example, a specific number of samples may be extracted from the previously learned voice data for a specific command or wake-up word, and the recognition rate for the user's voice data may be calculated by computing the mean squared error (MSE) or root mean squared error (RMSE) between the voice data uttered by the specific user and the samples.
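A minimal sketch of the error computation follows. The text does not say how the voice data is represented or how the error is mapped to a recognition rate, so the sketch assumes equal-length numeric feature vectors and stops at the averaged error; the function names are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_error_to_samples(user_features, sample_features, use_rmse=False):
    """Average the MSE (or RMSE) between the user's features and each sample."""
    errors = [mse(user_features, sample) for sample in sample_features]
    if use_rmse:
        errors = [math.sqrt(e) for e in errors]
    return sum(errors) / len(errors)
```

A smaller averaged error means the user's utterance is closer to the learned samples. Converting that error into a percentage would need a normalization that the text leaves unspecified.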

Meanwhile, the artificial intelligence server 200 may transmit the obtained voice recognition rate to the voice recognition agent 100 (S520).

Meanwhile, the voice recognition agent 100 may receive the voice recognition rate for the voice data and, if the voice recognition rate is lower than a preset criterion, may request from the user additional data for learning the user's voice.

Specifically, to secure the additional data, the voice recognition agent 100 may output a query asking whether to perform additional learning of the voice recognition model (S525). In this case, the voice recognition agent 100 may also output the voice recognition rate for the user's voice data.

For example, the voice recognition agent 100 may output a voice message such as “Having checked the voice recognition rate, my recognition rate for your voice is about 60%. Would you like to optimize my voice recognition function for your voice?”

Meanwhile, when an input accepting the additional learning is received, the voice recognition agent 100 may provide a plurality of options for the additional learning (S530).

Specifically, the voice recognition agent may offer the user a first option of repeating a presented voice, a second option of reading a presented sentence aloud, and a third option of writing a sentence directly and reading it aloud.

Meanwhile, when an input selecting a specific option is received from the user (S535), the voice recognition agent may request from the user the additional data corresponding to the selected option.
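The agent-side decisions in steps S525 to S535 can be summarized in a small sketch. The enum members, the action strings, and the function signature are all hypothetical names chosen for illustration; only the order of the decisions comes from the description above.

```python
from enum import Enum


class TrainingOption(Enum):
    REPEAT_PRESENTED_VOICE = 1       # first option
    READ_PRESENTED_SENTENCE = 2      # second option
    WRITE_AND_READ_OWN_SENTENCE = 3  # third option


def next_agent_action(recognition_rate, threshold,
                      user_accepts=None, selected_option=None):
    """Return the agent's next step as a descriptive string."""
    if recognition_rate >= threshold:
        return "no_action"  # rate is not below the preset criterion
    if user_accepts is None:
        return "ask_user_about_additional_training"  # S525
    if not user_accepts:
        return "no_action"
    if selected_option is None:
        return "present_options"  # S530
    option = TrainingOption(selected_option)  # S535
    return f"request_data_for_{option.name.lower()}"
```

For instance, with a rate of 0.6 against an assumed threshold of 0.8, the first call returns the query step, and a later call with `user_accepts=True` and `selected_option=2` returns `"request_data_for_read_presented_sentence"`.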

FIG. 6 is a diagram for explaining an operating method when option 1 or option 2 is selected, according to an embodiment of the present invention.

The voice recognition agent 100 may transmit a request for a sentence for additional learning to the artificial intelligence server 200 (S605).

Meanwhile, when the request for a sentence is received (S610), the artificial intelligence server 200 may analyze the characteristics of the voice data (S615).

Based on the characteristics of the voice data, the artificial intelligence server 200 may then search a plurality of sentences for a specific sentence corresponding to those characteristics (S620).

구체적으로 문장 데이터베이스(231)에는 복수의 문장들이 저장될 수 있으며, 복수의 문장은 카테고리 별로 분류되어 있을 수 있다. 여기서 카테고리는 제품 기능, 국가, 지역, 억양, 나이, 성별 및 외래어 중 적어도 하나를 포함할 수 있다.In more detail, a plurality of sentences may be stored in the sentence database 231, and the plurality of sentences may be classified by category. Here, the category may include at least one of product function, country, region, intonation, age, gender, and foreign language.

또한 인공지능 서버(200)는 사용자의 음성 데이터에 포함되는 단어들의 인식률을 산출할 수 있다.In addition, the artificial intelligence server 200 may calculate a recognition rate of words included in the voice data of the user.

예를 들어 도 7을 참고하면, 사용자가 “Can you tell me how many water bottle do we have?”라는 문장을 발화한 경우, 인공지능 서버(200)는 문장에 포함되는 단어 단위로 신뢰 점수를 산출하고, 신뢰 점수가 기 설정된 기준보다 낮은 특정 단어(water, bottle)를 획득할 수 있다. For example, referring to FIG. 7, when the user utters the sentence “Can you tell me how many water bottle do we have?”, The artificial intelligence server 200 calculates a confidence score in units of words included in the sentence. In addition, a specific score (water, bottle) having a confidence score lower than a predetermined criterion may be obtained.
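The per-word filtering described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `low_confidence_words`, the score values, and the 0.6 threshold are all assumptions made for the example.

```python
from typing import List, Tuple


def low_confidence_words(scored_words: List[Tuple[str, float]],
                         threshold: float) -> List[str]:
    """Return the words whose confidence score is below the threshold,
    in the order in which they were uttered."""
    return [word for word, score in scored_words if score < threshold]


# Hypothetical per-word confidence scores for the utterance of FIG. 7.
scored = [
    ("Can", 0.94), ("you", 0.96), ("tell", 0.91), ("me", 0.95),
    ("how", 0.90), ("many", 0.88), ("water", 0.42), ("bottle", 0.37),
    ("do", 0.93), ("we", 0.95), ("have", 0.92),
]

# The threshold stands in for the "preset criterion" and is an assumed value.
weak_words = low_confidence_words(scored, threshold=0.6)
print(weak_words)
```

A list of (word, score) pairs is used rather than a dictionary so that a word repeated within one utterance keeps a separate score for each occurrence.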

The artificial intelligence server may then obtain the characteristics of the voice data based on the recognition rates of the words included in the user's voice data and on the characteristics of those words.

For example, when the recognition rate of specific words (water, bottle) is low and those words are pronounced differently in American English and British English, the artificial intelligence server may obtain, as a characteristic of the voice data, that the recognition rate is low for words pronounced differently depending on whether the user's country of origin is the United States or the United Kingdom.

In this case, based on the characteristics of the voice data, the artificial intelligence server may determine that additional learning is needed for the country category among the plurality of categories.

Based on the characteristics of the voice data, the artificial intelligence server may then obtain a specific sentence included in the category, among the plurality of categories, for which the user requires additional learning.

For example, a plurality of sentences containing words that can distinguish the user's country of origin may be classified into the country category. The artificial intelligence server may obtain, from among those sentences, a sentence containing a word that allows British English and American English to be learned separately.

For example, "schedule" is pronounced differently in American English and British English. Therefore, the artificial intelligence server may obtain the sentence "Can you tell me my schedule of today?" from the country category.

As another example, "water" and "bottle" are pronounced differently in American English and British English. Therefore, the artificial intelligence server may obtain the sentence "Can you tell me how many water bottle do we have?" from the country category.

That is, a word included in the obtained sentence may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations.

In addition, a word included in a sentence corresponding to a specific category may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations depending on the characteristics of the category (for example, by country or by region).

As another example, the user intended to say "조용한 음악 좀 틀어줄래?" ("Would you play some quiet music?") but, being from a specific region (Gyeongsang-do), uttered "조용한 엄악 좀 틀어줄래?", pronouncing "음악" (eumak, "music") as "엄악" (eomak).

In this case, the artificial intelligence server 200 may calculate recognition rates for the words included in the user's voice data and obtain the specific word (음악, "music") whose recognition rate is lower than a preset criterion.

For example, when the recognition rate of the specific word (음악) is low and that word is pronounced distinctively in a specific region (Gyeongsang-do), the artificial intelligence server may obtain, as a characteristic of the voice data, that the recognition rate is low for words pronounced differently in Gyeongsang-do.

In this case, based on the characteristics of the voice data, the artificial intelligence server may determine that additional learning is needed for the region category among the plurality of categories.

For example, a plurality of sentences containing words that can distinguish the user's region of origin may be classified into the region category. The artificial intelligence server may obtain, from among those sentences, a sentence containing a word from which a Gyeongsang-do origin can be learned.

For example, "쌀" (ssal, "rice") may be pronounced "살" (sal) in Gyeongsang-do. Therefore, the artificial intelligence server may obtain the sentence "집에 쌀이 얼마나 남아있지?" ("How much rice is left at home?") from the region category.

That is, a word included in a sentence corresponding to the region category may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations depending on the region.
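The country and region examples follow the same pattern: poorly recognized words point to a category, and a practice sentence is drawn from that category. A minimal sketch is given below. The lookup table `WORD_CATEGORY`, the in-memory `SENTENCE_DB` standing in for the sentence database 231, the majority-vote rule, and the choice of the first stored sentence are all assumptions for illustration; the disclosure does not specify how the mapping is implemented.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical lexicon: the category whose trait explains a word's variation.
WORD_CATEGORY: Dict[str, str] = {
    "water": "country",
    "bottle": "country",
    "schedule": "country",
    "음악": "region",
    "쌀": "region",
}

# Hypothetical stand-in for the sentence database 231, keyed by category.
SENTENCE_DB: Dict[str, List[str]] = {
    "country": [
        "Can you tell me my schedule of today?",
        "Can you tell me how many water bottle do we have?",
    ],
    "region": ["집에 쌀이 얼마나 남아있지?"],
}


def category_needing_learning(weak_words: List[str]) -> Optional[str]:
    """Pick the category that accounts for the most poorly recognized words."""
    counts = Counter(WORD_CATEGORY[w] for w in weak_words if w in WORD_CATEGORY)
    return counts.most_common(1)[0][0] if counts else None


def pick_sentence(weak_words: List[str]) -> Optional[str]:
    """Return a practice sentence from the category needing further learning."""
    category = category_needing_learning(weak_words)
    if category is None:
        return None
    return SENTENCE_DB[category][0]


print(category_needing_learning(["water", "bottle"]))
print(pick_sentence(["음악"]))
```

Returning `None` when no word maps to a category leaves the caller free to fall back to another source of characteristics, such as registered personal information.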

In addition, sentences related to product functions may be classified into the product function category. In this case, a sentence related to a product function may include a command corresponding to a function performed by the voice recognition agent or by another device linked with the voice recognition agent.

For example, sentences such as "Can you tell me how many minutes are left in the washing machine's spin cycle?" and "Can you raise the air conditioner temperature to 24 degrees?" may be classified into the product function category.

When the user's voice data has the characteristic that the recognition rate for commands is low, the artificial intelligence server may extract a sentence from the product function category.

In addition, a word included in a sentence corresponding to the age category may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations depending on age.

In addition, a word included in a sentence corresponding to the gender category may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations depending on gender.

In addition, a word included in a sentence corresponding to the dialect category may be a word that has the same meaning and spelling but can be pronounced with various pronunciations or intonations depending on dialect.

In addition, a word included in a sentence corresponding to the loanword category may be a word that has the same meaning and spelling but, being a loanword, can be pronounced with various pronunciations or intonations.

Meanwhile, in addition to extracting characteristics from the voice data, the artificial intelligence server 200 may obtain the characteristics of the voice data based on personal information previously registered by the user.

For example, the user may register personal information such as country, gender, age, region, and dialect. When the user has registered personal information indicating that the country of origin is the United Kingdom, the artificial intelligence server may determine that additional learning is needed for the country category and obtain a sentence containing a word that allows British English and American English to be learned separately.

Meanwhile, the specific sentence obtained by the artificial intelligence server may include a command corresponding to a function of the voice recognition agent.

Here, the functions of the voice recognition agent may include not only functions provided by the voice recognition agent itself but also functions provided by a device that operates in conjunction with the voice recognition agent.

Because the specific sentence includes not only words from which the user's country, region, age, and the like can be learned but also a command that the user will actually utter, the artificial intelligence server can collect voice data corresponding to the command.

Meanwhile, the specific sentence obtained by the artificial intelligence server may include a wake-up word that calls the voice recognition agent.

The artificial intelligence server may improve the recognition rate for the wake-up word by separately extracting and learning only the wake-up word from the second voice data that the user utters in response to the specific sentence.

Meanwhile, the artificial intelligence server may transmit the obtained specific sentence to the voice recognition agent (S625).

Meanwhile, when additional learning is needed, the process of transmitting the specific sentence may proceed with steps S520 to S535 and S605 omitted.

Specifically, when voice data is received, the artificial intelligence server 200 may analyze the characteristics of the voice data and obtain the recognition rate of the voice data. When the recognition rate of the voice data is lower than a preset criterion, the artificial intelligence server 200 may search for a specific sentence corresponding to the characteristics of the voice data and transmit it to the voice recognition agent 100.

Meanwhile, the voice recognition agent 100 may output the received specific sentence (S630).

Specifically, as shown in FIG. 8, when the user has selected the first option of repeating a presented voice, the voice recognition agent may output the received specific sentence as a voice signal.

When the user has selected the second option of reading a presented sentence aloud, as shown in FIG. 9, the voice recognition agent may transmit the specific sentence to the user's mobile terminal 300.

In this case, the user's mobile terminal 300 may display text corresponding to the specific sentence.

Meanwhile, when the user utters the specific sentence, the voice recognition agent may receive second voice data corresponding to the uttered specific sentence (S635) and transmit the received second voice data to the artificial intelligence server 200 (S640).

Meanwhile, when the second voice data corresponding to the specific sentence is received, the artificial intelligence server 200 may learn the second voice data corresponding to the specific sentence (S645).

The artificial intelligence server may hold the voice data from before the second voice data is learned. When the second voice data is received, the artificial intelligence server may perform adaptation learning, using the voice data from before the second voice data is learned as source data and the second voice data as target data, so that the source data is adapted to the target data.

In addition, the artificial intelligence server may label the second voice data with the specific sentence and store it in the voice database 232. Here, the voice database 232 is a database personalized to a specific user and may be used to recognize that user's voice.

In this case, the voice recognition model may be updated to reflect the learning result. The artificial intelligence server may then transmit the voice recognition rate as changed by learning the additional data (S650).

Specifically, the artificial intelligence server may re-input the voice data received in step S510 into the updated voice recognition model, calculate the resulting recognition rate, and transmit it to the voice recognition agent.

Meanwhile, when the changed recognition rate is received, the voice recognition agent may output the changed recognition rate (S655).

For example, the voice recognition agent may output a message such as, "As a result of training my algorithm on the voice data you provided, the recognition rate has improved from 60% to 70%."

Meanwhile, an embodiment in which the user has selected the third option of writing a sentence directly and reading it aloud is described with reference to FIG. 10.

When the user inputs specific text and third voice data corresponding to the specific text, the voice recognition agent may transmit the specific text and the third voice data corresponding to the specific text to the artificial intelligence server.

Specifically, at least one of the mobile terminal 300 and the voice recognition agent 100 may receive the user's text input and the voice data corresponding to the input text.

In this case, the voice recognition agent may transmit the received text and the voice data corresponding to the text to the artificial intelligence server.

In this case, the artificial intelligence server may learn the third voice data corresponding to the specific text.

Specifically, the artificial intelligence server may determine the words included in the text and the voice data corresponding to each word. The artificial intelligence server may then learn the voice data corresponding to each word.

FIG. 11 is a diagram for describing a method of requesting, from the user, additional data for additional learning according to another embodiment of the present invention.

The voice recognition agent 100 may receive voice data from the user (S1105).

In addition, the voice recognition agent 100 may transmit the received voice data to the artificial intelligence server (S1110).

Meanwhile, the artificial intelligence server 200 may receive the voice data and store the received voice data in a storage unit (S1115).

In addition, the artificial intelligence server 200 may input the voice data into the voice recognition model and output at least one of a voice recognition rate and a recognition result for the voice data (S1120).

Meanwhile, the artificial intelligence server 200 may transmit the obtained voice recognition rate to the voice recognition agent 100 (S1125).

Specifically, as shown in FIG. 12, when the voice recognition rate is lower than a preset criterion, the voice recognition agent 100 may transmit, to the mobile terminal 300, a request to input text corresponding to the previously received voice data (S1130).

Meanwhile, the mobile terminal 300 may receive, from the user, an input of text corresponding to the voice data previously uttered by the user (S1135) and transmit the received text to the voice recognition agent (S1135).

In this case, the voice recognition agent 100 may transmit the received text to the artificial intelligence server 200 (S1140).

Although the text transmitted from the mobile terminal 300 has been described as being transmitted to the artificial intelligence server through the voice recognition agent, the present invention is not limited thereto, and the mobile terminal 300 may transmit the text directly to the artificial intelligence server.

In this case, the artificial intelligence server may learn the previously stored voice data corresponding to the text (S1145).

Specifically, the artificial intelligence server may convert the received text into voice data using TTS (Text To Speech). The artificial intelligence server may then calculate a similarity by comparing a metric of the previously stored voice data with that of the converted voice data, and determine, based on the similarity, whether the previously stored voice data is valid data.

When the previously stored voice data is determined to be valid data, the artificial intelligence server may label that voice data with the text and store it in the voice database 232.
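The validity check can be sketched as a similarity test between two feature vectors. The disclosure does not name the metric, so cosine similarity is swapped in here purely as an illustrative choice; the feature vectors, the function names, and the 0.8 threshold are likewise assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no information to compare
    return dot / (norm_a * norm_b)


def is_valid_sample(stored_features: Sequence[float],
                    tts_features: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Treat the stored utterance as valid training data when it is similar
    enough to the speech synthesized from the text the user typed."""
    return cosine_similarity(stored_features, tts_features) >= threshold


# Hypothetical fixed-length acoustic features for the two signals.
stored = [0.9, 0.1, 0.4]
synthesized = [0.8, 0.2, 0.5]
print(is_valid_sample(stored, synthesized))
```

Only samples that pass the check would be labeled with the text and stored, which keeps mistyped text or unrelated audio out of the personalized database.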

Meanwhile, learning of the voice data may be implemented by first training the TTS and outputting a voice, and then training the voice recognition model when an acceptance is received from the user.

Specifically, the artificial intelligence server may train the TTS with the user's voice data. The artificial intelligence server may then use the trained TTS to generate voice data similar to the user's voice and transmit it. The voice recognition agent may then output the voice data generated by the TTS.

In this case, the user may judge whether the voice generated by the TTS is similar to his or her own voice and, if it is similar, input an acceptance.

In this case, the voice recognition agent may transmit the acceptance to the artificial intelligence server, and the artificial intelligence server may update the voice recognition model by learning the voice data learned by the TTS.

When a user who judges that the voice generated by the TTS is not similar to his or her own voice inputs a rejection, the voice recognition agent may again request, from the user, additional data for learning the user's voice.

Meanwhile, the request for text for additional learning may be made when voice recognition fails repeatedly.

For example, when recognition of the same word or sentence fails a preset number of times or more, or when the recognition rate is lower than a preset criterion a preset number of times or more, the voice recognition agent may ask the user to input text corresponding to the previously uttered voice data.
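The repeated-failure trigger can be sketched with a per-phrase counter. The class name `RepeatedFailureTracker`, the default limits, and the simplification of counting an outright failure and a below-criterion recognition rate in one shared counter are assumptions for illustration only.

```python
from collections import defaultdict
from typing import DefaultDict


class RepeatedFailureTracker:
    """Counts poor recognition attempts for each word or sentence."""

    def __init__(self, max_failures: int = 3, min_rate: float = 0.6) -> None:
        self.max_failures = max_failures  # assumed "preset number of times"
        self.min_rate = min_rate          # assumed "preset criterion"
        self._failures: DefaultDict[str, int] = defaultdict(int)

    def record(self, phrase: str, recognized: bool, rate: float) -> bool:
        """Record one attempt and return True when the user should be
        asked to type the text of what was said."""
        if not recognized or rate < self.min_rate:
            self._failures[phrase] += 1
        return self._failures[phrase] >= self.max_failures


tracker = RepeatedFailureTracker(max_failures=2, min_rate=0.6)
first = tracker.record("play quiet music", recognized=False, rate=0.0)
second = tracker.record("play quiet music", recognized=True, rate=0.4)
print(first, second)
```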

As another example, the voice recognition agent may first perform learning in the manner described with reference to FIG. 6, presenting a specific sentence to the user and asking the user to read it aloud, and may request text for additional learning from the user only when the user's voice is still not recognized.

FIG. 13 is a diagram for describing the operation of a voice recognition system according to an embodiment of the present invention.

The voice recognition system may receive user information from the user and register the received user information (S1310).

Specifically, the voice recognition agent may receive the user information and transmit it to the server, and the server may store the received user information.

Here, the user information may include at least one of country, region, intonation, age, and gender.

Meanwhile, the voice recognition system may receive the user's voice data, recognize the voice data, and perform a function corresponding to the voice recognition result (S1320, S1330).

Meanwhile, the voice recognition system may determine whether the user will participate in additional learning and which learning option to use (S1340).

Specifically, the voice recognition agent may output an inquiry about additional learning and provide a plurality of options for the additional learning method.

When an input accepting the additional learning and selecting a specific option is received from the user, the voice recognition system may register the selected option. When additional learning is needed later, the voice recognition system may proceed with the additional learning using the registered option.

Meanwhile, because the option that yields better learning may differ from user to user, the voice recognition agent may first perform learning with all of the plurality of options and then register the option that yields the highest voice recognition rate after learning.

For example, when, among the first option of repeating a presented voice, the second option of reading a presented sentence aloud, and the third option of writing a sentence directly and reading it aloud, the second option yields the highest recognition rate, the voice recognition system may request additional data from the user using the second option.
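Registering the best-performing option reduces to taking the maximum over the recognition rates measured after learning with each option. The sketch below assumes those rates are already available as numbers; the option labels and values are hypothetical.

```python
from typing import Dict


def best_option(rate_by_option: Dict[str, float]) -> str:
    """Return the learning option with the highest post-learning rate."""
    if not rate_by_option:
        raise ValueError("at least one option is required")
    return max(rate_by_option, key=rate_by_option.get)


# Hypothetical recognition rates measured after learning with each option.
rates = {
    "first option (repeat a presented voice)": 0.62,
    "second option (read a presented sentence)": 0.71,
    "third option (write and read a sentence)": 0.66,
}
registered = best_option(rates)
print(registered)
```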

Meanwhile, the voice recognition rate required to perform a specific function may differ depending on what the function is.

For example, a home voice-based service with commands such as "turn on" and "turn off" may perform the function corresponding to the user's command as long as the voice recognition rate is 55% or higher.

As another example, for a command to check the user's personal messages, the function corresponding to the command may be performed only when the voice recognition rate is 65% or higher.

As another example, for a command for payment or authentication, the function corresponding to the command may be performed only when the voice recognition rate is 75% or higher.
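The function-dependent criterion can be sketched as a lookup table. The 55%, 65%, and 75% values come from the examples above; the key names and the function `may_execute` are hypothetical.

```python
from typing import Dict

# Minimum recognition rate per kind of function, from the examples above.
REQUIRED_RATE: Dict[str, float] = {
    "home_control": 0.55,         # e.g. "turn on", "turn off"
    "personal_message": 0.65,     # checking the user's personal messages
    "payment_or_auth": 0.75,      # payment or authentication
}


def may_execute(function_type: str, recognition_rate: float) -> bool:
    """Allow the function only when the rate meets its own criterion."""
    return recognition_rate >= REQUIRED_RATE[function_type]


print(may_execute("home_control", 0.60))
print(may_execute("payment_or_auth", 0.60))
```

With a recognition rate of 60%, the same speaker could switch an appliance on but could not authorize a payment.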

Although the present invention has been described above as being implemented by a voice recognition agent, an artificial intelligence server, and a mobile terminal, it is not limited thereto.

For example, all of the components and functions of the artificial intelligence server described above may be mounted on and performed by the voice recognition agent.

As described above, unlike the conventional approach of passively collecting and learning the user's voice, the present invention requests voice input by presenting a sentence that best captures the user's speech habits, or directly requests the text of a sentence the user has uttered. Therefore, according to the present invention, learning performance can be greatly improved and rapid personalization is possible.

Meanwhile, the controller 180 is the component that generally controls the device, and the term may be used interchangeably with terms such as central processing unit, microprocessor, and processor.

The present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of the computer-readable medium include a hard disk drive (HDD), a solid state disk (SSD), a silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, a floppy disk, and an optical data storage device, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). In addition, the computer may include the controller 180 of the terminal. Accordingly, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

10: voice recognition system 100: voice recognition agent
200: artificial intelligence server 300: mobile terminal

Claims

A voice recognition system comprising:
a voice recognition agent configured to receive voice data from a user and transmit the voice data to an artificial intelligence server; and
the artificial intelligence server configured to input the voice data into a voice recognition model, transmit a recognition result for the voice data to the voice recognition agent, and learn the voice data,
wherein the voice recognition agent,
when a voice recognition rate for the voice data is lower than a preset criterion, requests from the user additional data for learning the voice of the user.

The voice recognition system of claim 1,
wherein the voice recognition agent
provides a specific sentence to the user and, when second voice data corresponding to the specific sentence is received, transmits the second voice data to the artificial intelligence server, and
wherein the artificial intelligence server
learns the second voice data corresponding to the specific sentence.

The voice recognition system of claim 2,
wherein the artificial intelligence server,
based on a characteristic of the voice data, transmits to the voice recognition agent the specific sentence that corresponds to the characteristic of the voice data from among a plurality of sentences.

The speech recognition system of claim 3, wherein
the plurality of sentences are classified into a plurality of categories including at least one of product features, country, region, age, dialect, gender, and foreign language, and
the artificial intelligence server transmits to the speech recognition agent, based on the characteristic of the voice data, the specific sentence included in a category, among the plurality of categories, in which the user requires additional learning.
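
As a non-limiting illustration, the category-based sentence selection might be sketched as follows. The claim does not say how the server decides which category requires additional learning; this sketch assumes, purely for the example, that the server keeps a per-category recognition rate for the user and picks the category with the lowest rate. The function name `pick_sentence` and both dictionaries are hypothetical.

```python
from typing import Dict, List


def pick_sentence(sentences_by_category: Dict[str, List[str]],
                  rate_by_category: Dict[str, float]) -> str:
    """Return a sentence from the category with the lowest recognition rate.

    Assumption: the lowest per-category rate marks the category that
    requires additional learning for this user.
    """
    # Only consider categories that have both sentences and a known rate.
    candidates = [
        category
        for category, sentences in sentences_by_category.items()
        if sentences and category in rate_by_category
    ]
    if not candidates:
        raise ValueError("no category with both sentences and a known rate")
    weakest = min(candidates, key=lambda category: rate_by_category[category])
    # For simplicity, take the first sentence stored for that category.
    return sentences_by_category[weakest][0]
```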

The speech recognition system of claim 3, wherein
the specific sentence includes a command corresponding to a function of the speech recognition agent.

The speech recognition system of claim 2,
further comprising a mobile terminal, wherein
the speech recognition agent transmits the specific sentence to the mobile terminal of the user, and
the mobile terminal displays text corresponding to the specific sentence.

The speech recognition system of claim 1, wherein
the speech recognition agent requests the user to input text corresponding to the voice data when the speech recognition rate is lower than the preset criterion.

The speech recognition system of claim 7, wherein
the artificial intelligence server stores the voice data,
the speech recognition agent, when text corresponding to the voice data is input, transmits the text corresponding to the voice data to the artificial intelligence server, and
the artificial intelligence server learns the stored voice data corresponding to the text.

The speech recognition system of claim 8, wherein
the artificial intelligence server converts the text into voice data, determines the stored voice data to be valid data based on a similarity between the converted voice data and the stored voice data, and learns the voice data determined to be valid data.
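
As a non-limiting illustration, the validity check might be sketched as follows. The claim does not name a similarity measure or a threshold; this sketch assumes that both the stored voice data and the voice data converted from the text have already been reduced to fixed-length feature vectors, and it uses cosine similarity with a hypothetical threshold of `0.7`. The text-to-speech conversion itself is outside the sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_valid_training_data(stored_features: Sequence[float],
                           converted_features: Sequence[float],
                           min_similarity: float = 0.7) -> bool:
    """Accept the stored voice data only if it resembles the converted speech.

    min_similarity is an assumed threshold, not a value from the source.
    """
    return cosine_similarity(stored_features, converted_features) >= min_similarity
```

A check of this kind keeps mistyped or unrelated text from being paired with the stored voice data before learning.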

The speech recognition system of claim 8,
further comprising a mobile terminal that receives a text input corresponding to the voice data and transmits the text corresponding to the voice data to the speech recognition agent.

The speech recognition system of claim 1, wherein
the speech recognition agent, when the user inputs specific text and third voice data corresponding to the specific text, transmits the specific text and the third voice data corresponding to the specific text to the artificial intelligence server, and
the artificial intelligence server learns the third voice data corresponding to the specific text.

The speech recognition system of claim 1, wherein
the speech recognition agent provides a first option of repeating a presented voice, a second option of reading aloud a presented sentence, and a third option of directly writing a sentence and reading it aloud, and
requests the additional data through the option having the highest speech recognition rate among the first to third options.
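
As a non-limiting illustration, choosing among the three options might be sketched as follows. The enum `Option`, the function `best_option`, and the way the per-option recognition rates are obtained are assumptions made for this example; the claim states only that the option with the highest speech recognition rate is used.

```python
from enum import Enum
from typing import Dict


class Option(Enum):
    REPEAT_VOICE = 1      # first option: repeat a presented voice
    READ_SENTENCE = 2     # second option: read aloud a presented sentence
    WRITE_AND_READ = 3    # third option: write a sentence and read it aloud


def best_option(rate_by_option: Dict[Option, float]) -> Option:
    """Return the option with the highest recognition rate.

    On a tie, the option that appears first in the dictionary wins.
    """
    if not rate_by_option:
        raise ValueError("at least one option with a rate is required")
    return max(rate_by_option, key=lambda option: rate_by_option[option])
```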

The speech recognition system of claim 1, wherein
the artificial intelligence server learns the additional data and transmits to the speech recognition agent a speech recognition rate changed as a result of learning the additional data.

A speech recognition device comprising:
an input unit that receives voice data from a user; and
an artificial intelligence unit that inputs the voice data into a speech recognition model, obtains a recognition result for the voice data, and learns the voice data,
wherein the artificial intelligence unit requests, from the user, additional data for learning the voice of the user when a speech recognition rate for the voice data is lower than a preset criterion.