KR20150116389A

KR20150116389A - Speech recognition using Electronic Device and Server

Info

Publication number: KR20150116389A
Application number: KR1020150038857A
Authority: KR
Inventors: 정석영; 김경태
Original assignee: 삼성전자주식회사
Priority date: 2014-04-07
Filing date: 2015-03-20
Publication date: 2015-10-15
Also published as: KR102414173B1

Abstract

An electronic device is provided. The electronic device includes a processor configured to perform automatic speech recognition (ASR) on a speech input by using a speech recognition model that is stored in a memory, and a communications module configured to provide the speech input to a server and receive a speech instruction, which corresponds to the speech input, from the server. The electronic device may perform different operations according to a confidence score of a result of the ASR. Various other embodiments that can be understood from the specification are also possible.

Description

Speech recognition using an electronic device and a server

Various embodiments of the present invention relate to technology for recognizing a user's speech input and executing a voice command by using a speech recognition model installed on an electronic device and a speech recognition model available on a server.

In addition to traditional input methods using a keyboard or a mouse, recent electronic devices can support input using a user's speech. For example, an electronic device such as a smartphone or a tablet may analyze a user's speech that is input while a specific function (e.g., S-Voice or Siri) is running, and convert the speech to text or perform an operation corresponding to the speech. In addition, some electronic devices keep the speech recognition function always on, so that they can at any time wake up, be unlocked, or perform functions such as Internet search, calling, and reading SMS or e-mail in response to the user's speech.

Although various research and techniques related to speech recognition are known, the ways in which an electronic device can perform speech recognition are inevitably limited. For example, an electronic device may use a speech recognition model embedded in the device itself to respond quickly to speech input. However, the storage space and processing capability of an electronic device are limited, and the number and types of speech inputs it can recognize are limited accordingly.

To obtain a more accurate and reliable result for a speech input, an electronic device may transmit the speech input to a server to request speech recognition, and then provide the result returned from the server or perform a specific operation based on that result. However, this approach increases the communication usage of the electronic device and results in a relatively slow response.

Various embodiments disclosed in this document can provide a method of performing speech recognition that uses two or more different speech recognition capabilities or speech recognition models to reduce the inefficiencies that can arise in the situations described above and to give the user a fast response and high accuracy.

An electronic device according to various embodiments of the present invention may include a processor that performs automatic speech recognition (ASR) on a speech input by using a speech recognition model stored in a memory, and a communication module that provides the speech input to a server and receives, from the server, a voice command corresponding to the speech input. The processor may (1) perform an operation corresponding to the ASR result when the confidence of the ASR result is greater than or equal to a first threshold, and (2) provide feedback on the confidence when the confidence of the ASR result is less than a second threshold.
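
The two conditions above can be sketched as a small decision function. This is a minimal illustration, not the claimed implementation: the names `Decision` and `decide` are hypothetical, the threshold values in the usage note are assumptions, and the passage does not say what happens when the confidence falls between the two thresholds, so that case is returned as an explicit "not specified" value.

```python
from enum import Enum


class Decision(Enum):
    # Case (1): confidence >= first threshold.
    PERFORM_LOCAL_RESULT = "perform the operation for the on-device ASR result"
    # Case (2): confidence < second threshold.
    PROVIDE_FEEDBACK = "provide feedback on the confidence"
    # Neither case applies; this passage does not describe the behavior.
    NOT_SPECIFIED_HERE = "not described in this passage"


def decide(confidence: float, first_threshold: float, second_threshold: float) -> Decision:
    """Map an on-device ASR confidence to one of the cases in the summary.

    Case (1) is checked first, so it takes precedence if both could apply.
    """
    if confidence >= first_threshold:
        return Decision.PERFORM_LOCAL_RESULT
    if confidence < second_threshold:
        return Decision.PROVIDE_FEEDBACK
    return Decision.NOT_SPECIFIED_HERE
```

With an assumed first threshold of 0.80 and an assumed second threshold of 0.30, `decide(0.95, 0.80, 0.30)` selects case (1) and `decide(0.10, 0.80, 0.30)` selects case (2).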

According to various embodiments of the present invention, speech recognition is performed using a speech recognition model embedded in the electronic device itself, and, depending on that result, the result of speech recognition through a server is used as a supplement, so that a speech recognition function with both a fast response and high accuracy can be provided.

In addition, the results of speech recognition by the electronic device and by the server can be compared, and the comparison can be reflected in the speech recognition model or the speech recognition algorithm. Accordingly, accuracy and response speed can improve continuously as speech recognition is performed repeatedly.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an electronic device and a server connected to the electronic device through a network, according to an embodiment of the present invention.
FIG. 2 shows an electronic device and a server according to another embodiment of the present invention.
FIG. 3 is a flowchart of a method of performing speech recognition according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method of performing speech recognition according to another embodiment of the present invention.
FIG. 5 is a flowchart of a method of updating a threshold according to an embodiment of the present invention.
FIG. 6 is a flowchart of a method of updating a speech recognition model according to an embodiment of the present invention.
FIG. 7 shows an electronic device in a network environment according to an embodiment of the present invention.
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present invention.

Hereinafter, various embodiments of the present invention are described with reference to the accompanying drawings. However, this is not intended to limit the present invention to particular embodiments, and the description should be understood to cover various modifications, equivalents, and/or alternatives of the embodiments of the present invention. In the description of the drawings, similar reference numerals may be used for similar components.

In this document, expressions such as "have," "may have," "include," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, a function, an operation, or a component such as a part) and do not exclude the presence of additional features.

In this document, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the cases of (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "first," "second," "1st," or "2nd" used in various embodiments may modify various components regardless of order and/or importance, and do not limit those components. For example, a first user device and a second user device may indicate different user devices regardless of order or importance. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be renamed a first component.

When a component (e.g., a first component) is referred to as being "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component). In contrast, when a component (e.g., a first component) is referred to as being "directly coupled" or "directly connected" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between them.

The expression "configured to" (or "set to") used in this document may be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the situation. The term "configured to" (or "set to") does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "a device configured to" may mean that the device "can" perform an operation together with other devices or parts. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a CPU or an application processor) that can perform the corresponding operations by executing one or more software programs stored in a memory device.

The terms used in this document are used only to describe particular embodiments and may not be intended to limit the scope of other embodiments. A singular expression may include the plural unless the context clearly indicates otherwise. All terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person of ordinary skill in the technical field of the present invention. Terms defined in commonly used dictionaries may be interpreted as having meanings identical or similar to their meanings in the context of the related art and, unless expressly defined in this document, are not interpreted in an idealized or excessively formal sense. In some cases, even terms defined in this document cannot be interpreted to exclude embodiments of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, an electronic device according to various embodiments is described with reference to the accompanying drawings. In this document, the term "user" may refer to a person who uses an electronic device or to a device (e.g., an artificial intelligence electronic device) that uses an electronic device.

FIG. 1 shows an electronic device and a server connected to the electronic device through a network, according to an embodiment of the present invention.

Referring to FIG. 1, the electronic device may include a configuration such as that of user terminal 100. For example, user terminal 100 may include a microphone 110, a controller 120, an ASR module 130, an ASR model 140, a transceiver 150, a speaker 170, and a display 180. The configuration of user terminal 100 shown in FIG. 1 is exemplary, and various modifications capable of implementing the embodiments disclosed in this document are possible. For example, the electronic device may include a configuration such as user terminal 101 shown in FIG. 2, electronic device 701 shown in FIG. 7, or electronic device 801 shown in FIG. 8, or may be suitably modified using these configurations. Hereinafter, various embodiments of the present invention are described with reference to user terminal 100.

User terminal 100 may obtain a speech input from a user through microphone 110. For example, when the user runs an application related to speech recognition, or when speech recognition is always active, the user's speech may be obtained through microphone 110. Microphone 110 may include an analog-to-digital converter (ADC) that converts an analog signal into a digital signal. In some embodiments, however, the ADC, a digital-to-analog converter (DAC), and various signal processing or pre-processing circuits may be included in controller 120.

Controller 120 may provide the speech input obtained by microphone 110, or an audio signal (or speech signal) generated from the speech input, to ASR module 130 and transceiver 150. The audio signal that controller 120 provides to ASR module 130 may be a signal pre-processed for speech recognition. For example, the audio signal may be a signal to which noise filtering or an equalizer suited to the human voice has been applied. In contrast, the signal that controller 120 provides to transceiver 150 may be the speech input itself. By sending the original sound data to transceiver 150, unlike the signal sent to ASR module 130, controller 120 allows server 200 to apply appropriate, or better-performing, audio signal processing.
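
The routing described above, with a pre-processed signal going to the on-device ASR module and the original signal going to the transceiver, can be sketched as follows. All names here (`Controller`, `route`, `remove_dc_offset`) are hypothetical, and the DC-offset removal is only a placeholder standing in for the noise filtering or equalization mentioned in the text.

```python
from dataclasses import dataclass
from typing import Callable, List

Samples = List[float]


def remove_dc_offset(samples: Samples) -> Samples:
    """Placeholder pre-processing (a stand-in for noise filtering or an equalizer)."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


@dataclass
class Controller:
    preprocess: Callable[[Samples], Samples]
    to_local_asr: Callable[[Samples], None]    # receives the pre-processed signal
    to_transceiver: Callable[[Samples], None]  # receives the original signal

    def route(self, raw: Samples) -> None:
        # The server path gets an untouched copy so the server can apply its own processing.
        self.to_transceiver(list(raw))
        # The on-device path gets the signal pre-processed for recognition.
        self.to_local_asr(self.preprocess(raw))
```

Keeping the two sinks as plain callables makes the split easy to see: only the on-device path passes through `preprocess`.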

Controller 120 may control the general operation of user terminal 100. For example, controller 120 may control speech input from the user, control the speech recognition operation, and control the execution of functions according to speech recognition.

ASR module 130 may perform speech recognition on the audio signal provided from controller 120. ASR module 130 may perform isolated word recognition, connected word recognition, large vocabulary recognition, and the like on the speech input (audio signal). The ASR performed by ASR module 130 may be implemented in a speaker-independent or speaker-dependent manner. ASR module 130 need not be a single speech recognition engine and may consist of two or more speech recognition engines. When ASR module 130 includes a plurality of speech recognition engines, each engine may have a different recognition purpose. For example, one speech recognition engine may recognize wake-up speech for activating the ASR function, such as "Hi, Galaxy," and another speech recognition engine may recognize command speech, such as "read a recent e-mail." ASR module 130 performs speech recognition based on ASR model 140, and the range (e.g., types or number) of speech inputs that can be recognized may therefore be determined by ASR model 140. The above description of ASR module 130 may also apply to ASR module 230 of the server, described later.
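
The idea of two engines with different purposes, one for wake-up speech and one for command speech, can be sketched as a small two-stage recognizer. This is a simplification under stated assumptions: it operates on already-transcribed text rather than audio, the class name `TwoStageRecognizer` is hypothetical, and returning to the inactive state after one command is an assumption not taken from the text.

```python
from typing import Iterable, Optional


class TwoStageRecognizer:
    """Toy model: a wake-up stage gates a separate command stage."""

    def __init__(self, wake_phrase: str = "hi, galaxy",
                 commands: Iterable[str] = ("read a recent e-mail",)) -> None:
        self.wake_phrase = wake_phrase
        self.commands = set(commands)
        self.active = False  # True once the wake-up phrase has been heard

    def feed(self, utterance: str) -> Optional[str]:
        text = utterance.strip().lower()
        if not self.active:
            # Wake-up engine: only looks for the wake-up phrase.
            if text == self.wake_phrase:
                self.active = True
            return None
        # Command engine: only consulted after activation.
        self.active = False  # assumption: one command per activation
        return text if text in self.commands else None
```

Feeding "Hi, Galaxy" and then "read a recent e-mail" returns the command on the second call, while the same command without the wake-up phrase is ignored.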

ASR module 130 may convert the speech input into text. ASR module 130 may determine an operation or function to be performed by the electronic device for the speech input. ASR module 130 may also determine a confidence level (or confidence score) for the ASR result.

ASR model 140 may include a grammar. Here, the grammar may include, in addition to a linguistic grammar, various forms of grammar generated statistically (from data collected from user input or on the web). In various embodiments, ASR model 140 may include an acoustic model, a language model, and the like. Alternatively, ASR model 140 may be a speech recognition model used for isolated word recognition. In various embodiments, ASR model 140 may include a recognition model for performing speech recognition at a level appropriate to the computing and storage capabilities of user terminal 100. For example, the grammar may include a grammar for a designated command structure, independent of linguistic grammar. For example, "call [user name]" is a grammar for placing a call to the user named [user name] and may be included in ASR model 140.
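
A command-structure grammar such as "call [user name]" can be illustrated with a pattern that has one fixed keyword and one slot. The sketch below uses a regular expression over transcribed text as a stand-in; it does not represent how ASR model 140 actually stores its grammar, and `CALL_GRAMMAR` and `parse_call` are hypothetical names.

```python
import re
from typing import Dict, Optional

# "call [user name]": a fixed keyword followed by a free-form slot.
CALL_GRAMMAR = re.compile(r"^call\s+(?P<user_name>.+)$", re.IGNORECASE)


def parse_call(utterance: str) -> Optional[Dict[str, str]]:
    """Return the filled slot if the utterance fits the grammar, else None."""
    match = CALL_GRAMMAR.match(utterance.strip())
    if match is None:
        return None
    return {"action": "call", "user_name": match.group("user_name").strip()}
```

An utterance that fits the pattern yields the slot value, and anything else is rejected, which mirrors how a fixed grammar bounds the set of inputs a small on-device model can recognize.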

Transceiver 150 may transmit the speech signal provided from controller 120 to server 200 through network 10. It may also receive, from server 200, the result of speech recognition corresponding to the transmitted speech signal.

Speaker 170 and display 180 may be used to interact with user input. For example, when a speech input is provided by the user through microphone 110, the speech recognition result may be shown on display 180 and output through speaker 170. Speaker 170 and display 180 may of course also perform the general sound output and screen output functions of user terminal 100, respectively.

Server 200 may include a configuration for performing speech recognition on the speech input provided from user terminal 100 through network 10. Accordingly, some components of server 200 may correspond to those of user terminal 100. For example, server 200 may include a transceiver 210, a controller 220, an ASR module 230, an ASR model 240, and the like. Server 200 may further include components such as an ASR model converter 250 and an NLP (Natural Language Processing) 260.

Controller 220 may control the functional modules for performing speech recognition in server 200. For example, controller 220 may be connected to ASR module 230 and/or NLP 260. Controller 220 may also operate in conjunction with user terminal 100 to perform functions related to updating the recognition model. Controller 220 may also pre-process the speech signal transmitted through network 10 and provide it to ASR module 230. This pre-processing may use a different method, or have a different effect, from the pre-processing performed in user terminal 100. In some embodiments, controller 220 of server 200 may be referred to as an orchestrator.

ASR module 230 may perform speech recognition on the speech signal provided from controller 220. At least part of the description of ASR module 130 may apply to ASR module 230. However, although ASR module 230 for the server and ASR module 130 for the user terminal perform some similar functions, their functional scope or algorithms may differ. ASR module 230 performs speech recognition based on ASR model 240 and may therefore produce a result different from the speech recognition result of ASR module 130 of user terminal 100. Specifically, server 200 may generate a recognition result through speech recognition, natural language understanding (NLU), dialog management (DM), or a combination thereof, performed by ASR module 230 and NLP 260, whereas user terminal 100 may generate a recognition result through ASR module 130. For example, first operation information and a first confidence may be determined for the speech input as the result of ASR by ASR module 130, and second operation information and a second confidence may be determined as the result of speech recognition by ASR module 230. In some embodiments, the result of ASR module 130 and the result of ASR module 230 may match or may differ at least in part. For example, the first operation information and the second operation information may correspond to each other while the second confidence has a higher score than the first confidence. In various embodiments, the speech recognition (ASR) performed by ASR module 130 of user terminal 100 may be referred to as first speech recognition, and the speech recognition (ASR) performed by ASR module 230 of server 200 may be referred to as second speech recognition.
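
The pairing of operation information with a confidence for each recognizer can be modeled as a small record, which makes the "same operation, different confidence" case easy to state. This is an illustrative sketch only: `RecognitionResult`, `summarize`, and the string form of the operation information are assumptions, not structures defined by the document.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class RecognitionResult:
    operation_info: str  # what the device should do
    confidence: float    # score attached to that operation information


def summarize(first: RecognitionResult,
              second: RecognitionResult) -> Dict[str, Union[bool, str]]:
    """Compare the first (on-device) and second (server) recognition results."""
    if second.confidence > first.confidence:
        more_confident = "second"
    elif first.confidence > second.confidence:
        more_confident = "first"
    else:
        more_confident = "equal"
    return {
        "operations_match": first.operation_info == second.operation_info,
        "more_confident": more_confident,
    }
```

For instance, a first result of `("show_tomorrow_weather", 0.60)` and a second result of `("show_tomorrow_weather", 0.95)` correspond to the case in the text where the operation information matches but the second confidence is higher.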

In various embodiments, when the algorithm of the first speech recognition performed by ASR module 130 differs from the algorithm of the second speech recognition performed by ASR module 230, or when the models used for speech recognition differ, server 200 may include ASR model converter 250 for converting between the models.

Server 200 may also include NLP 260 for identifying the user's intent and determining the function to be performed, based on the result recognized by ASR module 230. NLP 260 may perform natural language understanding, which mechanically analyzes human speech and puts it into a form a computer can understand, or natural language processing that expresses such a form back in language a human can understand.

FIG. 2 shows an electronic device and a server according to another embodiment of the present invention.

FIG. 2 shows an example of an electronic device implemented differently from that of FIG. 1. However, the speech recognition method disclosed in this document may be performed not only by the electronic devices or user terminals of FIG. 1, FIG. 2, or FIGS. 7 and 8 described later, but also by various types of devices that can be derived from them.

Referring to FIG. 2, user terminal 101 may include a processor 121 and a memory 141. Processor 121 may include an ASR engine 131 for performing speech recognition. Memory 141 may store an ASR model 143 that ASR engine 131 uses to perform speech recognition. For example, in terms of the function each component performs, processor 121, ASR engine 131, and ASR model 143 (or memory 141) of FIG. 2 may be understood to correspond to controller 120, ASR module 130, and ASR model 140 of FIG. 1, respectively. Descriptions of corresponding or overlapping content are omitted below.

User terminal 101 may obtain a speech input from the user by using speech recognition module 111 (e.g., microphone 110). Processor 121 may perform ASR on the obtained speech input by using ASR model 143 stored in memory 141. User terminal 101 may also provide the speech input to server 200 through communication module 151 and receive, from server 200, a voice command (e.g., second operation information) corresponding to the speech input. User terminal 101 may output the speech recognition results obtainable from ASR engine 131 and server 200 by using display 181 (or a speaker).

Hereinafter, various speech recognition methods are described with reference to FIGS. 3 to 6, based on user terminal 100.

FIG. 3 is a flowchart of a method of performing speech recognition according to an embodiment of the present invention.

In operation 301, user terminal 100 may obtain the user's speech input by using a speech acquisition module such as a microphone. This operation may be performed while a specific function or application associated with speech recognition has been run by the user. In some embodiments, however, speech recognition in user terminal 100 may be always on (e.g., the microphone is always active), in which case operation 301 may always be performed on the user's speech. Alternatively, as described above, ASR may be activated by a specific speech input (e.g., "Hi, Galaxy") through different speech recognition engines, and ASR may then be performed on the speech input that follows.

In operation 303, user terminal 100 may transmit the speech signal (or at least part of the speech signal) to server 200. Inside the device, the speech signal (or an audio signal obtained by converting the speech input into a (digital) speech signal and pre-processing that signal) may be provided to ASR module 130 by a processor (e.g., controller 120). In other words, in operation 303 user terminal 100 may provide the speech signal to be recognized to ASR modules located both inside and outside the device. User terminal 100 can thus use both its own speech recognition and speech recognition through server 200.
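
Providing the same speech signal to a recognizer inside the device and one outside it can be sketched with two concurrent tasks. The sketch makes several simplifications that are assumptions rather than statements from the text: both recognizers are plain callables, both receive the same bytes (the document notes the on-device path may receive a pre-processed signal), and the function waits for both results before returning. `recognize_both` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def recognize_both(speech_signal: bytes,
                   local_asr: Callable[[bytes], str],
                   server_asr: Callable[[bytes], str]) -> Tuple[str, str]:
    """Submit the signal to both recognizers and return (local, server) results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start the server request first so network latency overlaps local work.
        server_future = pool.submit(server_asr, speech_signal)
        local_future = pool.submit(local_asr, speech_signal)
        return local_future.result(), server_future.result()
```

A real device would more likely act on the on-device result as soon as it is available instead of blocking on the server; blocking on both keeps the sketch short.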

In operation 305, the user terminal 100 may perform its own speech recognition, referred to as ASR1. For example, the ASR module 130 may perform speech recognition on the speech input using the ASR model 140, and ASR1 may be performed on at least part of the speech signal. ASR1 yields a recognition result for the speech input. For example, when the user provides a speech input such as "tomorrow weather", the user terminal 100 may use its speech recognition function to determine operation information such as "run the weather application, output tomorrow's weather". In addition to the operation information, the recognition result may include a confidence score for that operation information. For example, the ASR module 130 may assign a confidence score of 95% when analysis of the utterance clearly indicates "tomorrow weather", but a confidence score of 60% when it is unclear whether the utterance was "daily weather" or "tomorrow weather".

In operation 307, the processor may determine whether the confidence score is greater than or equal to a specified threshold. For example, when the confidence score of the operation information determined by the ASR module 130 is at or above a specified level (e.g., 80%), the user terminal 100 may, in operation 309, perform the operation corresponding to the voice command recognized by ASR1, that is, by the speech recognition function of the user terminal 100 itself. The operation may include, for example, at least one of a function executable by the processor, an application, or an input based on the ASR result.

Operation 309 may be performed before the speech recognition result is obtained from the server 200 (e.g., operation 315). In other words, when the user terminal 100 recognizes a voice command with sufficient confidence on its own, it performs the corresponding operation immediately without waiting for the additional recognition result from the server 200, which gives a fast response to the user's speech input.

When the confidence score is below the threshold in operation 307, the user terminal 100 may wait until the speech recognition result is obtained from the server 200 in operation 315. While waiting, the user terminal 100 may display a suitable message, icon, or image to indicate that speech recognition of the input is in progress.

In operation 311, the server may perform speech recognition on the speech signal transmitted to the server 200 in operation 303. This speech recognition may be referred to as ASR2. In operation 313, natural language processing (NLP) may also be performed. For example, the NLP 260 of the server 200 may perform NLP on the speech input or on the recognition result of ASR2. In some embodiments, this step is optional.

When the speech recognition result produced by ASR2, or by ASR2 and NLP (e.g., second operation information and a second confidence score), is obtained from the server 200 in operation 315, the operation corresponding to the voice command recognized by ASR2 (e.g., the second operation information) may be performed in operation 317. Besides the recognition itself, operation 317 also requires time to transmit the speech signal in operation 303 and to obtain the recognition result in operation 315, so it may have a longer response time than operation 309. However, operation 317 makes it possible to act with relatively high confidence and accuracy even on speech that the device cannot process by itself, or can process only with low confidence.
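The flow of FIG. 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `AsrResult`, `recognize`, and the `local_asr` / `server_asr` callables are hypothetical names standing in for ASR1 on the user terminal 100 and ASR2 on the server 200, and the 0.80 default mirrors the 80% example given for operation 307.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class AsrResult:
    action: str        # operation information, e.g. "weather_app:tomorrow"
    confidence: float  # confidence score in the range 0.0 .. 1.0


def recognize(signal: bytes,
              local_asr: Callable[[bytes], AsrResult],
              server_asr: Callable[[bytes], AsrResult],
              threshold: float = 0.80) -> Tuple[str, AsrResult]:
    """Return (source, result), where source is "ASR1" or "ASR2"."""
    pool = ThreadPoolExecutor(max_workers=1)
    # Operation 303: hand the signal to the server without blocking.
    server_future = pool.submit(server_asr, signal)
    try:
        local = local_asr(signal)                # operation 305 (ASR1)
        if local.confidence >= threshold:        # operation 307
            return "ASR1", local                 # operation 309: act at once
        # Operations 315 and 317: wait for the server result and use it.
        return "ASR2", server_future.result()
    finally:
        # Do not block on the server when the local result was good enough.
        pool.shutdown(wait=False)
```

In this sketch the server request is always started, so a later comparison of the two results remains possible even when the local result is used.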

FIG. 4 is a flowchart of a method of performing speech recognition according to another embodiment of the present invention.

Referring to FIG. 4, the speech acquisition operation 401, the speech signal transmission operation 403, the ASR1 operation 405, the ASR2 operation 415, and the NLP operation 417 correspond to operations 301, 303, 305, 311, and 313 of FIG. 3, respectively, and their description is omitted.

The speech recognition method described with reference to FIG. 4 uses two thresholds: a first threshold, and a second threshold that corresponds to a lower confidence than the first. Different operations (e.g., operations 409, 413, and 421, respectively) may be performed depending on whether the confidence score of the ASR1 result of operation 405 is (1) at or above the first threshold, (2) below the second threshold, or (3) between the second and first thresholds.

When the confidence score is at or above the first threshold in operation 407, the user terminal 100 may perform the operation corresponding to the ASR1 result in operation 409. When the confidence score is below the first threshold in operation 407, operation 411 may determine whether the confidence score is below the second threshold.

When operation 411 finds the confidence score to be below the second threshold, the user terminal 100 may provide feedback about the confidence score (operation 413). This feedback may include a message or audio output indicating that the user's speech input was not recognized properly by the electronic device, or that it was recognized but the result cannot be trusted. For example, the user terminal 100 may display, or output through a speaker, a guidance message such as "Your voice was not recognized. Please say it again." Alternatively, the user terminal 100 may use feedback such as "Did you say XXX?" to elicit a speech input that is relatively easy to recognize (e.g., "yes", "no", "nope", "no way", "not at all") and thereby confirm whether the low-confidence recognition result is correct.

Once feedback has been provided in operation 413, operation 421 may not be performed even if a speech recognition result is later obtained in operation 419. The feedback may prompt a new speech input from the user, in which case acting on the earlier speech input would not be appropriate. In some embodiments, however, operation 421 may be performed after operation 413 when the user provides no further input for a certain period despite the feedback and the speech recognition result received from the server 200 in operation 419 (e.g., the second operation information and the second confidence score) satisfies a specified condition (e.g., the second confidence score is at or above the first threshold or an arbitrary third threshold).
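The exception described above, in which a late server result is still acted on after feedback, reduces to a small predicate. The sketch below is only an illustration: the function name, the parameter names, and the default of 80 for the required confidence are assumptions, and the source leaves both the waiting period and the third threshold unspecified.

```python
def should_run_late_server_action(new_input_received: bool,
                                  second_confidence: int,
                                  required_confidence: int = 80) -> bool:
    """Decide whether operation 421 may still run after the feedback of 413.

    new_input_received: True if the user spoke again during the waiting period.
    second_confidence:  confidence (percent) of the ASR2 result from operation 419.
    required_confidence: first threshold or a separate third threshold (assumed 80).
    """
    if new_input_received:
        # A newer utterance supersedes the earlier speech input.
        return False
    return second_confidence >= required_confidence
```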

When operation 411 finds that the confidence score obtained in operation 405 is at or above the second threshold (in other words, the confidence score lies between the second threshold and the first threshold), the user terminal 100 may obtain the speech recognition result from the server 200 in operation 419. In operation 421, the user terminal 100 may perform the operation corresponding to the voice command recognized by ASR2 (the second operation information).

In the embodiment of FIG. 4, the confidence of the user terminal 100's recognition result is divided into three levels: usable, unusable, and usable with reference to the ASR result of the server 200, so that an appropriate operation is performed for each level. In particular, when the confidence is too low, the user terminal 100 provides feedback and prompts the user to speak again regardless of whether a result has been received from the server 200. This avoids showing the user a message such as "Not recognized" only after a long response wait.
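The three-way split of FIG. 4 can be written as a single decision function. The following Python sketch is illustrative only: `Decision` and `decide` are hypothetical names, the 80% first threshold follows the earlier example, and the 40% second threshold is an assumed value that the source does not specify.

```python
from enum import Enum


class Decision(Enum):
    RUN_LOCAL = "operation 409"                  # use the ASR1 result
    GIVE_FEEDBACK = "operation 413"              # ask the user to repeat or confirm
    WAIT_FOR_SERVER = "operations 419 and 421"   # use the ASR2 result


def decide(confidence: int,
           first_threshold: int = 80,
           second_threshold: int = 40) -> Decision:
    """Map an ASR1 confidence score (percent) to one of the three branches."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be lower than the first")
    if confidence >= first_threshold:    # operation 407
        return Decision.RUN_LOCAL
    if confidence < second_threshold:    # operation 411
        return Decision.GIVE_FEEDBACK
    return Decision.WAIT_FOR_SERVER
```

With these assumed defaults, a score of 95 runs the local result, 60 waits for the server, and 20 triggers feedback.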

FIG. 5 is a flowchart of a method of updating a threshold according to an embodiment of the present invention.

Referring to FIG. 5, the speech acquisition operation 501, the speech signal transmission operation 503, the ASR1 operation 505, the ASR2 operation 511, and the NLP operation 513 correspond to operations 301, 303, 305, 311, and 313 of FIG. 3, respectively, and their description is omitted.

When the confidence score of the ASR1 result is at or above the threshold (e.g., the first threshold) in operation 507, the process proceeds to operation 509, where the operation corresponding to the voice command recognized by ASR1 (e.g., first operation information) may be performed. When the confidence score of the ASR1 result is below the threshold in operation 507, the process from operation 315 of FIG. 3 onward, or from operation 411 of FIG. 4 onward, may be performed.

In the embodiment of FIG. 5, the process does not end after operation 509; operations 515 through 517 may still be performed. In operation 515, the user terminal 100 may obtain the speech recognition result from the server 200. For example, the user terminal 100 may obtain the second operation information and the second confidence score, which are the result of performing ASR2 on the speech signal transmitted in operation 503.

In operation 517, the user terminal 100 may compare the recognition results of ASR1 and ASR2. For example, the user terminal may determine whether the two results are identical (or correspond to each other) or differ. For example, if ASR1 recognizes the speech as "tomorrow weather" and ASR2 recognizes it as "What is the weather tomorrow?", the operation information in both cases may include "run the weather application, output tomorrow's weather". In such a case the recognition results of ASR1 and ASR2 can be understood to correspond to each other. When the recognition results would lead to different operations, however, the two (or more) results may be judged not to correspond.

In operation 519, the user terminal 100 may change the threshold by comparing its own ASR result with the voice command received from the server. For example, when the first operation information and the second operation information contain identical or corresponding voice commands, the user terminal 100 may lower the first threshold. Suppose that, for a given speech input, a confidence score of at least 80% was previously required before the user terminal 100 would use its own recognition result without waiting for the server 200 to respond. After the threshold update, the user terminal 100 may use its own recognition result even when the confidence score is only 70% or higher. The threshold update may be repeated each time the user uses the speech recognition function. As a result, a lower threshold is set for speech inputs the user makes frequently, which shortens the response time.

When the ASR1 result and the ASR2 result differ, however, the threshold may be raised. In some embodiments, the threshold update may take place only after a specified condition has been met a specified number of times. For example, the threshold may be updated (lowered) once the ASR1 result and the ASR2 result for a given speech input have matched five or more times.
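The threshold update of FIG. 5 might be sketched as follows. This is a hedged illustration, not the patented procedure: the class name `ThresholdUpdater`, the floor and ceiling bounds, raising the threshold immediately on a single mismatch, and resetting the match counter on a mismatch are all assumptions. The 80% start, the 10-point step, and the five-match rule follow the examples in the text.

```python
class ThresholdUpdater:
    """Adjust the first threshold (in percent) from ASR1/ASR2 agreement."""

    def __init__(self, threshold: int = 80, step: int = 10,
                 required_matches: int = 5,
                 floor: int = 50, ceiling: int = 95) -> None:
        self.threshold = threshold
        self.step = step
        self.required_matches = required_matches
        self.floor = floor        # assumed lower bound, not from the source
        self.ceiling = ceiling    # assumed upper bound, not from the source
        self.matches = 0

    def observe(self, first_action: str, second_action: str) -> int:
        """Record one comparison (operation 517) and update (operation 519)."""
        if first_action == second_action:
            self.matches += 1
            if self.matches >= self.required_matches:
                # Enough agreement accumulated: trust the local ASR more.
                self.threshold = max(self.floor, self.threshold - self.step)
                self.matches = 0
        else:
            # Disagreement: require more confidence before acting locally.
            self.matches = 0
            self.threshold = min(self.ceiling, self.threshold + self.step)
        return self.threshold
```

A per-command updater (one instance per frequently used voice command) would match the text's remark that frequently used inputs end up with lower thresholds, though the source does not say how thresholds are keyed.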

FIG. 6 is a flowchart of a method of updating a speech recognition model according to an embodiment of the present invention.

Referring to FIG. 6, the speech acquisition operation 601, the speech signal transmission operation 603, the ASR1 operation 605, the ASR2 operation 611, and the NLP operation 613 correspond to operations 301, 303, 305, 311, and 313 of FIG. 3, respectively, and their description is omitted.

When the confidence score of the ASR1 result is at or above the threshold (e.g., the first threshold) in operation 607, operation 309 of FIG. 3, operation 409 of FIG. 4, or operation 509 of FIG. 5 and the operations that follow it may be performed.

When the confidence score of the ASR1 result is below the threshold in operation 607, the user terminal 100 may obtain the speech recognition result from the server 200 in operation 609 and, in operation 615, perform the operation corresponding to the voice command recognized by ASR2. Operations 609 and 615 may correspond to operations 315 and 317 of FIG. 3, or to operations 419 and 421 of FIG. 4.

In operation 617, the user terminal 100 may compare the speech recognition results of ASR1 and ASR2. Operation 617 may correspond to operation 517 of FIG. 5.

In operation 619, the user terminal 100 may update its speech recognition model (e.g., the ASR model 140) based on the comparison result of operation 617. For example, the user terminal 100 may add the ASR2 recognition result for the speech input (e.g., the second operation information, or the second operation information and the second confidence score) to the speech recognition model. For example, when the first operation information and the second operation information do not correspond to each other, the user terminal 100 may add the second operation information (and the second confidence score) to the speech recognition model used for the first speech recognition, based on the first and second confidence scores (e.g., when the second confidence score is higher than the first). As in the embodiment of FIG. 5, the model update may take place only after a specified condition has been met a specified number of times.
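The update rule of operation 619 can be illustrated with a deliberately simplified model. In the Python sketch below, the on-device model is represented as a plain dictionary keyed by an utterance identifier. That representation, the name `maybe_update_model`, and the `utterance_key` parameter are assumptions made purely for illustration, since the source does not describe the internal form of the ASR model 140.

```python
from typing import Dict, NamedTuple


class AsrResult(NamedTuple):
    action: str      # operation information
    confidence: int  # confidence score in percent


def maybe_update_model(model: Dict[str, AsrResult],
                       utterance_key: str,
                       first: AsrResult,
                       second: AsrResult) -> bool:
    """Add the ASR2 result to the local model when it disagrees with ASR1
    and carries the higher confidence. Returns True if the model changed."""
    if first.action == second.action:
        return False                      # results correspond: nothing to learn
    if second.confidence <= first.confidence:
        return False                      # server result is not more trustworthy
    model[utterance_key] = second         # store second operation info and score
    return True
```

The accumulation condition mentioned above (updating only after a specified number of occurrences) is omitted here for brevity.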

FIG. 7 illustrates an electronic device in a network environment according to an embodiment of the present invention.

Referring to FIG. 7, an electronic device 701 in a network environment 700 according to various embodiments is described. The electronic device 701 may include a bus 710, a processor 720, a memory 730, an input/output interface 750, a display 760, and a communication interface 770. In some embodiments, the electronic device 701 may omit at least one of these components or may include additional components.

The bus 710 may include, for example, circuitry that connects the components 710-770 to one another and carries communication (e.g., control messages and/or data) between them.

The processor 720 may include one or more of a central processing unit (CPU), an application processor (AP), or a communication processor (CP). The processor 720 may, for example, perform computation or data processing related to the control and/or communication of at least one other component of the electronic device 701.

The memory 730 may include volatile and/or non-volatile memory. The memory 730 may store, for example, instructions or data related to at least one other component of the electronic device 701. According to one embodiment, the memory 730 may store software and/or a program 740. The program 740 may include, for example, a kernel 741, middleware 743, an application programming interface (API) 745, and/or an application program (or "application") 747. At least part of the kernel 741, the middleware 743, or the API 745 may be referred to as an operating system (OS).

The kernel 741 may, for example, control or manage the system resources (e.g., the bus 710, the processor 720, or the memory 730) used to execute operations or functions implemented in other programs (e.g., the middleware 743, the API 745, or the application program 747). The kernel 741 may also provide an interface through which the middleware 743, the API 745, or the application program 747 can access individual components of the electronic device 701 to control or manage system resources.

The middleware 743 may, for example, act as an intermediary so that the API 745 or the application program 747 can communicate with the kernel 741 and exchange data.

The middleware 743 may also process one or more task requests received from the application program 747 according to priority. For example, the middleware 743 may assign to at least one of the application programs 747 a priority for using the system resources of the electronic device 701 (e.g., the bus 710, the processor 720, or the memory 730). By processing the task requests according to the assigned priority, the middleware 743 may perform scheduling or load balancing of those requests.

The API 745 is, for example, an interface through which the application 747 controls functions provided by the kernel 741 or the middleware 743. It may include at least one interface or function (e.g., an instruction) for file control, window control, image processing, or character control.

The input/output interface 750 may serve, for example, as an interface that passes commands or data input from a user or another external device to the other component(s) of the electronic device 701. The input/output interface 750 may also output commands or data received from the other component(s) of the electronic device 701 to the user or another external device.

The display 760 may include, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a microelectromechanical systems (MEMS) display, or an electronic paper display. The display 760 may show various content (e.g., text, images, video, icons, or symbols) to the user. The display 760 may include a touch screen and may receive, for example, touch, gesture, proximity, or hovering input made with an electronic pen or a part of the user's body.

The communication interface 770 may, for example, establish communication between the electronic device 701 and an external device (e.g., a first external electronic device 702, a second external electronic device 704, or a server 706). For example, the communication interface 770 may be connected to the network 762 through wireless or wired communication to communicate with the external device (e.g., the second external electronic device 704 or the server 706).

The wireless communication may use, as a cellular communication protocol, at least one of LTE, LTE-A, CDMA, WCDMA, UMTS, WiBro, or GSM, for example. The wireless communication may also include, for example, short-range communication 764. The short-range communication 764 may include at least one of Wi-Fi, Bluetooth, near field communication (NFC), or global positioning system (GPS), for example. The wired communication may include at least one of universal serial bus (USB), high definition multimedia interface (HDMI), recommended standard 232 (RS-232), or plain old telephone service (POTS), for example. The network 762 may include at least one telecommunications network, for example a computer network (e.g., a LAN or WAN), the Internet, or a telephone network.

Each of the first and second external electronic devices 702 and 704 may be a device of the same type as the electronic device 701 or of a different type. According to one embodiment, the server 706 may include a group of one or more servers. According to various embodiments, all or some of the operations executed on the electronic device 701 may be executed on one or more other electronic devices (e.g., the electronic device 702 or 704, or the server 706). According to one embodiment, when the electronic device 701 needs to perform a function or service automatically or on request, it may, instead of or in addition to executing the function or service itself, request another device (e.g., the electronic device 702 or 704, or the server 706) to perform at least some of the associated functions. The other electronic device may execute the requested function or an additional function and deliver the result to the electronic device 701. The electronic device 701 may provide the requested function or service using the received result as is or after further processing. Cloud computing, distributed computing, or client-server computing technology may be used for this purpose, for example.

FIG. 8 is a block diagram of an electronic device according to an embodiment of the present invention.

Referring to FIG. 8, the electronic device 801 may include, for example, all or part of the electronic device 701 shown in FIG. 7. The electronic device 801 may include one or more processors (e.g., an application processor (AP)) 810, a communication module 820, a subscriber identification module 824, a memory 830, a sensor module 840, an input device 850, a display 860, an interface 870, an audio module 880, a camera module 891, a power management module 895, a battery 896, an indicator 897, and a motor 898.

The processor 810 may, for example, run an operating system or an application program to control a number of hardware or software components connected to the processor 810, and may perform various kinds of data processing and computation. The processor 810 may be implemented, for example, as a system on chip (SoC). According to one embodiment, the processor 810 may further include a graphics processing unit (GPU) and/or an image signal processor. The processor 810 may include at least some of the components shown in FIG. 8 (e.g., the cellular module 821). The processor 810 may load instructions or data received from at least one of the other components (e.g., non-volatile memory) into volatile memory for processing, and may store various data in non-volatile memory.

The communication module 820 may have a configuration identical or similar to that of the communication interface 770 of FIG. 7. The communication module 820 may include, for example, a cellular module 821, a Wi-Fi module 823, a Bluetooth module 825, a GPS module 827, an NFC module 828, and a radio frequency (RF) module 829.

The cellular module 821 may provide, for example, a voice call, a video call, a text messaging service, or an Internet service over a communication network. According to one embodiment, the cellular module 821 may identify and authenticate the electronic device 801 within the communication network by using the subscriber identification module (e.g., a SIM card) 824. According to one embodiment, the cellular module 821 may perform at least some of the functions that the processor 810 can provide. According to one embodiment, the cellular module 821 may include a communication processor (CP).

Each of the Wi-Fi module 823, the Bluetooth module 825, the GPS module 827, and the NFC module 828 may include, for example, a processor for processing data transmitted and received through the corresponding module. According to some embodiments, at least some (e.g., two or more) of the cellular module 821, the Wi-Fi module 823, the Bluetooth module 825, the GPS module 827, and the NFC module 828 may be included in a single integrated chip (IC) or IC package.

The RF module 829 may, for example, transmit and receive communication signals (e.g., RF signals). The RF module 829 may include, for example, a transceiver, a power amp module (PAM), a frequency filter, a low noise amplifier (LNA), or an antenna. According to another embodiment, at least one of the cellular module 821, the Wi-Fi module 823, the Bluetooth module 825, the GPS module 827, and the NFC module 828 may transmit and receive RF signals through a separate RF module.

The subscriber identification module 824 may include, for example, a card containing a subscriber identification module and/or an embedded SIM, and may contain unique identification information (e.g., an integrated circuit card identifier (ICCID)) or subscriber information (e.g., an international mobile subscriber identity (IMSI)).

The memory 830 (e.g., the memory 730) may include, for example, an internal memory 832 or an external memory 834. The internal memory 832 may include, for example, at least one of a volatile memory (e.g., dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM)) and a non-volatile memory (e.g., one time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, flash memory (e.g., NAND flash or NOR flash), a hard drive, or a solid state drive (SSD)).

The external memory 834 may further include a flash drive, for example, compact flash (CF), secure digital (SD), Micro-SD, Mini-SD, extreme digital (xD), MultiMediaCard (MMC), or a memory stick. The external memory 834 may be functionally and/or physically connected to the electronic device 801 through various interfaces.

The sensor module 840 may, for example, measure a physical quantity or sense an operating state of the electronic device 801, and convert the measured or sensed information into an electrical signal. The sensor module 840 may include, for example, at least one of a gesture sensor 840A, a gyro sensor 840B, an air pressure sensor 840C, a magnetic sensor 840D, an acceleration sensor 840E, a grip sensor 840F, a proximity sensor 840G, a color sensor 840H (e.g., an RGB sensor), a biometric sensor 840I, a temperature/humidity sensor 840J, an illuminance sensor 840K, and an ultraviolet (UV) sensor 840M. Additionally or alternatively, the sensor module 840 may include, for example, an olfactory (E-nose) sensor, an electromyography (EMG) sensor, an electroencephalogram (EEG) sensor, an electrocardiogram (ECG) sensor, an infrared (IR) sensor, an iris sensor, and/or a fingerprint sensor. The sensor module 840 may further include a control circuit for controlling the one or more sensors it contains. In some embodiments, the electronic device 801 may further include a processor configured to control the sensor module 840, either as part of the processor 810 or separately from it, so that the sensor module 840 can be controlled while the processor 810 is in a sleep state.

The input device 850 may include, for example, a touch panel 852, a (digital) pen sensor 854, a key 856, or an ultrasonic input device 858. The touch panel 852 may use, for example, at least one of a capacitive, resistive, infrared, or ultrasonic method. The touch panel 852 may further include a control circuit. The touch panel 852 may further include a tactile layer to provide a tactile response to the user.

The (digital) pen sensor 854 may be, for example, part of the touch panel or may include a separate recognition sheet. The key 856 may include, for example, a physical button, an optical key, or a keypad. The ultrasonic input device 858 may detect, through a microphone (e.g., the microphone 888), ultrasonic waves generated by an input tool and identify data corresponding to the detected ultrasonic waves.

The display 860 (e.g., the display 760) may include a panel 862, a hologram device 864, or a projector 866. The panel 862 may have a configuration identical or similar to that of the display 760 of FIG. 7. The panel 862 may be implemented to be, for example, flexible, transparent, or wearable. The panel 862 and the touch panel 852 may be configured as a single module. The hologram device 864 may show a stereoscopic image in the air by using the interference of light. The projector 866 may display an image by projecting light onto a screen. The screen may be located, for example, inside or outside the electronic device 801. According to one embodiment, the display 860 may further include a control circuit for controlling the panel 862, the hologram device 864, or the projector 866.

The interface 870 may include, for example, an HDMI 872, a USB 874, an optical interface 876, or a D-subminiature (D-sub) 878. The interface 870 may be included, for example, in the communication interface 770 shown in FIG. 7. Additionally or alternatively, the interface 870 may include, for example, a mobile high-definition link (MHL) interface, an SD card/MMC interface, or an Infrared Data Association (IrDA) standard interface.

The audio module 880 may, for example, convert between sound and electrical signals in both directions. At least some components of the audio module 880 may be included, for example, in the input/output interface 750 shown in FIG. 7. The audio module 880 may process sound information that is input or output through, for example, a speaker 882, a receiver 884, an earphone 886, or a microphone 888.

The camera module 891 is, for example, a device capable of capturing still images and video. According to one embodiment, it may include one or more image sensors (e.g., a front sensor or a rear sensor), a lens, an image signal processor (ISP), or a flash (e.g., an LED or a xenon lamp).

The power management module 895 may, for example, manage the power of the electronic device 801. According to one embodiment, the power management module 895 may include a power management integrated circuit (PMIC), a charger integrated circuit (charger IC), or a battery or fuel gauge. The PMIC may support wired and/or wireless charging. Wireless charging methods include, for example, magnetic resonance, magnetic induction, and electromagnetic wave methods, and may further involve additional circuits for wireless charging, such as a coil loop, a resonant circuit, or a rectifier. The battery gauge may measure, for example, the remaining capacity of the battery 896 and its voltage, current, or temperature during charging. The battery 896 may include, for example, a rechargeable battery and/or a solar battery.

The indicator 897 may display a particular state of the electronic device 801 or of a part of it (e.g., the processor 810), such as a booting state, a message state, or a charging state. The motor 898 may convert an electrical signal into mechanical vibration and may generate vibration or haptic effects. Although not shown, the electronic device 801 may include a processing device (e.g., a GPU) for supporting mobile TV. The processing device for supporting mobile TV may process media data conforming to standards such as digital multimedia broadcasting (DMB), digital video broadcasting (DVB), or MediaFlo™.

Each of the components described in this document may consist of one or more parts, and the name of a component may vary depending on the type of electronic device. In various embodiments, the electronic device may include at least one of the components described in this document; some components may be omitted, and additional components may be included. In addition, some of the components of the electronic device according to various embodiments may be combined into a single entity that performs the same functions as the corresponding components did before being combined.

As used in this document, the term "module" may refer to a unit that includes, for example, one of hardware, software, or firmware, or a combination of two or more of them. "Module" may be used interchangeably with terms such as unit, logic, logical block, component, or circuit. A "module" may be the minimum unit of an integrally constructed part, or a portion of it. A "module" may also be the minimum unit that performs one or more functions, or a portion of it. A "module" may be implemented mechanically or electronically. For example, a "module" may include at least one of an application-specific integrated circuit (ASIC) chip, field-programmable gate arrays (FPGAs), or a programmable-logic device, whether already known or yet to be developed, that performs certain operations.

At least part of a device (e.g., modules or their functions) or a method (e.g., operations) according to various embodiments may be implemented as instructions stored in a computer-readable storage medium, for example in the form of a program module.

For example, the storage medium may store instructions that, when executed, cause a processor of the electronic device to: obtain a speech input from a user and generate a speech signal; perform first speech recognition on at least part of the speech signal to obtain first operation information and a first confidence score; transmit at least part of the speech signal to a server for second speech recognition; receive, from the server, second operation information for the transmitted signal; and (1) perform a function corresponding to the first operation information when the first confidence score is greater than or equal to a first threshold, (2) provide feedback on the first confidence score when the first confidence score is less than a second threshold, and (3) perform a function corresponding to the second operation information when the first confidence score is between the first threshold and the second threshold.
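The three-way decision in the preceding paragraph can be illustrated with a minimal Python sketch. This is not the implementation of the embodiment: the names `LocalResult` and `choose_action`, the 0.0 to 1.0 confidence scale, the default threshold values, and the `"pending"` outcome used when no server reply is available yet are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LocalResult:
    """Outcome of the on-device (first) speech recognition. Hypothetical type."""
    operation: str     # first operation information, e.g. "open camera"
    confidence: float  # first confidence score; a 0.0-1.0 scale is assumed


def choose_action(
    local: LocalResult,
    server_operation: Optional[str],
    first_threshold: float = 0.8,   # illustrative value, not from the source
    second_threshold: float = 0.4,  # illustrative value, not from the source
) -> Tuple[str, Optional[str]]:
    """Return (kind, payload) following cases (1) to (3) of the text."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")

    # (1) Confidence at or above the first threshold: use the local result.
    if local.confidence >= first_threshold:
        return ("execute", local.operation)

    # (2) Confidence below the second threshold: give feedback to the user.
    if local.confidence < second_threshold:
        return ("feedback", "speech input was not recognized reliably")

    # (3) In between: use the second operation information from the server.
    if server_operation is None:
        # Assumption: the caller retries once the server has replied.
        return ("pending", None)
    return ("execute", server_operation)


if __name__ == "__main__":
    print(choose_action(LocalResult("call mom", 0.9), "call tom"))
    print(choose_action(LocalResult("call mom", 0.6), "call tom"))
    print(choose_action(LocalResult("call mom", 0.2), "call tom"))
```

In this sketch a confidence score exactly equal to the first threshold falls under case (1), matching the "greater than or equal to" wording, while a score exactly equal to the second threshold falls under case (3).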

A module or program module according to various embodiments may include at least one of the components described above, may omit some of them, or may further include additional components. Operations performed by a module, a program module, or another component according to various embodiments may be executed sequentially, in parallel, repeatedly, or heuristically. In addition, some operations may be executed in a different order or omitted, and other operations may be added.

The embodiments disclosed in this document are presented to explain the disclosed technical content and to aid understanding of it; they do not limit the scope of the present invention. Accordingly, the scope of this document should be construed to include all modifications and various other embodiments based on the technical idea of the present invention.

Claims

An electronic device comprising:
a processor configured to perform automatic speech recognition (ASR) on a speech input using a speech recognition model stored in a memory; and
a communication module configured to provide the speech input to a server and to receive, from the server, a voice command corresponding to the speech input,
wherein the processor is configured to (1) perform an operation corresponding to a result of the ASR when a confidence score of the result is greater than or equal to a first threshold, and (2) provide feedback on the confidence score when the confidence score is less than a second threshold.

The electronic device of claim 1,
wherein the processor is configured to (3) perform the voice command received from the server when the confidence score is between the first threshold and the second threshold.

The electronic device of claim 1,
wherein, when the confidence score is greater than or equal to the first threshold, the operation is performed regardless of whether the voice command has been received.

The electronic device of claim 3,
wherein the operation includes at least one of: at least one function executable by the processor, at least one application, or an input based on the result of the ASR.

The electronic device of claim 1,
wherein the feedback includes a message or audio output indicating that the speech input was not recognized or that the result of the ASR is unreliable.

The electronic device of claim 1,
wherein the voice command received from the server corresponds to a result of speech recognition that the server performs on the provided speech input, based on a speech recognition model different from the speech recognition model stored in the memory.

The electronic device of claim 6,
wherein the speech recognition performed at the server includes natural language processing (NLP).

The electronic device of claim 1,
wherein the processor is configured to provide an audio signal to the ASR engine performing the ASR and to provide the audio input itself to the server via the communication module.

The electronic device of claim 1,
wherein the processor is configured to, when the confidence score is greater than or equal to the first threshold, compare the result of the ASR with the voice command received from the server and change the first threshold based on a result of the comparison.

The electronic device of claim 9,
wherein the processor is configured to decrease the first threshold when the result of the ASR and the voice command received from the server correspond to each other, and to increase the first threshold when they do not correspond to each other.

The electronic device of claim 1,
wherein the processor is configured to, when the confidence score is less than the first threshold, compare the result of the ASR with the voice command received from the server and update the speech recognition model based on a result of the comparison.

The electronic device of claim 11,
wherein the communication module receives, from the server, a confidence score for the voice command together with the voice command, and
wherein the processor is configured to add the voice command and the confidence score for the voice command to the speech recognition model.

A method of performing speech recognition in an electronic device, the method comprising:
obtaining a speech input from a user and generating a speech signal;
performing first speech recognition on at least part of the speech signal to obtain first operation information and a first confidence score;
transmitting at least part of the speech signal to a server for second speech recognition;
receiving, from the server, second operation information for the transmitted signal; and
(1) performing a function corresponding to the first operation information when the first confidence score is greater than or equal to a first threshold, (2) providing feedback on the first confidence score when the first confidence score is less than a second threshold, and (3) performing a function corresponding to the second operation information when the first confidence score is between the first threshold and the second threshold.

The method of claim 13,
wherein performing the function corresponding to the first operation information when the first confidence score is greater than or equal to the first threshold is carried out before receiving the second operation information.

The method of claim 13,
further comprising increasing the first threshold when the function corresponding to the first operation information matches the function corresponding to the second operation information.

The method of claim 13,
further comprising decreasing the first threshold when the function corresponding to the first operation information does not match the function corresponding to the second operation information.

The method of claim 13,
wherein receiving the second operation information further comprises receiving a second confidence score together with the second operation information.

The method of claim 17,
further comprising, when the first confidence score is less than the first threshold, adding the second operation information and the second confidence score for the speech input to the speech recognition model used for the first speech recognition.

The method of claim 17,
further comprising, when the first operation information and the second operation information do not correspond to each other, adding the second operation information and the second confidence score to the speech recognition model used for the first speech recognition, based on the first confidence score and the second confidence score.

A storage medium storing computer-readable instructions that, when executed by a processor of an electronic device, cause the electronic device to perform:
obtaining a speech input from a user and generating a speech signal;
performing first speech recognition on at least part of the speech signal to obtain first operation information and a first confidence score;
transmitting at least part of the speech signal to a server for second speech recognition;
receiving, from the server, second operation information for the transmitted signal; and
(1) performing a function corresponding to the first operation information when the first confidence score is greater than or equal to a first threshold, (2) providing feedback on the first confidence score when the first confidence score is less than a second threshold, and (3) performing a function corresponding to the second operation information when the first confidence score is between the first threshold and the second threshold.