KR102342623B1

KR102342623B1 - Voice and connection platform

Info

Publication number: KR102342623B1
Application number: KR1020177011922A
Authority: KR
Inventors: 그레고리 레너드; 마티아스 어보
Original assignee: 엑스브레인, 인크.
Priority date: 2014-10-01
Filing date: 2015-09-30
Publication date: 2021-12-22
Also published as: US10789953B2; CN107004410B; US20160098992A1; CN107004410A; JP2017535823A; US20190180750A1; KR20170070094A; CA2962636A1; WO2016054230A1; US10235996B2; EP3201913A1; JP6671379B2; EP3201913A4

Abstract

A system and method for providing a voice assistant includes receiving, at a first device, a first audio input from a user requesting a first action; performing automatic speech recognition on the first audio input; obtaining a context of the user; performing natural language understanding based on the speech recognition of the first audio input; and taking the first action based on the context of the user and the natural language understanding.

Description

VOICE AND CONNECTION PLATFORM

Current voice assistants include Apple's Siri, Google's Google Now and Microsoft's Cortana. A first problem with these current systems is that they do not allow a user to interact with the personal assistant conversationally, as the user would with a human. A second problem is that the user is too often not understood or is misunderstood, or that the systems quickly default to a web search. A third problem is that they are not proactive in assisting their users. A fourth problem is that the applications these systems interact with are limited; for example, such voice assistants can interact with only a limited number of applications. A fifth problem is that these current systems do not utilize the user's context. A sixth problem is that these current systems do not integrate with other voice assistants.

In one embodiment, the voice and connection engine provides a voice assistant that remedies one or more of the aforementioned deficiencies of existing voice assistants. In one embodiment, the voice and connection engine uses an agnostic and modular approach to one or more of the automatic speech recognition, natural language understanding and text-to-speech components, thereby allowing frequent updates to those components as well as simplifying the adaptation of the system to different languages. In one embodiment, the voice and connection engine manages context in order to provide a more natural, human-like dialogue with the user, to increase the accuracy of the understanding of the user's requests, and to reduce the amount of time between receiving a request and executing on it. In one embodiment, the voice and connection engine provides a work around to obtain the user's intended request rather than immediately defaulting to a web search. In one embodiment, the voice and connection engine uses modules to interact with various applications of the user device (e.g., phone, unified messenger, news, media, weather, browser for web search, etc.), and modules may be individually added or modified over time as applications are added and updated. In one embodiment, the modules for interacting with the applications provide a level of standardization in user commands. For example, a user may use the verbal request "send a message" to send a message via Facebook, email or Twitter.
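The standardized command described above can be pictured with a small sketch. It assumes each application module registers a handler under one shared verbal command; the names `MODULES`, `register` and `send_message` are hypothetical illustrations and do not come from the patent.

```python
from typing import Callable, Dict

# Hypothetical registry: channel name -> handler for the shared
# "send a message" command. Not an API defined by the patent.
MODULES: Dict[str, Callable[[str, str], str]] = {}


def register(channel: str):
    """Decorator that adds an application module without touching the others."""
    def wrap(handler: Callable[[str, str], str]):
        MODULES[channel] = handler
        return handler
    return wrap


@register("email")
def send_email(recipient: str, text: str) -> str:
    # A real module would call the email application here.
    return f"email to {recipient}: {text}"


@register("twitter")
def send_tweet(recipient: str, text: str) -> str:
    # A real module would call the Twitter application here.
    return f"tweet to {recipient}: {text}"


def send_message(channel: str, recipient: str, text: str) -> str:
    """Dispatch the same verbal command to whichever module serves the channel."""
    if channel not in MODULES:
        raise KeyError(f"no module registered for {channel!r}")
    return MODULES[channel](recipient, text)
```

Because each module is registered independently, a new application could in principle be supported by adding one handler, which is one way to read the statement that modules may be individually added or modified.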

In one embodiment, the method includes receiving, at a first device, a first audio input from a user requesting a first action; performing automatic speech recognition on the first audio input; obtaining a context of the user; performing natural language understanding based on the speech recognition of the first audio input; and taking the first action based on the context of the user and the natural language understanding.
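The sequence of steps in this method can be sketched as a small pipeline. This is only an illustration under stated assumptions: the ASR and NLU stages are toy stand-ins (the "audio" is UTF-8 text and the first word is treated as the action), and the names `Context`, `recognize`, `understand` and `handle_request` are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical user context: a location plus a history of past intents."""
    location: str = "unknown"
    history: list = field(default_factory=list)


def recognize(audio: bytes) -> str:
    """Stand-in for an ASR engine: here the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8")


def understand(text: str) -> dict:
    """Toy NLU: the first word is the action, the remainder is the entity."""
    action, _, entity = text.strip().partition(" ")
    return {"action": action.lower(), "entity": entity}


def handle_request(audio: bytes, context: Context) -> str:
    """Receive audio, run ASR, run NLU, then act using the user's context."""
    text = recognize(audio)                # automatic speech recognition
    intent = understand(text)              # natural language understanding
    context.history.append(intent)         # keep the dialogue history current
    # "Taking the action" is reduced to describing it for this sketch.
    return f"{intent['action']} {intent['entity']} (near {context.location})"
```

A real system would replace `recognize` and `understand` with actual engines; the point of the sketch is only the ordering of the steps and the fact that the action depends on both the NLU result and the context.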

Other aspects include corresponding methods, systems, apparatuses, and computer program products for these and other innovative features. Each of these and other implementations may optionally include one or more of the following features. For instance, the operations further include the first audio input being received responsive to an internal event. For instance, the operations further include initiating the voice assistant without user input and receiving the first audio input from the user subsequent to the initiation of the voice assistant. For instance, the operations further include the context including one or more of a context history, a dialogue history, a user profile, a user history, a location and a current context domain. For instance, the operations further include, subsequent to taking the first action, receiving a second audio input from the user requesting a second action unrelated to the first action; taking the second action; receiving a third audio input from the user requesting a third action related to the first action, the third audio input missing information used to take the third action; obtaining the missing information using the context; and taking the third action. For instance, the operations further include the missing information being one or more of an action, an actor and an entity. For instance, the operations further include receiving, at a second device, a second audio input from the user requesting a second action related to the first action, the second audio input missing information used to take the second action; obtaining the missing information using the context; and taking the second action based on the context. For instance, the operations further include determining that the context and the first audio input are missing information used to take the first action; determining what information is the missing information; and prompting the user to provide a second audio input supplying the missing information. For instance, the operations further include determining that information used to take the first action is unable to be obtained from the first audio input; determining what information is the missing information; and prompting the user to provide a second audio input supplying the information unable to be obtained from the first audio input. For instance, the operations further include determining that information used to take the first action is unable to be obtained from the first audio input; determining what information is missing from the information used to take the first action; providing a plurality of options for selection by the user, an option supplying potential information for completing the first action; and receiving a second audio input selecting a first option from the plurality of options.
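The handling of missing information described above (an action, an actor or an entity) can be illustrated with a short sketch. It assumes requests and context are plain dictionaries keyed by those three slot names; the function `resolve` and the prompt wording are hypothetical and not part of the patent.

```python
from typing import Optional, Tuple

# The three kinds of information the text says may be missing.
REQUIRED = ("action", "actor", "entity")


def resolve(request: dict, context: dict) -> Tuple[dict, Optional[str]]:
    """Fill missing slots from context; return (completed request, prompt or None).

    A non-None prompt means neither the request nor the context supplied
    the listed slots, so the user should be asked for them.
    """
    filled = dict(request)
    for slot in REQUIRED:
        if filled.get(slot) is None and context.get(slot) is not None:
            filled[slot] = context[slot]   # obtain missing information from context
    missing = [slot for slot in REQUIRED if filled.get(slot) is None]
    prompt = f"Please provide: {', '.join(missing)}" if missing else None
    return filled, prompt
```

For instance, a follow-up request that names only the action could be completed from the earlier dialogue stored in the context, while a request with nothing usable in the context would yield a prompt for the user instead.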

The features and advantages described herein are not all-inclusive, and many additional features and advantages will be apparent to one of ordinary skill in the art in view of the figures and description. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals are used to refer to similar elements.
FIG. 1 is a block diagram illustrating an example system for a voice and connection platform according to one embodiment.
FIG. 2 is a block diagram illustrating an example computing device according to one embodiment.
FIG. 3 is a block diagram illustrating an example of a client-side voice and connection engine according to one embodiment.
FIG. 4 is a block diagram illustrating an example of a server-side voice and connection engine according to one embodiment.
FIG. 5 is a flowchart of an example method for receiving and processing a request using the voice and connection platform according to some embodiments.
FIG. 6 is a flowchart of an example method for obtaining additional information to determine a user's intended request according to some embodiments.
FIG. 7 is a diagram illustrating an example method for receiving and processing a request using the voice and connection platform according to another embodiment.
FIG. 8 is a block diagram of an example of managing context in the voice and connection platform according to one embodiment.

FIG. 1 is a block diagram illustrating an example system 100 for a voice and connection platform according to one embodiment. The illustrated system 100 includes client devices 106a ... 106n, an automatic speech recognition (ASR) server 110, a voice and connection server 122 and a text-to-speech (TTS) server 116, which are communicatively coupled via a network 102 for interaction with one another. For example, the client devices 106a ... 106n may be respectively coupled to the network 102 via signal lines 104a ... 104n and may be accessed by users 112a ... 112n (also referred to individually and collectively as user 112), as illustrated by lines 110a ... 110n. The automatic speech recognition server 110 may be coupled to the network 102 via signal line 108. The voice and connection server 122 may be coupled to the network 102 via signal line 120. The text-to-speech server 116 may be coupled to the network 102 via signal line 114. The use of the nomenclature "a" and "n" in the reference numbers indicates that any number of those elements having that nomenclature may be included in the system 100.

The network 102 may include any number of networks and/or network types. For example, the network 102 may include, but is not limited to, one or more local area networks (LANs), wide area networks (WANs) (e.g., the Internet), virtual private networks (VPNs), mobile networks (e.g., cellular networks), wireless wide area networks (WWANs), Wi-Fi networks, WiMAX® networks, Bluetooth® communication networks, peer-to-peer networks, other interconnected data paths across which multiple devices may communicate, various combinations thereof, and the like. Data transmitted by the network 102 may include packetized data (e.g., Internet Protocol (IP) data packets) that is routed to designated computing devices coupled to the network 102. In some implementations, the network 102 may include a combination of wired and wireless (e.g., terrestrial or satellite-based transceivers) networking software and/or hardware that interconnects the computing devices of the system 100. For example, the network 102 may include packet-switching devices that route the data packets to the various computing devices based on information included in a header of the data packets.

The data exchanged over the network 102 can be represented using technologies and/or formats including hypertext markup language (HTML), extensible markup language (XML), JavaScript Object Notation (JSON), Comma Separated Values (CSV), Java DataBase Connectivity (JDBC), Open DataBase Connectivity (ODBC), and the like. In addition, all or some of the links can be encrypted using conventional encryption technologies, for example, secure sockets layer (SSL), Secure HTTP (HTTPS) and/or virtual private networks (VPNs) or Internet Protocol security (IPsec). In another embodiment, the entities can use custom and/or dedicated data communication technologies instead of, or in addition to, the ones described above. Depending upon the embodiment, the network 102 can also include links to other networks. Additionally, the data exchanged over the network 102 may be compressed.

The client devices 106a ... 106n (also referred to individually and collectively as client device 106) are computing devices having data processing and communication capabilities. While FIG. 1 illustrates two client devices 106, the present specification applies to any system architecture having one or more client devices 106. In some embodiments, a client device 106 may include a processor (e.g., virtual, physical, etc.), a memory, a power source, a network interface, and/or other software and/or hardware components, such as a display, graphics processor, wireless transceivers, keyboard, speakers, camera, sensors, firmware, operating systems, drivers, and various physical connection interfaces (e.g., USB, HDMI, etc.). The client devices 106a ... 106n may couple to and communicate with one another and with the other entities of the system 100 via the network 102 using a wireless and/or wired connection.

Examples of client devices 106 may include, but are not limited to, automobiles, robots, mobile phones (e.g., feature phones, smartphones, etc.), tablets, laptops, desktops, netbooks, server appliances, servers, virtual machines, TVs, set-top boxes, media streaming devices, portable media players, navigation devices, personal digital assistants (PDAs), and the like. While two or more client devices 106 are depicted in FIG. 1, the system 100 may include any number of client devices 106. In addition, the client devices 106a ... 106n may be the same or different types of computing devices. For example, in one embodiment, the client device 106a is an automobile and the client device 106n is a mobile phone.

In the depicted implementation, the client device 106a includes an instance of a client-side voice and connection engine 109a, an automatic speech recognition engine 111a and a text-to-speech engine 119a. While not shown, the client device 106n may include its own instance of a client-side voice and connection engine 109n, an automatic speech recognition engine 111n and a text-to-speech engine 119n. In one embodiment, an instance of the client-side voice and connection engine 109, the automatic speech recognition engine 111 and the text-to-speech engine 119 may be storable in the memory of the client device 106 and executable by the processor of the client device 106.

The text-to-speech (TTS) server 116, the automatic speech recognition (ASR) server 110 and the voice and connection server 122 may include one or more computing devices having data processing, storing, and communication capabilities. For example, these entities 110, 116, 122 may include one or more hardware servers, server arrays, storage devices, systems, etc., and/or may be centralized or distributed/cloud-based. In some implementations, these entities 110, 116, 122 may include one or more virtual servers, which operate in a host server environment and access the physical hardware of the host server, including, for example, a processor, memory, storage, network interfaces, etc., via an abstraction layer (e.g., a virtual machine manager).

The automatic speech recognition (ASR) engine 111 performs automatic speech recognition. For example, in one embodiment, the ASR engine 111 receives an audio (e.g., voice) input and converts the audio into a string of text. Examples of ASR engines 111 include, but are not limited to, Nuance, Google Voice, Telisma/OnMobile, and the like.

Depending on the embodiment, the ASR engine 111 may be on-board, off-board or a combination thereof. For example, in one embodiment, the ASR engine 111 is on-board and ASR is performed on the client device 106 by ASR engine 111a, and ASR engine 111x and the ASR server 110 may be omitted. In another example, in one embodiment, the ASR engine 111 is off-board (e.g., streaming or relay) and ASR is performed on the ASR server 110 by ASR engine 111x, and ASR engine 111a may be omitted. In yet another example, ASR is performed both at the client device 106 by ASR engine 111a and at the ASR server 110 by ASR engine 111x.

The text-to-speech (TTS) engine 119 performs text-to-speech conversion. For example, in one embodiment, the TTS engine 119 receives text or other non-speech input (e.g., a request for additional information, as discussed below with reference to the work around engine 328 of FIG. 3) and outputs human-recognizable speech that is presented to the user 112 through an audio output of the client device 106. Examples of TTS engines 119 include, but are not limited to, Nuance, Google Voice, Telisma/OnMobile, Creawave, Acapella, and the like.

Depending on the embodiment, the TTS engine 119 may be on-board, off-board or a combination thereof. For example, in one embodiment, the TTS engine 119 is on-board and TTS is performed on the client device 106 by TTS engine 119a, and TTS engine 119x and the TTS server 116 may be omitted. In another example, in one embodiment, the TTS engine 119 is off-board (e.g., streaming or relay) and TTS is performed on the TTS server 116 by TTS engine 119x, and TTS engine 119a may be omitted. In yet another example, TTS is performed both at the client device 106 by TTS engine 119a and at the TTS server 116 by TTS engine 119x.

In the illustrated embodiment, the voice and connection engine is split into two components 109, 124; one on the client side and one on the server side. Depending on the embodiment, the voice and connection engine may be on-board, off-board or a hybrid of the two. For example, in one embodiment, the voice and connection engine is on-board and the features and functionality discussed below with regard to FIGS. 3 and 4 are performed on the client device 106. In another example, in one embodiment, the voice and connection engine is off-board and the features and functionality discussed below with regard to FIGS. 3 and 4 are performed on the voice and connection server 122. In yet another example, in one embodiment, the voice and connection engine is a hybrid and the features and functionality discussed below with regard to FIGS. 3 and 4 are split between the client-side voice and connection engine 109 and the server-side voice and connection engine 124. It should be recognized, however, that the features and functionality may be divided differently than in the illustrated embodiments of FIGS. 3 and 4. In one embodiment, the voice and connection engine provides a voice assistant that uses context and artificial intelligence, provides natural dialogue with the user 112, and can work around shortcomings in user requests (e.g., a failure of voice recognition).

In one embodiment, the client-side (on-board) voice and connection engine 109 manages the dialogue and connects to the server-side (off-board) voice and connection engine 124 for extended semantic processing. Such an embodiment may beneficially provide synchronization to allow for loss and recovery of connectivity between the two. For example, assume that the user is going through a tunnel and has no network 102 connectivity. In one embodiment, the system 100 detects the lack of network 102 connectivity and analyzes the voice input (i.e., the query/request) locally on the client device 106 using "lite" local versions of the automatic speech recognition engine 111 and the natural language understanding engine 326; when network 102 connectivity is available, however, ASR and natural language understanding (NLU) are performed by the server-side versions of those engines, which provide greater semantics, vocabularies and processing power. In one embodiment, if the user's request requires network 102 connectivity, the system may verbally notify the user that it lacks network 102 connectivity and that the user's request will be processed when network 102 connectivity is re-established.
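The connectivity fallback described above can be sketched as follows. This is a minimal illustration that assumes connectivity is passed in as a boolean and that requests are plain strings; the class `HybridAssistant`, its methods, and the returned labels are hypothetical and not defined by the patent.

```python
from collections import deque


class HybridAssistant:
    """Route requests to server-side engines when online, else to 'lite' local ones."""

    def __init__(self) -> None:
        # Requests that need the network and arrived while offline.
        self.pending = deque()

    def process(self, request: str, online: bool, needs_network: bool = False) -> str:
        if online:
            # Server-side ASR/NLU: larger vocabularies and more processing power.
            return f"server:{request}"
        if needs_network:
            # Defer the request and tell the user why it cannot run yet.
            self.pending.append(request)
            return ("No network connection; the request will be processed "
                    "when the connection is re-established.")
        # Offline, but the request can be analyzed locally.
        return f"local-lite:{request}"

    def on_reconnect(self) -> list:
        """Process deferred requests once connectivity returns."""
        results = [f"server:{request}" for request in self.pending]
        self.pending.clear()
        return results
```

In the tunnel example, a request that only needs local analysis would go through the `local-lite` path, while one that needs the network would be queued and handled by `on_reconnect` after the tunnel.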

It should be understood that the system 100 illustrated in FIG. 1 is representative of an example system for voice and connection according to one embodiment, and that a variety of different system environments and configurations are contemplated and are within the scope of the present disclosure. For instance, various functionality may be moved from a server to a client, or vice versa, and some implementations may include additional or fewer computing devices, servers, and/or networks, and may implement various functionality client-side or server-side. Further, various entities of the system 100 may be integrated into a single computing device or system, or divided among additional computing devices or systems, etc.

FIG. 2 is a block diagram of an example computing device 200 according to one embodiment. The computing device 200, as illustrated, may include a processor 202, a memory 204, a communication unit 208, and a storage device 241, which may be communicatively coupled by a communications bus 206. The computing device 200 depicted in FIG. 2 is provided by way of example, and it should be understood that it may take other forms and include additional or fewer components without departing from the scope of the present disclosure. For instance, while not shown, the computing device 200 may include input and output devices (e.g., a display, a keyboard, a mouse, a touch screen, speakers, etc.), various operating systems, sensors, additional processors, and other physical configurations. Additionally, it should be understood that the computer architecture depicted in FIG. 2 and described herein can be applied, with various modifications, to multiple entities in the system 100, including, for example, the TTS server 116 (e.g., by including the TTS engine 119 and omitting the other illustrated engines), the ASR server 110 (e.g., by including the ASR engine 111 and omitting the other illustrated engines), the client device 106 (e.g., by omitting the server-side voice and connection engine 124) and the voice and connection server 122 (e.g., by including the server-side voice and connection engine 124 and omitting the other illustrated engines).

The processor 202 includes an arithmetic logic unit, a microprocessor, a general-purpose controller, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or some other processor array, or some combination thereof, to execute software instructions by performing various input, logical, and/or mathematical operations to provide the features and functionality described herein. The processor 202 may execute code, routines, and software instructions by performing various input/output, logical, and/or mathematical operations. The processor 202 may have various computing architectures for processing data signals, including, for example, a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, and/or an architecture implementing a combination of instruction sets. The processor 202 may be physical and/or virtual, and may include a single core or a plurality of processing units and/or cores. In some implementations, the processor 202 may be capable of generating and providing electronic display signals to a display device (not shown), supporting the display of images, capturing and transmitting images, performing complex tasks including various types of feature extraction and sampling, etc. In some implementations, the processor 202 may be coupled to the memory 204 via the bus 206 to access data and instructions from the memory 204 and to store data in the memory 204. The bus 206 may couple the processor 202 to the other components of the application server 122, including, for example, the memory 204, the communication unit 208, and the storage device 241.

The memory 204 may store data and provide access to the data to the other components of the computing device 200. In some implementations, the memory 204 may store instructions and/or data that may be executed by the processor 202. For example, as depicted, the memory 204 may store one or more of the engines 109, 111, 119, 124. The memory 204 is also capable of storing other instructions and data, including, for example, an operating system, hardware drivers, software applications, databases, etc. The memory 204 may be coupled to the bus 206 for communication with the processor 202 and the other components of the computing device 200.

The memory 204 includes a non-transitory computer-usable (e.g., readable, writeable, etc.) medium, which can be any apparatus or device that can contain, store, communicate, propagate, or transport instructions, data, computer programs, software, code, routines, etc., for processing by or in connection with the processor 202. In some implementations, the memory 204 may include one or more of volatile memory and non-volatile memory. For example, the memory 204 may include, but is not limited to, one or more of a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, a discrete memory device (e.g., a PROM, FPROM, ROM), a hard disk drive, and an optical disk drive (CD, DVD, Blu-ray™, etc.). It should be understood that the memory 204 may be a single device or may include multiple types of devices and configurations.

The bus 206 can include a communication bus for transferring data between components of a computing device or between computing devices 106/110/116/122, a network bus system including the network 102 or portions thereof, a processor mesh, a combination thereof, etc. In some implementations, the engines 109, 111, 119, 124, their sub-components, and various software operating on the computing device 200 (e.g., an operating system, device drivers, etc.) may cooperate and communicate via a software communication mechanism implemented in association with the bus 206. The software communication mechanism can include and/or facilitate, for example, inter-process communication, local function or procedure calls, remote procedure calls, an object broker (e.g., CORBA), direct socket communication (e.g., TCP/IP sockets) among software modules, UDP broadcasts and receipts, HTTP connections, etc. Further, any or all of the communication could be secure (e.g., SSL, HTTPS, etc.).

The communication unit 208 may include one or more interface devices (I/F) for wired and/or wireless connectivity with the network 102. For example, the communication unit 208 may include, but is not limited to, CAT-type interfaces; wireless transceivers for sending and receiving signals using radio transceivers (4G, 3G, 2G, etc.) for communication with the mobile network 103, and radio transceivers for Wi-Fi™ and close-proximity (e.g., Bluetooth®, NFC, etc.) connectivity, etc.; USB interfaces; various combinations thereof; etc. In some implementations, the communication unit 208 can link the processor 202 to the network 102, which may in turn be coupled to other processing systems. The communication unit 208 can provide other connections to the network 102 and to other entities of the system 100 using various standard network communication protocols, including, for example, those discussed elsewhere herein.

The storage device 241 is an information source for storing data and providing access to it. In some implementations, the storage device 241 may be coupled to the components 202, 204, and 208 of the computing device via the bus 206 to receive data and provide access to it. The data stored by the storage device 241 may vary based on the computing device 200 and the embodiment. For example, in one embodiment, the storage device 241 of a client device 106 may store information about the user's current context and session, and the storage device 241 of the voice and connection server 122 stores medium-term and long-term contexts, aggregated user data used for machine learning, etc.

The storage device 241 may be included in the computing device 200 and/or in a storage system that is distinct from, but coupled to or accessible by, the computing device 200. The storage device 241 can include one or more non-transitory computer-readable media for storing data. In some implementations, the storage device 241 may be incorporated with the memory 204 or may be distinct from it. In some implementations, the storage device 241 may include a database management system (DBMS) operable on the application server 122. For example, the DBMS could include a structured query language (SQL) DBMS, a NoSQL DBMS, various combinations thereof, etc. In some instances, the DBMS may store data in multi-dimensional tables comprised of rows and columns, and may manipulate (i.e., insert, query, update, and/or delete) rows of data using programmatic operations.
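By way of illustration only, the row-level operations mentioned above (insert, update, query, delete) can be sketched with Python's standard `sqlite3` module. The table name `context`, its columns, and the sample values are hypothetical and are not taken from the disclosure; an actual embodiment may use any SQL or NoSQL DBMS.

```python
import sqlite3


def crud_demo():
    """Run one insert, update, query, and delete against an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per stored context entry.
    conn.execute("CREATE TABLE context (user_id TEXT, name TEXT, value TEXT)")

    # Insert a row.
    conn.execute(
        "INSERT INTO context VALUES (?, ?, ?)", ("user-1", "location", "office")
    )
    # Update the row.
    conn.execute(
        "UPDATE context SET value = ? WHERE user_id = ? AND name = ?",
        ("car", "user-1", "location"),
    )
    # Query the row.
    rows = conn.execute(
        "SELECT value FROM context WHERE user_id = ? AND name = ?",
        ("user-1", "location"),
    ).fetchall()
    # Delete the row and count what is left.
    conn.execute("DELETE FROM context WHERE user_id = ?", ("user-1",))
    remaining = conn.execute("SELECT COUNT(*) FROM context").fetchone()[0]
    conn.close()
    return rows, remaining
```

Parameterized queries (the `?` placeholders) are used so that values are never concatenated into the SQL text.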

As mentioned above, the computing device 200 may include other and/or fewer components. Examples of other components may include a display, an input device, a sensor, etc. (not shown). In one embodiment, the computing device includes a display. The display may include any conventional display device, monitor, or screen, including, for example, an organic light-emitting diode (OLED) display, a liquid crystal display (LCD), etc. In some implementations, the display may be a touch-screen display capable of receiving input from a stylus, one or more fingers of a user 112, etc. For example, the display may be a capacitive touch-screen display capable of detecting and interpreting multiple points of contact with the display surface.

The input device (not shown) may include any device for inputting information into the application server 122. In some implementations, the input device may include one or more peripheral devices. For example, the input device may include a keyboard (e.g., a QWERTY keyboard or a keyboard in any other language), a pointing device (e.g., a mouse or touchpad), a microphone, an image/video capture device (e.g., a camera), etc. In one embodiment, the computing device 200 may represent a client device 106, and the client device 106 includes a microphone for receiving voice input and speakers for facilitating text-to-speech (TTS). In some implementations, the input device may include a touch-screen display capable of receiving input from one or more fingers of the user 112. For example, the user 112 could interact with an emulated (i.e., virtual or soft) keyboard displayed on the touch-screen display by using fingers to contact the display in the keyboard regions.

Example client-side voice and connection engine 109

Referring now to FIG. 3, a block diagram of an example client-side voice and connection engine 109 is illustrated, according to one embodiment. In the illustrated embodiment, the client-side voice and connection engine 109 comprises an automatic speech recognition (ASR) interaction engine 322, a client-side context holder 324, a natural language understanding (NLU) engine 326, a workaround engine 328, and a connection engine 330.

The automatic speech recognition (ASR) interaction engine 322 includes code and routines for interacting with an automatic speech recognition (ASR) engine 111. In one embodiment, the ASR interaction engine 322 is a set of instructions executable by the processor 202. In another embodiment, the ASR interaction engine 322 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the ASR interaction engine 322 is adapted for cooperation and communication with the processor 202, an ASR engine 111, and other components of the system 100.

The ASR interaction engine 322 interacts with an ASR engine 111. In one embodiment, the ASR engine 111 is local to the client device 106. For example, the ASR interaction engine 322 interacts with an ASR engine 111 that is an on-board ASR application, such as the ASR engine 111a. In one embodiment, the ASR engine 111 is remote from the client device 106. For example, the ASR interaction engine 322 interacts with an ASR engine 111 that is an off-board ASR application accessible and used via the network 102, such as the ASR engine 111x. In one embodiment, the ASR engine 111 is a hybrid including components both local to and remote from the client device 106. For example, the ASR interaction engine 322 interacts with the off-board ASR engine 111x when the client device 106 has network 102 connectivity, in order to reduce the processing burden on the client device 106 and improve its battery life, and interacts with the on-board ASR engine 111a when network 102 connectivity is unavailable or insufficient.
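As a minimal sketch of the hybrid arrangement described above, the choice between the off-board and on-board engines can be expressed as a function of network status. The `NetworkStatus` type, the bandwidth field, and the 64 kbps threshold are assumptions made for illustration; the disclosure does not specify how "insufficient" connectivity is measured.

```python
from dataclasses import dataclass


@dataclass
class NetworkStatus:
    """Hypothetical snapshot of the client device's connectivity."""
    connected: bool
    bandwidth_kbps: float


def choose_asr_engine(status: NetworkStatus, min_bandwidth_kbps: float = 64.0) -> str:
    """Return "offboard" (e.g., engine 111x) or "onboard" (e.g., engine 111a)."""
    # Prefer the off-board engine to reduce on-device processing and
    # battery use, but only when the connection is usable.
    if status.connected and status.bandwidth_kbps >= min_bandwidth_kbps:
        return "offboard"
    # Fall back to the on-board engine when the network is unavailable
    # or below the assumed threshold.
    return "onboard"
```

For instance, `choose_asr_engine(NetworkStatus(True, 500.0))` selects the off-board engine, while a disconnected or very slow link selects the on-board engine.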

In one embodiment, the ASR interaction engine 322 interacts with the ASR engine 111 by initiating the voice input of the ASR engine 111. In one embodiment, the ASR interaction engine 322 may initiate the voice input of the ASR engine 111 responsive to detecting one or more events. In some embodiments, the ASR interaction engine 322 initiates the ASR proactively, without waiting for the user 112 to begin the dialog. Examples of events include, but are not limited to, a wake-up word or phrase, an expiration of a timer, user input, an internal event, an external event, etc.

In one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting a wake-up word or phrase. For example, assume the voice and connection platform is associated with a persona for interacting with users and the persona is named "Sam"; in one embodiment, the ASR interaction engine 322 detects when the word "Sam" is received via the client device's microphone and initiates the voice input for the ASR engine 111. In another example, assume the phrase "Hey you!" is assigned as the wake-up phrase; in one embodiment, the ASR interaction engine 322 detects when the phrase "Hey you!" is received via the client device's microphone and initiates the voice input for the ASR engine 111.
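The wake-up check above can be sketched, under simplifying assumptions, as a test on a short text transcript. A real system would typically spot the wake-up word in the audio signal itself; the transcript-based approach, the function name, and the lowercase phrase list below are hypothetical and serve only to illustrate the trigger logic.

```python
import re

# Assumed wake-up words, stored in lowercase: the persona name "Sam"
# and the interjection "hey".
WAKE_WORDS = ("sam", "hey")


def contains_wake_word(transcript: str, wake_words=WAKE_WORDS) -> bool:
    """Return True if any wake-up word appears as a whole word."""
    # Tokenize into lowercase words so that "Sam," and "Hey!" match,
    # while longer words such as "same" do not.
    words = re.findall(r"[a-z']+", transcript.lower())
    return any(word in words for word in wake_words)
```

Matching on whole words rather than substrings avoids triggering on words that merely contain the wake-up word.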

In one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting an expiration of a timer. For example, the system 100 may determine that a user wakes up at 7 AM and leaves work at 6 PM; in one embodiment, the system sets a timer for 7 AM and a timer for 6 PM, and the ASR interaction engine 322 initiates the voice input for the ASR engine 111 at those times. The user can thus, for example, request news or weather when waking up at 7 AM, and request a traffic report or ask to call his or her spouse when leaving work at 6 PM.
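One way to picture the timer-based trigger is a helper that, given the current time and the learned daily times, computes when voice input should next be initiated. The function name and the use of naive local datetimes are assumptions for illustration only.

```python
from datetime import datetime, time, timedelta


def next_trigger(now: datetime, daily_times: list) -> datetime:
    """Return the next datetime at which a daily timer expires.

    daily_times must be a non-empty list of datetime.time values.
    """
    candidates = []
    for t in daily_times:
        candidate = datetime.combine(now.date(), t)
        # A time that has already passed today fires again tomorrow.
        if candidate <= now:
            candidate += timedelta(days=1)
        candidates.append(candidate)
    return min(candidates)
```

With timers at 7 AM and 6 PM, a call made at 8 AM yields 6 PM the same day, and a call made at 7 PM yields 7 AM the following day.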

In one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting a user input. For example, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting a gesture (e.g., a specific swipe or motion on a touch screen) or a button (physical or soft/virtual) selection (e.g., selecting a dedicated button or long-pressing a multi-purpose button). It should be recognized that the button referred to may be on the client device 106 or on a component associated with the client device 106 (e.g., a dock, cradle, Bluetooth headset, smart watch, etc.).

In one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting an internal event. In one embodiment, the internal event is based on a sensor of the client device 106 (e.g., GPS, accelerometer, power sensor, docking sensor, Bluetooth antenna, etc.). For example, the ASR interaction engine 322 detects that the client device 106 is located in the user's car (e.g., by detecting the car's on-board diagnostics, or power and a connection to an in-car cradle/dock, etc.) and, in response, initiates the voice input of the ASR engine 111 (e.g., to receive a user's request for navigation directions or music to play). In one embodiment, the internal event is based on an application (not shown) of the client device 106. For example, assume the client device 106 is a smartphone with a calendar application and the calendar application includes an appointment for the user at a remote location; in one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting the appointment (e.g., to receive a user's request for directions to the appointment's location). In one embodiment, the internal event is based on an operation of a local text-to-speech engine 119a. For example, assume the text-to-speech engine 119 operates to present a contextual prompt (e.g., "It looks like you are leaving work. Would you like to call your wife and get directions home?") or another prompt to the user; in one embodiment, the ASR interaction engine 322 detects the text-to-speech prompt and initiates the voice input of the ASR engine 111 to receive the user's response to the prompt.

In one embodiment, the ASR interaction engine 322 initiates the voice input of the ASR engine 111 responsive to detecting an external event (e.g., from a third-party API or database). In one embodiment, the external event is based on an operation of a remote text-to-speech engine 119x. For example, assume the text-to-speech engine 119 operates to present a contextual prompt (e.g., "It looks like you are leaving work. Would you like to call your wife and get directions home?" or "You are approaching your destination. Would you like directions to available parking?") or another prompt to the user; in one embodiment, the ASR interaction engine 322 detects the text-to-speech prompt and initiates the voice input of the ASR engine 111 to receive the user's response to the prompt.

In one embodiment, the ASR interaction engine 322 is agnostic. For example, in one embodiment, the ASR interaction engine 322 may use one or more different ASR engines 111. Examples of ASR engines 111 include, but are not limited to, Nuance, Google Voice, Telisma/OnMobile, Creawave, Acapella, etc. An agnostic ASR interaction engine 322 may beneficially allow flexibility in the ASR engine 111 used and in the language of the ASR engine 111, and may allow the ASR engine(s) 111 used to be changed over the life cycle of the voice and connection system 100 as new ASR engines 111 become available and existing ASR engines are discontinued. In some embodiments, the system 100 includes multiple ASR engines, and the ASR engine 111 used depends on the context. For example, assume Google Voice provides better recognition of proper names than Nuance; in one embodiment, the ASR interaction engine 322 may interact with the Google Voice ASR when it is determined that the user has accessed the contact list of a phone application. In some embodiments, the system 100 may switch between ASR engines at any time (e.g., process a first portion of a voice input with a first ASR engine 111 and a second portion of the voice input with a second ASR engine 111). Similar to the ASR engine 111, in one embodiment, the system 100 is agnostic with respect to the TTS engine 119 used. Also similar to the ASR engine 111, in some embodiments, the system 100 may include multiple TTS engines 119, may select different TTS engines for different contexts, and/or may switch between different TTS engines at any time. For example, in one embodiment, the system 100 may begin reading a headline in English, the user may request French, and the system will switch to an English-to-French TTS engine.
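The engine-agnostic, context-dependent selection described above can be sketched as a small registry that maps engine names to interchangeable recognizer callables and applies context rules in order. The class name `AsrSelector`, the rule mechanism, and the context key `active_app` are hypothetical; no vendor API is modeled, and the recognizers are expected to be supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# A recognizer takes raw audio bytes and returns recognized text.
Recognizer = Callable[[bytes], str]
Rule = Tuple[Callable[[dict], bool], str]


class AsrSelector:
    """Hypothetical registry that picks an ASR engine based on context."""

    def __init__(self, default_engine: str) -> None:
        self._engines: Dict[str, Recognizer] = {}
        self._rules: List[Rule] = []
        self._default = default_engine

    def register(self, name: str, recognizer: Recognizer) -> None:
        # Engines can be added or replaced at any time, which is what
        # keeps the caller agnostic to the concrete engine.
        self._engines[name] = recognizer

    def add_rule(self, predicate: Callable[[dict], bool], name: str) -> None:
        self._rules.append((predicate, name))

    def select(self, context: dict) -> str:
        # The first matching rule whose engine is registered wins.
        for predicate, name in self._rules:
            if predicate(context) and name in self._engines:
                return name
        return self._default

    def recognize(self, audio: bytes, context: dict) -> str:
        return self._engines[self.select(context)](audio)
```

Because selection happens on every call to `recognize`, successive portions of a voice input can be routed to different engines if the context changes between calls.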

The ASR engine 111 receives the voice input after the ASR interaction engine 322 initiates the voice input. In one embodiment, responsive to the initiation, the ASR engine 111 receives the voice input without additional involvement of the ASR interaction engine 322. In one embodiment, subsequent to initiating the voice input, the ASR interaction engine 322 passes the voice input to the ASR engine 111. For example, the ASR interaction engine 322 is communicatively coupled to the ASR engine 111 to send the voice input to the ASR engine 111. In another embodiment, subsequent to initiating the voice input, the ASR interaction engine 322 stores the voice input in a storage device (or any other communicatively accessible non-transitory storage medium), and the voice input may be retrieved by the ASR engine 111 by accessing the storage device (or other non-transitory storage medium).

In some embodiments, the system 100 proactively provides an electronic voice assistant without receiving user input such as voice input. For example, in one embodiment, the system 100 may determine that the car (i.e., the client device 106) is in a traffic jam and automatically initiate TTS to begin a dialog with the user (e.g., "Would you like me to provide an alternative route?") or perform an action (e.g., determine an alternative route, such as parking and taking the train, and update the navigation route accordingly).

The client-side context holder 324 includes code and routines for context synchronization. In one embodiment, context synchronization includes managing the definition, usage, and storage of the context workflow from the client side and sharing the context workflow with the server side. In one embodiment, the client-side context holder 324 is a set of instructions executable by the processor 202. In another embodiment, the client-side context holder 324 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the client-side context holder 324 is adapted for cooperation and communication with the processor 202, other components of the client device 106, and other components of the system 100.

The client-side context holder 324 manages the definition, usage, and storage of the context workflow from the client side and shares the context workflow with the server side. In one embodiment, the client-side context holder 324 communicates with the context agent 420 (the server-side context holder) using a context synchronization protocol in order to synchronize the context within the system 100 despite itinerancy and low capacity on the network 102, which may be particularly beneficial on some networks (e.g., a mobile data network).

The client-side context holder 324 manages the definition, usage, and storage of the context. The context is the current state of the personal assistant provided by the voice and connection engine. In one embodiment, the context comprises one or more parameters. Examples of parameters include, but are not limited to, context history, dialog history (e.g., the user's previous requests and the system's previous responses and actions), user profile (e.g., the user's identity and preferences), user history (e.g., the user's habits), location (the physical location of the client device 106), and current context domain (e.g., the client device 106, the application(s) being used, the interface presently presented to the user). In some embodiments, a parameter may be a variable or a serialized object.
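As a rough sketch of how such a context might be held and serialized for sharing between client and server, the parameters listed above can be grouped in a data class and encoded as JSON. The field names, the JSON encoding, and the helper functions are assumptions for illustration; the disclosure does not prescribe a representation or a wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """Hypothetical container for a subset of the context parameters."""
    user_profile: dict = field(default_factory=dict)
    dialog_history: List[dict] = field(default_factory=list)
    user_history: List[str] = field(default_factory=list)
    location: Optional[List[float]] = None  # [latitude, longitude]
    current_domain: str = ""


def serialize(context: Context) -> str:
    """Encode the context as a JSON string (a serialized object)."""
    return json.dumps(asdict(context), sort_keys=True)


def deserialize(payload: str) -> Context:
    """Rebuild a Context from its JSON form."""
    return Context(**json.loads(payload))
```

Lists are used instead of tuples for the location so that a context survives a JSON round trip unchanged.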

In one embodiment, the context is a multi-dimensional context and can describe a variable or feature of any dimension. In some embodiments, the context uses a multi-dimensional matrix. As described herein, in some embodiments, the context is synchronized in real time between the client side (e.g., client device 106a) and the server side (e.g., voice and connection server 122). Because of the tight integration of both parts of the platform (client and server), combined with the context's ability to describe a variable or feature of any dimension, the context may occasionally be referred to as a "Deep Context."

Depending on the embodiment, the context is used by the system 100 to provide one or more benefits including, but not limited to, increasing the ability of the system 100 to accurately recognize words from speech, to determine the user's intended request, and to facilitate a more natural dialog between the user 112 and the system 100.

In one embodiment, the context is used to more accurately recognize words from speech. For example, assume the user has the phone application open; in one embodiment, the context may be used (e.g., by the NLU engine 326 during preprocessing) to limit the dictionary used by the natural language understanding engine 326 (e.g., to the names of contacts and words associated with operating a phone or placing a call). In one embodiment, such dictionary limitation may beneficially eliminate the car company "Renault" but retain the name "Renaud," so that the NLU engine 326 can accurately determine that the user wants to call Renaud and not Renault. The NLU engine 326 may even determine which Renaud the user intends to call (assuming multiple contacts named Renaud) based on previous phone calls made by the user. The preceding example therefore also illustrates an embodiment in which the context is used to more accurately determine the user's intended request. Accordingly, the context may also minimize the amount of time from receiving the user's request to accurately executing it.
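The dictionary limitation in the Renaud/Renault example can be sketched as below. The function names, the flat word list, and the "most recently called" tie-break are assumptions for illustration; the description does not state how the NLU engine 326 implements either step.

```python
def restrict_dictionary(candidates, allowed_vocabulary):
    """Keep only recognition candidates present in the context-limited vocabulary."""
    allowed = {word.lower() for word in allowed_vocabulary}
    return [c for c in candidates if c.lower() in allowed]


def pick_contact(first_name, contacts, call_history):
    """Among contacts sharing a first name, prefer the most recently called one."""
    matches = [c for c in contacts if c.split()[0].lower() == first_name.lower()]
    for called in reversed(call_history):  # newest call is last in the list
        if called in matches:
            return called
    return matches[0] if matches else None


# Phone application is open, so the vocabulary holds contact names and call words.
phone_vocabulary = ["Renaud", "Greg", "call", "dial"]
print(restrict_dictionary(["Renault", "Renaud"], phone_vocabulary))
print(pick_contact("Renaud", ["Renaud A", "Renaud B"], ["Renaud A", "Renaud B"]))
```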

In one embodiment, the context is used to facilitate a more natural dialog (two-way communication) between the user and the system 100. For example, the context may be used to facilitate a dialog in which the user requests news about Yahoo! and the system begins reading headlines of articles about Yahoo!. The user asks "Who is the CEO?"; the system 100 understands that the user's intended request is for the CEO of Yahoo!, searches for the name, and provides it. The user then asks about today's weather; the system 100 understands that this request is associated with a weather application and that the user's intended request is for the weather at the user's physical location, determines that the weather application should be used, and makes an API call to the weather application to obtain the weather. The user then says "and tomorrow"; the system 100 understands that the user's intended request is for tomorrow's weather at the user's current location. The user then asks "How is the stock trading?"; the system 100 understands that the user's intended request is for the current trading price of Yahoo! stock and performs a web search to obtain that information. To summarize and simplify, in some embodiments the context can track the topic, switch between applications, and track the state in the workflows of the various applications, enabling a more "natural" dialog between the user 112 and the system 100 by supporting such context jumping.

In some embodiments, machine learning is applied to contexts, for example, to learn the probability of a next step or command based on data aggregated from numerous users and how users generally interact with the system 100, or, for a particular user, based on that user's data and how that user interacts with the system 100.

In one embodiment, the client-side context holder 324 synchronizes the user's current context with the context agent 420 of FIG. 4. Synchronizing the context with the server-side voice and connection engine 124 allows the client-side voice and connection engine 109 to optionally have the server-side engine 124 manage the dialog and perform the various operations, or to perform the functions at the client device 106 based on, e.g., connectivity to the server 122.

In one embodiment, the client-side holder 324 and the context agent 420 (i.e., the server-side holder) communicate using a context synchronization protocol that provides a communication protocol and also verifies that the context information being synchronized is delivered. In one embodiment, the context synchronization protocol standardizes key access (e.g., a context ID) for each property (e.g., variable or parameter) of the state or sub-state of the current context.

Referring now to FIG. 8, a schematic diagram 800 is shown that provides further detail regarding the synchronization of context between the client side and the server side, according to one embodiment. In the illustrated embodiment, the client-side context holder 324 of the client device maintains one or more contexts 810a/812a/814a of the client device 106. In one embodiment, each context 810a/812a/814a is associated with a module. In one embodiment, the client-side context holder 324 maintains a context that includes the screens (Screen 1 through Screen N) making up the user's flow through the application's functionality, together with the functions available on each screen. For example, in the illustrated embodiment, the user was presented with Screen 1 820a, which provided a set of functions, and the user selected a function (from F1 through Fn of Screen 1). The user was then presented with Screen 2, where the user selected a function (from F1 through Fn of Screen 2). The user was then presented with Screen 3, where the user selected a function (from F1 through Fn of Screen 3), and so on. For example, assume that, in one embodiment, Module 1 810a is the module for a phone application and Module 2 812a is the module for a media application; in one embodiment, the screens 820a, 822a, 824a, and 826a of Module 1 810a may represent the dialog between the user and the system for navigating a workaround (discussed below) in order to select a contact and place a call, and the screens of Module 2 812a may represent the flow of the user navigating a genre, artist, album, and track to be played.

The home screen 830a resets the contexts of the various modules 810a, 812a, and 814a. For example, assume that Module 1 810 is associated with a news application; in one embodiment, the user is directed to the home screen 830a (e.g., automatically by a mechanism such as a time-out period, or based on the user's request). In one embodiment, when the user is directed to the home screen 830a, a reset of the context information in one or more of the modules 810a, 812a, and 814a is triggered.
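A minimal sketch of per-module contexts that record the screen-by-screen flow and are cleared when the user returns to the home screen follows. The class and method names are hypothetical, and resetting every module at once is one assumption among the options the description allows ("one or more of the modules").

```python
class ModuleContext:
    """Hypothetical per-module context: the screens visited and functions chosen."""

    def __init__(self, name):
        self.name = name
        self.flow = []  # list of (screen_id, selected_function) in visit order

    def select(self, screen_id, function):
        self.flow.append((screen_id, function))

    def reset(self):
        self.flow.clear()


class ClientContextHolder:
    """Hypothetical holder that keeps one context per module."""

    def __init__(self, module_names):
        self.modules = {name: ModuleContext(name) for name in module_names}

    def go_home(self):
        # Assumption: returning to the home screen resets all module contexts.
        for module in self.modules.values():
            module.reset()


holder = ClientContextHolder(["phone", "media"])
holder.modules["phone"].select("screen_1", "F2")
holder.modules["phone"].select("screen_2", "F1")
holder.go_home()
print(holder.modules["phone"].flow)
```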

In one embodiment, the context synchronization protocol 804, which is also described below with reference to FIG. 4, provides a protocol for communicating contexts from the client-side context holder 324 to the context agent 422, which is also referred to as the server-side context holder or similar. In some embodiments, the context synchronization protocol provides a high degree of compression. In some embodiments, the context synchronization protocol provides a mechanism for verifying that contexts are successfully synchronized between the client side and the server side, such that the information 806 of the context agent 422 is identical to the information 802 of the client-side context holder 324.
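The two properties named above, compression and verification of successful synchronization, can be illustrated with the sketch below. It is not the context synchronization protocol 804 itself: the JSON encoding, zlib compression, SHA-256 digest, and the function names `pack_context` and `unpack_context` are all assumptions chosen for the example.

```python
import hashlib
import json
import zlib


def pack_context(context: dict):
    """Client side: serialize, compress, and compute a digest of the context."""
    raw = json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9), hashlib.sha256(raw).hexdigest()


def unpack_context(payload: bytes, expected_digest: str) -> dict:
    """Server side: decompress and verify the received context matches the digest."""
    raw = zlib.decompress(payload)
    if hashlib.sha256(raw).hexdigest() != expected_digest:
        raise ValueError("context digest mismatch: synchronization not verified")
    return json.loads(raw)


client_context = {"module": "phone", "screen": 2, "selected": "F1"}
payload, digest = pack_context(client_context)
server_context = unpack_context(payload, digest)
print(server_context == client_context)
```

Sorting the keys before hashing keeps the digest stable regardless of the order in which parameters were added to the context.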

In one embodiment, the context engine 424 collects the contexts from the context agent 422. In one embodiment, the context engine 424 manages context information 808 for the user. For example, the context agent 424 maintains context information for an application over time (e.g., long-term and medium-term contexts) and the various context information for each user session in the application. Such information may be useful for machine learning (e.g., predicting the user's intent based on the current context, such as a request to call Victoria, and on past contexts, such as the last request for a Victoria having been for Victoria P).

In one embodiment, the client-side context holder 324 passes the context to one or more components of the system 100 including, e.g., the natural language understanding (NLU) engine 326 and/or the context agent 422. In one embodiment, the client-side context holder 324 stores the context in the storage device 241 (or any other non-transitory storage medium that is communicatively accessible). Other components of the system 100 including, e.g., the natural language understanding engine 326 and/or the context agent 422, can retrieve the context by accessing the storage device 241 (or other non-transitory storage medium).

The natural language understanding (NLU) engine 326 includes code and routines for receiving the output of the ASR engine 111 and determining the user's intended request based on the output of the ASR engine 111. In one embodiment, the NLU engine 326 is a set of instructions executable by the processor 202. In another embodiment, the NLU engine 326 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the NLU engine 326 is adapted for cooperation and communication with the processor 202, the ASR engine 111, and other components of the system 100.

In one embodiment, the NLU engine 326 preprocesses the output of the ASR engine 111 to correct errors in the speech recognition. For clarity and convenience, the output of the ASR engine 111 is occasionally referred to as the "recognized speech." In one embodiment, the NLU engine 326 receives the recognized speech and, optionally, the associated confidences from the ASR engine 111, receives the context from the client-side context holder 324, and corrects any misrecognized terms in the recognized speech. For example, assume the user speaks French and the voice input is "donne-moi l'information technologique" (i.e., "give me information technology"), but the ASR engine 111 outputs "Benoit la formation technologique" (i.e., "Benoit technology training") as the recognized speech. In one embodiment, the NLU engine 326 performs preprocessing based on the context to correct "Benoit" to "donne-moi" and "formation" to "information," thereby increasing the accuracy of the user intent subsequently determined by the NLU engine 326.

The NLU engine 326 determines the user's intent based on the recognized speech from the ASR engine 111, which in some embodiments may optionally have been preprocessed. In one embodiment, the NLU engine 326 determines the user's intent as a tuple. In one embodiment, a tuple includes an action (e.g., a function to be performed) and an actor (e.g., a module that performs the function). However, in some embodiments, the tuple may include additional or different information. For example, assume the NLU engine 326 receives the recognized speech "Call Greg"; in one embodiment, the NLU engine 326 determines a tuple that includes an action (i.e., placing a call), an actor (i.e., a phone module), and an entity, occasionally also referred to as an "item" (i.e., Greg as the recipient/target of the call).
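The intent tuple for "Call Greg" could be represented as in the sketch below. The `Intent` type, the label strings, and the prefix-matching parser are hypothetical stand-ins used only to show the shape of the tuple; an actual NLU module would be far more involved.

```python
from typing import NamedTuple, Optional


class Intent(NamedTuple):
    """Hypothetical intent tuple: action, actor, and optional entity ("item")."""
    action: str
    actor: str
    entity: Optional[str] = None


def parse_call(recognized_speech: str) -> Optional[Intent]:
    """Toy parser that only understands requests of the form 'call <name>'."""
    prefix = "call "
    if recognized_speech.lower().startswith(prefix):
        name = recognized_speech[len(prefix):].strip() or None
        return Intent(action="place_call", actor="phone_module", entity=name)
    return None


print(parse_call("Call Greg"))
```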

In one embodiment, the NLU engine 326 detects one or more of a keyword or a shortcut. A keyword is a word that gives direct access to a module. For example, when the user says "phone," the phone module is accessed and the phone application is launched (or brought to the foreground). A shortcut is a phrase (e.g., "send a message"). Examples of keywords and shortcuts may be found in table 710 of FIG. 7. In some embodiments, the system 100 creates one or more shortcuts based on machine learning, which may be referred to as intent learning. For example, in one embodiment, the system 100 learns that "send Louis a message" should be interpreted by the NLU engine 326 as the user 112 requesting to dictate and send an e-mail (rather than, e.g., an SMS text message) to the contact Louis Monier and to proceed directly to an interface that receives voice input dictating the e-mail, and establishes "send Louis a message" as a shortcut.
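Keyword and shortcut detection can be sketched as two lookups, as below. The table contents are invented for the example (the entries of table 710 are not reproduced in this section), and exact-match keywords plus prefix-match shortcuts are assumed matching rules.

```python
# Hypothetical tables; the actual entries of table 710 are not shown here.
KEYWORDS = {"phone": "phone_module", "music": "media_module"}
SHORTCUTS = {"send a message": ("message_module", "compose")}


def detect(recognized_speech):
    """Return ('keyword', module), ('shortcut', target), or None."""
    text = recognized_speech.lower().strip()
    if text in KEYWORDS:  # a single word giving direct access to a module
        return ("keyword", KEYWORDS[text])
    for phrase, target in SHORTCUTS.items():  # a phrase mapped to a function
        if text.startswith(phrase):
            return ("shortcut", target)
    return None


print(detect("Phone"))
print(detect("send a message to Louis"))
```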

In one embodiment, the natural language understanding functionality of the NLU engine 326 is modular, and the system 100 is agnostic as to the module that performs the natural language understanding. In some embodiments, the modularity allows the NLU module of the NLU engine 326 to be updated frequently, either to continuously improve the accuracy of understanding or to swap in a new natural language understanding module as new, more accurate natural language understanding systems become available.

When the NLU engine 326 cannot determine the user's intended request (e.g., the request is ambiguous, the request does not make sense, the requested action is not available or not compatible, a value is missing from the tuple, etc.), the NLU engine 326 initiates a workaround. For example, when the user's request is incomplete (e.g., the tuple is not complete), the NLU engine 326 requests that the workaround engine 328 (discussed below) prompt the user for additional information. For example, when the user requests "What's on TV?", in one embodiment, the NLU engine 326 determines that a channel and a time are missing and initiates a workaround.
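The "What's on TV?" case, where missing values trigger a workaround, might look like the following sketch. The `REQUIRED_SLOTS` table, the action names, and the routing labels are assumptions introduced for illustration.

```python
# Hypothetical table of the values each action needs before it can be executed.
REQUIRED_SLOTS = {
    "tv_guide": ("channel", "time"),
    "place_call": ("contact",),
}


def missing_slots(action, slots):
    """List the required values that are absent or empty for this action."""
    return [name for name in REQUIRED_SLOTS.get(action, ()) if not slots.get(name)]


def route(action, slots):
    """Send complete tuples onward; send incomplete ones to the workaround engine."""
    missing = missing_slots(action, slots)
    if missing:
        return ("workaround_engine", missing)
    return ("connection_engine", [])


print(route("tv_guide", {}))
print(route("place_call", {"contact": "Greg"}))
```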

In one embodiment, the NLU engine 326 passes the tuple to the connection engine 330. For example, the NLU engine 326 is communicatively coupled to the connection engine 330 to send the tuple to the connection engine 330. In another embodiment, the NLU engine 326 stores the tuple in the storage device 241 (or any other non-transitory storage medium that is communicatively accessible), and the connection engine 330 may retrieve the tuple by accessing the storage device 241 (or other non-transitory storage medium).

In one embodiment, the NLU engine 326 passes a request for additional information to the workaround engine 328. For example, the NLU engine 326 is communicatively coupled to the workaround engine 328 to send the request for additional information to the workaround engine 328. In another embodiment, the NLU engine 326 stores the request for additional information in the storage device 241 (or any other non-transitory storage medium that is communicatively accessible), and the workaround engine 328 retrieves the request for additional information by accessing the storage device 241 (or other non-transitory storage medium).

The workaround engine 328 includes code and routines for generating a request for additional information from the user so that the NLU engine 326 is able to determine the user's intended request. In one embodiment, the workaround engine 328 is a set of instructions executable by the processor 202. In another embodiment, the workaround engine 328 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the workaround engine 328 is adapted for cooperation and communication with the processor 202, other components of the server-side connection engine 124, and other components of the system 100.

The workaround engine 328 generates a request for additional information so that the user's intended request may be understood and executed. In one embodiment, the workaround engine 328 generates one or more requests for additional information, thereby creating a dialog with the user in order to obtain the additional information. For example, the workaround engine 328 generates a request for additional information and sends that request for presentation to the user 112 via the client device (e.g., sends the request to the text-to-speech engine 111, which presents the request to the user as audio output and for display on the client device's display). The user's response is received (e.g., as audio input received by the ASR engine 111 or through another user input device such as a keyboard or touch screen). The NLU engine 326 then determines the user's intended request. When the NLU engine 326 still cannot determine the user's intended request, the workaround engine 328 generates another request and the process is repeated.

Examples of types of requests for additional information may include, but are not limited to, one or more of a request to confirm whether proposed information is correct, a request for the user to repeat the original request in its entirety, a request for the user to clarify a portion of the original request, a request for the user to select from a list of options, etc. For clarity and convenience, it may be beneficial to discuss the operation of the workaround engine 328 in the context of the following scenario. Assume the user requests "navigate to 1234 Fake Street, Any Town, California." However, for whatever reason (e.g., background noise, the user's accent, an error in the speech recognition), the NLU engine 326 understood only "navigate" and "California," so the NLU engine 326 does not understand the user's intended request.

In some embodiments, the workaround engine 328 generates a request to confirm whether proposed information is correct. In some embodiments, the system 100 proposes additional information based on machine learning. For example, assume the system has learned that the user drives to 1234 Fake Street, Any Town, California every Wednesday. In one embodiment, the workaround engine 328 proposes the additional information: "You said California. Did you want to go to 1234 Fake Street, Any Town?" In one embodiment, if the user says "yes," the tuple is complete and navigation to the full address is performed; if the user replies "no," the workaround engine 328 generates another request (e.g., a request for the user to select from a list of options or to spell out the destination).

In some embodiments, the workaround engine 328 generates a request for the user to repeat the original request in full. For example, the workaround engine 328 generates the request "I'm sorry. I didn't understand. Will you repeat that?", the request is presented to the user (visually, audibly, or both) via the user device 106, and the user repeats "navigate to 1234 Fake Street, Any Town, California." In one embodiment, the workaround engine 328 does not generate a request for the user to repeat the original request, and one of the other types of requests is used instead. In one embodiment, the workaround engine 328 limits the number of times it generates a request for the user to repeat the original request in full, based on a predetermined threshold (e.g., 0 or 1). In one such embodiment, responsive to the threshold being met, the workaround engine 328 uses a different type of request for additional information (e.g., prompting the user to select from a list of options).

In some embodiments, the workaround engine 328 generates a request for the user to repeat the original request in part or to supply information missing from the original request. For example, assume the workaround engine 328 determines that "navigate" and "California" were understood, determines that a street address and a city are missing, and generates the request "I'm sorry. What was the city in California and the street address?" so that the user may supply the missing information (which was part of the original request). The request is presented to the user (visually, audibly, or both) via the user device 106, and the user may state "1234 Fake Street, Any Town." In one embodiment, the workaround engine 328 limits the number of times it generates a request for the user to repeat the same portion of the original request, based on a predetermined threshold (e.g., 0, 1, or 2). In one such embodiment, responsive to the threshold being met, the workaround engine 328 uses a different type of request for additional information (e.g., prompting the user to select from a list of options).
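The threshold-limited switch between request types can be sketched as a small selection function. The ordering shown (full repeat, then partial repeat, then list selection) and the default threshold values are assumptions; the description only says that a different request type is used once a threshold is met.

```python
def next_request_type(full_repeats, partial_repeats, max_full=1, max_partial=2):
    """Pick the next workaround request type given how often each was already used."""
    if full_repeats < max_full:
        return "repeat_full"
    if partial_repeats < max_partial:
        return "repeat_partial"
    # Thresholds met: fall back to prompting the user to choose from a list.
    return "choose_from_list"


print(next_request_type(0, 0))
print(next_request_type(1, 2))
# A threshold of 0 means that request type is never used.
print(next_request_type(0, 0, max_full=0, max_partial=0))
```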

In some embodiments, the workaround engine 328 generates a request for the user to select from a list of options, sometimes referred to as a "default list." For example, assume the workaround engine 328 determines that "navigate" and "California" were understood and that the street address and city are missing, generates the request "What letter does the destination city start with?", and generates a list of options such as "A through E is 1, F through J is 2, ... etc." The request is presented to the user (visually, audibly, or both) via the user device 106, and the user may say or select "1" or may select by speaking the content of the option, "A through E." Because the NLU engine 326 still cannot determine the user's intended request from "navigate" and a California city beginning with a letter between 'a' and 'e' (inclusive), the workaround engine 328 generates another list of options such as "A is 1, B is 2, ... etc." That request is likewise presented to the user (visually, audibly, or both) via the user device 106, and the user may say or select "1" or may select by the content of the option, "A." The workaround engine 328 may continue filtering the options and generating requests with lists of the filtered options until "Any Town" is identified as the city, "Fake Street" as the street, and "1234" as the street number.
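The letter-range narrowing performed by engine 328 in the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names `letter_buckets` and `filter_by_first_letter`, the bucket size of five, and the sample city names are all hypothetical.

```python
import string


def letter_buckets(letters: str, size: int = 5) -> dict[int, str]:
    """Split letters into numbered ranges, e.g. {1: "ABCDE", 2: "FGHIJ", ...}."""
    chunks = [letters[i:i + size] for i in range(0, len(letters), size)]
    return {number: chunk for number, chunk in enumerate(chunks, start=1)}


def filter_by_first_letter(candidates: list[str], letters: str) -> list[str]:
    """Keep only candidates whose first letter falls in the selected range."""
    allowed = set(letters.upper())
    return [c for c in candidates if c[:1].upper() in allowed]


# First prompt: "A through E is 1, F through J is 2, ..."
first = letter_buckets(string.ascii_uppercase)
# The user picks "1", so the second prompt lists single letters: "A is 1, B is 2, ..."
second = letter_buckets(first[1], size=1)

cities = ["Any Town", "Bakersfield", "Fresno"]      # hypothetical candidates
after_first = filter_by_first_letter(cities, first[1])
after_second = filter_by_first_letter(after_first, second[1])
```

Each answer shrinks the candidate set, so the dialogue can continue with progressively shorter lists until a single city remains.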

Depending on the embodiment, the options may be listed visually on a display of the client device, read to the user 112 via the client device 106 using text-to-speech, or both. In one embodiment, list options are presented in groups (e.g., groups of three to five). For example, a list of eight options may be divided into two sets and presented as a first set of four options; the user may request the next set by saying "next," whereupon the second set of four options is presented. Limiting the number of options presented at one time may reduce the chance that the user is overwhelmed and may improve usability. To navigate lists of options divided into multiple sets, in one embodiment, the user may use commands such as "start" to go to the first set of the list, "end" to go to the end of the list, "next" to go to the next set, "previous" to go to the previous set, or "go to ___" (e.g., "go to letter V") to navigate or filter by letter.
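The set-by-set presentation and the navigation commands described above could be modeled roughly as below. This is a sketch only: the class name `OptionPager`, the set size of four, the clamping behavior at either end of the list, and the sample options are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class OptionPager:
    """Presents a list of options in fixed-size sets (hypothetical helper)."""
    options: list[str]
    page_size: int = 4
    page: int = 0

    def _last_page(self) -> int:
        return max(0, (len(self.options) - 1) // self.page_size)

    def current(self) -> list[str]:
        start = self.page * self.page_size
        return self.options[start:start + self.page_size]

    def handle(self, command: str) -> list[str]:
        """Apply a spoken navigation command and return the set to present."""
        words = command.lower().split()
        if words == ["start"]:
            self.page = 0
        elif words == ["end"]:
            self.page = self._last_page()
        elif words == ["next"]:
            # Assumption: stay on the last set instead of wrapping around.
            self.page = min(self.page + 1, self._last_page())
        elif words == ["previous"]:
            self.page = max(self.page - 1, 0)
        elif len(words) >= 3 and words[:2] == ["go", "to"]:
            # "go to letter V" or "go to V": jump to the set holding the
            # first option that starts with the requested letter.
            letter = words[-1][0]
            for index, option in enumerate(self.options):
                if option.lower().startswith(letter):
                    self.page = index // self.page_size
                    break
        return self.current()


cities = ["Anaheim", "Berkeley", "Carlsbad", "Davis",
          "Eureka", "Fresno", "Glendale", "Visalia"]  # sample data
pager = OptionPager(cities)
```

Unrecognized commands leave the pager on its current set, which keeps the dialogue stable when speech recognition misfires.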

In some embodiments, the dialogue resulting from the requests of the workaround engine 328 may switch between request types in any order. For example, in one embodiment, once the user has selected an option, the workaround engine 328 may prompt the user for additional information without a list of options. For example, upon receiving or determining that "Any Town" is the city using the list of options as described above, the workaround engine 328 may generate the request "What is the name of the street in Any Town, California?" and the user may respond verbally with "Fake Street." If the response "Fake Street" cannot be understood, in one embodiment, the workaround engine 328 may ask the user to repeat it or ask the user to select from a list of options generated by the workaround engine 328.

In some embodiments, the requests generated by the workaround engine 328 are designed to minimize or eliminate the need for the user to respond in the negative (e.g., to say "No"). For example, the workaround engine 328 generates a list of options for the first letter of the city and asks the user to select the appropriate option, rather than sending requests along the lines of "Does the California city start with the letter A?", which would be answered "yes" in the example above but would likely be answered "no" in other cases.

It will be appreciated that the "navigate to 1234 Fake Street ..." example above is one use case and that many other use cases exist. For example, assume the user requests "Call Greg" and has multiple contacts named Greg in the address book (e.g., Greg R., Greg S., and Greg T.). In one embodiment, the workaround engine 328 sends a request with a list of options, "Which Greg would you like to call? Greg R. is 1, Greg S. is 2, Greg T. is 3," and the user may say the number associated with the desired Greg.
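A numbered disambiguation prompt of the kind used in the "Call Greg" example might be built and resolved as in the following sketch. The helper names `build_choice_prompt` and `resolve_choice` are hypothetical, and accepting either the number or the option text as a reply is an assumption modeled on the list-selection behavior described earlier.

```python
from typing import Optional


def build_choice_prompt(question: str, options: list[str]) -> tuple[str, dict[str, str]]:
    """Return the prompt text and a mapping from spoken number to option."""
    numbered = {str(n): option for n, option in enumerate(options, start=1)}
    listing = ", ".join(f"{option} is {n}" for n, option in numbered.items())
    return f"{question} {listing}.", numbered


def resolve_choice(reply: str, numbered: dict[str, str]) -> Optional[str]:
    """Map the user's reply (a number or the option text) to an option."""
    reply = reply.strip()
    if reply in numbered:
        return numbered[reply]
    for option in numbered.values():
        if option.lower() == reply.lower():
            return option
    return None  # not understood; the caller may re-prompt


prompt, numbered = build_choice_prompt(
    "Which Greg would you like to call?", ["Greg R.", "Greg S.", "Greg T."]
)
```

Because every option carries a number, the user never has to answer "no"; an unrecognized reply simply returns `None` so that the dialogue can try a different request type.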

Moreover, although in the examples above part of the original request, namely the actor (i.e., the navigation application and the phone application, respectively) and part of the entity (i.e., California and Greg, respectively), was understood by the NLU engine 326, the workaround engine 328 may also operate when the original request in its entirety was not understood by the NLU engine 326 or when other parts of the tuple are missing. For example, the workaround engine 328 may make one or more requests to obtain the desired actor (e.g., the application the user wants to use), the desired action (e.g., a function or feature of the application), or the desired entity (e.g., a target of the action, a recipient of the action, an input for the action, etc.). In one embodiment, the workaround engine 328 generates requests at the request of the NLU engine 326 or until the NLU engine 326 has a complete tuple representing the user's intended request. In another example, assume the NLU engine 326 understood the message but not the actor (e.g., which service of a unified messaging client to use: email, SMS, Facebook, etc.) or the entity (e.g., the recipient); in one embodiment, the workaround engine 328 requests this additional information.

It will be appreciated that the features and functionality discussed above with reference to the workaround engine 328 may beneficially provide automatic troubleshooting mechanisms by which the user's intended request can be determined and ultimately executed without requiring the user to type out portions of the request (e.g., the user may speak and/or make simple selections via a touch screen or other input). Typing may be dangerous or illegal in some constrained operating environments (e.g., while driving), so avoiding it increases the safety of the user 112 and of those around the user 112. It will further be appreciated that these features and functionality may beneficially result in greater user satisfaction, since the system 100 is less likely to "give up" on the user or push the user to a default such as a web search.

In one embodiment, the workaround engine 328 passes the request for additional information to one or more of a text-to-speech engine 119 and a graphics engine (not shown) for displaying content on the display of the client device. In another embodiment, the workaround engine 328 stores the request for additional information in the storage device 241 (or any other communicatively accessible non-transitory storage medium). Other components of the system 100, including, e.g., the text-to-speech engine 119 and/or the graphics engine (not shown), can retrieve the request for additional information by accessing the storage device 241 (or other non-transitory storage medium) and send it for presentation to the user 112 via the client device 106.

The connection engine 330 includes code and routines for processing the user's intended request. In one embodiment, the connection engine 330 is a set of instructions executable by the processor 202. In another embodiment, the connection engine 330 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the connection engine 330 is adapted for cooperation and communication with the processor 202, other components of the client device 106, and other components of the system 100.

In one embodiment, the connection engine 330 includes a library of modules (not shown). A module may include a set of code and routines that exposes the functionality of an application. For example, a phone module exposes the functionality of a phone application (e.g., placing a call, receiving a call, retrieving voicemail, accessing a contact list, etc.). In one embodiment, the module exposes the functionality of an application (e.g., a phone application) so that the user can access that functionality on one client device (e.g., a phone) through another client device 106 (e.g., a car). In some embodiments, certain features and functionality may require the presence of a specific device or device type. For example, in some embodiments, phone or SMS text functionality may not be available through a car unless the car is communicatively coupled with a phone. The library of modules and the modular nature of the modules may facilitate easy updating when applications are updated or when it becomes desirable for the voice and connection engine to interface with new applications.

In some embodiments, when a function takes a long time to complete (e.g., generating a long report), the agent/assistant will inform the user when the function is finished (e.g., via TTS, email, SMS text, etc.). In one such embodiment, the system 100 determines the quickest way to get in touch; for example, the system determines that the user is logged into Facebook and sends the user a Facebook message stating that the function is complete.

In one embodiment, the voice assistant of the system 100 includes one or more modules for interacting with one or more other voice assistants (e.g., Apple's Siri, Microsoft's Cortana, Google's Google Now, etc.). For example, in one embodiment, in response to the user providing voice input that includes a shortcut or keyword such as "Search Google Now for X" or "Ask Siri Y," the connection engine 330 selects the module for connecting to or interacting with Google Now or Siri, respectively, and forwards the query to that voice assistant. In one embodiment, the voice and connection engine 109/124 may monitor voice inputs for a wake-up word that triggers the personal assistant of the system 100 to resume control of the flow of the user experience (e.g., to resume a dialogue or provide functionality and assistance). Such an embodiment beneficially allows an entity operating the system 100 to give its customers access to other voice assistants and their features. For example, a car manufacturer may beneficially let a customer access the voice assistant of that customer's mobile phone (e.g., Siri when the customer uses an iPhone) or supplement the customer's voice assistant options with another voice assistant (e.g., provide access to Google Now and/or Cortana when the customer uses an iPhone).

The connection engine 330 processes the user's intended request. In one embodiment, the connection engine 330 receives the tuple from the NLU engine 326, determines a module (e.g., the phone module) based on the actor in the tuple (e.g., phone), and provides the action (e.g., call) and the entity/item (e.g., Greg) of the tuple to the determined module, which causes the actor application to perform the action using the entity/item (e.g., causes the phone application to call Greg).
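The actor-based dispatch performed by connection engine 330 can be illustrated with the short Python sketch below. The names `Request`, `PhoneModule`, `ConnectionEngine`, `register`, and `handle` are hypothetical, and looking up the action as a method on the module is only one possible design; the disclosure does not specify how a module exposes its functions.

```python
from typing import NamedTuple


class Request(NamedTuple):
    """The (actor, action, entity) tuple produced by the NLU step."""
    actor: str
    action: str
    entity: str


class PhoneModule:
    """Hypothetical module exposing a phone application's functions."""

    def call(self, entity: str) -> str:
        # A real module would drive the phone application here.
        return f"calling {entity}"


class ConnectionEngine:
    """Selects a module by actor and hands it the action and entity."""

    def __init__(self) -> None:
        self._modules: dict[str, object] = {}

    def register(self, actor: str, module: object) -> None:
        self._modules[actor] = module

    def handle(self, request: Request) -> str:
        module = self._modules.get(request.actor)
        if module is None:
            raise LookupError(f"no module for actor {request.actor!r}")
        action = getattr(module, request.action, None)
        if not callable(action):
            raise LookupError(f"{request.actor!r} has no action {request.action!r}")
        return action(request.entity)


engine = ConnectionEngine()
engine.register("phone", PhoneModule())
result = engine.handle(Request(actor="phone", action="call", entity="Greg"))
```

Keeping modules in a registry keyed by actor mirrors the library-of-modules idea: supporting a new application only requires registering another module.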

Exemplary Server-Side Voice and Connection Engine 124

Referring now to FIG. 4, the server-side voice and connection engine 124 is shown in more detail according to one embodiment. In the illustrated embodiment, the server-side voice and connection engine 124 comprises a context agent 422, a context engine 424, and a federation engine 426. It will be appreciated that the components 422, 424, 426 of the server-side voice and connection engine 124 are not necessarily all on the same voice and connection server 122. In one embodiment, the modules 422, 424, 426 and/or their functionality are distributed across multiple voice and connection servers 122.

The context agent 422 includes code and routines for synchronizing the context between the client device 106 and the voice and connection server 122 and for keeping it synchronized. In one embodiment, the context agent 422 is a set of instructions executable by the processor 202. In another embodiment, the context agent 422 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the context agent 422 is adapted for cooperation and communication with the processor 202, other components of the voice and connection server 122 (e.g., via a bus 206), other components of the system 100 (e.g., client devices 106 via a communication unit 208), and other components of the server-side voice and connection engine 124.

As discussed above with reference to the client-side context holder 324, the context agent 422 operates as the server-side context holder and is synchronized with the client-side context holder 324. In one embodiment, if the client-side and server-side contexts are not identical, the client side supersedes the server side. This may be beneficial because the client side interacts more directly with the user 112 and is therefore more likely to have accurate real-time data (e.g., location, luminosity, local time, temperature, speed, etc.) for defining the context: for example, the associated sensors are located at the client device 106, and the reliability of the network 102 may affect the server side's ability to maintain an accurate and up-to-date context.

In one embodiment, the context agent 422 passes the current context to the context engine 424. For example, the context agent 422 is communicatively coupled to the context engine 424 to send the current context. In one embodiment, the context agent 422 stores the current context in the storage device 241 (or any other communicatively accessible non-transitory storage medium), and the context engine 424 can retrieve the current context by accessing the storage device 241 (or other non-transitory storage medium).

The context engine 424 includes code and routines for generating and maintaining one or more contexts. In one embodiment, the context engine 424 is a set of instructions executable by the processor 202. In another embodiment, the context engine 424 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the context engine 424 is adapted for cooperation and communication with the processor 202, other components of the server-side voice and connection platform 124, and other components of the system.

In one embodiment, the context engine 424 archives the current context in order to create a history of contexts. Such an embodiment may be used in conjunction with machine learning to recognize patterns or habits, predict the next step in a workflow, and so on, in order to inform the understanding of the NLU engine 326 or to proactively initiate a dialogue. For example, assume user x is a closed profile from a group of user type X; in one embodiment, the context engine 424 detects the differences between x and everyone else in the group in order to capture particular behaviors, habits, queries, etc., and to create proactivity toward the user. For example, assume the user is asking about a theater and the context engine 424 detects that other users in the same group like a particular Japanese restaurant; in one embodiment, the system 100 proactively suggests that the user make a reservation at that Japanese restaurant after the movie, because the system 100 detected from the user's schedule that the user will not have time before the movie. In some embodiments, the system 100 may access an API for the restaurant's menu (some websites provide this kind of API). The system 100 may understand that the menu or the daily specials fit the user's preferences well and, in the agent's answer, read the menu or daily special directly to catch the user's attention.

The federation engine 426 includes code and routines for managing one or more of a user's accounts and client devices 106. In one embodiment, the federation engine 426 is a set of instructions executable by the processor 202. In another embodiment, the federation engine 426 is stored in the memory 204 and is accessible and executable by the processor 202. In either embodiment, the federation engine 426 is adapted for cooperation and communication with the processor 202, other components of the voice and connection server 122, and other components of the server-side voice and connection engine 124.

In one embodiment, the federation engine 426 manages a unified identity. A unified identity may include, but is not limited to, one or more of the user's accounts (e.g., Facebook, Google+, Twitter, etc.), the user's client devices 106 (e.g., tablet, mobile phone, TV, car, etc.), and previous voice inputs and dialogues, in order to enhance the user experience based on the user's social networks and/or habits. The unified identity provides aggregated information about the user, which may enhance the features and functionality of the system 100. For example, assume the user 112 provides the input "I need gas." In one embodiment, access to the aggregated data of the unified identity may allow the system 100 to understand that the user's intended request is for directions to a gas station and that the gas station should be on the way to the user's favorite bar. For instance, the system may direct the user to a gas station of the brand the user is loyal to, with the lowest gas price, that lies in the direction of travel along the route to the bar, even though a closer gas station is behind the user, or is closer but out of the way from where the system 100 determines the user is heading, because it is after 6 pm on a Friday and the aggregated data indicates that the user heads to a favorite bar after work on Fridays. In another example, the system 100 may use aggregated data to select a particular restaurant and direct the user to it (e.g., based on aggregated data such as previous reservations made using a service like OpenTable, the user's restaurant reviews on Yelp, and previous voice queries and dialogues between the user 112 and the system 100 regarding food).

The federation engine 426 manages the user's devices in order to coordinate the user's transition from one client device 106 to another. For example, assume the user 112 requested today's headlines via the user's tablet (e.g., a client device 106) and the system 100 begins reading the headlines to the user 112. Also assume the user 112 then realizes he or she is going to be late for work and asks the system to stop reading the headlines. In one embodiment, the federation engine 426 manages the user's transition from the tablet to the user's car (i.e., another client device 106), so that the user 112, once in the car, may ask the system 100 to continue, and the system 100 will continue reading the headlines from where it left off on the tablet. The federation engine 426 may also propose and manage a transition to the user's mobile phone (i.e., yet another client device 106) when the user arrives at work. Such embodiments beneficially provide continuity of service, or "continuous service," from one client device 106 to another. In another example, the user may plan a road trip via a tablet on the sofa and have the route mapped in the car's navigation system. In one embodiment, the system 100 may recognize that the user has a habit of reviewing headlines before work and continuing in the car on the way to work, and may prompt the user on the tablet when it is time to leave for work (perhaps based on real-time traffic data) and ask whether the user would like to resume the headlines in the car.

In one embodiment, the federation engine 426 passes a context from one client device 106 to another in order to manage the transition to the recipient device. For example, the federation engine 426 is communicatively coupled to the client-side context holder 324 of the recipient device. In another embodiment, the federation engine 426 stores the current context in the storage device 241 of the server 122 (or any other communicatively accessible non-transitory storage medium), and the client-side context holder 324 of the recipient device 106 can retrieve the current context by accessing the storage device 241 (or other non-transitory storage medium).
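The storage-based handoff described above, in which the context is stored and then retrieved by the recipient device, might look like the following sketch. Everything here is an assumption made for illustration: the in-memory `ContextStore` merely stands in for storage device 241, and the context keys (`device`, `task`, `next_headline`) are invented to mirror the headline-reading example.

```python
class ContextStore:
    """In-memory stand-in for storage device 241 (hypothetical)."""

    def __init__(self) -> None:
        self._by_user: dict[str, dict] = {}

    def save(self, user_id: str, context: dict) -> None:
        self._by_user[user_id] = dict(context)  # store a copy

    def load(self, user_id: str) -> dict:
        return dict(self._by_user.get(user_id, {}))


def hand_off(store: ContextStore, user_id: str,
             source_context: dict, target_device: str) -> dict:
    """Persist the source device's context and return the copy that the
    recipient device would retrieve in order to resume the task."""
    store.save(user_id, source_context)
    resumed = store.load(user_id)
    resumed["device"] = target_device
    return resumed


store = ContextStore()
tablet_context = {"device": "tablet", "task": "read_headlines", "next_headline": 3}
car_context = hand_off(store, "user-112", tablet_context, "car")
```

Because the reading position travels with the context, the car can resume from the fourth headline (index 3 in this sketch) without the user repeating the request.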

Exemplary Methods

FIGS. 5, 6, and 7 depict various methods 500, 508, 700 performed by the system described above with reference to FIGS. 1 to 4.

Referring to FIG. 5, an exemplary method 500 for receiving and processing a request using the voice and connection platform according to one embodiment is shown. At block 502, the NLU engine 326 receives recognized speech. At block 504, the NLU engine 326 receives context. At block 506, the NLU engine 326 optionally pre-processes the recognized speech based on the context received at block 504. At block 508, the NLU engine 326 determines the user's intended request. At block 510, the connection engine 330 processes the intended request, and the method 500 ends.

Referring to FIG. 6, an exemplary method 508 for determining the user's intended request according to one embodiment is shown. At block 602, the NLU engine 326 generates a tuple based on the user's request and the context. At block 604, the NLU engine 326 determines whether additional information is needed to complete the tuple. When the NLU engine 326 determines that no additional information is needed to complete the tuple (604-No), the method 508 ends. When the NLU engine 326 determines that additional information is needed to complete the tuple (604-Yes), the method 508 continues at block 606.

At block 606, the workaround engine 328 determines what additional information is needed to complete the tuple and, at block 608, generates a prompt for the user to provide the needed additional information. At block 610, the NLU engine 326 modifies the tuple based on the user's response to the prompt generated at block 608, and the method continues at block 604. Blocks 604, 606, 608, and 610 are repeated until the NLU engine 326 determines that no additional information is needed to complete the tuple (604-No), at which point the method 508 ends.
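The loop through blocks 604 to 610 of method 508 can be sketched as follows. This is an illustrative outline rather than the method of the disclosure: the tuple is modeled as a dictionary with `actor`, `action`, and `entity` keys, the `ask` callback stands in for presenting a prompt and recognizing the reply, and the `max_prompts` safeguard is an added assumption, since the text describes repeating until the tuple is complete.

```python
from typing import Callable

REQUIRED_FIELDS = ("actor", "action", "entity")


def missing_fields(request: dict[str, str]) -> list[str]:
    """Block 604/606: list the tuple fields that are still empty."""
    return [field for field in REQUIRED_FIELDS if not request.get(field)]


def determine_request(request: dict[str, str],
                      ask: Callable[[str], str],
                      max_prompts: int = 10) -> dict[str, str]:
    """Prompt for missing fields until the tuple is complete."""
    request = dict(request)  # do not mutate the caller's tuple
    prompts = 0
    while True:
        missing = missing_fields(request)      # block 604
        if not missing:
            return request                     # 604-No: method ends
        if prompts >= max_prompts:
            raise RuntimeError("tuple still incomplete after prompting")
        field = missing[0]                     # block 606
        reply = ask(field)                     # block 608
        prompts += 1
        if reply:
            request[field] = reply             # block 610


replies = iter(["", "Greg"])  # first reply not understood, second one is
completed = determine_request(
    {"actor": "phone", "action": "call"}, lambda field: next(replies)
)
```

An empty reply leaves the tuple unchanged, so the same field is requested again on the next pass, which corresponds to the repeat-or-rephrase behavior described for engine 328.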

Referring to FIG. 7, an exemplary method 700 for receiving and processing a request using the voice and connection platform according to another embodiment is shown.

In the above description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be appreciated that the technology described herein can be practiced without these specific details. Further, various systems, devices, and structures are shown in block diagram form in order to avoid obscuring the description. For instance, various implementations are described as having particular hardware, software, and user interfaces. However, the present disclosure applies to any type of computing device that can receive data and commands, and to any peripheral devices providing services.

명세서에서의 "일 실시예" 또는 "실시예"에 대한 언급은 그 실시예와 관련하여 기술된 특정의 특징, 구조, 또는 특성이 적어도 하나의 실시예에 포함된다는 것을 의미한다. 명세서의 여러 곳에서 나오는 "일 실시예에서"라는 문구 모두가 꼭 동일한 실시예를 지칭하는 것은 아니다.Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The phrases "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.

어떤 경우에, 다양한 구현들이 컴퓨터 메모리 내의 데이터 비트들에 대한 연산들의 알고리즘들 및 심볼 표현들로 본원에서 제시될 수 있다. 알고리즘은 여기서 일반적으로 원하는 결과를 가져오는 자체 일관성있는 한 세트의 동작들인 것으로 생각된다. 동작들은 물리적 양들의 물리적 조작들을 필요로 하는 것이다. 꼭 그럴 필요는 없지만, 보통 이 양들은 저장, 전송, 결합, 비교, 그리고 다른 방식으로 조작될 수 있는 전기 또는 자기 신호들의 형태를 갖는다. 원칙적으로 흔히 사용되기 때문에, 이 신호들을 비트, 값, 요소, 심볼, 문자, 용어, 숫자 등으로 지칭하는 것이 때로는 편리한 것으로 밝혀졌다.In some cases, various implementations may be presented herein as algorithms and symbolic representations of operations on bits of data within a computer memory. An algorithm is here generally conceived of as being a self-consistent set of operations that produces a desired result. Operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transmitted, combined, compared, and otherwise manipulated. Because of their common usage in principle, it has been found convenient at times to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, etc.

그렇지만, 이들 및 유사한 용어들 모두가 적절한 물리적 양들과 연관되어 있고 이 양들에 적용되는 편리한 명칭들에 불과하다는 것을 염두에 두어야 한다. 달리 구체적으로 언급하지 않는 한, 이하의 논의로부터 명백한 바와 같이, 본 개시내용 전체에 걸쳐, "처리" 또는 "계산" 또는 "산출" 또는 "결정" 또는 "디스플레이" 등을 비롯한 용어들을 이용하는 논의들이, 컴퓨터 시스템의 레지스터들 및 메모리들 내에 물리적(전자적) 양들로 표현된 데이터를, 컴퓨터 시스템 메모리들 또는 레지스터들 또는 다른 이러한 정보 저장, 전송 또는 디스플레이 디바이스들 내의 물리적 양들로 유사하게 표현되는 다른 데이터로 조작하고 변환하는 컴퓨터 시스템 또는 유사한 전자 컴퓨팅 디바이스의 동작 및 프로세스들을 지칭한다는 것을 잘 알 것이다.It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient designations applied to these quantities. Unless specifically stated otherwise, throughout this disclosure, as will be apparent from the discussion below, discussions using terms including "processing" or "compute" or "calculate" or "determining" or "display" and the like , data represented in physical (electronic) quantities in registers and memories of a computer system, to other data similarly represented in physical quantities in computer system memories or registers or other such information storage, transmission or display devices. It will be understood that reference to the operations and processes of a computer system or similar electronic computing device that manipulates and transforms.

Various implementations described herein relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer-readable storage medium, including, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic disks; read-only memories (ROMs); random access memories (RAMs); EPROMs; EEPROMs; magnetic or optical cards; flash memories including USB keys with non-volatile memory; or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The technology described herein can take the form of an entirely hardware implementation, an entirely software implementation, or an implementation containing both hardware and software elements. For instance, the technology may be implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.

Furthermore, the technology can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any non-transitory storage apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems, storage devices, remote printers, etc., through intervening private and/or public networks. Wireless (e.g., Wi-Fi™) transceivers, Ethernet adapters, and modems are just a few examples of network adapters. The private and public networks may have any number of configurations and/or topologies. Data may be transmitted between these devices via the networks using a variety of different communication protocols including, for example, various Internet layer, transport layer, or application layer protocols. For example, data may be transmitted via the networks using transmission control protocol / Internet protocol (TCP/IP), user datagram protocol (UDP), transmission control protocol (TCP), hypertext transfer protocol (HTTP), secure hypertext transfer protocol (HTTPS), dynamic adaptive streaming over HTTP (DASH), real-time streaming protocol (RTSP), real-time transport protocol (RTP) and real-time transport control protocol (RTCP), voice over Internet protocol (VOIP), file transfer protocol (FTP), WebSocket (WS), wireless access protocol (WAP), various messaging protocols (SMS, MMS, XMS, IMAP, SMTP, POP, WebDAV, etc.), or other known protocols.

Finally, the structure, algorithms, and/or interfaces presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method blocks. The required structure for a variety of these systems will appear from the description above. In addition, the specification is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the specification as described herein.

The foregoing description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the specification to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the disclosure be limited not by this detailed description, but rather by the claims of this application. As will be understood, the specification may be embodied in other specific forms without departing from its spirit or essential characteristics. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies, and other aspects are not mandatory or significant, and the mechanisms that implement the specification or its features may have different names, divisions, and/or formats.

Furthermore, the engines, modules, routines, features, attributes, methodologies, and other aspects of the disclosure can be implemented as software, hardware, firmware, or any combination of the foregoing. Also, wherever a component of the specification, an example of which is a module, is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future. Additionally, the disclosure is in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the subject matter set forth in the following claims.

Appendix A: Car Personal Assistant and GoPad

GoPad Project Summary

GoPad is an accessory product that will generate Android device-user and vehicle-user behavioral data by providing a safer and more convenient in-car Android device experience. GoPad will integrate selected Android devices more closely into vehicles. However, GoPad is not limited to integration with Android devices and may integrate with other devices (e.g., iOS, Windows, Fire, etc.).

The GoPad device is a hardware cradle that will be secured to the dashboard of the user's vehicle, near the windshield, via a clip mechanism. It will provide the following features:

An OBD2 Reader hardware device that captures vehicle information and transmits it to systems for analysis and presentation to the user

A Bluetooth radio and dual microphones in the cradle to provide hands-free capability in vehicles without a built-in Bluetooth connection

Hands-free mobile phone use, including voice dialing and control, with audio via an Aux-in connection to the vehicle stereo system

Hands-free navigation, including voice initiation and voice control, with audio via an Aux-in connection to the vehicle stereo system

Media playback with audio output to the car stereo via an Aux-in stereo connection

Power to the Android device via USB (vehicle auxiliary power port) for charging and use

Intelligent agent assistance for all voice control functions via the voice and connection platform

Cloud-connected web services for the intelligent agent, user data capture, and content delivery via the voice and connection platform

Driving efficiency and feedback features on the Android device to improve the user's driving experience

An optimized set of physical controls on the cradle to further enable eyes-free use of the Android device

A simple app launcher mechanism that lets drivers easily and safely launch the apps they want to use

A simple physical/agent controls API that allows third-party software to use the cradle's physical buttons

Hands-free reading of incoming text messages

Hands-free reading of Facebook activity

Cradle Hardware

Cradle design

Mechanical design

The cradle will be designed in two parts: 1) a base cradle unit, and 2) a device-specific adapter. All main functionality will go into the base cradle unit, and the adapter provides only the Android device-specific physical and electrical fit.

The physical form factor of the cradle must accommodate the device plus adapter (securely), the specified physical controls, and the cradle motherboard while minimizing size and volume. It must not be possible to insert the device backwards or upside down.

Cooling of the cradle electronics will be passive, via vents that are either hidden from the user's view as much as possible or incorporated into the design.

Industrial design

The overall design of the cradle must help the user complete actions with as little direct observation/interaction as possible. Buttons should have tactile differentiation, audible/tactile cues should be used where appropriate, and so on.

The cradle industrial design is to be determined, but a very high fit-and-finish level is required for public demonstrations. The cradle should feel like an actual commercial product built to a luxury-goods level. It should not feel out of place in a top-of-the-line Audi or Mercedes vehicle interior, and should match those interiors in material quality and appearance.

Finish material exploration must include paint, machined metals, machined plastics, rubberized paints, etc.

Physical controls

Buttons

The cradle will include a selection of physical controls (buttons) to aid eyes-free ease of use.

The following buttons are required:

Agent button: activate voice control, activate the app launcher, etc.

Forward button: next media track, end/reject phone call

Back button: previous media track, answer phone call

Play/Pause button: play or pause media playback, mute phone call

The buttons enable multiple overloaded actions based on how they are used (single press, double press, long press, etc.).
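As one hedged illustration of how overloaded button actions might be distinguished, the sketch below classifies timestamped presses of a single button into single, double, and long presses. The `Press` type and both timing thresholds are assumptions chosen for the example; the appendix does not specify timing values.

```python
from dataclasses import dataclass
from typing import List

LONG_PRESS_S = 0.6   # assumed: a press held at least this long is a long press
DOUBLE_GAP_S = 0.35  # assumed: max gap between two short presses of a double press


@dataclass
class Press:
    down: float  # time the button went down, in seconds
    up: float    # time the button was released, in seconds


def classify(presses: List[Press]) -> List[str]:
    """Turn a time-ordered list of presses into 'single'/'double'/'long' gestures."""
    gestures: List[str] = []
    i = 0
    while i < len(presses):
        current = presses[i]
        if current.up - current.down >= LONG_PRESS_S:
            gestures.append("long")
            i += 1
            continue
        following = presses[i + 1] if i + 1 < len(presses) else None
        is_double = (
            following is not None
            and following.down - current.up <= DOUBLE_GAP_S
            and following.up - following.down < LONG_PRESS_S
        )
        if is_double:
            gestures.append("double")
            i += 2  # both short presses are consumed by the double press
        else:
            gestures.append("single")
            i += 1
    return gestures
```

A real cradle would classify presses as they arrive rather than from a completed list; the batch form is used here only to keep the example short.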

Lighting

Backlighting/highlighting of the physical controls is needed for use in low-light environments. The lighting/legends should behave as follows:

Forward/End Call button: the default lighting should be used except when a phone call is active. When a call is active, the End Call legend should be illuminated until the call ends.

Back/Answer Call button: the default lighting should be used except when a phone call is incoming.

Play/Pause, Mute button: when a call is active, the Mute Call legend should be illuminated. When the button is pressed, the call should enter the muted state, and the Mute legend backlight should turn red to indicate the muted state. Pressing the button again will toggle the muted state and the legend backlight color.

An unobtrusive and/or attractive pilot light is needed to indicate that the cradle is powered on.
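The legend behavior described above can be pictured as a small state holder, sketched below under stated assumptions. The class name, the legend labels, the lighting of an Answer Call legend during an incoming call, and the unmuted legend color are illustrative assumptions; the appendix states only that the Mute legend turns red when muted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LegendState:
    call_active: bool = False
    call_incoming: bool = False
    muted: bool = False

    def press_play_pause(self) -> None:
        """During a call, the Play/Pause button toggles the muted state."""
        if self.call_active:
            self.muted = not self.muted

    def end_call(self) -> None:
        """Ending the call returns the mute state to its default."""
        self.call_active = False
        self.muted = False

    def legends(self) -> Dict[str, str]:
        """Return which legend each button shows in the current state."""
        if self.call_active:
            play_pause = "mute_red" if self.muted else "mute"
        else:
            play_pause = "default"
        return {
            "forward": "end_call" if self.call_active else "default",
            # Assumption: an Answer Call legend is lit while a call is incoming.
            "back": "answer_call" if self.call_incoming else "default",
            "play_pause": play_pause,
        }
```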

Upgradable firmware

The cradle firmware is designed so that field upgrades can be performed under the control of the GoPad Android application running on the device.

A mechanism exists to recover from a corrupted firmware update, such as may result from the device being removed from the cradle during an update operation.

USB audio

The cradle design can accommodate accepting USB audio from the device (when the device has that capability) and relaying it to the cradle line-out for playback via the car stereo Aux-in.

Power

Maximum power supply

The cradle will be able to supply 2 A at 5.1 V to the device at all times, in addition to its own power needs.

Device charging

The cradle can supply sufficient power to each device so that the device's state of charge can increase while the following functions are in use simultaneously:

Hands-free phone call in progress

Hands-free navigation in progress

Media playback in progress (possibly paused)

Unique device and version ID

The cradle can support a unique device ID as well as both a hardware and a firmware version number. The Android application will be able to read/query these unique IDs.

Cradle logging

The cradle can support activity logging for software development and debugging purposes. These logs can be accessible to the Android application.

Examples of items to log include, but are not limited to: USB connection state, button presses, Bluetooth connection state, etc.

Cables

The required cables are:

USB cable (for power)

Stereo auxiliary cable (for audio output)

OBD2 Reader

A hardware OBD2 Reader device is required. This device will collect vehicle information and upload it to OPI systems for analysis and subsequent presentation to the user.

The OBD2 Reader module will include a Bluetooth radio and will collect information whenever the GoPad is in use. It transmits the information to the device, which subsequently uploads it to OPI systems for analysis.

An alternative OBD2 Reader module that includes a cellular radio, and that collects vehicle information whenever the vehicle is driven regardless of whether the GoPad is in use, is highly desired for future GoPad versions. This solution will be investigated in parallel with GoPad2 development. A third-party partner (OEM manufacturer) is desired.

GoPad-based hands-free capabilities

For vehicles that lack built-in Bluetooth hands-free capabilities, GoPad will provide such features. The following hardware components are required.

Dual microphones

Dual microphones are required, along with echo cancellation and noise suppression technology. A very high level of audio quality is required. It is desired that the person at the remote end of the phone call be unable to determine that the user is speaking via an in-car hands-free device.

The audio quality benchmark device is the Plantronics Voyager Legend BT headset.

Bluetooth radio

The GoPad cradle will include a Bluetooth radio that supports the Hands-Free Profile. The device will auto-connect to the cradle BT radio when it is inserted into the cradle and disconnect when it is removed. If the BT connection drops for any reason, the connection will be re-established immediately.

Android app software - one embodiment of a release

Lightweight Launcher

The Lightweight Launcher will activate automatically when the device is placed into the cradle. If active, it should deactivate when the phone is removed from the cradle. The initial setup experience should be as smooth as possible and should require minimal manual configuration by the user.

In the first release, the launcher provides access to the following functions:

Default shortcut bar:

o Phone calls

o Messages: text, mail, and Facebook messages

o Navigation

o Newscaster: general and topical news + Facebook user timeline

o Media playback: local and online streaming media

Car personal assistant

Application list

Vehicle module

GoPad settings

Upon insertion into the cradle, the launcher will display a splash screen for a short duration. It will then display the Lightweight Launcher home screen and await user input.

A subsequent double press of the Agent button, no matter which application is currently in the foreground, will bring up the Lightweight Launcher and allow the user to select a new function. If the GoPad app is already in the foreground, a double press of the Agent button will return the user to the home screen.

System volume

The launcher will set the audio output volume to a fixed level (to be determined), and the user will adjust the volume using the vehicle stereo volume control.

Screen brightness

When in the cradle, the device should be set to automatic screen brightness control. When the device is removed from the cradle, this should revert to the user's setting.
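The docking behavior described in this section (launcher activation on insertion, automatic brightness while docked, and restoration of the user's setting on removal) may be sketched as follows. The `DockedDevice` class and its field names are hypothetical and model only the state changes; they are not an Android API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DockedDevice:
    brightness_mode: str = "manual"  # the user's own setting before docking
    brightness_level: int = 80
    launcher_active: bool = False
    _saved: Optional[Tuple[str, int]] = None  # user's settings while docked

    def on_inserted(self) -> None:
        """Device placed in the cradle: remember user settings, go automatic."""
        if self._saved is None:
            self._saved = (self.brightness_mode, self.brightness_level)
        self.brightness_mode = "auto"
        self.launcher_active = True

    def on_removed(self) -> None:
        """Device removed from the cradle: restore exactly what the user had."""
        if self._saved is not None:
            self.brightness_mode, self.brightness_level = self._saved
            self._saved = None
        self.launcher_active = False
```

Saving the settings only when nothing is already saved keeps a repeated insertion event from overwriting the user's original values with the docked ones.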

Physical controls

The physical controls on the cradle will have the following functions, depending on how they are used:

Control | Single click | Double click | Click and hold
Previous | Previous track (media); answer call (phone) | |
Next | Next track (media); end/reject call (phone) | |
Play/Pause | Play/pause toggle (music); mute call (phone) | Media player (GoPad) |
Agent | Initiate/cancel agent | Home screen (GoPad); GoPad launcher (third-party apps) |
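A lookup keyed by control, gesture, and foreground context is one way such a mapping could be represented. The sketch below is illustrative only: the identifiers are hypothetical, and it covers only the single-click entries and the Agent double-press behavior that the surrounding text also describes.

```python
from typing import Dict, Optional, Tuple

# (control, gesture, context) -> action; a context of None matches any context.
ACTIONS: Dict[Tuple[str, str, Optional[str]], str] = {
    ("previous", "single", "media"): "previous_track",
    ("previous", "single", "phone"): "answer_call",
    ("next", "single", "media"): "next_track",
    ("next", "single", "phone"): "end_or_reject_call",
    ("play_pause", "single", "media"): "toggle_play_pause",
    ("play_pause", "single", "phone"): "mute_call",
    ("agent", "single", None): "initiate_or_cancel_agent",
    ("agent", "double", "gopad"): "home_screen",
    ("agent", "double", "third_party"): "gopad_launcher",
}


def dispatch(control: str, gesture: str, context: str) -> Optional[str]:
    """Look up the action, trying the exact context before the wildcard entry."""
    exact = ACTIONS.get((control, gesture, context))
    if exact is not None:
        return exact
    return ACTIONS.get((control, gesture, None))
```

For example, `dispatch("agent", "double", "third_party")` yields `"gopad_launcher"`, while a combination with no entry yields `None`.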

Car personal assistant

The car personal assistant (agent) is activated by a single press of the Agent button. The agent will respond vocally to indicate that it is ready.

The sequence behavior of the Agent button has three modes:

1. Standby mode: the user needs to press the button to activate voice recognition.

2. Speaking mode: the agent is speaking a prompt to the user.

3. Listening mode: the agent is listening to the user's sentence.
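The three modes listed above lend themselves to a small state machine, sketched below. The event names and the specific transitions (in particular, that a button press while speaking or listening cancels back to standby, and that a listening timeout returns to standby) are assumptions made for illustration; the appendix names the modes but does not enumerate every transition.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(Enum):
    STANDBY = "standby"      # waiting for the user to press the Agent button
    SPEAKING = "speaking"    # agent is speaking a prompt to the user
    LISTENING = "listening"  # agent is listening to the user's sentence


# Assumed transition table: (current mode, event) -> next mode.
TRANSITIONS: Dict[Tuple[Mode, str], Mode] = {
    (Mode.STANDBY, "button_press"): Mode.SPEAKING,    # agent announces readiness
    (Mode.SPEAKING, "prompt_done"): Mode.LISTENING,   # then waits for the user
    (Mode.LISTENING, "utterance"): Mode.SPEAKING,     # agent replies or re-prompts
    (Mode.LISTENING, "timeout"): Mode.STANDBY,        # assumed: silence ends the session
    (Mode.SPEAKING, "button_press"): Mode.STANDBY,    # assumed: press cancels
    (Mode.LISTENING, "button_press"): Mode.STANDBY,   # assumed: press cancels
}


def step(mode: Mode, event: str) -> Mode:
    """Return the next mode; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)
```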

In this release, the functions handled by the agent are limited to the following:

In-app navigation between feature categories (Phone, Messages, Navigation, Media, News/Facebook, Vehicle, Settings)

Answering a call, rejecting a call, dialing from contacts, dialing from call history, and dialing an arbitrary number. Because rejecting a call does not appear to be supported by the API, if the user chooses to reject, the ringing should stop and the incoming-call display should be cleared; the call is then allowed to roll over to voicemail naturally, as if the user had not answered it (which is essentially what happened).

Starting/canceling navigation: speaking an address directly or indirectly (by parts of the address: country, town, street, ...), getting an address from contacts, or getting an address from location favorites.

Searching for a local business ("Find me the nearest Starbucks") and starting navigation to it.

o Local businesses are found via the Google Maps API or Yelp; a generic connector is needed so that any local business location source API can be integrated in the future.

Playing local media: playlists/albums/artists/songs/shuffle.

o Online media needs to be integrated in the second version of the CPA: Spotify, Pandora,

Vehicle status warnings (notification only): low fuel, check engine, etc.

Launching third-party applications by name.

Selecting and reading news categories

Reading Facebook updates

A disambiguation function that reduces multiple matches is needed (see the screens below).
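The generic connector mentioned above for local business search could take the form of a common interface that each location source implements. The sketch below is an assumption-laden illustration: `PlaceConnector`, `StaticConnector`, and `nearest` are hypothetical names, the connectors hold in-memory data rather than calling any real service, and distance is approximated on a flat plane for brevity.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    source: str  # which connector produced this result


class PlaceConnector(Protocol):
    """Hypothetical interface each local business source would implement."""

    def search(self, query: str) -> List[Place]: ...


class StaticConnector:
    """Stand-in connector backed by an in-memory list (no network calls)."""

    def __init__(self, places: Sequence[Place]) -> None:
        self._places = list(places)

    def search(self, query: str) -> List[Place]:
        needle = query.lower()
        return [p for p in self._places if needle in p.name.lower()]


def nearest(
    query: str, here: Tuple[float, float], connectors: Sequence[PlaceConnector]
) -> Optional[Place]:
    """Merge results from all connectors and return the closest match, if any."""
    candidates = [p for c in connectors for p in c.search(query)]
    if not candidates:
        return None
    # Flat-plane approximation; adequate only for comparing nearby points.
    return min(candidates, key=lambda p: math.hypot(p.lat - here[0], p.lon - here[1]))
```

Because `nearest` depends only on the `search` method, a new source could be added later by writing one more connector class without changing the calling code.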

General user experience: voice and general patterns

General patterns

The approach to building the application's voice scenarios is based on the following facts:

The likelihood that voice recognition works is very limited

The agent needs to limit negative interactions

The user needs to give as few voice commands as possible to achieve the desired action

The performance of any interaction needs to be evaluated by the time it takes to complete, not by ASR confidence

For this vision to succeed, the agent needs to use an intelligent combination of both types of scenarios: the direct voice pattern and the workaround pattern.

직접 음성 패턴Direct Voice pattern

직접 음성 패턴은 음성 인식의 영역에서 통상적이다. 그의 품질은 ASR의 신뢰도 및 NLU(Natural Language Understanding)의 신뢰도에 의해 유효성 확인된다.The Direct Voice pattern is common in the field of speech recognition. Its quality is validated by the confidence of the ASR and the confidence of the Natural Language Understanding (NLU).

전화 모듈 및 전화를 거는 행동의 경우에, "Bastien Vidal에게 전화해"(하나의 전화 번호를 갖는 고유의 연락처)라고 요구할 수 있고, 에이전트는 곧바로 연락처를 찾을 것이고 Bastien Vidal에게 전화하는 행동을 사용자에게 제안할 것이다.In the case of the phone module and the action of making a call, the user can ask "Call Bastien Vidal" (a unique contact with a single phone number); the agent will find the contact immediately and propose to the user the action of calling Bastien Vidal.

직접 음성 패턴에서의 문제점은 사용자의 음성 질의와 직접 일치하는 것이 없을 때 또는 명확한 행동을 달성하기 위해 사용자로부터 추가 정보를 필요로 할 때 일어나는 것이다.The problem with the Direct Voice pattern arises when there is no direct match for the user's voice query, or when additional information is needed from the user to reach an unambiguous action.

사례의 샘플:Sample of cases:

많은 전화 번호를 갖는 사람에게 전화하고자 하는 것

Trying to call someone who has many phone numbers

많은 전화 번호 및 이메일 주소를 갖는 사람에게 메시지를 보내고자 하는 것

You want to send a message to someone who has many phone numbers and email addresses.

직접 음성 인식에 의한 주소가 틀리고 (운전 중이기 때문에) 아무 것도 타이핑할 수 없는 것.

The address obtained by direct speech recognition is wrong, and nothing can be typed (because the user is driving).

차선책 패턴(WAR)Workaround pattern (WAR)

WAR 패턴은 음성 및 연결 플랫폼이 사람과 기계 사이의 대화 계속(임의의 한 차례의 질문/대답 후에, 에이전트는 음성 인식 버튼의 활성화를 자동으로 시작할 것임) 및 TDMC(Temporal Dialog Matrix Context)(TDMC의 설명에 대해서는 이하를 참조)의 생성을 가능하게 한다는 사실에 기초한다.The WAR pattern is based on the fact that the voice and connection platform enables continuation of the conversation between human and machine (after any round of question/answer, the agent automatically activates the speech recognition button) and the creation of a Temporal Dialog Matrix Context (TDMC) (see below for a description of TDMC).

대화 계속은 상이한 유형의 WAR 시나리오들의 생성을 가능하게 한다Conversation continuation enables creation of different types of WAR scenarios

리스트 항목 선택

Select list item

o 내비게이션 항목 단계 및 번호의 선택을 갖는 임의의 리스트의 경우에o For any list with a selection of navigation item steps and numbers

빈도수 이력 사전 대응성

Frequency History Proactive Responsiveness

단계별 선택

Step-by-step selection

애플리케이션의 각각의 항목 화면은 속성들을 갖는 리스트 항목 제시에 기초한다:Each item screen of the application is based on presenting a list item with attributes:

각각의 항목은 1부터 5까지의 숫자를 갖는다

Each item has a number from 1 to 5

각각의 항목은 라벨에 의해 읽혀진다

Each item is read by a label

일반 항목 리스트 제시Present a list of general items

일반 리스트

general list

o 엔티티 필터o Entity Filter

o 알파벳 필터o Alphabet filter

알파벳 숫자

alphanumeric

이력 숫자

history number

이력 빈도수 리스트 제시Present frequency list of history
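The list presentation described above (each screen shows items numbered 1 to 5, each read by its label, with step-by-step navigation) can be illustrated with a minimal sketch. The class name `PagedList`, the command words, and the page size constant are assumptions made for illustration; the source only states that items are numbered 1 to 5 and selected by number.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 5  # the source states that each item has a number from 1 to 5


@dataclass
class PagedList:
    """Hypothetical helper: presents a long list five items at a time."""
    labels: List[str]
    page: int = 0

    def visible(self) -> List[Tuple[int, str]]:
        """Return the (number, label) pairs shown and read on the current screen."""
        start = self.page * PAGE_SIZE
        chunk = self.labels[start:start + PAGE_SIZE]
        return [(i + 1, label) for i, label in enumerate(chunk)]

    def handle(self, command: str) -> Optional[str]:
        """Apply a spoken command; return the selected label, or None if nothing was selected."""
        last_page = max(0, (len(self.labels) - 1) // PAGE_SIZE)
        if command == "next":
            self.page = min(self.page + 1, last_page)
        elif command == "previous":
            self.page = max(self.page - 1, 0)
        elif command.isdigit():
            for number, label in self.visible():
                if number == int(command):
                    return label
        return None
```

Saying "next" or "previous" moves between screens, and saying a number selects the item carrying that number on the current screen; a number with no matching item is ignored rather than answered with negative feedback.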

스플래시 화면splash screen

안드로이드 앱이 시작될 때 그리고 디바이스가 크레이들에 위치될 때마다 브랜드를 디스플레이하는 스플래시 화면이 짧게 디스플레이될 것이다.When the Android app is launched and whenever the device is placed in the cradle, a splash screen displaying the brand will be displayed briefly.

로그인 화면login screen

전화기가 처음으로 크레이들에 위치될 때 또는 사용자가 안드로이드 애플리케이션으로부터 명시적으로 로그 아웃했을 때 시작 관리자 로그인 화면이 스플래시 화면에 뒤따른다. 이는 브랜드를 디스플레이할 것이고, 사용자 이름/패스워드에 의한 로그인을 제안할 것이다. 계정 생성 랭크가 또한 제시되어, 필요한 경우, 사용자가 이메일, 사용자 이름/패스워드 또는 페이스북 계정을 통해 새로운 계정을 생성할 수 있게 할 것이다.The launcher login screen follows the splash screen when the phone is placed in the cradle for the first time or when the user has explicitly logged out of the Android application. It will display the brand and offer login by username/password. An account creation link will also be presented, allowing the user to create a new account, if necessary, via email, username/password, or a Facebook account.

로그인 login 등록 옵션Registration option 등록 화면registration screen

홈 화면home screen

홈 버튼이 눌러질 때 또는 전화기가 크레이들에 놓여진 후에, 홈 화면은 상단에 걸쳐 있는 주요 기능들에 대한 바로 가기 버튼들은 물론 하단에 걸쳐 있는 어떤 상태 정보(온도 및 나침반 방향)를 갖는 현재 위치의 지도를 디스플레이할 것이다. 상단 바는 또한 적절한 경우 상태 및 통지 정보를 반영할 것이다.When the home button is pressed, or after the phone is placed in the cradle, the home screen will display a map of the current location, with shortcut buttons for the main functions across the top and some status information (temperature and compass direction) across the bottom. The top bar will also reflect status and notification information where appropriate.

홈 화면(크레이들)Home screen (cradle) 홈 화면(크레이들 없음)Home screen (no cradle)

홈 화면은 하기의 통지를 디스플레이할 것이다:The home screen will display the following notification:

부재중 전화

missed call

착신 메시지

Incoming message

차량 고장

vehicle breakdown

전화telephone

GoPad는 커스텀 GoPad 전화 UX를 뒷받침하는 비치된 안드로이드 전화 API를 사용할 것이다.GoPad will use the built-in Android phone API to support custom GoPad phone UX.

착신 통화 통지Incoming call notification

에이전트는 착신 통화 정보(발신자가 연락처에 있는 경우 발신자 이름, 그렇지 않은 경우, 발신자 번호)를 크게 읽고, 필요한 경우 벨소리를 무음화하고 미디어 재생을 일시중지하며, 이어서 사용자 행동을 요청해야만 한다. 사용자는 3가지 방법들 중 하나를 통해 응답할 수 있다:The agent must read the incoming call information aloud (the caller's name if the caller is in the contacts, otherwise the caller's number), silence the ringtone and pause media playback if necessary, and then request a user action. The user can respond in one of three ways:

음성으로 통화를 수락하거나 통화를 거부하고 그것을 음성 메일로 송신한다.

Accept the call by voice, or reject it by voice and send it to voicemail.

온스크린 터치 버튼을 통해 통화를 수락/거부한다

Accept/reject calls via on-screen touch buttons

이전 트랙/통화 수락 또는 다음 트랙/통화 거부 버튼을 통해

via previous track/accept call or next track/reject call button

상호작용이 끝나면, 임의의 일시중지된 미디어가 재개되어야만 한다.When the interaction is over, any paused media must be resumed.

터치스크린으로부터, 착신 통화가 다음과 같이 제시될 것이다:From the touchscreen, the incoming call will be presented as follows:

착신 통화 Incoming call 통화 종료end call

발신 통화outgoing call

발신 통화가 에이전트를 깨우기 위해 에이전트 버튼을 누르고, 이어서 번호 또는 연락처 이름과 함께 다이얼 명령을 말하는 것에 의해 음성으로 개시될 수 있다.An outgoing call may be initiated audibly by pressing the agent button to wake the agent, followed by saying a dial command along with a number or contact name.

다수의 번호들이 연락처 이름과 일치하는 경우, 에이전트는, 연락처 최근성(즉, 번호가 최근에 호출함, 최근에 호출됨, 기타)에 의해 그리고 이어서 알파벳 순으로 정렬된, 번호가 매겨진 옵션들의 리스트를 말할 것이다. 사용자는 이어서 통화할 옵션 번호를 음성으로 선택할 것이다. 에이전트는 전화를 걸고 그 번호에 대한 최근성 값을 업데이트할 것이다.If multiple numbers match the contact name, the agent will speak a numbered list of options, sorted by contact recency (i.e., the number has called recently, has been called recently, etc.) and then alphabetically. The user will then select by voice the option number to call. The agent will place the call and update the recency value for that number.
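The disambiguation step for a contact with several numbers can be sketched as follows. This is only an illustration under stated assumptions: the `PhoneOption` type, its field names, and the use of a numeric timestamp as the recency value are hypothetical, since the source only says that options are sorted by recency and then alphabetically and that the recency value is updated after the call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhoneOption:
    label: str        # hypothetical label such as "mobile" or "work"
    number: str
    last_used: float  # assumed recency value: timestamp of the latest call, 0 if never


def numbered_options(options: List[PhoneOption]) -> List[Tuple[int, PhoneOption]]:
    """Sort by recency (most recent first), then alphabetically, and number from 1."""
    ordered = sorted(options, key=lambda o: (-o.last_used, o.label.lower()))
    return list(enumerate(ordered, start=1))


def place_call(options: List[PhoneOption], choice: int, now: float) -> str:
    """Return the number to dial for the spoken option and update its recency value."""
    for index, option in numbered_options(options):
        if index == choice:
            option.last_used = now
            return option.number
    raise ValueError("no option numbered %d" % choice)
```

Because the recency value is refreshed on every call, the number the user dials most recently moves to position 1 the next time the same contact is requested.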

통화가 전화 터치스크린을 통해 하기의 방법들을 통해 개시될 수 있다:A call may be initiated via the phone touchscreen via the following methods:

다이얼 패드dial pad 즐겨찾기Favorites 최근recent

연락처contact

통화 상태 디스플레이call status display

모든 통화 상태 정보가 화면(상기 홈 화면을 참조)의 상단에 있는 상태 바에 의해 핸들링될 것이다.All call status information will be handled by the status bar at the top of the screen (see home screen above).

오디오 재생audio playback

미디어 재생media playback

하기의 선택 카테고리를 통해 안드로이드 기반(Android-native) 미디어 파일을 재생하기 위해 미디어 플레이어가 사용될 것이다.A media player will be used to play Android-native media files through the following selection categories.

아티스트

artist

앨범

album

재생 리스트

playlist

아티스트artist 앨범 album 재생 리스트playlist

긴 리스트에서의 항목의 빠른 선택은 리스트의 서브그룹으로의 알파벳 건너뛰기(alphabetic skipping)에 의해 용이하게 될 것이다. 예를 들어:Quick selection of items in a long list will be facilitated by alphabetic skipping into subgroups of the list. For example:

화면의 오른쪽 경계에 있는 알파벳 리스트가 빠른 탐색을 위해 손가락 끝으로 스크러빙될 수 있다.The alphabetical list on the right border of the screen can be scrubbed with your fingertips for quick navigation.

미디어 플레이어에 대한 주 제어는 에이전트를 통할 것이다. 주어진 카테고리에서 다수의 일치들이 가능할 때, 에이전트는 온스크린 번호가 매겨진 리스트를 제공할 것이고 사용자가 번호에 의해 일치를 선택할 수 있게 할 것이다. 예를 들어:The main control over the media player will be through the agent. When multiple matches are available in a given category, the agent will present an on-screen numbered list and allow the user to select a match by number. For example:

미디어 플레이어는 재생 중에, 이용가능한 경우 앨범 아트와 함께, 아티스트/앨범/노래 정보를 디스플레이할 것이다. 경과 시간과 총 시간이 디스플레이될 것이다. 공간이 있으면, 다음 트랙 이름도 디스플레이될 수 있다.The media player will display artist/album/song information during playback, along with album art if available. The elapsed time and total time will be displayed. If space is available, the next track name may also be displayed.

크레이들 상의 이전, 다음, 및 재생/일시중지 버튼을 누르는 것은, 적절한 경우, 재생에 영향을 미쳐야만 한다.Pressing the previous, next, and play/pause buttons on the cradle should affect playback, where appropriate.

미디어 플레이어는 디바이스 상의 디폴트 위치에 있는 미디어 파일을 재생해야 한다(즉, 다른 미디어 플레이어들로부터 공유 라이브러리들이 미디어 플레이어에 의해 액세스가능해야만 한다).The media player must play the media file in the default location on the device (ie, shared libraries from other media players must be accessible by the media player).

재생 리스트가 미디어 플레이어에 특유하기 때문에, GoPad 미디어 플레이어는 하기의 미디어 플레이어 애플리케이션으로부터 재생 리스트를 가져오기(import)해야만 한다:Because playlists are specific to the media player, the GoPad media player must import the playlists from the following media player application:

Google Play Music

Android Music App

내비게이션navigation

기본 내비게이션basic navigation

내비게이션 기계 부분은 비치된 안드로이드 구글 내비게이션 애플리케이션을 통해 핸들링될 것이다. LW 시작 관리자 및 에이전트는 하기의 것들 중 하나를 선택하는 것에 의해 목적지로의 내비게이션을 시작하는 데 사용될 수 있는 음성 프런트엔드를 Google Nav에 제공할 것이다.The mechanics of navigation will be handled by the built-in Android Google Navigation application. The lightweight (LW) launcher and the agent will provide Google Nav with a voice front end that can be used to start navigation to a destination by selecting one of the following:

즐겨찾기

Favorites

최근 목적지

recent destination

주소록 연락처

address book contact

임의의 주소("333 West San Carlos, San Jose, California")

Any address ("333 West San Carlos, San Jose, California")

주소록 연락처address book contact 즐겨찾기Favorites 최근recent

경량 시작 관리자는 Google Nav를 개시하고 그에 목적지를 넘겨줄 것이고, 이 때 Google Nav는 내비게이션 제공자의 자리를 넘겨 받을 것이다.The lightweight launcher will launch Google Nav and hand over the destination to it, which will then take over as the navigation provider.

사용자는, 내비게이션 기능을 취소시키고 나중에 그에게로 복귀하는 일 없이, 시작 관리자(또는 다른 애플리케이션)로 돌아가고 내비게이션 기능을 백그라운드에 둘 수 있을 것이다. 이것을 하는 통상적인 방법은 홈 화면으로 돌아가기 위해 에이전트 버튼을 두 번 누르는 것 또는 에이전트를 활성화시키고 새로운 기능을 요청하는 것을 포함한다.The user will be able to return to the launcher (or another application) and put the navigation function in the background without canceling it, and return to it later. Common ways to do this include pressing the agent button twice to return to the home screen, or activating the agent and requesting a new function.

착신 문자 메시지 응답Reply to incoming text messages

에이전트는 문자 메시지를 수신했다는 것을, 보낸 사람이 주소록에 있다면 보낸 사람의 이름을 포함하여, 사용자에게 음성으로 통지하고, 답신 전화를 하거나 "현재 운전중입니다 곧 다시 연락드리겠습니다" 형태의 자동화된 사용자 정의 상용구 응답을 보낼 옵션을 사용자에게 제공해야만 한다.The agent must notify the user by voice that a text message has been received, including the sender's name if the sender is in the address book, and must give the user the option to call back or to send an automated, user-defined boilerplate response of the form "I am driving right now and will get back to you shortly."

착신 문자 디스플레이Incoming text display

페이스북 활동 리더Facebook Activity Reader

GoPad 페이스북 활동 리더(Activity Reader)가 GoPad 앱에 포함될 것이다. 이 특징은 페이스북 담벼락 글(wall post)을 사용자에게 읽어줄 것이고, 좋아요(Liking)에 대한 큰 버튼을 제공할 것이다.The GoPad Facebook Activity Reader will be part of the GoPad app. This feature will read Facebook wall posts to users and provide a large button for Likes.

페이스북 활동Facebook activity

착신 페이스북 메시지가 또한, 착신 문자 메시지가 읽혀지는 것과 거의 동일한 방식으로, 읽혀질 것이다. 사용자는 상용구 응답을 보낸 사람에게 송신할 수 있다.Incoming Facebook messages will also be read in much the same way as incoming text messages are read. Users can send boilerplate responses to the sender.

페이스북 메시지facebook message

뉴스 리더news reader

GoPad 애플리케이션은 뉴스캐스터의 방식의 통합된 뉴스 읽어주기를 포함할 것이다. 이는 하기의 특징들을 지원할 것이다:The GoPad application will include integrated news reading in the manner of a newscaster. This will support the following features:

즐겨찾기

Favorites

최근

recent

뉴스 카테고리(즉, 기술, 스포츠, 기타)

News categories (technology, sports, etc.)

생일 미리 알림

birthday reminder

즐겨찾기Favorites 최근recent 카테고리category

뉴스 스토리가 전체 화면 텍스트 대안을 갖는 쉽게 파싱되는 포맷으로 제시될 것이다.News stories will be presented in an easily parsed format with full screen text alternatives.

뉴스 스토리news story 전체 화면 텍스트full screen text

차량 상태/효율Vehicle Condition/Efficiency

차량 상태 특징을 시작하는 것은 BT OBD 리더로부터 데이터에 기초하여 하기의 정보를 디스플레이할 것이다.Starting the vehicle status feature will display the following information based on data from the BT OBD reader.

차량이 OBD를 통한 연료 레벨 측정을 지원하면, 연료 충전 이전의 현재 속도에서의 마일/킬로미터 및 시간의 범위가 필요하다(이 숫자는 보수적이어야만 한다). 이것은 최근 거동의 추후 결정되는 윈도우 상에서 계산되어야만 한다. OBD를 통해 연료 탱크 충전 상태를 제공하지 못하는 자동차에 대해 차선책이 매우 요망된다.

If the vehicle supports fuel level measurement via OBD, the remaining range in miles/kilometers, and in hours at the current speed, before refueling is needed (this number must be conservative). It must be calculated over a window of recent driving behavior, the size of which is to be determined. A workaround is highly desirable for cars that do not report fuel tank fill status via OBD.

이 여행의 MPG 및 모든 여행의 주행 평균

MPG for this trip and driving average for all trips

기본적으로 가속률/감속률을 측정하고 가속 페달 및 브레이크 페달을 부드럽게 다루라고 운전자를 그래픽으로 격려하는 순간 운전 효율 디스플레이 및 운전자가 시간에 따라 어떻게 했는지의 이력 디스플레이(아마도 자동차의 EPA 등급에 대해 플로팅됨?).

An instantaneous driving efficiency display, which essentially measures acceleration/deceleration rates and graphically encourages the driver to be gentle with the accelerator and brake pedals, plus a historical display of how the driver has performed over time (perhaps plotted against the car's EPA rating?).

경과 여행 시간, 여행 중의 효율, 사용된 연료 등을 비롯한 여행 통계.

Travel statistics including elapsed travel time, efficiency during travel, fuel used and more.

여행 통계를 0으로 설정하는 리셋 버튼

Reset button to set travel stats to zero

최적으로는 최근의 운전 이력에 기초하여 시간(일)으로 변환되는, (차량 데이터베이스로부터 유지관리 스케줄 정보에 기초한) 필요한 다가오는 유지관리

Upcoming required maintenance (based on maintenance schedule information from the vehicle database), optimally converted to time (days) based on recent driving history

고장 진단 오류 코드

Fault diagnostic error codes

차량 보안(안전하지 않은 차량 거동, 임계 척도 등)

Vehicle security (unsafe vehicle behavior, critical measures, etc.)
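The conservative range estimate in the first item of the list above can be sketched as a small calculation. The function name, the sample format (distance and fuel used per recent interval), and the 0.9 safety factor are assumptions for illustration; the source leaves the window of recent behavior to be determined and only requires that the number be conservative.

```python
from typing import List, Optional, Tuple


def estimate_range(
    fuel_remaining_l: float,
    recent_samples: List[Tuple[float, float]],  # assumed (distance_km, fuel_used_l) pairs
    current_speed_kmh: float,
    safety_factor: float = 0.9,  # assumed margin that keeps the estimate conservative
) -> Optional[Tuple[float, Optional[float]]]:
    """Return (range_km, hours_at_current_speed), or None if efficiency is unknown."""
    distance = sum(d for d, _ in recent_samples)
    fuel_used = sum(f for _, f in recent_samples)
    if distance <= 0 or fuel_used <= 0:
        return None  # no usable recent behavior, so no estimate is offered
    km_per_litre = distance / fuel_used
    range_km = fuel_remaining_l * km_per_litre * safety_factor
    hours = range_km / current_speed_kmh if current_speed_kmh > 0 else None
    return range_km, hours
```

For example, with 20 litres remaining, 150 km driven on 12.5 litres in the recent window, and a current speed of 90 km/h, the sketch yields a range of about 216 km, or about 2.4 hours.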

그에 부가하여, 하기의 고 우선순위 시나리오에 대한 음성 경보가 임의의 다른 현재 디스플레이된 기능을 중단시켜야 하고, 디바이스 화면은 오류 디스플레이를 갖는 차량 상태 페이지로 전환해야 한다.In addition, voice alerts for the following high priority scenarios should interrupt any other currently displayed function and the device screen should switch to a vehicle status page with an error display.

연료 부족(문턱값 미정임, 최근 운전에 기초하여 변할 수 있다 - 상기 참조). 이것은 연료 레벨 읽기 능력에 의존한다(상기 참조).

Low fuel (threshold undetermined, may vary based on recent operation - see above). This depends on the ability to read the fuel level (see above).

즉각적인 운전자 행동을 요구하는 파국적 차량 오류(오류 코드 리스트는 미정임)(즉, 길 한쪽으로 차를 대고 엔진을 정지시키는 것이 안전하므로 곧바로 그렇게 함)

Catastrophic vehicle error requiring immediate driver action (error code list to be determined) (i.e., pull over to the side of the road and shut off the engine as soon as it is safe to do so)

차량 효율vehicle efficiency 고장 진단fault diagnosis 차량 보안vehicle security

차량 여행 정보vehicle travel information

써드파티 애플리케이션third-party applications

GoPad 애플리케이션은 GoPad가 내재적으로 제공하지 않는 기능을 제공하는 써드파티 안드로이드 애플리케이션을 시작하는 빠르고 쉬운 길을 제공할 것이다. 써드파디 앱 시작 관리자는 차량을 운전하고 있는 동안 애플리케이션 시작을 쉽게 만들어주는 큰 터치 타깃을 제공할 것이다.The GoPad application will provide a quick and easy way to launch third-party Android applications that provide functionality that GoPad does not inherently provide. A third-party app launcher will provide a large touch target that makes it easy to launch applications while driving a vehicle.

제시되는 애플리케이션들의 리스트는 사용자에 의해 구성될 것이며, 사용자는 디바이스 상에 존재하는 모든 애플리케이션들의 리스트로부터 선택할 것이다.The list of applications presented will be configured by the user, and the user will select from a list of all applications present on the device.

앱 시작 관리자 화면app launcher screen

설정setting

설정 영역은 사용자가 자신의 선호사항에 따라 GoPad 애플리케이션을 구성하는 곳이다. 설정의 최종 리스트는 미정이지만, 하기의 것들을 포함할 것이다:The settings area is where users configure the GoPad application according to their preferences. The final list of settings is undecided, but will include:

착신 문자 자동 응답 상용구

Incoming text auto-reply boilerplate

착신 페이스북 메시지 자동 응답 상용구

Incoming Facebook message auto-reply boilerplate

(페어링된 BT OBD2 어댑터들의 리스트로부터) BT OBD2 어댑터 선택

BT OBD2 adapter selection (from the list of paired BT OBD2 adapters)

엔진 교체

engine replacement

엔진 유형(가솔린 또는 디젤)

Engine type (gasoline or diesel)

측정 단위(야드파운드법 또는 미터법)

Units of measurement (imperial or metric)

설정setting

차량 식별vehicle identification

다수의 차량/크레이들을 식별할 수 있는 것이 요구된다. 차량/크레이들별로 추적하는 항목들은 하기의 것들을 포함한다:It is desirable to be able to identify multiple vehicles/cradles. Items tracked by vehicle/cradle include:

번호판

license plate

VIN(OBD가 VIN을 제공하지 않는 경우)

VIN (if OBD does not provide VIN)

크레이들 고유 ID

Cradle Unique ID

블루투스 페어링Bluetooth pairing

크레이들 페어링Cradle Pairing

디바이스가 크레이들에 처음으로 삽입될 때 시작 관리자가 디바이스를 그 크레이들에 자동으로 페어링할 수 있는 것이 요구된다.It is required that the launcher be able to automatically pair the device with the cradle when the device is inserted into that cradle for the first time.

기존의 차량 HFP 또는 A2DP에 페어링하는 것은 이 릴리스에 대한 특징이 아니고, 어떤 지원도 필요하지 않다.Pairing to an existing vehicle HFP or A2DP is not a feature for this release and no support is required.

데이터 수집data collection

하기의 데이터가 수집되고 시스템에 저장되어야만 한다:The following data must be collected and stored in the system:

사용자 이름/이메일/ 전화 번호

Username/Email/Phone Number

자동차 정보

car information

o VIN 번호o VIN number

o 번호판 번호o License plate number

운전 로그(모든 엔트리가 타임 스탬핑됨)

Driving log (all entries are timestamped)

o 자동차o car

거리

Distance

속도

speed

엔진 가동 시간

engine uptime

위치(들)

location(s)

내비게이션 목적지

navigation destination

o 애플리케이션o Applications

소프트웨어 미세 조정을 위해 모든 사용자 상호작용이 로깅되어야만 한다.

All user interactions must be logged for software fine-tuning.

오류 코드 로그

error code log

연비

Fuel efficiency

데이터 수집 기법Data Collection Techniques

각각의 데이터 또는 각각의 유형의 데이터에 대한 가장 쉬운 데이터 수집 방법이 이용되어야만 한다. 사용자를 대신하여 데이터를 제공할 수 있는 경우, 그렇게 해야만 한다(예를 들어, 연료 탱크 크기를 VIN 번호에 기초하여 결정할 수 있다면, 그 정보에 대해 사용자에게 질문하지 말고, 그렇게 해야만 한다). The easiest data collection method for each piece or type of data must be used. Where the application can supply data on the user's behalf, it must do so (e.g., if the fuel tank size can be determined from the VIN number, the application must determine it that way rather than asking the user for that information).

애플리케이션은 번호판의 카메라 포착을 포함해야만 하고, 애플리케이션은 번호판으로부터 번호판 번호를 파싱하고 그를 사용하여 VIN 번호 및 모든 부가의 부속 데이터를 결정할 수 있다.The application must include camera capture of the license plate, so that the application can parse the license plate number from the image and use it to determine the VIN number and all additional associated data.

데이터 익명화Data Anonymization

특정 유형의 수집된 데이터는 전체로서만 관심을 끈다 - 사용자 특정 형태로는 가치가 없다 -. Mixpanel과 같은 서비스에 의해 수집되는 종류의, 애플리케이션 자체의 사용성 데이터(즉, 버튼 클릭의 패턴 등)가 이 카테고리에 속한다. 이 데이터는 실시가능한 경우 데이터 프라이버시를 이유로 익명화되어야만 한다.Certain types of collected data are of interest only in aggregate and have no value in user-specific form. Usability data for the application itself (i.e., patterns of button clicks, etc.), of the kind collected by services such as Mixpanel, falls into this category. This data must be anonymized for data privacy reasons where practicable.
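One simple way to keep usability data in aggregate form only is to count events per screen and button and discard the user identifier before storage. The sketch below assumes a hypothetical event shape (`user`, `screen`, `button` keys); the source does not prescribe any particular anonymization technique, so this is one possible approach rather than the described implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def aggregate_usage(events: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count button presses per (screen, button); the 'user' field is never copied."""
    counts: Counter = Counter()
    for event in events:
        counts[(event["screen"], event["button"])] += 1
    return dict(counts)
```

The result contains only click patterns, so it can be used for software fine-tuning without retaining who performed each interaction.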

소프트웨어 업데이트software update

경량 시작 관리자는 새로운 소프트웨어 버전이 현장에 있는 디바이스들로 푸시 아웃될 수 있게 하기 위해 OTA 업데이트 메커니즘을 필요로 한다.The lightweight launcher requires an OTA update mechanism to allow new software versions to be pushed out to devices in the field.

물리적/에이전트 컨트롤 APIPhysical/Agent Control API

디바이스가 크레이들에 있고 그의 앱이 포그라운드(또는 어떤 경우에, 백그라운드)에서 실행 중인 동안 써드파티 앱 개발자가 크레이들 물리적 컨트롤에는 물론 에이전트 명령에도 응답할 수 있게 하는 간단한 소프트웨어 API가 요구된다. A simple software API is required that lets third-party app developers respond to the cradle's physical controls as well as to agent commands while the device is in the cradle and their app is running in the foreground (or, in some cases, in the background).

이 API는 가능한 한 간단해야만 한다.This API should be as simple as possible.

물리적 컨트롤physical control

물리적 컨트롤 API는 하기의 3개의 버튼에만 대해 3개의 명령 입력(한 번 누름, 두 번 누름, 길게 누름)을 가능하게 해야만 한다.The physical control API must allow three command inputs (single press, double press, long press) for the following three buttons only:

이전 트랙

previous track

재생/일시중지

Play/Pause

다음 트랙

next track

써드파티 앱에 의한 에이전트 버튼에의 액세스가 허용되지 않는다.Access to the agent button by third-party apps is not allowed.
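The physical control API described above (three press types on three buttons, with the agent button off limits to third-party apps) might look like the following sketch. All names here (`Button`, `Press`, `PhysicalControls`, `register`, `dispatch`) are hypothetical; the source specifies the requirements but not an interface.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class Button(Enum):
    PREVIOUS_TRACK = "previous_track"
    PLAY_PAUSE = "play_pause"
    NEXT_TRACK = "next_track"
    AGENT = "agent"  # reserved: third-party apps may not bind this button


class Press(Enum):
    SINGLE = "single"
    DOUBLE = "double"
    LONG = "long"


class PhysicalControls:
    """Hypothetical registry mapping (button, press type) to a third-party handler."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[Button, Press], Callable[[], None]] = {}

    def register(self, button: Button, press: Press, handler: Callable[[], None]) -> None:
        if button is Button.AGENT:
            raise PermissionError("the agent button is not available to third-party apps")
        self._handlers[(button, press)] = handler

    def dispatch(self, button: Button, press: Press) -> bool:
        """Invoke the registered handler, if any; return whether one was called."""
        handler = self._handlers.get((button, press))
        if handler is None:
            return False
        handler()
        return True
```

Keeping the surface to one registration call and one dispatch call is in line with the stated requirement that the API be as simple as possible.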

에이전트agent

써드파티 앱은 간단한 API를 통해 특정 음성 명령을 받기 위해 등록할 수 있다. 명령의 예는 하기의 것들을 포함할 수 있다:Third-party apps can register to receive specific voice commands through a simple API. Examples of instructions may include:

"다음 트랙"

"next track"

"이전 트랙"

"Previous track"

"일시중지"

"Pause"
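Registration for specific voice commands can be sketched in the same spirit. The class `AgentCommands` and its methods are hypothetical names, and exact phrase matching after normalization is an assumption made to keep the example short; a real agent would route recognized intents rather than raw strings.

```python
from typing import Callable, Dict


class AgentCommands:
    """Hypothetical registry that lets a third-party app receive specific voice commands."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], None]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so "Next  Track" matches "next track".
        return " ".join(text.lower().split())

    def register(self, phrase: str, handler: Callable[[], None]) -> None:
        self._handlers[self._normalize(phrase)] = handler

    def on_utterance(self, text: str) -> bool:
        """Call the handler registered for the recognized text; return whether one matched."""
        handler = self._handlers.get(self._normalize(text))
        if handler is None:
            return False
        handler()
        return True
```

A music app would register "next track", "previous track", and "pause" once at startup and then receive a callback each time the agent recognizes one of those phrases.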

소프트웨어 UI 흐름Software UI flow

시장 기회market opportunity

카테고리Category 능력Capability 시장 기회Market opportunity 잠재적 파트너Potential partner 이점Advantage

Nav

경로 탐색 및 길 안내

POI 검색

Route navigation and directions

POI Search

Business Search Placement

Yelp

OpenTable

Michelin Guide

Oscaro
data collection

Mobile service

message

newscaster

phone call

3rd party apps

Generates carrier traffic

ATT

T-Mobile

Bouygues

Orange

Sprint

Driving data

OBD2 Reader

GPS

travel data

Driving data for partners

Insurance:
AXA
Allianz
AAA

Rental car

Car manufacturer

Music

Third-party apps:

Pandora

Spotify

TuneIn

Account Registration Incentives

Sales of usage data

Pandora

Spotify

TuneIn

부록 B: 애플리케이션 전반적 제시Appendix B: General presentation of the application

a. 애플리케이션 설명a. Application Description

애플리케이션 Oscar는 운전 중일 때 즐겨찾기 애플리케이션의 사용에 전용되어 있는 애플리케이션이다.The Oscar application is dedicated to using your favorite applications while driving.

Oscar는 사용자가 안전 모드에서 임의의 기능을 사용하거나 임의의 행동을 할 수 있게 한다. 생성되는 사용자 경험이 임의의 상황에서 달성하기 위한 능력에서의 핵심이다. 언제라도 3가지 매체를 사용할 수 있다:Oscar lets the user use any function or perform any action in a safe mode. The user experience it creates is key to its ability to do so in any situation. Three input media can be used at any time:

터치 스크린(애플리케이션의 버튼 인터페이스)

Touch screen (button interface of the application)

물리적 버튼(OE의 경우에 자동차로부터 또는 애프터마켓에 대해 크레이들로부터)

Physical button (from car for OE or from cradle for aftermarket)

음성 명령

voice command

핵심적인 음성 기능은 다음과 같다: Key voice features include:

전화를 거는 것과 받는 것

making and receiving calls

메시지(문자, 메일 및 페이스북)를 보내는 것과 받는 것

Sending and receiving messages (text, mail and Facebook)

내비게이션을 정의하는 것: 원 샷.

Defining a navigation destination: one shot.

뉴스를 읽는 것과 공유하는 것

reading and sharing the news

음악을 재생하는 것

playing music

애플리케이션은 하기의 보조 정리에 기초한다:The application is based on the following lemma:

음성 인식이 작동하지 않음 = 사용자의 문장을 제한함

Speech recognition does not work = limit the user's sentences

자연스러운 상호작용 = 사람의 대화와 가능한 한 가까운 것

Natural interaction = as close as possible to human conversation

에이전트의 피드백 길이를 제한함 = 짧은 문장

Limit the agent's feedback length = short sentences

에이전트의 부정적 피드백을 제한함 = 아니다, 없다, 모른다,...

Limit the agent's negative feedback = "no", "none", "I don't know", ...

사용자 반복을 제한함 = "다시 말해주세요"라고 요구하지 않음

Limit user repetition = never ask "Please say that again"

이 5개의 보조 정리는 임의의 사용자 경험의 생성의 중심 키이다.These five lemmas are central to the creation of any user experience.

b. 애플리케이션 아키텍처b. application architecture

하기로 가세요go to do

c. 아키텍처에 기초한 사용자 경험c. User experience based on architecture

d. 검출된 핵심 혁신d. Key innovations detected

i. 대화를 계속하는 것i. continuing the conversation

ii. 전체 화면 에이전트 활성화ii. Enable full screen agent

iii. 자동 웨이크업iii. automatic wake up

iv. 리스트 탐색iv. list navigation

1. 음성 탐색 다음, 이전, 처음, 마지막1. Voice navigation: next, previous, first, last

2. 알파벳순으로 가기2. Go Alphabetically

3. 리스트의 음성 재생3. Voice playback of the list

a. 음성 피드백 최적화a. Voice Feedback Optimization

i. 질의로부터i. from the query

ii. 이전 재생으로부터ii. from previous play

b. 단계별 재생b. step-by-step play

4. 선택4. Choice

a. 목표로 한 항목의 번호에 의해a. by the number of the targeted item

b. 목표로 한 항목의 부분 내용에 의해b. By the partial content of the targeted item

5. 지능적 선택5. Intelligent Choice

a. 운전자 사용으로부터 학습a. Learning from driver use

v. 리스트 필터v. list filter

1. 알파벳 필터1. Alphabet filter

2. 이력 필터2. History filter

3. 빈도수 이력3. Frequency history

4. 연속적 필터4. Continuous filter

vi. 버튼 픽셀 사용자 경험vi. button pixel user experience

vii.vii.

전화 모듈

phone module

e. 서론e. Introduction

f. 구조f. structure

g. 설명g. Explanation

h. 사용자 경험h. user experience

메시지 모듈

message module

내비게이션 모듈

navigation module

뉴스 모듈

News module

미디어 모듈

media module

부록 C:Appendix C:

음성 및 연결 플랫폼Voice and Connectivity Platform

실시 개요Implementation overview

자동차 시장은, 너무나 많은 상이한 종류의 다툼 속에 있기 때문에, 자동차 시장 혼란이라고 부를 수 있는 새로운 혁신 중 하나에 있다. 전기 엔진으로부터 무인 자동차까지, 자동차의 디지털화는 계속 진행 중이고, 자동차 제작회사 전체는 디지털 수명 주기 대 차량 수명 주기에 관한 큰 과제들 중 하나에 직면해 있다.The automotive market is going through a new wave of innovation that can be called an automotive market disruption, because it is experiencing so many different kinds of upheaval at once. From electric engines to driverless cars, the digitization of the car is under way, and every car manufacturer faces one of its biggest challenges: reconciling the digital life cycle with the vehicle life cycle.

그러나 결국, 운전자는 음성, 도로에서 혼자 시간을 보내는 것을 그만두고자 하는 음성이고, 이 시간은, 제약 환경에서 새로운 사용자 경험을 생성할 수 있다면, 자동차를 디지털 세계에 연결하여 사용자를 임의의 컨텍스트에서 자신의 즐겨찾기의 애플리케이션들과 더 많이 연결할 수 있다면, 유용하고 흥미로운 시간으로 변환될 수 있다!But in the end, the driver is the voice: the voice of someone who wants to stop spending time alone on the road. That time can be turned into useful and interesting time if new user experiences can be created in a constrained environment, and if the car can be connected to the digital world so that the user is better connected to their favorite applications in any context!

xBrainSoft는, 하이브리드 모드 및 사용자의 클라우드와 자동차 임베디드 플랫폼 사이의 동기화된 공중을 통한 업데이트 모드를 가능하게 하는, 자동차 안과 밖에서의 연속적인 사용자 경험에서 시장의 모든 디바이스들에 대해 작동하는 확장가능하고 유연한 플랫폼인 자동차 개인 어시스턴트 제품을 제작하였다.xBrainSoft has built a car personal assistant product: an extensible and flexible platform that works with all devices on the market, provides a continuous user experience inside and outside the car, and enables a hybrid mode and a synchronized over-the-air update mode between the user's cloud and the car's embedded platform.

어떤 컨텍스트에서도 적당한 때에 각각의 환경의 최상의 것! XXXXX는 자동차 수명 주기에 영향을 주지 않고 디지털 세계의 짧은 수명 주기의 과제에 맞설 준비가 되어 있다!The best of each environment at the right time in any context! XXXXX is ready to meet the challenges of the short life cycle of the digital world without impacting the car life cycle!

기술 관련 내용Technical content

개요outline

xBrainSoft 음성 및 연결 플랫폼은, 일부 실시예들에서, 온보드 환경과 오프보드 환경 사이의 링크를 구축하도록 이루어져 있는 진보된 플랫폼이다.The xBrainSoft voice and connectivity platform, in some embodiments, is an advanced platform adapted to establish a link between an onboard environment and an offboard environment.

하이브리드, 모듈식 및 애그노스틱 아키텍처에 기초하여, 음성 및 연결 플랫폼은 그의 임베디드 솔루션과 오프보드 플랫폼 사이의 그 자신의 "공중을 통한" 업데이트 메커니즘을 제공한다.Based on a hybrid, modular and agnostic architecture, the voice and connectivity platform provides its own "over the air" update mechanism between its embedded solution and offboard platform.

오프보드 확장 시맨틱 처리 능력에의 연결을 갖지 않는 임베디드 대화 관리로부터, 음성 및 연결 플랫폼은 차량 연결의 "상실 및 복구"를 바탕으로 한 시나리오를 가능하게 하는 컨텍스트 동기화에 의해 하이브리드 관리를 향상시킨다.Ranging from embedded dialog management without any connectivity to offboard extended semantic processing capabilities, the voice and connectivity platform enhances hybrid management with context synchronization, enabling scenarios built around the "loss and recovery" of vehicle connectivity.

강력하고 혁신적이며 완전히 커스터마이즈가능한 자연어 이해 기술을 기반으로, 음성 및 연결 플랫폼은, 특정의 음성 기술에 의존하지 않고, 몰입적 사용자 경험을 제공한다.Built on powerful, innovative, and fully customizable natural language understanding technology, the voice and connectivity platform provides an immersive user experience without relying on any particular speech technology.

그의 다채널 능력은, 완전 동기화 메커니즘으로 인해 동일한 사용자별 컨텍스트를 공유하여, 침투적 방식으로 다수의 디바이스들(차량, 전화, 태블릿...)을 통한 상호작용을 가능하게 한다.Its multi-channel capabilities enable interaction across multiple devices (vehicle, phone, tablet...) in a pervasive manner, with all devices sharing the same per-user context thanks to a full synchronization mechanism.

음성 및 연결 플랫폼의 클러스터화된 서버 아키텍처는 확장가능하고 따라서 서비스의 고부하 및 높은 소비에 대응한다. 이는 산업 표준 기술에 기반하고 통신 보안 및 최종 사용자 개인정보보호를 바탕으로 최상의 실시를 구현한다.The clustered server architecture of the voice and connectivity platform is scalable and thus responds to high loads and high consumption of services. It is based on industry standard technology and implements best practices based on communication security and end user privacy.

음성 및 연결 플랫폼은 또한 복잡한 음성 사용자 상호작용 흐름을 고안하기 위한, 전체 개발 환경에 통합된, 풀 세트의 기능 및 개발자 도구들을 제공한다.The Voice and Connectivity Platform also provides a full set of features and developer tools, integrated into the overall development environment, for designing complex voice user interaction flows.

부가 가치added value

이하에서 xBrainSoft 기술의 기술적 돌파구들 중 일부인, 클라우드 플랫폼과 임베디드 플랫폼으로 구성되어 있는 음성 및 연결 환경을 발견할 것이다.Below are some of the technical breakthroughs of xBrainSoft technology, embodied in a voice and connected environment made up of a cloud platform and an embedded platform.

이하의 항목들은 요점 정리로서 제시된다.The following items are presented as a summary.

하이브리드 설계: "서버, 임베디드 자율적 동기화"

Hybrid Design: "Server, Embedded Autonomous Synchronization"

설계에 의해, 음성 및 연결 플랫폼은 로컬적으로는 물론 원격적으로도 동작하는 어시스턴트를 제공한다. 임의의 어시스턴트의 이러한 하이브리드 아키텍처는 처리를 분산시키고, 전체 컨텍스트 동기화를 유지하며, 사용자 인터페이스 또는 심지어 대화 이해를 업데이트하는 강력한 메커니즘을 기반으로 한다.By design, the voice and connectivity platform provides assistants that operate both locally as well as remotely. This hybrid architecture of any assistant is based on a powerful mechanism for distributing processing, maintaining full context synchronization, and updating the user interface or even conversational understanding.

대화 흐름 생성을 위한 한 세트의 기능 도구들

A set of functional tools for creating conversation flows

처음부터, xBrainSoft는 어시스턴트의 개발을 가속시키고 개선시키기 위해 우리의 기술을 바탕으로 한 최상의 세트의 도구들을 제공하는 것에 많은 노력을 하고 있다. 이는 대화 언어 관리자, 기능 모듈의 재사용성, 임의의 VPA의 배포 자동화 또는 유지관리 그리고 임의의 클라이언트 디바이스 상에의 이식성을 향상시키는 전체 개발자 환경을 포함한다.From the very beginning, xBrainSoft has put great effort into providing the best set of tools around its technology to accelerate and improve the development of assistants. This includes a full developer environment that improves the dialog language manager, the reusability of functional modules, the automated deployment or maintenance of any VPA, and portability to any client device.

ID 및 디바이스 연합 서비스(VCP-FS)

Identity and Device Federation Service (VCP-FS)

음성 및 연결 플랫폼 연합 서비스는 사용자 ID와 디바이스를 연합시키는 서비스이다. VCP-FS는 소셜 ID(페이스북, 트위터, Google+) 및 사용자 소유의 연결된 디바이스를 다루며, 이는 침투적 방식으로 가상 개인 어시스턴트에 의해 제공되는 능력 및 기능을 향상시킨다. VCP 연합 서비스는 사용자의 소셜 네트워크 및 심지어 그의 습관을 사용하는 것에 의해 사용자 경험을 향상시킨다.The voice and connectivity platform federation service is a service that federates user identities and devices. VCP-FS handles social identities (Facebook, Twitter, Google+) and the connected devices owned by the user, which pervasively enhances the capabilities and functionality provided by a virtual personal assistant. The VCP federation service enhances the user experience by making use of the user's social networks and even the user's habits.

자동차 애플리케이션 제품군 준비 완료(CPA)

Ready-to-use automotive application suite (CPA)

음성 및 연결 플랫폼의 최상단에서, xBrainSoft는 음성, 터치 스크린 또는, 날씨, 주식, 뉴스, TV 프로그램, 연락처, 캘린더, 전화 등과 같은, 물리적 버튼에 의해 사용되는, 자동차 개인 어시스턴트(CPA) 제품을 제작하기 위한 차량용 애플리케이션 제품군을 제공한다. On top of the voice and connectivity platform, xBrainSoft provides a suite of in-vehicle applications, such as weather, stocks, news, TV programs, contacts, calendar, and phone, for building a Car Personal Assistant (CPA) product operated by voice, touch screen, or physical buttons.

xBrainSoft는 또한 자동차의 CAN 네트워크, 그의 GPS 위치 그리고, 온도, 와이퍼 상태, 엔진 상태 등과 같은, 다양한 차량 센서에 액세스할 수 있는 완전히 통합된 애플리케이션을 제작하기 위한 SDK를 제안한다.xBrainSoft also proposes an SDK for creating fully integrated applications that can access a car's CAN network, its GPS location, and various vehicle sensors, such as temperature, wiper status, engine status, and more.

오프보드 데이터 동기화기

Offboard Data Synchronizer

음성 및 연결 플랫폼은 전역적 데이터 동기화기 시스템을 제공한다. 이 메커니즘은 순회 및 모바일 데이터 연결의 저용량에 의해 야기되는 동기화 문제를 해결한다. 이는 개발자가, 그것이 어떻게 행해지는지가 아니라, 어느 데이터가 동기화될 필요가 있는지에 집중할 수 있게 하기 위해 동기화 시스템의 구성가능한 추상화를 제공한다.The voice and connectivity platform provides a global data synchronizer system. This mechanism addresses the synchronization problems caused by roaming and by the low capacity of mobile data connections. It provides a configurable abstraction of the synchronization system so that developers can focus on which data needs to be synchronized rather than on how synchronization is done.
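As a rough illustration of such an abstraction, the sketch below lets application code record what has changed and leaves the question of when to transmit to a queue that only flushes while a connection is available. The `SyncQueue` class, its method names, and the last-write-wins policy are assumptions for this example and are not taken from the source.

```python
from typing import Any, Callable, Dict


class SyncQueue:
    """Hypothetical offline-tolerant queue: callers say what changed, not how to send it."""

    def __init__(self, send: Callable[[str, Any], None]) -> None:
        self._send = send
        self._pending: Dict[str, Any] = {}

    def record(self, key: str, value: Any) -> None:
        # Assumed policy: only the latest value per key is kept while offline.
        self._pending[key] = value

    def flush(self, connected: bool) -> int:
        """Send pending changes if connected; return how many were sent."""
        if not connected:
            return 0
        sent = 0
        for key in list(self._pending):
            self._send(key, self._pending[key])
            del self._pending[key]  # removed only after a successful send
            sent += 1
        return sent
```

Coalescing repeated writes to the same key keeps the amount of data small when the vehicle regains a low-capacity mobile connection.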

외부 API 자동 밸런서

External API Auto Balancer

외부 API를 사용하는 것은 시나리오에 대한 큰 향상이지만, 서비스가 이용가능하지 않게 될 때 또는 클라이언트가 다수의 인자들(가격, 사용자 가입...)에 따라 특정 서비스를 사용하고자 할 수 있는 경우 부작용이 있다. 이 특정 요구사항에 대응하기 위해, 음성 및 연결 플랫폼은 고도로 구성가능하고 제3 데이터 제공업자를 플러그인(예: 마이크로빌링 관리 시스템 상에서 연결할 이벤트 핸들러에 의한 API 소비 관리)으로서 통합하도록 설계되었다.Using external APIs greatly enriches scenarios, but it has side effects when a service becomes unavailable or when the client wants to use a particular service depending on several factors (price, user subscription...). To meet this specific requirement, the voice and connectivity platform is designed to be highly configurable and to integrate third-party data providers as plug-ins (e.g., managing API consumption through event handlers that connect to a microbilling management system).

기능들이 단일의 외부 API에 의존하지 않고 기능들 중 다수를 관리할 수 있는 내부 제공업자에 의존한다. 이 아키텍처에 따라, VCP는 XXXXX 요구사항들을 충족시키도록 구성될 수 있는 자동 밸런스 시스템을 제공한다.Functions do not depend on a single external API, but rather on an internal provider that can manage many of the functions. In accordance with this architecture, VCP provides an autobalancing system that can be configured to meet XXXXX requirements.
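As an illustration only, the auto-balancing idea can be sketched in Python as an internal provider that tries several external services in turn. The names `Provider`, `balance`, `cost_per_call`, and `requires_subscription` are hypothetical, and the "cheapest eligible provider first" rule is an assumed policy; the text does not specify how the VCP ranks providers.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve the request right now."""


@dataclass
class Provider:
    name: str
    cost_per_call: float           # hypothetical pricing attribute
    requires_subscription: bool    # hypothetical eligibility rule
    fetch: Callable[[str], str]    # performs the actual external API call


def balance(providers: list[Provider], query: str, user_subscribed: bool) -> tuple[str, str]:
    """Try eligible providers from cheapest to most expensive.

    Returns (provider_name, result); raises ProviderUnavailable if all fail.
    """
    eligible = [p for p in providers if user_subscribed or not p.requires_subscription]
    for provider in sorted(eligible, key=lambda p: p.cost_per_call):
        try:
            return provider.name, provider.fetch(query)
        except ProviderUnavailable:
            continue  # fall back to the next candidate
    raise ProviderUnavailable(f"no provider could answer {query!r}")


def _down(query: str) -> str:
    # Simulates an external service that is currently offline.
    raise ProviderUnavailable("service offline")


weather_providers = [
    Provider("cheap-weather", 0.001, False, _down),
    Provider("premium-weather", 0.010, True, lambda q: f"premium forecast for {q}"),
    Provider("backup-weather", 0.005, False, lambda q: f"backup forecast for {q}"),
]
```

Calling `balance(weather_providers, "Paris", user_subscribed=False)` skips the offline provider and the subscription-only one, then answers from the backup provider. Application code depends only on `balance`, not on any single external API.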

Proactive Dialog

The Voice and Connection Platform integrates an expert system and mechanisms for starting a dialog with the user without an initial request.

Together, they provide a set of tools for accomplishing complex tasks, such as providing relevant information when the user's attention is available, or managing the frequency of proactive dialogs.
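Managing the frequency of proactive dialogs could, for example, be done with a sliding-window limiter. This is a minimal sketch under assumptions: the class name `ProactiveThrottle` and the "at most N prompts per time window" policy are illustrative and are not described in the text.

```python
import time
from collections import deque


class ProactiveThrottle:
    """Allow at most `max_prompts` proactive dialogs per `window_seconds`."""

    def __init__(self, max_prompts: int, window_seconds: float, clock=time.monotonic):
        self.max_prompts = max_prompts
        self.window_seconds = window_seconds
        self._clock = clock      # injectable clock, convenient for testing
        self._sent = deque()     # timestamps of recent proactive prompts

    def allow(self) -> bool:
        now = self._clock()
        # Forget prompts that have left the sliding window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_prompts:
            self._sent.append(now)
            return True
        return False
```

An expert-system rule would call `allow()` before starting a dialog and stay silent when it returns `False`.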

Real Context Dialog Understanding

"Real Context Dialog Understanding" is a contextual, multidimensional dialog flow with parameters such as context history, dialog history, user history, user profile, localization, and current context domain.

This contextual approach to analyzing each dialog enables the most accurate understanding of any dialog flow, along with many other positive effects: minimizing the memory needed to store the assistant's knowledge, continuity of the dialog after any kind of interruption, simplified translation of any application, and so on.

Over-the-Air Updates

The VCP global data synchronization mechanism provides a way to update any kind of package "over the air" between the cloud platform, the embedded platform, and any connected device throughout the life of the vehicle. Used internally to synchronize dialogs, UI, logs, and snapshots between our online and embedded solutions, this "over the air" system can be extended to include third-party resources such as embedded TTS voices and embedded ASR dictionaries. Based on a versioning system, a dependency manager, and high-compression data transfer, it provides a first-class mechanism for hybrid solutions.
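To make the combination of versioning and dependency management concrete, here is a minimal Python sketch of how an update plan might be computed. The package names, the `plan_update` function, and the manifest layout are hypothetical; the text does not describe the actual VCP package format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_update(installed: dict, available: dict) -> list:
    """Return the names of outdated packages, dependencies first.

    installed: name -> version tuple, e.g. (1, 0, 0)
    available: name -> {"version": tuple, "requires": [names]}  (assumed layout)
    """
    outdated = {
        name for name, meta in available.items()
        if installed.get(name, (0,)) < meta["version"]
    }
    # Map each outdated package to the outdated packages it depends on.
    graph = {
        name: [dep for dep in available[name]["requires"] if dep in outdated]
        for name in outdated
    }
    return list(TopologicalSorter(graph).static_order())


installed = {"dialogs": (1, 0, 0), "asr-dictionary": (2, 1, 0), "tts-voice": (1, 0, 0)}
available = {
    "dialogs": {"version": (1, 1, 0), "requires": ["asr-dictionary"]},
    "asr-dictionary": {"version": (2, 2, 0), "requires": []},
    "tts-voice": {"version": (1, 0, 0), "requires": []},
}
plan = plan_update(installed, available)
```

Here `plan` lists `asr-dictionary` before `dialogs`, because the dialogs package depends on the dictionary, and omits `tts-voice`, which is already current.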

Continuity of Service Across Devices

Through the VCP Federation Service, the Voice and Connection Platform can provide continuity of service without interruption across driver identities and devices. With the growing number of connected devices, the driver attention accessible to the XXXXX virtual personal assistant extends beyond the time spent in the car.

Voice and Acoustic Agnostic Integration

The Voice and Connection Platform does not rely on a particular speech technology and can use either a local speech engine or a remote speech provider for both speech recognition and text-to-speech. Local engines are encapsulated in VCP plug-ins and can be updated easily through the VCP data synchronization mechanism. Remote speech providers can be managed directly on the cloud side by the VCP.

Which speech technology the VPA uses for speech recognition and text-to-speech is fully configurable for each dialog.

Artificial Intelligence Algorithms

Focusing on obtaining results within timing constraints, the Voice and Connection Platform takes an agnostic approach to AI. This is why we build or integrate best-in-class, out-of-the-box tools into the platform in an abstract way, as we have done with our event-based expert system using the CLIPS engine.

Our expertise lies in natural language, knowledge graphs, machine learning, social intelligence, and general AI algorithms. Our set of tools is the link between top-level frameworks and the open-source algorithms available today, allowing XXXXX to continuously integrate the latest advances in this field.

Natural Language Understanding Agnostic Integration

Following the same strategy adopted for artificial intelligence algorithms, the Voice and Connection Platform takes an agnostic approach to integrating natural language processing modules. Building on our expertise in this area, this allows us to update any of our core modules frequently in order to optimize understanding accuracy and ensure a unique user experience.

Technical Architecture

Architecture

Architecture Description

The Voice and Connection Platform is based on an asynchronous pipeline called the "Smart Dispatcher". It is responsible for delivering messages and user context across the entire platform and the connected devices.
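The following Python sketch shows one way an asynchronous pipeline that carries user context alongside each message could be organized. It is not the actual Smart Dispatcher: the `Message` and `SmartDispatcher` classes, the stage functions, and the `None` stop sentinel are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Message:
    user_id: str
    text: str
    context: dict = field(default_factory=dict)  # user context travels with the message


class SmartDispatcher:
    """Hypothetical asynchronous pipeline: each stage is an async callable."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.inbox = asyncio.Queue()

    async def run(self, results: list) -> None:
        while True:
            message = await self.inbox.get()
            if message is None:          # sentinel: stop the dispatcher
                break
            for stage in self.stages:    # pass message and context through each stage
                message = await stage(message)
            results.append(message)


async def recognize(msg: Message) -> Message:
    # Stand-in for an ASR stage: normalizes the text.
    msg.context["asr"] = msg.text.lower()
    return msg


async def understand(msg: Message) -> Message:
    # Stand-in for an NLU stage: derives a toy intent from the ASR result.
    msg.context["intent"] = "weather" if "weather" in msg.context["asr"] else "unknown"
    return msg


async def demo() -> list:
    dispatcher = SmartDispatcher([recognize, understand])
    results: list = []
    worker = asyncio.create_task(dispatcher.run(results))
    await dispatcher.inbox.put(Message("driver-1", "What is the WEATHER today"))
    await dispatcher.inbox.put(None)
    await worker
    return results
```

Running `asyncio.run(demo())` returns a single message whose context has been enriched by both stages. Producers never block on processing because they only enqueue messages.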

The VCP Federation Service is responsible for user identity management across the platform. It relies on third-party identity providers for digital and social identities such as My XXXXX, Facebook, Twitter, Google+, and Microsoft Live. It also has an internal mechanism for federating all of a user's connected devices, such as the user's car, phone, tablet, and TV.

The Voice and Connection cloud platform provides an agnostic, modular architecture built around the "Smart Dispatcher" and a complete synchronization mechanism for working with the VCP embedded solution. Because an automatic ASR/TTS relay lets it abstract ASR and TTS at the functional level, the VCP server relies on third-party ASR/TTS providers such as Nuance, Google Voice, Telisma, and CreaWave.

The Voice and Connection cloud platform also includes all the technical blocks that the VCP platform provides for dialog management, enhanced by the semantics tools. Combined with the event-based expert system, sensors, AI, and proactive tasks, this provides the core stack used to develop applications.

Third-party data providers are included in an abstract way to support fallback scenarios or rule-based selection driven by user profile preferences or XXXXX business rules. This entry point allows the VCP to integrate all existing XXXXX connected services and make them available at the application development level.

The VCP embedded solution is the in-vehicle counterpart of the VCP server. Updatable "over the air", this embedded solution provides:

- UI delivery and management

- Onboard dialog management

- Context logging for "loss and recovery" connectivity scenarios

- A snapshot manager for logs or any other third-party synchronization

In the vehicle architecture, embedded ASR and TTS providers may be included for onboard dialog management; they are not provided as components of the Voice and Connection Platform.

The VCP Data Storage is an Apache Hadoop-based infrastructure used to store and analyze all data inputs of the Voice and Connection Platform. When used for machine learning or AI processing, the VCP Data Storage provides a mechanism for injecting analysis results into the user profiles stored in the VCP Federation Service.

Technical Details by Field

Voice and Acoustics

Description

The voice and acoustic life cycle is one of the most important interactions for creating a first-class user experience. It needs to be handled with a high level of attention and high-quality components to achieve the expected quality.

The expected quality can be achieved by combining several aspects:

o Top-quality microphones, filters, noise reduction, echo cancellation, and so on

o Integration of multiple ASR/TTS providers (Nuance, Google, Telisma, Microsoft Speech Server, and so on)

o The ability to switch between those providers depending on the use case:

- ASR: onboard, offboard streaming, or offboard relay

- TTS: onboard, offboard, emotional content, mixed continuity mode

o ASR correction management based on the user dialog context

o "Real Dialog" management

At xBrainSoft, we classified those aspects into two categories:

o From voice capture to the end of the ASR process

o After the ASR process, through natural language processing, to natural language understanding

Because being an ASR provider or manufacturing hardware microphones is outside our business scope, we took a technology-agnostic approach to voice management so that we can integrate and communicate with any kind of ASR/TTS engine. Our experience and projects have given us a high level of expertise in those technologies in constrained environments, as demonstrated during the VPA prototype with the integration of Nuance materials.

This type of architecture allows our partners to quickly create powerful dialog scenarios in many languages with any type of ASR or TTS. It also makes it easy to upgrade any component to improve the user experience.

The second category is managed by different levels of software filters based on the user dialog context. Because a dialog is not just a set of back-and-forth sentences, we developed different filters in the Voice and Connection Platform based on "Real Context Dialog Understanding". Real Context Dialog Understanding is a contextual, multidimensional dialog flow with parameters such as context history, dialog history, user history, localization, and current context domain.

Empowered by our VCP semantics tools, we achieve a deep semantic understanding of user input.

This approach allowed us to reduce the "News & Voice Search" application (Newscaster) from 1.2 million language pattern entry points to fewer than 100 while keeping exactly the same meaning in terms of the end-user dialog flow.

This new approach to describing patterns brings many benefits:

o Simplifies disambiguation scenarios, error keywords, and incomplete entity extraction

o Simplifies pattern debugging and enables the creation of automation tools

o Simplifies "on the fly" correction and maintenance of patterns

o Minimizes the memory resources needed to load the pattern dictionary

o Minimizes the effort of translating any dialog for language adaptation

As a complete hybrid system that is updatable "over the air", the VCP component "Online or Embedded Dialog Manager" aims to provide the best solution for managing embedded dialogs when the vehicle loses its connection to the full online dialog experience.

Thus, when provided with first-class materials, the Voice and Connection Platform is designed to be the most efficient means of creating the best user experience envisioned to date.

Meanwhile, xBrainSoft continues to push the limits of the user experience by adding many research aspects, such as sentiment analysis in the dialog flow, social and educational behavior levels in the user context inferred from social and dialog flows, and prosody management based on the VoiceXML standard.

Innovative Features

o Agnostic approach to ASR/TTS providers

o Offboard ASR/TTS relay capability

o Onboard dialog management

o Offboard dialog management

o Hybrid dialog management with "over the air" updates

o VCP semantic tools

o Integrated development environment for dialog management

Example Elements

o High-quality microphones and sound capture

o Voice signal processing, including noise reduction and echo cancellation

o Microphone audio API supporting automatic blank detection

o One or more speech recognition engines for onboard and offboard use

o One or more text-to-speech engines for onboard and offboard use

o VCP embedded solution

o VCP server

Example Associated Partners

Sound capture: Parrott or Nuance

Voice signal processing: Parrott or Nuance

ASR: Google, Nuance, or Telisma

TTS: Nuance, Telisma, or CreaWave

Hybrid Structure and Behavior

Description

A connected, cloud-based personal assistant that can be autonomous when no data connection is available. The goal is to always be able to give the user a fast and accurate answer.

The VCP embedded solution consists of a hybrid assistant that runs on an embedded device, such as a car, and is connected to a server-side counterpart. Every user request is handled directly by the embedded assistant, which decides, based on criteria such as connectivity, whether to forward the request to the server. In this way, every user request can be handled either locally or remotely. The level of offboard capability can easily be tuned to improve performance and user experience.

Like the Voice and Connection Platform, the VCP embedded solution provides advanced natural language processing and understanding capabilities for handling user requests without requiring a data connection. This ensures that the VPA quickly understands any user request locally, can answer the user directly if needed, and can asynchronously fetch computationally heavy responses from the server. When there is no connection and external data is needed to answer the user fully (for example, a weather request), the response is adapted to notify the user that the request cannot be fulfilled. Depending on the scenario, the VPA can queue the user request so that it is forwarded to the server as soon as the connection is restored.
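The local-versus-remote decision and the queuing of requests during a connection loss can be sketched as follows. This is an illustrative Python model, not the embedded C++ implementation. `EmbeddedAssistant`, `LOCAL_INTENTS`, and the `send_to_server` callable are hypothetical names, and requests are assumed to have already been reduced to an intent string by local language understanding.

```python
from collections import deque

# Hypothetical set of intents the embedded assistant can serve without a server.
LOCAL_INTENTS = {"call_contact", "set_temperature"}


class EmbeddedAssistant:
    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # assumed callable: intent -> reply
        self.connected = True
        self.pending = deque()                # requests queued while offline

    def handle(self, intent: str) -> str:
        if intent in LOCAL_INTENTS:
            return f"handled locally: {intent}"
        if self.connected:
            return self.send_to_server(intent)
        # No connection and external data needed: notify the user and queue.
        self.pending.append(intent)
        return f"cannot fulfil '{intent}' right now: no connection"

    def on_reconnect(self) -> list:
        """Forward queued requests as soon as the connection is restored."""
        self.connected = True
        replies = []
        while self.pending:
            replies.append(self.send_to_server(self.pending.popleft()))
        return replies
```

For example, a weather request issued while `connected` is `False` produces the "no connection" notice and is answered later by `on_reconnect()`, whereas `call_contact` is always served on board.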

The Voice and Connection Platform also provides full context synchronization between the embedded agent and the server, so that data is shared between them rather than kept separate. A resynchronization is performed whenever a connectivity problem occurs, to ensure that the data is always up to date.

The VCP embedded solution is made up of plug-ins that can easily be updated or replaced through the "over the air" process. Voice, AI, dialog understanding, data processing, and user interfaces are among these upgradable modules.

The VCP embedded solution also includes a set of scripts, part of the AI, that process responses. To ensure consistent responses regardless of the level of connectivity, these scripts are synchronized between the server and the embedded agent.

Innovative Features

o User interface manager

o Local interface synchronized with the server

o Embedded dialog manager

- Pure embedded scenarios

- Onboard/offboard hybrid scenarios

- Pure offboard scenarios

Always answering user requests, with or without an Internet connection

Context synchronization for connection-loss use cases

Example Elements

Linux platform available on the car computer system

Performance

Efficient, performance-oriented programming language (C++)

High compression of exchanged data to optimize bandwidth and response time

The VCP embedded solution has been compiled and tested on a Raspberry Pi Model A:

o CPU: 700 MHz low-power ARM1176JZ-F application processor

o RAM: 256 MB SDRAM

Artificial Intelligence

Description

Artificial intelligence is a large domain covering many disciplines, such as:

o Inference, reasoning, problem solving

o Knowledge graph discovery

o Planning and acting through an event-based expert system

o Natural language processing and semantic search

o Machine learning, MapReduce, deep learning

o Social intelligence, sentiment analysis, social behavior

o Other uses yet to be discovered

xBrainSoft is aware of the huge scope of this potential and remains humble in the face of the current challenges of the state of the science.

Focusing on obtaining results within timing constraints, the Voice and Connection Platform takes an agnostic approach to AI. This is why we build or integrate best-in-class, out-of-the-box tools into the platform in an abstract way, as we have done with our event-based expert system using the out-of-the-box CLIPS engine.

Our expertise lies in natural language, knowledge graphs, machine learning, social intelligence, and general AI algorithms.

The main characteristic of our set of tools is that it acts as the glue between top-level frameworks and the open-source algorithms available today.

As a result, xBrainSoft can deliver 100% of the expected scenarios of the VPA project, because any of our modules can be replaced by a more valuable one available on the market.

This is also why xBrainSoft works with partners such as Kyron (Silicon Valley; AI, big data, and machine learning applied to healthcare), Visteon, and Spirops to extend the AI capabilities available through our platform.

Innovative Features

The ability to provide data to external AI modules in an anonymized way. Users or sessions are represented as random unique numbers, so that external systems can work at the right level without being able to correlate the information with a real user.
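A minimal Python sketch of this anonymization idea follows. The `Anonymizer` class, its method names, and the 64-bit alias size are assumptions; the text only states that users or sessions are represented as random unique numbers.

```python
import secrets


class Anonymizer:
    """Maps real user IDs to random unique numbers; the mapping stays inside the platform."""

    def __init__(self):
        self._alias_by_user = {}
        self._user_by_alias = {}

    def alias(self, user_id: str) -> int:
        if user_id not in self._alias_by_user:
            token = secrets.randbits(64)
            while token in self._user_by_alias:   # keep aliases unique
                token = secrets.randbits(64)
            self._alias_by_user[user_id] = token
            self._user_by_alias[token] = user_id
        return self._alias_by_user[user_id]

    def resolve(self, token: int) -> str:
        """Platform-side only: map results from an external module back to the user."""
        return self._user_by_alias[token]

    def export(self, user_id: str, payload: dict) -> dict:
        """Build the record handed to an external AI module."""
        return {"subject": self.alias(user_id), **payload}
```

The external module sees a stable `subject` number for each user, which lets it learn per-user patterns, while only the platform can call `resolve` to attach the results to a real profile.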

An agnostic approach to embedding AI in the Voice and Connection Platform (VCP), using xBrainSoft or external AI tools

A bridge to the VCP Federation Service provided by the VCP, used to get data back from AI tools and enrich the user profile for better user context management

Example Elements

o VCP Data Storage based on Apache Hadoop

o VCP event-based expert system

o VCP Federation Service

Offboard Platform and Services

Description

To enrich the services provided to users in their cars, the offboard platform brings a high level of connected functionality thanks to its high availability and powerful components. The user is placed at the center of a comprehensive, intelligent ecosystem focused on car services. The offboard platform is also the entry point for functionality that mixes automotive and connected services.

The offboard platform is highly available in order to support all of the brand's cars and the users' connected devices. It can evolve over time to handle more and more users and to cope with load fluctuations.

To meet all of these challenges, the Voice and Connection Platform provides a clustered architecture that can be deployed "in the cloud" or on premises. All clustered nodes are aware of one another, which enables cross-node connected-device scenarios that maintain service continuity across the clustered architecture.

The Voice and Connection Platform can consume third-party data services, from technical data services to user information, through the user's social accounts and devices. All of this information is useful for creating "relevant" and intelligent scenarios.

The scope of functions and services is wide and will evolve over time with technological progress. Thanks to its modular design, the platform architecture must deliver new services and functions without affecting existing ones.

Innovative Features

o In-cloud or on-premises hosting

o Ready-to-go clustered architecture deployment for high availability and load fluctuations

o Device-to-device capability across the clustered architecture

Example Elements

o VCP server

o Third-party data providers

Performance

5k concurrently connected objects (cars) per server. The prototype will implement a set of three servers to guarantee a high-level SLA and will offer 10k concurrently connected objects up front.
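The relationship between these figures can be checked with a few lines of Python. One possible reading, which the text does not state explicitly, is that three servers are provisioned so that the 10k target still holds if one server fails. The `capacity` helper and the single-failure scenario are assumptions made for this illustration.

```python
PER_SERVER_CAPACITY = 5_000   # concurrently connected cars per server (from the text)
SERVERS = 3                   # prototype cluster size (from the text)


def capacity(servers: int, failed: int = 0) -> int:
    """Total concurrent connections the remaining healthy servers can hold."""
    return max(servers - failed, 0) * PER_SERVER_CAPACITY


nominal = capacity(SERVERS)             # all three servers healthy
degraded = capacity(SERVERS, failed=1)  # assumed single-server failure
```

Under this reading, `nominal` is 15,000 and `degraded` is 10,000, so the advertised 10k figure would be met even with one node down.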

Offboard Frameworks and General Security

Description

As a third-party data service provider, the XXXXX SIG can be used by the Voice and Connection Platform in addition to our currently implemented providers. Thanks to the high level of abstraction, we can implement different third-party data service providers and integrate them during the project life cycle without updating the functional part of the VPA.

The Voice and Connection Platform provides a means of implementing fallback scenarios to ensure high availability of data from external providers. For example, several weather data providers can be configured so that one takes over when the primary provider is unavailable.

The Voice and Connection Platform also provides an implementation of its expert system for provider eligibility. Based on business rules, the system helps manage billing optimization. It can be used at different levels: at the user level, based on subscription fees, or at the platform level, based on supplier transaction contracts.

Because the Voice and Connection Platform can be exposed through a complete set of HTTP APIs, it can easily be integrated into any kind of machine-to-machine network.

For communication and authentication, the Voice and Connection Platform follows the state-of-the-art practices used in the Internet industry. From securing all communications with SSL certificates to the Challenge-Handshake Authentication Protocol (CHAP), the Voice and Connection Platform ensures a high level of security with respect to end-user privacy.

Security and user privacy are also taken into account during VCP Federation Service identity association, because end-user logins and passwords never pass through the Voice and Connection Platform. The whole system is based on token-based authentication provided by the identity provider. For example, with a Facebook account, the end user authenticates on the Facebook server, which confirms the end user's identity and returns an authentication token to us.
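The token flow described above can be modeled in a few lines of Python. This is a simplified sketch: `IdentityProvider` is a hypothetical stand-in for an external provider, not Facebook's actual API, the HMAC-signed token format is an assumption, and `FederationService.link_device` is an illustrative name. The point it demonstrates is that the platform only ever receives a token, never the password.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class IdentityProvider:
    """Hypothetical external identity provider; it alone sees credentials."""

    def __init__(self):
        self._passwords = {"alice": "correct horse"}   # demo credential store
        self._key = secrets.token_bytes(32)            # signing key, never shared

    def _sign(self, user: str) -> str:
        return hmac.new(self._key, user.encode(), hashlib.sha256).hexdigest()

    def login(self, user: str, password: str) -> str:
        stored = self._passwords.get(user, "")
        if not hmac.compare_digest(stored.encode(), password.encode()):
            raise PermissionError("bad credentials")
        return f"{user}.{self._sign(user)}"            # authentication token

    def introspect(self, token: str) -> Optional[str]:
        """Return the user name if the token is genuine, else None."""
        user, _, signature = token.rpartition(".")
        return user if hmac.compare_digest(signature, self._sign(user)) else None


class FederationService:
    """Platform side: links devices to identities using tokens only."""

    def __init__(self, provider: IdentityProvider):
        self._provider = provider
        self.linked = {}   # device id -> verified user

    def link_device(self, device_id: str, token: str) -> bool:
        user = self._provider.introspect(token)
        if user is None:
            return False
        self.linked[device_id] = user
        return True
```

In this model the user calls `login` directly on the provider, and the platform calls only `link_device` with the resulting token, so a forged token such as `"alice.forged"` is rejected without the platform ever handling a password.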

The way the VCP embedded solution is built prevents reliability and safety issues, because it relies on the underlying existing functions provided by the integrator. In our technical proposal, the VPA cannot send commands directly to the car; instead, it sends commands to the underlying system, which handles reliability and safety concerns.

Innovative Features

o Modular architecture enabling full integration of the XXXXX Connected Services APIs

o Our XXXXX can be implemented as the default identity provider of the VCP Federation Service, helping users feel safe when linking their social identities

o High-level security to protect end-user privacy

Example Elements

o Secure infrastructure for car connectivity, such as an M2M network

o Token-based authentication API for implementing a VCP Federation Service identity provider

Context and history awareness

Description

Efficient context management is essential for personalizing dialogs, assistant behavior, and functions. Because it is implemented at the engine level, the user context can be accessed by any component of the Voice and Connection Platform to enable an enhanced, personalized experience.

Because it can be extended with any data source (vehicle data such as CAN and GPS, social profiles, external systems such as weather and traffic, user interactions, and so on), the user context is also used heavily by our event-based expert system to create proactive use cases.

Because the context is shared between onboard and offboard, the Voice and Connection Platform takes care of context resynchronization between the two environments.

Regarding history awareness, the Voice and Connection Platform provides a complete solution for aggregating, storing, and analyzing data. That data can come from any of the sources described above.

Once analyzed, the results are used to enrich the user profile and help deliver a personalized experience.

Innovative features

Because it is integrated as an engine feature, user context management is transversal in the Voice and Connection Platform: it can be accessed from any module, dialog, task, or rule in the system. It can also be shared across devices through an implementation of the VCP Federation Service.

The Voice and Connection Platform provides a full context resynchronization system between onboard and offboard to handle connectivity problems, such as driving through a tunnel.

Based on the Apache Hadoop stack and tools, the VCP data store provides an infrastructure ready to pursue machine learning goals such as user behavior analysis, habit learning, and any other related classification or recommendation task.

Example elements

o VCP data store

o Hadoop infrastructure defined according to the requirements
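The onboard/offboard context resynchronization described in this section could be sketched as a merge of two context snapshots. The text does not specify the merge policy; the sketch below assumes a simple "most recent write wins" rule, and the names `ContextEntry` and `resynchronize` as well as the sample keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    value: object
    updated_at: float  # timestamp set by whichever side wrote the value


def resynchronize(onboard: dict, offboard: dict) -> dict:
    """Merge two context snapshots, keeping the most recent entry per key.

    Assumption: last-write-wins; the platform may use another policy.
    """
    merged = dict(offboard)
    for key, entry in onboard.items():
        current = merged.get(key)
        if current is None or entry.updated_at > current.updated_at:
            merged[key] = entry
    return merged


# The car was offline in a tunnel and kept updating its own context.
onboard = {
    "gps.position": ContextEntry((37.44, -122.14), updated_at=120.0),
    "media.volume": ContextEntry(7, updated_at=90.0),
}
offboard = {
    "gps.position": ContextEntry((37.40, -122.10), updated_at=60.0),
    "weather.city": ContextEntry("Palo Alto", updated_at=100.0),
}
synced = resynchronize(onboard, offboard)
```

After the merge, both sides can adopt `synced` as their shared context: the newer onboard GPS position replaces the stale offboard one, and keys known to only one side are preserved.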

Proactivity

Description

Proactivity is one of the keys to creating smarter applications for the end user.

The VC Platform provides two distinct levels of proactivity management:

o Background workers: a complete background task system that can reconnect to the main pipeline and interact with the user session, or fall back on notification tools

o Event-based expert system: a fully integrated business rules engine that can react to external sensors and to the user context

Combined with the VCP Federation Service, this extends proactivity beyond a single device.
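As an illustration of the event-based expert system, the following minimal sketch shows rules that react to context changes. It is not the platform's rules engine; `Rule`, `ExpertSystem`, the `fuel_level` key, and the prompt text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # evaluated against the user context
    action: Callable[[dict], str]      # returns a proactive prompt


class ExpertSystem:
    def __init__(self, rules):
        self.rules = list(rules)
        self.context = {}

    def on_event(self, key, value):
        """Store a sensor or context value, then fire every matching rule."""
        self.context[key] = value
        return [r.action(self.context) for r in self.rules
                if r.condition(self.context)]


# Hypothetical rule: react proactively when the fuel level drops below 10%.
low_fuel = Rule(
    name="low_fuel",
    condition=lambda ctx: ctx.get("fuel_level", 1.0) < 0.1,
    action=lambda ctx: "Fuel is low. Do you want directions to a gas station?",
)

engine = ExpertSystem([low_fuel])
quiet = engine.on_event("fuel_level", 0.5)     # no rule fires
prompts = engine.on_event("fuel_level", 0.05)  # the low-fuel rule fires
```

The returned prompts could then be handed to the user session or, when no session is active, to a fallback notification channel.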

Innovative features

o Event-based expert system that reacts proactively to context items in real time

o Use of the VCP Federation Service to enable a cross-device proactive experience

o Implementations of the major notification providers for the proactive fallback use case (Google, Apple, Microsoft, ...)

o From a functional point of view, the level of proactivity tuning can be exposed as a user setting

Example elements

o VCP Federation Service for device knowledge

o Devices that support a notification process for the fallback use case

General upgradability

Description

General upgradability is a crucial process in the automotive industry. Because a car does not visit the dealer very often, the overall solution should provide a complete "over the air" update mechanism.

The Voice and Connection Platform already implements such an "over the air" mechanism in its VCP embedded solution to synchronize dialogs and user interfaces.

Based on a factory architecture, this "over the air" process can be extended to manage any kind of data exchanged between the Voice and Connection Platform and a connected device.

Innovative features

o Extensible "over the air" mechanism including versioning support, dependency resolution, and communication compression

o The VCP server is based on a modular architecture that allows (new) modules to be added or removed during the vehicle's lifetime

o The VCP embedded solution is based on a plug-in architecture that allows new interoperability functions to be added in order to access new car functions or messages

Example elements

o Internet connection (depending on the hardware and the type of connection)
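The versioning and dependency resolution mentioned for the "over the air" mechanism could work along the following lines. This is only a sketch under assumptions: the manifest format, the package names, and `plan_update` are hypothetical, and ordering is done with Python's standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter

# Hypothetical update manifest: package -> (version, dependencies).
manifest = {
    "dialog-core": ("2.1.0", []),
    "ui-templates": ("1.4.0", ["dialog-core"]),
    "navigation-plugin": ("3.0.0", ["dialog-core", "ui-templates"]),
}

# Hypothetical state of the device before the update.
installed = {"dialog-core": "2.1.0", "ui-templates": "1.3.0"}


def plan_update(manifest, installed):
    """Return (package, version) pairs to install, dependencies first."""
    graph = {name: set(deps) for name, (_, deps) in manifest.items()}
    order = TopologicalSorter(graph).static_order()
    return [(name, manifest[name][0]) for name in order
            if installed.get(name) != manifest[name][0]]


plan = plan_update(manifest, installed)
```

Packages already at the target version are skipped, so only the changed or missing ones need to be transferred to the vehicle.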

Internal and external continuity

Description

Device continuity means that, through the Voice and Connection Platform, the driver can connect to the virtual personal assistant in the car as well as outside it, on the street or at home. The driver can use the service wherever he or she wants.

This capability allows XXXXX to extend the reach of its relationship with its customers inside and outside the car. The brand expands its opportunities to offer services and generate engagement beyond its traditional area. It therefore opens the door to a larger number of potential business partnerships with third-party operators that can bring competitive APIs or services.

Based on the VCP Federation Service, the VPA can be fully integrated into the end user's ecosystem. From the user's car and multiple devices to the user's digital and social identities, every input of that ecosystem can strengthen the user's pervasive experience.

Innovative features

The Voice and Connection Platform delivers its services over a standard secure protocol (HTTPS) that can be accessed from any recognized device. From an end-to-end point of view, the Voice and Connection Platform provides frameworks and tools for all major device platforms, such as Android, iOS, Windows and Windows Phone, and embedded systems.

The VCP Federation Service aggregates the user's digital identities and devices to provide the best connected, pervasive experience. For example, the VCP can start a scenario on the user's phone, continue it in the car, and finish it on another device.

The VCP user interface manager can download, store, and run VCP web objects on any device that provides a web browser API. As a result, the user interface and logic of applications on connected devices can be cross-platform and easily updated "over the air". The VCP user interface manager can also apply a different template or logic for a specific platform, region, or language.

Example elements

The VCP Federation Service is at the center of service continuity.

Because connected devices are heterogeneous (platform, size, hardware, usage, ...), scenarios must be adapted to best fit the target device. For example, a device may have no microphone, which makes it incompatible with a voice user interface, so physical interaction must be used instead.

Cultural and geographic context

Description

Because XXXXX is highly international, the VPA can be adapted to users from a cultural or geographic point of view. This implies translating all scripts and interfaces provided to the user, configuring ASR and TTS providers, and, where needed, modifying the behavior of some scenarios.

Innovative features

Based on a fully modular architecture, Voice and Connection Platform modules can be plugged in according to internationalization settings. This makes it possible to manage different service offerings or features depending on the region.

The Voice and Connection Platform provides a complete abstraction of ASR/TTS provider relays, which can be selected based on regional deployment or user settings. This gives cars and connected devices a unified entry point for voice recognition and speech synthesis, and separates the concerns of voice acquisition/playback from those of the ASR/TTS providers.

The VCP dialog manager and VCP semantic tools provide a high level of abstraction that allows extension to new languages without affecting the functional implementation.

Example elements

o External third-party data providers that support translation through their APIs

o ASR/TTS provider(s) for the selected language(s)

o End-user social identities defined for the VCP Federation Service, for example Weibo instead of Twitter for China

o Use cases and VPA behavior adapted to the end user's culture and region

Appendix D: "Direct and Workaround Scenario Process"

This appendix describes a general approach and where we find our added value relative to other products such as Siri, Google Now, Nuance, or any other type of personal assistant, according to various embodiments.

Legend:

VCP = Voice and Connection Platform

ASR = Automatic Speech Recognition

TTS = Text-to-Speech

TUI = Touch User Interaction

VUI = Voice User Interaction

NLU = Natural Language Understanding

The VCP is both synchronous and asynchronous. This means that each action or event can be executed immediately or long after the user's request. For example, I can ask the agent to send me my sales report on the first of every month (asynchronous, a long-running or long-term task), and I can ask for today's weather and, right after the answer, ask about tomorrow's weather (using the direct context).

The description of the life cycle (see Fig. 7) starts at the bottom left and proceeds to the top right.

Life cycle

ASR engine:

Before ASR (Automatic Speech Recognition), ASR can be activated in three ways:

o ASR auto wake-up word: any keyword can be used to wake up the application and start ASR (for example Angie, Sam, ADA, ...)

o ASR proactive activation: depends on internal or external events

Timer: automatic wake-up based on a daily timer

Internal event: any internal event from a device component (GPS, accelerator, ...) or from any function or module of the application

The system detects that you are at home and can start ASR (with a contextual TTS prompt) when you do something

When I am in my car (because I detect power and OBD), I can offer to start your music and your navigation

When you have a new appointment in your calendar, the agent can start automatically and ask whether you want navigation to your next meeting (if a car is needed)

External event: any external event detected from a database or third-party API can activate ASR/TTS

When you arrive near your destination, the system can query an external parking availability API to tell you when you can park your car

When you are stuck in traffic, the system can evaluate not only a change of route by car but also a change in how you reach your destination, and suggest that you park your car and take the train

o ASR push button: activation of the agent by a simple click (push) on a virtual button (on screen) or a physical button (on the cradle or the wheel button)

Activation of ASR (voice input)

ASR-to-NLU preprocessing = based on the application context, we can take the recognized sentence (with its confidence score) and rework it before sending it to the natural language understanding engine

o Because we know we are in the context of the module for making a call, we can remove or change any word in the sentence before sending it to the NLU engine

o In French, when the user says:

"donne-moi l'information technologique" => ASR may return "Benoit la formation technologique" (completely off from the user's intent)

We can correct the words: 'Benoit' to 'donne-moi' and 'formation' to 'information'

After preprocessing, the sentence has a much better chance of being understood by the NLU and of producing the action the user wants.
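The preprocessing step above can be illustrated with a small context-dependent correction table. This is a sketch, not the platform's implementation: `CORRECTIONS`, `preprocess`, and the module name `news` are hypothetical, and the table maps the article together with the noun ("la formation" to "l'information") so that the corrected sentence is grammatical French.

```python
# Hypothetical per-module table of known ASR mis-hearings and their fixes.
CORRECTIONS = {
    "news": {
        "benoit": "donne-moi",
        "la formation": "l'information",
    },
}


def preprocess(sentence: str, module: str) -> str:
    """Rewrite an ASR result using corrections tied to the active module."""
    fixed = sentence.lower()
    for wrong, right in CORRECTIONS.get(module, {}).items():
        fixed = fixed.replace(wrong, right)
    return fixed


result = preprocess("Benoit la formation technologique", "news")
unchanged = preprocess("Benoit la formation technologique", "phone")
```

Because the table is selected by module, the same ASR output is left alone (apart from lowercasing) in a context where "Benoit" is more likely to be a contact name.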

NLU engine:

Detection of the user's intent to start a specific module; each detection works within the application context, as explained in the next chapter below.

o Samples

Call Gregory = phone module

Text Bastien = message module

o Keywords = keywords for direct access to a module

Phone = gives access to the phone

Navigation = gives access to navigation

o Shortcuts = sentences the user can say from anywhere in the application, only for main actions such as those listed in the schema

Detection of the action (function) within the module (intent)

o Sample

Make a call = the action of calling Gregory Renard

This sentence makes it possible to detect the module, the action, and the entity (person = Gregory Renard)
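To make the module/action/entity detection concrete, here is a deliberately simple pattern-based sketch. A real NLU engine would be far richer; the patterns, the `detect` function, and the action names are assumptions made for illustration.

```python
import re

# Hypothetical sentence patterns: (regex, module, action).
PATTERNS = [
    (re.compile(r"^call (?P<person>.+)$", re.IGNORECASE), "phone", "call"),
    (re.compile(r"^text (?P<person>.+)$", re.IGNORECASE), "message", "send_text"),
]

# Keywords that open a module directly, without an action.
KEYWORDS = {"phone": "phone", "navigation": "navigation"}


def detect(sentence):
    """Return (module, action, entities), or None when nothing matches."""
    text = sentence.strip()
    if text.lower() in KEYWORDS:
        return (KEYWORDS[text.lower()], None, {})
    for pattern, module, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return (module, action, match.groupdict())
    return None


call = detect("Call Gregory Renard")
keyword = detect("Navigation")
unknown = detect("Make me a sandwich")
```

A `None` result is the case where the default module described next would take over.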

Default module list = because we know exactly what the application can and cannot do, we can detect that the user is trying to do something the application cannot do, or that we received a bad result from ASR. In that case, we can activate the default module to try to detect the meaning of the user's intent (typically the point where Siri and Google Now push the user to a web search).

o Propose to the user the list of modules available in the application (not limiting; the list of modules can be extended from any type of application if needed)

o If the user says something wrong again, or if voice recognition does not work = the system suggests switching from sentence recognition to number recognition

If the user says something the system does not recognize, the system says "Which application do you want to start?" and opens the list of applications

If the user again says something the system does not recognize, the system says "What is the number of the application you want?" (we use this workflow in any type of list, such as contacts, addresses, albums, artists, news categories, and messages)

o The user makes a choice

The system shows the default item list for the module and proposes (by voice and/or visually) the functions available in the module. The user can then make a choice by following this guidance.
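The two-step fallback described above (first ask for an application, then switch to number recognition) might be sketched as follows. The function `run_fallback` and the module list are hypothetical; the prompts follow the wording in the text.

```python
def run_fallback(utterances, modules):
    """Walk through the fallback prompts until a module is chosen.

    Returns (chosen_module_or_None, prompts_spoken_by_the_system).
    """
    prompts = []
    misses = 0
    for raw in utterances:
        text = raw.strip().lower()
        if text in modules:
            return text, prompts
        # Number recognition is only offered after a second miss.
        if misses >= 2 and text.isdigit() and 1 <= int(text) <= len(modules):
            return modules[int(text) - 1], prompts
        misses += 1
        if misses == 1:
            prompts.append("Which application do you want to start?")
        else:
            prompts.append("What is the number of the application you want?")
    return None, prompts


modules = ["phone", "messages", "navigation", "music"]
chosen, spoken = run_fallback(["mumble", "garble", "3"], modules)
```

In this run the user is misunderstood twice, hears both prompts, and finally selects the third entry of the list by number.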

o The list can be:

Filtered: call Malvoisin => filter on Celine = shows the Celine Malvoisin entries of the contact list

Filtered by letter: on any list, you can build a filter letter by letter

The user can say: "filter on letter M, letter A, letter L" (this gives access to contacts whose names cannot be pronounced)

Letter filtering matches any word in the item label

Filtered by letter navigation: on any list, the user can say "go to letter V"

The agent immediately shows all contacts starting with the letter V

Navigated: the user can navigate the list as follows

Next / Previous = shows the next or previous set of items in the current list

First = shows the first item in the list

End = shows the last item in the list
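The two letter-based operations above can be sketched in a few lines. The helper names are hypothetical, and the contact "Valerie Martin" is an invented entry added only so that the "go to letter V" case returns something.

```python
contacts = [
    "Celine Malvoisin",
    "Luc Malvoisin",
    "Gregoire Malvoisin",
    "Valerie Martin",   # invented entry for the letter-V example
    "Louis Monier",
]


def filter_by_letters(items, letters):
    """Keep items where any word of the label starts with the spelled letters."""
    prefix = "".join(letters).lower()
    return [item for item in items
            if any(word.lower().startswith(prefix) for word in item.split())]


def go_to_letter(items, letter):
    """Keep items whose label starts with the given letter."""
    return [item for item in items if item.lower().startswith(letter.lower())]


spelled = filter_by_letters(contacts, ["M", "A", "L"])
jumped = go_to_letter(contacts, "V")
```

Spelling "M, A, L" matches on the last name because every word of the label is tested, which is the behavior the text describes for letter filtering.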

o The list can be read aloud at any time:

On any item list screen, the user can ask for the list to be read

The list is read as follows

Each item is read, followed by its number, to help the user remember the item number

The content of each item is read without the part that is already known from the previous item

Imagine we have five Malvoisin contacts in a phone number list (three different phone types for Celine, one for Luc, and one for Gregoire)

The agent will say (nothing is repeated while the agent is speaking):

Celine mobile US is number 1 (Malvoisin is omitted because it was in my request, so I already know I am hearing Malvoisin contacts)

Home is number 2

Office is number 3

Luc mobile is number 4

Gregoire home is number 5
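The reading rule above (say each item with its number, and skip whatever the listener already knows) can be reproduced with a short sketch. The `read_list` function and the tuple layout are assumptions; the data mirrors the Malvoisin example.

```python
# (first name, phone label) for the five Malvoisin entries in the example.
entries = [
    ("Celine", "mobile US"),
    ("Celine", "home"),
    ("Celine", "office"),
    ("Luc", "mobile"),
    ("Gregoire", "home"),
]


def read_list(entries):
    """Build the spoken lines, omitting a name equal to the previous one."""
    lines = []
    previous_name = None
    for number, (name, label) in enumerate(entries, start=1):
        spoken = label if name == previous_name else f"{name} {label}"
        lines.append(f"{spoken} is number {number}")
        previous_name = name
    return lines


spoken_lines = read_list(entries)
```

The last name is never spoken because it was part of the request, and the first name is only spoken when it changes from one item to the next.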

Item selection by the user

o Item number selection = allows the user to select an item by the number in front of it (we only work with numbers 1 through 5)

o Item content selection = allows the user to select an item by its label (for example, "Celine")

After detection of the tuple = module, function, and entity (item selection)

o The system can execute processing for two types of functions

Knowledge type = access to a knowledge base (QA, catalog, Wikipedia, ...) to give the user an answer

Action type = needs to manage access to external or internal APIs

Based on the result of the NLU processing described below, the system generates two synchronized elements:

o TUI = Touch User Interaction (the screen designed for the user, as in any type of application)

o VUI = Voice User Interaction (voice feedback, with the ability to ask the user for more information or details, or to ask another question)

o The VUI and TUI are fully synchronized: you can move to the next step of the functional workflow by touch or by voice

When you tap the screen to select an item, you move to the next step and the agent knows your context position in the application

This context position is what allows the voice to stay synchronized with the visuals

Based on the current workflow, the agent can detect whether it needs more information to complete the user's current intent, and ask for it by starting ASR again (after sending the feedback sentence to TTS)

o User: What's on TV tonight?

o System: On which channel? (the user's intent was detected as TV = module and tonight's prime time on some channel = action)

o The system understands that it is missing a variable needed to complete the action and asks for it

o User: On channel 1

o System: Here is the prime time on channel 1 ... blablabla

o User: And channel 2? (in this case, we use the context to know what the current intent and the user's last action were = TV / give tonight's prime-time program)

o System: Here is the prime time on channel 2 ... bliblibli

o ... and the system can continue in this context without limit; we call this workflow the "direct context"
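The TV dialog above is a case of slot filling inside a direct context: the module and action stay fixed while the missing variable is requested and then replaced. A minimal sketch, with hypothetical names (`DirectContext`, `handle`) and placeholder answers:

```python
class DirectContext:
    """Holds the current module, action, and the variables collected so far."""

    def __init__(self):
        self.module = None
        self.action = None
        self.slots = {}


def handle(ctx, module=None, action=None, **slots):
    """Process one user turn and return the system's reply."""
    if module is not None:
        # A new intent resets the direct context.
        ctx.module, ctx.action, ctx.slots = module, action, {}
    ctx.slots.update(slots)
    if ctx.module is None:
        return "What would you like to do?"
    if ctx.module == "tv" and "channel" not in ctx.slots:
        return "On which channel?"
    return f"Here is the prime time on channel {ctx.slots['channel']}"


ctx = DirectContext()
first = handle(ctx, module="tv", action="prime_time_tonight")
second = handle(ctx, channel=1)   # "On channel 1"
third = handle(ctx, channel=2)    # "And channel 2?" reuses module and action
```

The third turn carries no module or action at all; both are recovered from the direct context, which is what lets the user keep asking about further channels.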

Based on the previous point (management of intent and context), we can use different types of context.

o See the descriptions in the points below.

Temporal context matrix dependency

Before going into the types of context, we need to define the context as created in the VCP from xBrainSoft.

The context (defined as the current context):

Works as a 3D storage matrix:

o Dimension 1: current module (for example, the phone module)

o Dimension 2: current action (for example, the action of making a call in the phone module)

o Dimension 3: current screen (the step of the action, for example selecting the contact for the call action in the phone module)

In any storage cell (context field), you can store any type of information as a tuple of at least three items (object type, ID "name", and value), with the ability to extend the stored items at any level.

o Any type of variable (int, string, Date, ...)

o Any type of serializable object (Car type, User type, ...)

Can use history = 4D storage matrix (the context evolves along a time variable)

o Each time state is saved for the user session for the short and medium term

o Each time state can be saved in a file or database for the long term
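One way to picture the 3D matrix with its time dimension is a dictionary keyed by (module, action, screen) whose cells hold timestamped (object type, name, value) tuples. This is an illustrative in-memory sketch only; `ContextStore` and its methods are hypothetical, and persistence to a file or database is left out.

```python
import time


class ContextStore:
    """Cells keyed by (module, action, screen); each cell keeps a history."""

    def __init__(self):
        self._cells = {}

    def put(self, module, action, screen, obj_type, name, value, timestamp=None):
        """Append a (timestamp, type, name, value) record to a cell."""
        ts = time.time() if timestamp is None else timestamp
        cell = self._cells.setdefault((module, action, screen), [])
        cell.append((ts, obj_type, name, value))

    def get(self, module, action, screen, name, at=None):
        """Latest value of `name`, optionally as of time `at` (4th dimension)."""
        cell = self._cells.get((module, action, screen), [])
        candidates = [(ts, value) for ts, _type, n, value in cell
                      if n == name and (at is None or ts <= at)]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


store = ContextStore()
store.put("phone", "call", "select_contact", "str", "contact",
          "Louis Monier", timestamp=1.0)
store.put("phone", "call", "select_contact", "str", "contact",
          "Celine Malvoisin", timestamp=2.0)
latest = store.get("phone", "call", "select_contact", "contact")
earlier = store.get("phone", "call", "select_contact", "contact", at=1.5)
```

Keeping every record rather than overwriting is what turns the 3D matrix into the 4D one: the same cell can be queried for its current value or for its value at a past moment.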

The context is tied to the user's current functional workflow, which makes it possible to build intent learning over the medium and long term.

We can have two categories of context:

Application context = a general context shared by many users (all users of the application or a subset of them) for the short, medium, or long term

Session context = the context of a single user

Types of context:

Direct context: see the description above.

Indirect context (temporary context) = after any question/answer exchange between the user and the agent (with or without direct context), the user can move to another module or function, where he or she can again use a direct context. After that point, the user can still return to the previous direct-context module and continue the dialog with the system, as described below:

o User: What's the weather? => the agent gives me the weather in Palo Alto (it used my device's GPS information to determine my location)

o User: And in San Francisco? => the agent finds my last direct context and gives me the weather in SF

o User: What time is it there? => the agent understands that I want to change the intent module and recovers, from the previous context, the variable it needs to complete the query for the time in SF

o User: And what's the weather tomorrow? => the agent detects that I want to go back to the weather module (new intent), finds the location in my last weather query, and gives me tomorrow's weather in SF

o Note: the indirect context can persist over time when saved in long-term storage such as a file or a database. The same applies to a direct context, which becomes an indirect context whenever an action in the module is interrupted.
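The weather/time exchange above can be traced with a small session object that remembers past queries per module. It is a sketch under assumptions: `Session`, `MODULE_SLOTS`, and the recall order (same module first, then the most recent query of any module, then a default) are hypothetical choices, not the documented behavior.

```python
MODULE_SLOTS = {"weather": ("city", "day"), "time": ("city",)}
DEFAULTS = {"day": "today"}


class Session:
    def __init__(self, gps_city):
        self.gps_city = gps_city
        self.history = []  # (module, slots) pairs, most recent last

    def ask(self, module=None, **given):
        """Resolve a query, filling missing variables from earlier contexts."""
        if module is None:
            if not self.history:
                raise ValueError("no direct context to continue")
            module = self.history[-1][0]  # stay in the direct context
        slots = {name: given.get(name, self._recall(module, name))
                 for name in MODULE_SLOTS[module]}
        self.history.append((module, slots))
        return module, slots

    def _recall(self, module, name):
        # 1) last query of the same module (indirect context of that module)
        for past_module, past_slots in reversed(self.history):
            if past_module == module and name in past_slots:
                return past_slots[name]
        # 2) most recent query of any module
        for _, past_slots in reversed(self.history):
            if name in past_slots:
                return past_slots[name]
        # 3) device information or a default
        return self.gps_city if name == "city" else DEFAULTS[name]


session = Session(gps_city="Palo Alto")
r1 = session.ask("weather")                   # "What's the weather?"
r2 = session.ask(city="San Francisco")        # "And in San Francisco?"
r3 = session.ask("time")                      # "What time is it there?"
r4 = session.ask("weather", day="tomorrow")   # "And the weather tomorrow?"
```

The fourth query returns to the weather module after a detour through the time module and still resolves the city to San Francisco, which is the indirect-context behavior the dialog illustrates.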

의도 학습(IL) = 이상의 설명에 기초하여, 우리는 모듈/행동/화면/작업 흐름의 각각의 튜플에 대한 가중치를 사용해 의도 학습을 어떻게 생성할지를 쉽게 상상할 수있다.

Intention Learning (IL) = Based on the explanation above, we can easily imagine how we would generate intention learning using weights for each tuple of modules/actions/screens/workflows.

o IL은 무엇인가 = 시스템이 사용자의 이력 질의, 실행된 작업 흐름 및 달성된 결과에 기초하여 사용자의 질의의 의미를 이해할 수 있는 것(개인 어시스턴트에 대한 지능적 매크로(Intelligent Macro)와 유사하다)o What is IL = the system can understand the meaning of a user's query based on the user's historical queries, the workflow executed and the results achieved (similar to an Intelligent Macro for a personal assistant)

o 샘플 1 (단순) =o Sample 1 (simple) =

시각 1

time 1

사용자: Louis에게 전화해주실 수 있으세요?

User: Could you please call Louis?

시스템: 연락처 리스트에 많은 Louis가 있습니다. 어느 Louis를 원하세요? 　(3명의 Louis가 있습니다)

System: I have a lot of Louis on my contact list. Which Louis do you want? (There are 3 Louis)

사용자: Monier(= 성)

User: Monier (= last name)

시스템: 알았습니다, Louis Monier에게 전화합니다

System: Okay, call Louis Monier.

시각 2

time 2

사용자: Louis에게 전화해!

User: Call Louis!

시스템: Louis Monier에게 전화합니다, 맞습니까?

System: I'm calling Louis Monier, right?

사용자: 예 ...시스템은 Louis Monier에게 전화합니다.

User: Yes ...The system calls Louis Monier.

User: No ... OK, there are several Louis entries in the contact list ... (the system shows me the same list as in the second step of Time 1)

시각 3:

Time 3:

사용자: 지금 당장 Louis에게 전화해!

User: Call Louis right now!

System: Calling Louis Monier. (The system dials.)

o 샘플 2 (보다 복잡함) =　o Sample 2 (more complex) =

시각 1

time 1

사용자: Louis에게 메시지를 보내

User: Send a message to Louis

시스템 = 메시지를 보내고자 하는 연락처가 누구입니까

System = Who is the contact you want to send the message to

사용자: Louis Monier(또는 Louis + Monier를 필터링, 또는 Monier + Louis를 필터링)

User: Louis Monier (or filter Louis + Monier, or filter Monier + Louis)

System = OK, do you want to send a text or an email?

User = Text

System = OK, what type of content do you want to send? (The user can choose free-form text, voice text, or template text.)

사용자 = 자유 형식 텍스트

user = free-form text

System = Say the message to send.

The user speaks the message.

The system asks the user to confirm the message content and to validate sending.

The user can say yes, change the contact or the message, add more content to the message, or cancel.

시각 2

time 2

사용자: Louis에게 메시지를 보내

User: Send a message to Louis

System = Say the message to send.

The user speaks the message.

o 샘플 3 (보다 복잡함)o Sample 3 (more complex)

시각 N

time N

User = Every Wednesday, when leaving the office at 6 pm, the user calls his wife before heading home (to ask her to get the kids ready for sports).

그것을 위해, 사용자는 아내에게 전화하기 위해 상기 #1과 같은 샘플을 사용할 필요가 있다

For that, the user needs to use a sample like #1 above to call his wife.

The system uses many information sources = BT data, OBD data (or none), time (6 pm), location (not at home), and historical intent (agenda reminders + geofencing, etc.).

시스템 = 사용자가 자신의 자동차에 도착할 때(자동차 BT 연결 또는 OBD 연결에 의해 검출됨) 그리고 x분(사용자가 자동차에 들어가는 데 걸리는 평균 시간) 후에　

System = when the user arrives in his car (detected by the car BT connection or OBD connection) and after x minutes (average time it takes the user to enter the car)

시스템이 자동으로 사용자에게 돌아와서 말한다:

The system automatically returns to the user and says:

System: "Greg, do you want me to start navigation to your home and call your wife?"

사용자: 그래 => Celine Malvoisin에 대한 통화 행동이 시작된다.

User: Yes => Call action for Celine Malvoisin begins.

User: No => the agent does nothing and records a downgrade of the intent learning item.

In one embodiment, IL was created to limit ASR interaction with the user and to minimize the time needed to complete any action the agent has to execute. IL stores the generic workflow execution based on the current context and asks only for the parameters that it cannot find by itself.
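The progression in Sample 1 (ask at Time 1, confirm at Time 2, act directly at Time 3) can be sketched with a weight per tuple. This is a minimal sketch under stated assumptions: the `IntentLearner` class, the tuple layout (module, action, spoken value, resolved value), and the thresholds `CONFIRM_AT` and `AUTO_AT` are hypothetical; the specification mentions weights but gives no numeric values or data layout.

```python
from collections import defaultdict

# Hypothetical thresholds; the specification does not give numeric values.
CONFIRM_AT = 1  # weight at which the agent proposes the learned value
AUTO_AT = 2     # weight at which the agent acts without asking


class IntentLearner:
    """Keeps one weight per (module, action, spoken value, resolved value)."""

    def __init__(self):
        self.weights = defaultdict(int)

    def decide(self, module, action, spoken):
        """Return ("ask" | "confirm" | "execute", resolved value or None)."""
        candidates = {
            key: weight
            for key, weight in self.weights.items()
            if key[:3] == (module, action, spoken)
        }
        if not candidates:
            return ("ask", None)
        best = max(candidates, key=candidates.get)
        weight = candidates[best]
        if weight >= AUTO_AT:
            return ("execute", best[3])
        if weight >= CONFIRM_AT:
            return ("confirm", best[3])
        return ("ask", None)

    def upgrade(self, module, action, spoken, resolved):
        self.weights[(module, action, spoken, resolved)] += 1

    def downgrade(self, module, action, spoken, resolved):
        key = (module, action, spoken, resolved)
        self.weights[key] = max(0, self.weights[key] - 1)


learner = IntentLearner()
time1 = learner.decide("phone", "call", "Louis")           # nothing learned yet
learner.upgrade("phone", "call", "Louis", "Louis Monier")  # user picked Monier
time2 = learner.decide("phone", "call", "Louis")
learner.upgrade("phone", "call", "Louis", "Louis Monier")  # user answered yes
time3 = learner.decide("phone", "call", "Louis")
```

A "no" answer would call `downgrade`, which moves the item back toward the confirmation or question stage.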

I have many other samples of IL in the system, such as one I will deploy next week. I am French, and the English ASR system does not recognize my voice well (because of my French accent). When I want to send you a text in English through the system, I can use Sample 2 and, just before sending the text, ask the system to translate it into English (I have a demo for you if you want): the system translates my French sentence into English and sends it to you. At the same time, the system understands that you speak English and (by default) uses English TTS for any message from you (before validation, you send me a text in English). // It is funny how easily a complex task can be hacked ;p = real-time text translation by voice.

다른 흥미로운 점은 우리가 작업 흐름의 애플리케이션에서의 임의의 장소로부터 임의의 키워드 또는 바로 가기 문장에 우선순위를 부여하기 위해 컨텍스트 또는 의도를 분리할 수 있다는 것이다.Another interesting thing is that we can separate context or intent to give priority to any keyword or shortcut sentence from anywhere in the application of the workflow.

부록 E: 컨텍스트 Appendix E: Context

컨텍스트: 기존의 개인 어시스턴트의 현재 상태Context: the current state of an existing personal assistant

Today, personal assistants have a first level of context, mainly to help them understand the user's sentence and recognize its words correctly. The following sample illustrates how they work:

나는 Renaud에게 전화하고자 한다 => 성

I would like to call Renaud => Last name

나는 Renault로 드라이브 중이다 => 자동차 브랜드

I am driving with Renault => car brand

Relationship and context definitions determine which of [Renaud, Renault] the system should interpret and return to the user. Context is also used in particular cases such as "What's the weather ... and tomorrow" (location as a context variable, although this may be no more than a process with a simple location variable shared between the two steps).

Challenges

The main challenge for personal assistants is to create a true conversational exchange between the user and the agent.

To understand this aspect, we need to understand what qualifies as a "real conversation":

Sustaining the dialog as in any human discussion (rather than just answering questions)

o Ability to ask for information about Yahoo ... who the founder is, how the stock is doing, and the news (the agent remembers the topic)

컨텍스트 대화 정보 메모리: 단기간, 중간 기간 또는 장기간 동안

Contextual conversation information memory : for short, medium or long periods of time

o 논의 흐름에서의 정보를 기억하는 능력o Ability to remember information in the flow of discussion

프로세스 작업 흐름 메모리의 컨텍스트 상태: 단기간, 중간 기간 또는 장기간 동안

Context state of process work flow memory : for short, medium or long term

o 장래에 언제든지 프로세스 또는 작업 흐름을 계속하는 능력을 제공하기 위해 (행동을 야기하기 위해 또는 그렇지 않기 위해) 당신이 프로세스 또는 논의 작업 흐름에서 어디에 있는지(단계)를 기억하는 능력o Ability to remember where you are (steps) in a process or discussion workflow (to cause an action or not) to provide the ability to continue the process or workflow at any time in the future

Beyond that, the language the agent uses in its exchanges with the user needs to evolve. On top of that, the agent needs to convey a perception of empathy.

xBrainSoft에 의한 일반 컨텍스트 관리General context management by xBrainSoft

컨텍스트는, 마지막 통화 동안 설명한 바와 같이, 4개의 컴포넌트들로 구성된다:The context, as described during the last call, consists of four components:

1. 컨텍스트 클라이언트측 홀더(CCSH) 1. Context Client-Side Holder (CCSH)

This first component enables client-side storage, use, and definition (values) of the context workflow on the client side (robot, smartphone, vehicle, home, ...) so that it can be shared with the server side. The CCSH is an Fx (framework) with an API to create, use, and define values of the context workflow on the client side and send them through the CSP described below.

2. 컨텍스트 동기화 프로토콜(CSP) 2. Context Synchronization Protocol (CSP)

This second component defines the protocol (standardization) of key access (context ID) for each attribute (variable) of the state or sub-state of the current context, and validates the format and existence of each key access. Attributes can be simple text variables (name/value) or specific objects with their own instances. The CSP is a communication protocol built from two framework implementations, one on each side of the agent (client/server); it is in charge of validating correct protocol communication between the client and the server and of ensuring that the context information is properly delivered and synchronized.

3. 컨텍스트 에이전트 - 서버측 홀더(CA) 3. Context Agent - Server-Side Holder (CA)

이 제3 컴포넌트는 CSP를 통해 클라이언트측과 공유하기 위해 서버측(온라인 서버)으로부터 컨텍스트 작업흐름의 서버 저장, 사용 및 정의(값)를 가능하게 한다. CA는 서버측으로부터 컨텍스트 작업흐름의 값을 생성, 사용 및 정의하고 이를 이상의 CSP를 통해 송신하는 API를 갖는 Fx이다.This third component enables server storage, use and definition (values) of context workflows from the server side (online server) to share with the client side via CSP. A CA is an Fx with an API that creates, uses, and defines the values of a context workflow from the server side and sends them through the above CSP.

4. 컨텍스트 엔진 4. Context Engine

This last component provides the variable-sharing level and the medium- and long-term sessions in a data store (on any storage medium).

Short-term storage is managed by the current session shared between the client side and the server side.

It can classify the context by type of topic (a variable can be a simple variable or a serialized object plus value(s)):

1. 현재 사용자 프로파일 = 사용자 프로파일에 관한 임의의 정보(페이스북 프로파일, 앱 프로파일, ...)1. Current User Profile = Any information about the user profile (Facebook Profile, App Profile, ...)

2. 현재 모듈 = 모듈(전화, 메시지, 내비게이션, 뉴스, ...)에 관한 임의의 정보2. Current module = any information about the module (phone, messages, navigation, news, ...)

3. 현재 기능 = 기능(전화를 거는 것, 전화를 받는 것, 문자를 보내는 것, 뉴스를 읽는 것, 뉴스를 공유하는 것, ...)에 관한 임의의 정보3. Current function = any information about the function (to make a call, to receive a call, to send a text, to read the news, to share the news, ...)

1. For "call Louis", the resolution to Louis Monier can be loaded from the medium/long-term context engine, which has learned that Louis = Louis Monier.

4. Current screen = any information about the screen currently shown to the user

5. Custom data = an API that lets the developer use the context in any way desired (a new context shape)

6. 작업 흐름 이력 = 보여진 또는 보여줄 화면, 특정의 단계에서의 변수 값, 작업 흐름 상태, ...에 관한 정보를 갖는 사용자의 작업 흐름에서의 위치에 관한 임의의 정보6. Workflow history = any information about the user's position in the workflow with information about the screens shown or to be shown, the values of variables at specific steps, the status of the workflow, ...

1. If I ask to share a news item on Facebook and then say "continue", the agent goes to the next item in the news list for the current category. From the context, the agent knows the current category and the step it had reached in reading the news, so it can return the correct intent for what the user asks.
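The six context types listed above can be pictured as fields of one short-term record, with the medium/long-term context engine supplying learned values such as Louis = Louis Monier. The sketch below is illustrative only: `ContextSnapshot`, `LONG_TERM_STORE`, `load_learned_value`, and the key layout are assumed names, not part of the described platform.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextSnapshot:
    """Short-term context, one field per context type listed above."""
    user_profile: dict = field(default_factory=dict)      # 1. current user profile
    module: Optional[str] = None                          # 2. current module
    function: Optional[str] = None                        # 3. current function
    function_args: dict = field(default_factory=dict)     #    values bound to it
    screen: Optional[str] = None                          # 4. current screen
    custom_data: dict = field(default_factory=dict)       # 5. developer-defined data
    workflow_history: list = field(default_factory=list)  # 6. steps already visited


# Stand-in for the medium/long-term context engine: values learned earlier.
LONG_TERM_STORE = {("phone", "call", "contact", "Louis"): "Louis Monier"}


def load_learned_value(snapshot, slot, spoken):
    """Copy a learned value from long-term storage into the short-term context."""
    key = (snapshot.module, snapshot.function, slot, spoken)
    resolved = LONG_TERM_STORE.get(key, spoken)  # fall back to what was said
    snapshot.function_args[slot] = resolved
    snapshot.workflow_history.append(f"{snapshot.function}:{slot}={resolved}")
    return resolved


snapshot = ContextSnapshot(module="phone", function="call", screen="contacts")
contact = load_learned_value(snapshot, "contact", "Louis")
```

Keying the store on module and function as well as the spoken value keeps a value learned for calls from leaking into, say, the messaging function.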

프로세스process

1. 음성 및 연결 플랫폼은 동기 및 비동기 모드에서 동작하고 있고, 우리는 클라이언트측과 서버측 사이의 컨텍스트의 완벽한 동기화를 언제든지 검증할 필요가 있다.1. The voice and connectivity platform is operating in both synchronous and asynchronous modes, and we need to verify the perfect synchronization of the context between the client side and the server side at any time.

2. 각각의 모듈, 기능, 화면, 애플리케이션, 세션 또는 임의의 상태 등이 클라이언트와 서버 사이에서 공유될 고유 ID(컨텍스트 ID)로 식별될 필요가 있다.2. Each module, function, screen, application, session or any state, etc. needs to be identified with a unique ID (context ID) to be shared between the client and server.

3. 컨텍스트 ID(정보 저장 메모리) 및 그의 값은 에이전트의 양측(클라이언트/서버) 상에 저장되고, 각각의 상호작용에서 양측 사이에서 동기화된다.3. The context ID (information storage memory) and its values are stored on both sides of the agent (client/server), and are synchronized between both sides in each interaction.

4. 컨텍스트 ID는 하기의 것들을 가능하게 한다:4. Context IDs enable:

1. 변수(간단한 변수 또는 객체)의 값에 기초하여 필터 및 컨텍스트와 관련된 행동을 생성하는 것 만약 ... 그러면 ... 그렇게 ... 1. Creating a filter and context-related action based on the value of a variable (simple variable or object) if ... then ... so ...

2. 중기 또는 장기 저장소에서 단기 메모리에 로딩할 필요가 있는 정보를 찾는 것(또는 전세계 사용자 거동/애플리케이션 레벨, 요청된 값에 대한 확률로부터 기계 학습에 의해)2. Finding information that needs to be loaded into short-term memory from medium- or long-term storage (or by machine learning from global user behavior/application level, probabilities for requested values)

3. 작업 흐름에서 우리가 있는 단계를 아는 것(또는 전세계 사용자 거동, 다음 단계에 대한 확률로부터 기계 학습에 의해)3. Knowing which stage we are in the workflow (or by machine learning from global user behavior, probabilities for the next stage)

4. ... 이 혁신으로부터 우리가 발견하는 다른 것4. ...what else we discover from this innovation

그것이 동작하는 방법(수명 주기)How It Works (Life Cycle)

After any ASR and just before the NLU process, the device sends, along with the sentence message, a hidden part carrying the device's current context ID.

The agent looks up the key access (context ID) before running any natural language understanding.

o The agent looks at its content and filters the global language dictionary of actions and understanding for the current context.

The agent starts the NLU process with the context understood.

o An action is launched (API access or knowledge access)

o 에이전트는 사용자의 질의의 의미를 해석한다 ... (이전의 메일을 참조)o Agent interprets the meaning of the user's query... (see previous mail)

Before giving an answer to the device (or to any kind of endpoint),

o The agent sends the new context (module/function/screen) with the answer message in a hidden part (much like the header of an HTML page)

o 새로운 컨텍스트는 많은 변수로부터 정의될 수 있다:o New contexts can be defined from many variables:

종단점 유닛에서의 현재 화면

Current screen on endpoint unit

현재 모듈, 기능

Current module, function

문장, 대화 및 사용자의 선택 작업 흐름.

Sentences, conversations, and user selection workflows.

에이전트는 사용자에게 렌더링하기 위해 디바이스(종단점)로 송신할 답변(음성, 화면, 정보를 갖는 패키지)을 병합한다.

The agent merges the answers (packages with voice, screen, information) to send to the device (endpoint) for rendering to the user.

클라이언트측은 패키지를 실행하고 현재 컨텍스트를 저장한다.

The client side runs the package and saves the current context.

o The context can be forced from any screen, function, or module ...; in the case of the home screen, we force a reset of the context so that the user starts from a clean interaction with the agent.

서버와 클라이언트(종단점) 사이의 컨텍스트 충돌의 경우에, 클라이언트(종단점: 디바이스, 차량, 집)가 마스터인데, 그 이유는 클라이언트가 사용자(실제 마스터)의 행동을 나타내기 때문이다.In the case of a context conflict between a server and a client (endpoint), the client (endpoint: device, vehicle, home) is the master, since the client represents the behavior of the user (the actual master).
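The life cycle described above, including the rule that the client is the master on a context conflict, can be sketched as one request/reply round trip. Everything named here is an assumption for illustration: the message layout with a `context` entry as the hidden part, the `GLOBAL_DICTIONARY` contents, and the functions `reconcile` and `handle_request` are not defined by the specification, and the keyword match stands in for a real NLU step.

```python
# Hypothetical global dictionary: (module, keyword) -> action.
GLOBAL_DICTIONARY = {
    ("news", "continue"): "news.read_next",
    ("news", "share"): "news.share",
    ("music", "continue"): "music.resume",
}


def reconcile(client_context, server_context):
    """On conflict the client (endpoint) is the master, so its values win."""
    merged = dict(server_context)
    merged.update(client_context)
    return merged


def handle_request(request, server_context):
    """Agent side: read the hidden context part first, then interpret."""
    context = reconcile(request["context"], server_context)
    # Filter the global dictionary down to the current module before any NLU.
    vocabulary = {
        keyword: action
        for (module, keyword), action in GLOBAL_DICTIONARY.items()
        if module == context["module"]
    }
    words = request["sentence"].lower().split()
    action = next((vocabulary[w] for w in words if w in vocabulary), None)
    # The reply carries the new context in its own hidden part.
    new_context = dict(context, last_action=action)
    return {"answer": {"action": action}, "context": new_context}


client_context = {"module": "news", "screen": "news_list"}
stale_server_context = {"module": "music", "screen": "player", "session": "s1"}
request = {"sentence": "Continue", "context": client_context}

reply = handle_request(request, stale_server_context)
client_context = reply["context"]  # the client stores the context it received
```

Here the server still believes the user is in the music module, but the client's context wins, so "Continue" resolves to the next news item rather than resuming music.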

Usage samples:

Contextualize which Louis to select when the user says: "I want to call Louis" (based on the call history for Louis) => call Louis Monier

실행할 프로세스를 컨텍스트화한다: Louis에게 메시지를 보낸다　

Contextualize the process to run: send a message to Louis

o 시스템은 메시지 = 이메일, Louis = Louis Monier라는 것을 알고 있다.o The system knows that message = email, Louis = Louis Monier.

o 음성 바로 가기 ...를 가능하게 하고 Louis Monier에게 이메일을 보내기 위해 작업 흐름에서의 2개의 단계를 잘라낸다.o Enable voice shortcuts... and cut two steps in the workflow to email Louis Monier.

Contextualize the next step to execute: in many sessions, I ask for the news in the order environment, politics, and sports. The next time I ask for environment news, the agent will offer to read the politics and sports news.

Contextualize the next step according to the application-wide predictive workflow.

Contextualize a requested action, understand that it does not target the current context, and apply it to the previous action.

o I am reading a list of news, I ask for the weather, then I say "continue", and the agent goes to the next news item.

Contextualize specific words such as "music" ...: asked in the news context, it could mean music news or the music on your phone.

o Within the music context, it clearly means accessing the music tracks on the device.

o In the news context, it could mean playing music news; the agent understands this and comes back to the user to ask for more precision.

o If the user says "play music" in the news context, the agent understands that the user does not want to read the news.

Because we know the current context, we can contextualize any speech recognition input and change words in the sentence before trying to understand its meaning ... or, conversely, extend the vocabulary available in a specific context to start any action.

o A second effect is that we do not need to create many patterns to validate an action (e.g., "music" can be caught in any sentence, short or long, in the context of the root screen to launch the action of playing music)

o 제3 효과는 번역에 대한 것인데, 그 이유는 당신이 각각의 컨텍스트 모듈/기능/화면에 대해 사용자에 의해 의도된 행동을 포착하는 키워드를 제한할 수 있기 때문이다.o A third effect is for translation, because for each context module/function/screen you can limit keywords that capture the action intended by the user.

"Play" in the context of TV means playing a game or playing a TV show

"Play" in the context of sports means playing a new game

"Play" in the context of a discotheque means playing music

... one word, many intentions depending on the context ... easy to translate into any language

o A fourth effect is support for any agent, because the dictionary can be very limited.

뉴스캐스터의 경우에, 우리는 "뉴스"(+ 동의어) 및 뉴스 토픽 엔티티를 포착한다.

In the case of a newscaster, we capture "news" (+ synonyms) and news topic entities.
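The "one word, many intentions" idea can be sketched as a small per-context keyword table. The table contents, the intent names, and the function `candidate_intents` below are hypothetical and chosen only to mirror the "play" examples in the text.

```python
# Hypothetical mapping: context -> keyword -> candidate intents.
INTENTS_BY_CONTEXT = {
    "tv": {"play": ["play_game", "play_tv_show"]},
    "sports": {"play": ["play_new_game"]},
    "discotheque": {"play": ["play_music"]},
}


def candidate_intents(context, sentence):
    """Return the intents that the first known keyword maps to in this context."""
    keywords = INTENTS_BY_CONTEXT.get(context, {})
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in keywords:
            return keywords[word]
    return []  # no keyword of this context was spoken


disco = candidate_intents("discotheque", "Please play something!")
tv = candidate_intents("tv", "Play")
```

Because each context holds only a handful of keywords, translating the table into another language means translating those few keywords rather than a full set of sentence patterns. When a context yields more than one candidate, as in the TV case, the agent would still have to ask the user which one is meant.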

작업 우선순위의 파이프라인의 생성

Creating a pipeline of task priorities

o I am currently composing a message for a contact (generally, I want to finish the action)

o During this time I receive a text from a contact; the system looks at the current context, knows that the user is in the middle of composing a message, and does not need to interrupt the current action

o The agent creates a pipeline of messages and, at the end of the message-composition context (when the context changes), suggests that I read the message

컨텍스트에 따라 임의의 메시지의 번역

Translation of arbitrary messages according to context

o I write a message to Mark (he speaks English and I write the message in French); based on the context of the message, the system knows that it needs to check whether it knows the recipient's language before sending, in order to translate the message.
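The task-priority pipeline in the usage samples above can be sketched as a queue that holds incoming messages while the user is composing one and releases them when the context changes. The class `MessagePipeline`, the context names `home` and `compose_message`, and the prompt wording are assumptions made for this sketch.

```python
from collections import deque


class MessagePipeline:
    """Defers incoming messages while the user is composing one."""

    def __init__(self):
        self.context = "home"
        self.pending = deque()

    def on_incoming(self, message):
        """Queue the message during composition; otherwise offer it at once."""
        if self.context == "compose_message":
            self.pending.append(message)
            return None
        return f"Read message from {message['sender']}?"

    def set_context(self, new_context):
        """Switch context and return the offers that were held back."""
        leaving_compose = (
            self.context == "compose_message" and new_context != self.context
        )
        self.context = new_context
        offers = []
        if leaving_compose:
            while self.pending:
                sender = self.pending.popleft()["sender"]
                offers.append(f"Read message from {sender}?")
        return offers


pipeline = MessagePipeline()
pipeline.set_context("compose_message")
deferred = pipeline.on_incoming({"sender": "Mark", "text": "Hi"})
offers = pipeline.set_context("home")
```

The queue preserves arrival order, so several messages received during composition are offered in the order in which they came in.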

The context workflow is the state of the context matrix (module, function, screen) in the process workflow from the start to the end of a user session. We have built a system that allows a computer to generate intuition from collective intelligence (numerical intuition generation) based on intent learning.

전술한 것에 관해 몇가지를 살펴보면:A few things to note about the above:

설명된 바와 같이, 우리는 동기 및 비동기 모드에서 작업하고 있다.

As explained, we are working in synchronous and asynchronous modes.

o These two paths are used to enable proactivity, and more, for the asynchronous mode.

o They let both sides know the state on each side of the dialog.

Additions to the life cycle:

o For the first point: the context can also be sent during application navigation (tactile interactions), not only from ASR.

o For the fifth point: the package can be sent with all or part of its content

We can send all elements without integrating voice; in this case, the agent manages the entire rendering and the creation/editing of the context.

Claims

A computer-implemented method comprising:
detecting an event;
in response to detecting the event, proactively initiating a conversation of a voice assistant on a first user device with a user;
in response to initiating a conversation with the user, receiving, at the first user device, a first audio input associated with the conversation from the user requesting a first action;
performing automatic speech recognition on the first audio input;
determining, at the first user device, a first context of the user;
determining a first tuple describing a user intent, the first tuple comprising the first action and an actor associated with the first action, the first tuple being determined by performing natural language understanding based on the automatic speech recognition of the first audio input;
initiating the first action on the first user device based on the first tuple;
after initiating the first action, receiving a second audio input from the user requesting a second action unrelated to the first action;
initiating the second action;
after initiating the second action, receiving, at a second user device different from the first user device, a third audio input from the user continuing the conversation and requesting a third action related to the first action, wherein the third audio input is missing information for completing a third tuple for initiating the third action;
obtaining missing information using the first context to complete the third tuple associated with the third action; and
initiating the third action at the second user device based on the third tuple.

The method of claim 1, wherein the event is an internal event.

The method of claim 1, further comprising initiating the voice assistant without user input and receiving the first audio input from the user after initiation of the voice assistant.

The method of claim 1, wherein the first context comprises one or more of a context history, a conversation history, a user profile, a user history, a location, and a current context domain.

The method of claim 1, wherein the missing information is one or more of the third action, an actor associated with the third action, and an entity associated with the third action.

The method of claim 1, further comprising:
determining that the first context and the first audio input are missing first information used to initiate the first action;
determining which information is the missing first information; and
prompting the user to provide an audio input providing the missing first information.

The method of claim 1, further comprising:
determining that first information used to initiate the first action cannot be obtained from the first audio input;
determining which information is the missing first information; and
prompting the user to provide an audio input providing the missing first information that cannot be obtained from the first audio input.

The method of claim 1, further comprising:
determining that first information used to initiate the first action cannot be obtained from the first audio input;
determining which information is the missing first information;
providing a plurality of options for selection by the user, the options providing potential information for completing the first action; and
receiving an audio input selecting a first option from the plurality of options.

The method of claim 1, wherein the second action not associated with the first action is associated with a second context, and wherein the first action and the third action are associated with the first context.

A system comprising:
one or more processors; and
memory storing instructions,
wherein the instructions, when executed by the one or more processors, cause the system to perform steps comprising:
detecting an event;
in response to detecting the event, proactively initiating a conversation of a voice assistant on a first user device with a user;
in response to initiating the conversation with the user, receiving, at the first user device, a first audio input associated with the conversation from the user requesting a first action;
performing automatic speech recognition on the first audio input;
determining, at the first user device, a first context of the user;
determining a first tuple describing a user intent, the first tuple comprising the first action and an actor associated with the first action, the first tuple being determined by performing natural language understanding based on the automatic speech recognition of the first audio input;
initiating the first action on the first user device based on the first tuple;
after initiating the first action, receiving a second audio input from the user requesting a second action unrelated to the first action;
initiating the second action;
after initiating the second action, receiving, at a second user device different from the first user device, a third audio input from the user continuing the conversation and requesting a third action related to the first action, wherein the third audio input is missing information for completing a third tuple for initiating the third action;
obtaining the missing information using the first context to complete the third tuple associated with the third action; and
initiating the third action on the second user device based on the third tuple.

The system of claim 10, wherein the event is an internal event.

The system of claim 10, wherein the instructions, when executed by the one or more processors, cause the system to:
initiate the voice assistant without user input and receive the first audio input from the user after initiation of the voice assistant.

The system of claim 10, wherein the first context comprises one or more of a context history, a conversation history, a user profile, a user history, a location, and a current context domain.

The system of claim 10, wherein the missing information is one or more of the third action, an actor associated with the third action, and an entity associated with the third action.

The system of claim 10, wherein the instructions, when executed by the one or more processors, cause the system to:
determine that the first context and the first audio input are missing first information used to initiate the first action;
determine which information is the missing first information; and
prompt the user to provide an audio input providing the missing first information.

The system of claim 10, wherein the instructions, when executed by the one or more processors, cause the system to:
determine that first information used to initiate the first action cannot be obtained from the first audio input;
determine which information is the missing first information; and
prompt the user to provide an audio input providing the missing first information that cannot be obtained from the first audio input.

The system of claim 10, wherein the instructions, when executed by the one or more processors, cause the system to:
determine that first information used to initiate the first action cannot be obtained from the first audio input;
determine which information is the missing first information;
provide a plurality of options for selection by the user, the options providing potential information for completing the first action; and
receive an audio input selecting a first option from the plurality of options.

The system of claim 10, wherein the second action not associated with the first action is associated with a second context, and wherein the first action and the third action are associated with the first context.

delete