CN113748458A - Hub device, multi-device system including hub device and plurality of devices, and operating method thereof - Google Patents


Info

Publication number
CN113748458A
Authority
CN
China
Prior art keywords
text
determination model
function determination
information
hub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080031822.XA
Other languages
Chinese (zh)
Inventor
李连浩
朴相昱
吕国珍
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority claimed from PCT/KR2020/005704 (published as WO2020222539A1)
Publication of CN113748458A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/04 - Segmentation; Word boundary detection
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 2015/081 - Search algorithms, e.g. Baum-Welch or Viterbi
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G10L 2015/226 - Procedures used during a speech recognition process using non-speech characteristics
    • G10L 2015/227 - Procedures used during a speech recognition process using non-speech characteristics of the speaker; Human-factor methodology
    • G10L 15/28 - Constructional details of speech recognition systems
    • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Abstract

A hub device, a multi-device system including the hub device, and methods of operating the same may include: converting, by the hub device, a received voice input into text; identifying, by the hub device, a device capable of performing an operation corresponding to the text; identifying, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the device capable of performing the operation; and based on the identified device storing the function determination model being a different device from the hub device, sending at least a portion of the text to the identified device, wherein the hub device includes a hardware processor.

Description

Hub device, multi-device system including hub device and plurality of devices, and operating method thereof
Technical Field
The present disclosure relates to a hub device, a multi-device system comprising a hub device and a method of operating the hub device. For example, according to embodiments, a hub device may determine, from among the hub device itself and a plurality of other electronic devices, an operation performing device (e.g., an internet of things (IoT) device) for performing an operation according to a user's intent (e.g., included in voice input received from the user) in a multi-device environment. According to an embodiment, the hub device may control the determined operation performing device.
Background
With the development of multimedia technology and network technology, users can receive various services by using devices such as Internet of Things (IoT) devices, for example an air purifier or a television (TV). In particular, with the development of voice recognition technology such as virtual personal assistant technology, a user may input voice (e.g., an utterance) to a device (e.g., a listening device) and may receive a response message to the voice input through a service providing agent (e.g., a virtual assistant).
However, in a multi-device system such as a home network environment including a plurality of IoT devices, when a user wants to receive a service through an IoT device that has not yet been registered for interaction through voice input or the like, the user has to go through an inconvenient registration process (e.g., including selecting the IoT device that is to provide the service). In particular, because the types of services provided by the plurality of IoT devices differ, a technology is required that can recognize the intention included in a user's voice input and efficiently provide the corresponding service.
In order to recognize the intention included in the user's voice input, Artificial Intelligence (AI) technology may be used, as may rule-based Natural Language Understanding (NLU) technology. When a user's voice input is received through the hub device, however, the hub device may not be able to directly select a device for providing the service according to the voice input and may have to control the device through a separate voice assistant service providing server. As a result, the user may have to pay a network usage fee, and response speed is reduced because the voice assistant service providing server is involved.
Disclosure of Invention
Solution to the problem
The present disclosure relates to a hub device, a multi-device system including the hub device and a plurality of devices, and an operating method thereof, and more particularly, to a hub device that receives a user's voice input, automatically determines, based on the received voice input, a device for performing an operation according to the user's intention, and provides the pieces of information required for the determined device to perform the service.
Additional aspects will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the embodiments presented in this disclosure.
Drawings
The above and other aspects, features and advantages of certain embodiments of the present disclosure will become more apparent from the following description when taken in conjunction with the accompanying drawings, in which like reference characters identify like structural elements, and wherein:
FIG. 1 is a block diagram illustrating some elements of a multi-device system including a hub device, a voice assistant server, an internet of things (IoT) server, and a plurality of devices, according to an embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating elements of a hub device according to an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating elements of a voice assistant server according to an embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating elements of an IoT server according to an embodiment of the present disclosure;
FIG. 5 is a block diagram illustrating some elements of a plurality of devices according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of a method, performed by a hub device, of controlling devices based on a voice input, according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of a method, performed by a hub device, of providing at least a portion of text to one of the hub device, the voice assistant server, and the operation performing device according to a voice input of a user, according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of a method of operating a hub device, a voice assistant server, an IoT server, and an operation performing device, according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a method of operating a hub device and an operation performing device according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of a method of operating a hub device and an operation performing device according to an embodiment of the present disclosure;
FIG. 11 is a flowchart of a method of operating a hub device, a voice assistant server, an IoT server, a third-party IoT server, and a third-party device, according to an embodiment of the present disclosure;
FIG. 12A is a conceptual diagram illustrating the operation of a hub device and a plurality of devices according to an embodiment of the present disclosure;
FIG. 12B is a conceptual diagram illustrating the operation of a hub device and a plurality of devices according to an embodiment of the present disclosure;
FIG. 13 is a flowchart illustrating a method in which a hub device determines an operation performing device based on a voice signal received from a listening device and transmits text to a device storing a function determination model corresponding to the operation performing device, according to an embodiment of the present disclosure;
FIG. 14 is a flowchart illustrating a method in which a hub device transmits text to a device storing a function determination model corresponding to an operation performing device, according to an embodiment of the present disclosure;
FIG. 15 is a flowchart illustrating a method of operating a hub device, a voice assistant server, and a listening device, according to an embodiment of the present disclosure;
FIG. 16 is a flowchart illustrating a method of operating a hub device, a voice assistant server, a listening device, and an operation performing device, according to an embodiment of the present disclosure;
FIG. 17 is a diagram illustrating an example in which an operation performing device updates a function determination model, according to an embodiment of the present disclosure;
FIG. 18 is a flowchart illustrating a method of operating a hub device, a voice assistant server, an IoT server, and an operation performing device, according to an embodiment of the present disclosure;
FIG. 19 is a flowchart illustrating a method of operating a hub device, a voice assistant server, an IoT server, and a new device, according to an embodiment of the present disclosure;
FIG. 20 is a diagram illustrating a multi-device system environment including a hub device, a voice assistant server, and a plurality of devices; and
FIGS. 21A and 21B are diagrams illustrating a voice assistant model that may be executed by a hub device and a voice assistant server according to embodiments of the present disclosure.
Best mode for carrying out the invention
According to an embodiment of the present disclosure, a method performed by a hub device for controlling a device based on a voice input includes: receiving a voice input of a user; converting the received speech input into text by performing Automatic Speech Recognition (ASR); determining an operation execution device based on the text by using a device determination model; identifying a device storing a function determination model corresponding to the determined operation performing device from among a plurality of devices connected to the hub device; and providing at least a portion of the text to the identified device.
The device determination model may include a first Natural Language Understanding (NLU) model configured to analyze the text and determine the operation performing device based on an analysis result of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to an operation to be performed by the determined operation performing device based on a result of the analysis of the at least a part of the text.
The method may further include obtaining, from at least one device among the plurality of devices that stores a function determination model, information about the function determination model stored in that device, wherein a function determination model is used to determine a function related to each of the plurality of devices.
The identification of the device may include: based on the obtained information on the function determination model, a device storing the function determination model corresponding to the determined operation execution device is identified.
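The steps above (ASR, device determination, model-location lookup, dispatch) can be sketched in a short Python illustration. All names here (`HubDevice`, `DeviceDeterminationModel`, `model_registry`) and the keyword-based intent logic are hypothetical stand-ins for the disclosed components, not part of the disclosure; a real device determination model would be a trained first NLU model.

```python
class DeviceDeterminationModel:
    """Stand-in for the first NLU model: maps text to an operation performing device."""

    def determine_device(self, text):
        # A real model would analyze intent; a keyword lookup illustrates the shape.
        if "air purifier" in text:
            return "air_purifier"
        if "tv" in text:
            return "tv"
        return None


class HubDevice:
    def __init__(self, model_registry):
        # model_registry: device id -> id of the device that stores
        # that device's function determination model ("hub" for the hub itself).
        self.device_model = DeviceDeterminationModel()
        self.model_registry = model_registry

    def run_asr(self, voice_input):
        # Placeholder for Automatic Speech Recognition (ASR).
        return voice_input.lower()

    def handle_voice_input(self, voice_input):
        text = self.run_asr(voice_input)
        target = self.device_model.determine_device(text)
        if target is None:
            return None
        # Identify which connected device stores the function determination model,
        # then provide at least a portion of the text to that device.
        model_holder = self.model_registry.get(target)
        return (model_holder, text)
```

For example, with a registry stating that the TV stores its own function determination model while the air purifier's model lives on the hub, `handle_voice_input("Turn on the TV")` routes the text to the TV itself.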
According to another embodiment of the present disclosure, a hub device for controlling a device based on a voice input includes: a communication interface configured to perform data communication with at least one of a plurality of devices, a voice assistant server, and an internet of things (IoT) server; a microphone configured to receive a voice input of a user; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of the program stored in the memory, wherein the processor is further configured to execute the one or more instructions to convert a voice input received through the microphone into text by performing Automatic Speech Recognition (ASR), determine an operation performing device from among the plurality of devices based on the text by using a device determination model, identify, by using a function determination device determination module, a device storing a function determination model corresponding to the determined operation performing device, and control the communication interface to provide at least a portion of the text to the identified device.
The device determination model may include a first Natural Language Understanding (NLU) model configured to analyze the text and determine the operation performing device based on an analysis result of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to an operation to be performed by the determined operation performing device based on a result of the analysis of the at least a part of the text.
The processor may be further configured to execute the one or more instructions to control the communication interface to obtain, from at least one device of the plurality of devices that stores a function determination model for determining a function associated with each of the plurality of devices, information about the function determination model stored in the at least one device.
The processor may be further configured to execute the one or more instructions to identify a device storing the function determination model corresponding to the determined operation performing device based on the obtained information on the function determination model.
According to another embodiment of the present disclosure, a method of operating a system including a hub device and a first device storing a function determination model, the method includes: receiving, by the hub device, a voice input of a user; converting the received speech input to text by performing Automatic Speech Recognition (ASR) using data about an ASR module stored in a memory of the hub device; determining a first device as an operation execution device based on the text by using data on a device determination model stored in the memory of the hub device; obtaining, by the hub device from the first device, information about the function determination model stored in the first device; and transmitting, by the hub device, at least a portion of the text to the first device based on the obtained information about the function determination model.
The device determination model may include a first Natural Language Understanding (NLU) model configured to analyze text and determine a first device as an operation execution device from among the plurality of devices based on a result of the analysis of the text.
The function determination model may include a second NLU model configured to analyze at least a portion of the text received from the hub device and obtain operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The method may further comprise: analyzing, by the first device, at least a portion of the text by using the second NLU model of the function determination model, and obtaining operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The method may further comprise: generating, by the first device, a control command for controlling an operation of the first device based on the operation information; and performing, by the first device, the operation based on the control command.
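The device-side steps above (analyze the received text with the second NLU model, obtain operation information, generate a control command, perform the operation) can be sketched as follows. The class names, the shape of the operation-information dictionary, and the lookup-based NLU logic are assumptions for illustration only; the disclosure specifies only that the first device's function determination model includes a second NLU model.

```python
class FunctionDeterminationModel:
    """Stand-in for the second NLU model stored on the first device."""

    def get_operation_info(self, text):
        # A real model would extract intent and slots; a lookup illustrates the shape.
        if "turn on" in text:
            return {"function": "power", "value": "on"}
        if "turn off" in text:
            return {"function": "power", "value": "off"}
        return None


class FirstDevice:
    def __init__(self):
        self.function_model = FunctionDeterminationModel()
        self.state = {}

    def on_text_received(self, text):
        # Analyze at least a portion of the text received from the hub device.
        info = self.function_model.get_operation_info(text)
        if info is None:
            return False
        # Generate a control command from the operation information and execute it.
        command = (info["function"], info["value"])
        self.execute(command)
        return True

    def execute(self, command):
        function, value = command
        self.state[function] = value
```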
According to another embodiment of the present disclosure, a multi-device system includes a hub device and a first device storing a function determination model, wherein the hub device includes: a communication interface configured to perform data communication with the first device; a microphone configured to receive a voice input of a user; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of the program stored in the memory, wherein the processor is further configured to execute the one or more instructions to convert a voice input received through the microphone into text by performing Automatic Speech Recognition (ASR), determine the first device as an operation performing device based on the text by using a device determination model, control the communication interface to obtain information about the function determination model stored in the first device from the first device, and control the communication interface to transmit at least a part of the text to the first device based on the obtained information about the function determination model.
The device determination model may include a first Natural Language Understanding (NLU) model configured to analyze text and determine a first device as an operation execution device from among the plurality of devices based on a result of the analysis of the text.
The first device may include a communication interface configured to receive at least a portion of the text from the hub device, wherein the function determination model includes a second NLU model configured to analyze the at least a portion of the received text and obtain operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The first device may further include a processor configured to analyze at least a portion of the text by using the second NLU model, and obtain operation information to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The processor of the first device may be further configured to control at least one element of the first device to generate a control command for controlling an operation of the first device based on the operation information, and to perform the operation based on the control command.
According to another embodiment of the present disclosure, a method of controlling a device performed by a hub device includes: receiving a voice signal from a listening device; converting the received speech signal into text by performing Automatic Speech Recognition (ASR); analyzing the text by using a first Natural Language Understanding (NLU) model, and determining an operation performing device corresponding to the analyzed text by using a device determination model; identifying a device storing a function determination model corresponding to the determined operation performing device from among the determined operation performing device and the listening device; and providing at least a portion of the text to the identified device.
The device determination model may include a first NLU model configured to analyze text and determine an operation execution device based on an analysis result of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to the operation to be performed by the determined operation performing device based on a result of the analysis of the at least a part of the text.
The method may further include determining whether the determined operation performing device is the same as the listening device.
Based on determining that the operation performing device is the same as the listening device, identifying the device storing the function determination model may include: obtaining function determination model information regarding whether the listening device stores the function determination model in an internal memory; and determining the listening device as the device storing the function determination model based on the obtained function determination model information.
Based on determining that the operation performing device is a different device than the listening device, identifying the device storing the function determination model may include: obtaining function determination model information as to whether the determined operation performing device stores the function determination model in the internal memory; and determining whether the operation execution apparatus is an apparatus storing the function determination model based on the obtained function determination model information.
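The branching identification step described above can be sketched as a small helper. The function name and the layout of the `model_info` mapping (device id to a flag indicating whether that device stores the function determination model in its internal memory) are assumptions for illustration.

```python
def identify_model_holder(operation_device, listening_device, model_info):
    """Return the device storing the function determination model, or None.

    model_info: device id -> True if the device stores the model internally.
    """
    if operation_device == listening_device:
        # Same device: check the listening device's own internal memory.
        return listening_device if model_info.get(listening_device) else None
    # Different devices: check whether the operation performing device
    # stores the function determination model itself.
    if model_info.get(operation_device):
        return operation_device
    return None
```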
The method may further comprise: receiving update data for the device determination model from the voice assistant server; and updating the device determination model by using the received update data.
The update data may include data for updating the device determination model based on update information of the function determination model included in at least one of the operation performing device or the listening device to determine an updated function from the text and to determine an operation performing device corresponding to the updated function.
The method may further comprise: receiving device information of a new device from a voice assistant server, wherein the device information of the new device includes at least one of device identification information of the new device, storage information of a device determination model, and storage information of a function determination model; and updating the device determination model by adding the new device to device candidates that can be determined by the device determination model as the operation performing device, using the received device information of the new device.
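The update step above, in which a new device announced by the voice assistant server is added to the candidates the device determination model may select as an operation performing device, can be sketched as follows. The candidate-list representation and the field names in `device_info` are assumptions; the disclosure only specifies that the device information may include identification information and storage information for the two model types.

```python
class DeviceCandidates:
    """Stand-in for the candidate set maintained by the device determination model."""

    def __init__(self):
        self.candidates = {}

    def add_new_device(self, device_info):
        # device_info may carry device identification information plus storage
        # information for the device determination and function determination models.
        device_id = device_info["id"]
        self.candidates[device_id] = {
            "stores_function_model": device_info.get("stores_function_model", False),
        }

    def is_candidate(self, device_id):
        return device_id in self.candidates
```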
According to another embodiment of the present disclosure, a hub device for controlling a device includes: a communication interface configured to perform data communication with at least one of a voice assistant server and a plurality of devices including a listening device; a voice signal receiver configured to receive a voice signal from the listening device; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of the program stored in the memory, wherein the processor is further configured to convert the received speech signal into text by performing Automatic Speech Recognition (ASR), analyze the text by using a first Natural Language Understanding (NLU) model, and determine an operation performing device corresponding to the analyzed text by using a device determination model, recognize a device storing a function determination model corresponding to the determined operation performing device from the determined operation performing device and the listening device, and transmit at least a portion of the text to the recognized device by using the communication interface.
The device determination model may include a first NLU model configured to analyze text and determine an operation execution device based on an analysis result of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to the operation to be performed by the determined operation performing device based on a result of the analysis of the at least a part of the text.
The processor may be further configured to determine whether the determined operation performing device is the same as the listening device.
The processor may be further configured to, based on determining that the operation performing device is the same as the listening device, obtain function determination model information as to whether the listening device stores the function determination model in an internal memory, and determine the listening device as the device storing the function determination model based on the obtained function determination model information.
Based on the determination that the operation performing device is a different device from the listening device, the processor may be further configured to obtain function determination model information regarding whether the determined operation performing device stores the function determination model in the internal memory, and determine whether the operation performing device is a device storing the function determination model based on the obtained function determination model information.
The processor may be further configured to receive update data for the device determination model from the voice assistant server by using the communication interface, and update the device determination model by using the received update data.
The update data may include data for updating the device determination model, based on update information of the function determination model included in at least one of the operation performing device and the listening device, to determine an updated function from the text and to determine an operation performing device corresponding to the updated function.
The processor may be further configured to update the device determination model by receiving device information of the new device including at least one of device identification information of the new device, storage information of the device determination model, and storage information of the function determination model from the voice assistant server using the communication interface, and by adding the new device to a device candidate that can be determined as an operation execution device by the device determination model using the received device information of the new device.
According to an embodiment of the present disclosure, a method may include, based on a user's voice input received by a hub device: converting, by the hub device, the received voice input into text by performing Automatic Speech Recognition (ASR); identifying, by the hub device, a device capable of performing an operation corresponding to the text; identifying, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the device capable of performing the operation; and based on the identified device storing the function determination model being a different device from the hub device, sending at least a portion of the text to the identified device, wherein the hub device includes a hardware processor.
The method may further include analyzing the text using a first Natural Language Understanding (NLU) model and determining the device capable of performing the operation based on a result of the analysis of the text, wherein the first NLU model is included in a device determination model.
The method may further include analyzing at least a portion of the text using a second Natural Language Understanding (NLU) model included in the function determination model, and obtaining operation information related to the operation corresponding to the text based on a result of the analysis of the at least a portion of the text.
The method may further include obtaining, from at least one device storing the function determination model, information about the function determination model stored in the at least one device.
Identifying which device stores the function determination model may include: based on the obtained information about the function determination model, identifying a device storing the function determination model corresponding to the identified device capable of performing the operation.
According to an embodiment, a hub device for controlling a device based on a voice input may include: a communication interface configured to perform data communication with at least one of a plurality of devices, a voice assistant server, and an internet of things (IoT) server; a microphone configured to receive a voice input of a user; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of the program stored in the memory to: convert a voice input received through the microphone into text by performing Automatic Speech Recognition (ASR); identify a device capable of performing an operation corresponding to the text; identify, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the device capable of performing the operation; and based on the identified device storing the function determination model being a different device from the hub device, control the communication interface to send at least a portion of the text to the identified device storing the function determination model, wherein the hub device includes a hardware processor.
The processor may be further configured to identify a device capable of performing an operation corresponding to the text using a device determination model including a first Natural Language Understanding (NLU) model configured to analyze the text, and determine the device capable of performing the operation corresponding to the text based on a result of the analysis of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to an operation to be performed by a device capable of performing an operation corresponding to the text based on a result of the analysis of the at least a part of the text.
The processor may be further configured to execute the one or more instructions to control the communication interface to obtain, from at least one device of the plurality of devices that stores a function determination model for determining a function associated with each of the plurality of devices, information about the function determination model stored in the at least one device.
The processor may be further configured to execute the one or more instructions to identify which device stores the function determination model corresponding to the device capable of performing the operation corresponding to the text based on the obtained information about the function determination model.
According to an embodiment, a method of operating a system, the system comprising a hub device and a first device storing a function determination model, the method comprising: based on the user's voice input received by the hub device: converting, by the hub device, the received voice input into text by performing Automatic Speech Recognition (ASR) using data about an ASR module stored in a memory of the hub device; identifying, by the hub device, the first device as a device capable of performing an operation corresponding to the text by using the data stored in the memory of the hub device; obtaining, by the hub device from the first device, information about the function determination model stored in the first device; and transmitting, by the hub device, at least a portion of the text to the first device based on the obtained information about the function determination model.
The method may further comprise: the text is analyzed using a first Natural Language Understanding (NLU) model, and a first device is determined from among the plurality of devices as a device capable of performing an operation corresponding to the text based on a result of the analysis of the text.
The method may further include performing an analysis using the function determination model, including: analyzing at least a portion of the text received from the hub device using a second Natural Language Understanding (NLU) model included in the function determination model, and obtaining operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The method may further comprise: analyzing, by the first device, at least a portion of the text by using the second NLU model of the function determination model, and obtaining operation information related to the operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The method may further comprise: generating, by the first device, a control command for controlling an operation of the first device based on the operation information; and performing, by the first device, an operation based on the control command.
According to an embodiment, a multi-device system may include a hub device and a first device storing a function determination model, wherein the hub device includes: a communication interface configured to perform data communication with a first device storing a function determination model; a microphone configured to receive a voice input of a user; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of a program stored in the memory to: the method includes converting a voice input received through a microphone into text by performing Automatic Speech Recognition (ASR), recognizing a first device as a device capable of performing an operation corresponding to the text, controlling a communication interface to obtain information about a function determination model stored in the first device from the first device, and controlling the communication interface to transmit at least a portion of the text to the first device based on the obtained information about the function determination model, wherein a hub device includes a hardware processor.
The processor may be further configured to analyze the text by using a device determination model including a first Natural Language Understanding (NLU) model, identify the first device as a device capable of performing an operation corresponding to the text, and determine a first device among the plurality of devices as a device capable of performing the operation corresponding to the text based on a result of the analysis of the text.
The first device may include a communication interface configured to receive at least a portion of the text from the hub device, wherein the function determination model includes a second NLU model configured to analyze the at least a portion of the received text and obtain operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The first device may further include a processor configured to analyze at least a portion of the text by using the second NLU model, and obtain operation information related to an operation to be performed by the first device based on a result of the analysis of the at least a portion of the text.
The processor of the first device may be further configured to control at least one element of the first device to generate a control command for controlling an operation of the first device based on the operation information, and to perform the operation based on the control command.
According to an embodiment, a method may include: based on receiving the voice signal from the listening device by the hub device: converting, by the hub device, the received speech signal into text by performing Automatic Speech Recognition (ASR); analyzing a text by using a first Natural Language Understanding (NLU) model, and identifying a device capable of performing an operation corresponding to the analyzed text by using a device determination model; identifying which device stores a function determination model corresponding to a device capable of performing the operation from among the device capable of performing the operation and the listening device; and sending at least a portion of the text to the identified device.
The method may further include analyzing the text using a first NLU model included in the device determination model, and determining a device capable of performing an operation based on a result of the analysis of the text.
The method may further include analyzing at least a portion of the text using a second Natural Language Understanding (NLU) model included in the function determination model, and obtaining operation information related to an operation to be performed by the device capable of performing the operation based on a result of the analysis of the at least a portion of the text.
The method may also include determining whether the device capable of performing the operation is the same as the listening device.
The method may further include identifying a device storing the function determination model based on determining that the device capable of performing the operation is the same as the listening device, including: obtaining function determination model information regarding whether the listening device stores the function determination model in the internal memory; and determining the listening device as a device storing the function determination model based on the obtained function determination model information.
The method may further include identifying a device storing the function determination model based on determining that the device capable of performing the operation is a different device than the listening device, including: obtaining function determination model information as to whether a device capable of performing an operation stores a function determination model in an internal memory; and determining whether the device capable of performing the operation is a device storing the function determination model based on the obtained function determination model information.
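The decision flow in the two identification steps above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation; the function names and the `stores_model` helper are assumptions.

```python
# Hedged sketch of the routing decision described above: compare the
# operation-performing device with the listening device, then check whether
# the relevant candidate stores the function determination model in its
# internal memory. All names here are illustrative assumptions.

def identify_model_device(exec_device, listening_device, stores_model):
    """
    exec_device / listening_device: device identifiers.
    stores_model(device_id) -> bool: assumed helper reporting whether that
    device holds the function determination model in its internal memory.
    Returns the device that should receive at least a portion of the text,
    or None if the candidate does not store the model.
    """
    if exec_device == listening_device:
        # Operation-performing device IS the listening device: check it.
        if stores_model(listening_device):
            return listening_device
        return None
    # Different devices: check the operation-performing device instead.
    if stores_model(exec_device):
        return exec_device
    return None
```

Under this sketch, a hub would fall back to another storage location (e.g., the voice assistant server) whenever `None` is returned.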
The method may further comprise: receiving update data for the device determination model from the voice assistant server; and updating the device determination model by using the received update data.
The update data may include data for updating the device determination model to determine an updated function from the text and determining a device capable of performing an operation corresponding to the updated function based on update information of the function determination model included in at least one of the device capable of performing the operation and the listening device.
The method may further comprise: receiving device information of a new device from the voice assistant server, the device information of the new device including at least one of device identification information of the new device, storage information of the device determination model, and storage information of the function determination model; and updating the device determination model by using the received device information of the new device to add the new device to the device candidates that can be determined by the device determination model as devices capable of performing an operation.
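The candidate-update step above can be illustrated with a short sketch. The record fields (`device_id` and the two storage-information keys) are assumptions based on the device information enumerated in the text, not the patent's actual data format.

```python
# Illustrative sketch of adding a new device to the device determination
# model's candidate list when device information arrives from the voice
# assistant server. Field names are assumptions.

def add_device_candidate(candidates, new_device):
    """
    candidates: dict keyed by device id, holding per-device model storage info.
    new_device: info dict with at least 'device_id', and optionally storage
    information for the device determination and function determination models.
    """
    device_id = new_device["device_id"]
    if device_id not in candidates:
        candidates[device_id] = {
            "device_determination_model": new_device.get("device_model_storage"),
            "function_determination_model": new_device.get("function_model_storage"),
        }
    return candidates
```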
According to an embodiment, a hub device may comprise: a communication interface configured to perform data communication with at least one of a voice assistant server and a plurality of devices including a listening device; a voice signal receiver configured to receive a voice signal from a listening device; a memory configured to store a program comprising one or more instructions; and a processor configured to execute one or more instructions of a program stored in the memory to: the method includes converting a received speech signal into text by performing Automatic Speech Recognition (ASR), analyzing the text by using a first Natural Language Understanding (NLU) model, and identifying a device capable of performing an operation corresponding to the analyzed text by using a device determination model, identifying which device stores a function determination model corresponding to the device capable of performing the operation from among the device capable of performing the operation and a listening device, and transmitting at least a portion of the text to the identified device storing the function determination model by using a communication interface.
The device determination model may include a first NLU model configured to analyze text and determine a device capable of performing an operation based on a result of the analysis of the text.
The function determination model may include a second NLU model configured to analyze at least a part of the text and obtain operation information related to an operation to be performed by the operation-capable device based on a result of the analysis of the at least a part of the text.
The processor may also be configured to determine whether the device capable of performing the operation is the same as the listening device.
The processor may be further configured to: based on determining that the device capable of performing the operation is the same as the listening device, obtain function determination model information regarding whether the listening device stores the function determination model in its internal memory, and determine the listening device as the device storing the function determination model based on the obtained function determination model information.
The processor may be further configured to: based on determining that the device capable of performing the operation is a different device from the listening device, obtain function determination model information regarding whether the device capable of performing the operation stores the function determination model in its internal memory, and determine whether the device capable of performing the operation is the device storing the function determination model based on the obtained function determination model information.
The processor may be further configured to: the method further includes receiving update data for the device determination model from the voice assistant server using the communication interface, and updating the device determination model using the received update data.
The update data may include data for updating the device determination model to determine an updated function from the text and determining a device capable of performing an operation corresponding to the updated function based on update information of the function determination model included in at least one of the device capable of performing the operation and the listening device.
The processor may be further configured to: the device determination model is updated by receiving device information of a new device including at least one of device identification information of the new device, storage information of the device determination model, and storage information of the function determination model from the voice assistant server using the communication interface, and adding the new device to a device candidate that can be determined as a device capable of performing an operation by the device determination model using the received device information of the new device.
According to an embodiment, a method may include: based on the detection of the user's voice by the hub device: converting, by the hub device, the received voice input into text by performing Automatic Speech Recognition (ASR); identifying, by the hub device, an intent of the user; identifying, by the hub device, an Internet of Things (IoT) device capable of performing an operation corresponding to the text; identifying, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the IoT device capable of performing the operation corresponding to the text; and based on the identified device storing the function determination model being a different device from the hub device, sending at least a portion of the text to the identified device, wherein the hub device includes a hardware processor.
The method may further comprise: storing, by the hub device in the form of a look-up table (LUT), information regarding whether the function determination model for each of the plurality of IoT devices that were previously registered using the user account is associated with information regarding a storage location of the function determination model for each of the plurality of IoT devices.
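The look-up table (LUT) described above can be sketched as a simple mapping from device identifiers to model-storage records. The table contents, field names, and helper function below are illustrative assumptions, not the patent's API.

```python
# Hypothetical sketch of the LUT kept by the hub device: for each IoT device
# previously registered under the user account, it records whether a function
# determination model exists and where that model is stored.

FUNCTION_MODEL_LUT = {
    # device_id: {has_function_model, storage_location}
    "tv-001":      {"has_function_model": True,  "storage_location": "hub"},
    "speaker-002": {"has_function_model": True,  "storage_location": "device"},
    "aircon-003":  {"has_function_model": False, "storage_location": "voice_assistant_server"},
}

def lookup_function_model(device_id):
    """Return (has_model, storage_location) for a registered device, or None
    if the device is not in the LUT."""
    entry = FUNCTION_MODEL_LUT.get(device_id)
    if entry is None:
        return None
    return entry["has_function_model"], entry["storage_location"]
```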
The method may further comprise: obtaining operation information related to an operation corresponding to text to be performed by the IoT device by using the function determination model.
Detailed Description
This application is based on and claims the benefit of U.S. Provisional Patent Application Nos. 62/862,201 and 62/905,707, filed on June 17, 2019 and September 25, 2019, respectively, and claims priority to Korean Patent Application No. 10-2019-.
Although the terms used herein are selected from common terms that are currently widely used in consideration of their functions in the present disclosure, the terms may be changed according to the intention of a person of ordinary skill in the art, precedent cases, or the emergence of new technology. Further, in certain cases, terms are arbitrarily selected by the applicant of the present disclosure, and the meanings of these terms will be described in detail in corresponding parts of specific embodiments. Accordingly, terms used in the present disclosure are not merely designated as terms, but are defined based on meanings of the terms and contents throughout the present disclosure.
Throughout the disclosure, the expression "at least one of a, b, and c" indicates only a, only b, only c, both a and b, both a and c, both b and c, all of a, b, and c, or variations thereof.
As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Terms including technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
Throughout this application, when a component "includes" an element, unless specifically stated to the contrary, it is to be understood that the component may additionally include other elements rather than excluding them. Also, terms such as "unit" and "module" used in the present disclosure indicate a unit that processes at least one function or operation, and the unit may be implemented as hardware, software, or a combination of hardware and software.
The expression "configured to (or set to)" used herein may be used interchangeably with, for example, "suitable for", "having the capacity to", "designed to", "adapted to", "made to", or "capable of", according to the situation. The expression "configured to (or set to)" does not necessarily mean "specifically designed to" in hardware. Instead, in some situations, the expression "a system configured to" may mean that the system is "capable of" operating together with other devices or components. For example, "a processor configured to (or set to) perform A, B, and C" may refer to a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a general-purpose processor (e.g., a central processing unit (CPU) or an application processor (AP)) capable of performing the corresponding operations by executing one or more software programs stored in a memory.
According to an embodiment of the present disclosure, the term "first Natural Language Understanding (NLU) model" used herein may refer to a model trained to analyze text converted from a speech input and determine an operation performing device based on the analysis result. The first NLU model may be used to determine the intent by interpreting the text and determine the operation performing device based on the intent.
The term "second NLU model" as used herein may refer to a model trained to analyze text related to a particular device, according to embodiments of the present disclosure. The second NLU model may be a model trained to obtain operation information related to an operation to be performed by the specific device by interpreting at least a portion of the text. The storage capacity of the second NLU model may be greater than the storage capacity of the first NLU model.
The term "intent" used herein may refer to information indicating a user intention determined by interpreting text, according to an embodiment of the present disclosure. The intent, as information indicating the utterance intention of the user, may be information indicating an operation of the operation-performing device requested by the user. The intent may be determined by interpreting the text using an NLU model. For example, based on the text converted from the user's voice input being "movie vender play on TV", it may be determined that the intent is "content playback". Alternatively, based on the text converted from the user's voice input being "lower the air conditioner temperature to 18 ℃", it may be determined that the intent is "temperature control".
According to an embodiment of the present disclosure, the intent may include not only information indicating the utterance intention of the user (hereinafter, referred to as intent information) but also a numerical value corresponding to the information indicating the intention of the user. The numerical value may indicate a probability that the text is related to information indicating a particular intent. When a plurality of pieces of intent information indicating the user's intention are obtained after the text is interpreted by using the NLU model, the intent information having the largest numerical value among the numerical values corresponding to the plurality of pieces of intent information may be determined as the intent.
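The selection rule above amounts to taking the argmax over candidate intents. A minimal sketch, assuming the first NLU model returns (intent, probability) pairs; the candidate names and scores below are invented for illustration.

```python
# Illustrative sketch of selecting the final intent: among the pieces of
# intent information returned by the NLU model, the one with the largest
# numerical value (probability) is chosen. Names are assumptions.

def select_intent(candidates):
    """candidates: list of (intent_name, probability) pairs.
    Returns the intent name with the highest probability, or None."""
    if not candidates:
        return None
    intent, _score = max(candidates, key=lambda pair: pair[1])
    return intent

# e.g., for an utterance asking to play content on a TV, the model might score:
candidates = [
    ("content_playback", 0.91),
    ("device_power", 0.05),
    ("volume_control", 0.04),
]
```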
As used herein, the term "operation" of a device may refer to at least one action performed by the device when the device performs a particular function, in accordance with an embodiment of the present disclosure. According to embodiments of the present disclosure, an operation may indicate at least one action performed by a device when the device executes an application. For example, when the device executes an application, the operation may indicate, for example, one of: video playback, music playback, email creation, weather information reception, news information display, game play, and photography. However, the operation is not limited to the above examples.
According to an embodiment of the present disclosure, the operation of the device may be performed based on information on detailed operations output from the action plan management module. According to an embodiment of the present disclosure, the device may perform at least one action by performing a function corresponding to a detailed operation output from the action plan management module. According to an embodiment of the present disclosure, the device may store an instruction for executing a function corresponding to a detailed operation, and when the detailed operation is determined, the device may determine the instruction corresponding to the detailed operation and may execute a specific function by executing the instruction.
Further, according to an embodiment, the device may store instructions for executing an application corresponding to the detailed operation. According to an embodiment of the present disclosure, the instructions for executing the application may include instructions for executing the application itself and instructions for executing detailed functions constituting the application. Based on the determination of the detailed operation, the device may execute the application by executing an instruction for executing the application corresponding to the detailed operation, and may execute the detailed function by executing an instruction for executing the detailed function of the application corresponding to the detailed operation.
According to an embodiment of the present disclosure, the term "operation information" used herein may refer to information related to detailed operations to be performed by a device, a relationship between each detailed operation and another detailed operation, and an execution order of the detailed operations. According to an embodiment of the present disclosure, when a first operation is to be performed, the relationship between each detailed operation and another detailed operation may include information about a second operation that has to be performed before the first operation is performed. For example, when the operation to be performed is "music playback", "power on" may be another detailed operation that has to be performed before "music playback" is performed. According to embodiments of the present disclosure, the operation information may include, but is not limited to, one or more of the following: a function to be executed by the operation-performing device to execute a specific operation, an execution order of the functions, an input value required to execute the functions, and an output value output as a result of the execution of the functions.
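The prerequisite relationship described above ("power on" before "music playback") can be sketched as a small dependency resolver that produces an execution order for detailed operations. The dependency table and operation names are illustrative assumptions, not the patent's action plan format.

```python
# Minimal sketch of ordering detailed operations with prerequisites, as in
# the "power on" before "music playback" example above.

PREREQUISITES = {
    "music_playback": ["power_on"],  # power must be on before playback starts
    "power_on": [],
}

def execution_order(operation, prereqs=PREREQUISITES, seen=None):
    """Return the detailed operations in the order they must be executed,
    visiting prerequisites depth-first before the operation itself."""
    if seen is None:
        seen = []
    for dep in prereqs.get(operation, []):
        execution_order(dep, prereqs, seen)
    if operation not in seen:
        seen.append(operation)
    return seen
```

In this sketch, an action plan management module could walk the returned list and dispatch each detailed operation in turn.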
According to an embodiment of the present disclosure, the term "operation performing device" used herein may refer to a device determined to perform an operation based on an intention obtained from a text among a plurality of devices. The text may be analyzed by using the first NLU model, and the operation execution device may be determined based on the analysis result. According to an embodiment of the present disclosure, the operation performing device may perform at least one action by performing a function corresponding to a detailed operation output from the action plan management module. According to an embodiment of the present disclosure, the operation performing apparatus may perform an operation based on the operation information.
According to an embodiment of the present disclosure, the term "action plan management module" used herein may refer to a module for managing detailed operations to be performed by an operation performing apparatus and operation information related to the detailed operations of the apparatus to generate an execution order of the detailed operations. According to an embodiment of the present disclosure, the action plan management module may manage operation information regarding detailed operations of a device according to a device type and a relationship between the detailed operations.
According to embodiments of the present disclosure, the term "Internet of Things (IoT) server" as used herein may refer to a server that obtains, stores, and manages IoT device information about each of a plurality of devices (e.g., including IoT devices, mobile phones, etc.). The IoT server may obtain, determine, or generate control commands for controlling devices (e.g., IoT devices) using the stored device information. According to an embodiment of the present disclosure, the IoT server may send a control command, based on the operation information, to the device determined to perform the operation. According to embodiments of the present disclosure, an IoT server may be implemented as, but is not limited to, a hardware device separate from the "server" of the present disclosure. Depending on the embodiment, the IoT server may be an element of the "voice assistant server" of the present disclosure, or may be a server implemented in software.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so as to enable those skilled in the art to easily implement and practice the present disclosure. However, the present disclosure may be embodied in many different forms according to the embodiments of the present disclosure, and should not be construed as being limited to the embodiments of the present disclosure set forth herein.
Reference will now be made in detail to embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings.
Fig. 1 is a block diagram illustrating some elements of a multi-device system including a hub device 1000, a voice assistant server 2000, an IoT server 3000, and a plurality of devices 4000, according to an embodiment of the present disclosure.
In the embodiment of fig. 1, elements are illustrated for describing the operation of the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the plurality of devices 4000. The elements included in the hub device 1000, voice assistant server 2000, IoT server 3000, and plurality of devices 4000 are not limited to those illustrated in fig. 1.
Reference numerals S1 to S16 attached to the arrows in fig. 1 denote data movement operations (transmission or reception) between the entities through a network. The numerals following the letter S in S1 to S16 are used for convenience of explanation and are unrelated to the order of the data movement operations (transmission or reception).
Referring to fig. 1, according to an embodiment of the present disclosure, a hub device 1000, a voice assistant server 2000, an IoT server 3000, and a plurality of devices 4000 may be connected to each other by using a wired communication or wireless communication method and may perform communication. In an embodiment of the present disclosure, the hub device 1000 and the plurality of devices 4000 may be connected to each other directly or via a communication network, but the present disclosure is not limited thereto. According to an embodiment, the hub device 1000 and the plurality of devices 4000 may be connected to the voice assistant server 2000, and the hub device 1000 may be connected to the plurality of devices 4000 through the voice assistant server 2000. Further, according to an embodiment, the hub device 1000 and the plurality of devices 4000 may be connected to the IoT server 3000. In another embodiment of the present disclosure, the hub device 1000 and each of the plurality of devices 4000 may be connected to the voice assistant server 2000 through a communication network and may be connected to the IoT server 3000 through the voice assistant server 2000.
According to an embodiment, the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the plurality of devices 4000 may be connected through one or more of the following: a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), a mobile radio communication network, a satellite communication network, or a combination thereof. Examples of wireless communication methods may include, but are not limited to, Wi-Fi (wireless fidelity), Bluetooth Low Energy (BLE), Zigbee, Wi-Fi direct (WFD), Ultra Wideband (UWB), infrared data association (IrDA), or Near Field Communication (NFC).
According to an embodiment, the hub device 1000 may be a device that receives voice input from a user and controls at least one of the plurality of devices 4000 based on the received voice input. According to embodiments of the present disclosure, the hub device 1000 may be a listening device that receives voice input from a user.
According to an embodiment, at least one of the plurality of devices 4000 may be an operation performing device that performs a specific operation by receiving a control command of the hub device 1000 or the IoT server 3000. According to an embodiment of the present disclosure, the plurality of devices 4000 may be IoT devices that are logged in by using the same user account as the user account of the hub device 1000 and that were previously registered with the IoT server 3000 by using the user account of the hub device 1000.
According to an embodiment, at least one of the plurality of devices 4000 may be a listening device that receives voice data (e.g., voice input from a user). The listening device may be, but is not limited to, a device designed to process a user's voice (e.g., a device that only receives and processes voice input from a human user or from a particular registered user). In an embodiment of the present disclosure, the listening device may be an operation execution device that receives a control command from the hub device 1000 and executes an operation for a specific function.
According to an embodiment of the present disclosure, at least one of the plurality of devices 4000 may receive a control command from the IoT server 3000 (S12, S14, and S16), or may receive at least a portion of text converted from input voice from the hub device 1000 (S3 and S5). According to an embodiment of the present disclosure, at least one of the plurality of devices 4000 may receive the control command from the IoT server 3000 (S12, S14, and S16) without receiving at least a portion of the text from the hub device 1000.
According to an embodiment, the hub device 1000 may include a device determination model 1330, the device determination model 1330 determining a device for performing an operation based on a voice input of a user. According to an embodiment, the device determination model 1330 may determine the operation performing device from among the plurality of devices 4000 registered according to the user account. In an embodiment of the present disclosure, the hub device 1000 may receive device information including at least one of identification information (e.g., device id information) of each of the plurality of devices 4000, a device type of each of the plurality of devices 4000, function execution capability, location information, and status information of each of the plurality of devices 4000 from the voice assistant server 2000 (S2). According to an embodiment, each of the plurality of devices 4000 is an IoT device. According to an embodiment, the hub device 1000 may determine a device (e.g., an IoT device) for performing an operation according to a voice input of a user from among the plurality of devices 4000 based on the received device information by using data on the device determination model 1330.
According to an embodiment of the present disclosure, the function determination model corresponding to the operation execution device determined by the hub device 1000 may be stored in the memory 1300 (see fig. 2) of the hub device 1000, may be stored in the operation execution device itself, or may be stored in the memory 2300 (see fig. 3) of the voice assistant server 2000. According to an embodiment of the present disclosure, the term "function determination model" corresponding to each device refers to a model for obtaining operation information on detailed operations and relationships between the detailed operations that perform the operations according to the determined functions of the devices.
According to an embodiment of the present disclosure, the function determination device determining module 1340 of the hub device 1000 may identify a device in which a function determination model corresponding to an operation execution device is stored, from among the hub device 1000, the voice assistant server 2000, and the operation execution device, by using the database 1360 stored in the memory 1300 and including information about the function determination models of the devices. According to embodiments of the present disclosure, the database 1360 may include information about the plurality of devices 4000 registered with a user account associated with the hub device 1000. Specifically, the database 1360 may store identification information (e.g., device Identifier (ID) information) of each of the plurality of devices 4000, information on whether a function determination model of each of the plurality of devices 4000 exists, and information on a storage location of the function determination model of each of the plurality of devices 4000 (e.g., identification information of the storing device/server, an Internet Protocol (IP) address of the storing device/server, or a Media Access Control (MAC) address of the storing device/server). In an embodiment of the present disclosure, the function determination device determining module 1340 may search the database 1360 according to the device identification information of the operation execution device output by the device determination model 1330, and may obtain information on a storage location of the function determination model corresponding to the operation execution device based on a search result of the database 1360.
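The lookup described above can be sketched as a simple registry keyed by device identification information. This is an illustrative assumption only: the device IDs, field names, and table layout below are hypothetical and do not reflect the actual schema of the database 1360.

```python
# Hypothetical sketch of the database-1360 lookup: map a device ID to the
# storage location of its function determination model. All IDs and field
# names are illustrative assumptions.
FUNCTION_MODEL_DB = {
    "air-conditioner-01": {"has_model": True, "location": "device",
                           "address": "192.168.0.21"},
    "tv-01":              {"has_model": True, "location": "hub"},
    "air-purifier-01":    {"has_model": True, "location": "server",
                           "address": "assistant.example.com"},
}

def find_function_model_location(device_id):
    """Return where the function determination model for device_id is
    stored ("device", "hub", or "server"), or None if no model exists."""
    entry = FUNCTION_MODEL_DB.get(device_id)
    if entry is None or not entry["has_model"]:
        return None
    return entry["location"]
```

For instance, `find_function_model_location("tv-01")` would return `"hub"`, corresponding to the case in which the hub device 1000 processes the text itself.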
According to an embodiment of the present disclosure, the hub device 1000 may transmit at least a portion of text converted from the user's voice input to a device recognized as storing a function determination model corresponding to an operation performing device by using the function determination device determination module 1340.
For example, based on the hub device 1000 receiving a voice input from the user saying "raise the temperature by 1 ℃", the hub device 1000 may determine, through the device determination model 1330, that the device for raising the temperature is an air conditioner. Next, according to an embodiment, the function determination device determination module 1340 may check whether a function determination model corresponding to the air conditioner is stored in the air conditioner (an IoT device), and based on determining that it is stored in the air conditioner, may transmit text corresponding to "raise the temperature by 1 ℃" to the air conditioner. According to an embodiment, the IoT device (e.g., the air conditioner) capable of performing an operation of raising the temperature may analyze the received text through the stored function determination model corresponding to the air conditioner and may perform a temperature control operation by using the text analysis result. That is, because the operation performing device determined by the device determination model 1330 as capable of performing the user's voice input is the first device 4100, that is, the "air conditioner", and the function determination model corresponding to the air conditioner is stored in the first device 4100 itself, the function determination device determining module 1340 may transmit at least a part of the text to the first device 4100 (e.g., the air conditioner) (S3).
For example, based on the hub device 1000 receiving a voice input in which the user says "change channel", the hub device 1000 may determine, through the device determination model 1330, that the device for changing channels is a TV. Next, according to an embodiment, the function determination device determination module 1340 may check whether a function determination model corresponding to the TV is stored in the hub device 1000, and may analyze text corresponding to "change channel" through the stored function determination model corresponding to the TV. According to an embodiment, the hub device 1000 may determine a channel change operation as an operation to be performed by the TV by using the text analysis result, and may transmit operation information regarding the channel change operation to the TV. That is, because the operation performing device determined by the device determination model 1330 is the second device 4200, that is, the "TV", and the function determination model corresponding to the TV is stored in the hub device 1000, the function determination device determination module 1340 may provide (e.g., transmit) at least a portion of the text to the TV function determination model 1354 so that the hub device 1000 itself may process at least a portion of the text.
For example, based on the hub device 1000 receiving a voice input from the user saying "perform a deodorization mode", the hub device 1000 may determine, through the device determination model 1330, that the device for performing the deodorization mode is an air purifier. Next, according to an embodiment, the function determination device determination module 1340 may check whether a function determination model corresponding to the air purifier is stored in the voice assistant server 2000, and may transmit text corresponding to "perform a deodorization mode" to the voice assistant server 2000 based on a determination that the function determination model corresponding to the air purifier is stored in the voice assistant server 2000. According to an embodiment of the present disclosure, the voice assistant server 2000 may analyze the received text through the function determination model corresponding to the air purifier, and may determine a deodorization mode performing operation as an operation to be performed by the air purifier by using the text analysis result. According to an embodiment of the present disclosure, the voice assistant server 2000 may transmit operation information regarding the deodorization mode performing operation to the air purifier, in which case the operation information related to the deodorization mode performing operation may be transmitted through the IoT server 3000. That is, according to an embodiment, based on the determination by the device determination model 1330 that the operation performing device is the third device 4300, that is, the "air purifier", the function determination device determination module 1340 may transmit at least a portion of the text to the voice assistant server 2000 because the function determination model corresponding to the air purifier is stored in the voice assistant server 2000 (S1).
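The three examples above share one dispatch pattern: once the storage location of the function determination model is known, the hub routes the converted text accordingly. The following sketch is an illustrative assumption; the location labels and the `send` transport callback are hypothetical names, not the disclosed implementation.

```python
# Illustrative routing sketch mirroring the air conditioner / TV / air
# purifier examples: dispatch the text to wherever the function
# determination model corresponding to the operation performing device lives.
def route_text(text, location, send):
    """Dispatch `text` by model storage location.

    `send(target, text)` stands in for the actual transport (network send
    or in-process call); `target` labels are purely illustrative.
    """
    if location == "device":
        # e.g. S3: text goes directly to the operation performing device
        send("operation-performing-device", text)
    elif location == "hub":
        # the hub analyzes the text with its own stored model
        send("hub-local-function-model", text)
    elif location == "server":
        # e.g. S1: text goes to the voice assistant server
        send("voice-assistant-server", text)
    else:
        raise ValueError(f"unknown storage location: {location}")
```

In the "change channel" example, the location would be `"hub"`, so the text never leaves the hub device 1000.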
In an embodiment of the present disclosure, the hub device 1000 itself may store a function determination model corresponding to at least one of the plurality of devices 4000. For example, when the hub device 1000 is a voice assistant speaker, the hub device 1000 may store the speaker function determination model 1352, the speaker function determination model 1352 being used to obtain operation information on detailed operations of performing functions of the voice assistant speaker and relationships between the detailed operations.
According to an embodiment, the hub device 1000 may also store a function determination model corresponding to another device. For example, the hub device 1000 may store a TV function determination model 1354 for obtaining operation information on detailed operations corresponding to a TV and relationships between the detailed operations. According to an embodiment of the present disclosure, the TV may be a device previously registered with the IoT server 3000 by using the same user account as that of the hub device 1000.
According to an embodiment, the speaker function determination model 1352 and the TV function determination model 1354 may include second NLU models 1352a and 1354a and action plan management modules 1352b and 1354b, respectively. The second NLU models 1352a and 1354a and the action plan management modules 1352b and 1354b are described with reference to fig. 2.
According to an embodiment, the voice assistant server 2000 may determine an operation performing device for performing an operation intended by the user based on the text received from the hub device 1000. According to an embodiment, the voice assistant server 2000 may receive user account information from the hub device 1000 (S1). According to an embodiment of the present disclosure, based on the voice assistant server 2000 receiving the user account information from the hub device 1000, the voice assistant server 2000 may transmit, to the IoT server 3000, a query requesting device information about the plurality of devices 4000 previously registered according to the received user account information (S9), and may receive the device information about the plurality of devices 4000 from the IoT server 3000 (S10). According to an embodiment, the device information may include at least one of identification information (e.g., device id information) of each of the plurality of devices 4000, a device type of each of the plurality of devices 4000, a function execution capability of each of the plurality of devices 4000, location information, and status information. According to an embodiment, the voice assistant server 2000 may transmit the device information received from the IoT server 3000 to the hub device 1000 (S2).
Elements of a hub device 1000 according to an embodiment will be described with reference to fig. 2.
According to an embodiment, the voice assistant server 2000 may include a device determination model 2330 and a plurality of function determination models 2342, 2344, 2346, and 2348. According to an embodiment, the voice assistant server 2000 may select, from among the plurality of function determination models 2342, 2344, 2346, and 2348, a function determination model corresponding to at least a part of the text received from the hub device 1000 by using the device determination model 2330, and may obtain operation information necessary for the operation performing device to perform the operation by using the selected function determination model. According to an embodiment, the voice assistant server 2000 may transmit the operation information to the IoT server 3000 (S9).
The elements of the voice assistant server 2000 in accordance with an embodiment of the present disclosure are described with reference to fig. 3.
According to an embodiment, the IoT server 3000 may be connected to the plurality of devices 4000 through a network and may store information about the plurality of devices 4000 previously registered by using the user account of the hub device 1000. In an embodiment of the present disclosure, the IoT server 3000 may receive at least one of user account information with which each of the plurality of devices 4000 logs in, identification information (e.g., device id information) of each of the plurality of devices 4000, a device type of each of the plurality of devices 4000, and function execution capability information of each of the plurality of devices 4000 (S11, S13, and S15). In an embodiment of the present disclosure, the IoT server 3000 may receive, from the plurality of devices 4000, state information regarding power on/off of each of the plurality of devices 4000 or an operation being performed by each of the plurality of devices 4000 (S11, S13, and S15). The IoT server 3000 may store the device information and the state information received from the plurality of devices 4000.
According to an embodiment, the IoT server 3000 may generate a control command readable and executable by the operation execution device based on the operation information received from the voice assistant server 2000. According to an embodiment of the present disclosure, the IoT server 3000 may transmit a control command to a device determined as an operation performing device among the plurality of devices 4000 (S12, S14, and S16).
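The step above, in which the IoT server 3000 turns abstract operation information into a command the target device can execute, might be sketched as follows. The dictionary fields (`device_id`, `operation`, `parameters`) and command format are assumptions made for illustration only, not the format actually used by the disclosed servers.

```python
# Illustrative-only sketch: wrap operation information received from the
# voice assistant server into a control command readable by the operation
# performing device. All field names are hypothetical.
def build_control_command(operation_info):
    """Translate operation information into a device-readable command."""
    return {
        "target_device_id": operation_info["device_id"],
        "command": operation_info["operation"],
        "parameters": operation_info.get("parameters", {}),
    }
```

For the "change channel" example, the IoT server would then transmit the resulting command to the TV determined as the operation performing device (S14).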
Elements of IoT server 3000 according to an embodiment are described with reference to fig. 4.
In the embodiment of fig. 1, the plurality of devices 4000 may include a first device 4100, a second device 4200, and a third device 4300. In fig. 1, the first device 4100 may be an air conditioner, the second device 4200 may be a TV, and the third device 4300 may be an air purifier, but the present disclosure is not limited thereto. The plurality of devices 4000 may include not only an air conditioner, a TV, and an air purifier, but also other IoT devices, such as home appliances (e.g., a robot cleaner, a washing machine, an oven, a microwave oven, a scale (e.g., a weight scale), a refrigerator, or an electronic photo frame) and mobile devices (e.g., a smartphone, a tablet Personal Computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a Personal Digital Assistant (PDA), a Portable Multimedia Player (PMP), an MP3 player, a mobile medical device, a camera, or a wearable device).
According to an embodiment, at least one of the plurality of devices 4000 may itself store the function determination model. For example, a function determination model 4132 for obtaining operation information about detailed operations and relationships between the detailed operations required for the first device 4100 to perform an operation determined from the voice input of the user, and generating a control command based on the operation information may be stored in the memory 4130 of the first device 4100.
According to an embodiment, each of the second device 4200 and the third device 4300 among the plurality of devices 4000 may not store the function determination model.
According to the embodiment, at least one of the plurality of devices 4000 may transmit information on whether the device itself stores the function determination model to the hub device 1000 (S4, S6, and S8).
According to an embodiment, the plurality of devices 4000 may also include third party devices that are not manufactured by the same manufacturer as the manufacturer of the hub device 1000 (e.g., the third party devices are manufactured by a different manufacturer than the manufacturer of the hub device 1000), the voice assistant server 2000, and the IoT server 3000, and are not directly controlled by the hub device 1000, the voice assistant server 2000, and the IoT server 3000. The third party device will be described in detail with reference to fig. 11.
Fig. 2 is a block diagram illustrating elements of a hub device 1000 according to an embodiment of the present disclosure.
According to an embodiment, the hub device 1000 may be a device that receives voice input from a user and controls at least one of the plurality of devices 4000 based on the received voice input. According to an embodiment, the hub device 1000 may be a listening device that receives voice input from a user.
Referring to fig. 2, according to an embodiment, a hub device 1000 may include a microphone 1100, a processor 1200, a memory 1300, and a communication interface 1400. According to an embodiment, the hub device 1000 may receive voice input (e.g., an utterance of a user) from the user through the microphone 1100 and may obtain a voice signal from the received voice input. In an embodiment of the present disclosure, the processor 1200 of the hub device 1000 may convert sound received through the microphone 1100 into an acoustic signal, and may obtain a voice signal by removing noise (e.g., non-voice components) from the acoustic signal.
However, the present disclosure is not limited thereto, and the hub device 1000 may receive a voice signal from a listening device.
According to an embodiment, the hub device 1000 may include a voice recognition module having a function of detecting a specified voice input (e.g., a wake-up input such as "Hi, Bixby" or "OK, Google") or a function of preprocessing a voice signal obtained from a portion of the voice input.
According to an embodiment of the present disclosure, processor 1200 may execute one or more instructions of a program stored in memory 1300. According to embodiments of the present disclosure, processor 1200 may include hardware components to perform arithmetic, logical, and input/output operations, as well as signal processing. According to an embodiment, the processor 1200 may include, but is not limited to, at least one of a Central Processing Unit (CPU), a microprocessor, a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), and a Field Programmable Gate Array (FPGA).
According to an embodiment, a program including one or more instructions for controlling the plurality of devices 4000 based on a user voice input received through the microphone 1100 may be stored in the memory 1300. According to an embodiment, instructions and/or program code readable by the processor 1200 may be stored in the memory 1300. In an embodiment of the present disclosure, the processor 1200 may operate by executing the instructions or program code stored in the memory 1300.
According to an embodiment, one or more or all of data regarding the Automatic Speech Recognition (ASR) module 1310, data regarding the Natural Language Generator (NLG) module 1320, data regarding the device determination model 1330, data regarding the function determination device determination module 1340, data corresponding to each of the plurality of function determination models 1350, and data corresponding to the database 1360 may be stored in the memory 1300.
According to an embodiment, the memory 1300 may include at least one type of storage medium of a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD (secure digital) memory or XD (eXtreme digital) memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.
According to an embodiment, processor 1200 may perform ASR by using data stored in memory 1300 for ASR module 1310 and may convert speech signals received through microphone 1100 to text.
However, the present disclosure is not limited thereto. In embodiments of the present disclosure, processor 1200 may receive speech signals from a listening device through communication interface 1400 and may convert the speech signals to text by performing ASR using ASR module 1310.
According to an embodiment, the processor 1200 may analyze the text by using data about the device determination model 1330 stored in the memory 1300, and may determine an operation performing device (e.g., IoT device) from among the plurality of devices 4000 (e.g., a plurality of IoT devices) based on the analysis result of the text. According to an embodiment of the present disclosure, the device determination model 1330 may include a first NLU model 1332. In an embodiment of the present disclosure, the processor 1200 may analyze the text by using data on the first NLU model 1332 included in the device determination model 1330, and may determine an operation performing device for performing an operation according to the user's intention from among the plurality of devices 4000 based on the analysis result of the text.
According to an embodiment, the first NLU model 1332 may be a model trained to analyze text converted from a speech input and determine an operation performing device based on the analysis result. According to an embodiment of the present disclosure, the first NLU model 1332 may be used to determine an intent by interpreting text and determine an operation performing device based on the intent.
In an embodiment of the present disclosure, the processor 1200 may parse text in morpheme, word, or phrase units by using data about the first NLU model 1332 stored in the memory 1300, and may infer the meaning of a word extracted from the parsed text by using linguistic features (e.g., grammatical components) of the morpheme, word, or phrase. According to an embodiment, processor 1200 may compare the inferred meaning of the word to the predefined intent provided by first NLU model 1332 and may determine an intent corresponding to the inferred meaning of the word.
According to an embodiment, the processor 1200 may determine a device related to the intention recognized from the text as the operation performing device based on a matching model for determining a relationship between the intention and the device. In an embodiment of the present disclosure, the matching model may be included in the data on the device determination model 1330 stored in the memory 1300 and may be obtained through learning via a rule-based system, but the present disclosure is not limited thereto.
In an embodiment of the present disclosure, the processor 1200 may obtain a plurality of numerical values indicating degrees of relationship between the intention and the plurality of devices 4000 by applying the matching model to the intention, and may determine a device having a largest numerical value among the obtained plurality of numerical values as a final operation performing device (e.g., an IoT device capable of performing an operation intended by a user's voice (voice input)). For example, based on the intention being related to each of the first device 4100 (see fig. 1) and the second device 4200 (see fig. 1), the processor 1200 may obtain a first numerical value indicating a degree of relationship between the intention and the first device 4100 and a second numerical value indicating a degree of relationship between the intention and the second device 4200, and may determine the first device 4100 having a larger numerical value of the first numerical value and the second numerical value as a final operation performing device.
For example, based on the hub device 1000 receiving a voice input in which the user says "lower the set temperature by 2 ℃ because it is hot", the processor 1200 may perform ASR to convert the voice input into text, and may analyze the text by using data related to the first NLU model 1332 to obtain (e.g., by inferring) an intent corresponding to "set temperature adjustment". According to an embodiment, the processor 1200 may obtain a first numerical value indicating a degree of relationship between the intention of "set temperature adjustment" and the first device 4100 as an air conditioner, a second numerical value indicating a degree of relationship between the intention of "set temperature adjustment" and the second device 4200 as a TV, and a third numerical value indicating a degree of relationship between the intention of "set temperature adjustment" and the third device 4300 (see fig. 1) as an air purifier by applying the matching model. According to an embodiment, the processor 1200 may determine the first device 4100 as the operation performing device related to "set temperature adjustment" by using the first numerical value, which is the maximum value among the obtained numerical values.
According to an embodiment, upon the hub device 1000 receiving a voice input from the user saying "play the movie Avengers", the processor 1200 may analyze the text converted from the voice input and may obtain an intent corresponding to "content playback". According to an embodiment, the processor 1200 may determine the second device 4200 as the operation performing device related to "content playback" based on the second numerical value being the maximum value among a first numerical value indicating a degree of relationship between the intention of "content playback" and the first device 4100 as the air conditioner, a second numerical value indicating a degree of relationship between the intention of "content playback" and the second device 4200 as the TV, and a third numerical value indicating a degree of relationship between the intention of "content playback" and the third device 4300 as the air purifier, the numerical values being calculated by using the matching model.
However, the present disclosure is not limited to the above-described example, and the processor 1200 may arrange the numerical values indicating the degrees of relationship between the intention and the plurality of devices in descending order, and may determine a predetermined number of devices as operation performing devices. In an embodiment of the present disclosure, the processor 1200 may determine a device whose numerical value indicating the degree of relationship is equal to or greater than a predetermined critical value as an operation execution device related to the intention. In this case, a plurality of devices may be determined as the operation execution devices.
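The matching step described above, both the maximum-value selection and the critical-value alternative, can be sketched as follows. The scores are stand-ins for the matching model's numerical outputs, and the function and parameter names are illustrative assumptions.

```python
# Minimal sketch of the matching step: given numerical values indicating the
# degree of relationship between the inferred intent and each registered
# device, pick the operation performing device(s).
def pick_operation_device(scores, threshold=None):
    """scores: dict mapping device_id -> relationship value.

    With no threshold, return the single device with the largest value
    (the default behavior). With a threshold, return every device whose
    value is equal to or greater than the critical value.
    """
    if threshold is not None:
        return [device for device, s in scores.items() if s >= threshold]
    return max(scores, key=scores.get)
```

For the "set temperature adjustment" example, scores such as `{"air_conditioner": 0.91, "tv": 0.12, "air_purifier": 0.30}` would yield `"air_conditioner"` as the operation performing device.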
According to an embodiment, the processor 1200 may train a matching model between the intent and the operation performing device by using, for example, a rule based system, but the disclosure is not limited thereto. The AI model used by processor 1200 may be, for example, a neural network based system (e.g., Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN)), a Support Vector Machine (SVM), linear regression, logistic regression, naive bayes, random forests, decision trees, or k-nearest neighbor algorithms. Alternatively, the AI model may be a combination of the above examples or any other AI model. According to an embodiment, the AI model used by the processor 1200 may be stored in the device determination model 1330.
According to an embodiment, the device determination model 1330 stored in the memory 1300 of the hub device 1000 may determine an operation performing device from among the plurality of devices 4000 registered according to the user account of the hub device 1000. According to an embodiment, the hub device 1000 may receive device information about each of the plurality of devices 4000 from the voice assistant server 2000 by using the communication interface 1400. According to an embodiment, the device information may include, for example, at least one of identification information (e.g., device id information) of each of the plurality of devices 4000, a device type of each of the plurality of devices 4000, a function execution capability of each of the plurality of devices 4000, location information, and status information. According to an embodiment, the processor 1200 may determine a device for performing an operation according to an intention from among the plurality of devices 4000 based on the device information by using data about the device determination model 1330 stored in the memory 1300.
In an embodiment of the present disclosure, the processor 1200 may analyze a numerical value indicating a degree of relationship between the intention and a plurality of devices 4000 by using the device determination model 1330, the plurality of devices 4000 being previously registered by using the same user account as that of the hub device 1000, and may determine a device indicating a maximum value among the numerical values having the degree of relationship between the intention and the plurality of devices 4000 as the operation performing device.
Because the device determination model 1330 is configured to determine the operation performing device by using, as device candidates, only the plurality of devices 4000 logged in and registered with the same user account as that of the hub device 1000, the computational load on the processor 1200 in determining the degree of relationship to the intention may be smaller than that on the processor 2200 of the voice assistant server 2000. Further, because of the reduced amount of computation, the processing time required to determine the operation performing device may be reduced, and the response speed may thus be improved.
In an embodiment of the present disclosure, the processor 1200 may obtain a name of a device from the text by using the first NLU model 1332, and may perform a name-based device determination operation by using data about the device determination model 1330 stored in the memory 1300. In an embodiment of the present disclosure, the processor 1200 may extract, from the text, a common name related to the device and a word or phrase about the installation location of the device by using the first NLU model 1332, and may determine the operation performing device based on the extracted common name and installation location of the device. For example, when the text converted from the voice input is "the movie Avengers is played on the TV", the processor 1200 may parse the text in units of words or phrases by using the first NLU model 1332, and may recognize the name of the device corresponding to "TV" by comparing the words or phrases with pre-stored words or phrases. According to an embodiment, the processor 1200 may determine, as the operation performing device, the second device 4200, which is a TV among the plurality of devices 4000 connected to the hub device 1000 and logged in by using the same account as the user account of the hub device 1000.
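The name-based path above could be approximated by matching extracted words against the names of registered devices. This sketch is a deliberately naive assumption: the first NLU model performs real morpheme/word/phrase analysis, whereas the substring check and the registry below are purely illustrative.

```python
# Hedged sketch of name-based device determination: match a device name
# appearing in the text against devices registered under the user account.
# Device names and IDs are hypothetical.
REGISTERED_DEVICES = {
    "tv": "second-device-4200",
    "air conditioner": "first-device-4100",
    "air purifier": "third-device-4300",
}

def device_from_name(text):
    """Return the registered device whose name appears in the text,
    or None when no device name is recognized."""
    lowered = text.lower()
    for name, device_id in REGISTERED_DEVICES.items():
        if name in lowered:
            return device_id
    return None
```

A real implementation would also weigh the extracted installation-location phrase (e.g., "the TV in the living room") when several devices share a common name.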
In an embodiment of the present disclosure, any one of the plurality of devices 4000 may be a listening device that receives a voice input from a user, and the processor 1200 may determine the listening device as an operation performing device by using the device determination model 1330. However, the present disclosure is not limited thereto, and the processor 1200 may determine a device other than the listening device as the operation performing device from among the plurality of devices 4000 by using the device determination model 1330.
According to an embodiment, the processor 1200 may receive the update data of the device determination model 1330 from the voice assistant server 2000 by using the communication interface 1400. In an embodiment of the present disclosure, at least one of the plurality of devices 4000 may include an updated latest function, or a new device having a new function may be added to the user account. In this case, the device determination model 1330 of the hub device 1000 may fail to determine the new functions of the plurality of devices 4000, or may fail to determine the added new device as an operation performing device. According to an embodiment, the processor 1200 may update the device determination model 1330 to the latest version by using the received update data. According to an embodiment, the latest version may be the same as the version of the device determination model 2330 (see fig. 3) of the voice assistant server 2000.
According to an embodiment, the update data of the device determination model 1330 may include data updated such that the device determination model 1330 may determine detailed operation information on the updated latest function of each of the plurality of devices 4000 connected through the user account and may determine, from the text, an operation performing device that performs the updated latest function. In an embodiment of the present disclosure, the update data of the device determination model 1330 may include information about the functions of a new device newly added to the user account.
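The update flow described above amounts to a version comparison followed by applying the server's update data. The sketch below assumes integer version numbers and an opaque `apply_update` callback; both are illustrative inventions, not details from the disclosure.

```python
# Assumed sketch of model updating: bring the hub's device determination
# model to the voice assistant server's version when the hub is stale.
def maybe_update(hub_version, server_version, update_data, apply_update):
    """Apply update_data when the hub's model is older than the server's.

    Returns the resulting model version on the hub.
    """
    if hub_version < server_version:
        # e.g. add newly supported functions, or a newly registered device,
        # to the device candidates of the device determination model
        apply_update(update_data)
        return server_version
    return hub_version
```

After the update, the hub's device determination model would match the version of the device determination model 2330 on the voice assistant server 2000.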
An embodiment of updating the device determination model 1330 will be described with reference to figs. 17 and 18.
According to an embodiment, the processor 1200 may receive device information of a new device from the voice assistant server 2000 by using the communication interface 1400. According to an embodiment, the device information obtained by the processor 1200 of the hub device 1000 from the voice assistant server 2000 may comprise at least one of device identification information (e.g. device id information) of the new device, stored information of a device determination model of the new device, and stored information of a function determination model of the new device, for example. According to an embodiment, the new device may be a target device that is expected to be registered in the user account of the voice assistant server 2000.
According to an embodiment, the processor 1200 may add the new device to the device candidates that may be determined as operation performing devices by the device determination model 1330 by using the device information of the new device obtained from the voice assistant server 2000. In embodiments of the present disclosure, because the device determination model 1330 causes new devices to be included in the device candidates, the processor 1200 may promote device determination capabilities to perform the operation from the text determination.
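As an illustrative sketch only (not part of the disclosed embodiments, and with all class, field, and device names hypothetical), the idea of extending the device determination model's candidate set with a newly registered device could be modeled as follows:

```python
# Hypothetical sketch of maintaining device candidates for a user account.
# Field names ("id", "functions") and device ids are assumptions for illustration.

class DeviceDeterminationCandidates:
    """Keeps the set of device candidates that may be chosen as the
    operation-performing device for a user account."""

    def __init__(self):
        self.candidates = {}  # device id -> device info

    def add_candidate(self, device_info):
        # device_info is assumed to carry at least an "id" and a "functions" list
        self.candidates[device_info["id"]] = device_info

    def candidate_ids(self):
        # Return a stable, sorted view of the registered candidate ids
        return sorted(self.candidates)


hub_candidates = DeviceDeterminationCandidates()
hub_candidates.add_candidate({"id": "tv-01", "functions": ["play_content"]})

# Update data received from the voice assistant server for a newly added device:
new_device = {"id": "aircon-01", "functions": ["set_temperature"]}
hub_candidates.add_candidate(new_device)
```

Because the new device is now among the candidates, a later determination step can select it as the operation performing device, which is the capability improvement the paragraph above describes.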
An embodiment of registering a new device will be described with reference to fig. 19.
According to an embodiment, the NLG model 1320 may be used to provide response messages during interaction between the hub device 1000 and a user. For example, the processor 1200 may generate a response message such as "I will play a movie on TV" or "I will lower the set temperature of the air conditioner by 2 ℃" by using the NLG model 1320.
According to an embodiment, when there are a plurality of operation performing devices determined by the processor 1200, or a plurality of devices having similar degrees of relevance to the intention, the NLG model 1320 may store data for generating a query message for determining a specific operation performing device. In an embodiment of the present disclosure, the processor 1200 may generate a query message for selecting one operation performing device from among a plurality of device candidates by using the NLG model 1320. According to an embodiment, the query message may be a message requesting a response from the user as to which of the plurality of device candidates is to be identified (determined) as the operation performing device.
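A minimal sketch of such a disambiguation query (all function names and candidate labels are illustrative assumptions, not taken from the disclosure) might be:

```python
# Hypothetical sketch: build a query message when more than one device
# candidate matches the user's intention; return None when no query is needed.

def build_query_message(candidates):
    """candidates is a list of human-readable device names."""
    if len(candidates) < 2:
        # A single (or no) candidate needs no disambiguation query.
        return None
    listed = ", ".join(candidates[:-1]) + " or " + candidates[-1]
    return f"Which device should perform the operation: {listed}?"
```

A real NLG model would generate such a message from learned templates rather than string concatenation; the sketch only shows the input/output contract.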
According to an embodiment, the function determination device determining module 1340 may be a module for identifying a device in which a function determination model corresponding to the operation performing device is stored, and may determine a target device to which at least a portion of the text or the entire text is to be transmitted. According to an embodiment, the function determination device determining module 1340 may identify the device in which the function determination model corresponding to the operation performing device is stored, from among the hub device 1000, the voice assistant server 2000, and the operation performing device, by using the database 1360 including information about the function determination models of the devices. In an embodiment of the present disclosure, when the hub device 1000 receives a voice signal from a listening device, the function determination device determining module 1340 may identify the device in which the function determination model corresponding to the operation performing device is stored, from among the hub device 1000, the internal memory of the operation performing device, the internal memory of the listening device, and the voice assistant server 2000, by using the database 1360.
According to an embodiment, information on whether each of the plurality of devices 4000 previously registered using the user account stores a function determination model, and information (e.g., device identification information, IP address, or MAC address) on the storage location of the function determination model of each of the plurality of devices 4000, may be stored in the database 1360 in the form of a look-up table (LUT). In an embodiment of the present disclosure, the function determination device determining module 1340 may search the lookup table in the database 1360 by using the device identification information of the operation performing device output by the device determination model 1330, and may obtain information on the storage location of the function determination model corresponding to the operation performing device based on the search result.
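The look-up table described above can be sketched as a simple mapping (the device ids, location labels, and table layout here are illustrative assumptions, not the patent's data format):

```python
# Hypothetical sketch of the function-determination-model look-up table:
# for each registered device, record whether a function determination model
# exists and where it is stored (hub memory, the device itself, or the server).

FUNCTION_MODEL_LUT = {
    # device id: (model exists, storage location)
    "tv-01":      (True,  "hub"),     # e.g. a TV model kept in the hub's memory
    "speaker-01": (True,  "device"),  # model kept in the device's own memory
    "cleaner-01": (False, "server"),  # no local model anywhere
}

def lookup_function_model(device_id):
    """Return (exists, location) for the operation-performing device,
    defaulting to the voice assistant server when the device is unknown."""
    return FUNCTION_MODEL_LUT.get(device_id, (False, "server"))
```

Falling back to the server for unknown devices mirrors the disclosure's behavior of sending text to the voice assistant server when no local model can be found.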
In embodiments of the present disclosure, the function determining device determining module 1340 itself may store data for determining a target device to which the entire text or at least a portion of the entire text is to be sent. According to an embodiment, the processor 1200 may identify a device storing a function determination model corresponding to an operation performing device by using data regarding the function determination device determining module 1340.
According to embodiments, the function determination model corresponding to the operation execution device may be stored in the memory 1300 of the hub device 1000, may be stored in the memory of the operation execution device itself, or may be stored in the memory 2300 of the voice assistant server 2000.
According to an embodiment, the term "function determination model corresponding to the operation performing device" refers to a model for obtaining operation information about the detailed operations for executing a function of the determined operation performing device and the relationships between the detailed operations. In an embodiment of the present disclosure, the speaker function determination model 1352 and the TV function determination model 1354 stored in the memory 1300 of the hub device 1000 may respectively correspond to devices among the plurality of devices that are logged in by using the same account as the user account and are connected to the hub device 1000 through a network.
For example, the speaker function determination model 1352, which is a first function determination model, may be a model for obtaining operation information about detailed operations for executing a function of the first device 4100 (see fig. 1) and the relationships between the detailed operations. In an embodiment of the present disclosure, the speaker function determination model 1352 may be, but is not limited to, a model for obtaining operation information according to the functions of the hub device 1000 itself. Also, the TV function determination model 1354, which is a second function determination model, may be a model for obtaining operation information about detailed operations for executing a function of the second device 4200 (see fig. 1) and the relationships between the detailed operations.
According to an embodiment, the speaker function determination model 1352 and the TV function determination model 1354 may include second NLU models 1352a and 1354a, respectively, configured to analyze at least a portion of the text and obtain operation information on an operation to be performed by the operation performing apparatus determined based on the analysis result of the at least a portion of the text. According to an embodiment, the speaker function determination model 1352 and the TV function determination model 1354 may include action plan management modules 1352b and 1354b, respectively, configured to manage operation information related to detailed operations of the device so as to generate detailed operations to be performed by the device and an execution order of the detailed operations. According to an embodiment, the action plan management modules 1352b and 1354b may manage operation information regarding detailed operations of devices according to a relationship between the devices and the detailed operations. According to an embodiment, the action plan management modules 1352b and 1354b may plan detailed operations to be performed by the device and an execution order of the detailed operations based on the analysis result of at least a portion of the text.
According to an embodiment, a plurality of function determination models for a plurality of devices, e.g., a speaker function determination model 1352 and a TV function determination model 1354, may be stored in the memory 1300 of the hub device 1000.
As described above, according to an embodiment, the processor 1200 may check whether the function determination model corresponding to the operation performing device is stored in the memory 1300 by searching the lookup table stored in the database 1360 using data regarding the function determination device determining module 1340. For example, based on the determination that the operation performing device is the first device 4100, the function determination model corresponding to the first device 4100 may not be stored in the memory 1300 of the hub device 1000. In this case, according to an embodiment, the processor 1200 may check that the function determination model corresponding to the operation performing device is not stored in the hub device 1000. As another example, based on determining that the operation execution device is the second device 4200, the TV function determination model 1354 corresponding to the second device 4200 may be stored in the memory 1300 of the hub device 1000. In this case, according to an embodiment, the processor 1200 may check that the function determination model corresponding to the operation performing device is stored in the hub device 1000.
According to an embodiment, when the processor 1200 checks that the function determination model corresponding to the operation performing device is stored in the hub device 1000, the processor 1200 may provide at least a part of the text to the function determination model corresponding to the operation performing device stored in the hub device 1000 by using data regarding the function determination device determining module 1340. For example, when the operation performing device is the second device 4200, the TV function determination model 1354 corresponding to a TV that is the second device 4200 may be stored in the memory 1300 of the hub device 1000, and thus the processor 1200 may provide at least a portion of the text to the TV function determination model 1354 by using the function determination device determining module 1340.
In an embodiment of the present disclosure, the processor 1200 may send only a portion of the entire text, rather than the entire text, to the TV function determination model 1354. For example, when the text converted from the voice input is "Play the movie Avengers on TV", the phrase "on TV" specifies the name of the operation performing device and is thus unnecessary information for the TV function determination model 1354. According to an embodiment, the processor 1200 may recognize a word or phrase specifying the name, common name, or installation location of a device by parsing the text in units of words or phrases using the first NLU model 1332, and may provide the remaining portion of the text, excluding the recognized word or phrase, to the TV function determination model 1354.
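A toy sketch of stripping the device-naming phrase before forwarding the text (the phrase list and string matching are illustrative assumptions; the disclosure uses the first NLU model's parse, not literal substring matching):

```python
# Hypothetical sketch: remove the phrase that names the operation-performing
# device so only the command-relevant remainder is sent to its function model.

DEVICE_NAME_PHRASES = ["on TV", "in the air conditioner", "on the speaker"]

def strip_device_phrase(text):
    """Return the text with any known device-naming phrase removed."""
    for phrase in DEVICE_NAME_PHRASES:
        if phrase in text:
            stripped = text.replace(phrase, "").strip()
            return " ".join(stripped.split())  # collapse doubled spaces
    return text
```

In the example from the text above, the remainder handed to the TV function determination model would be the utterance minus "on TV".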
According to an embodiment, the processor 1200 may obtain operation information related to an operation to be performed by the operation performing apparatus by using a function determination model corresponding to the operation performing apparatus, for example, the second NLU model 1354a of the TV function determination model 1354, stored in the memory 1300. According to an embodiment, the second NLU model 1354a, which is a model dedicated to a specific device (e.g., TV), may be an AI model trained to obtain an intention related to a device corresponding to the operation performing device determined by the first NLU model 1332 and corresponding to text. Further, the second NLU model 1354a may be a model trained to determine the operation of the device related to the user's intent by interpreting text. According to an embodiment, an operation may refer to at least one action performed by a device when the device performs a particular function. An operation may refer to at least one action performed by a device when the device executes an application.
In an embodiment of the present disclosure, the processor 1200 may analyze the text by using the second NLU model 1354a of the TV function determination model 1354 corresponding to the determined operation performing device (e.g., the TV). According to an embodiment, the processor 1200 may parse the text in units of morphemes, words, or phrases by using the second NLU model 1354a, may recognize the meanings of the parsed morphemes, words, or phrases through grammatical or semantic analysis, and may determine the intention and parameters by matching the recognized meanings with predefined words. According to an embodiment, the term "parameter" as used herein refers to variable information for determining a detailed operation of the operation performing device related to the intention. For example, when the text sent to the TV function determination model 1354 is "Play the movie Avengers on TV", the intention may be "content playback", and the parameter may be "Avengers", that is, information on the content to be played.
According to an embodiment, the processor 1200 may obtain operational information about at least one detailed operation related to the intention and the parameter by using the action plan management module 1354b of the TV function determination model 1354. According to an embodiment, the action plan managing module 1354b may manage information about detailed operations of the device according to a relationship between the device and the detailed operations. According to an embodiment, the processor 1200 may plan a detailed operation to be performed by an operation performing device (e.g., a TV) and an execution order of the detailed operation based on the intention and the parameter by using the action plan management module 1354b and may obtain operation information. According to an embodiment, the operation information may be information related to a detailed operation to be performed by the device and an execution order of the detailed operation. According to an embodiment, the operation information may include information related to detailed operations to be performed by the device, a relationship between each detailed operation and another detailed operation, and an execution order of the detailed operations. The operation information may include, but is not limited to, functions performed by the operation performing apparatus to perform specific operations, execution orders of the functions, input values required to perform the functions, and output values output as a result of the execution of the functions.
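The intention-plus-parameter output of the second NLU model feeding an action-plan lookup can be sketched as follows (the plan table, operation names, and template syntax are all illustrative assumptions, not the patent's action plan management module):

```python
# Hypothetical sketch: map an intention to its ordered detailed operations,
# filling the parameter slot where a detailed operation needs it.

ACTION_PLANS = {
    # intention: detailed operations, in execution order
    "content playback": ["power_on", "search({param})", "play({param})"],
    "set temperature":  ["power_on", "set_target({param})"],
}

def plan_operations(intention, parameter):
    """Return the detailed operations, in execution order, for the intention;
    an unknown intention yields an empty plan."""
    template = ACTION_PLANS.get(intention, [])
    return [step.format(param=parameter) for step in template]
```

The ordered list corresponds to the "detailed operations and their execution order" that the operation information carries; a control command generated from it would then be sent to the operation performing device.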
According to an embodiment, the processor 1200 may generate a control command for controlling the operation performing apparatus based on the operation information. According to an embodiment, the control command may refer to an instruction readable and executable by the operation execution apparatus, so that the operation execution apparatus performs a detailed operation included in the operation information. According to an embodiment, the processor 1200 may control the communication interface 1400 to transmit the generated control command to the operation performing device.
In an embodiment of the present disclosure, the processor 1200 may obtain, through the communication interface 1400, information on the function determination models from the plurality of devices 4000, which are logged in by using the same account as the user account of the hub device 1000 and are connected to the hub device 1000 through a network. According to an embodiment, the information on the function determination models may include information on whether each of the plurality of devices 4000 stores a function determination model. Referring to fig. 1, although the first device 4100 of the plurality of devices 4000 may store the function determination model 4132 in the memory 4130, the second device 4200 and the third device 4300 may not store a function determination model. According to an embodiment, when information on whether each of the first device 4100, the second device 4200, and the third device 4300 stores a function determination model is transmitted to the hub device 1000 (S4, S6, and S8), the hub device 1000 may provide the information on the function determination models obtained through the communication interface 1400 to the processor 1200.
In an embodiment of the present disclosure, the processor 1200 may obtain information about the function determination models of the plurality of devices 4000 (see fig. 1) from the voice assistant server 2000 (see fig. 1) through the communication interface 1400. According to an embodiment, each of the plurality of devices 4000 may be registered according to a user account that the user logs in to by inputting a user id and a password, and the user account information and the device information of each of the plurality of devices 4000 may be transmitted to the IoT server 3000. In this case, information on whether each of the plurality of devices 4000 itself stores a device determination model and a function determination model may also be transmitted to the IoT server 3000. According to an embodiment, the IoT server 3000 may transmit the device information about the plurality of devices 4000 registered in the user account information, the information about the device determination models, and the information about the function determination models to the voice assistant server 2000. According to an embodiment, the voice assistant server 2000 may transmit the device information about the plurality of devices 4000, the information about the device determination models, and the information about the function determination models to the hub device 1000 having the same user account information. According to an embodiment, the hub device 1000 may store the information regarding the function determination model of each of the plurality of devices 4000 in the form of a look-up table in the database 1360.
However, the present disclosure is not limited thereto, and the hub device 1000 may receive the information on the function determination models from at least one of the plurality of devices 4000. In an embodiment of the present disclosure, the processor 1200 may control the communication interface 1400 to receive the information on the function determination model from at least one device, among the plurality of devices 4000, that stores its own function determination model. According to an embodiment, based on receiving the information on the function determination models of the plurality of devices 4000, the processor 1200 may control the communication interface 1400 to further receive information (e.g., device identification information, IP address, or MAC address) on the storage location of the function determination model of each of the plurality of devices 4000.
According to an embodiment, the processor 1200 may store the received information on the function determination model and the received information on the storage location of the function determination model of each device in the database 1360. In an embodiment of the present disclosure, the processor 1200 may store information about a storage location of the function determination model and information about whether to store the function determination model in the form of a lookup table according to the name or identification information of the device.
According to an embodiment, the processor 1200 may obtain information on the function determination model of each device by searching a lookup table stored in the database 1360, and may identify a device storing a function determination model corresponding to the operation performing device based on the obtained information on the function determination model. For example, according to an embodiment, when the operation performing device determined based on the text is the first device 4100, the processor 1200 may determine the first device 4100 itself as a device storing the function determination model 4132 (see fig. 1) based on the information on the function determination model obtained from the first device 4100. In this case, the processor 1200 may determine the first device 4100 as a target device to which at least a portion of the text is to be sent by using data regarding the function determining device determining module 1340. According to an embodiment, the processor 1200 may control the communication interface 1400 to send at least a portion of the text to the function determination model 4132 of the first device 4100.
In an embodiment of the present disclosure, the processor 1200 may control the communication interface 1400 to separate the portion of the text regarding the name of the operation performing device and transmit only the remaining portion of the text to the first device 4100. For example, when the first device 4100 is an air conditioner and the text is "Lower the set temperature by 2 ℃ in the air conditioner", the phrase "in the air conditioner" does not need to be transmitted to the air conditioner. In this case, the processor 1200 may parse the text in units of words or phrases, may identify a word or phrase that specifies the name, common name, installation location, etc. of the first device 4100, and may provide to the first device 4100 the remaining portion of the text, excluding the identified word or phrase.
According to an embodiment, when the processor 1200 identifies a device storing the function determination model corresponding to the operation performing device based on the obtained information on the function determination models, the processor 1200 may find that the function determination model is not stored in the operation performing device. For example, based on the determination that the operation performing device is the third device 4300, the function determination model may not be stored in the third device 4300. According to an embodiment, a function determination model corresponding to the third device 4300 may also not be stored in the hub device 1000. In this case, the processor 1200 may check, by using the data on the function determination device determining module 1340, that the function determination model corresponding to the operation performing device is not stored in any one of the hub device 1000 and the plurality of devices 4000. According to an embodiment, the processor 1200 may determine that the target device to which at least a portion of the text is to be sent is the voice assistant server 2000 by using the data regarding the function determination device determining module 1340. According to an embodiment, the processor 1200 may control the communication interface 1400 to send at least a portion of the text to the voice assistant server 2000.
In an embodiment of the present disclosure, the function determination device determining module 1340 may be configured to determine whether the operation performing device determined by the device determination model 1330 is the same as the listening device. In an embodiment of the present disclosure, when the processor 1200 receives a voice signal from the listening device through the communication interface 1400, the processor 1200 may also receive device identification information of the listening device (e.g., device id information of the listening device). According to an embodiment, the function determination device determining module 1340 may identify the function information of the listening device based on the device identification information of the listening device, and may determine whether the listening device and the operation performing device are the same by comparing the function information of the listening device with the function information of the operation performing device.
According to an embodiment, based on the determination that the operation performing device is the same as the listening device, the processor 1200 may determine whether the listening device stores a function determination model in its internal memory by using the data regarding the function determination device determining module 1340. According to an embodiment, when the function determination model stored in the internal memory of the listening device is identical to the function determination model corresponding to the operation performing device, the processor 1200 may transmit at least a portion of the text to the function determination model previously stored in the internal memory of the listening device by using the communication interface 1400. In an embodiment of the present disclosure, the processor 1200 may control the communication interface 1400 to separate the portion of the text regarding the name of the operation performing device and transmit only the remaining portion of the text to the listening device. For example, when the listening device is an air conditioner and the text is "Lower the set temperature to 20 ℃ in the air conditioner", the phrase "in the air conditioner" does not need to be transmitted to the air conditioner. In this case, the processor 1200 may parse the text in units of words or phrases, may identify a word or phrase that specifies the name, common name, installation location, etc. of the listening device, and may provide to the listening device the remaining portion of the text, excluding the identified word or phrase.
According to an embodiment, when the function determination model stored in the internal memory of the listening device is not identical to the function determination model corresponding to the operation performing device, the processor 1200 may transmit at least a portion of the text to the voice assistant server 2000 by using the communication interface 1400.
According to an embodiment, based on determining that the operation performing device is a separate device different from the listening device, the processor 1200 may determine whether the operation performing device stores a function determination model in its internal memory by using the data regarding the function determination device determining module 1340. According to an embodiment, when the function determination model is stored in the internal memory of the operation performing device, the processor 1200 may transmit at least a portion of the text to the function determination model previously stored in the internal memory of the operation performing device by using the communication interface 1400. In an embodiment of the present disclosure, the processor 1200 may control the communication interface 1400 to separate the portion of the text regarding the name of the operation performing device and transmit only the remaining portion of the text to the operation performing device.
According to an embodiment, when the function determination model is not stored in the internal memory of the operation execution device, the processor 1200 may transmit at least a portion of the text to the voice assistant server 2000 by using the communication interface 1400.
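The routing decision described across the preceding paragraphs can be summarized in a short sketch (device representation and return labels are illustrative assumptions; the comparison of stored models is simplified to a boolean flag):

```python
# Hypothetical sketch: decide where at least a portion of the text is sent,
# based on whether the operation-performing device is the listening device
# and whether a matching function determination model is stored locally.

def choose_text_target(op_device, listening_device):
    """Each device is a dict like {"id": ..., "has_model": bool}.
    Returns a label naming the destination for the text."""
    if op_device["id"] == listening_device["id"]:
        # Same device: use the listening device's own stored model if present.
        if listening_device["has_model"]:
            return "listening_device"
        return "voice_assistant_server"
    # Separate operation-performing device: prefer its internal model.
    if op_device["has_model"]:
        return "operation_performing_device"
    return "voice_assistant_server"
```

In all cases where no suitable local model exists, the sketch falls back to the voice assistant server, matching the behavior the disclosure describes.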
According to an embodiment, determining whether the operation performing device and the listening device are the same, and transmitting at least a portion of the text to one of the operation performing device, the listening device, and the voice assistant server 2000, will be described with reference to fig. 14.
According to an embodiment, the communication interface 1400 may perform data communication with the voice assistant server 2000, the IoT server 3000, and the plurality of devices 4000. According to an embodiment, the communication interface 1400 may perform data communication with one or more of the voice assistant server 2000, the IoT server 3000, and the plurality of devices 4000 by using at least one of data communication methods including wireless LAN, Wi-Fi, bluetooth, Zigbee, WFD, IrDA, BLE, NFC, wireless broadband internet (Wibro), World Interoperability for Microwave Access (WiMAX), Shared Wireless Access Protocol (SWAP), wireless gigabit alliance (WiGig), and Radio Frequency (RF) communication.
According to an embodiment, the database 1360 may store identification information of each of the plurality of devices 4000, information on whether a function determination model exists, and information on a storage location of the function determination model of each of the plurality of devices 4000 (e.g., stored identification information of a device, a stored IP address of a device, or a stored MAC address of a device).
Although the database 1360 is illustrated in fig. 2 as being stored in the memory 1300, the database 1360 may also be stored in a memory separate from the memory 1300.
In an embodiment (e.g., of figs. 1 and 2), the hub device 1000 may include the device determination model 1330, which is configured to determine the operation performing device by using, as device candidates, only the plurality of devices 4000 that are logged in by using the same user account information as that of the hub device 1000 and are registered according to the user account information. The function determination device determining module 1340 may be configured to identify a device storing the function determination model corresponding to the operation performing device determined by the device determination model 1330, and may determine that at least a portion of the text is to be transmitted to the identified device. Because the hub device 1000 includes some of the models included in the voice assistant server 2000, determines the operation performing device by using those models, and controls the operation of the operation performing device, the voice assistant server 2000 does not need to be involved over the network in every process; thus, network usage fees can be reduced and server operation efficiency can be improved. Further, because the operation performing device is determined by using, as device candidates, only the plurality of devices registered by using the user account of the hub device 1000, the amount of computation and the processing time can be reduced, and the response speed can be improved.
Fig. 3 is a block diagram illustrating elements of a voice assistant server 2000 according to an embodiment of the present disclosure.
According to an embodiment, the voice assistant server 2000 may be a server that receives text converted from a user's voice input from the hub device 1000, determines an operation performing device based on the received text, and obtains operation information by using a function determination model corresponding to the operation performing device.
Referring to fig. 3, the voice assistant server 2000 may include at least a communication interface 2100, a processor 2200, and a memory 2300, according to an embodiment.
According to an embodiment, the communication interface 2100 of the voice assistant server 2000 may receive device information including at least one of the following from the IoT server 3000 by performing data communication with the IoT server 3000 (see fig. 1): identification information (e.g., device id information) of each of the plurality of devices 4000 (see fig. 1), a device type of each of the plurality of devices 4000, a function execution capability, location information, and status information of each of the plurality of devices 4000. In an embodiment of the present disclosure, the voice assistant server 2000 may receive information on the function determination model of each of the plurality of devices 4000 from the IoT server 3000 through the communication interface 2100. According to an embodiment, the voice assistant server 2000 may receive user account information from the hub device 1000 through the communication interface 2100, and may transmit device information about the plurality of devices 4000 registered according to the received user account information and information about the function determination model to the hub device 1000.
According to an embodiment, the processor 2200 and the memory 2300 of the voice assistant server 2000 may perform functions that are the same as or similar to those of the processor 1200 (see fig. 2) and the memory 1300 (see fig. 2) of the hub device 1000 (see fig. 2). Accordingly, descriptions of the processor 2200 and the memory 2300 of the voice assistant server 2000 that would duplicate those already given for the processor 1200 and the memory 1300 of the hub device 1000 are not repeated here.
According to an embodiment, the memory 2300 of the voice assistant server 2000 may store one or more or all of the following: data regarding the ASR module 2310, data regarding the NLG module 2320, data regarding the device determination model 2330, and data corresponding to each of a plurality of function determination models 2340. According to an embodiment, unlike the plurality of function determination models 1350 stored in the memory 1300 of the hub device 1000 (see fig. 2), the memory 2300 of the voice assistant server 2000 may store a plurality of function determination models 2340 corresponding to a plurality of devices associated with a plurality of different user accounts. Further, according to an embodiment, the plurality of function determination models 2340 stored in the memory 2300 of the voice assistant server 2000 may cover more types of devices than the plurality of function determination models 1350 stored in the memory 1300 of the hub device 1000. The total capacity of the plurality of function determination models 2340 stored in the memory 2300 of the voice assistant server 2000 may be greater than the capacity of the plurality of function determination models 1350 stored in the memory 1300 of the hub device 1000.
According to an embodiment, when at least a portion of text is received from the hub device 1000, the communication interface 2100 of the voice assistant server 2000 may send the at least a portion of the received text to the processor 2200, and the processor 2200 may analyze the at least a portion of the text by using the first NLU model 2332 stored in the memory 2300. According to an embodiment, the processor 2200 may determine an operation performing device related to at least a portion of the text based on the analysis result by using data about the device determination model 2330 stored in the memory 2300. According to an embodiment, the processor 2200 may select a function determination model corresponding to the operation performing device from among a plurality of function determination models stored in the memory 2300, and may obtain operation information on detailed operations of performing functions of the operation performing device and relationships between the detailed operations by using the selected function determination model.
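The selection step described above can be illustrated with a minimal sketch. All names here (the keyword-based intent function, the model registry, the operation lists) are hypothetical stand-ins, not the patent's actual NLU models:

```python
# Illustrative sketch: select a function determination model by device type
# and use it to obtain operation information. All names are hypothetical.

def analyze_intent(text):
    # Stand-in for the first NLU model: map keywords to a device type.
    text = text.lower()
    if "purifier" in text or "air quality" in text:
        return "air_purifier"
    if "channel" in text or "movie" in text:
        return "tv"
    return "unknown"

# Registry playing the role of the plurality of function determination models 2340.
FUNCTION_DETERMINATION_MODELS = {
    "air_purifier": lambda text: {"operations": ["power_on", "set_mode_auto"]},
    "tv": lambda text: {"operations": ["power_on", "play_content"]},
}

def get_operation_info(text):
    device_type = analyze_intent(text)                 # device determination step
    model = FUNCTION_DETERMINATION_MODELS.get(device_type)
    if model is None:
        return None
    return model(text)                                 # function determination step
```

In this sketch the registry lookup mirrors the server choosing one model from among the plurality of stored function determination models before obtaining operation information.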
For example, based on determining that the operation performing device is the third device 4300, which is an air purifier, the processor 2200 may analyze at least a portion of the text by using the second NLU model 2346a of the function determination model 2346 corresponding to the air purifier, and may obtain operation information by using the action plan management module 2346b to plan the detailed operations to be performed by the device and the execution order of the detailed operations. The detailed description thereof is the same as that given for the hub device 1000, and thus a repetitive description is omitted.
According to an embodiment, based on the hub device 1000 transmitting at least a part of the entire text of the user's voice, the voice assistant server 2000 may determine an operation performing device related to at least a part of the entire text and may obtain operation information for performing an operation of the operation performing device. According to an embodiment, the portion of the entire text may be less than the entire text of the user's speech. According to an embodiment, voice assistant server 2000 may transmit the obtained operation information to IoT server 3000 through communication interface 2100.
According to an embodiment, the voice assistant server 2000 may receive update request information and device identification information of the function determination model from an operation performing device among the plurality of devices 4000. According to an embodiment, the update request information of the function determination model may be information for requesting synchronization of the version of the function determination model stored in the memory of the operation execution device with the version of the function determination model corresponding to the operation execution device among the plurality of function determination models 2342, 2344, 2346, and 2348 stored in the voice assistant server 2000. According to an embodiment, the processor 2200 of the voice assistant server 2000 may recognize the function determination model corresponding to the operation performing device based on the device identification information received from the operation performing device, and may check version information of the recognized function determination model. According to an embodiment, the processor 2200 may transmit update data for updating the function determination model of the operation execution apparatus to the operation execution apparatus by using the communication interface 2100.
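The synchronization decision described above reduces to comparing the device's stored model version against the server's. A minimal sketch, assuming dotted version strings (an assumption; the patent does not specify a version format):

```python
# Hypothetical version check performed when an operation performing device
# requests synchronization of its function determination model.

def needs_update(device_version, server_version):
    """Return True when the device's dotted version string is older, e.g. '1.2.0' < '1.3.1'."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(device_version) < to_tuple(server_version)
```

If `needs_update` returns true, the server would transmit the update data for the function determination model to the device.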
In an embodiment of the present disclosure, based on the operation execution device receiving the control command from the IoT server 3000, the processor 2200 may control the communication interface 2100 to transmit update data for updating the function determination model to the operation execution device.
In an embodiment of the present disclosure, the processor 2200 may control the communication interface 2100 to periodically transmit update data of the function determination model to a device storing the function determination model among the plurality of devices 4000 connected through the user account of the hub device 1000. In an embodiment of the present disclosure, when the processor 2200 transmits update data for updating an application or firmware to a device storing a function determination model among the plurality of devices 4000 connected through the user account of the hub device 1000, the processor 2200 may control the communication interface 2100 to also transmit the update data of the function determination model.
In an embodiment of the present disclosure, a new device may be registered in the user account information registered to the IoT server 3000. In this case, the processor 2200 may receive, from the IoT server 3000 by using the communication interface 2100, the user account information, a device list updated to include the new device, and storage information of the device determination model and the function determination model of each device candidate included in the device list. According to an embodiment, the processor 2200 may identify, based on the storage information of the device determination model, a device storing the device determination model from among the plurality of devices registered according to the user account, and may determine the identified device to be the hub device. According to an embodiment, the processor 2200 may transmit the device identification information of the new device and the storage information of the device determination model and the function determination model to the hub device by using the communication interface 2100. An embodiment of registering a new device is described with reference to fig. 19.
Fig. 4 is a block diagram illustrating elements of an IoT server 3000 according to an embodiment of the present disclosure.
According to an embodiment, the IoT server 3000 may be a server that obtains, stores, and manages device information regarding each of the plurality of devices 4000 (see fig. 1). According to an embodiment, the IoT server 3000 may obtain, determine, or generate a control command for controlling a device by using the stored device information. Although the IoT server 3000 is implemented as a hardware device separate from the voice assistant server 2000 in fig. 1, the present disclosure is not limited thereto. In an embodiment of the present disclosure, the IoT server 3000 may be an element of the voice assistant server 2000 (see fig. 1), or may be a server distinguished from it only in software.
According to an embodiment, referring to fig. 4, IoT server 3000 may include at least communication interface 3100, processor 3200, and memory 3300.
According to an embodiment, the IoT server 3000 may be connected to the operation execution device or the voice assistant server 2000 through the communication interface 3100 via a network, and may receive or transmit data. According to an embodiment, IoT server 3000 may transmit data stored in memory 3300 to voice assistant server 2000 or an operation execution device through communication interface 3100 under the control of processor 3200. Further, IoT server 3000 may receive data from voice assistant server 2000 or an operation execution device through communication interface 3100 under the control of processor 3200.
In an embodiment of the present disclosure, the communication interface 3100 may receive device information including at least one of device identification information (e.g., device id information), function execution capability information, location information, and status information from each of a plurality of devices 4000 (see fig. 1). In embodiments of the present disclosure, the communication interface 3100 may receive user account information from each of the plurality of devices 4000. Further, communication interface 3100 can receive information from multiple devices 4000 regarding power on/off or operations being performed. According to an embodiment, communication interface 3100 may provide the received device information to memory 3300.
According to an embodiment, the memory 3300 may store device information received through the communication interface 3100. In an embodiment of the present disclosure, the memory 3300 may classify the device information according to user account information received from the plurality of devices 4000, and may store the classified device information in the form of a lookup table.
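The lookup table keyed by user account that the memory 3300 maintains might look like the following sketch; the account, device identifiers, and fields are illustrative examples, not values from the patent:

```python
# Illustrative lookup table: device information classified by user account,
# as the IoT server's memory 3300 might store it. All values are examples.

device_table = {}

def register_device(account, device_id, info):
    device_table.setdefault(account, {})[device_id] = info

register_device("user@example.com", "dev-001",
                {"type": "air_conditioner", "state": "off", "location": "living room"})
register_device("user@example.com", "dev-002",
                {"type": "tv", "state": "on", "location": "bedroom"})

def devices_for_account(account):
    # Answers the voice assistant server's query for the devices previously
    # registered under a given user account.
    return device_table.get(account, {})
```

A query from the voice assistant server (as in the paragraph below) would then be served by a single `devices_for_account` lookup.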
In an embodiment of the present disclosure, the communication interface 3100 may receive, from the voice assistant server 2000, a query for requesting user account information and device information on a plurality of devices 4000 previously registered by using the user account information. According to an embodiment, in response to the received query, the processor 3200 may obtain device information on the plurality of devices 4000 previously registered by using the user account from the memory 3300 and may control the communication interface 3100 to transmit the obtained device information to the voice assistant server 2000.
According to an embodiment, the processor 3200 may control the communication interface 3100 to transmit a control command to an operation performing device determined to perform an operation based on operation information received from the voice assistant server 2000. According to an embodiment, the IoT server 3000 may receive an operation execution result according to the control command from the operation execution device through the communication interface 3100.
Fig. 5 is a block diagram illustrating some elements of a plurality of devices 4000 according to an embodiment of the disclosure.
According to an embodiment, the plurality of devices 4000 may be devices controlled by the hub device 1000 (see fig. 1) or the IoT server 3000 (see fig. 1). In an embodiment of the present disclosure, the plurality of devices 4000 may be executor devices that perform operations based on control commands received from the hub device 1000 or the IoT server 3000. According to an embodiment, the plurality of devices 4000 may be IoT devices.
Referring to fig. 5, the plurality of devices 4000 may include a first device 4100, a second device 4200, and a third device 4300. Although in the embodiment of fig. 5, the first device 4100 is an air conditioner, the second device 4200 is a TV, and the third device 4300 is an air purifier, this is merely an example, and the plurality of devices 4000 of the present disclosure are not limited to those shown in fig. 5.
According to an embodiment, each of the first device 4100, the second device 4200, and the third device 4300 may include a processor, a memory, and a communication interface, as shown in fig. 5. According to an embodiment, each of the plurality of devices 4000 may further include elements required to perform an operation based on a control command; for an air conditioner, the required element may be, for example, a fan.
According to an embodiment, one or more of the plurality of devices 4000 may store a function determination model. In the embodiment of fig. 5, the first device 4100 may include a communication interface 4110, a processor 4120, and a memory 4130, and the function determination model 4132 may be stored in the memory 4130. According to an embodiment, the function determination model 4132 stored in the first device 4100 may be a model for obtaining operation information on detailed operations performed by the first device 4100 and relationships between the detailed operations. According to an embodiment, the function determination model 4132 may include a first NLU model 4134 configured to analyze at least a portion of text received from the hub device 1000 or the IoT server 3000 and obtain operation information about an operation to be performed by the first device 4100 based on a result of the analysis of the at least a portion of text. According to an embodiment, the function determination model 4132 may include an action plan management module 4136 configured to manage operation information related to detailed operations of the device so as to generate detailed operations to be performed by the first device 4100 and an execution order of the detailed operations. The action plan managing module 4136 may plan a detailed operation to be performed by the first device 4100 and an execution order of the detailed operation based on the analysis result of at least a part of the text.
According to an embodiment, second device 4200 may include a communication interface 4210, a processor 4220, and a memory 4230. According to an embodiment, the third device 4300 may include a communication interface 4310, a processor 4320, and a memory 4330. According to an embodiment, unlike the first device 4100, the second device 4200 and the third device 4300 may not store the function determination model. According to an embodiment, the second device 4200 and the third device 4300 may not receive at least a portion of the text from the hub device 1000 (see fig. 1) or the IoT server 3000 (see fig. 1). According to an embodiment, the second device 4200 and the third device 4300 may receive a control command from the hub device 1000 or the IoT server 3000 and may perform an operation based on the received control command.
According to an embodiment, the plurality of devices 4000 may transmit user account information, device information, and information on the function determination model to the IoT server 3000 by using the communication interfaces 4110, 4210, and 4310. In an embodiment of the present disclosure, when a user logs in to the plurality of devices 4000, the plurality of devices 4000 may transmit, to the IoT server 3000, the user account information, information on whether each of the plurality of devices 4000 itself stores a device determination model, and information on whether each of the plurality of devices 4000 itself stores a function determination model. According to an embodiment, the device information may include at least one of identification information (e.g., device id information) of the plurality of devices 4000, a device type of each of the plurality of devices 4000, a function execution capability of each of the plurality of devices 4000, location information, and status information.
In an embodiment of the present disclosure, the plurality of devices 4000 may transmit, to the hub device 1000 (see fig. 1) by using the communication interfaces 4110, 4210, and 4310, user account information, information on whether each of the plurality of devices 4000 itself stores a device determination model, and information on whether each of the plurality of devices 4000 itself stores a function determination model.
According to an embodiment, one or more of the plurality of devices 4000 may be third-party devices manufactured by a manufacturer different from the manufacturer of the hub device 1000. For example, the third device 4300 may be a third-party device. When the third device 4300 is a third-party device, the third device 4300 may log in to a third-party IoT server by using a third-party user account, and the IoT server 3000 may access the third-party IoT server by using a temporary account and may obtain the device information and the information on the function determination model of the third device 4300. According to an embodiment, based on determining that the third device 4300 is the operation performing device, the IoT server 3000 may convert the control command into a control command that is readable and executable by the third device 4300 and may transmit the converted control command to the third-party IoT server. According to an embodiment, the third-party IoT server may transmit the converted control command to the third device 4300, and the third device 4300 may perform an operation based on the received control command.
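The command conversion for a third-party device amounts to translating a native command into the third party's schema before relaying it. A minimal sketch, assuming a hypothetical translation table (the command names and the third-party format are invented for illustration):

```python
# Hypothetical translation of a native control command into a form readable
# by a third-party device, prior to relaying it to the third-party IoT server.

NATIVE_TO_THIRD_PARTY = {
    "power_on": {"cmd": "SetPower", "value": "ON"},
    "set_mode_auto": {"cmd": "SetMode", "value": "AUTO"},
}

def convert_for_third_party(command):
    converted = NATIVE_TO_THIRD_PARTY.get(command)
    if converted is None:
        raise ValueError(f"unsupported command: {command}")
    return converted
```

The converted dictionary would then be sent to the third-party IoT server, which forwards it to the device.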
Fig. 6 is a flowchart of a method performed by the hub device 1000 to control devices based on voice input, according to an embodiment of the present disclosure.
According to an embodiment, the hub device 1000 may receive voice data (e.g., voice input of a user) in operation S610. In an embodiment of the present disclosure, the hub device 1000 may receive a voice input (e.g., an utterance of a user) from the user through the microphone 1100 (see fig. 2), and may obtain a voice signal or voice data from the received voice input. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may convert voice data (e.g., sound received through the microphone 1100) into an acoustic signal, and may obtain the voice signal by removing noise (e.g., non-voice components) from the acoustic signal.
According to an embodiment, the hub device 1000 may convert the received speech input into text by performing ASR in operation S620. In embodiments of the present disclosure, the hub device 1000 may perform ASR, which converts speech signals into computer-readable text by using a predefined model, such as an Acoustic Model (AM) or a Language Model (LM). When an acoustic signal from which noise is not removed is received, the hub device 1000 may obtain a speech signal by removing noise from the received acoustic signal, and may perform ASR on the speech signal.
According to an embodiment, the hub device 1000 may analyze the text by using the first NLU model, and may determine an operation performing device corresponding to the analyzed text by using the device determination model in operation S630. In an embodiment of the present disclosure, the hub device 1000 may analyze the text by using the first NLU model included in the device determination model, and may determine an operation performing device for performing an operation according to the user's intention from among the plurality of devices based on the analysis result of the text. According to an embodiment, the plurality of devices may refer to devices, such as IoT devices, that log in by using the same user account as the user account of the hub device 1000 and connect to the hub device 1000 through a network. The plurality of IoT devices may be devices that are registered with the IoT server by using the same user account as the user account of the hub device 1000.
According to an embodiment, the first NLU model may be a model trained to analyze text converted from a speech input and determine an operation performing device based on the analysis result. According to an embodiment, the first NLU model may be used to determine the intent by interpreting text and determine the operation performing device based on the intent. According to an embodiment, the hub device 1000 may parse text in units of morphemes, words, or phrases by using the first NLU model, and may infer meanings of words extracted from the parsed text by using linguistic features (e.g., grammatical components) of the morphemes, words, or phrases. According to an embodiment, the hub device 1000 can compare the meaning of the inferred word with the predefined intent provided by the first NLU model and can determine an intent corresponding to the inferred meaning of the word.
According to an embodiment, the hub device 1000 may determine a device related to the intention recognized from the text as the operation performing device based on a matching model for determining the relationship between the intention and the device. In the embodiment of the present disclosure, the matching model may be obtained by learning through a rule-based system, but the present disclosure is not limited thereto.
In an embodiment of the present disclosure, the hub device 1000 may obtain a plurality of numerical values indicating degrees of relationship between the intention and the plurality of devices by applying the matching model to the intention, and may determine a device having the largest value among the obtained plurality of numerical values as a final operation performing device. For example, when the intention is related to each of the first device and the second device, the hub device 1000 may obtain a first numerical value indicating a degree of relationship between the intention and the first device and a second numerical value indicating a degree of relationship between the intention and the second device, and may determine the first device having the larger numerical value of the first numerical value and the second numerical value as the operation performing device.
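The selection of the final operation performing device reduces to applying the matching model to each candidate and taking the maximum score. A sketch follows; the scoring function is a trivial placeholder, not the patent's trained matching model:

```python
# Sketch of selecting the final operation performing device: apply a matching
# model to obtain a relevance score per candidate, then take the maximum.
# The scoring function is a placeholder, not a trained model.

def match_scores(intent, candidates):
    # Placeholder: score 1.0 when the intent text names the device, else 0.1.
    return {d: (1.0 if d in intent else 0.1) for d in candidates}

def pick_operation_performing_device(intent, candidates):
    scores = match_scores(intent, candidates)
    return max(scores, key=scores.get)
```

With the example from the paragraph above, the candidate whose score is larger (the first device) would be returned.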
Although the hub device 1000 may train a matching model between the intent and the operation performing device by using, for example, a rule-based system, the present disclosure is not limited thereto. The AI model used by the hub device 1000 may include, for example, a neural network based system (e.g., CNN or RNN), SVM, linear regression, logistic regression, naive bayes, random forests, decision trees, or k-nearest neighbor algorithms. Alternatively, the AI model may be a combination of the above examples or any other AI model.
According to an embodiment, the device determination model 1330 (see fig. 2) stored in the memory 1300 (see fig. 2) of the hub device 1000 may determine an operation performing device from among a plurality of devices registered according to the user account of the hub device 1000. In an embodiment of the present disclosure, the device determination model may analyze a numerical value indicating a degree of relationship between the intention and a plurality of devices previously registered by logging in using the same user account as that of the hub device 1000, and may determine a device having a maximum value among the numerical values indicating the degree of relationship between the intention and the plurality of devices as the operation performing device.
In an embodiment of the present disclosure, the hub device 1000 may receive device information about each of a plurality of devices previously registered according to a user account from the voice assistant server 2000 (see fig. 1). The device information may include, for example, at least one of identification information (e.g., device id information) of each of the plurality of devices, a device type of each of the plurality of devices, a function execution capability of each of the plurality of devices, location information, and status information. The hub device 1000 may determine a device for performing an operation according to an intention from among a plurality of devices based on the received device information.
According to an embodiment, the hub device 1000 may identify a device storing a function determination model corresponding to an operation performing device determined from among a plurality of devices in operation S640.
According to embodiments, the function determination model corresponding to the operation execution device may be stored in the memory of the hub device 1000, may be stored in the memory of the operation execution device itself, or may be stored in the memory of the voice assistant server 2000 (see fig. 1). According to the embodiment, the term "function determination model corresponding to the operation execution apparatus" may refer to a model for obtaining detailed operations related to the function execution operations determined according to the operation execution apparatus and operation information of a relationship between the detailed operations.
In an embodiment of the present disclosure, the hub device 1000 may identify a device storing the function determination model from the memory of the hub device 1000, the memory of the voice assistant server 2000, or the memory of the operation execution device itself by using information on whether the function determination model of each of the plurality of devices is stored in the database 1360 (see fig. 2) and information on a storage location of the function determination model corresponding to each of the plurality of devices. According to an embodiment, the hub device 1000 may obtain information on the function determination model from at least one device, which stores the function determination model for determining the function related to each of the plurality of devices, among the plurality of devices, and may store the obtained information on the function determination model in the database 1360. According to an embodiment, the hub device 1000 may also receive information (e.g., device identification information, IP address, or MAC address) about a storage location of the function determination model of each of the plurality of devices, based on receiving the information about the function determination models of the plurality of devices. According to an embodiment, information on whether to store the function determination model of each of the plurality of devices previously registered according to the user account and information on a storage location of the function determination model of each of the plurality of devices may be stored in the database 1360 in the form of a lookup table.
In an embodiment of the present disclosure, the hub device 1000 may obtain the device identification information of the operation performing device determined in operation S630, may search the lookup table in the database 1360 according to the device identification information, and may obtain information on the storage location of the function determination model corresponding to the operation performing device based on the search result of the lookup table. By using the above-described method, the hub device 1000 may identify, from among the memory of the hub device 1000, the memory of the voice assistant server 2000, and the memory of the operation performing device itself, the location storing the function determination model corresponding to the operation performing device.
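The lookup table consultation described above can be sketched as a simple keyed lookup; the device identifiers and location labels below are illustrative assumptions mirroring the database 1360:

```python
# Hypothetical lookup table mapping device identification information to the
# storage location of its function determination model (database 1360 analogue).

MODEL_LOCATIONS = {
    "dev-001": {"has_model": True, "location": "hub"},
    "dev-002": {"has_model": True, "location": "voice_assistant_server"},
    "dev-003": {"has_model": True, "location": "device"},
}

def resolve_model_location(device_id):
    # Returns where the function determination model is stored, or None when
    # no model is registered for the given device identification information.
    entry = MODEL_LOCATIONS.get(device_id)
    if entry is None or not entry["has_model"]:
        return None
    return entry["location"]
```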
According to an embodiment, the hub device 1000 may provide at least a portion of the text to the identified device in operation S650. For example, when it is checked in operation S640 that a function determination model corresponding to the operation performing device is stored in the hub device 1000, the hub device 1000 may provide at least a portion of the text to the function determination model corresponding to the operation performing device.
For example, when it is checked that the function determination model corresponding to the operation performing device is stored in the voice assistant server 2000 in operation S640, the hub device 1000 may transmit at least a portion of the text to the voice assistant server 2000.
For example, when it is checked in operation S640 that the function determination model corresponding to the operation performing device is stored in the memory of the operation performing device itself, the hub device 1000 may transmit at least a part of the text to the operation performing device.
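The three cases above amount to routing at least a portion of the text by storage location. A minimal sketch, with a callable standing in for the communication interface (all names are illustrative):

```python
# Illustrative routing of at least a portion of the text based on where the
# function determination model is stored: the hub itself, the voice assistant
# server, or the operation performing device.

def route_text(location, text, send):
    # `send` is a callable standing in for the communication interface 1400.
    if location == "hub":
        return ("local", text)            # handled by the hub's own model
    if location in ("voice_assistant_server", "device"):
        send(location, text)              # transmit over the network
        return ("remote", location)
    raise ValueError(f"unknown storage location: {location}")
```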
In embodiments of the present disclosure, the hub device 1000 may provide only a portion of the text, rather than the entire text, to the identified device. For example, when the text converted from the voice input is "movie revenge on TV", the phrase "on TV" specifies the name of the operation performing device, and thus may be unnecessary information for the TV function determination model 1354. According to an embodiment, the hub device 1000 may parse the text in units of words or phrases by using the first NLU model 1332 (see fig. 2), may recognize a word or phrase specifying a name, a common name, or an installation location of a device, and may provide the remaining portion of the text, excluding the recognized word or phrase, to the identified device.
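The removal of the device-designating phrase can be sketched as below; the phrase list is an invented example, whereas the patent performs this with the first NLU model rather than fixed patterns:

```python
# Sketch of removing the word or phrase that designates the device before
# forwarding the remainder of the text to the function determination model.
# The phrase list is illustrative; the patent uses the first NLU model.
import re

DEVICE_PHRASES = ["on the tv", "on tv", "on the air purifier"]

def strip_device_phrase(text):
    result = text
    for phrase in DEVICE_PHRASES:
        result = re.sub(re.escape(phrase), "", result, flags=re.IGNORECASE)
    # Collapse any doubled whitespace left behind by the removal.
    return re.sub(r"\s+", " ", result).strip()
```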
Fig. 7 is a flowchart illustrating a method performed by the hub device 1000 to provide at least a portion of text to one of the hub device 1000, the voice assistant server 2000, and the operation performing device 4000a according to a voice input of a user according to an embodiment of the present disclosure. Fig. 7 illustrates a detailed embodiment of operations S640 and S650 of fig. 6. Operations S710 to S730 of fig. 7 are detailed embodiments of operation S640, and operations S740 to S760 are detailed embodiments of operation S650 of fig. 6.
According to an embodiment, operation S710 may be performed after operation S630 of fig. 6. In operation S710, according to an embodiment, the hub device 1000 may obtain information on the function determination model from at least one device storing the function determination model among the plurality of devices. In an embodiment of the present disclosure, the hub device 1000 may receive information on the function determination model from at least one device storing the function determination model for determining a function related to each of the plurality of devices by using the communication interface 1400 (see fig. 2). According to an embodiment, the term "information on the function determination model" may refer to information on whether each of the plurality of devices itself stores the function determination model for obtaining operation information on detailed operations of performing operations according to functions and relationships between the detailed operations. In an embodiment of the present disclosure, when information on the function determination model is received from the plurality of devices, the hub device 1000 may also obtain information (e.g., device identification information, IP address, or MAC address) on the storage location of the function determination model of each of the plurality of devices.
According to an embodiment, the hub device 1000 may store the received information on the function determination model and the information on the storage location of the function determination model of each device in the database 1360 (see fig. 2). In an embodiment of the present disclosure, the hub device 1000 may store information on whether to store the function determination model and information on a storage location of the function determination model in the database 1360 in the form of a lookup table according to the name or identification information of the device.
According to an embodiment, the hub device 1000 may check whether a function determination model corresponding to the determined operation performing device is stored in the memory of the hub device 1000 in operation S720. According to the embodiment, the term "function determination model corresponding to the operation execution device" may be a model for obtaining detailed operations regarding the function execution operations according to the determination of the operation execution device and operation information of the relationship between the detailed operations.
According to embodiments, the function determination model corresponding to the operation execution device may be stored in the memory 1300 (see fig. 2) of the hub device 1000, may be stored in the memory of the operation execution device itself, or may be stored in the voice assistant server 2000.
In an embodiment of the present disclosure, the hub device 1000 may obtain information on the function determination model of each device by searching a lookup table stored in the database 1360. In an embodiment of the present disclosure, the hub device 1000 may check whether the function determination model corresponding to the operation performing device is stored in the memory 1300 of the hub device 1000 based on the obtained information on the function determination model. In another embodiment of the present disclosure, the hub device 1000 may determine whether a function determination model corresponding to the operation performing device is stored in the memory 1300 by accessing the memory 1300 and scanning data stored in the memory 1300.
A plurality of function determination models for a plurality of devices may be stored in the memory 1300 of the hub device 1000. In an embodiment of the present disclosure, a plurality of function determination models corresponding to a plurality of devices that are logged in by using the same user account and connected to the hub device 1000 through a network may be stored in the memory 1300 of the hub device 1000.
According to an embodiment, in operation S730, based on the determination in operation S720 that the function determination model corresponding to the operation performing device does not exist in the memory 1300 of the hub device 1000, the hub device 1000 may check whether the function determination model corresponding to the operation performing device is stored in the memory of the operation performing device based on the information on the function determination model. In an embodiment of the present disclosure, the hub device 1000 may check whether the function determination model corresponding to the operation performing device is stored in the memory of the operation performing device itself by searching the lookup table stored in the database 1360. For example, when the operation performing device determined based on text is the first device, the hub device 1000 may check information on whether the first device itself stores the function determination model by searching the lookup table.
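The check order of operations S720, S730, and S740 (hub memory first, then the device's own memory, otherwise fall back to the voice assistant server) can be sketched as follows. The function name and table layout are assumptions for illustration only.

```python
def resolve_model_location(device_id, hub_models, lookup_table):
    """Mirror the S720/S730/S740 decision: find where the function
    determination model for the operation performing device is stored."""
    if device_id in hub_models:              # S720: stored in the hub's memory
        return "hub"
    entry = lookup_table.get(device_id, {})
    if entry.get("stores_model"):            # S730: stored in the device itself
        return "device"
    return "voice_assistant_server"          # S740: fall back to the server

# Usage: the hub holds only a speaker model; the table says the TV stores
# its own model, and the air conditioner stores none.
hub_models = {"speaker-01"}
table = {"tv-01": {"stores_model": True}, "ac-01": {"stores_model": False}}
```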
In an embodiment of the present disclosure, the function determination model may not be stored in the memory of the first device determined as the operation performing device. According to an embodiment, in operation S740, when the hub device 1000 determines in operation S730, based on the information on the function determination model, that the function determination model is not stored in the operation performing device, the hub device 1000 transmits at least a portion of the text to the voice assistant server. In an embodiment of the present disclosure, the hub device 1000 may transmit at least a portion of the text to the function determination model corresponding to the operation performing device, which is stored in the voice assistant server. In this case, the hub device 1000 may also transmit device identification information of the operation performing device together with at least a part of the text.
According to an embodiment, in operation S750, when the hub device 1000 checks in operation S720 that the function determination model corresponding to the operation performing device is stored in the memory of the hub device 1000, the hub device 1000 may provide at least a portion of the text to the previously stored function determination model. In an embodiment of the present disclosure, the hub device 1000 may select a function determination model corresponding to the operation performing device from among a plurality of function determination models previously stored in the memory 1300 (see fig. 2), and may provide at least a part of the text to the selected function determination model. For example, when the operation performing device is a second device, the processor 1200 of the hub device 1000 may select a second function determination model corresponding to the second device from among the first function determination model and the second function determination model previously stored in the memory 1300, and may provide at least a part of the text to the second function determination model.
According to an embodiment, in operation S760, when the hub device 1000 checks in operation S730 that the function determination model corresponding to the operation performing device is stored in the memory of the operation performing device itself, the hub device 1000 may transmit at least a portion of the text to the function determination model stored in the operation performing device.
According to an embodiment, the hub device 1000 may transmit only at least a portion of the text, not the entire text, in operations S740, S750, and S760. In an embodiment of the present disclosure, the processor 1200 of the hub device 1000 may parse the text in units of words or phrases by using the first NLU model, may recognize a word or phrase specifying a name, a common name, or an installation location of the device, and may transmit the remaining part of the text, excluding the recognized word or phrase, to the function determination model of the operation performing device.
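The pruning step above, in which the word or phrase naming the device is recognized and only the remainder of the text is forwarded, can be approximated with plain string matching. The real first NLU model 1332 is a trained model; the fixed phrase list and example utterances below are stand-in assumptions.

```python
# Phrases naming a device; in the disclosure these are recognized by the
# first NLU model, approximated here by a fixed list.
DEVICE_PHRASES = ("on the TV", "on TV", "on the air conditioner")

def strip_device_phrase(text):
    """Remove the word or phrase specifying the device name so that only
    the remaining part of the text is sent onward."""
    for phrase in DEVICE_PHRASES:
        lowered = text.lower()
        if phrase.lower() in lowered:
            idx = lowered.index(phrase.lower())
            text = text[:idx] + text[idx + len(phrase):]
    return " ".join(text.split())
```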
Fig. 8 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the operation performing device 4000a according to an embodiment of the present disclosure.
FIG. 8 is a flowchart of the operations of the entities in the multi-device system environment, namely the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the operation performing device 4000a, following operation S740 of FIG. 7. Operation S740 of FIG. 7 indicates a state in which it is checked that the function determination model corresponding to the operation performing device 4000a is stored neither in the memory of the hub device 1000 nor in the memory of the operation performing device 4000a itself.
Referring to fig. 8, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, and a function determination device determination module 1340. In the embodiment of fig. 8, the hub device 1000 may not store any function determination model, or may not store the function determination model corresponding to the operation performing device.
According to an embodiment, the voice assistant server 2000 may store a plurality of function determination models 2342, 2344, and 2346. For example, the function determination model 2342, which is the first function determination model and stored in the voice assistant server 2000, may be a model for determining the function of an air conditioner and obtaining operation information about detailed operations related to the determined function and relationships between the detailed operations. For example, the function determination model 2344, which is a second function determination model, may be a model for determining a function of the TV and obtaining operation information on detailed operations related to the determined function and a relationship between the detailed operations, and the function determination model 2346 may be a model for determining a function of the washing machine and obtaining operation information on detailed operations related to the determined function and a relationship between the detailed operations.
In the embodiment of fig. 8, it may be determined that the operation performing apparatus 4000a is "TV", and the function determination model 2344 corresponding to the TV may be stored in the voice assistant server 2000.
In operation S810, the hub device 1000 transmits at least a portion of the text and identification information of the operation performing device 4000a to the voice assistant server 2000. The hub device 1000 may transmit at least a portion of the text converted from the speech input to the voice assistant server 2000 without transmitting the entire text. In an embodiment of the present disclosure, based on the determination that the operation performing device 4000a is the "TV", in the text corresponding to "Play the movie Avengers on the TV", the phrase "on the TV" specifies the name or common name of the operation performing device 4000a and thus may be unnecessary information. Further, because the hub device 1000 transmits the identification information (e.g., a device ID) of the operation performing device 4000a to the voice assistant server 2000, "on the TV" is unnecessary information. The processor 1200 (see fig. 2) of the hub device 1000 may recognize the word or phrase specifying the name, common name, or installation location of the device by parsing the text in units of words or phrases using the first NLU model 1332, and may transmit the remaining part of the text, excluding the recognized word or phrase, to the voice assistant server 2000.
In operation S810, the hub device 1000 may transmit user account information of the hub device 1000 and the operation performing device 4000a to the voice assistant server 2000 together with at least a portion of the text and the identification information of the operation performing device 4000a.
According to an embodiment, the voice assistant server 2000 selects a function determination model corresponding to the operation execution apparatus 4000a in operation S820. In the embodiment of the present disclosure, the voice assistant server 2000 may recognize the operation execution device 4000a based on the identification information of the operation execution device 4000a received from the hub device 1000, and may select a function determination model corresponding to the operation execution device 4000a from among the plurality of function determination models 2342, 2344, and 2346. The term "function determination model corresponding to the operation execution apparatus" refers to a model for obtaining operation information on detailed operations for executing the determined function of the operation execution apparatus 4000a and the relationship between the detailed operations. In the embodiment of fig. 8, when the operation performing apparatus 4000a is a TV, the voice assistant server 2000 may select the function determination model 2344 for obtaining operation information on detailed operations of performing operations according to functions of the TV, for example, functions of playing a movie, and relationships between the detailed operations, from among the plurality of function determination models 2342, 2344, and 2346 stored in the memory.
In operation S830, the voice assistant server 2000 interprets the text by using the NLU model of the selected function determination model and determines the intention based on the interpretation result. The voice assistant server 2000 may analyze at least a portion of the text received from the hub device 1000 by using the NLU model 2344a of the function determination model 2344. The NLU model 2344a, which is an AI model trained to interpret text related to a particular device, may be a model trained to determine the intention and parameters related to the operation intended by the user. The NLU model 2344a may also be a model trained to determine a function related to the type of the particular device when text is input.
In an embodiment of the present disclosure, the voice assistant server 2000 may parse at least a portion of the text in units of words or phrases by using the NLU model 2344a, may infer the meanings of words extracted from the parsed text by using linguistic features (e.g., syntactic elements) of the parsed morphemes, words, or phrases, and may obtain an intention and parameters from the text by matching the inferred meanings with predefined intentions and parameters. The intention, which is information indicating the user's utterance intention included in the text, may be used to determine an operation to be performed by the operation performing device 4000a. The parameter refers to variable information for determining a detailed operation of the operation performing device 4000a related to the intention. The parameter is information corresponding to the intention, and a plurality of types of parameters may correspond to one intention. For example, when the text is "Play the movie Avengers on the TV", the intention may be "video content playback", and the parameter may be "movie" or "the movie Avengers".
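The intention/parameter matching just described can be mimicked with a toy rule-based matcher. The real NLU model 2344a is a trained AI model; the keyword rule and return format below are illustrative assumptions only.

```python
def determine_intent(text):
    """Toy stand-in for an NLU model: map keywords in the text to a
    predefined intention and extract the trailing words as the parameter."""
    words = text.split()
    lowered = [w.lower() for w in words]
    if "play" in lowered and "movie" in lowered:
        # Treat everything after the word "movie" as the title parameter.
        title = " ".join(words[lowered.index("movie") + 1:])
        return "video content playback", {"movie": title}
    return "unknown", {}
```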
In embodiments of the present disclosure, the voice assistant server 2000 may determine the intent from only at least a portion of the text.
In operation S840, the voice assistant server 2000 obtains operation information about an operation to be performed by the operation performing device 4000a based on the intention. In an embodiment of the present disclosure, the voice assistant server 2000 plans the operations to be performed by the operation performing device 4000a based on the intention and the parameters by using the action plan management module 2344b of the function determination model 2344. The action plan management module 2344b may interpret the operation to be performed by the operation performing device 4000a based on the intention and the parameters. The action plan management module 2344b may select detailed operations related to the interpreted operation from among previously stored operations of the device, and may plan an execution order of the selected detailed operations. The action plan management module 2344b may obtain operation information about the detailed operations to be performed by the operation performing device 4000a by using the planning result. The term "operation information" may refer to information related to the detailed operations to be performed by the device, the relationship between the detailed operations, and the execution order of the detailed operations. The operation information may include, but is not limited to, functions to be executed by the operation performing device 4000a to perform the detailed operations, an execution order of the functions, input values required to execute the functions, and output values output as a result of executing the functions.
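The planning step of operation S840 (select detailed operations for the intention and order them) might look like the following sketch; the operation catalog for the TV is invented for illustration and is not the actual output of the action plan management module 2344b.

```python
# Hypothetical operation catalog for a TV; the real action plan management
# module plans from previously stored operations of the device.
TV_OPERATIONS = {
    "video content playback": [
        {"function": "power_on"},
        {"function": "launch_app"},
        {"function": "search_content"},
        {"function": "play"},
    ],
}

def plan_operations(intent, parameters):
    """Return operation information: the detailed operations for the
    intention together with their execution order (cf. operation S840)."""
    steps = TV_OPERATIONS.get(intent, [])
    return [{"order": i + 1, **step, "parameters": parameters}
            for i, step in enumerate(steps)]
```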
In operation S850, the voice assistant server 2000 transmits the obtained operation information and the identification information of the operation performing device 4000a to the IoT server 3000.
In operation S860, the IoT server 3000 obtains a control command based on the identification information of the operation execution device 4000a and the received operation information. IoT server 3000 may include a database in which control commands and operational information for multiple devices are stored. In the embodiment of the present disclosure, the IoT server 3000 may select a control command for controlling a detailed operation of the operation execution device 4000a from among control commands related to a plurality of devices previously stored in the database based on the identification information of the operation execution device 4000 a.
According to an embodiment, the control command, which is information readable and executable by the device, may include instructions for sequentially performing detailed operations in an execution order according to the operation information when the device performs the function.
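A control command in this sense (device-readable instructions that perform the detailed operations in their planned execution order) could be assembled from the operation information as in this sketch; the dictionary layout is a hypothetical format, not the disclosure's actual command format.

```python
def build_control_command(device_id, operation_info):
    """Assemble a device-readable control command: the detailed operations
    listed in their execution order, addressed by device identification."""
    ordered = sorted(operation_info, key=lambda step: step["order"])
    return {
        "target": device_id,
        "instructions": [step["function"] for step in ordered],
    }
```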
According to an embodiment, in operation S870, the IoT server 3000 may transmit a control command to the operation execution device 4000a by using the identification information of the operation execution device 4000 a.
In operation S880, the operation execution apparatus 4000a may execute an operation corresponding to the control command according to the received control command. For example, the operation execution apparatus 4000a may play a movie based on the control command.
In an embodiment of the present disclosure, after performing an operation, the operation performing apparatus 4000a may transmit information about the operation performing result to the IoT server 3000.
Fig. 9 is a flowchart illustrating a method of operating the hub device 1000 and the operation performing device 4000a according to an embodiment of the present disclosure.
FIG. 9 is a flowchart of the operations of the entities in the multi-device system including the hub device 1000 and the operation performing device 4000a, following operation S760 of FIG. 7. Operation S760 of FIG. 7 indicates a state in which it is checked that the function determination model corresponding to the operation performing device 4000a is not stored in the memory of the hub device 1000 but is stored in the memory of the operation performing device 4000a itself. Although the voice assistant server 2000 and the IoT server 3000 are not shown in FIG. 9, this does not mean that the multi-device system does not include the voice assistant server 2000 and the IoT server 3000.
Referring to fig. 9, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, and a function determination device determination module 1340. In the embodiment of fig. 9, the hub device 1000 may not store any function determination model, or may not store the function determination model corresponding to the operation performing device.
The operation performing device 4000a may itself store the function determination model 4032 of the operation performing device 4000a. That is, the function determination model 4032, which allows the operation performing device 4000a itself to analyze text and to perform an operation according to the user's intention based on the analysis result of the text, may be stored in the memory of the operation performing device 4000a. For example, when the operation performing device 4000a is an "air conditioner", the function determination model 4032 stored in the memory of the operation performing device 4000a may be a model for determining a function of the air conditioner and obtaining operation information about detailed operations related to the determined function and the relationship between the detailed operations.
In operation S910, the hub device 1000 transmits at least a part of the text and a text transmission notification signal to the function determination model 4032 of the operation performing device 4000a. The hub device 1000 may transmit at least a part of the text converted from the voice input to the operation performing device 4000a without transmitting the entire text. For example, in the text corresponding to "Lower the temperature by 2 ℃ on the air conditioner", the phrase "on the air conditioner" specifies the name or common name of the operation performing device 4000a and thus may be unnecessary information. Further, because the operation performing device 4000a itself is the air conditioner, "on the air conditioner" is unnecessary information. The processor 1200 (see fig. 2) of the hub device 1000 may recognize the word or phrase specifying the name, common name, or installation location of the device by parsing the text in units of words or phrases using the first NLU model 1332, and may transmit the remaining part of the text, excluding the recognized word or phrase, to the operation performing device 4000a.
In an embodiment of the present disclosure, the hub device 1000 may transmit a text transmission notification signal to the operation performing device 4000a. The text transmission notification signal is a signal notifying that text is being transmitted to the operation performing device 4000a. When the operation performing device 4000a receives the text transmission notification signal, the ASR operation, the operation of determining the operation performing device, and the operation of selecting the function determination model may be omitted, and the operation performing device 4000a may directly provide at least a part of the text to the function determination model 4032 and may determine the intention from the text by using the function determination model 4032.
However, the present disclosure is not limited thereto, and the hub device 1000 may not transmit the text transmission notification signal to the operation performing device 4000a in operation S910. In this case, when the operation execution apparatus 4000a receives at least a part of the text from the hub apparatus 1000, the operation execution apparatus 4000a may be preset to supply at least a part of the received text to the function determination model 4032. In an embodiment of the present disclosure, data regarding policies that are set to recognize received text and provide the text to the function determination model 4032 may be stored in a memory of the operation execution apparatus 4000 a.
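The two paths just described (an explicit text transmission notification signal, or a preset policy under which the device always forwards received text to its function determination model) can be sketched as a small dispatch; the function name and flags here are assumptions for illustration.

```python
def handle_incoming_text(text, has_notification_signal, policy_forward=True):
    """On the operation performing device: skip ASR, device determination,
    and model selection when the notification signal accompanies the text,
    or when a stored policy says to forward received text directly."""
    if has_notification_signal or policy_forward:
        return ("function_determination_model", text)
    return ("full_pipeline", text)
```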
According to an embodiment, in operation S920, the operation performing apparatus 4000a may determine the intention by interpreting the text using the NLU model 4034. In an embodiment of the present disclosure, the operation performing apparatus 4000a may analyze at least a part of the text received from the hub apparatus 1000 by using the NLU model 4034 of the function determination model 4032. The NLU model 4034, which is an AI model trained to interpret text related to the operation execution apparatus 4000a, may be a model trained to determine intentions and parameters related to an operation intended by the user. NLU model 4034 may be a model trained to determine functionality related to the type of particular device when text is entered.
In an embodiment of the present disclosure, the operation performing device 4000a may parse at least a portion of the text in units of words or phrases by using the NLU model 4034, may infer the meanings of words extracted from the parsed text by using linguistic features (e.g., syntactic elements) of the parsed morphemes, words, or phrases, and may obtain an intention and parameters from the text by matching the inferred meanings with predefined intentions and parameters. The intention, which is information indicating the user's utterance intention included in the text, may be used to determine an operation to be performed by the operation performing device 4000a. The parameter refers to variable information for determining a detailed operation of the operation performing device 4000a related to the intention. The parameter is information corresponding to the intention, and a plurality of types of parameters may correspond to one intention. For example, when the text is "Lower the temperature by 2 ℃ on the air conditioner", the intention may be "set-temperature control" or "set-temperature lowering", and the parameter may be "2 ℃".
In an embodiment of the present disclosure, the operation performing apparatus 4000a may determine the intention only from at least a part of the text.
According to an embodiment, in operation S930, the operation performing device 4000a obtains operation information about an operation to be performed by the operation performing device 4000a based on the intention. In an embodiment of the present disclosure, the operation performing device 4000a plans the operations to be performed by the operation performing device 4000a based on the intention and the parameters by using the action plan management module 4036 of the function determination model 4032. The action plan management module 4036 may interpret the operation to be performed by the operation performing device 4000a based on the intention and the parameters. The action plan management module 4036 may select detailed operations related to the interpreted operation from among previously stored operations of the device, and may plan an execution order of the selected detailed operations. The action plan management module 4036 may obtain operation information about the detailed operations to be performed by the operation performing device 4000a by using the planning result. The term "operation information" may refer to information related to the detailed operations to be performed by the device, the relationship between the detailed operations, and the execution order of the detailed operations. The operation information may include, but is not limited to, functions to be executed by the operation performing device 4000a to perform the detailed operations, an execution order of the functions, input values required to execute the functions, and output values output as a result of executing the functions.
According to an embodiment, the operation execution apparatus 4000a obtains a control command based on the operation information in operation S940. In the embodiment of the present disclosure, a plurality of control commands respectively corresponding to a plurality of different pieces of operation information may be stored in the memory of the operation execution apparatus 4000 a. In the embodiment of the present disclosure, the operation performing apparatus 4000a may select a control command that controls a detailed operation according to the operation information from among a plurality of control commands previously stored in the memory. The control command, which is information readable and executable by the operation execution apparatus 4000a, may include instructions for sequentially performing detailed operations according to the operation information in an execution order when the operation execution apparatus 4000a performs a function.
According to an embodiment, the operation execution apparatus 4000a may execute an operation corresponding to the control command according to the control command in operation S950. For example, the operation performing apparatus 4000a may lower the set temperature by 2 ℃ based on the control command.
Fig. 10 is a flowchart illustrating a method of operating the hub device 1000 and the operation performing device 4000a according to an embodiment of the present disclosure.
FIG. 10 is a flowchart of the operations of the entities in the multi-device system environment, namely the hub device 1000 and the operation performing device 4000a, following operation S750 of FIG. 7. Operation S750 of FIG. 7 indicates a state in which it is checked that the function determination model corresponding to the operation performing device 4000a is stored in the memory of the hub device 1000. Although the voice assistant server 2000 and the IoT server 3000 are not shown in FIG. 10, this does not mean that the multi-device system does not include the voice assistant server 2000 and the IoT server 3000.
Referring to fig. 10, according to an embodiment, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, a function determination device determination module 1340, a speaker function determination model 1352, and a TV function determination model 1354. In the embodiment of fig. 10, the hub device 1000 may store a function determination model corresponding to the operation execution device 4000 a. For example, when the operation performing apparatus 4000a is a TV, the hub apparatus 1000 may store a TV function determination model 1354 corresponding to the TV in the memory. The TV function determination model 1354 may be a model for determining a function of the TV and obtaining operation information on detailed operations related to the determined function and a relationship between the detailed operations.
According to an embodiment, the hub device 1000 may select a function determination model corresponding to the operation performing device 4000a from among the speaker function determination model 1352 and the TV function determination model 1354 previously stored in the hub device 1000 in operation S1010. In an embodiment of the present disclosure, based on the determination that the operation performing device 4000a is the TV, the hub device 1000 may select the TV function determination model 1354 corresponding to the TV from among the speaker function determination model 1352 and the TV function determination model 1354.
In operation S1020, the hub device 1000 provides at least a portion of the text to the selected function determination model. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may provide at least a portion of the text to the TV function determination model 1354 stored in the memory. In this case, the processor 1200 may provide at least a portion of the text to the TV function determination model 1354 without providing the entire text. For example, based on the determination that the operation performing device 4000a is the "TV", in the text corresponding to "Play the movie Avengers on the TV", the phrase "on the TV" specifies the name or common name of the operation performing device 4000a and thus may be unnecessary information. Further, because the TV function determination model 1354 is the function determination model corresponding to the TV, "on the TV" is unnecessary information for the TV function determination model 1354. The processor 1200 may recognize the word or phrase specifying the name, common name, or installation location of the device by parsing the text in units of words or phrases using the first NLU model 1332, and may provide the remaining part of the text, excluding the recognized word or phrase, to the TV function determination model 1354.
In operation S1030, the hub device 1000 determines an intention by interpreting at least a portion of the text using the second NLU model of the selected function determination model. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may analyze at least a portion of the text by using the second NLU model 1354a of the TV function determination model 1354. The second NLU model 1354a, which is a model specific to a specific device, may be an AI model trained to obtain an intention related to a device corresponding to the operation execution device 4000a determined by the first NLU model 1332 and corresponding to at least a part of text. Further, the second NLU model 1354a may be a model trained to determine the operation of the device related to the user's intent by interpreting text.
According to an embodiment, the processor 1200 may parse the text in units of morphemes, words, or phrases by using the second NLU model 1354a, may recognize the meanings of the parsed morphemes, words, or phrases through syntactic and semantic analysis, and may determine the intention and parameters by matching the recognized meanings with predefined words. The term "parameter" as used herein refers to variable information for determining a detailed operation of a target device related to the intention. Descriptions of the intention and the parameters that are the same as those given with reference to fig. 7 are not repeated here.
In embodiments of the present disclosure, processor 1200 may determine intent from only at least a portion of the text by using second NLU model 1354 a.
In operation S1040, the hub device 1000 obtains operation information about an operation to be performed by the operation performing device 4000a based on the intention. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may obtain operation information regarding at least one detailed operation related to the intention and the parameter by using the action plan management module 1354b of the TV function determination model 1354. The action plan management module 1354b may manage information on detailed operations of the operation performing apparatus 4000a and relationships between the detailed operations. The processor 1200 of the hub device 1000 may plan a detailed operation to be performed by the operation performing device 4000a and an execution order of the detailed operation based on the intention and the parameter by using the action plan management module 1354b, and may obtain operation information.
The operation information may be information related to detailed operations to be performed by the operation performing apparatus 4000a and an execution order of the detailed operations. The operation information may include information related to detailed operations to be performed by the operation performing apparatus 4000a, a relationship between each detailed operation and another detailed operation, and an execution order of the detailed operations. The operation information may include, but is not limited to, a function executed by the operation execution apparatus 4000a to perform a specific operation, an execution order of the function, an input value required to execute the function, and an output value output as a result of the execution of the function.
According to an embodiment, the hub device 1000 generates a control command based on the operation information in operation S1050. The control command refers to an instruction readable and executable by the operation execution apparatus 4000a, so that the operation execution apparatus 4000a performs a detailed operation included in the operation information.
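The relationship between the operation information obtained in S1040 and the control command generated from it in S1050 can be sketched as below. The data shapes, operation names, and JSON command format are assumptions made for illustration; the actual formats used by the action plan management module are not specified in this description.

```python
import json

# Hypothetical operation information: detailed operations, their execution
# order, and the input values each function needs (shapes are illustrative).
operation_info = {
    "device_id": "tv-001",
    "operations": [
        {"order": 1, "function": "power_on", "input": {}},
        {"order": 2, "function": "set_channel", "input": {"number": 7}},
    ],
}

def generate_control_command(info: dict) -> str:
    """Serialize the detailed operations, sorted by execution order, into an
    instruction string readable by the operation performing device."""
    steps = sorted(info["operations"], key=lambda op: op["order"])
    return json.dumps({"target": info["device_id"], "steps": steps})
```

Sorting by the execution order before serialization reflects the requirement that the operation performing device carry out the detailed operations sequentially.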
Fig. 10 is different from fig. 9 in that the operations of determining an intention (S1030), obtaining operation information (S1040), and generating a control command (S1050) are performed by the hub device 1000. Referring to fig. 9 and 10, the entity performing the operation of determining the intention, obtaining the operation information, and generating the control command varies depending on whether the function determination model corresponding to the operation execution device 4000a is stored in the hub device 1000 or the operation execution device 4000 a.
According to an embodiment, in operation S1060, the hub device 1000 transmits a control command to the operation execution device 4000a by using the identification information of the operation execution device 4000 a. In the embodiment of the present disclosure, the hub device 1000 may identify the operation performing device 4000a by using identification information of the operation performing device 4000a among a plurality of devices connected through a network and pre-registered by using the same user account, and may transmit a control command to the identified operation performing device 4000 a.
According to an embodiment, in operation S1070, the operation execution apparatus 4000a executes an operation according to the received control command.
Fig. 11 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000, the IoT server 3000, the third party IoT server 5000, and the third party device 5100 according to an embodiment of the present disclosure.
Fig. 11 is a flowchart of the operations of the entities in a multi-device system environment including the hub device 1000, the voice assistant server 2000, the IoT server 3000, the third party IoT server 5000, and the third party device 5100, following the step illustrated in fig. 7. The step shown in fig. 7 represents a state in which it is checked that the function determination model corresponding to the operation performing device 4000a is stored neither in the memory of the hub device 1000 nor in the memory of the operation performing device 4000a itself.
Referring to fig. 11, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, and a function determination device determination module 1340. In the embodiment of fig. 11, the hub device 1000 may not store any function determination model, or may not store the function determination model corresponding to the operation performing device.
The speech assistant server 2000 may store a plurality of function determination models 2342, 2344, and 2348. The voice assistant server 2000 can store a function determination model 2348 (referred to as a third party function determination model).
The third party device 5100 refers to a device manufactured by a manufacturer other than the manufacturer of the hub device 1000, its affiliates, or its technology partners. In an embodiment of the disclosure, when the user of the third party device 5100 is the same as the user of the hub device 1000, the third party function determination model 2348 may be stored in the voice assistant server 2000 when the user registers the third party device 5100 by using the user account used to log in to the hub device 1000. For example, when the user registers the third party device 5100 by logging in with the user account of the hub device 1000, the third party IoT server 5000 may grant access rights to the IoT server 3000, and the IoT server 3000 may access the third party IoT server 5000 and may request the third party function determination model 2348 for obtaining operation information on detailed operations according to a plurality of functions of the third party device 5100 and relationships between the detailed operations. An open authorization (OAuth) method may be used to register the third party device 5100 with the IoT server 3000. The third party IoT server 5000 may provide the data of the third party function determination model 2348 to the IoT server 3000 according to the request, and the IoT server 3000 may provide the third party function determination model 2348 to the voice assistant server 2000.
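The registration flow described above — the third party IoT server granting access rights and handing the function determination model over to the IoT server — might be outlined as in the sketch below. All class names and method names are hypothetical, and the OAuth exchange is reduced to a single opaque token; a real OAuth flow involves authorization codes, scopes, and token refresh.

```python
# Hypothetical sketch of third-party device registration. Class and method
# names are invented; the token stands in for a full OAuth grant.
class ThirdPartyIoTServer:
    def __init__(self, model_data: bytes):
        self._model = model_data
        self._granted = set()

    def grant_access(self, requester: str) -> str:
        token = f"token-for-{requester}"     # stand-in for an OAuth grant
        self._granted.add(token)
        return token

    def fetch_function_model(self, token: str) -> bytes:
        if token not in self._granted:
            raise PermissionError("access not granted")
        return self._model

class IoTServer:
    def register_third_party(self, tp: ThirdPartyIoTServer) -> bytes:
        """Obtain access rights, then retrieve the third party function
        determination model (later forwarded to the voice assistant server)."""
        token = tp.grant_access("iot-server")
        return tp.fetch_function_model(token)
```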
The third party IoT server 5000 is a server that stores and manages at least one of device identification information of the third party device 5100, a device type of the third party device 5100, function execution capability information, location information, and status information. The third party IoT server 5000 may obtain, determine, or generate a control command for controlling the third party device 5100 by using the device information of the third party device 5100. The third party IoT server 5000 may transmit a second control command to the third party device 5100 determined to perform an operation based on the operation information. The third party IoT server 5000 may be operated by the manufacturer of the third party device 5100, but is not limited thereto.
In the embodiment of fig. 11, it may be determined that the operation performing device is the third party device 5100.
According to an embodiment, in operation S1110, the hub device 1000 transmits at least a portion of the text and the identification information of the third party device 5100 to the voice assistant server 2000. Operation S1110 is the same as operation S810 of fig. 8 except that identification information of the third party device 5100, for example, a device ID of the third party device 5100, is transmitted to the voice assistant server 2000, and thus a repeated description thereof will not be provided.
According to an embodiment, the voice assistant server 2000 selects a function determination model corresponding to the third party device 5100 by using the received identification information of the third party device 5100 in operation S1120. In an embodiment of the present disclosure, the voice assistant server 2000 may identify the third party device 5100 based on the identification information of the third party device 5100 received from the hub device 1000, and may select a third party function determination model 2348 corresponding to the third party device 5100 from among the plurality of function determination models 2342, 2344, and 2348.
According to an embodiment, in operation S1130, the voice assistant server 2000 analyzes the text by using the NLU model 2348a of the third party function determination model 2348, and determines the intention based on the analysis result. Operation S1130 is the same as operation S830 of fig. 8 except that the NLU model 2348a of the third party function determination model 2348 is used, and thus a repetitive description thereof will not be provided here.
According to an embodiment, in operation S1140, the voice assistant server 2000 obtains operation information regarding an operation to be performed by the third party device 5100 based on the intention. Operation S1140 is the same as operation S840 of fig. 8 except for obtaining information about an operation to be performed by the third party device 5100, and thus a repetitive description thereof will not be provided here.
According to an embodiment, the voice assistant server 2000 transmits identification information of the third party device 5100 (e.g., device id of the third party device 5100) and operation information to the IoT server 3000 in operation S1142.
According to an embodiment, in operation S1150, the IoT server 3000 obtains the first control command based on the identification information of the third party device 5100 and the received operation information. The IoT server 3000 may include a database storing operation information about the third party device 5100 and control commands according to the operation information. The IoT server 3000 may obtain the operation information of the third party device 5100 from the third party IoT server 5000 through the registration method of the third party device 5100 and may store the obtained operation information in the database. In an embodiment of the present disclosure, the IoT server 3000 may identify the third party device 5100 based on the identification information of the third party device 5100, and may generate the first control command by using the operation information of the third party device 5100.
The first control command may include instructions for sequentially performing the detailed operations according to the operation information in an execution order when the third party device 5100 performs a function. In an embodiment of the present disclosure, the first control command may not be readable by the third party device 5100.
According to an embodiment, in operation S1152, the IoT server 3000 transmits the identification information of the third party device 5100 and the first control command to the third party IoT server 5000.
According to an embodiment, the third party IoT server 5000 converts the first control command into a second control command readable and executable by the third party device 5100 in operation S1160. The third party IoT server 5000 can convert the first control command into a second control command readable and executable by the third party device 5100 by using the received identification information of the third party device 5100.
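The conversion in operation S1160 can be pictured as a translation table keyed by the device identification information: each platform-level step in the first control command is mapped to a vendor-specific instruction the third party device can execute. The mapping and both command vocabularies below are invented for illustration.

```python
# Hypothetical translation of a platform-level ("first") control command into
# a vendor-specific ("second") command. Vocabularies are invented.
VENDOR_VOCAB = {
    "tp-device-5100": {"power_on": "PWR 1", "set_temp": "TMP {value}"},
}

def convert_command(device_id: str, first_command: list) -> list:
    """Convert each step of the first control command into an instruction
    readable and executable by the identified third party device."""
    vocab = VENDOR_VOCAB[device_id]          # chosen via identification info
    second = []
    for step in first_command:               # preserve the execution order
        template = vocab[step["function"]]
        second.append(template.format(**step.get("input", {})))
    return second
```

Keeping the loop in the original order preserves the sequential execution semantics of the first control command.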
According to an embodiment, the third party IoT server 5000 transmits a second control command to the third party device 5100 by using the identification information of the third party device 5100 in operation S1162.
According to an embodiment, the third party device 5100 performs an operation corresponding to the second control command according to the received second control command in operation S1170.
Fig. 12A is a conceptual diagram illustrating the operation of a hub device 1000 and a plurality of devices (e.g., a first device 4100a and a second device 4200a) according to an embodiment of the present disclosure.
Fig. 12A is a block diagram illustrating basic elements for describing the operation of the hub device 1000 and a plurality of devices (e.g., the first device 4100a and the second device 4200 a). However, the elements of the hub device 1000 and the plurality of devices (4100a and 4200a) are not limited to those of fig. 12A.
The arrows of fig. 12A indicate the movement, transmission, and reception of data, including voice signals and text, between the hub device 1000 and the first device 4100 a. The encircled numbers indicate the order in which operations are performed.
Referring to fig. 12A, the hub device 1000 and a plurality of devices (e.g., a first device 4100a and a second device 4200a) may be connected to each other by using a wired or wireless communication method, and may perform data communication. In an embodiment of the present disclosure, the hub device 1000 and the plurality of devices (e.g., the first device 4100a and the second device 4200a) may be directly connected to each other through a communication network, but the present disclosure is not limited thereto. In an embodiment of the present disclosure, when the hub device 1000 and the plurality of devices (e.g., the first device 4100a and the second device 4200a) are connected to the voice assistant server 2000 (see fig. 3), the hub device 1000 may be connected to the plurality of devices (e.g., the first device 4100a and the second device 4200a) through the voice assistant server 2000.
The hub device 1000 and the plurality of devices (e.g., the first device 4100a and the second device 4200a) may be connected by a LAN, a WAN, a VAN, a mobile radio communication network, a satellite communication network, or a combination thereof. Examples of wireless communication methods may include, but are not limited to, Wi-Fi, bluetooth, BLE, Zigbee, WFD, UWB, IrDA, and NFC.
The hub device 1000 is a device that receives a voice signal and controls at least one of a plurality of devices (e.g., the first device 4100a and the second device 4200a) based on the received voice signal.
The plurality of devices (e.g., the first device 4100a and the second device 4200a) may be devices that are logged in by using the same user account as the user account of the hub device 1000 and that were previously registered with the IoT server 3000 by using the user account of the hub device 1000.
At least one of the plurality of devices (e.g., the first device 4100a and the second device 4200a) may be a listening device that receives voice input from a user. In the embodiment of fig. 12A, the first device 4100a may be a listening device that receives speech input from a user including a user utterance. The listening device may be, but is not limited to, a device that only receives voice input from a user. In an embodiment of the present disclosure, the listening device may be an operation execution device that receives a control command from the hub device 1000 and executes an operation for a specific function. In embodiments of the present disclosure, a listening device may receive a voice input from a user related to a function performed by the listening device. For example, the first device 4100a may be an air conditioner, and the first device 4100a may receive a voice input such as "reduce the air conditioner temperature to 20 ℃" from the user through the microphone 4140.
The first device 4100a as a listening device may receive a voice signal from a voice input received from a user. In an embodiment of the present disclosure, the first device 4100a may convert sound received through the microphone 4140 into an acoustic signal, and may obtain a voice signal by removing noise (e.g., non-voice component) from the acoustic signal.
The first device 4100a may send a voice signal to the hub device 1000 (step 1).
The hub device 1000 may receive the speech signal from the first device 4100a and may convert the speech signal to text by performing ASR using data of the ASR module 1310 previously stored in memory (step 2).
The memory 1300 (see fig. 2) of the hub device 1000 may include a device determination model 1330, the device determination model 1330 detecting an intention from a text by interpreting the text and determining a device performing an operation corresponding to the detected intention. The device determination model 1330 may determine an operation performing device from among a plurality of devices (e.g., the first device 4100a and the second device 4200a) registered according to the user account. The hub device 1000 may detect intent from the text by interpreting the text using the data of the first NLU model 1332 included in the device determination model 1330. The first NLU model 1332 is a model for determining an intention by interpreting a text and determining an operation execution device based on the intention. The hub device 1000 may determine that the operation performing device performing the operation corresponding to the intention is the first device 4100a by using the data of the device determination model 1330 (step 3).
The function determination model corresponding to the operation execution device determined by the hub device 1000 may be stored in the memory 1300 of the hub device 1000, may be stored in the first device 4100a itself determined as the operation execution device, or may be stored in the memory 2300 (see fig. 3) of the voice assistant server 2000. The function determination model corresponding to each device is a model for obtaining operation information on detailed operations of performing operations according to the determined functions of the devices and relationships between the detailed operations.
The hub device 1000 itself may store a functionality determination model corresponding to at least one of the plurality of devices (e.g., the first device 4100a and the second device 4200 a). For example, when the hub device 1000 is a voice assistant speaker, the hub device 1000 may store a speaker function determination model 1352 for obtaining operation information on detailed operations of performing functions of the voice assistant speaker and relationships between the detailed operations.
The hub device 1000 may also store a function determination model corresponding to another device. For example, the hub device 1000 may store a refrigerator function determination model 1356 for obtaining operation information on detailed operations corresponding to a refrigerator and a relationship between the detailed operations. The refrigerator may be a device previously registered with the IoT server 3000 by using the same user account as that of the hub device 1000.
The speaker function determination model 1352 and the refrigerator function determination model 1356 may include second NLU models 1352a and 1356a and action plan management modules 1352b and 1356b, respectively. In the embodiment of fig. 12A, the hub device 1000 includes the second NLU model 1352a and the action plan management module 1352b for the speaker, and includes the second NLU model 1356a and the action plan management module 1356b for the refrigerator. The second NLU model 1356a and the action plan management module 1356b are the same as the second NLU model 1354a and the action plan management module 1354b included in the TV function determination model 1354 of fig. 2 except that the device type is changed from a TV to a refrigerator, and thus a repetitive description will not be given.
The hub device 1000 may identify a device storing the function determination model 4132 corresponding to the first device 4100a determined as the operation performing device, by using data of the function determination device determination module 1340. In embodiments of the present disclosure, the hub device 1000 may obtain information about the function determination model from each of the plurality of devices (e.g., the first device 4100a and the second device 4200a). The term "information on the function determination model" refers to information on whether each of the plurality of devices (e.g., the first device 4100a and the second device 4200a) itself stores a function determination model for obtaining operation information on detailed operations of performing operations according to functions and relationships between the detailed operations. In an embodiment of the present disclosure, when the hub device 1000 receives the information on the function determination model from each of the plurality of devices (e.g., the first device 4100a and the second device 4200a), the hub device 1000 may also obtain information (e.g., device identification information, an IP address, or a MAC address) on a storage location of the function determination model of each of the plurality of devices.
In another embodiment of the present disclosure, the hub device 1000 may identify a device in which a function determination model corresponding to an operation performing device is stored from among the hub device 1000, the first device 4100a, the second device 4200a, and the voice assistant server 2000 by using the database 1360 (see fig. 2) including information about function determination models of devices stored in a memory. In an embodiment of the present disclosure, the hub device 1000 may search the database 1360 according to the device identification information of the first device 4100a determined as the operation performing device by using the data of the function determining device determining module 1340, and may obtain information on the storage location of the function determination model 4132 corresponding to the first device 4100a based on the search result of the database 1360.
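The lookup against database 1360 might look like the sketch below. The table layout and the possible storage locations (the hub itself, the device itself, or the voice assistant server) follow the description above, but the field names and device identifiers are assumptions.

```python
# Hypothetical rows of database 1360: which entity stores the function
# determination model for each registered device (field names invented).
MODEL_LOCATION_DB = [
    {"device_id": "speaker-1", "stored_on": "hub", "address": "hub-local"},
    {"device_id": "device-4100a", "stored_on": "device", "address": "192.0.2.10"},
    {"device_id": "tv-4200a", "stored_on": "voice_assistant_server",
     "address": "assistant.example"},
]

def find_model_location(device_id: str):
    """Search the database by the operation performing device's identification
    information and return where its function determination model is stored."""
    for row in MODEL_LOCATION_DB:
        if row["device_id"] == device_id:
            return row["stored_on"], row["address"]
    return None  # location unknown in this sketch
```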
The hub device 1000 may transmit at least a portion of the text to the first device 4100a identified as storing the function determination model 4132 corresponding to the operation performing device by using the function determination device determining module 1340 (step 4).
The first device 4100a can receive at least a portion of the text from the hub device 1000 and can interpret the at least a portion of the text using the function determination model 4132 previously stored in the memory 4130. In an embodiment of the present disclosure, the first device 4100a may analyze at least a portion of the text by using the NLU model 4134 included in the function determination model 4132, and may obtain operation information about an operation to be performed by the first device 4100a based on the analysis result of the at least a portion of the text. The function determination model 4132 may include an action plan management module 4136 configured to manage operation information related to detailed operations of the device in order to generate detailed operations to be performed by the first device 4100a and an execution order of the detailed operations. The action plan management module 4136 may manage operation information regarding detailed operations of the first device 4100a and relationships between the detailed operations. The action plan managing module 4136 may plan a detailed operation to be performed by the first device 4100a and an execution order of the detailed operation based on the analysis result of at least a part of the text. The first device 4100a may plan a detailed operation and an execution order of the detailed operation based on the analysis result of the function determination model 4132 on at least a part of the text, and may execute the operation based on the plan result (step 5).
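Planning the detailed operations and their execution order, as the action plan management module 4136 is described as doing, can be sketched as a dependency sort over the relationships between detailed operations. The operation graph below is invented for illustration and does not reflect an actual device's function set.

```python
from graphlib import TopologicalSorter

# Hypothetical relationships between detailed operations: each key depends on
# the operations in its value set (graph contents invented for illustration).
DETAILED_OPS = {
    "set_temperature": {"power_on"},
    "set_mode_cool": {"power_on"},
    "power_on": set(),
}

def plan_execution_order(ops: dict) -> list:
    """Return an execution order in which every detailed operation runs only
    after the operations it depends on."""
    return list(TopologicalSorter(ops).static_order())
```

Here `power_on` is always placed before the operations that depend on it, mirroring how an execution order is planned from the relationships between detailed operations.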
However, the first device 4100a of the present disclosure is not limited to performing operations based on at least a portion of the text received from the hub device 1000. In another embodiment of the present disclosure, the first device 4100a may receive voice input from a user by using the microphone 4140, may convert the received voice input into text by using the ASR model 4138, and may perform an operation corresponding to the text by interpreting the text using the function determination model 4132. That is, the first device 4100a can independently perform operations even without the participation of the hub device 1000.
Fig. 12A illustrates an embodiment in which the first device 4100a, which is the listening device that receives a voice input from the user, and the operation performing device that is to perform an operation related to the voice input through the hub device 1000 are the same. For example, when the first device 4100a receives a voice input saying "lower the air conditioner temperature to 20℃" from the user, the hub device 1000 may receive the user's voice input from the first device 4100a as the listening device, and may determine the first device 4100a, which is an air conditioner, as the operation performing device related to the voice input. The first device 4100a may receive text related to the voice input from the hub device 1000, and may perform an air conditioner temperature adjustment operation based on the text.
Unlike fig. 12A, the listening device and the operation performing device may be different from each other, which will be described with reference to fig. 12B.
Fig. 12B is a conceptual diagram illustrating the operation of the hub device 1000 and a plurality of devices (e.g., the first device 4100a and the second device 4200a) according to an embodiment of the present disclosure.
Fig. 12B is a block diagram illustrating basic elements for describing the operation of the hub device 1000 and a plurality of devices (e.g., the first device 4100a and the second device 4200 a). The same elements as those of the hub device 1000 and the plurality of devices (e.g., the first device 4100a and the second device 4200a) of fig. 12A will not be described repeatedly.
Referring to fig. 12B, the second device 4200a may be a listening device that receives a voice input from a user. In an embodiment of the present disclosure, although the second device 4200a may be a listening device, the second device 4200a may not be the operation performing device that receives a control command from the hub device 1000 and performs an operation for a specific function. In the embodiment of fig. 12B, the second device 4200a, which is a TV, may receive a voice input unrelated to the TV, such as "lower the air conditioner temperature to 20℃", through the microphone 4240.
The second device 4200a may obtain a voice signal from a voice input received through the microphone 4240 and may transmit the voice signal to the hub device 1000 (step 1).
The hub device 1000 may receive the speech signal from the second device 4200a and may convert the speech signal to text by performing ASR using data of the ASR module 1310 previously stored in memory (step 2).
The hub device 1000 may determine the intention by interpreting the text using the first NLU model 1332, and may determine an operation performing device related to the intention by using the data of the device determination model 1330 (step 3). In the embodiment of fig. 12B, the hub device 1000 may determine that the operation performing device performing the operation corresponding to the intention is the first device 4100a by using the data of the device determination model 1330. The first device 4100a may be a different device than the listening device receiving voice input from the user. For example, the first device 4100a may be an air conditioner.
The hub device 1000 may identify a device in which the function determination model 4132 corresponding to the first device 4100a determined as the operation performing device is stored by using the data of the function determination device determining module 1340, and may transmit at least a portion of the text to the identified first device 4100a (step 4).
The first device 4100a may receive at least a portion of text from the hub device 1000, may analyze the at least a portion of text by using the function determination model 4132, may plan detailed operations and an execution order of the detailed operations based on the analysis result, and may execute the operations based on the plan result (step 5).
Since only the function determination model 4232, which may interpret information about detailed operations related to the functions of the second device 4200a and an execution order of the detailed operations, is stored in the memory 4230 of the second device 4200a, the second device 4200a may not interpret text related to the operations of the first device 4100a. In the embodiment of fig. 12B, since the second device 4200a is a TV and the function determination model 4232 corresponding to the TV is stored in the memory 4230, the second device 4200a may not interpret the voice input "lower the air conditioner temperature to 20℃". In this case, the second device 4200a as the listening device may transmit the voice input received from the user to the hub device 1000, and the hub device 1000 may transmit the text related to the operation to the first device 4100a determined as the operation performing device. The first device 4100a may perform an air conditioner temperature adjustment operation based on the text received from the hub device 1000.
Referring to the embodiments of fig. 12A and 12B, a user may not transmit a voice command related to an operation to be performed to a specific device performing an operation related to a specific function, but may transmit a voice command to any one of a plurality of devices (e.g., the first device 4100a and the second device 4200a) connected through a wired/wireless communication network by using a user account. For example, when the user generates a voice command saying "lower the air conditioner temperature to 20 ℃", the user may transmit the voice command to the air conditioner performing a function related to "cooling temperature setting", or may transmit the voice command related to cooling temperature adjustment to a TV which is not related to the air conditioner at all. A listening device that receives a voice input from a user among the plurality of devices (e.g., the first device 4100a and the second device 4200a) may transmit the voice input to the hub device 1000, and the hub device 1000 may determine an operation performing device corresponding to an utterance intention of the user included in the voice input and may control the operation performing device to perform an operation.
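The routing behaviour of figs. 12A and 12B — any listening device forwards the utterance, and the hub determines the operation performing device — can be condensed into the sketch below. The device registry and keyword matching stand in for the device determination model (first NLU model) and are purely illustrative.

```python
# Hypothetical routing: the listening device forwards the voice input to the
# hub, which determines the operation performing device from the utterance.
REGISTRY = {"air_conditioner": "device-4100a", "tv": "device-4200a"}

def determine_target(text: str) -> str:
    """Toy stand-in for the device determination model (first NLU model)."""
    for keyword, device_id in REGISTRY.items():
        if keyword.replace("_", " ") in text.lower():
            return device_id
    raise LookupError("no registered device matches the utterance")

def route(listening_device: str, text: str):
    """The listening device may differ from the operation performing device;
    the pair returned here makes that explicit."""
    target = determine_target(text)
    return listening_device, target
```

In the fig. 12B scenario, the TV (the listening device) forwards an air-conditioner command, and the hub routes it to the air conditioner.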
In the embodiment of fig. 12A and 12B, since the user does not need to directly specify a specific device related to a function from among the plurality of devices (e.g., the first device 4100a and the second device 4200a) and transmit a voice command to the specific device, and an operation performing device for performing an operation related to the function is automatically determined, it is possible to improve user convenience. Further, since the hub device 1000 determines the operation performing device without involving an external server such as the voice assistant server 2000, the use of a network may not be required, and thus network usage fees may be reduced.
Fig. 13 is a flowchart illustrating a method in which the hub device 1000 determines an operation performing device based on a voice signal received from a listening device and transmits text to a device storing a function determination model corresponding to the operation performing device according to an embodiment of the present disclosure.
According to an embodiment, in operation S1310, the hub device 1000 receives a voice signal from a listening device. The listening device may be any one of a plurality of devices that are previously registered with the IoT server 3000 by using the same user account as that of the hub device 1000. The listening device may be connected to the hub device 1000 through a wired or wireless communication network. The listening device may be, but is not limited to, a device that only receives a voice input from a user. In an embodiment of the present disclosure, the listening device may be an operation performing device that receives a control command from the hub device 1000 and performs an operation for a specific function.
In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may receive the voice signal from the listening device through the communication interface 1400 (see fig. 2).
According to an embodiment, in operation S1320, the hub device 1000 converts the received speech signal into text by performing ASR. In an embodiment of the present disclosure, the hub device 1000 may perform ASR to convert speech signals received from a listening device into computer readable text by using a predefined model such as AM or LM. When the hub device 1000 receives an acoustic signal without noise removal from a listening device, the hub device 1000 may obtain a speech signal by removing noise from the received acoustic signal, and may perform ASR on the speech signal.
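The preprocessing mentioned in operation S1320 — removing non-voice components before performing ASR — can be approximated by a simple frame-energy gate. This is only a toy illustration: a real pipeline would use spectral subtraction or a learned noise-suppression model, and the frame size and threshold below are arbitrary.

```python
# Hypothetical noise removal: zero out low-energy frames before running ASR.
# Frame size and threshold are arbitrary; real systems use spectral methods.
def noise_gate(samples: list, frame: int = 4, threshold: float = 0.1) -> list:
    out = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = sum(s * s for s in chunk) / len(chunk)  # mean frame energy
        out.extend(chunk if energy >= threshold else [0.0] * len(chunk))
    return out
```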
According to an embodiment, the hub device 1000 analyzes a text by using the first NLU model and determines an operation performing device corresponding to the analyzed text by using the device determination model in operation S1330. Operation S1330 is the same as operation S630 of fig. 6 except that a listening device is also included in the device candidates that can be determined as the operation performing device, and thus a duplicate description will not be given.
According to an embodiment, in operation S1340, the hub device 1000 identifies which device stores a function determination model corresponding to the determined operation performing device. The function determination model corresponding to the operation performing device determined by the hub device 1000 may be stored in the memory 1300 (see fig. 2) of the hub device 1000, in the internal memory of the operation performing device itself, or in the memory 2300 (see fig. 3) of the voice assistant server 2000. When the determined operation performing device and the listening device are not the same, the function determination model corresponding to the operation performing device may also be stored in the listening device. The function determination model corresponding to the operation performing device is a model used by the device determined as the operation performing device to obtain operation information about detailed operations for executing a function of the operation performing device and about relationships between the detailed operations.
In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may identify the device storing the function determination model corresponding to the operation performing device by using the program code or data of the function determination device determination module 1340 (see fig. 2). In an embodiment of the present disclosure, the hub device 1000 may obtain storage information of the function determination model from each of a plurality of devices that are connected to the hub device 1000 through a wired or wireless communication network and logged in by using the same user account as the hub device 1000. The term "storage information of the function determination model" refers to information about whether each of the plurality of devices itself stores a function determination model for obtaining operation information about detailed operations performed according to a function and relationships between the detailed operations. In an embodiment of the present disclosure, when the hub device 1000 receives the information about the function determination model from the plurality of devices, the hub device 1000 may also obtain information (e.g., device identification information, an IP address, or a MAC address) about the storage location of the function determination model of each of the plurality of devices. When the operation performing device and the listening device are not the same, the hub device 1000 may receive the information about the function determination model from the listening device, and may determine whether the function determination model corresponding to the operation performing device is stored in the internal memory of the listening device based on the received information.
In another embodiment of the present disclosure, the hub device 1000 may identify the device in which the function determination model corresponding to the operation performing device is stored, from among the hub device 1000, the operation performing device, and the voice assistant server 2000, by using the database 1360 (see fig. 2), which is stored in the memory 1300 (see fig. 2) and includes information about the function determination models of the devices. In an embodiment of the present disclosure, the hub device 1000 may search the database 1360 according to the device identification information of the operation performing device by using the program code or data of the function determination device determination module 1340, and may obtain information about the storage location of the function determination model corresponding to the operation performing device based on the search result. When the operation performing device and the listening device are not the same, the hub device 1000 may search the database 1360 according to the identification information of the listening device, and may determine whether the function determination model corresponding to the operation performing device is stored in the internal memory of the listening device based on the search result.
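The database-1360 lookup described above may be sketched as follows; the table layout and field names are illustrative assumptions, as the disclosure only states that the database maps device identification information to storage locations of function determination models:

```python
# Hypothetical sketch of the database-1360 lookup. The dict layout and field
# names are assumptions, not the patent's actual schema.

FUNCTION_MODEL_DB = {
    # device_id -> where that device's function determination model is stored
    "aircon-01": {"location": "device", "address": "192.168.0.11"},
    "tv-01":     {"location": "voice_assistant_server", "address": "server"},
}

def lookup_model_location(device_id):
    """Return storage-location info for the function determination model
    corresponding to the given operation performing device, or None."""
    return FUNCTION_MODEL_DB.get(device_id)

def model_stored_on_device(device_id):
    """True only when the device itself stores its function determination model."""
    entry = lookup_model_location(device_id)
    return entry is not None and entry["location"] == "device"
```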
Operation S1340 will be described with reference to fig. 14.
According to an embodiment, the hub device 1000 transmits at least a portion of the text to the identified device in operation S1350. The hub device 1000 may transmit at least a portion of the text to the identified device by using the communication interface 1400 (see fig. 2). For example, based on determining that the function determination model corresponding to the operation execution device is stored in the internal memory of the operation execution device, the hub device 1000 may transmit at least a part of the text to the operation execution device by using the communication interface 1400. When the listening device and the operation performing device are the same, the hub device 1000 may transmit at least a part of the text to the listening device. As another example, based on determining that the function determination model corresponding to the operation execution device is not stored in the listening device and the operation execution device but is stored in the memory 2300 (see fig. 3) of the voice assistant server 2000, the hub device 1000 may transmit at least a portion of the text to the voice assistant server 2000 by using the communication interface 1400.
In an embodiment of the present disclosure, the hub device 1000 may separate, from the text, the portion that names the operation performing device, and may control the communication interface 1400 to transmit only the remaining portion of the text to the identified device. For example, when the device identified as storing the function determination model corresponding to the operation performing device is an "air conditioner" and the text is "lower the set temperature to 20 ℃ in the air conditioner", the phrase "in the air conditioner" does not need to be transmitted to the air conditioner. In this case, the hub device 1000 may parse the text in units of words or phrases, may recognize the words or phrases specifying the name, common name, installation location, etc. of the device, and may provide, to the device (e.g., the air conditioner), the remaining portion of the text other than the recognized words or phrases.
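The name-stripping step may be sketched as follows; the word-level parsing rule and the device-name vocabulary are illustrative assumptions rather than the disclosure's actual NLU logic:

```python
# Illustrative sketch: drop words that name the target device and keep only
# the command portion of the text. The vocabulary below is an assumption.

DEVICE_NAME_WORDS = {"air", "conditioner", "aircon", "tv", "television",
                     "washer", "washing", "machine"}

def strip_device_name(text):
    """Parse the text in units of words and remove words that name the
    operation performing device, returning the remaining portion."""
    kept = [w for w in text.split()
            if w.lower().strip(",.") not in DEVICE_NAME_WORDS]
    return " ".join(kept)

remainder = strip_device_name("air conditioner lower the set temperature to 20 degrees")
```

A real implementation would also recognize multi-word common names and installation locations ("living room TV"), which a simple per-word filter cannot.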
Fig. 14 is a flowchart illustrating a method in which the hub device 1000 transmits text to a device storing a function determination model corresponding to an operation execution device according to an embodiment of the present disclosure. Fig. 14 illustrates operations S1340 and S1350 of fig. 13. Operations S1410 to S1450 of fig. 14 are a detailed embodiment of operation S1340 of fig. 13, and operations S1460 to S1480 are a detailed embodiment of operation S1350 of fig. 13.
Operation S1410 is performed after operation S1330 of fig. 13. In operation S1410, the hub device 1000 determines whether the listening device and the determined operation performing device are the same. In an embodiment of the present disclosure, the processor 1200 of the hub device 1000 may obtain the device identification information (e.g., device ID information) of the operation performing device determined in operation S1330 (see fig. 13) and the device identification information of the listening device, and then may determine whether the listening device and the operation performing device are the same by comparing the two pieces of device identification information.
Based on the determination in operation S1410 that the listening device and the operation performing device are the same (yes), the hub device 1000 may obtain the storage information of the function determination model from the listening device (S1420). In an embodiment of the present disclosure, the hub device 1000 may receive the storage information of the function determination model from the listening device by using the communication interface 1400 (see fig. 2). The storage information of the function determination model refers to information about whether the listening device itself stores, in its internal memory, a function determination model for obtaining operation information about detailed operations performed according to a function and relationships between the detailed operations. In an embodiment of the present disclosure, when the hub device 1000 receives the information about the function determination model from the listening device, the hub device 1000 may also obtain information about the storage location of the function determination model (e.g., identification information, an IP address, or a MAC address of the listening device).
In operation S1430, the hub device 1000 determines whether the function determination model corresponding to the determined operation performing device is stored in the internal memory of the listening device based on the stored information of the function determination model of the listening device. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 may determine whether a function determination model corresponding to the operation performing device is stored in the internal memory of the listening device by using the program code or data of the function determination device determining module 1340 (see fig. 2). In an embodiment of the present disclosure, the processor 1200 may determine whether the function determination model stored in the internal memory of the listening device is the same as the function determination model corresponding to the operation performing device based on the obtained storage information of the function determination model of the listening device.
In another embodiment of the present disclosure, the processor 1200 may determine whether the function determination model corresponding to the operation performing device is stored in the internal memory of the listening device by using a database 1360 (see fig. 2) storing information about function determination models of a plurality of devices. In an embodiment of the present disclosure, the processor 1200 may search the database 1360 according to the device identification information of the listening device by using the program code or data of the function determination device determining module 1340, and may obtain information on whether the function determination model corresponding to the operation performing device is stored in the listening device based on the search result of the database 1360.
Based on the determination in operation S1430 that the function determination model corresponding to the operation performing device is stored in the internal memory of the listening device (yes), the hub device 1000 transmits at least a portion of the text to the listening device (S1460). In an embodiment of the present disclosure, the processor 1200 may transmit, by using the communication interface 1400, at least a portion of the text to the function determination model previously stored in the internal memory of the listening device.
Based on the determination in operation S1430 that the function determination model corresponding to the operation performing device is not stored in the internal memory of the listening device (no), the hub device 1000 transmits at least a portion of the text to the voice assistant server 2000 (S1480). In an embodiment of the present disclosure, the processor 1200 may transmit, by using the communication interface 1400, at least a portion of the text to the function determination model corresponding to the operation performing device from among the plurality of function determination models 2342, 2344, 2346, and 2348 previously stored in the memory 2300 (see fig. 3) of the voice assistant server 2000.
In operations S1460 and S1480, the processor 1200 may transmit only a portion of the text rather than the entire text, as described with reference to fig. 13, and thus a repetitive description will not be given.
Based on the determination in operation S1410 that the listening device and the operation performing device are different from each other (no), the hub device 1000 obtains the stored information of the function determination model from the operation performing device (S1440). In an embodiment of the present disclosure, the hub device 1000 may receive the stored information of the function determination model from the operation execution device by using the communication interface 1400 (see fig. 2). In an embodiment of the present disclosure, when the hub device 1000 receives information about the function determination model from the operation performing device, the hub device 1000 may also obtain information of a storage location of the function determination model (for example, identification information, an IP address, or a MAC address of the operation performing device).
In operation S1450, the hub device 1000 determines whether the function determination model is stored in the internal memory of the operation performing device. In an embodiment of the present disclosure, the processor 1200 may determine whether a function determination model corresponding to the operation performing device is stored in an internal memory of the operation performing device by using the program code or data of the function determination device determining module 1340. In another embodiment of the present disclosure, the processor 1200 may determine whether the function determination model corresponding to the operation performing device is stored in the internal memory of the operation performing device by using the database 1360 storing information about function determination models of a plurality of devices. In an embodiment of the present disclosure, the processor 1200 may search the database 1360 according to the device identification information of the operation performing device by using the program code or data of the function determining device determining module 1340, and may obtain information on whether the function determination model corresponding to the operation performing device is stored in the internal memory of the operation performing device itself based on the search result of the database 1360.
Based on the determination in operation S1450 that the function determination model corresponding to the operation performing device is stored in the internal memory of the operation performing device (yes), the hub device 1000 transmits at least a portion of the text to the operation performing device (S1470). In an embodiment of the present disclosure, the processor 1200 may transmit, by using the communication interface 1400, at least a portion of the text to the function determination model previously stored in the internal memory of the operation performing device.
Based on the determination in operation S1450 that the function determination model corresponding to the operation performing device is not stored in the operation performing device (no), the hub device 1000 transmits at least a portion of the text to the voice assistant server 2000 (S1480).
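The branching of operations S1410 to S1480 may be sketched, for illustration only, as follows; the device records and the `stored_models` field are assumptions, since the disclosure describes the decision branches but not a concrete data model:

```python
# Illustrative routing sketch for the flow of fig. 14 (operations S1410-S1480).
# `stored_models` holds the IDs of devices whose function determination model
# is stored locally on that device; this field is a hypothetical stand-in.

def route_text(listening_device, executing_device):
    """Return the destination for the (partial) text: the listening device,
    the operation performing device, or the voice assistant server."""
    if listening_device["id"] == executing_device["id"]:             # S1410: same device
        if executing_device["id"] in listening_device["stored_models"]:  # S1430
            return "listening_device"                                 # S1460
        return "voice_assistant_server"                               # S1480
    if executing_device["id"] in executing_device["stored_models"]:   # S1450
        return "executing_device"                                     # S1470
    return "voice_assistant_server"                                   # S1480

tv = {"id": "tv-01", "stored_models": set()}
aircon = {"id": "aircon-01", "stored_models": {"aircon-01"}}
```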
Fig. 15 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000 and the listening device 4000b according to an embodiment of the present disclosure. Fig. 15 illustrates determination of the listening device 4000b as an operation execution device by the hub device 1000 via natural language interpretation.
Referring to fig. 15, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, a function determination device determination module 1340, and a plurality of function determination models (e.g., 1352, 1354, and 1356). However, the present disclosure is not limited thereto, and the hub device 1000 may store no function determination model at all, or may not store the function determination model corresponding to the operation performing device.
The speech assistant server 2000 may store a plurality of function determination models 2342, 2344, and 2346. For example, the function determination model 2342, which is the first function determination model stored in the voice assistant server 2000, may be a model for determining the function of an air conditioner and obtaining operation information about detailed operations related to the determined function and relationships between the detailed operations. For example, the function determination model 2344, which is a second function determination model, may be a model for determining the function of the TV and obtaining operation information on detailed operations related to the determined function and a relationship between the detailed operations, and the function determination model 2346, which is a third function determination model, may be a model for determining the function of the washing machine and obtaining operation information on detailed operations related to the determined function and a relationship between the detailed operations.
The listening device 4000b is a device that receives a voice input from a user. However, the present disclosure is not limited thereto, and in an embodiment of the present disclosure, the listening device 4000b may be an operation performing device that receives a control command from the hub device 1000 and performs an operation for a specific function. In the embodiment of fig. 15, the listening device 4000b may include a function determination model 4032b, an ASR module 4038b, and a microphone 4040b. The function determination model 4032b may include an NLU model 4034b and an action plan management module 4036b.
In operation S1510, the listening device 4000b receives a voice input from the user. In an embodiment of the present disclosure, the listening device 4000b may receive, through the microphone 4040b, a voice input from the user related to a function performed by the listening device 4000b. For example, when the listening device 4000b is an air conditioner, the listening device 4000b may receive, through the microphone 4040b, a voice input such as "lower the air conditioner temperature to 20 ℃" from the user.
In embodiments of the present disclosure, the listening device 4000b may obtain a voice signal from a voice input received from a user. In an embodiment of the present disclosure, the listening device 4000b may convert sound received through the microphone 4040b into an acoustic signal, and may obtain a voice signal by removing noise (e.g., non-voice components) from the acoustic signal.
In operation S1520, the listening device 4000b determines whether a device determination model is stored in the memory of the listening device 4000b. In an embodiment of the present disclosure, the listening device 4000b may determine whether program code or data corresponding to the device determination model is stored by scanning its memory. However, the present disclosure is not limited thereto, and the listening device 4000b may obtain device specification information based on device identification information (e.g., device ID information), and may determine whether the device determination model is stored in the internal memory of the listening device 4000b by using the device specification information.
Based on the determination that the device determination model is not stored in the memory of the listening device 4000b (no), the listening device 4000b transmits a voice signal to the hub device 1000 (S1522).
Based on the determination that the device determination model is stored in the memory of the listening device 4000b (yes), the listening device 4000b converts the voice signal into text by performing ASR (S1530). The listening device 4000b may convert the voice signal into computer-readable text by performing ASR using the ASR module 4038b. When the listening device 4000b receives an acoustic signal from which noise has not been removed, the listening device 4000b may obtain a voice signal by removing the noise from the received acoustic signal, and may perform ASR on the voice signal.
According to an embodiment, in operation S1540, the listening device 4000b determines the listening device 4000b itself as the operation performing device by interpreting the text using the device determination model. In an embodiment of the present disclosure, the device determination model included in the listening device 4000b may include an NLU model, and the listening device 4000b may determine itself as the operation performing device by interpreting the text using the NLU model. For example, when the text converted from the voice signal is "lower the air conditioner temperature to 20 ℃", the listening device 4000b, which is an air conditioner, may determine itself as the operation performing device by interpreting the text using the NLU model.
In operation S1550, the listening device 4000b provides the text to the function determination model 4032b. For example, the function determination model 4032b, which is a function determination model for an air conditioner, may be a model for obtaining operation information about detailed operations for executing a function of the air conditioner and about relationships between the detailed operations. The function determination model 4032b may include an NLU model 4034b configured to obtain operation information related to an operation to be performed by the listening device 4000b based on an analysis result of at least a portion of the text. The function determination model 4032b may also include an action plan management module 4036b, which manages operation information related to detailed operations of the device in order to generate the detailed operations to be performed by the listening device 4000b and the execution order of the detailed operations. The action plan management module 4036b may plan the detailed operations to be performed by the listening device 4000b and the execution order of the detailed operations based on the analysis result of at least a portion of the text.
In an embodiment of the present disclosure, the listening device 4000b may provide the text "lower the air conditioner temperature to 20 ℃" to the function determination model 4032b for the air conditioner, may obtain operation information to be performed by the air conditioner, for example, information about a "set temperature decrease" operation, by interpreting the text using the NLU model 4034b, and may plan the detailed operations for performing the "set temperature decrease" operation and their execution order by using the action plan management module 4036b.
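A function determination model such as 4032b may be sketched, for illustration only, as follows; the intent mapping, the detailed operations, and their order are invented stand-ins for what the NLU model 4034b and the action plan management module 4036b would produce:

```python
# Hedged sketch of a function determination model. The keyword rule and the
# operation plan below are illustrative assumptions, not the actual models.

OPERATION_PLANS = {
    # operation -> ordered detailed operations to carry it out
    "set_temperature_decrease": ["check_power_state", "read_current_setting",
                                 "apply_target_temperature", "report_result"],
}

def interpret(text):
    """Toy NLU stand-in (4034b): map command text to an operation name."""
    if "lower" in text and "temperature" in text:
        return "set_temperature_decrease"
    return "unknown"

def plan_operations(text):
    """Toy action-plan stand-in (4036b): return the detailed operations
    and their execution order for the interpreted operation."""
    return OPERATION_PLANS.get(interpret(text), [])

plan = plan_operations("lower the air conditioner temperature to 20 degrees")
```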
In operation S1522, the hub device 1000 receives the voice signal from the listening device 4000b.
According to an embodiment, in operation S1560, the hub device 1000 converts the speech signal into text by performing ASR. The detailed method of the hub device 1000 performing ASR is the same as the method described in operation S620 of fig. 6, and thus a detailed description will not be given.
According to an embodiment, in operation S1570, the hub device 1000 interprets the text by using the first NLU model 1332 and determines the listening device 4000b as the operation performing device related to the text by using the device determination model 1330. For example, the hub device 1000 may obtain an intention corresponding to "air conditioner set temperature decrease" by interpreting the text "lower the air conditioner temperature to 20 ℃" using the first NLU model 1332, and may determine the listening device 4000b, which is an air conditioner, as the operation performing device based on the obtained intention by using the device determination model 1330. The detailed method by which the hub device 1000 determines the operation performing device is the same as the method described in operation S630 of fig. 6, and thus a detailed description will not be given.
According to an embodiment, in operation S1580, the hub device 1000 determines whether a function determination model is stored in the listening device 4000b. In an embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 determines whether the function determination model corresponding to the listening device 4000b is stored in the internal memory of the listening device 4000b based on the storage information of the function determination model obtained from the listening device 4000b. In another embodiment of the present disclosure, the processor 1200 may determine whether the function determination model corresponding to the listening device 4000b is stored in the internal memory of the listening device 4000b by using the database 1360 (see fig. 2) storing information about the function determination models of a plurality of devices. The detailed method of determining whether the function determination model corresponding to the listening device 4000b is stored in the internal memory of the listening device 4000b is the same as the method described in operation S1430 of fig. 14, and thus a repetitive description will not be given.
Based on the determination that the function determination model is stored in the listening device 4000b in operation S1580 (yes), the hub device 1000 transmits text to the function determination model of the listening device 4000b (S1592).
Based on the determination in operation S1580 that the function determination model is not stored in the listening device 4000b (no), the hub device 1000 transmits the text to the function determination model corresponding to the listening device 4000b in the voice assistant server 2000 (S1594). When the listening device 4000b is an air conditioner and it is determined that the function determination model for the air conditioner is not stored in the internal memory of the listening device 4000b, the hub device 1000 may transmit the text to the function determination model 2342, which is the function determination model for the air conditioner from among the plurality of function determination models 2342, 2344, and 2346 previously stored in the memory 2300 (see fig. 3) of the voice assistant server 2000.
In an embodiment of the present disclosure, operations S1530 to S1550 performed by the listening device 4000b and operations S1560 to S1594 performed by the hub device 1000 may be performed in parallel by separate entities. Further, operations S1530 to S1550 performed by the listening device 4000b may be performed independently, regardless of operations S1560 to S1594 performed by the hub device 1000. That is, although the listening device 4000b may be determined by the hub device 1000 as the operation performing device and may receive at least a portion of the text, the present disclosure is not limited thereto. In an embodiment of the present disclosure, the listening device 4000b may itself determine the operation performing device by interpreting the voice input of the user, and may provide at least a portion of the text to the function determination model 4032b.
Fig. 16 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000, the operation performing device 4000a, and the listening device 4000b according to an embodiment of the present disclosure. Fig. 16 illustrates a case in which the listening device 4000b and the operation performing device 4000a are different from each other. For example, the listening device 4000b may be a TV, and the operation performing device 4000a may be an air conditioner.
Referring to fig. 16, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, a function determination device determination module 1340, and a plurality of function determination models (e.g., 1352, 1354, and 1356).
The speech assistant server 2000 may store a plurality of function determination models 2342, 2344, and 2346.
The hub device 1000 and the voice assistant server 2000 of fig. 16 are the same as the hub device 1000 and the voice assistant server 2000 of fig. 15, respectively, and thus a repetitive description will not be given.
Unlike in fig. 15, the listening device 4000b of fig. 16 may include only a microphone 4040b. The listening device 4000b may receive a voice input of the user through the microphone 4040b and may transmit the received voice input to the hub device 1000. In an embodiment of the present disclosure, when the listening device 4000b receives the voice input of the user, the listening device 4000b may be preset to perform an operation of transmitting the voice input to the hub device 1000.
The operation performing device 4000a itself may store the function determination model 4032. That is, the memory of the operation performing device 4000a may store the function determination model 4032, which the operation performing device 4000a uses to analyze text by itself and to perform an operation according to the intention of the user based on the analysis result of the text. For example, when the operation performing device 4000a is an "air conditioner", the function determination model 4032 stored in the memory of the operation performing device 4000a may be a model for determining a function of the air conditioner and obtaining operation information about detailed operations of the determined function and relationships between the detailed operations.
According to an embodiment, in operation S1610, the listening device 4000b receives a voice input from a user. Operation S1610 is the same as operation S1510 of fig. 15, and thus a repetitive description will not be given.
According to an embodiment, in operation S1620, the listening device 4000b determines the device including the device determination model 1330, from among a plurality of devices previously registered by using the user account, as the hub device 1000. In an embodiment of the present disclosure, the listening device 4000b may obtain device determination model information from the plurality of devices that are connected through a wired or wireless communication network and logged in by using the same user account, and may determine the device storing the device determination model 1330 as the hub device 1000 based on the obtained device determination model information. The device determination model information may include at least one of information about whether each of the plurality of devices has a device determination model, identification information of the device or server in which the device determination model is stored, and the IP address or MAC address of that device or server.
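Operation S1620 may be sketched as follows; the record fields mirror the items listed above (presence flag, identification information, IP/MAC address), but their names are assumptions:

```python
# Illustrative sketch of hub discovery. The field names are hypothetical;
# the disclosure only lists the kinds of information the records may carry.

def find_hub(device_infos):
    """Among devices registered under the same user account, pick the first
    one whose device determination model information says it stores a
    device determination model."""
    for info in device_infos:
        if info.get("has_device_determination_model"):
            return info["device_id"]
    return None

devices = [
    {"device_id": "tv-01", "has_device_determination_model": False,
     "ip": "192.168.0.20"},
    {"device_id": "speaker-01", "has_device_determination_model": True,
     "ip": "192.168.0.2", "mac": "AA:BB:CC:DD:EE:FF"},
]
hub_id = find_hub(devices)
```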
In operation S1630, the listening device 4000b transmits a voice signal to the hub device 1000.
According to an embodiment, the hub device 1000 converts the speech signal into text by performing ASR in operation S1640.
According to an embodiment, the hub device 1000 interprets the text by using the first NLU model 1332 and determines an operation performing device related to the text by using the device determination model 1330 in operation S1650.
Operations S1640 and S1650 are the same as operations S620 and S630 of fig. 6, respectively, and thus a repetitive description will not be given.
According to an embodiment, in operation S1660, the hub device 1000 determines whether the function determination model is stored in the operation performing device. In the embodiment of the present disclosure, the processor 1200 (see fig. 2) of the hub device 1000 determines whether the function determination model 4032 corresponding to the operation execution device 4000a is stored in the internal memory of the operation execution device 4000a based on the stored information of the function determination model 4032 obtained from the operation execution device 4000a. In another embodiment of the present disclosure, the processor 1200 may determine whether the function determination model corresponding to the operation performing device 4000a is stored in the internal memory of the operation performing device 4000a by using a database 1360 (see fig. 2) storing information about function determination models of a plurality of devices. A detailed method of determining whether the function determination model corresponding to the operation performing apparatus 4000a is stored in the internal memory of the operation performing apparatus 4000a is the same as the method described in operation S1450 of fig. 14, and thus a repetitive description will not be given.
Based on the determination in operation S1660 that the function determination model 4032 is stored in the operation execution device 4000a (yes), the hub device 1000 transmits the text to the operation execution device 4000a (S1672). For example, when the operation execution apparatus 4000a is an air conditioner and it is determined that the function determination model 4032 stored in the memory of the operation execution apparatus 4000a is a model for obtaining operation information of the air conditioner, the hub apparatus 1000 may transmit at least a part of the text to the function determination model 4032 stored in the operation execution apparatus 4000a.
Based on the determination that the function determination model is not stored in the operation execution device 4000a in operation S1660 (no), the hub device 1000 transmits the text to the voice assistant server 2000 (S1674). For example, when the operation performing apparatus 4000a is an air conditioner but it is determined that the function determination model 4032 stored in the memory of the operation performing apparatus 4000a is a model for obtaining operation information of a TV, the hub apparatus 1000 may transmit at least a part of the text to the function determination model 2342 for obtaining operation information of an air conditioner among the plurality of function determination models 2342, 2344, and 2346 stored in the voice assistant server 2000.
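The routing decision of operations S1672 and S1674 may be sketched as follows; this is a hypothetical illustration, and the dictionary fields and model names are assumptions rather than the disclosed implementation:

```python
# Hypothetical sketch of operations S1672/S1674: route the text to the
# on-device function determination model only when the stored model matches
# the device type; otherwise send it to the matching server-side model.

def route_text(text, device, server_models):
    local = device.get("function_determination_model")
    if local is not None and local["target_type"] == device["type"]:
        # Case S1672: the device stores a model for its own type.
        return ("device", device["device_id"])
    # Case S1674: no usable local model; use the server-side model instead.
    return ("server", server_models[device["type"]])

air_conditioner = {
    "device_id": "ac-01",
    "type": "air_conditioner",
    "function_determination_model": {"target_type": "air_conditioner"},
}
mismatched = {
    "device_id": "ac-02",
    "type": "air_conditioner",
    "function_determination_model": {"target_type": "tv"},  # wrong model stored
}
server_models = {"air_conditioner": "model-2342", "tv": "model-2346"}
```

For the air conditioner storing an air-conditioner model, the text stays on the device; for the one storing a TV model, the text goes to the server-side air-conditioner model, mirroring the example in the text.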
Fig. 17 is a diagram illustrating an example in which the operation performing apparatus 4000a updates the function determination model according to an embodiment of the present disclosure.
Referring to fig. 17, the voice assistant server 2000 may include a device determination model 2330 and a plurality of function determination models 2342, 2344, and 2346. The operation execution apparatus 4000a may include a function determination model 4032, and the function determination model 4032 may include an NLU model 4034 and an action plan management module 4036.
The device determination model 2330 included in the voice assistant server 2000 may be a latest-version model configured to determine the operation execution device 4000a by interpreting a voice input about the latest function or the latest device. The plurality of function determination models 2342, 2344, and 2346 included in the voice assistant server 2000 may be latest versions of models configured to interpret voice inputs about latest functions related to, for example, an air conditioner, a refrigerator, and a TV, and to generate operation information as the analysis results.
The function determination model 4032 included in the operation performing apparatus 4000a is configured to interpret text converted from a speech input and generate operation information for performing a specific function. The function determination model 4032 may include an NLU model 4034 trained to interpret text regarding only a limited number of functions and an action plan management module 4036 trained to generate operation information from the text interpreted by the NLU model 4034. In an embodiment of the present disclosure, the function determination model 4032 included in the operation performing apparatus 4000a may not be the latest version, and thus may be unable to interpret text about the latest function or to generate operation information about the latest function. When text about the latest function is received, the function determination model 4032 may fail to interpret the text and determine the function even when the NLU model 4034 is used, and accordingly may fail to generate operation information even when the action plan management module 4036 is used. In this case, the operation performing apparatus 4000a cannot determine the function.
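A toy stand-in for the on-device function determination model 4032, showing how an outdated model fails on text about a latest function, might look as follows (the class structure, function names, and operation lists are illustrative assumptions, not the disclosed model):

```python
class FunctionDeterminationModel:
    """Toy stand-in for model 4032: the NLU part (interpret) maps text to a
    known function, and the action-plan part (plan) returns the ordered
    detailed operations for that function."""

    def __init__(self, known_functions):
        # known_functions: mapping of function name -> list of detailed operations
        self.known_functions = known_functions

    def interpret(self, text):
        for name in self.known_functions:
            if name in text:
                return name
        return None  # latest function not covered by this (outdated) model

    def plan(self, text):
        function = self.interpret(text)
        if function is None:
            return None  # cannot generate operation information
        return self.known_functions[function]

# An outdated air-conditioner model that knows "cooling" but not the newer
# "dehumidification" function.
model = FunctionDeterminationModel(
    {"cooling": ["switch to cooling mode", "set target temperature"]}
)
```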
According to an embodiment, the operation performing apparatus 4000a transmits device information and update request information of the function determination model to the voice assistant server 2000 in operation S1710. The update request information of the function determination model may be information for requesting synchronization of the version of the function determination model 4032 stored in the memory of the operation execution apparatus 4000a with the version of the function determination model corresponding to the operation execution apparatus 4000a among the plurality of function determination models 2342, 2344, and 2346 stored in the voice assistant server 2000.
In an embodiment of the present disclosure, the operation performing apparatus 4000a may periodically transmit the update request information of the function determination model to the voice assistant server 2000 at a specific time interval, a specific date, or the like. However, the present disclosure is not limited thereto, and the operation performing apparatus 4000a may transmit update request information of the function determination model to the voice assistant server 2000 when updating the application or the firmware. In another embodiment of the present disclosure, when the operation execution device 4000a receives a control command from the IoT server 3000, the operation execution device 4000a may transmit update request information of the function determination model to the voice assistant server 2000.
According to an embodiment, the voice assistant server 2000 transmits update data of the function determination model in operation S1720.
According to an embodiment, the operation execution device 4000a updates the function determination model to the latest version by using the update data in operation S1730. In an embodiment of the present disclosure, the operation performing apparatus 4000a may update the previously stored function determination model 4032 by overwriting it with the latest-version data of the function determination model received from the voice assistant server 2000. Through the update of the function determination model 4032, the operation performing apparatus 4000a may be trained to interpret text about a latest function that is added, modified, or deleted, and to generate operation information according to the interpretation result.
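The overwrite update of operation S1730 may be sketched as follows; the structure of the update data and the field names are assumptions made only for illustration:

```python
def update_function_model(device, update_data):
    """Sketch of operation S1730: overwrite the previously stored function
    determination model with the latest-version data received from the
    voice assistant server."""
    device["function_determination_model"] = dict(update_data["model"])
    device["model_version"] = update_data["version"]
    return device

# An outdated device model is replaced wholesale by the server's latest version.
device = {
    "function_determination_model": {"functions": ["cooling"]},
    "model_version": "1.0",
}
update = {"version": "2.0",
          "model": {"functions": ["cooling", "dehumidification"]}}
updated = update_function_model(device, update)
```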
According to an embodiment, the voice assistant server 2000 updates the device determination model 2330 in operation S1740. In an embodiment of the present disclosure, the device determination model 2330 may be updated such that, when the voice assistant server 2000 interprets text related to the latest function and determines a device for performing an operation of the latest function, the operation performing device 4000a is also included in the device candidates.
According to the embodiment of fig. 17, because the function determination model 4032 stored in the internal memory of the operation performing apparatus 4000a is synchronized with the latest function determination model of the voice assistant server 2000, when the user utters a request to execute the latest function, the operation may be performed by the function determination model 4032 of the operation performing apparatus 4000a even without accessing the voice assistant server 2000 through a communication network, thereby reducing network usage fees and improving server operation efficiency.
Fig. 18 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the execution device 4000a according to an embodiment of the present disclosure. Fig. 18 illustrates that the operation execution apparatus 4000a updates the function determination model 4032 to the latest version when a control command is received from the IoT server 3000.
Referring to fig. 18, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, a function determination device determination module 1340, and a plurality of function determination models (e.g., 1352, 1354, and 1356).
The operation execution apparatus 4000a may include a function determination model 4032, and the function determination model 4032 may include an NLU model 4034 and an action plan management module 4036.
The hub device 1000 and the operation performing device 4000a of fig. 18 are the same as the hub device 1000 and the operation performing device 4000a of fig. 16, respectively, and thus a repetitive description will not be given.
The speech assistant server 2000 can include a device determination model 2330 and a plurality of function determination models 2342, 2344, and 2346. The device determination model 2330 is the same as the device determination model 2330 of fig. 3, and therefore a duplicate description will not be given.
In operation S1810, the IoT server 3000 transmits a control command to the operation execution device 4000a.
In operation S1820, the operation performing apparatus 4000a determines whether operation information corresponding to the control command is included in the function determination model 4032. In an embodiment of the present disclosure, the operation performing apparatus 4000a may determine whether information about a function for performing an operation according to the control command received from the IoT server 3000 is included in the function determination model 4032. For example, when the operation performing apparatus 4000a is an air conditioner and receives a control command for a dehumidification mode, the operation performing apparatus 4000a may determine whether information on the detailed operations of the dehumidification function for operating in the dehumidification mode and the execution sequence of the detailed operations is stored in the function determination model 4032.
Based on the determination in operation S1820 that the operation information corresponding to the control command is stored in the function determination model 4032 (yes), the operation execution apparatus 4000a executes an operation according to the control command (S1830).
Based on the determination in operation S1820 that the operation information corresponding to the control command is not stored in the function determination model 4032 (no), the operation execution apparatus 4000a transmits update request information and device information of the function determination model 4032 to the voice assistant server 2000 (S1840). The device information may include at least one of device identification information (e.g., device id information) of the operation execution device 4000a, stored information of the function determination model 4032 of the operation execution device 4000a, and version information of the function determination model 4032. The version information of the function determination model 4032 may include information on the attribute, type, or number of functions that can be identified by using the function determination model 4032.
In operation S1842, the voice assistant server 2000 transmits the update data of the function determination model to the operation execution apparatus 4000 a.
In operation S1850, the operation execution apparatus 4000a updates the function determination model 4032 to the latest version by using the update data. Operation S1850 is the same as operation S1730 of fig. 17, and thus a repetitive description will not be given.
In operation S1860, the operation execution apparatus 4000a transmits update information of the function determination model 4032 to the voice assistant server 2000. In an embodiment of the present disclosure, the operation performing apparatus 4000a may transmit, to the voice assistant server 2000, at least one of version information of the updated function determination model 4032 and information on the attribute, type, or number of functions that can be recognized by using the updated function determination model 4032.
In operation S1870, the voice assistant server 2000 updates the device determination model 2330 based on the update information of the function determination model 4032 of the operation performing device. In an embodiment of the present disclosure, the voice assistant server 2000 may update the device determination model 2330, based on at least one of the updated version information of the function determination model 4032 and the information on the attribute, type, or number of functions that can be recognized by using the updated function determination model 4032, both received from the operation performing device 4000a, such that the operation performing device 4000a is included in the device candidate list for determining a device to perform an operation of the latest function.
In operation S1880, the voice assistant server 2000 transmits update data of the device determination model 2330 to the hub device 1000.
In operation S1890, the hub device 1000 updates the device determination model 1330 based on the received update data. Through the update, the device determination model 1330 may be synchronized with the device determination model 2330 of the voice assistant server 2000. Accordingly, when a voice input of the user related to the latest function is received through the hub device 1000, the hub device 1000 itself may determine the operation performing device 4000a by using the device determination model 1330 stored in the internal memory 1300 (see fig. 2) without accessing the voice assistant server 2000 through a communication network. As a result, network usage fees and processing time may be reduced, and thus the response speed may be improved.
Fig. 19 is a flowchart illustrating a method of operating the hub device 1000, the voice assistant server 2000, the IoT server 3000, and the new device 4000 c.
Referring to fig. 19, the hub device 1000 may include an ASR module 1310, an NLG module 1320, a device determination model 1330, and a function determination device determination module 1340.
The speech assistant server 2000 can include a device determination model 2330 and a plurality of function determination models 2342, 2344, and 2346. The device determination model 2330 is the same as the device determination model 2330 of fig. 3, and therefore a duplicate description will not be given.
The new device 4000c is a target device that is logged in by using the same user account as the user account of the hub device 1000 and is expected to be registered in the user account of the voice assistant server 2000. The new device 4000c may be connected to the hub device 1000, the voice assistant server 2000, and the IoT server 3000 through a wired or wireless communication network.
In the embodiment of fig. 19, the new device 4000c itself may analyze text, and a function determination model 4032c for performing an operation according to the user's intention based on the analysis result of the text may be stored in the memory of the new device 4000c. For example, when the new device 4000c is an "air purifier", the function determination model 4032c stored in the memory of the new device 4000c may be a model for determining a function of the air purifier and obtaining operation information on detailed operations of the determined function and relationships between the detailed operations. The function determination model 4032c may include an NLU model 4034c and an action plan management module 4036c.
However, the present disclosure is not limited thereto, and the new device 4000c may not include the function determination model 4032 c. In another embodiment of the present disclosure, the new device 4000c may include a device determination model.
In operation S1910, the new device 4000c obtains user account information through login. The user account information includes a user id and a password. The obtained user account information may be the same as the user account information of the hub device 1000.
In operation S1920, the hub device 1000 provides the IoT server 3000 with its device identification information and the device determination capability information of the device determination model 1330. The term "device determination capability information" refers to information about the capability of the hub device 1000 to interpret text by using the first NLU model 1332 included in the device determination model 1330 and to determine a device for performing an operation from the interpretation result of the text by using the device determination model 1330. The device determination capability information may include information about the device candidates that the device determination model 1330 may determine as the operation performing device. For example, the device determination model 1330 of the hub device 1000 may be a model trained to interpret only text related to an air conditioner, a TV, and a refrigerator, and to determine only one device among the air conditioner, the TV, and the refrigerator as the operation performing device according to the interpretation result. In this case, the device candidates may be the air conditioner, the TV, and the refrigerator.
In operation S1930, the new device 4000c provides the IoT server 3000 with user account information, identification information (e.g., device id information) of the new device 4000c, and stored information of the device determination model and the function determination model. The term "stored information of the device determination model" refers to information on whether the new device 4000c itself stores, in its internal memory, a device determination model for determining the operation performing device. The term "stored information of the function determination model" refers to information on whether the new device 4000c stores, in its internal memory, a function determination model for obtaining operation information on detailed operations for performing an operation according to a function and relationships between the detailed operations. The stored information of the function determination model may include information on whether not only a function determination model corresponding to the new device 4000c but also a function determination model corresponding to a device other than the new device 4000c is stored.
In operation S1940, the IoT server 3000 adds the new device 4000c to the device list corresponding to the user account. The IoT server 3000 may store the device identification information and the stored information of the device determination model and the function determination model of the new device 4000c in the device list registered according to each user account. Operation S1940 may be an operation of registering the new device 4000c. When operation S1940 is performed, the new device 4000c is a registered device.
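The registration of operation S1940 may be sketched as follows, assuming the device list is keyed by user account and the report fields mirror those of operation S1930 (field names are illustrative assumptions):

```python
def register_new_device(device_lists, user_account, report):
    """Sketch of operation S1940: the IoT server adds the new device, along
    with the stored-model information it reported in S1930, to the device
    list kept for the user account."""
    entry = {
        "device_id": report["device_id"],
        "has_device_determination_model": report["has_device_determination_model"],
        "has_function_determination_model": report["has_function_determination_model"],
    }
    device_lists.setdefault(user_account, []).append(entry)
    return device_lists

# Example: an air purifier that stores only a function determination model.
lists = {}
register_new_device(lists, "user@example.com", {
    "device_id": "purifier-01",
    "has_device_determination_model": False,
    "has_function_determination_model": True,
})
```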
In operation S1950, the IoT server 3000 provides the user account, the device list, and the stored information of the device determination model and the function determination model of each device candidate included in the device list to the voice assistant server 2000.
In operation S1960, the voice assistant server 2000 determines a device storing a device determination model among a plurality of devices registered in a user account as the hub device 1000. In an embodiment of the present disclosure, the voice assistant server 2000 may identify a device storing a device determination model based on the stored information of the device determination model of each of the plurality of devices registered according to the user account obtained from the IoT server 3000, and may determine the identified device as the hub device 1000.
In operation S1970, the voice assistant server 2000 transmits the device identification information and the storage information of the device determination model and the function determination model of the new device 4000c to the hub device 1000.
In operation S1980, the hub device 1000 updates the device determination model 1330 so as to add the new device 4000c to the device candidates that can be determined as the operation performing device by the device determination model 1330. By the update, the device determination model 1330 may interpret text related to the new device 4000c by using the first NLU model 1332, and the device determination model 1330 may determine the new device 4000c as the operation performing device according to the interpretation result.
For example, when the new device 4000c is an air purifier, before the device determination model 1330 is updated, the device determination model 1330 may not interpret text such as "operate the fine dust purification mode" and may not determine the operation performing device, and thus may output a "failure" message. When the new device 4000c is added to the device candidates by updating the device determination model 1330, the device determination model 1330 may interpret the text saying "operate the fine dust purification mode" by using the updated first NLU model 1332, and may determine the air purifier, that is, the new device 4000c, as the operation performing device based on the interpretation result of the text.
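The effect of the update of operation S1980 may be illustrated with a toy keyword-based stand-in for the first NLU model 1332 (the keywords and device names are assumptions, not the disclosed model):

```python
# Hypothetical keyword-based stand-in for the first NLU model 1332, shown
# before and after the update of operation S1980.

def determine_target(nlu_keywords, text):
    for keyword, device in nlu_keywords.items():
        if keyword in text:
            return device
    return "failure"  # no device candidate matches the utterance

before_update = {"cool": "air conditioner", "channel": "TV"}
# After S1980, the air purifier joins the device candidates.
after_update = dict(before_update, **{"fine dust": "air purifier"})
```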
Fig. 20 is a diagram illustrating a network environment including a hub device 1000, a plurality of devices 4000, and a voice assistant server 2000.
Referring to fig. 20, a hub device 1000, a plurality of devices 4000, a voice assistant server 2000, and an IoT server 3000 may be connected to each other by using a wired communication or wireless communication method, and may perform communication. In an embodiment of the present disclosure, the hub device 1000 and the plurality of devices 4000 may be directly connected to each other through a communication network, but the present disclosure is not limited thereto.
The hub device 1000 and the plurality of devices 4000 may be connected to the voice assistant server 2000, and the hub device 1000 may be connected to the plurality of devices 4000 through the voice assistant server 2000. Further, the hub device 1000 and the plurality of devices 4000 may be connected to the IoT server 3000. In another embodiment of the present disclosure, the hub device 1000 and the plurality of devices 4000 may each be connected to the voice assistant server 2000 through a communication network and may be connected to the IoT server 3000 through the voice assistant server 2000. In another embodiment of the present disclosure, the hub device 1000 may be connected to the plurality of devices 4000 through one or more nearby access points. Further, the hub device 1000 may be connected to the plurality of devices 4000 in a state in which the hub device 1000 is connected to the voice assistant server 2000 or the IoT server 3000.
The hub device 1000, the plurality of devices 4000, the voice assistant server 2000, and the IoT server 3000 may be connected through a LAN, a WAN, a VAN, a mobile radio communication network, a satellite communication network, or a combination thereof. Examples of wireless communication methods may include, but are not limited to, Wi-Fi, bluetooth, BLE, Zigbee, WFD, UWB, IrDA, and NFC.
In an embodiment of the present disclosure, the hub device 1000 may receive a voice input of a user. At least one device of the plurality of devices 4000 may be a target device that receives a control command of the voice assistant server 2000 and/or the IoT server 3000 and performs a specific operation. At least one of the plurality of devices 4000 may be controlled to perform a particular operation based on the user's voice input received by the hub device 1000. In embodiments of the present disclosure, at least one of the plurality of devices 4000 may receive control commands from the hub device 1000 without receiving control commands from the voice assistant server 2000 and/or the IoT server 3000.
The hub device 1000 may receive voice input (e.g., an utterance) from a user. In embodiments of the present disclosure, the hub device 1000 may include an ASR model, which may have limited functionality. For example, the hub device 1000 may include an ASR model having functionality to detect a specified speech input (e.g., a wake-up input such as "Hi, Bixby" or "OK, Google") or to pre-process speech signals obtained from a portion of the speech input. Although the hub device 1000 is illustrated as an AI speaker in fig. 20, the present disclosure is not limited thereto. In an embodiment of the present disclosure, one of the plurality of devices 4000 may serve as the hub device 1000. Further, the hub device 1000 may include a first NLU model, a second NLU model, and a natural language generation model. In this case, the hub device 1000 may receive a user's voice input through a microphone or may receive a user's voice input from at least one of the plurality of devices 4000. When receiving the user's speech input, the hub device 1000 may process the user's speech input by using the ASR model, the first NLU model, the second NLU model, and the natural language generation model, and may provide a response to the user's speech input.
The hub device 1000 may determine the type of the target device for performing the operation intended by the user based on the received voice signal. The hub device 1000 can receive the speech signal as an analog signal and can convert the speech portion into computer readable text by performing ASR. The hub device 1000 may interpret the text by using the first NLU model, and may determine the target device based on the interpretation result. The hub device 1000 may determine at least one of the plurality of devices 4000 as a target device. The hub device 1000 may select a second NLU model corresponding to the determined target device from among a plurality of stored second NLU models. The hub device 1000 may determine an operation to be performed by the target device requested by the user by using the selected second NLU model. Based on determining that there is no second NLU model corresponding to the determined target device among the plurality of stored second NLU models, the hub device 1000 may transmit at least a portion of the text to at least one of the plurality of devices 4000 and the voice assistant server 2000. The hub device 1000 transmits information about the determined operation to the target device so that the determined target device performs the determined operation.
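The pipeline described above may be sketched end to end as follows; the ASR, first NLU, and second NLU components are replaced with trivial stand-ins for illustration only:

```python
def process_voice_input(asr, first_nlu, second_nlu_models, speech_signal):
    """End-to-end sketch of the hub pipeline: ASR converts the signal to
    text, the first NLU model picks the target device, and the matching
    second NLU model (if stored) determines the requested operation;
    otherwise the text is forwarded onward."""
    text = asr(speech_signal)
    target = first_nlu(text)
    model = second_nlu_models.get(target)
    if model is None:
        # No second NLU model for this target: transmit the text onward
        # (to the target device or the voice assistant server).
        return ("forward", target, text)
    return ("execute", target, model(text))

# Stand-in components, purely for illustration.
asr = lambda signal: "turn on the TV"
first_nlu = lambda text: "tv" if "TV" in text else "unknown"
second_nlu_models = {"tv": lambda text: {"function": "power_on"}}
```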
The hub device 1000 may receive information for multiple devices 4000 from the IoT server 3000. The hub device 1000 may determine the target device by using the received information of the plurality of devices 4000. Further, the hub device 1000 may control the target device to perform the determined operation by using the IoT server 3000 as a relay server for transmitting information on the determined operation.
The hub device 1000 may receive a voice input of a user through a microphone and may transmit the received voice input to the voice assistant server 2000. In an embodiment of the present disclosure, the hub device 1000 may obtain a voice signal from the received voice input and may transmit the voice signal to the voice assistant server 2000.
In the embodiment of fig. 20, the plurality of devices 4000 include, but are not limited to, a first device 4100 as an air conditioner, a second device 4200 as a TV, a third device 4300 as a washing machine, and a fourth device 4400 as a refrigerator. For example, the plurality of devices 4000 may include at least one of a smartphone, a tablet PC, a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a PDA, a PMP, an MP3 player, an ambulatory medical device, a camera, and a wearable device. In an embodiment of the present disclosure, the plurality of devices 4000 may be home appliances. The home appliance may include at least one of a TV, a Digital Video Disc (DVD) player, an audio device, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a game machine, an electronic key, a video camera, and an electronic photo frame.
The voice assistant server 2000 may determine the type of the target device for performing the operation intended by the user based on the received voice signal. The voice assistant server 2000 may receive the voice signal as an analog signal from the hub device 1000 and may convert the voice portion into computer-readable text by performing ASR. The voice assistant server 2000 may interpret the text by using the first NLU model, and may determine the target device based on the interpretation result. Further, the voice assistant server 2000 may receive at least a portion of the text and information about the target device determined by the hub device 1000 from the hub device 1000. In this case, the hub device 1000 converts the user's speech signal into text by using the ASR model of the hub device 1000, and determines the target device by interpreting the text by using the first NLU model of the hub device 1000. Further, the hub device 1000 transmits at least a portion of the text and information about the determined target device to the voice assistant server 2000.
The voice assistant server 2000 may determine an operation to be performed by the target device requested by the user by using the second NLU model corresponding to the determined target device. The voice assistant server 2000 may receive information of the plurality of devices 4000 from the IoT server 3000. The voice assistant server 2000 may determine the target device by using the received information of the plurality of devices 4000. Further, the voice assistant server 2000 may control the target device to perform the determined operation by using the IoT server 3000 as a relay server for transmitting information on the determined operation. The IoT server 3000 may store information about a plurality of devices 4000 connected through a network and previously registered. In an embodiment of the present disclosure, the IoT server 3000 may store at least one of identification information (e.g., device id information) of the plurality of devices 4000, a device type of each of the plurality of devices 4000, and function execution capability information of each of the plurality of devices 4000.
In an embodiment of the present disclosure, the IoT server 3000 may store state information regarding power on/off or operations being performed of each of the plurality of devices 4000. The IoT server 3000 may transmit a control command for performing the determined operation to a target device among the plurality of devices 4000. The IoT server 3000 may receive information about the determined target device and information about the determined operation from the voice assistant server 2000, and may transmit a control command to the target device based on the received information.
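The relay role of the IoT server 3000 may be sketched as follows, assuming a registry that stores per-device power state (the field names and statuses are illustrative assumptions):

```python
def relay_control_command(registry, target_id, operation_info):
    """Sketch of the IoT server acting as a relay: look up the registered
    target device, check its stored power state, and forward a control
    command built from the operation information received from the voice
    assistant server."""
    device = registry.get(target_id)
    if device is None:
        return {"status": "unknown_device", "target": target_id}
    if not device["power_on"]:
        return {"status": "device_off", "target": target_id}
    return {"status": "sent", "target": target_id, "command": operation_info}

# Example registry with stored power on/off state per registered device.
registry = {"ac-01": {"power_on": True}, "tv-01": {"power_on": False}}
```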
Fig. 21A and 21B are diagrams illustrating a voice assistant model 200 that can be executed by a hub device 1000 and a voice assistant server 2000, according to embodiments of the present disclosure.
Referring to fig. 21A and 21B, the voice assistant model 200 is implemented as software. The voice assistant model 200 may be configured to determine a user's intent from the user's voice input and to control a target device related to the user's intent. The voice assistant model 200 may include a first assistant model 200a, which is configured to be updated to a new model through learning or the like when a device controlled by the voice assistant model 200 is added, and a second assistant model 200b, which is configured to have a model corresponding to the added device added to the existing model.
The first assistant model 200a is a model that determines a target device related to the user's intention by analyzing the user's speech input. The first assistant model 200a may include an ASR model 202, an NLG model 204, a first NLU model 300a, and a device determination model 310. In embodiments of the present disclosure, the device determination model 310 may include the first NLU model 300a. In another embodiment of the present disclosure, the device determination model 310 and the first NLU model 300a may be configured as separate elements.
The device determination model 310 is a model for determining the target device by using the analysis result of the first NLU model 300a. The device determination model 310 may include a plurality of detailed models, and one of the plurality of detailed models may be the first NLU model 300a. The first NLU model 300a or the device determination model 310 may be an AI model.
When a device controlled by the speech assistant model 200 is added, the first assistant model 200a may update at least the device determination model 310 and the first NLU model 300a by learning. Learning may refer to learning using both training data for training the existing device determination model and the first NLU model, as well as additional training data related to the added device. Further, learning may refer to updating the device determination model and the first NLU model by using only additional training data related to the added device.
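The two learning strategies described above (retraining on both the existing training data and the additional training data related to the added device, or updating with only the additional data) can be sketched as follows. The training interface is a hypothetical stand-in, since the patent does not prescribe one.

```python
def update_device_determination_model(train, old_model, existing_data,
                                      added_device_data, full_retrain=True):
    # Strategy 1: learn from both the existing training data and the
    # additional training data related to the added device.
    if full_retrain:
        return train(old_model, existing_data + added_device_data)
    # Strategy 2: update using only the additional training data
    # related to the added device.
    return train(old_model, added_device_data)
```

Full retraining keeps the model consistent across all registered devices, while the additional-data-only update is cheaper when only the new device's utterances must be recognized.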
The second assistant model 200b is a model specialized for a particular device, which determines, from among a plurality of operations that the particular device can perform, the operation corresponding to the user's voice input that is to be performed by the target device. In fig. 21A, the second assistant model 200b may include a plurality of second NLU models 300b, an NLG model 206, and an action plan management module 210. The plurality of second NLU models 300b may correspond to a plurality of different devices, respectively. The second NLU model, the NLG model, and the action plan management module may be models implemented by a rule-based system. In an embodiment of the present disclosure, the second NLU model, the NLG model, and the action plan management module may be AI models. The plurality of second NLU models may be elements of a plurality of function determination models.
When a device controlled by the voice assistant model 200 is added, the second assistant model 200b can be configured to add a second NLU model corresponding to the added device. That is, the second assistant model 200b may include a second NLU model corresponding to the added device in addition to the existing plurality of second NLU models 300b. In this case, the second assistant model 200b may be configured to select a second NLU model corresponding to the determined target device from among a plurality of second NLU models including the added second NLU model by using the information about the target device determined by the first assistant model 200a.
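The selection behavior described above can be sketched as a registry keyed by device type: adding a device adds one more second NLU model entry, and the model matching the target device determined by the first assistant model is selected. The names and the lambda-based stand-in models are illustrative assumptions, not the patent's implementation.

```python
class SecondAssistant:
    def __init__(self):
        self.nlu_by_device = {}  # device type -> second NLU model for that device

    def add_device(self, device_type, nlu_model):
        # Adding a controlled device adds a second NLU model for it;
        # the existing per-device models are left untouched.
        self.nlu_by_device[device_type] = nlu_model

    def determine_operation(self, target_device, text):
        # Select the second NLU model matching the determined target device.
        return self.nlu_by_device[target_device](text)

assistant = SecondAssistant()
assistant.add_device("TV", lambda text: "power_on" if "on" in text else "unknown")
assistant.add_device("speaker", lambda text: "volume_up" if "louder" in text else "unknown")
```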
Referring to fig. 21B, the second assistant model 200B may include a plurality of action plan management models and a plurality of NLG models. In fig. 21B, the plurality of second NLU models included in the second assistant model 200B may respectively correspond to the second NLU models 300B of fig. 21A, each of the plurality of NLG models included in the second assistant model 200B may correspond to the NLG model 206 of fig. 21A, and each of the plurality of action plan management models included in the second assistant model 200B may correspond to the action plan management module 210 of fig. 21A.
In fig. 21B, a plurality of action plan management models may be configured to correspond to a plurality of second NLU models, respectively. Further, the plurality of NLG models may be configured to correspond to the plurality of second NLU models, respectively. In another embodiment of the present disclosure, one NLG model may be configured to correspond to a plurality of second NLU models, and one action plan management model may be configured to correspond to a plurality of second NLU models.
In fig. 21B, when a device controlled by the voice assistant model 200 is added, the second assistant model 200B may be configured to add a second NLU model, an NLG model, and an action plan management model corresponding to the added device.
In fig. 21B, when a device controlled by the voice assistant model 200 is added, the first NLU model 300a may be configured to be updated to a new model by learning or the like. Further, when the device determination model 310 includes the first NLU model 300a, the device determination model 310 may be configured such that when a device controlled by the voice assistant model 200 is added, the existing model is completely updated to a new model by learning or the like. The first NLU model 300a or the device determination model 310 may be an AI model. Learning may refer to learning using both training data for training the existing device determination model and the first NLU model, as well as additional training data related to the added device. Further, learning may refer to updating the device determination model and the first NLU model by using only additional training data related to the added device.
In fig. 21B, when a device controlled by the voice assistant model 200 is added, the second assistant model 200B may be updated by adding a second NLU model, an NLG model, and an action plan management model corresponding to the added device to the existing model. The second NLU model, the NLG model, and the action plan management model may be models implemented by a rule-based system.
In the embodiment of fig. 21B, the second NLU model, the NLG model, and the action plan management model may be AI models. The second NLU model, the NLG model, and the action plan management model may all be managed as one assistant model for each corresponding device. In this case, the second assistant model 200b may include a plurality of second assistant models 200b-1, 200b-2, and 200b-3 corresponding to a plurality of devices, respectively. For example, the second NLU model corresponding to the TV, the NLG model corresponding to the TV, and the action plan management model corresponding to the TV may be managed as the second assistant model 200b-1 corresponding to the TV. Further, the second NLU model corresponding to the speaker, the NLG model corresponding to the speaker, and the action plan management model corresponding to the speaker may be managed as the second assistant model 200b-2 corresponding to the speaker. Further, the second NLU model corresponding to the refrigerator, the NLG model corresponding to the refrigerator, and the action plan management model corresponding to the refrigerator may be managed as the second assistant model 200b-3 corresponding to the refrigerator.
When a device controlled by the voice assistant model 200 is added, the second assistant model 200b can be configured to add a second assistant model corresponding to the added device. That is, the second assistant model 200b may include a second assistant model corresponding to the added device in addition to the existing plurality of second assistant models 200b-1 to 200b-3. In this case, the second assistant model 200b may be configured to select a second assistant model corresponding to the determined target device from among a plurality of second assistant models including the second assistant model corresponding to the added device by using information about the target device determined by the first assistant model 200a.
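The per-device grouping of fig. 21B can be sketched as a bundle of the second NLU model, NLG model, and action plan management model for each device, looked up by the determined target device. The classes and names below are illustrative stand-ins, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DeviceAssistant:
    nlu: Callable[[str], str]                # second NLU model: text -> operation
    nlg: Callable[[str], str]                # NLG model: operation -> response text
    action_plan: Callable[[str], List[str]]  # action plan model: operation -> steps

assistants = {
    "TV": DeviceAssistant(
        nlu=lambda text: "power_on",
        nlg=lambda op: f"Okay, performing {op} on the TV.",
        action_plan=lambda op: ["wake_panel", op],
    ),
}

def select_assistant(target_device):
    # Adding a device means adding one more bundle to `assistants`;
    # selection uses the target device determined by the first assistant model.
    return assistants[target_device]
```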
The programs executed by the hub device 1000, the voice assistant server 2000, and the plurality of devices 4000 according to the present disclosure may be implemented as hardware components, software components, and/or a combination of hardware components and software components. The program may be executed by any system capable of executing computer readable instructions.
The software may include computer programs, code, instructions, or a combination of one or more thereof, and may configure the processing device to operate as desired or to individually or collectively instruct the processing device.
The software may be implemented in a computer program including instructions stored in a computer-readable storage medium. The computer-readable storage medium may include, for example, magnetic storage media (e.g., ROM, RAM, floppy disks, hard disks, etc.) and optically readable media (e.g., compact disc read-only memories (CD-ROMs), DVDs, etc.). The computer-readable recording medium may be distributed over network-coupled computer systems so that computer-readable code is stored and executed in a distributed fashion. The medium may be computer readable, may be stored in a memory, and may be executed by a processor.
The computer-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means that the storage medium does not include a signal and is tangible, but does not distinguish whether data is stored on the storage medium semi-permanently or temporarily.
Further, the program according to the embodiments of the present disclosure may be provided in a computer program product. The computer program product is a product that can be traded between a seller and a buyer.
The computer program product may include a software program and a computer-readable storage medium in which the software program is stored. For example, the computer program product may include a software program product (e.g., a downloadable application) that is electronically distributed by the manufacturer of the device or through an electronic marketplace (e.g., Google Play™ Store, App Store, etc.). For electronic distribution, at least a portion of the software program may be stored in a storage medium or may be temporarily generated. In this case, the storage medium may be a storage medium of the manufacturer's server, a server of the electronic marketplace, or a relay server that temporarily stores the software program.
The computer program product may comprise a storage medium of a server or a storage medium of a device in a system comprising a server and a device. Alternatively, when there is a third device (e.g., a smartphone) connected to communicate with the server or the device, the computer program product may include a storage medium of the third device. Alternatively, the computer program product may comprise a software program sent from the server to the device or to the third device or from the third device to the device.
In this case, one of the server, the device and the third device may perform the method according to an embodiment of the present disclosure by executing the computer program product. Alternatively, at least two of the server, the device and the third device may perform the method according to embodiments of the present disclosure in a distributed manner by executing the computer program product.
For example, a server (e.g., an IoT server or a voice assistant server) may execute a computer program product stored in the server and control devices connected with the server for communication to perform methods in accordance with embodiments of the present disclosure.
As another example, the third device may execute the computer program product and control a device connected to the third device for communication to perform a method according to an embodiment of the present disclosure.
When the third device executes the computer program product, the third device may download the computer program product from the server and execute the downloaded computer program product. Alternatively, the third device may execute a computer program product provided in a preloaded state and perform a method according to embodiments of the present disclosure.
Although the embodiments of the present disclosure have been described with reference to the limited embodiments and the accompanying drawings, those skilled in the art may make various modifications and changes in light of the above description. For example, the described techniques may be performed in an order different from the described methods, and/or elements of the described computer systems, modules, etc. may be combined or integrated in a form different from the described methods, or may be replaced or substituted by other elements or equivalents to achieve appropriate results.

Claims (11)

1. A method, comprising:
converting, by a hub device, a voice input of a user received by the hub device into text by performing automatic speech recognition (ASR);
identifying, by the hub device, a device capable of performing an operation corresponding to the text;
identifying, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the device capable of performing the operation corresponding to the text; and
based on the identified device storing the function determination model being a different device from the hub device, transmitting at least a portion of the text to the identified device.
2. The method of claim 1, wherein identifying devices capable of performing the operation comprises:
analyzing the text using a first Natural Language Understanding (NLU) model; and
determining a device capable of performing the operation based on a result of the analysis of the text, wherein the first NLU model is included in a device determination model.
3. The method of claim 1, further comprising: analyzing at least a portion of the text by using a second Natural Language Understanding (NLU) model included in the function determination model, and obtaining operation information related to the operation corresponding to the text based on a result of the analysis of the at least a portion of the text.
4. The method of claim 1, further comprising: obtaining information about the function determination model stored in at least one device from at least one device storing the function determination model.
5. The method of claim 4, wherein identifying which device stores the function determination model comprises: identifying, based on the obtained information on the function determination model, a device that stores the function determination model corresponding to the identified device capable of performing the operation.
6. A hub device for controlling a device based on a voice input, the hub device comprising:
a communication interface configured to perform data communication with at least one of a plurality of devices, a voice assistant server, and an internet of things (IoT) server;
a microphone configured to receive a voice input of a user;
a memory configured to store a program comprising one or more instructions; and
a processor configured to execute one or more instructions of a program stored in memory to:
convert speech input received through the microphone into text by performing automatic speech recognition (ASR);
identifying a device capable of performing an operation corresponding to the text;
identify, from among the hub device and a plurality of other devices connected to the hub device, which device stores a function determination model corresponding to the device capable of performing the operation corresponding to the text; and
based on the identified device storing the function determination model being a different device from the hub device, control the communication interface to transmit at least a portion of the text to the identified device storing the function determination model.
7. The hub device of claim 6, wherein the processor is further configured to identify devices capable of performing operations corresponding to text using a device determination model comprising a first Natural Language Understanding (NLU) model configured to analyze text, and to determine devices capable of performing operations corresponding to text based on a result of the analysis of text.
8. The hub device of claim 6, wherein the function determination model comprises a second NLU model configured to analyze at least a part of the text and obtain operation information related to an operation to be performed by a device capable of performing an operation corresponding to the text based on a result of the analysis of the at least a part of the text.
9. The hub device of claim 6, wherein the processor is further configured to execute the one or more instructions to control the communication interface to obtain information about the function determination model stored in the at least one device from the at least one device storing the function determination model for determining a function associated with each of the plurality of devices.
10. The hub device of claim 9, wherein the processor is further configured to execute the one or more instructions to identify which device stores the function determination model corresponding to the device capable of performing the operation corresponding to the text based on the obtained information about the function determination model.
11. A non-transitory computer-readable recording medium having recorded thereon a program for executing the method according to claim 1 on a computer.
CN202080031822.XA 2019-05-02 2020-04-29 Hub device, multi-device system including hub device and plurality of devices, and operating method thereof Pending CN113748458A (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
KR10-2019-0051824 2019-05-02
KR20190051824 2019-05-02
US201962862201P 2019-06-17 2019-06-17
US62/862,201 2019-06-17
US201962905707P 2019-09-25 2019-09-25
US62/905,707 2019-09-25
KR1020190123310A KR20200127823A (en) 2019-05-02 2019-10-04 The hub device, multi device system comprising the hub device and a plurality of devices and method operating the same
KR10-2019-0123310 2019-10-04
KR1020200027217A KR20200127853A (en) 2019-05-02 2020-03-04 The hub device, multi device system comprising the hub device and a plurality of devices and method operating the same
KR10-2020-0027217 2020-03-04
PCT/KR2020/005704 WO2020222539A1 (en) 2019-05-02 2020-04-29 Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same

Publications (1)

Publication Number Publication Date
CN113748458A true CN113748458A (en) 2021-12-03

Family

ID=73451372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080031822.XA Pending CN113748458A (en) 2019-05-02 2020-04-29 Hub device, multi-device system including hub device and plurality of devices, and operating method thereof

Country Status (2)

Country Link
KR (2) KR20200127823A (en)
CN (1) CN113748458A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI820985B (en) * 2022-10-28 2023-11-01 犀動智能科技股份有限公司 Internet of things equipment integrated control system and Internet of things equipment integrated control method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180022021A (en) * 2016-08-23 2018-03-06 삼성전자주식회사 Method and electronic device for recognizing voice
CN107808672A (en) * 2016-09-07 2018-03-16 三星电子株式会社 For the server and method being controlled to external equipment
US9984686B1 (en) * 2015-03-17 2018-05-29 Amazon Technologies, Inc. Mapping device capabilities to a predefined set
US10127908B1 (en) * 2016-11-11 2018-11-13 Amazon Technologies, Inc. Connected accessory for a voice-controlled device
US20190019518A1 (en) * 2014-04-08 2019-01-17 Panasonic Intellectual Property Corporation Of America Device control method, device management system, and voice input apparatus
US20190066670A1 (en) * 2017-08-30 2019-02-28 Amazon Technologies, Inc. Context-based device arbitration
CN109474658A (en) * 2017-09-07 2019-03-15 三星电子株式会社 Electronic equipment, server and the recording medium of task run are supported with external equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190019518A1 (en) * 2014-04-08 2019-01-17 Panasonic Intellectual Property Corporation Of America Device control method, device management system, and voice input apparatus
US9984686B1 (en) * 2015-03-17 2018-05-29 Amazon Technologies, Inc. Mapping device capabilities to a predefined set
US10031722B1 (en) * 2015-03-17 2018-07-24 Amazon Technologies, Inc. Grouping devices for voice control
KR20180022021A (en) * 2016-08-23 2018-03-06 삼성전자주식회사 Method and electronic device for recognizing voice
CN107808672A (en) * 2016-09-07 2018-03-16 三星电子株式会社 For the server and method being controlled to external equipment
US10127908B1 (en) * 2016-11-11 2018-11-13 Amazon Technologies, Inc. Connected accessory for a voice-controlled device
US20190066670A1 (en) * 2017-08-30 2019-02-28 Amazon Technologies, Inc. Context-based device arbitration
CN109474658A (en) * 2017-09-07 2019-03-15 三星电子株式会社 Electronic equipment, server and the recording medium of task run are supported with external equipment


Also Published As

Publication number Publication date
KR20200127853A (en) 2020-11-11
KR20200127823A (en) 2020-11-11

Similar Documents

Publication Publication Date Title
EP3734597A1 (en) Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same
KR102429436B1 (en) Server for seleting a target device according to a voice input, and controlling the selected target device, and method for operating the same
EP3734596B1 (en) Determining target device based on speech input of user and controlling target device
JP6744314B2 (en) Updating Language Understanding Classifier Model for Digital Personal Assistant Based on Crowdsourcing
US20180336049A1 (en) Crowdsourced on-boarding of digital assistant operations
RU2689203C2 (en) Flexible circuit for adjusting language model
US20210241775A1 (en) Hybrid speech interface device
US11238860B2 (en) Method and terminal for implementing speech control
US11031011B2 (en) Electronic device and method for determining electronic device to perform speech recognition
CN110858481A (en) System for processing a user speech utterance and method for operating the system
US20180366113A1 (en) Robust replay of digital assistant operations
CN110956963A (en) Interaction method realized based on wearable device and wearable device
US20210136433A1 (en) Hub device, multi-device system including the hub device and plurality of devices, and operating method of the hub device and multi-device system
CN113748458A (en) Hub device, multi-device system including hub device and plurality of devices, and operating method thereof
KR102487078B1 (en) The hub device, multi device system comprising the hub device and a plurality of devices and method operating the same
US11243740B2 (en) Electronic device and method for controlling same
US20240129567A1 (en) Hub device, multi-device system including the hub device and plurality of devices, and operating method of the hub device and multi-device system
US20220130381A1 (en) Customized interface between electronic devices
KR102306542B1 (en) Method and device for processing information
US10546069B2 (en) Natural language processing system
KR20240047567A (en) Method for providing service of smart device
CN115346521A (en) Permission determination method of intelligent sound box, local server and intelligent sound box
WO2019083603A1 (en) Robust replay of digital assistant operations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination