WO2023048379A1

WO2023048379A1 - Server and electronic device for processing user utterance, and operation method thereof

Info

Publication number: WO2023048379A1
Application number: PCT/KR2022/010924
Authority: WO
Inventors: 박상민; 여재영; 송가진
Original assignee: 삼성전자주식회사
Priority date: 2021-09-24
Filing date: 2022-07-26
Publication date: 2023-03-30
Also published as: KR20230043397A

Abstract

An intelligent server for processing a user utterance is provided. The intelligent server may comprise: a memory for storing context information including information on domains corresponding to electronic devices and information on each of the electronic devices; and a processor which: generates, on the basis of the context information and a target utterance received from one of one or more electronic devices, combinations of domain information and information on an electronic device capable of processing the target utterance; determines, among the context information, reference information for processing of the target utterance; calculates a quality of service score for each of the combinations with reference to the reference information; determines a target combination of a target electronic device and a target domain corresponding to the target electronic device with reference to the quality of service score; and transmits a command to process the target utterance by using the target domain to the target electronic device.

Description

Server processing user utterance, electronic device, and operation method thereof

The following embodiments relate to an intelligent server that processes user speech, an electronic device, and an operating method thereof.

Description of the Related Art

Various electronic devices equipped with a voice assistant function that provides services based on a user's utterance are becoming widespread. An electronic device can recognize the user's utterance through an artificial intelligence server and grasp the meaning and intention of the utterance. The artificial intelligence server interprets the user's utterance to infer the user's intention and can perform tasks according to the inferred intention.

The artificial intelligence server may analyze various information about the situation at the time of utterance in connection with the utterance in order to determine the utterance intention.

The above is provided as background information only to aid in understanding the disclosure. No determination is made or any assertion is made as to whether any of the above may apply as prior art with respect to the disclosure.

Recently, as electronic devices capable of performing various functions, such as smart watches, smart refrigerators, and/or smart speakers, have increased, it has become important for an artificial intelligence server to determine which device should process an utterance.

The artificial intelligence server may prioritize the electronic devices that could process a user utterance according to a predefined policy and, after determining the electronic device to process the utterance, determine which of that device's applications is to process it. For example, after the device is determined, the intention of the user's utterance may be classified, and an application to process the utterance among applications in the device may be determined.

However, the method of determining an application to process the utterance within the corresponding device after determining the electronic device only considers whether or not the application supports the utterance, and does not consider the service quality of the application.

One aspect of the disclosure is to solve at least the problems and/or disadvantages mentioned above and provide at least the advantages described below. Accordingly, the disclosure may provide a server and an electronic device for processing user speech and an operation method thereof.

Additional aspects will be set forth in part in the description that follows and, in part, will be apparent from the description or may be learned by practice of the disclosed embodiments.

According to one aspect of the disclosure, an intelligent server for processing user utterances is provided. The intelligent server includes: a memory for storing context information including information on each of at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device; and a processor that: generates, based on a target utterance received from any one of the at least one electronic device and the context information, at least one combination of domain information and information on an electronic device capable of processing the target utterance; determines, among the context information, reference information for processing the target utterance; calculates a quality of service score for each of the at least one combination with reference to the reference information; determines, based on the quality of service score, a target combination of a target electronic device and a target domain corresponding to the target electronic device; and transmits, to the target electronic device, a command to process the target utterance using the target domain.

According to one aspect of the disclosure, a method for processing user utterances in an intelligent server is provided. The method may include: receiving a target utterance from any one of at least one electronic device; generating, based on the target utterance and context information, at least one combination of domain information and information on an electronic device capable of processing the target utterance, the context information including information on each of the at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device; determining, among the context information, reference information for processing the target utterance; calculating a quality of service score for each of the at least one combination with reference to the reference information; determining, based on the quality of service score, a target combination of a target electronic device and a target domain corresponding to the target electronic device; and transmitting, to the target electronic device, a command to process the target utterance using the target domain.

According to one aspect of the disclosure, an electronic device for processing user utterances is provided. The electronic device includes: a memory for storing computer-executable instructions and context information, the context information including information on each of at least one electronic device including the electronic device and information on at least one domain corresponding to each of the at least one electronic device; and a processor that: generates, based on a target utterance received by the electronic device and the context information, at least one combination of domain information and information on an electronic device capable of processing the target utterance; determines, among the context information, reference information for processing the target utterance; calculates a quality of service score for each of the at least one combination with reference to the reference information; determines, based on the quality of service score, a target combination of a target electronic device and a target domain corresponding to the target electronic device; and transmits, to the target electronic device, a command to process the target utterance using the target domain.
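The flow shared by the aspects above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Combination` type, the layout of `CONTEXT`, the `supports` table, the choice of battery level and domain rating as reference information, and the equal-weight scoring rule are all hypothetical. The sketch also assumes the target utterance has already been reduced to an intent label. The source states only that a quality of service score is calculated for each combination with reference to the reference information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combination:
    device: str   # electronic device able to process the utterance
    domain: str   # domain (application) on that device


# Hypothetical context information; every field name is an assumption.
CONTEXT = {
    "devices": {
        "phone": {"domains": ["music_app_a", "video_app"], "battery": 0.4},
        "speaker": {"domains": ["music_app_a", "music_app_b"], "battery": 1.0},
    },
    "domain_rating": {"music_app_a": 0.9, "music_app_b": 0.6, "video_app": 0.8},
    # Hypothetical table of which intents each domain can serve.
    "supports": {
        "music_app_a": {"play_music"},
        "music_app_b": {"play_music"},
        "video_app": {"play_video"},
    },
}


def generate_combinations(intent, context):
    """List every (device, domain) pair whose domain supports the intent."""
    return [
        Combination(device, domain)
        for device, info in context["devices"].items()
        for domain in info["domains"]
        if intent in context["supports"].get(domain, set())
    ]


def determine_reference_info(context):
    """Pick the parts of the context used for scoring (assumed selection)."""
    return {
        "battery": {d: i["battery"] for d, i in context["devices"].items()},
        "domain_rating": context["domain_rating"],
    }


def qos_score(combo, reference):
    """Assumed scoring rule: equal-weight mean of two reference values."""
    return (0.5 * reference["battery"][combo.device]
            + 0.5 * reference["domain_rating"][combo.domain])


def select_target(intent, context):
    """Return the highest-scoring combination, or None if none can serve."""
    combos = generate_combinations(intent, context)
    if not combos:
        return None
    reference = determine_reference_info(context)
    return max(combos, key=lambda c: qos_score(c, reference))


target = select_target("play_music", CONTEXT)
print(target)
```

Because the device and the domain are scored together, a well-rated application on a well-suited device can win over the same application on a less suitable device, which is the point the summary makes against choosing the device first.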

According to various embodiments, an intelligent server and an electronic device may be provided that process utterances in consideration of the service quality of the electronic device and the application.

According to various implementations, a better user experience may be provided by classifying user intentions based on a combination of electronic devices and applications without having to classify user intentions for each electronic device.

According to various embodiments, by classifying user intentions based on a combination of electronic devices and applications instead of classifying user intentions for each electronic device, the configuration of the user intention classifier may be simplified, learning time may be reduced, and consistent responses may be possible.

Other aspects, advantages and salient features of the disclosure will be apparent to those skilled in the art from the following detailed description disclosing various embodiments of the disclosure with reference to the accompanying drawings.

FIG. 1 is a block diagram of an electronic device in a network environment according to an embodiment of the disclosure.

FIG. 2 is a block diagram illustrating an integrated intelligence system according to an embodiment of the disclosure.

FIG. 3 is a diagram illustrating a user terminal displaying a screen for processing a voice input received through an intelligent app, according to an embodiment of the disclosure.

FIG. 4 is a diagram illustrating a form in which relationship information between concepts and actions is stored in a database according to an implementation of the disclosure.

FIG. 5 is a block diagram illustrating an electronic device and an intelligent server, according to an embodiment of the disclosure.

FIGS. 6, 7, 8, and 9 are diagrams for explaining an operation of processing user utterance according to various embodiments of the present disclosure.

FIG. 10 is a flowchart illustrating an utterance processing operation of an intelligent server according to an embodiment of the present disclosure.

Like reference numbers are used throughout the drawings to indicate like elements.

With reference to the accompanying drawings, the following descriptions are provided to aid in a thorough understanding of various embodiments of the disclosure as defined by the claims and equivalents thereof. It contains numerous specific details to aid understanding, but is to be understood as merely illustrative. Accordingly, those skilled in the art will appreciate that various changes and modifications may be made to the various embodiments described herein without departing from the spirit and scope of the disclosure. Also, descriptions of well-known functions and structures may be omitted for clarity and conciseness.

The terms and words used in the following description and claims are not limited to their bibliographical meanings, but are used by the inventors only for a clear and consistent understanding of the disclosure. Accordingly, the description below of various embodiments of the disclosure is provided for purposes of explanation only and is not intended to limit the disclosure as defined by the appended claims and equivalents thereof.

Unless the context clearly dictates otherwise, the singular forms such as "a", "an" and "the" are understood to include the plural forms as well. Thus, for example, “a component surface” includes reference to one or more such surfaces.

Electronic Devices and Intelligent Servers

Referring to FIG. 1, in a network environment 100, an electronic device 101 may communicate with an electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or may communicate with at least one of an electronic device 104 or a server 108 through a second network 199 (e.g., a long-range wireless communication network). According to one implementation, the electronic device 101 may communicate with the electronic device 104 through the server 108. According to an embodiment, the electronic device 101 may include a processor 120, a memory 130, an input module 150, a sound output module 155, a display module 160, an audio module 170, a sensor module 176, an interface 177, a connection terminal 178, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197. In some implementations, at least one of these components (e.g., the connection terminal 178) may be omitted from the electronic device 101, or one or more other components may be added. In some implementations, some of these components (e.g., the sensor module 176, the camera module 180, or the antenna module 197) may be integrated into a single component (e.g., the display module 160).

The processor 120 may, for example, execute software (e.g., the program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or computations. According to one implementation, as at least part of the data processing or computation, the processor 120 may store commands or data received from another component (e.g., the sensor module 176 or the communication module 190) in the volatile memory 132, process the commands or data stored in the volatile memory 132, and store resultant data in the non-volatile memory 134. According to one embodiment, the processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) or an auxiliary processor 123 (e.g., a graphics processing unit, a neural processing unit (NPU), an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor 121. For example, when the electronic device 101 includes the main processor 121 and the auxiliary processor 123, the auxiliary processor 123 may use less power than the main processor 121 or may be configured to be specialized for a designated function. The auxiliary processor 123 may be implemented separately from or as part of the main processor 121.

The auxiliary processor 123 may, for example, control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display module 160, the sensor module 176, or the communication module 190) in place of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application execution) state. According to one embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as part of another functionally related component (e.g., the camera module 180 or the communication module 190). According to one embodiment, the auxiliary processor 123 (e.g., a neural processing unit) may include a hardware structure specialized for processing an artificial intelligence model. The artificial intelligence model may be created through machine learning. Such learning may be performed, for example, in the electronic device 101 itself, where the artificial intelligence model is executed, or may be performed through a separate server (e.g., the server 108). The learning algorithm may include, for example, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, but is not limited to these examples. The artificial intelligence model may include a plurality of artificial neural network layers. The artificial neural network may be one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or a deep Q-network, or a combination of two or more of the foregoing, but is not limited to these examples. The artificial intelligence model may additionally or alternatively include a software structure in addition to the hardware structure.

The memory 130 may store various data used by at least one component (eg, the processor 120 or the sensor module 176) of the electronic device 101 . The data may include, for example, input data or output data for software (eg, program 140) and commands related thereto. The memory 130 may include volatile memory 132 or non-volatile memory 134 .

The program 140 may be stored as software in the memory 130 and may include, for example, an operating system 142 , middleware 144 , or an application 146 .

The input module 150 may receive a command or data to be used by a component (eg, the processor 120) of the electronic device 101 from the outside of the electronic device 101 (eg, a user). The input module 150 may include, for example, a microphone, a mouse, a keyboard, a key (eg, a button), or a digital pen (eg, a stylus pen).

The sound output module 155 may output sound signals to the outside of the electronic device 101 . The sound output module 155 may include, for example, a speaker or a receiver. The speaker can be used for general purposes such as multimedia playback or recording playback. A receiver may be used to receive an incoming call. According to one embodiment, the receiver may be implemented separately from the speaker or as part of it.

The display module 160 may visually provide information to the outside of the electronic device 101 (eg, a user). The display module 160 may include, for example, a display, a hologram device, or a projector and a control circuit for controlling the device. According to an embodiment, the display module 160 may include a touch sensor configured to detect a touch or a pressure sensor configured to measure the intensity of force generated by the touch.

The audio module 170 may convert sound into an electrical signal or vice versa. According to one embodiment, the audio module 170 may acquire sound through the input module 150, or may output sound through the sound output module 155 or an external electronic device (e.g., the electronic device 102, such as a speaker or headphones) connected directly or wirelessly to the electronic device 101.

The sensor module 176 may detect an operating state (e.g., power or temperature) of the electronic device 101 or an external environmental state (e.g., a user state), and generate an electrical signal or data value corresponding to the detected state. According to one embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, a Hall sensor, or an illuminance sensor.

The interface 177 may support one or more designated protocols that may be used to directly or wirelessly connect the electronic device 101 to an external electronic device (eg, the electronic device 102). According to one embodiment, the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.

The connection terminal 178 may include a connector through which the electronic device 101 may be physically connected to an external electronic device (eg, the electronic device 102). According to one embodiment, the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (eg, a headphone connector).

The haptic module 179 may convert electrical signals into mechanical stimuli (eg, vibration or motion) or electrical stimuli that a user may perceive through tactile or kinesthetic senses. According to one embodiment, the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.

The camera module 180 may capture still images and moving images. According to one embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.

The power management module 188 may manage power supplied to the electronic device 101 . According to one implementation, the power management module 188 may be implemented as at least part of a power management integrated circuit (PMIC), for example.

The battery 189 may supply power to at least one component of the electronic device 101 . According to one implementation, battery 189 may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.

The communication module 190 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and performing communication through the established communication channel. The communication module 190 may include one or more communication processors that operate independently of the processor 120 (e.g., an application processor) and support direct (e.g., wired) communication or wireless communication. According to one embodiment, the communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module). Among these communication modules, a corresponding communication module may communicate with the external electronic device 104 through the first network 198 (e.g., a short-range communication network such as Bluetooth, wireless fidelity (WiFi) direct, or infrared data association (IrDA)) or the second network 199 (e.g., a telecommunications network such as a legacy cellular network, a 5G network, a next-generation communication network, the Internet, or a computer network (e.g., a LAN or a WAN)). These various types of communication modules may be integrated as one component (e.g., a single chip) or implemented as a plurality of separate components (e.g., multiple chips). The wireless communication module 192 may identify or authenticate the electronic device 101 within a communication network such as the first network 198 or the second network 199 using subscriber information (e.g., an international mobile subscriber identity (IMSI)) stored in the subscriber identification module 196.

The wireless communication module 192 may support a 5G network, which follows the 4G network, and next-generation communication technology, for example, new radio (NR) access technology. The NR access technology may support high-speed transmission of high-capacity data (enhanced mobile broadband (eMBB)), minimization of terminal power and access by multiple terminals (massive machine type communications (mMTC)), or high reliability and low latency (ultra-reliable and low-latency communications (URLLC)). The wireless communication module 192 may support, for example, a high frequency band (e.g., a millimeter wave (mmWave) band) to achieve a high data rate. The wireless communication module 192 may support various technologies for securing performance in a high frequency band, such as beamforming, massive multiple-input and multiple-output (MIMO), full-dimensional MIMO (FD-MIMO), an array antenna, analog beamforming, or a large-scale antenna. The wireless communication module 192 may support various requirements defined for the electronic device 101, an external electronic device (e.g., the electronic device 104), or a network system (e.g., the second network 199). According to one embodiment, the wireless communication module 192 may support a peak data rate (e.g., 20 Gbps or more) for realizing eMBB, loss coverage (e.g., 164 dB or less) for realizing mMTC, or U-plane latency (e.g., 0.5 ms or less for each of downlink (DL) and uplink (UL), or 1 ms or less round trip) for realizing URLLC.
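The example figures in the preceding paragraph can be written as simple threshold checks. The following Python sketch is illustrative only: the `LinkMeasurement` type and its field names are assumptions introduced here, and the thresholds are the example values quoted in the text (20 Gbps, 164 dB, 0.5 ms per direction, 1 ms round trip), not a normative specification.

```python
from dataclasses import dataclass


@dataclass
class LinkMeasurement:
    # Hypothetical container for measured link properties.
    peak_data_rate_gbps: float
    loss_coverage_db: float
    dl_latency_ms: float
    ul_latency_ms: float
    round_trip_ms: float


def meets_embb(m: LinkMeasurement) -> bool:
    """eMBB example: peak data rate of 20 Gbps or more."""
    return m.peak_data_rate_gbps >= 20.0


def meets_mmtc(m: LinkMeasurement) -> bool:
    """mMTC example: loss coverage of 164 dB or less."""
    return m.loss_coverage_db <= 164.0


def meets_urllc(m: LinkMeasurement) -> bool:
    """URLLC example: DL and UL each 0.5 ms or less, or round trip 1 ms or less."""
    per_direction = m.dl_latency_ms <= 0.5 and m.ul_latency_ms <= 0.5
    return per_direction or m.round_trip_ms <= 1.0


sample = LinkMeasurement(25.0, 160.0, 0.4, 0.4, 0.9)
print(meets_embb(sample), meets_mmtc(sample), meets_urllc(sample))
```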

The antenna module 197 may transmit signals or power to, or receive them from, the outside (e.g., an external electronic device). According to one embodiment, the antenna module 197 may include an antenna including a radiator formed of a conductor or a conductive pattern formed on a substrate (e.g., a PCB). According to one embodiment, the antenna module 197 may include a plurality of antennas (e.g., an array antenna). In this case, at least one antenna suitable for a communication method used in a communication network such as the first network 198 or the second network 199 may be selected from the plurality of antennas by, for example, the communication module 190. A signal or power may be transmitted or received between the communication module 190 and an external electronic device through the selected at least one antenna. According to some implementations, other components (e.g., a radio frequency integrated circuit (RFIC)) may be additionally formed as a part of the antenna module 197 in addition to the radiator.

According to various implementations, the antenna module 197 may form a mmWave antenna module. According to one embodiment, the mmWave antenna module may include a printed circuit board, an RFIC disposed on or adjacent to a first surface (e.g., a bottom surface) of the printed circuit board and capable of supporting a designated high frequency band (e.g., the mmWave band), and a plurality of antennas (e.g., array antennas) disposed on or adjacent to a second surface (e.g., a top surface or a side surface) of the printed circuit board and capable of transmitting or receiving signals of the designated high frequency band.

At least some of the components may be connected to each other through a communication method between peripheral devices (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.

According to one implementation, commands or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199. Each of the external electronic devices 102 or 104 may be the same as or different from the electronic device 101. According to one embodiment, all or part of the operations executed in the electronic device 101 may be executed in one or more of the external electronic devices 102, 104, or 108. For example, when the electronic device 101 needs to perform a certain function or service automatically or in response to a request from a user or another device, the electronic device 101 may, instead of or in addition to executing the function or service by itself, request one or more external electronic devices to perform at least part of the function or service. The one or more external electronic devices receiving the request may execute at least a part of the requested function or service, or an additional function or service related to the request, and deliver the execution result to the electronic device 101. The electronic device 101 may provide the result, as it is or after additional processing, as at least part of a response to the request. To this end, for example, cloud computing, distributed computing, mobile edge computing (MEC), or client-server computing technology may be used. The electronic device 101 may provide an ultra-low latency service using, for example, distributed computing or mobile edge computing. In one embodiment, the external electronic device 104 may include an internet of things (IoT) device. The server 108 may be an intelligent server using machine learning and/or neural networks. According to one implementation, the external electronic device 104 or the server 108 may be included in the second network 199. The electronic device 101 may be applied to intelligent services (e.g., smart home, smart city, smart car, or health care) based on 5G communication technology and IoT-related technology.
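The delegation described above, where the device either runs a function itself or asks an external device and then returns the result as is or after further processing, might look like the following Python sketch. All names (`perform`, `local`, `externals`, `postprocess`) are hypothetical, the external devices are modeled as plain callables standing in for network requests, and only the "instead of" case is covered, not the "in addition to" case.

```python
from typing import Callable, Optional, Sequence


def perform(request: str,
            local: Optional[Callable[[str], str]],
            externals: Sequence[Callable[[str], Optional[str]]],
            postprocess: Callable[[str], str] = lambda r: r) -> str:
    """Run a function locally, or ask external devices to run it instead."""
    if local is not None:
        return postprocess(local(request))
    for external in externals:
        result = external(request)   # stands in for a network request
        if result is not None:       # this external device handled it
            return postprocess(result)
    raise RuntimeError(f"no device could handle {request!r}")


def busy_device(request: str) -> Optional[str]:
    return None                      # declines the request


def server(request: str) -> Optional[str]:
    return request.upper()           # toy "execution result"


print(perform("weather", None, [busy_device, server]))
```

Passing a `postprocess` callable corresponds to providing the result after additional processing; the default returns it unchanged.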

FIG. 2 is a block diagram illustrating an integrated intelligence system according to an embodiment of the present disclosure.

Referring to FIG. 2, the integrated intelligence system 20 may include an electronic device 101, an intelligent server 200, and a service server 300.

The electronic device 101 may be a terminal device (or electronic device) connectable to the Internet, for example, a mobile phone, a smart phone, a personal digital assistant (PDA), a laptop computer, a television (TV), a white-goods appliance, a wearable device, an HMD, or a smart speaker.

As shown in FIG. 2, the electronic device 101 may include an interface 177, a microphone 150-1, a speaker 155-1, a display module 160, a memory 130, or a processor 120. The components listed above may be operatively or electrically connected to each other. The microphone 150-1 may be included in an input module (e.g., the input module 150 of FIG. 1). The speaker 155-1 may be included in a sound output module (e.g., the sound output module 155 of FIG. 1).

The interface 177 may be connected to an external device to transmit/receive data. The microphone 150-1 may receive sound (eg, user's speech) and convert it into an electrical signal. The speaker 155-1 may output an electrical signal as sound (eg, voice). Display module 160 may be configured to display images or video. The display module 160 according to an embodiment may also display a graphic user interface (GUI) of an app (or application program) being executed.

The memory 130 may store a client module 151 , a software development kit (SDK) 153 , and a plurality of apps 146 . The client module 151 and the SDK 153 may constitute a framework (or solution program) for performing general functions. Also, the client module 151 or the SDK 153 may configure a framework for processing voice input.

The plurality of apps 146 in the memory 130 may be programs for performing designated functions. The plurality of apps 146 may include a first app 146-1 and a second app 146-2. Each of the plurality of apps 146 may include a plurality of operations for performing a designated function. For example, the apps may include an alarm app, a message app, and/or a schedule app. The plurality of apps 146 may be executed by the processor 120 to sequentially execute at least some of the plurality of operations.

The processor 120 may control overall operations of the electronic device 101 . For example, the processor 120 may be electrically connected to the interface 177, the microphone 150-1, the speaker 155-1, and the display module 160 to perform a designated operation.

The processor 120 may also execute a program stored in the memory 130 to perform a designated function. For example, the processor 120 may execute at least one of the client module 151 and the SDK 153 to perform the following operation for processing a voice input. The processor 120 may control operations of the plurality of apps 146 through the SDK 153, for example. The following operations described as operations of the client module 151 or the SDK 153 may be operations performed by the processor 120 .

The client module 151 may receive voice input. For example, the client module 151 may receive a voice signal corresponding to a user's speech detected through the microphone 150-1. The client module 151 may transmit the received voice input to the intelligent server 200. The client module 151 may transmit state information of the electronic device 101 to the intelligent server 200 together with the received voice input. The state information may be, for example, execution state information of an app.
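As a rough illustration of the client module sending a voice input together with state information, the sketch below builds a JSON message in Python. The message layout, the field names (`device_id`, `voice_input`, `state`, `running_app`), and the use of base64 and JSON are all assumptions made for this example; the source does not specify a wire format.

```python
import base64
import json


def build_voice_request(audio: bytes, running_app: str, device_id: str) -> str:
    """Bundle the captured voice signal with app execution state."""
    payload = {
        "device_id": device_id,
        # Raw audio is base64-encoded so it can travel inside JSON text.
        "voice_input": base64.b64encode(audio).decode("ascii"),
        # State information, e.g. which app is currently running.
        "state": {"running_app": running_app},
    }
    return json.dumps(payload)


message = build_voice_request(b"\x00\x01\x02", "schedule_app", "device-101")
decoded = json.loads(message)
print(decoded["state"])
```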

The client module 151 may receive a result corresponding to the received voice input. For example, the client module 151 may receive a result corresponding to the received voice input when the intelligent server 200 can calculate a result corresponding to the received voice input. The client module 151 may display the received result on the display module 160 .

The client module 151 may receive a plan corresponding to the received voice input. The client module 151 may display on the display module 160 a result of executing a plurality of operations of the app according to the plan. For example, the client module 151 may sequentially display execution results of a plurality of operations on the display module 160 . For another example, the electronic device 101 may display on the display module 160 only some results of executing a plurality of operations (eg, a result of the last operation).

The client module 151 may receive a request for obtaining information necessary for calculating a result corresponding to a voice input from the intelligent server 200 . According to one embodiment, the client module 151 may transmit the necessary information to the intelligent server 200 in response to the request.

The client module 151 may transmit information as a result of executing a plurality of operations according to a plan to the intelligent server 200 . The intelligent server 200 can confirm that the received voice input has been correctly processed using the result information.

The client module 151 may include a voice recognition module. Through the voice recognition module, the client module 151 may recognize voice inputs that perform only a limited set of functions. For example, in response to a designated input (e.g., “wake up!”), the client module 151 may execute an intelligent app that processes voice input to perform an organic operation.

The intelligent server 200 may receive information related to a user's voice input from the electronic device 101 through a communication network. The intelligent server 200 may change data related to the received voice input into text data. The intelligent server 200 may generate a plan for performing a task corresponding to a user voice input based on the text data.

The plan may be generated by an artificial intelligence (AI) system. The artificial intelligence system may be a rule-based system or a neural network-based system (e.g., a feedforward neural network (FNN) or a recurrent neural network (RNN)). Alternatively, it may be a combination of the foregoing or a different artificial intelligence system. According to one implementation of the disclosure, the plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, the artificial intelligence system may select at least one plan from a plurality of predefined plans.

The intelligent server 200 may transmit a result according to the generated plan to the electronic device 101 or transmit the generated plan to the electronic device 101 . The electronic device 101 may display the result according to the plan on the display module 160 . The electronic device 101 may display the result of executing the operation according to the plan on the display module 160 .

The intelligent server 200 may include a front end 210, a natural language platform 220, a capsule database (capsule DB) 230, an execution engine 240, an end user interface 250, a management platform 260, a big data platform 270, or an analytic platform 280.

The front end 210 may receive a voice input received from the electronic device 101 . The front end 210 may transmit a response corresponding to the voice input.

The natural language platform 220 may include an automatic speech recognition module (ASR module) 221, a natural language understanding module (NLU module) 223, a planner module 225, a natural language generator module (NLG module) 227, or a text-to-speech module (TTS module) 229.

The automatic speech recognition module 221 may convert the voice input received from the electronic device 101 into text data. The natural language understanding module 223 may determine the user's intention using the text data of the voice input. For example, the natural language understanding module 223 may determine the user's intention by performing syntactic analysis or semantic analysis. The natural language understanding module 223 may determine the user's intention by identifying the meaning of words extracted from the voice input using linguistic features (e.g., grammatical elements) of morphemes or phrases, and matching the identified meanings to an intention.

The planner module 225 may generate a plan using the intent and parameters determined by the natural language understanding module 223 . The planner module 225 may determine a plurality of domains required to perform the task based on the determined intent. The planner module 225 may determine a plurality of operations included in each of the determined plurality of domains based on the intent. The planner module 225 may determine parameters necessary for executing the determined plurality of operations or result values output by the execution of the plurality of operations. The parameter and the resulting value may be defined as a concept of a designated format (or class). Accordingly, the plan may include a plurality of actions and a plurality of concepts determined by the user's intention. The planner module 225 may determine relationships between the plurality of operations and the plurality of concepts in stages (or hierarchically). For example, the planner module 225 may determine an execution order of a plurality of operations determined based on a user's intention based on a plurality of concepts. In other words, the planner module 225 may determine an execution order of the plurality of operations based on parameters required for execution of the plurality of operations and results output by the execution of the plurality of operations. Accordingly, the planner module 225 may generate a plan including a plurality of operations and association information (eg, an ontology) between a plurality of concepts. The planner module 225 may generate a plan using information stored in the capsule database 230 in which a set of relationships between concepts and operations is stored.
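The ordering step described above, in which operations are sequenced according to the concepts they require and produce, can be illustrated with a dependency sort. The following is a minimal sketch only, not the planner module's actual method: the operation names, concept names, and the `execution_order` helper are hypothetical, and a standard topological sort stands in for whatever ordering logic the planner module 225 uses.

```python
from graphlib import TopologicalSorter

# Hypothetical operations; each lists the concepts (parameters) it needs
# and the concepts (result values) it produces. Names are illustrative only.
operations = {
    "send_message": {"inputs": {"draft"}, "outputs": {"send_result"}},
    "compose_message": {"inputs": {"contact", "message_text"}, "outputs": {"draft"}},
    "find_contact": {"inputs": {"contact_name"}, "outputs": {"contact"}},
}


def execution_order(ops):
    """Order operations so each runs after the operations producing its inputs."""
    # Map every produced concept to the operation that outputs it.
    producer = {c: name for name, op in ops.items() for c in op["outputs"]}
    # An operation depends on the producers of its input concepts; concepts
    # with no producer (e.g., values taken from the utterance) add no edge.
    graph = {
        name: {producer[c] for c in op["inputs"] if c in producer}
        for name, op in ops.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(operations)
print(order)
```

In this sketch the concepts act as the edges between operations, which mirrors the statement that the plan contains association information between operations and concepts.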

The natural language generation module 227 may change designated information into text form. The information changed to the text form may be in the form of natural language speech. The text-to-speech conversion module 229 may change text-type information into voice-type information.

According to the disclosure, some or all of the functions of the natural language platform 220 may be implemented in the electronic device 101 as well.

The capsule database 230 may store information about relationships between a plurality of concepts and operations corresponding to a plurality of domains. The capsule may include a plurality of action objects (action objects or action information) and concept objects (concept objects or concept information) included in the plan. The capsule database 230 may store a plurality of capsules in the form of a concept action network (CAN). According to an embodiment, a plurality of capsules may be stored in a function registry included in the capsule database 230.

The capsule database 230 may include a strategy registry in which strategy information necessary for determining a plan corresponding to a voice input is stored. The strategy information may include reference information for determining one plan when there are a plurality of plans corresponding to the voice input. The capsule database 230 may include a follow-up registry in which information on a follow-up action is stored to suggest a follow-up action to the user in a designated situation. The follow-up action may include, for example, a follow-up utterance. The capsule database 230 may include a layout registry that stores layout information of information output through the electronic device 101 . The capsule database 230 may include a vocabulary registry in which vocabulary information included in capsule information is stored. The capsule database 230 may include a dialog registry in which dialog (or interaction) information with a user is stored. The capsule database 230 may update stored objects through a developer tool. The developer tool may include, for example, a function editor for updating action objects or concept objects. The developer tool may include a vocabulary editor for updating vocabulary. The developer tool may include a strategy editor for creating and registering strategies that determine plans. The developer tool may include a dialog editor to create a dialog with the user. The developer tool may include a follow up editor that can activate follow up goals and edit follow up utterances that provide hints. The subsequent goal may be determined based on a currently set goal, a user's preference, or environmental conditions. The capsule database 230 may also be implemented in the electronic device 101 .

The execution engine 240 may calculate a result using the generated plan. The end user interface 250 may transmit the calculated result to the electronic device 101. Accordingly, the electronic device 101 may receive the result and provide the received result to the user. The management platform 260 may manage information used in the intelligent server 200. The big data platform 270 may collect user data. The analytic platform 280 may manage quality of service (QoS) of the intelligent server 200. For example, the analytic platform 280 may manage the components and processing speed (or efficiency) of the intelligent server 200.

The service server 300 may provide a designated service (eg, food ordering (CP service A) 301 or hotel reservation (CP service B) 302 ) to the electronic device 101 . The service server 300 may be a server operated by a third party. The service server 300 may provide information for generating a plan corresponding to the received voice input to the intelligent server 200 . The provided information may be stored in the capsule database 230. In addition, the service server 300 may provide result information according to the plan to the intelligent server 200.

In the integrated intelligence system 20 described above, the electronic device 101 may provide various intelligent services to the user in response to user input. The user input may include, for example, an input through a physical button, a touch input, or a voice input.

The electronic device 101 may provide a voice recognition service through an internally stored intelligent app (or voice recognition app). In this case, for example, the electronic device 101 may recognize a user's utterance or voice input received through the microphone, and provide a service corresponding to the recognized voice input to the user.

The electronic device 101 may perform a designated operation alone or together with the intelligent server 200 and/or the service server 300 based on the received voice input. For example, the electronic device 101 may execute an app corresponding to the received voice input and perform a designated operation through the executed app.

When the electronic device 101 provides a service together with the intelligent server 200 and/or the service server 300, the electronic device 101 may detect user speech using the microphone 150-1 and generate a signal (or voice data) corresponding to the detected user speech. The electronic device 101 may transmit the voice data to the intelligent server 200 through the interface 177.

As a response to the voice input received from the electronic device 101, the intelligent server 200 may generate a plan for performing a task corresponding to the voice input, or a result of performing an operation according to the plan. The plan may include, for example, a plurality of operations for performing a task corresponding to a user's voice input, and a plurality of concepts related to the plurality of operations. The concept may define parameters input to the execution of the plurality of operations or result values output by the execution of the plurality of operations. The plan may include information related to a plurality of operations and a plurality of concepts.

The electronic device 101 may receive the response using the interface 177. The electronic device 101 may output a voice signal generated inside the electronic device 101 to the outside using the speaker 155-1, or may output an image generated inside the electronic device 101 to the outside using the display module 160.

FIG. 3 is a diagram illustrating a screen on which an electronic device processes a voice input received through an intelligent app according to an embodiment of the present disclosure.

The electronic device 101 may execute an intelligent app to process a user input through an intelligent server (eg, the intelligent server 200 of FIG. 2 ).

Referring to FIG. 3, in screen 310, the electronic device 101 may execute an intelligent app for processing a voice input when it recognizes a designated voice input (e.g., “wake up!”) or receives an input through a hardware key (e.g., a dedicated hardware key). The electronic device 101 may, for example, execute the intelligent app while a schedule app is running. The electronic device 101 may display an object (e.g., an icon) 311 corresponding to the intelligent app on a display (e.g., the display module 160 of FIG. 1). The electronic device 101 may receive a voice input by a user's speech. For example, the electronic device 101 may receive a voice input saying “tell me this week's schedule!”. The electronic device 101 may display, on the display, a user interface (UI) 313 (e.g., an input window) of the intelligent app showing text data of the received voice input.

In screen 320, the electronic device 101 may display a result corresponding to the received voice input on the display. For example, the electronic device 101 may receive a plan corresponding to the received user input and display 'this week's schedule' on the display according to the plan.

A capsule database (eg, capsule database 230 of FIG. 2 ) of an intelligent server (eg, intelligent server 200 of FIG. 2 ) may store capsules in a concept action network (CAN) form. The capsule database may store an operation for processing a task corresponding to a user's voice input and parameters necessary for the operation in the form of a concept action network (CAN).

The capsule database may store a plurality of capsules (capsule (A) 401 and capsule (B) 404) corresponding to each of a plurality of domains (eg, applications). One capsule (eg, capsule (A) 401) may correspond to one domain (eg, location (geo), application). In addition, one capsule may correspond to at least one service provider (eg, CP 1 402, CP 2 403, or CP 4 405) for performing a function for a domain related to the capsule. One capsule may include at least one operation 410 and at least one concept 420 for performing a designated function. Other service providers, such as CP 3 406, do not need to correspond to the capsule.

A natural language platform (eg, the natural language platform 220 of FIG. 2 ) may generate a plan for performing a task corresponding to a received voice input using a capsule stored in a capsule database. For example, a planner module (eg, the planner module 225 of FIG. 2 ) of the natural language platform may generate a plan using capsules stored in a capsule database. For example, a plan 407 may be generated using operations 4011 and 4013 and concepts 4012 and 4014 of capsule A 401 and operation 4041 and concept 4042 of capsule B 404.

An electronic device disclosed in this document may be a device of various types. The electronic device may include, for example, a portable communication device (eg, a smart phone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. An electronic device according to an embodiment of the present document is not limited to the aforementioned devices.

Various embodiments of the document and terms used therein are not intended to limit the technical features described in this document to specific embodiments, but should be understood to include various modifications, equivalents, or substitutes of the embodiments. In connection with the description of the drawings, like reference numbers may be used for like or related elements. The singular form of a noun corresponding to an item may include one item or a plurality of items, unless the relevant context clearly dictates otherwise. In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in that phrase, or all possible combinations thereof. Terms such as "first" and "second", or "primary" and "secondary", may be used simply to distinguish a component from other corresponding components, and do not limit the components in other respects (e.g., importance or order). When a (e.g., first) component is referred to as "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively", it means that the component may be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.

The term "module" used in various embodiments of this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logical block, part, or circuit. A module may be an integrally constructed component, or a minimal unit of such a component or a portion thereof, that performs one or more functions. For example, according to one embodiment, the module may be implemented in the form of an application-specific integrated circuit (ASIC).

Various embodiments of this document may be implemented as software (e.g., the program 140) including one or more instructions stored in a storage medium (e.g., internal memory 136 or external memory 138) readable by a machine (e.g., the electronic device 101 of FIG. 1). For example, a processor (e.g., the processor 120) of the machine (e.g., the electronic device 101) may call at least one of the one or more instructions stored in the storage medium and execute it. This enables the machine to be operated to perform at least one function according to the at least one called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); the term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where it is stored temporarily.

According to an implementation of the disclosure, the method according to various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded between sellers and buyers as a commodity. A computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be temporarily stored or temporarily created in a machine-readable storage medium such as the memory of a manufacturer's server, an application store server, or a relay server.

According to various embodiments, each of the above-described components (e.g., a module or a program) may include a single object or a plurality of objects, and some of the plurality of objects may be separately disposed in other components. According to various embodiments, one or more of the aforementioned components or operations may be omitted, or one or more other components or operations may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to the way they were performed by the corresponding component before the integration. According to various implementations, the operations performed by a module, program, or other component may be executed sequentially, in parallel, iteratively, or heuristically; one or more of the operations may be executed in a different order or omitted; or one or more other operations may be added.

FIG. 5 is a block diagram illustrating an electronic device and an intelligent server, according to embodiments of the disclosure.

electronic device

Referring to FIG. 5 , the electronic device 101 may include at least some of the components of the electronic device 101 described with reference to FIG. 1 and the electronic device 101 described with reference to FIG. 2 . The intelligent server 200 of FIG. 5 may include at least some of the components of the intelligent server 200 described with reference to FIG. 2 . In relation to the electronic device 101 and the intelligent server 200 of FIG. 5, descriptions overlapping those described with reference to FIGS. 1 to 4 will be omitted.

Referring to FIG. 5, the electronic device 101 according to an embodiment may include an input module 150 for receiving user speech, a communication module 190 for communicating with the intelligent server 200 that processes the speech, a memory 130 in which computer-executable instructions are stored, and/or a processor 120 that accesses the memory 130 and executes the instructions. According to an embodiment, the electronic device 101, the input module 150, the communication module 190, the memory 130, and/or the processor 120 may correspond to the electronic device 101, the input module 150, the communication module 190, the memory 130, and/or the processor 120 described with reference to FIG. 1. The electronic device 101 may be an electronic device 101 that communicates with the intelligent server 200 described with reference to FIG. 2, and the client module 151 may be included in the memory 130.

The processor 120 may receive user speech through the input module 150 and transmit the user speech and information about the electronic device 101 to the intelligent server 200. The information about the electronic device 101 may include at least one of: specification information of the electronic device 101, such as account information, maximum supported volume information, and/or information on whether it is a professional device; information on whether the electronic device 101 is locked; information about a current location of the electronic device 101; information about a ringtone setting value; and information about an application (app) of the electronic device 101. However, the information is not limited thereto, and the processor 120 may transmit various information about the electronic device 101 to the intelligent server 200.

The processor 120 may transmit the user speech and the information about the electronic device 101 to the intelligent server 200 through the communication module 190, and may output a speech processing result to the user based on a command received from the intelligent server 200.

intelligent server

The intelligent server 200 may include a natural language platform 220, a capsule database 230, a communication module 590, a processor 520, and/or a memory 530. The intelligent server 200 may be the intelligent server 200 described with reference to FIG. 2, and the communication module 590, processor 520, memory 530, natural language platform 220, and/or capsule database 230 may correspond to the configuration of the intelligent server 200 shown in FIG. 2.

The communication module 590 may correspond to the front end 210 of FIG. 2. The processor 520 may receive user speech and information about the electronic device 101 from the electronic device 101 through the communication module 590. Through the communication module 590, the intelligent server 200 may receive information about the electronic devices (e.g., electronic device specification information and information on applications installed in the electronic device) from the electronic device 101 and from other electronic devices 102 and 104 interlocked with the electronic device 101. For example, a user may use various electronic devices, such as an intelligent speaker 102, a smart watch 104, and/or a smart TV, that correspond to the user account of the electronic device 101 (e.g., a smartphone). The intelligent server 200 may receive device specification information and performance information of applications installed in the device from the smartphone 101 as well as from the intelligent speaker 102 and/or the smart watch 104, and may maintain them in the context information 540. The context information 540 may maintain information 541 on the electronic devices 101, 102, and 104 and information 543 on capsules corresponding to each electronic device, for example, information on applications included in the electronic devices.

The processor 520 may generate a processing result of the utterance received from the electronic device 101 and transmit the processing result to the electronic device 101 through the communication module 590 .

As described with reference to FIG. 2, the natural language platform 220 may include an automatic speech recognition module (ASR module) 221, a natural language understanding module (NLU module) 223, a planner module 225, a natural language generation module (NLG module) 227, or a text-to-speech module (TTS module) 229. According to one embodiment, the memory 530 may include the capsule database 230. As described with reference to FIG. 2, the capsule database 230 may store an operation for processing a task corresponding to a user's voice input and parameters necessary for the operation in the form of a concept action network (CAN) 400. The concept action network 400 may be configured as described with reference to FIG. 4.

Context information 540 may be stored in the memory 530 of the intelligent server 200. The context information 540 may include information 541 on the electronic devices 101, 102, and 104 and capsule information 543 corresponding to the electronic devices 101, 102, and 104. As described with reference to FIGS. 2 to 4, the capsule information 543 may correspond to domain (e.g., location (geo) or application) information. A domain is software capable of processing target speech through the electronic device 101, and the domain information may include at least one of an application downloadable to the electronic devices 101, 102, and 104, a program that provides a service in the form of a widget, and a web app.

The context information 540 may be divided into persistent context information, which does not change in real time, and instant context information, which changes in real time. The persistent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether the one or more electronic devices 101, 102, and 104 are professional devices, and performance information of one or more domains. The instant context information may include at least one of user preference information of one or more domains, execution history information of one or more domains, and history information of utterances received by the one or more electronic devices 101, 102, and 104.

The persistent context information may be transmitted from the electronic devices 101, 102, and 104 to the intelligent server 200 when the electronic devices 101, 102, and 104 initially connect to the intelligent server 200, and may be maintained in the memory 530 of the intelligent server 200. The instant context information may be transmitted from the electronic devices 101, 102, and 104 to the intelligent server 200 periodically or upon request.
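The split between persistent and instant context information can be pictured as two records per device with different update paths. The sketch below is illustrative only: the class names, field names, device identifier, and sample values are assumptions chosen to mirror the items listed above, and are not data structures defined in this document.

```python
from dataclasses import dataclass, field


@dataclass
class PersistentContext:
    """Hypothetical record sent once, when a device first connects."""
    network: str
    account: str
    is_professional_device: bool
    domain_performance: dict = field(default_factory=dict)


@dataclass
class InstantContext:
    """Hypothetical record refreshed periodically or upon request."""
    user_preference: dict = field(default_factory=dict)
    execution_history: list = field(default_factory=list)
    utterance_history: list = field(default_factory=list)


class ContextStore:
    """Keeps both kinds of context per device, keyed by a device id."""

    def __init__(self):
        self.persistent = {}
        self.instant = {}

    def on_first_connect(self, device_id, persistent):
        # Persistent context is stored at initial connection and kept.
        self.persistent[device_id] = persistent
        self.instant.setdefault(device_id, InstantContext())

    def on_instant_update(self, device_id, instant):
        # Instant context is replaced whenever the device reports it.
        self.instant[device_id] = instant


store = ContextStore()
store.on_first_connect(
    "speaker-102",
    PersistentContext("wifi", "user@example.com", False, {"music app": "high"}),
)
store.on_instant_update(
    "speaker-102",
    InstantContext(utterance_history=["Play music at maximum"]),
)
```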

The context information may be temporarily stored in a cache included in the memory 530 of the intelligent server 200, and the processor 520 may obtain the cached context information directly from the memory 530 as needed, without receiving it from the electronic device.
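As a rough illustration of reading context from a server-side cache instead of asking the device each time, the following sketch uses a simple in-memory dictionary. The `ContextCache` class, the `request_from_device` callback, and the fetch-on-miss behavior are assumptions made for the example; the document does not specify how the cache is filled or invalidated.

```python
class ContextCache:
    """Hypothetical server-side cache of per-device context information."""

    def __init__(self, request_from_device):
        # Callback used only when the context is not cached yet (assumption).
        self._request_from_device = request_from_device
        self._cache = {}

    def get(self, device_id):
        # Serve from memory when possible; contact the device only on a miss.
        if device_id not in self._cache:
            self._cache[device_id] = self._request_from_device(device_id)
        return self._cache[device_id]


requests = []


def request_from_device(device_id):
    # Stand-in for a network round trip to the electronic device.
    requests.append(device_id)
    return {"max_volume": 15}


cache = ContextCache(request_from_device)
first = cache.get("smartphone-101")
second = cache.get("smartphone-101")  # served from the cache, no new request
```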

In FIG. 5 , the context information 540 and the capsule database 230 are shown separately, but are not limited thereto and the context information 540 may be included in the capsule database 230 .

The memory 530 storing computer-executable instructions and the processor 520 accessing the memory to execute the instructions may correspond to the natural language platform 220 or the execution engine 240 of the intelligent server 200 described with reference to FIG. 2. For example, the processor 520 may generate a plan by referring to the capsule database 230 or the context information 540, as described for the natural language platform 220 in FIG. 2, and may generate a processing result according to the plan, as described for the execution engine 240 in FIG. 2.

The processor 520 may receive target speech from the electronic device 101 through the communication module 590, generate a processing result for the target speech by referring to the natural language platform 220, the capsule database 230, and the context information 540, and transmit the processing result to the electronic device 101.

According to an implementation of the disclosure, the memory 530 may store, as software, a program (e.g., the program 140 of FIG. 1) that generates electronic device-domain combinations capable of processing a target utterance, based on the context information 540 and the target utterance received from any one of the one or more electronic devices 101, 102, and 104, and that determines the target electronic device and the target domain to process the target utterance by calculating a quality of service score for each combination.

According to the disclosure, on-device artificial intelligence (AI) capable of processing speech without communication with the intelligent server 200 may be installed in the electronic device 101. As described with reference to FIGS. 2 to 4, the natural language platform 220 and/or the capsule database 230 may be implemented in the electronic device 101, and the context information 540 may also be included in the memory 130 of the electronic device 101. The memory 130 of the electronic device 101 may store, as software, a program (e.g., the program 140 of FIG. 1) that generates electronic device-domain combinations capable of processing a target utterance, based on the context information 540 and the target utterance received from the user, and that determines the target electronic device and the target domain to process the target utterance by calculating a QoS score for each combination.

When the electronic device 101 is loaded with on-device AI and functions of the intelligent server are implemented in the electronic device 101, only some functions of the intelligent server may be implemented in the electronic device 101. For example, only some components of the natural language platform 220 of the intelligent server 200 described with reference to FIG. 2 (eg, the automatic voice recognition module 221) may be implemented in the electronic device 101. For example, the electronic device 101 may include only the natural language platform 220 of the intelligent server 200, and the capsule database 230 or context information 540 may be maintained in the intelligent server 200.

The processor 520 of the intelligent server 200 may receive a target utterance from any one of the one or more electronic devices 101, 102, and 104 (e.g., the electronic device 101 in FIG. 5) and, based on the target utterance and the context information 540, may generate one or more combinations of electronic device information 541 and capsule information 543 (or domain information) capable of processing the target utterance. For example, based on the natural language platform 220, the capsule database 230, and the context information 540, the processor 520 may generate combinations capable of processing the target utterance "Play music at maximum", such as smartphone-music app, smartphone-media player app, intelligent speaker-music app, intelligent speaker-media player app, smart refrigerator-music app, and smart air conditioner-music app.
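The combination step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the dictionary layout standing in for the context information 540, the intent labels, and the helper `generate_combinations` are all hypothetical, and a simple capability lookup replaces the analysis performed by the natural language platform 220.

```python
# Hypothetical snapshot of context information: each electronic device maps to
# the domains (capsules) installed on it, and each domain lists the intents it
# can handle. All names and intent labels are assumptions for illustration.
CONTEXT = {
    "smartphone": {
        "music app": {"play_music"},
        "media player app": {"play_music", "play_video"},
        "email app": {"send_email"},
    },
    "intelligent speaker": {
        "music app": {"play_music"},
        "media player app": {"play_music"},
    },
    "smart refrigerator": {
        "music app": {"play_music"},
        "recipe app": {"find_recipe"},
    },
    "smart air conditioner": {
        "music app": {"play_music"},
    },
}


def generate_combinations(intent, context):
    """Return every (device, domain) pair whose domain can handle the intent."""
    return [
        (device, domain)
        for device, domains in context.items()
        for domain, intents in domains.items()
        if intent in intents
    ]


# "Play music at maximum" is assumed to have been classified as "play_music".
combos = generate_combinations("play_music", CONTEXT)
for device, domain in combos:
    print(f"{device}-{domain}")
```

With this sample data the sketch yields the six combinations listed in the example above; devices are not filtered out first, so a domain on an unlikely device (such as the smart air conditioner) still becomes a candidate.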

The processor 520 may determine, from among the context information 540, reference information for processing the target utterance. For example, for the target utterance “Play music at maximum”, the maximum volume information of the electronic devices 101, 102, and 104 among the electronic device information 541 may be determined as reference information, and the “presence or absence of an amplification function” among the capsule information 543 (or domain information) corresponding to the electronic devices 101, 102, and 104 may be determined as reference information. The reference information may be determined in advance based on the target utterance, and may be determined by analyzing the target utterance with reference to the natural language platform 220.

The processor 520 may determine a target electronic device and a target domain by calculating a QoS score for each of the one or more electronic device-domain combinations with reference to the reference information and determining the combination having the highest QoS score as the target combination.

For each of the one or more combinations, the processor 520 may calculate a quality of service score as the sum of a controllability score, a functionality score, an accessibility score, and a robustness score (function performance stability score), as shown in [Equation 1] below, and may determine the combination with the highest quality of service score as the target combination, as shown in [Equation 2] below.
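Consistent with the description above, the two equations can be written in the following form. The notation is an assumption chosen for illustration: d denotes an electronic device, c denotes a domain (capsule), and K denotes the set of generated electronic device-domain combinations.

```latex
% Equation 1: quality of service score of a combination (d, c)
\mathrm{QoS}(d, c) = \mathrm{Controllability}(d, c) + \mathrm{Functionality}(d, c)
                   + \mathrm{Accessibility}(d, c) + \mathrm{Robustness}(d, c)

% Equation 2: the target combination is the one with the highest score
(d^{*}, c^{*}) = \operatorname*{arg\,max}_{(d, c) \in K} \mathrm{QoS}(d, c)
```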

The processor 520 may determine a high controllability score when the combined controllability of the electronic device and the corresponding domain is high. For example, for the target utterance "record your voice the loudest", the processor 520 may determine a high controllability score for a combination of an electronic device having the maximum input sensitivity and a domain having a sound source amplification function.

The processor 520 may determine a high functionality score when there are many shareable electronic devices or domains. For example, for the target utterance “Share me”, the processor 520 may determine a high functionality score for an electronic device-domain combination with a high sharing frequency and many shareable media.

The processor 520 may determine a high accessibility score when automatic login is applied or there are few authentication steps. For example, for utterances related to personal accounts, such as “send me an email,” the authentication process on a family device such as a TV is more complicated than on a personal device such as a smartphone, so the processor 520 may determine a high accessibility score for the smartphone-email app combination.

The processor 520 may determine a high function performance stability score (robustness score) when there is little conflict between domains in executing a function. For example, if the follow-up utterance "Song title" comes after the utterance "Play 'I'm going to see you now'", a conflict between domains may occur because a movie and a song share the same title. Accordingly, between the smart TV-media app combination and the intelligent speaker-music app combination, the processor 520 may determine a higher function performance stability score for the intelligent speaker-music app combination.

The processor 520 may set different weights for the controllability score, the functionality score, the accessibility score, and the function performance stability score. For example, the processor 520 may calculate a QoS score by setting weights high for an accessibility score and a function performance stability score for utterances such as “~share me” or “~login me”.
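A weighted variant of the score can be sketched as below. This is a hedged illustration only: the `SubScores` type, the weight values, and the sample sub-scores are hypothetical and are not taken from the disclosure. With equal weights the function reduces to the plain sum of the four scores.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SubScores:
    controllability: float
    functionality: float
    accessibility: float
    robustness: float  # function performance stability


# Equal weights reproduce the unweighted sum of the four scores.
EQUAL_WEIGHTS = {
    "controllability": 1.0,
    "functionality": 1.0,
    "accessibility": 1.0,
    "robustness": 1.0,
}


def qos_score(scores, weights=None):
    """Weighted sum of the four sub-scores (weights default to 1.0 each)."""
    weights = EQUAL_WEIGHTS if weights is None else weights
    return sum(weights[name] * value for name, value in asdict(scores).items())


# Assumed weights for an account-related utterance: accessibility and
# function performance stability count double.
account_weights = dict(EQUAL_WEIGHTS, accessibility=2.0, robustness=2.0)

# Made-up sub-scores for two combinations.
smart_tv = SubScores(0.9, 0.9, 0.3, 0.5)
smartphone = SubScores(0.5, 0.5, 0.9, 0.6)

print(qos_score(smart_tv) > qos_score(smartphone))
print(qos_score(smartphone, account_weights) > qos_score(smart_tv, account_weights))
```

In this made-up example the smart TV combination wins under equal weights, while the smartphone combination wins once accessibility and function performance stability are weighted more heavily, which is the effect the weighting is meant to produce.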

However, the QoS score calculation process is not limited to the above-described method, and the processor 520 may determine a target combination composed of the target electronic device and the target domain by calculating the QoS score for each combination in various ways. According to the present disclosure, an internal policy (not shown) for natural language processing may be stored in the memory 530, and the processor 520 may refer to the internal policy when calculating a quality of service (QoS) score. For example, in addition to the above-mentioned controllability score, functionality score, accessibility score, and function performance stability score, a developer may define items related to the quality of service score and add them to the internal policy, and the processor 520 may calculate the quality of service score for an electronic device-domain combination in consideration of the added items.

The processor 520 may transmit, to the target electronic device, a command to process the target utterance in the target domain. The target electronic device may process the utterance in the target domain and output the result to the user.

Instructions stored in the memory 530 or the memory 130 may be implemented as a function module of the operating system 142 or the middleware 144, or as a separate application 146.

Hereinafter, with reference to FIGS. 6 to 9, various implementations will be described in detail in which the processor 120 of the electronic device 101 or the processor 520 of the intelligent server 200 generates electronic device-domain combinations capable of processing a target utterance based on the target utterance received from the user and the context information 540, and determines a target electronic device and a target domain to process the target utterance by calculating a quality of service score for each combination.

FIGS. 6 to 9 are views for explaining an operation of processing a user's utterance, according to various embodiments of the present disclosure.

Referring to FIG. 6 , an embodiment of processing a speech related to music reproduction, for example, a target speech such as “play the music excitedly” is illustrated.

According to the implementation of the disclosure, a user's target utterance "play music excitedly" is input to any one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200. In FIG. 6, only the smartphone 101 and the intelligent speaker 102 are shown for simplicity, but as described with reference to FIG. 5, the target utterance may be input through various electronic devices such as a smart watch or a smart refrigerator and transmitted to the intelligent server 200.

Referring to FIG. 6, situation 610 is a situation in which a target utterance is processed according to the existing method of first determining an electronic device to process the utterance and then determining a domain to process the utterance. The processor 520 of the intelligent server 200 determines a target electronic device to process the received target utterance based on a predefined policy. For example, a device with good wake-up reception sensitivity may be determined as the target electronic device, and priorities among the electronic devices may be determined in advance. According to the predefined policy, the intelligent server 200 may determine the intelligent speaker 102, from among the smartphone 101, the intelligent speaker 102, and a smart TV (not shown), as the target electronic device. After determining the electronic device, the intelligent server 200 may classify the user's intention with a capsule classifier corresponding to the electronic device and determine, among the capsules (or domains) corresponding to the electronic device, a capsule (or domain) capable of processing the target utterance.

However, because this method cannot substantially take into account the service quality of the application that processes the utterance on the target electronic device, and processes the utterance simply based on capability, it may provide processing results that do not actually meet the user's intention. For example, for the target utterance “tell me today's fortune”, the processor 520 of the intelligent server 200 may determine the smart watch as the target electronic device and process the utterance through the smart watch-fortune domain. However, the fortune domain of the smart watch is a domain that generates the processing result "This function cannot be supported", so processing the utterance in this way may provide a service that does not actually meet the user's intention.

In situation 610, the processor 520 may determine the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes utterances related to music, and may determine the 'Music App', which is capable of processing a music play command among the applications included in the intelligent speaker 102, as the target domain. The intelligent speaker 102 may then play music through the music app. However, a method that determines the electronic device first and the domain second may not consider the performance of the application performing the operation on the electronic device. For example, even though the current volume of the intelligent speaker 102 is set to 1 in situation 610, the intelligent server 200 selects the music app of the intelligent speaker 102, so the music may be played quietly on the intelligent speaker 102 (620).

Referring to FIG. 6, situation 650 is a situation in which utterances are processed based on electronic device-domain combinations. Based on the target utterance “play music excitedly” and the context information 540, the processor 520 of the intelligent server 200 may generate one or more combinations of electronic device information and domain information capable of processing the target utterance. Whether the target utterance can be processed may be determined through the natural language platform 220, as in the conventional method. Although only the smartphone 101 and the intelligent speaker 102 are shown in FIG. 6 for brevity, the processor 520 may generate combinations such as smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and/or intelligent speaker-music app for the target utterance “play music excitedly”. In the case of the smart air conditioner-music app combination, a processing result of "This function cannot be supported" is generated, as in the above description of the smart watch, but the processor 520 may still determine it to be a combination capable of processing the target utterance.

The processor 520 may determine, from among the context information 540, reference information for processing the target utterance. In situation 650 of FIG. 6, for “play music excitedly”, the processor 520 may determine, among the electronic device information 541 of the context information 540, information on whether the electronic device is a professional device and information on the current volume of the electronic device as reference information.

The processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 650, the processor 520 may refer to the information on whether the electronic device is a professional device and the current volume information among the context information 540, and calculate a quality of service score for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations. As described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate the QoS score as the sum of these scores.

In situation 650, the processor 520 may determine the combination having the highest QoS score as the target combination. For example, the processor 520 may determine that the QoS score of the smartphone-music app combination is the highest in consideration of the information on whether the device is a professional device and the information on the current volume, and may transmit to the smartphone 101 a command to process the target utterance “play music excitedly” through the music app. The smartphone 101 may play music through the music app based on the command (660).

Referring to FIG. 7 , an implementation of processing a target utterance for reproduction at maximum volume, for example, “play music at maximum” is illustrated.

According to the implementation of the disclosure, a user's target utterance "Play music at maximum" is input to any one of one or more electronic devices, and the target utterance may be transmitted to the intelligent server 200. In FIG. 7, only the smartphone 101 and the intelligent speaker 102 are shown for simplicity, but as described with reference to FIG. 5, the target utterance may be received through various electronic devices such as a smart watch or a smart refrigerator and transmitted to the intelligent server 200.

Referring to FIG. 7, situation 710, like situation 610 described with reference to FIG. 6, is a situation in which the target utterance is processed according to the existing method in which the processor 520 first determines an electronic device to process the utterance and then determines a domain to process the utterance.

The processor 520 may determine the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes utterances related to music, and may determine the 'music app' as the target domain, so that music is played through the music app of the intelligent speaker 102. In situation 710, even though the smartphone 101 and the intelligent speaker 102 both have a maximum volume of 10 and the music app of the smartphone 101 supports an amplification function, the processor 520 may still determine the intelligent speaker 102 as the target electronic device. The processor 520 may determine, through user intention classification among the domains corresponding to the intelligent speaker 102, to process the utterance through the music app, and the intelligent speaker 102 may play music at maximum volume (720).

Referring to FIG. 7, situation 750 is a situation in which utterances are processed based on electronic device-domain combinations. As described for situation 650 with reference to FIG. 6, the processor 520 of the intelligent server 200 may generate, based on the target utterance “play music at maximum” and the context information 540, one or more combinations of electronic device information and capsule information capable of processing the target utterance. Whether the target utterance can be processed may be determined through the natural language platform 220, as in the conventional method. Although only the smartphone 101 and the intelligent speaker 102 are shown in FIG. 7 for brevity, the processor 520 may generate electronic device-domain combinations such as smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app for the target utterance “play music at maximum”.

The processor 520 may determine, from among the context information 540, reference information for processing the target utterance. In situation 750 of FIG. 7, for “play music at maximum”, the processor 520 may determine, among the electronic device information 541 of the context information 540, information on whether the electronic device is a professional device and the maximum volume information of the electronic device as reference information, and may determine, among the capsule information 543 (or corresponding domain information), information on whether the domain has an amplification function as reference information.

The processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 750, the processor 520 may refer to the information on whether the electronic device is a professional device, the maximum volume information, and the domain amplification function information among the context information 540, and calculate a quality of service score for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations. For example, as described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate the QoS score as the sum of these scores.

In situation 750, the processor 520 may determine the combination having the highest QoS score as the target combination. For example, the processor 520 may determine that the smartphone-music app combination has the highest quality of service score, and may transmit to the smartphone 101 a command to process the target utterance "Play music at maximum" through the music app. By additionally using the amplification function of the music app, the smartphone 101 may play music at a higher volume (760) than the volume played (720) in situation 710.
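How device-level and domain-level reference information might jointly drive the choice in situation 750 can be sketched as follows. Only the maximum volume of 10 and the amplification support of the smartphone music app come from the description; the `Combination` type, the `loudness_score` rule, and its bonus values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combination:
    device: str
    domain: str
    professional_device: bool  # from electronic device information
    max_volume: int            # from electronic device information
    amplification: bool        # from capsule (domain) information


def loudness_score(c):
    """Hypothetical rule: reachable loudness dominates, device type adds a little."""
    score = float(c.max_volume)
    if c.amplification:
        score += 5.0  # assumed bonus for a domain with an amplification function
    if c.professional_device:
        score += 1.0  # assumed small bonus for a dedicated audio device
    return score


candidates = [
    Combination("smartphone", "music app", False, 10, True),
    Combination("intelligent speaker", "music app", True, 10, False),
]

target = max(candidates, key=loudness_score)
print(target.device, target.domain, loudness_score(target))
```

Because the score is computed per combination rather than per device, the amplification function of the smartphone's music app can outweigh the speaker's status as a professional device, which a device-first policy could not express.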

Referring to FIG. 8 , an embodiment of processing a target speech for sound quality, for example, “play music with the best quality,” is illustrated.

According to the implementation of the disclosure, a user's target utterance "Play music with the best quality" is input to one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200. In FIG. 8, only the smartphone 101 and the intelligent speaker 102 are shown for simplicity, but as described with reference to FIG. 5, the target utterance may be received through various electronic devices such as a smart watch or a smart refrigerator and transmitted to the intelligent server 200.

Referring to FIG. 8, situation 810, like situation 610 described with reference to FIG. 6 and situation 710 described with reference to FIG. 7, is a situation in which the target utterance is processed according to the existing method in which the processor 520 first determines an electronic device to process the utterance and then determines a domain to process the utterance.

The processor 520 may determine the intelligent speaker 102 as the target electronic device according to a predefined policy, for example, a policy that the intelligent speaker 102, which is a professional device, processes utterances related to music, and may determine a 'music app' as the target domain, so that music is played through the music app of the intelligent speaker 102. In situation 810, the sound quality of the intelligent speaker 102 is better than that of the smartphone 101, but when the sound quality of the application is also considered, the smartphone 101-App 1 combination has the best sound quality. Nevertheless, the processor 520 may determine the intelligent speaker 102 as the target electronic device. The processor 520 may determine, through user intention classification among the domains corresponding to the intelligent speaker 102, to process the utterance with App 1, and music may be played through App 1 on the intelligent speaker 102 (820).

Referring to FIG. 8, situation 850 is a situation in which utterances are processed based on electronic device-domain combinations. As described for situation 650 with reference to FIG. 6 and situation 750 with reference to FIG. 7, the processor 520 of the intelligent server 200 may generate, based on the target utterance “play music with the best quality” and the context information 540, one or more combinations of electronic device information and capsule information capable of processing the target utterance. Whether the target utterance can be processed may be determined through the natural language platform 220, as in the conventional method. In FIG. 8, only the smartphone 101 and the intelligent speaker 102 are shown for simplicity, but the processor 520 may generate electronic device-domain combinations such as smart refrigerator-music app, smartphone-App 1, smartphone-App 2, smart air conditioner-music app, intelligent speaker-App 1, and intelligent speaker-App 2.

The processor 520 may determine, from among the context information 540, reference information for processing the target utterance. In situation 850 of FIG. 8, for “play music with the best quality”, the processor 520 may determine, among the electronic device information 541 of the context information 540, information about the sound quality of the electronic device as reference information, and may determine, among the capsule information 543 (or corresponding domain information), information about the sound quality of the domain as reference information.

The processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 850, the processor 520 may refer to the information about the sound quality of the electronic device and the information about the sound quality of the domain among the context information 540, and calculate a quality of service score for each of the smart refrigerator-music app, smartphone-App 1, smartphone-App 2, smart air conditioner-music app, intelligent speaker-App 1, and intelligent speaker-App 2 combinations. For example, as described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate the QoS score as the sum of these scores.

In situation 850, the processor 520 may determine the combination having the highest QoS score as the target combination. For example, the processor 520 may determine that the smartphone-App 1 combination has the highest quality of service score, and may transmit to the smartphone 101 a command to process the target utterance “Play music with the best quality” through App 1 of the smartphone 101. The smartphone 101 may play music through App 1 (860).

Referring to FIG. 9 , an embodiment of processing a target utterance “What time is it?” is illustrated.

According to an implementation of the disclosure, a user's target utterance “what time is it?” is input to any one of one or more electronic devices, and the target utterance is transmitted to the intelligent server 200. In FIG. 9, only the smartphone 101 and the smart watch 104 are shown for brevity, but as described with reference to FIG. 5, the target utterance may be received through various electronic devices such as an intelligent speaker or a smart refrigerator and transmitted to the intelligent server 200.

Referring to FIG. 9, situation 910, like situation 610 described with reference to FIG. 6, is a situation in which the target utterance is processed according to the existing method in which the processor 520 first determines an electronic device to process the utterance and then determines a domain to process the utterance.

The processor 520 may determine the smart watch 104 as the target electronic device and the 'clock app' as the target domain according to a predefined policy, for example, a policy that an utterance is processed by a nearby device, and the utterance may be processed through the clock app of the smart watch 104 (920). However, this does not consider the performance of the domain, for example, response time, and processing may be relatively slow when the utterance is processed by the smart watch 104.

Situation 950 is a situation in which utterances are processed based on electronic device-domain combinations. As described for situation 650 with reference to FIG. 6, the processor 520 of the intelligent server 200 may generate, based on the target utterance “What time is it?” and the context information 540, one or more combinations of electronic device information and capsule information capable of processing the target utterance. Whether the target utterance can be processed may be determined through the natural language platform 220, as in the conventional method. Although only the smartphone 101 and the smart watch 104 are shown in FIG. 9 for brevity, the processor 520 may generate combinations such as smart refrigerator-clock app, smartphone-clock app, smart air conditioner-clock app, and intelligent speaker-clock app for the target utterance “What time is it now?”.

The processor 520 may determine, from among the context information 540, reference information for processing the target utterance. Information that is determined by the electronic device (e.g., the smartphone 101 or the smart watch 104 of FIG. 9) independently of the domain (e.g., the clock app), such as response time, may be determined as reference information. For example, in situation 950 of FIG. 9, for “what time is it now?”, the processor 520 may determine response time information among the capsule information 543 (or corresponding domain information) of the context information 540 as reference information.

The processor 520 may calculate a QoS score for each of the one or more combinations by referring to the reference information. For example, in situation 950, the processor 520 may refer to the response time information among the context information 540 and calculate a quality of service score for each of the smart refrigerator-clock app, smartphone-clock app, smart air conditioner-clock app, and intelligent speaker-clock app combinations, using the response time information as reference information. For example, referring to situation 950 of FIG. 9, since the response time of the smart watch 104 is 200 ms and the response time of the smartphone 101 is 30 ms, the processor 520 may determine a higher quality of service score for the smartphone-clock app combination than for the smart watch-clock app combination. As described with reference to FIG. 5, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate the QoS score as the sum of these scores.
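The response-time comparison can be illustrated with a short sketch. The 200 ms and 30 ms figures are the ones stated for situation 950; the mapping from response time to a score (`responsiveness_score`) is an assumption, since the description only requires that a shorter response time yield a higher score.

```python
# Response times stated for situation 950 (milliseconds).
response_time_ms = {
    ("smart watch", "clock app"): 200,
    ("smartphone", "clock app"): 30,
}


def responsiveness_score(ms):
    """Assumed mapping to a score in (0, 1]: shorter response times score higher."""
    return 1.0 / (1.0 + ms / 100.0)


scores = {combo: responsiveness_score(ms) for combo, ms in response_time_ms.items()}
target = max(scores, key=scores.get)
print(target)
```

Any monotonically decreasing mapping would select the same combination here; the particular formula only matters once this term is added to the other sub-scores.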

According to an embodiment, in situation 950, the processor 520 may determine the combination having the highest QoS score as the target combination. For example, the processor 520 may determine that the smartphone-clock app combination has the highest QoS score, and may transmit to the smartphone 101 a command to process the target utterance "What time is it?" through the clock app. The smartphone 101 may output a processing result for “what time is it now?” through the clock app (960).

The processor 520 may process the utterance in a way that better matches the user's intention by determining the electronic device and domain to process the utterance with reference to the target utterance and the context information 540. For example, processing conversations related to personal information, such as text messages or phone calls, with a personal device such as a smartphone rather than a shared device such as a smart TV or an intelligent speaker may better match the user's intention. The processor 520 of the intelligent server 200 may determine, through the electronic device information 541 of the context information 540, that an electronic device is a family device when a plurality of accounts are logged into it, and may generate combinations capable of processing the target utterance "Call Mom", such as intelligent speaker-phone app and smartphone-phone app. As described with reference to FIG. 5, the function performance stability (robustness) item may be considered in calculating the quality of service score, and a higher quality of service score may be calculated for the smartphone-phone app combination, the smartphone being a personal device, than for the intelligent speaker-phone app combination. The intelligent server 200 may transmit to the smartphone a command to process “Call Mom” through the phone app, and the utterance may be processed by the phone app of the smartphone.
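The family-device check described above can be sketched as follows. The rule that a device with more than one logged-in account is a family device follows the description; the account lists, the helper names, and the 0/1 bonus are hypothetical.

```python
def is_family_device(logged_in_accounts):
    """A device with more than one distinct logged-in account is a family device."""
    return len(set(logged_in_accounts)) > 1


# Hypothetical account lists taken from electronic device information.
accounts = {
    "intelligent speaker": ["mom", "dad", "user"],
    "smartphone": ["user"],
}


def personal_device_bonus(device, personal_utterance):
    """Assumed rule: for personal utterances, only personal devices get the bonus."""
    if personal_utterance and is_family_device(accounts[device]):
        return 0.0
    return 1.0


# Combinations that can handle "Call Mom".
combos = [("intelligent speaker", "phone app"), ("smartphone", "phone app")]
target = max(combos, key=lambda c: personal_device_bonus(c[0], personal_utterance=True))
print(target)
```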

As the target utterance is processed based on the electronic device-domain combination, the user intent is classified only once rather than for each electronic device, which may improve the handling of demonstrative expressions such as 'this', 'that', and 'the same' in subsequent utterances. As described above with reference to FIG. 5, the processor 520 of the intelligent server 200 processes the utterance with reference to the electronic device information 541 and the capsule information 543 (or domain information) in the context information 540, so processing of a subsequent utterance through another electronic device may be facilitated.

For example, after the utterance "Search Changdeokgung on TV", a follow-up utterance "Search for the same thing on PC (personal computer)" may be transmitted to the intelligent server 200. With the instant context information described with reference to FIG. 5, information indicating that 'Changdeokgung' was searched for on the TV may be included in the context information 540, and in processing the subsequent target utterance "Search for the same thing on PC", the processor 520 may determine that 'the same thing' refers to 'Changdeokgung' from the previous utterance.
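Resolving such a follow-up utterance against instant context can be sketched as below. This is a simplified illustration, not the disclosed method: the `instant_context` layout, the key `last_query`, and the pattern-based substitution are assumptions, whereas an actual system would rely on the natural language platform 220 for this analysis.

```python
import re

# Demonstrative expressions to resolve; longer phrases are listed first so that
# "the same thing" is matched before a bare "this" or "that".
DEMONSTRATIVE = re.compile(r"\b(the same thing|this|that)\b", re.IGNORECASE)


def resolve_reference(utterance, instant_context):
    """Replace the first demonstrative with the most recent query, if one exists."""
    last_query = instant_context.get("last_query")
    if last_query is None:
        return utterance
    # A function is used as the replacement so the query text is inserted verbatim.
    return DEMONSTRATIVE.sub(lambda match: last_query, utterance, count=1)


# Hypothetical instant context recorded after "Search Changdeokgung on TV".
instant_context = {"device": "TV", "last_query": "Changdeokgung"}

resolved = resolve_reference("Search for the same thing on PC", instant_context)
print(resolved)  # Search for Changdeokgung on PC
```

Because the context is kept centrally rather than per device, the follow-up utterance can be resolved even though it targets a different electronic device than the original search.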

As described with reference to FIG. 5, the electronic device 101 may be equipped with on-device AI, and the various operations of the processor 520 described with reference to FIGS. 6 to 9 may be performed by the processor 120 of the electronic device 101 without communication with the intelligent server 200.

How the intelligent server works

Referring to FIG. 10, operations 1010 to 1060 may be performed by the processor 520 of the intelligent server 200 described above with reference to FIG. 5. For conciseness, content that overlaps with the description of FIGS. 1 to 9 may be omitted.

In operation 1010, the processor 520 may receive a target utterance from any one of one or more electronic devices 101, 102, and 104. For example, as described with reference to FIG. 7, the processor 520 may receive the target utterance “play music at maximum”.

In operation 1020, the processor 520 may generate one or more combinations of electronic device information and domain information capable of processing the target utterance based on the target utterance and the context information 540. For example, as described with reference to FIG. 7, the processor 520 may generate electronic device-domain combinations capable of processing the target utterance “play music at maximum”, such as smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app.

In operation 1030, the processor 520 may determine reference information for calculating a quality of service (QoS) score from among the context information 540 based on the target utterance. For example, as described with reference to FIG. 7, for “play music at maximum”, the processor 520 may determine, among the electronic device information 541 of the context information 540, information on whether the electronic device is a professional device and information on the maximum volume of the electronic device as reference information, and may determine, among the capsule information 543 (or corresponding domain information), information on whether the domain has an amplification function as reference information.

In operation 1040, the processor 520 may calculate a QoS score for each of the one or more electronic device information-domain information combinations with reference to the reference information. For example, as described with reference to FIG. 7, the processor 520 may refer to the information on whether the electronic device is a professional device, the maximum volume information, and the domain amplification function information among the context information 540, and calculate a QoS score for each of the smart refrigerator-music app, smartphone-music app, smart air conditioner-music app, and intelligent speaker-music app combinations. As described with reference to FIG. 5, in operation 1040, the processor 520 may determine a controllability score, a functionality score, an accessibility score, and a function performance stability score for each combination, and calculate the QoS score as the sum of these scores.

In operation 1050, the processor 520 may determine a target combination including a target electronic device and a target domain based on the QoS score. For example, as described with reference to FIG. 7, the processor 520 may determine that the QoS score of the smartphone-music app combination is the highest, and may determine that combination as the target combination.
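Operations 1040 and 1050 could be sketched, purely as an example, as a sum of four sub-scores followed by selection of the highest-scoring combination. The `SubScores` type, the helper names, and every numeric value below are placeholders chosen for illustration; the disclosure does not specify how the individual sub-scores are derived or scaled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    controllability: int
    functionality: int
    accessibility: int
    stability: int  # function performance stability


def qos_score(s: SubScores) -> int:
    """QoS score as the plain sum of the four sub-scores."""
    return s.controllability + s.functionality + s.accessibility + s.stability


def select_target(sub_scores: dict) -> tuple:
    """Return the (device, domain) combination with the highest QoS score."""
    qos = {combo: qos_score(s) for combo, s in sub_scores.items()}
    return max(qos, key=qos.get)


# Placeholder sub-scores (assumed values, not taken from the disclosure).
sub_scores = {
    ("smart refrigerator", "music app"): SubScores(2, 1, 1, 2),
    ("smartphone", "music app"): SubScores(3, 3, 3, 3),
    ("smart air conditioner", "music app"): SubScores(2, 1, 1, 1),
    ("intelligent speaker", "music app"): SubScores(3, 3, 2, 3),
}

target_device, target_domain = select_target(sub_scores)
```

With these placeholder values the smartphone-music app combination scores highest, which mirrors the outcome of the FIG. 7 example; different sub-scores would naturally yield a different target combination.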

In operation 1060, the processor 520 may transmit, to the target electronic device, a command to process the target utterance using the target domain. For example, as described with reference to FIG. 7, the processor 520 may transmit, to the smartphone 101, a command to process the target utterance “play music to the maximum” through the music app. The smartphone 101 may then play music at maximum volume through the music app.

Operations similar to operations 1010 to 1060 may be performed by the processor 120 of the electronic device 101. As described above with reference to FIG. 5, on-device artificial intelligence (AI) capable of processing a user utterance without communication with the intelligent server 200 may be installed in the electronic device 101. For example, the on-device AI may be identical or similar in configuration to the natural language platform 220 and the capsule database 230 of the intelligent server 200. The processor 120 may receive a target utterance from the user, determine a target combination composed of a target electronic device and a target domain for processing the target utterance in operations similar to operations 1020 to 1050, and, in an operation similar to operation 1060, transmit to the target electronic device a command to process the target utterance using the target domain.

According to an embodiment of the disclosure, the intelligent server 200 for processing a user utterance may include: a memory 530 storing computer-executable instructions and context information 540, the context information 540 including information 541 on each of one or more electronic devices 101, 102, and 104 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; and a processor 520 that accesses the memory 530 and executes the instructions. The instructions may be configured to: generate, based on the context information 540 and a target utterance received from any one of the one or more electronic devices 101, 102, and 104, one or more combinations of electronic device information 541 and domain information 543 capable of processing the target utterance; determine, from among the context information 540, reference information for processing the target utterance; calculate a quality of service score for each of the one or more combinations with reference to the reference information; determine, based on the quality of service score, a target combination composed of a target electronic device and a target domain corresponding to the target electronic device; and transmit, to the target electronic device, a command to process the target utterance using the target domain.

The instructions may be configured to determine, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device when the target utterance is an utterance related to music playback.

The instructions may be configured to determine, as the reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function when the target utterance is an utterance requesting playback at maximum volume.

The instructions may be configured to determine, as the reference information, information on the sound quality of the electronic device and information on the sound quality of the domain when the target utterance is an utterance about sound quality.
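The three cases above can be read as a lookup from utterance type to the context fields used as reference information. The following Python sketch illustrates that reading only; the utterance-type labels, the field names, and the `select_reference_info` helper are assumptions made for the example, and the classification of an utterance into a type is assumed to happen elsewhere.

```python
# Hypothetical rule table: utterance type -> context fields used as reference info.
REFERENCE_INFO_RULES = {
    "music_playback": {
        "device": ("is_professional_device", "current_volume"),
        "domain": (),
    },
    "max_volume_playback": {
        "device": ("is_professional_device", "max_volume"),
        "domain": ("has_amplification",),
    },
    "sound_quality": {
        "device": ("sound_quality",),
        "domain": ("sound_quality",),
    },
}


def select_reference_info(utterance_type, device_info, domain_info):
    """Pick only the context fields relevant to the given utterance type."""
    rule = REFERENCE_INFO_RULES[utterance_type]
    return {
        "device": {key: device_info[key] for key in rule["device"]},
        "domain": {key: domain_info[key] for key in rule["domain"]},
    }


# Illustrative context entries for one device-domain combination (assumed values).
device_info = {
    "is_professional_device": False,
    "current_volume": 4,
    "max_volume": 15,
    "sound_quality": "high",
}
domain_info = {"has_amplification": True, "sound_quality": "lossless"}

reference = select_reference_info("max_volume_playback", device_info, domain_info)
```

Keeping the rules in a table rather than in branching code is one possible design choice; it lets further utterance types be added without changing the selection logic.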

The context information 540 may include persistent context information that does not change in real time and instant context information that changes in real time.

The persistent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether the one or more electronic devices 101, 102, and 104 are professional devices, and performance information of the one or more domains.

The instant context information may include at least one of user preference information of the one or more domains, execution history information of the one or more domains, and utterance history information received by the one or more electronic devices 101, 102, and 104.
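One way to picture the split between persistent and instant context information is as two separate records, one of which is updated as utterances arrive. The sketch below is illustrative only; the class names, field types, the `record_utterance` method, and the sample values are hypothetical and are not defined by the disclosure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersistentContext:
    # Information that does not change in real time. The dataclass is frozen,
    # so its fields cannot be rebound after construction.
    network_info: dict
    account_info: dict
    is_professional_device: dict  # device name -> bool
    domain_performance: dict


@dataclass
class InstantContext:
    # Information that changes in real time.
    user_preference: dict = field(default_factory=dict)
    execution_history: list = field(default_factory=list)
    utterance_history: list = field(default_factory=list)


@dataclass
class ContextInfo:
    persistent: PersistentContext
    instant: InstantContext = field(default_factory=InstantContext)

    def record_utterance(self, device_name: str, text: str) -> None:
        """Append a received utterance to the instant context."""
        self.instant.utterance_history.append((device_name, text))


# Assumed sample values for illustration.
context = ContextInfo(
    PersistentContext(
        network_info={},
        account_info={},
        is_professional_device={"example device": True},
        domain_performance={},
    )
)
context.record_utterance("smartphone", "play music to the maximum")
```

Here only the instant part is mutated by `record_utterance`, while an attempt to rebind a field of the persistent part would raise an error.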

The domain may be software capable of processing an utterance through the corresponding electronic device, and may include at least one of an application, a program that provides a service in the form of a widget, and a web app.

The instructions may be configured to calculate the quality of service score as a sum of a controllability score, a functionality score, an accessibility score, and a function performance stability score for each of the one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104.

According to an embodiment of the disclosure, a method of processing a user utterance in the intelligent server 200 may include: receiving a target utterance from any one of one or more electronic devices 101, 102, and 104; generating, based on the target utterance and context information 540, one or more combinations of electronic device information 541 and domain information 543 capable of processing the target utterance, wherein the context information 540 includes information 541 on each of the one or more electronic devices 101, 102, and 104 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; determining, from among the context information 540, reference information for processing the target utterance; calculating a quality of service score for each of the one or more combinations with reference to the reference information; determining, based on the quality of service score, a target combination composed of a target electronic device and a target domain corresponding to the target electronic device; and transmitting, to the target electronic device, a command to process the target utterance using the target domain.

The determining of the reference information may include determining, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device when the target utterance is an utterance related to music playback.

The determining of the reference information may include determining, as the reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function when the target utterance is an utterance requesting playback at maximum volume.

The determining of the reference information may include determining, as the reference information, information on the sound quality of the electronic device and information on the sound quality of the domain when the target utterance is an utterance about sound quality.

The persistent context information may include at least one of network information of the one or more electronic devices 101, 102, and 104, account information of the one or more electronic devices 101, 102, and 104, information on whether the one or more electronic devices 101, 102, and 104 are professional devices, and performance information of the one or more domains.

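The instant context information may include at least one of user preference information of the one or more domains, execution history information of the one or more domains, and utterance history information received by the one or more electronic devices 101, 102, and 104.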

According to an embodiment of the disclosure, the electronic device 101 for processing a user utterance may include: a memory 130 storing computer-executable instructions and context information 540, the context information 540 including information 541 on each of one or more electronic devices 101, 102, and 104 including the electronic device 101 and information 543 on one or more domains corresponding to each of the one or more electronic devices 101, 102, and 104; and a processor 120 that accesses the memory 130 and executes the instructions. The instructions may be configured to: generate, based on the context information 540 and a target utterance received by the electronic device 101, one or more combinations of electronic device information 541 and domain information 543 capable of processing the target utterance; determine, from among the context information 540, reference information for processing the target utterance; calculate a quality of service score for each of the one or more combinations with reference to the reference information; determine, based on the quality of service score, a target combination composed of a target electronic device and a target domain corresponding to the target electronic device; and transmit, to the target electronic device, a command to process the target utterance using the target domain.

Although the disclosure has been described and presented with reference to various embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.

Claims

An intelligent server for processing a user utterance, the server comprising:

a memory configured to store context information including information on each of at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device; and

a processor configured to: generate, based on the context information and a target utterance received from any one of the at least one electronic device, at least one combination of electronic device information and domain information capable of processing the target utterance; determine, from among the context information, reference information for processing the target utterance; calculate a quality of service score for each of the at least one combination with reference to the reference information; determine, based on the quality of service score, a target combination of a target electronic device and a target domain corresponding to the target electronic device; and transmit, to the target electronic device, a command to process the target utterance using the target domain.
The server of claim 1, wherein the processor is configured to, when the target utterance is an utterance related to music playback, determine, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device.
The server of claim 1, wherein the processor is configured to, when the target utterance is an utterance requesting playback at maximum volume, determine, as the reference information, information on whether the electronic device is a professional device, maximum volume information of the electronic device, and information on whether the domain has an amplification function.
The server of claim 1, wherein the processor is configured to, when the target utterance is an utterance about sound quality, determine, as the reference information, information on the sound quality of the electronic device and information on the sound quality of the domain.
The server of claim 1, wherein the context information includes persistent context information that does not change in real time and instant context information that changes in real time.
The server of claim 5, wherein the persistent context information includes at least one of network information of the at least one electronic device, account information of the at least one electronic device, information on whether the at least one electronic device is a professional device, and performance information of the at least one domain.
The server of claim 5, wherein the instant context information includes at least one of user preference information of the at least one domain, execution history information of the at least one domain, and utterance history information received by the at least one electronic device.
The server of claim 1, wherein the domain is software capable of processing an utterance through a corresponding electronic device, and the software includes at least one of an application, a program that provides a service in the form of a widget, and a web app.
The server of claim 1, wherein the processor is configured to calculate the quality of service score as a sum of a controllability score, a functionality score, an accessibility score, and a robustness score for each of the at least one domain corresponding to each of the at least one electronic device.
A method for processing a user utterance in an intelligent server, the method comprising:

receiving a target utterance from any one of at least one electronic device;

generating, based on the target utterance and context information, at least one combination of electronic device information and domain information capable of processing the target utterance, wherein the context information includes information on each of the at least one electronic device and information on at least one domain corresponding to each of the at least one electronic device;

determining, from among the context information, reference information for processing the target utterance;

calculating a quality of service score for each of the at least one combination with reference to the reference information;

determining, based on the quality of service score, a target combination of a target electronic device and a target domain corresponding to the target electronic device; and

transmitting, to the target electronic device, a command to process the target utterance using the target domain.
The method of claim 10, wherein the determining of the reference information includes, when the target utterance is an utterance related to music playback, determining, as the reference information, information on whether the electronic device is a professional device and current volume information of the electronic device.
The method of claim 10, wherein the determining of the reference information includes, when the target utterance is an utterance requesting playback at maximum volume, determining, as the reference information, information on whether the electronic device is a professional device, information on the maximum volume of the electronic device, and information on whether the domain has an amplification function.
The method of claim 10, wherein the determining of the reference information includes, when the target utterance is an utterance about sound quality, determining, as the reference information, information on the sound quality of the electronic device and information on the sound quality of the domain.
The method of claim 10, wherein the context information includes persistent context information that does not change in real time and instant context information that changes in real time.
The method of claim 14, wherein the persistent context information includes at least one of network information of the at least one electronic device, account information of the at least one electronic device, information on whether the at least one electronic device is a professional device, and performance information of the at least one domain, and wherein the instant context information includes at least one of user preference information of the at least one domain, execution history information of the at least one domain, and utterance history information received by the at least one electronic device.