WO2018155810A1

WO2018155810A1 - Electronic device, control method therefor, and non-transitory computer readable recording medium

Info

Publication number: WO2018155810A1
Application number: PCT/KR2018/000336
Authority: WO
Inventors: 황인철
Original assignee: 삼성전자 주식회사
Priority date: 2017-02-21
Filing date: 2018-01-08
Publication date: 2018-08-30

Abstract

The present disclosure relates to an artificial intelligence (AI) system utilizing a machine learning algorithm such as deep learning, and applications thereof. In particular, a method for controlling an electronic device of the present disclosure comprises the steps of: receiving a user's voice; obtaining text data from the user's voice; determining a target component and a parameter component from the obtained text data; determining, on the basis of the target component and the parameter component, an action corresponding to the user's voice; if it is determined that the determined action cannot be performed, determining an alternative action to replace the action that was determined on the basis of at least one of the target component and the parameter component; and providing a message for guiding the alternative action.

Description

Electronic device, its control method and non-transitory computer readable recording medium

The present disclosure relates to an electronic device, a control method thereof, and a non-transitory computer readable recording medium. More particularly, it relates to an electronic device that provides a message guiding an alternative operation when an operation corresponding to a user's voice cannot be performed, a control method thereof, and a non-transitory computer readable recording medium.

An artificial intelligence (AI) system is a computer system that implements human-level intelligence. Unlike a conventional rule-based smart system, it is a system in which the machine learns, makes judgments, and becomes smarter on its own. The more an AI system is used, the more its recognition rate improves and the more accurately it understands the user's preferences, so existing rule-based smart systems are gradually being replaced by deep learning-based AI systems.

AI technology consists of machine learning (deep learning) and element technologies that utilize machine learning.

Machine learning is an algorithmic technology that classifies and learns the characteristics of input data by itself. Element technologies use machine learning algorithms such as deep learning and span technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge representation, and motion control.

The various fields in which artificial intelligence technology is applied are as follows. Linguistic understanding is a technology for recognizing, applying, and processing human language and characters, and includes natural language processing, machine translation, dialogue systems, question answering, speech recognition/synthesis, and the like. Visual understanding is a technology that recognizes and processes objects in the manner of human vision, and includes object recognition, object tracking, image retrieval, person recognition, scene understanding, spatial understanding, and image enhancement. Inference/prediction is a technology for judging information and logically inferring and predicting from it, and includes knowledge/probability-based inference, optimization prediction, preference-based planning, and recommendation. Knowledge representation is a technology that automatically processes human experience information into knowledge data, and includes knowledge construction (data generation/classification) and knowledge management (data utilization). Motion control is a technology for controlling the autonomous driving of a vehicle and the movement of a robot, and includes movement control (navigation, collision, driving), operation control (action control), and the like.

Meanwhile, as functions of mobile devices, voice recognition devices, home network hub devices, servers, and the like have been recently improved, the number of users using these devices has increased. In particular, such an electronic device provides an intelligent assistant or virtual personal assistant (VPA) function that recognizes a user's voice and provides corresponding information or performs an operation.

The existing intelligent assistant function provided only an error message when the user's voice could not be interpreted in a form that allows an operation to be performed. In particular, when an operation corresponding to the user's voice is determined but cannot be performed, a user who is given only an error message may not know what voice input to give in order to perform the intended operation.

An object of the present disclosure is to provide an electronic device, a control method thereof, and a non-transitory computer readable recording medium that guide an alternative operation capable of replacing the operation corresponding to the user's voice when that operation cannot be performed.

According to an embodiment of the present disclosure, a method of controlling an electronic device includes: receiving a user voice; obtaining text data from the user voice, and determining a target component and a parameter component from the obtained text data; determining an operation corresponding to the user voice based on the target component and the parameter component; if it is determined that the determined operation cannot be performed, determining an alternative operation for replacing the determined operation based on at least one of the target component and the parameter component; and providing a message for guiding the alternative operation.

Meanwhile, according to an embodiment of the present disclosure for achieving the above object, an electronic device includes: an input unit for receiving a user voice; and a processor configured to obtain text data from the user voice input through the input unit, determine a target component and a parameter component from the obtained text data, determine an operation corresponding to the user voice based on the target component and the parameter component, and, if it is determined that the determined operation cannot be performed, determine an alternative operation for replacing the determined operation based on at least one of the target component and the parameter component and provide a message for guiding the alternative operation.

Meanwhile, in a non-transitory computer readable recording medium storing a program for executing a control method of an electronic device according to an embodiment of the present disclosure for achieving the above object, the control method of the electronic device includes: receiving a user voice; obtaining text data from the user voice, and determining a target component and a parameter component from the obtained text data; determining an operation corresponding to the user voice based on the target component and the parameter component; if it is determined that the determined operation cannot be performed, determining an alternative operation for replacing the determined operation based on at least one of the target component and the parameter component; and providing a message for guiding the alternative operation.

According to the embodiments of the present disclosure described above, by guiding an alternative operation that can replace an operation that cannot be performed, even first-time or unfamiliar users can use the intelligent assistant function more easily and naturally.

FIG. 1 is a block diagram schematically illustrating a configuration of an electronic device according to an embodiment of the present disclosure;

FIG. 2 is a block diagram illustrating a configuration of an electronic device in detail according to an embodiment of the present disclosure;

FIG. 3 is a block diagram illustrating a configuration for performing an intelligent assistant function according to an embodiment of the present disclosure;

FIGS. 4A through 5 illustrate messages for guiding an alternative operation according to an embodiment of the present disclosure;

FIG. 6 is a diagram for describing a method of controlling an electronic device, according to an embodiment of the present disclosure;

FIG. 7 illustrates an intelligent assistant system including a user terminal and a server for performing an intelligent assistant function according to another embodiment of the present disclosure;

FIG. 8 is a sequence diagram illustrating a control method of an intelligent assistant system according to an embodiment of the present disclosure;

FIG. 9 is a block diagram illustrating a configuration of a processor according to an embodiment of the present disclosure;

FIG. 10A is a block diagram illustrating a configuration of a data learning unit according to an exemplary embodiment; and

FIG. 10B is a block diagram illustrating a configuration of an alternative operation determiner according to an exemplary embodiment.

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In describing the present disclosure, when it is determined that a detailed description of a related known function or configuration may unnecessarily obscure the subject matter of the present disclosure, the detailed description thereof will be omitted. The terms to be described below are terms defined in consideration of functions in the present disclosure, and may vary according to a user, an operator, or a custom. Therefore, the definition should be made based on the contents throughout the specification.

Terms including ordinal numbers such as first and second may be used to describe various components, but the components are not limited by the terms. The terms are only used to distinguish one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component. The term and / or includes any one of a plurality of related items or a combination of a plurality of related items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit and/or restrict the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate that a feature, number, operation, component, part, or combination thereof described in the specification exists, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, operations, components, parts, or combinations thereof.

In an embodiment, a 'module' or 'unit' performs at least one function or operation, and may be implemented by hardware, software, or a combination of hardware and software. In addition, a plurality of 'modules' or a plurality of 'units' may be integrated into at least one module and implemented by at least one processor, except for 'modules' or 'units' that need to be implemented by specific hardware.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings. FIG. 1 is a schematic block diagram illustrating a configuration of an electronic device 100 according to an embodiment of the present disclosure. As shown in FIG. 1, the electronic device 100 may provide an intelligent assistant service on its own. When the electronic device 100 provides the intelligent assistant service on its own, it may be implemented as any of various electronic devices, such as a smartphone, a tablet PC, a notebook PC, a desktop PC, a wearable device such as a smart watch, an electronic photo frame, a humanoid robot, an audio device, or a smart TV. As another example, as illustrated in FIG. 7, the electronic device 100 may be implemented as a server that provides an intelligent assistant service to a user in cooperation with an external user terminal 200.

The term 'intelligent assistant', as used herein, refers to a software application that understands a user's language and carries out the instructions the user wants through a combination of artificial intelligence and voice recognition technology. For example, an intelligent assistant may perform artificial intelligence functions such as machine learning (including deep learning), speech recognition, sentence analysis, and situational awareness. The intelligent assistant can learn a user's habits or patterns to provide a service personalized for the individual. Examples of intelligent assistants include S voice and Bixby. Intelligent assistants may also be called virtual personal assistants, interactive agents, and the like.

As shown in FIG. 1, the electronic device 100 includes an input unit 110 and a processor 130.

The input unit 110 receives a user voice. In this case, the input unit 110 may be implemented as a microphone, and may receive a user voice through a microphone. In addition, the input unit 110 may receive text corresponding to the user voice in addition to the user voice.

The processor 130 may control overall operations of the electronic device 100. In detail, the processor 130 may obtain text data from a user voice input through the input unit 110 and determine a target component and a parameter component from the obtained text data. The processor 130 may determine an operation corresponding to the user voice based on the target component and the parameter component. If it is determined that the determined operation cannot be performed, the processor 130 may determine an alternative operation for replacing the determined operation based on at least one of the target component and the parameter component, and may provide a message for guiding the alternative operation.

In detail, the processor 130 may obtain text data corresponding to the user voice by analyzing the user voice input through the input unit 110. The processor 130 may determine a target component and a parameter component from the text data. In this case, the target component may indicate an intention of the user to make a user voice, and the parameter component may indicate specific contents (for example, application type, time, object, etc.) of the user's intended operation.

The processor 130 may determine a task corresponding to the user voice based on the determined target component and parameter component. In this case, the processor 130 may determine the type of the operation corresponding to the user's voice based on the determined target component, and determine the content of the operation corresponding to the user's voice based on the parameter component.

When the operation is determined, the processor 130 may determine whether the determined operation can be performed. In detail, when the type of operation is determined based on the target component, the processor 130 may determine whether the content of the operation determined based on the parameter component is feasible.

If it is determined that the content of the determined operation is not executable, the processor 130 may determine an alternative operation that may replace the determined operation based on at least one of the target component and the parameter component.

In detail, when it is determined that the content of the determined operation cannot be performed, the processor 130 may select, as the alternative operation, one of a plurality of alternative operations that can replace the determined operation, based on the content of the operation determined through the parameter component. In this case, the determined operation and the plurality of alternative operations may be matched with each other and stored.
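The stored matching between a determined operation and its candidate alternative operations can be pictured as a lookup table. The following is a minimal sketch under assumptions: the table name `ALTERNATIVES`, the operation labels, the `channel` key, and the selection rule (return the first stored candidate that is not identical to the failed operation) are hypothetical and are not specified in the disclosure.

```python
from typing import Optional

# Hypothetical table: each operation type is stored together with
# alternative operations that may replace it.
ALTERNATIVES = {
    "send_photo": [
        {"type": "send_photo", "channel": "chat"},
        {"type": "send_screen_capture", "channel": "message"},
    ],
}


def pick_alternative(op_type: str, content: dict) -> Optional[dict]:
    """Return one stored alternative for a failed operation, or None."""
    for candidate in ALTERNATIVES.get(op_type, []):
        # Skip a candidate that is identical to the operation that failed.
        same_type = candidate["type"] == op_type
        same_channel = candidate["channel"] == content.get("channel")
        if same_type and same_channel:
            continue
        return candidate
    return None
```

For instance, `pick_alternative("send_photo", {"channel": "message"})` selects the chat-based candidate, while an operation type with no stored entry yields `None`.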

In addition, when it is determined that the content of the determined operation cannot be performed, the processor 130 may determine the alternative operation by inputting the content of the determined operation into a trained alternative operation determination model. In this case, the alternative operation determination model is a model for recognizing an alternative operation that can replace a specific operation, and may be built in advance.

In addition, the processor 130 may process and provide a message for guiding an alternative operation in a natural language form. In this case, when the electronic device 100 is implemented in the form of a smart phone, the processor 130 may provide a message through a display. In addition, when the electronic device 100 is implemented as a server, the processor 130 may provide a message to an external user terminal.

FIG. 2 is a block diagram illustrating a configuration of an electronic device 100 according to an embodiment of the present disclosure in detail. Referring to FIG. 2, the electronic device 100 may include an input unit 110, a display 120, a processor 130, a voice output unit 140, a communication unit 150, and a memory 160. In addition to the components illustrated in the embodiment of FIG. 2, the electronic device 100 may include various components such as an image receiving unit (not shown), an image processing unit (not shown), a power supply unit (not shown), and the like. In addition, the electronic device 100 is not necessarily limited to being implemented by including all the configurations shown in FIG. 2. For example, when the electronic device 100 is implemented as a server, the display 120 and the voice output unit 140 may not be provided.

The input unit 110 may receive a user voice. In particular, the input unit 110 may include a voice input unit (eg, a microphone) that receives a user voice.

The voice input unit may receive a user voice spoken by the user. For example, the voice input unit may be built into the top, front, or side of the electronic device 100, or may be provided as a separate unit connected to the electronic device 100 through a wired or wireless interface.

The voice input unit may include a plurality of voice input units to generate voice signals by receiving voices from different locations. Using the plurality of voice signals, the electronic device 100 may generate a single voice signal that is enhanced in a pre-processing process before performing the voice recognition function. Specifically, the voice input unit includes a microphone, an analog-to-digital converter (ADC), an energy determiner, a noise remover, and a voice signal generator.

The microphone receives an analog audio signal including the user's voice. The ADC converts the multichannel analog signal input from the microphone into a digital signal. The energy determiner calculates the energy of the converted digital signal and determines whether it is greater than or equal to a predetermined value. If the energy of the digital signal is greater than or equal to the predetermined value, the energy determiner passes the digital signal to the noise remover; if it is less than the predetermined value, the energy determiner does not output the digital signal and waits for another input. As a result, the entire audio processing pipeline is not activated by sounds other than a voice signal, and unnecessary power consumption can be prevented. When the digital signal is input to the noise remover, the noise remover removes the noise component from the digital signal, which contains both the noise component and the user voice component. In this case, the noise component is a sudden noise that may occur in a home environment, and may include an air conditioner sound, a vacuum cleaner sound, a music sound, and the like. The noise remover outputs the digital signal from which the noise component has been removed to the voice signal generator. The voice signal generator uses a localization/speaker tracking module to track the location of the user's utterance within a 360° range around the voice input unit and obtain direction information on the user's voice. Using the noise-removed digital signal and the direction information on the user's voice, the voice signal generator extracts the target sound source within the 360° range around the voice input unit through a target spoken sound extraction module.
When the voice input unit is wirelessly connected to the electronic device 100, the voice signal generator converts the user voice into a user voice signal in a form suitable for transmission and sends it to the main body of the electronic device 100 over the wireless interface.
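The energy check described above acts as a gate in front of the rest of the audio pipeline. The sketch below illustrates that gating step only. It assumes mean squared amplitude as the energy measure and uses hypothetical names (`frame_energy`, `energy_gate`); the disclosure does not state how energy is computed or what the predetermined value is.

```python
from typing import Optional, Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of digital samples (assumed measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def energy_gate(samples: Sequence[float], threshold: float) -> Optional[Sequence[float]]:
    """Pass the frame on when its energy reaches the threshold, else drop it.

    Returning None stands for "output nothing and wait for another input",
    so later stages (noise removal, source extraction) are never invoked.
    """
    if frame_energy(samples) >= threshold:
        return samples
    return None
```

A frame such as `[0.5, -0.5]` has energy 0.25 and passes a threshold of 0.1, whereas a near-silent frame is dropped before any further processing runs.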

In addition, the input unit 110 may receive various types of user commands in addition to the user voice. For example, the input unit 110 may receive a user command for selecting one of a plurality of candidate operations displayed on the guide UI. In addition, the input unit 110 may be implemented as a button, a motion recognition device, a touch pad, or the like. In addition, when the input unit 110 is implemented as a touch pad, the touch panel and the display 120 may be coupled to each other to form a touch screen in which a mutual layer structure is formed. The touch screen may detect a touch input position, an area, a pressure of the touch input, and the like.

The display 120 may display various guides, image contents, information, UIs, and the like provided by the electronic device 100. The display 120 may be implemented as a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display panel (PDP), or the like, and may display the various screens that the electronic device 100 can provide.

The display 120 may provide an image corresponding to a voice determination result of the processor 130. For example, the display 120 may display the voice determination result of the user in text. In addition, the display 120 may display a message for guiding an alternative operation.

The voice output unit 140 may output a voice. For example, the voice output unit 140 may output not only various audio data but also a notification sound or a voice message. The electronic device 100 according to an embodiment of the present disclosure may include a voice output unit 140 as an output unit for providing an interactive intelligent secretary function. By outputting the natural language-processed voice message through the voice output unit 140, the electronic device 100 may provide a user experience as if the user is talking with the electronic device 100. The voice output unit 140 may be embedded in the electronic device 100 or may be implemented in the form of an output port such as a jack.

The communication unit 150 communicates with an external device. For example, the external device may be implemented as another electronic device, a server, cloud storage, a network, or the like. The communication unit 150 may transmit a voice determination result to an external device and receive corresponding information from the external device. The communication unit 150 may also receive a language model for speech recognition and a learning model for operation determination from an external device.

According to an embodiment of the present disclosure, the communication unit 150 may transmit a voice determination result to the server 200, and may receive from the server 200 a control signal for performing a corresponding operation or a message for guiding an alternative operation.

To this end, the communication unit 150 may include various communication modules such as a short range wireless communication module (not shown), a wireless communication module (not shown), and the like. Here, the short range wireless communication module is a module for communicating with an external device located at a short distance according to a short range wireless communication scheme such as Bluetooth, Zigbee, or the like. The wireless communication module is a module that is connected to an external network and performs communication according to a wireless communication protocol such as WiFi, WiFi direct, or IEEE. In addition, the wireless communication module may further include a mobile communication module that connects to a mobile communication network and performs communication according to various mobile communication standards such as 3G (3rd Generation), 3GPP (3rd Generation Partnership Project), Long Term Evolution (LTE), LTE Advanced (LTE-A), etc.

The memory 160 may store various modules, software, and data for driving the electronic device 100. For example, the memory 160 may store an acoustic model (AM) and a language model (LM) that may be used to recognize a user's voice. In addition, the memory 160 may store a learned substitute action determination model to determine the substitute action. In addition, the memory 160 may store a model for Natural Language Generation (NLG).

The memory 160 may store programs and data for configuring various screens to be displayed on the display 120. In addition, the memory 160 may store a program, an application, and data for performing a specific service.

The memory 160 may store in advance various response messages corresponding to the user's voice as voice or text data. The electronic device 100 may read at least one of the voice and text data corresponding to the received user voice (in particular, a user control command) from the memory 160 and output it to the display 120 or the voice output unit 140. In this way, the electronic device 100 may provide a user with a simple or frequently used message without passing through the natural language generation model.
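The shortcut described above, answering from stored messages and bypassing natural language generation, can be sketched as a dictionary lookup with a fallback. This is an illustrative sketch only: `STORED_RESPONSES`, the command key, the message text, and the `generate` callback standing in for the NLG model are all hypothetical.

```python
from typing import Callable

# Hypothetical pre-stored response messages keyed by user control command.
STORED_RESPONSES = {
    "volume_up": "Volume increased.",
}


def respond(command: str, generate: Callable[[str], str]) -> str:
    """Return a stored message if one exists; otherwise fall back to generation."""
    stored = STORED_RESPONSES.get(command)
    if stored is not None:
        return stored  # simple or frequent message: no NLG model involved
    return generate(command)  # `generate` stands in for the NLG model
```

The design point is that the generation model is only invoked for commands that have no stored response.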

The memory 160 is a storage medium that stores various programs necessary for operating the electronic device 100. The memory 160 may be implemented in the form of a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. For example, the memory 160 may include a ROM for storing a program for performing an operation of the electronic device 100 and a RAM for temporarily storing data for performing an operation of the electronic device 100.

The memory 160 may store a plurality of software modules for performing an operation corresponding to a user voice. Specifically, as shown in FIG. 3, the memory 160 may include a text acquisition module 310, a text analysis module 320, an operation determination module 330, an operation performance determination module 340, an operation performing module 350, an alternative operation determination module 360, and an alternative operation guide module 370.

The text acquiring module 310 obtains text data from a voice signal including a user voice.

The text analysis module 320 analyzes the text data to determine a target component and a parameter component of the user's voice.

The operation determination module 330 determines an operation corresponding to the user's voice based on the target component and the parameter component. In particular, the operation determination module 330 may determine the type of the operation corresponding to the user's voice using the target component, and determine the content of the operation corresponding to the user's voice using the parameter component.

The operation performance determination module 340 determines whether the determined operation can be performed. In detail, the operation performance determination module 340 may determine whether the operation can be performed based on the content of the operation determined using the parameter component. For example, the operation performance determination module 340 may determine that the operation cannot be performed when the content of the operation determined using the parameter component cannot be carried out, or when part of the content determined from the parameter component is missing.
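The two failure conditions named above (content that cannot be carried out, and content with missing parts) can be sketched as a small check. Everything specific here is an assumption made for illustration: the `Operation` record, the required-parameter table `REQUIRED_PARAMS`, and the set `SUPPORTED_CHANNELS` do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class Operation:
    op_type: str                                  # derived from the target component
    content: dict = field(default_factory=dict)   # derived from the parameter component


# Hypothetical requirements and capabilities of the device.
REQUIRED_PARAMS = {"send_photo": ("app", "date", "channel")}
SUPPORTED_CHANNELS = {"message", "chat"}


def can_perform(op: Operation) -> Tuple[bool, str]:
    """Return (feasible, reason) for a determined operation."""
    # Case 1: part of the content is missing.
    missing = [k for k in REQUIRED_PARAMS.get(op.op_type, ()) if k not in op.content]
    if missing:
        return False, "missing: " + ", ".join(missing)
    # Case 2: the content itself cannot be carried out.
    if "channel" in op.content and op.content["channel"] not in SUPPORTED_CHANNELS:
        return False, "unsupported channel"
    return True, "ok"
```

Returning a reason alongside the boolean is a design choice of this sketch; it gives a later step something to base the choice of alternative operation on.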

If it is determined that the determined operation can be performed, the operation performing module 350 performs the determined operation.

If it is determined that the determined operation cannot be performed, the alternative operation determination module 360 may determine an alternative operation that can replace the determined operation using the target component and the parameter component. In this case, the alternative operation determination module 360 may determine the alternative operation using either a previously stored alternative operation matched with the determined operation or a previously trained alternative operation determination model.

The alternative operation guide module 370 provides a message for guiding the determined alternative operation. In this case, the message for guiding the alternative operation may be in an audio form or a visual form, and may be processed and provided in a natural language form.
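The flow through modules 310 to 370 can be summarized as a single control function. The sketch below is a hedged illustration, not the disclosed implementation: each module is passed in as a plain callable, and the function name `handle_voice` and all parameter names are hypothetical.

```python
from typing import Any, Callable, Tuple


def handle_voice(
    voice: Any,
    acquire_text: Callable[[Any], str],                 # role of module 310
    analyze_text: Callable[[str], Tuple[str, dict]],    # role of module 320
    determine_op: Callable[[str, dict], Any],           # role of module 330
    can_perform: Callable[[Any], bool],                 # role of module 340
    perform: Callable[[Any], str],                      # role of module 350
    find_alternative: Callable[[Any, str, dict], Any],  # role of module 360
    guide: Callable[[Any], str],                        # role of module 370
) -> str:
    """Run the determined operation, or return a message guiding an alternative."""
    text = acquire_text(voice)
    target, params = analyze_text(text)
    operation = determine_op(target, params)
    if can_perform(operation):
        return perform(operation)
    # The operation cannot be performed: look for a replacement and guide the user.
    alternative = find_alternative(operation, target, params)
    return guide(alternative)
```

Passing the modules in as callables keeps the sketch independent of how each one is realized, for example a stored table or a trained model for the alternative operation step.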

The processor 130 may control the above-described components of the electronic device 100. For example, the processor 130 may use the plurality of software modules stored in the memory 160 to determine an alternative operation that can replace the operation corresponding to the user's voice, and to provide a message for guiding the determined alternative operation.

The processor 130 may be implemented as a single CPU that performs a voice recognition operation, a language understanding operation, a conversation management operation, an alternative operation search operation, a filtering operation, a response generation operation, and the like, or it may be implemented as a dedicated processor that performs at least one function of the software modules. The processor 130 may perform speech recognition based on a traditional hidden Markov model (HMM), or may perform deep learning-based speech recognition using, for example, a deep neural network (DNN).

In addition, the processor 130 may use big data and user-specific history data for speech recognition and for determining an alternative operation. In this way, the processor 130 may personalize the speech recognition model and the alternative operation determination model while using models trained on big data.

Hereinafter, the present invention will be described in detail with reference to FIGS. 4A to 5.

In one embodiment of the present invention, when the user voice "Find the pictures taken yesterday in the gallery and send them as a message to the assistant" is input through the input unit 110, the processor 130 may control the text acquisition module 310 to obtain text data from the user's voice.

In addition, the processor 130 may control the text analysis module 320 to determine the target component and the parameter component by analyzing the obtained text data. For example, the processor 130 may control the text analysis module 320 to analyze the text data "Find the pictures taken yesterday in the gallery and send them as a message to a friend" and determine a target component and a parameter component as follows.

<Goal: Photo Transfer>

In addition, the processor 130 may control the operation determination module 330 to determine an operation corresponding to the user's voice based on the target component and the parameter component. In detail, the processor 130 may control the operation determination module 330 to determine, based on the target component, that the type of the operation is "send photo", and to determine, based on the parameter component, that the content of the operation is "find the pictures taken yesterday in the gallery application and send them as a message".

In addition, the processor 130 may control the operation performance determination module 340 to determine whether the determined operation can be performed. When the determined operation can be performed, the processor 130 may control the operation performing module 350 to perform the determined operation or to transmit a control signal corresponding to the determined operation to an external device. For example, if it is possible to "find the pictures taken yesterday in the gallery application and send them as a message", the processor 130 may control the operation performing module 350 to retrieve the pictures taken yesterday from the gallery application, attach them to a message, and transmit the message to the corresponding external device.

However, when the determined operation cannot be performed, the processor 130 may control the alternative operation determination module 360 to determine an alternative operation that can replace the determined operation based on the target component and the parameter component. For example, if at most five pictures can be transmitted in one message but ten pictures taken yesterday are found in the gallery application, the processor 130 may control the operation performance determination module 340 to determine that the determined operation cannot be performed.

The processor 130 may then control the alternative operation determination module 360 to recognize that the ten pictures cannot be transmitted by message and to determine whether there is an alternative operation corresponding to the user's voice.

In this case, the alternative operation may include an operation having the same type of operation as the operation corresponding to the user's voice but having different contents of the operation, or an operation having a type of operation different from the operation corresponding to the user's voice and different contents of the operation.

For example, the processor 130 may control the alternative operation determination module 360 to determine, as the alternative operation, an operation that has the same type (photo transmission) but different content, namely transmitting the pictures using a chat application instead of a message. In other words, the processor 130 may control the alternative operation determination module 360 to determine an alternative operation whose type is "send photo" and whose content is "find the picture taken yesterday in the gallery application and send it via the chat application".

As another example, the processor 130 may control the alternative operation determination module 360 to determine, as the alternative operation, an operation of a different type, such as capture screen transmission instead of photo transmission. That is, the processor 130 may control the alternative operation determination module 360 to determine an alternative operation whose type is "send capture screen" and whose content is "find the picture taken yesterday in the gallery application, capture the screen, and send it as a message".

A plurality of alternative operations corresponding to a specific operation may be stored in advance. For example, the memory 160 may pre-store "capture screen transfer", "message transfer", and the like as alternatives matched to "photo transfer". In this case, the processor 130 may control the alternative operation determination module 360 to select, based on the cause of the error, one of the at least one pre-stored alternative operation as the alternative to the operation corresponding to the user's voice. For example, if the operation corresponding to the user's voice cannot be performed because the number of photos that can be transmitted is exceeded, the processor 130 may control the alternative operation determination module 360 to determine "send the captured screen" as the alternative operation. If the operation cannot be performed because the amount of data that can be transmitted is exceeded, the processor 130 may control the alternative operation determination module 360 to determine "message transmission" as the alternative operation. In this case, the error cause and the alternative operation may also be matched and stored.
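The matching of error causes to pre-stored alternatives can be pictured as a simple lookup table. The following is only a minimal sketch: the table name, the key strings, and the function `find_alternative` are hypothetical and are not part of the disclosure.

```python
# Hypothetical pre-stored table: (operation type, error cause) -> alternative operation.
# The entries mirror the example in the text; the key strings are assumptions.
ALTERNATIVES = {
    ("photo transfer", "photo count exceeded"): "capture screen transfer",
    ("photo transfer", "data size exceeded"): "message transfer",
}


def find_alternative(operation_type, error_cause):
    """Return the alternative matched to this failure, or None if none is stored."""
    return ALTERNATIVES.get((operation_type, error_cause))


alternative = find_alternative("photo transfer", "photo count exceeded")
```

Keying the table on both the operation type and the error cause reflects the statement that the error cause and the alternative operation may be matched and stored together.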

Alternatively, the processor 130 may control the alternative operation determination module 360 to determine the alternative operation that replaces the operation corresponding to the user's voice using a previously trained alternative operation determination model. That is, the processor 130 may control the alternative operation determination module 360 to input the determined operation into an alternative operation determination model trained in advance by the user or another person, and thereby determine the alternative operation corresponding to the determined operation. The alternative operation determination model will be described in detail with reference to FIGS. 9 to 10B.

When the alternative operation is determined, the processor 130 may control the alternative operation guide module 370 to provide a message for guiding the alternative operation. In this case, the message may indicate at least one of the reason the operation corresponding to the user's voice cannot be performed and the alternative operation. The processor 130 may control the alternative operation guide module 370 to display the message for guiding the alternative operation or to output the message in audio form.

In addition, the processor 130 may control the alternative operation guide module 370 to process the message for guiding the alternative operation into natural language form and provide it. Specifically, for an alternative operation whose type is "send photo" and whose content is "find a picture taken yesterday in the gallery application and send it via the chat application", the processor 130 may control the alternative operation guide module 370 to display on the display 120 a natural language message such as "Sending by message is not possible. Send via xxx chat instead?", as shown in FIG. 4A. In addition, for an alternative operation whose type is "send composite image" and whose content is "find a picture taken yesterday in the gallery application, combine the pictures into one image, and send it as a message", the processor 130 may control the alternative operation guide module 370 to display on the display 120 a natural language message such as "Not all of the pictures can be sent. Combine the 10 pictures into one image and send it as a message?", as shown in FIG. 4B.

In this case, the processor 130 may control the alternative operation guide module 370 to provide a pre-stored natural language message, but this is only an example, and the natural language message may instead be generated and provided using a language model for natural language processing.
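The pre-stored message variant can be sketched as template filling. This is an illustration only: the template wording, the slot names (`failed_channel`, `new_channel`, `count`), and the function `guide_message` are assumptions rather than text from the disclosure.

```python
# Hypothetical pre-stored templates keyed by alternative operation type.
TEMPLATES = {
    "send photo": "Sending by {failed_channel} is not possible. Send via {new_channel} instead?",
    "send composite image": (
        "Not all of the pictures can be sent. "
        "Combine the {count} pictures into one image and send it as a message?"
    ),
}


def guide_message(alternative_type, **slots):
    """Fill the stored template for the given alternative operation type."""
    return TEMPLATES[alternative_type].format(**slots)


print(guide_message("send photo", failed_channel="message", new_channel="xxx chat"))
```

A generative language model could replace the template lookup without changing the caller, which matches the statement that pre-stored messages are only one option.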

In another embodiment of the present invention, when a user voice saying "schedule tomorrow meeting" is input through the input unit 110, the processor 130 may control the text acquisition module 310 to obtain text data from the user voice.

In addition, the processor 130 may control the text analysis module 320 to determine the target component and the parameter component by analyzing the obtained text data. For example, the processor 130 may control the text analysis module 320 to analyze the text data “schedule tomorrow” to determine a target component and a parameter component as follows.

<Goal: Scheduling>

In addition, the processor 130 may control the operation determination module 330 to determine an operation corresponding to the user's voice based on the target component and the parameter component. Specifically, the processor 130 may control the operation determination module 330 to determine, based on the target component, that the type of the operation is "scheduling", and to determine that the content of the operation is "registering a meeting tomorrow in the schedule application".

In addition, the processor 130 may determine whether the determined operation may be performed by controlling the operation performance determining module 340. When the determined operation cannot be performed, the processor 130 may control the substitute operation determination module 360 to determine an alternative operation that may replace the determined operation based on the target component and the parameter component. For example, since there is no parameter component indicating who is meeting with, the processor 130 may control the operation performance determining module 340 to determine that the determined operation is not executable.

The processor 130 may control the alternative operation determination module 360 to determine whether there is an alternative operation corresponding to the user's voice. For example, since there is no information about whom the meeting is with, the processor 130 may control the alternative operation determination module 360 to determine, as the alternative operation, an operation of a different type, such as "leave a note" rather than "scheduling". That is, the processor 130 may control the alternative operation determination module 360 to determine an alternative operation whose type is "leave a note" and whose content is "write down the meeting schedule for tomorrow".

When the alternative operation is determined, the processor 130 may control the alternative operation guide module 370 to provide a message for guiding the alternative operation. For example, for an alternative operation whose type is "leave a note" and whose content is "write down the meeting schedule for tomorrow", the processor 130 may control the alternative operation guide module 370 to display a natural language message on the display 120, as shown in FIG. 5.
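The scheduling example turns on a missing parameter component. A minimal sketch of such a check follows, assuming a hypothetical table of required parameters per operation type; the parameter names and the fallback rule are illustrative, not taken from the disclosure.

```python
# Hypothetical required parameter components per operation type.
REQUIRED_PARAMS = {"scheduling": {"date", "event", "participants"}}


def missing_parameters(operation_type, params):
    """Return the required parameter names that are absent, in sorted order."""
    required = REQUIRED_PARAMS.get(operation_type, set())
    return sorted(required - set(params))


def choose_operation(operation_type, params):
    """Keep the operation if it is complete; otherwise fall back to a note."""
    if not missing_parameters(operation_type, params):
        return operation_type
    return "leave a note"


spoken = {"date": "tomorrow", "event": "meeting"}
print(missing_parameters("scheduling", spoken), choose_operation("scheduling", spoken))
```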

FIG. 6 is a flowchart illustrating a control method of the electronic device 100 according to an embodiment of the present disclosure.

First, the electronic device 100 receives a user voice in operation S610.

In operation S620, the electronic device 100 obtains text data from the user's voice.

In operation S630, the electronic device 100 determines a target component and a parameter component from the obtained text data.

The electronic device 100 determines an operation corresponding to the user's voice based on the target component and the parameter component (S640). In this case, the electronic device 100 may determine the type of the operation corresponding to the user's voice using the target component, and may determine the content of the operation corresponding to the user's voice using the parameter component.

In operation S650, the electronic device 100 may determine whether the determined operation may be performed.

If it is determined that the determined operation is possible (S650-Y), the electronic device 100 performs the determined operation (S660).

On the other hand, when it is determined that the determined operation cannot be performed (S650-N), the electronic device 100 determines an alternative operation for replacing the determined operation (S670). In this case, the electronic device 100 may determine, as the alternative operation, one of a plurality of pre-stored alternative operations corresponding to the determined operation, or may determine the alternative operation by inputting the determined operation into the alternative operation determination model.

In operation S680, the electronic device 100 provides a message for guiding the replacement operation. In this case, the electronic device 100 may process and provide a message for guiding an alternative operation in a natural language form.
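The flow from S630 to S680 can be sketched as a single function. This is only an illustration under stated assumptions: voice capture and speech recognition (S610, S620) are omitted, `analyze` is a stand-in that returns fixed components, and the five-photo limit is borrowed from the earlier example. All names are hypothetical.

```python
from dataclasses import dataclass

MAX_PHOTOS_PER_MESSAGE = 5  # assumed limit, taken from the earlier example


@dataclass
class Operation:
    type: str      # determined from the target component
    content: dict  # determined from the parameter components


def analyze(text):
    """Stand-in for S630; a real analyzer would parse `text`."""
    return "photo transfer", {"source": "gallery", "count": 10, "channel": "message"}


def can_perform(op):
    """S650: decide whether the determined operation is executable."""
    return op.content["count"] <= MAX_PHOTOS_PER_MESSAGE


def determine_alternative(op):
    """S670: same type, different content (chat instead of message)."""
    return Operation(op.type, {**op.content, "channel": "chat"})


def handle(text):
    goal, params = analyze(text)         # S630
    op = Operation(goal, params)         # S640
    if can_perform(op):                  # S650-Y
        return "performed", op           # S660
    alt = determine_alternative(op)      # S650-N, S670
    message = (                          # S680
        f"Cannot send by {op.content['channel']}. "
        f"Send via {alt.content['channel']} instead?"
    )
    return "guided", message


status, result = handle("find the pictures taken yesterday and send them as a message")
```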

FIG. 7 is a diagram illustrating an intelligent secretary system including a user terminal and a server for performing an intelligent secretary function according to another exemplary embodiment of the present disclosure. Referring to FIG. 7, the intelligent secretary system 1000 may include a user terminal 200 and a server 100. Meanwhile, the electronic device 100 described in the above-described embodiment may be implemented as the server in FIG. 7.

The user terminal 200 may obtain the user's voice spoken by the user and transmit the user's voice to the external server 100. The server 100 may determine an operation or an alternative operation corresponding to the received user voice, and transmit a control signal or a message for guiding the alternative operation to the user terminal 200. As such, the user terminal 200 and the server 100 may interwork to provide an intelligent secretary service.

That is, the user terminal 200 may be implemented simply as an input/output device that receives the user's voice and provides a message, while the server 100 handles most of the processing for the intelligent secretary service. In particular, when the user terminal 200 is implemented as a small wearable device such as a smart watch as shown in FIG. 7 and its available resources are limited, processes such as determining an alternative operation and generating natural language may be performed by the resource-rich server 100.

FIG. 8 is a sequence diagram illustrating a control method of an intelligent secretary system according to an embodiment of the present disclosure.

The user terminal 200 obtains a user voice (S810). In this case, the user terminal 200 may obtain a user voice from a microphone provided in the user terminal 200 or connected to the user terminal 200.

The user terminal 200 transmits the user's voice to the external server 100 (S820). In detail, the user terminal 200 may transmit a voice signal corresponding to the user voice to the external server 100.

In operation S830, the server 100 obtains text data from the received user voice.

The server 100 analyzes the text data (S840) and determines an operation corresponding to the user's voice (S850). Specifically, the server 100 may determine the target component and the parameter component from the text data, determine the type of the operation for the user voice from the target component, and determine the content of the operation for the user voice from the parameter component.

If it is determined that the operation corresponding to the user's voice cannot be performed, the server 100 determines an alternative operation that may replace the operation corresponding to the user's voice (S860). In this case, the server 100 may determine one of the pre-stored alternative operations as the alternative operation, or may determine the alternative operation using the trained alternative operation determination model.

The server 100 generates a message for guiding the replacement operation (S870). In this case, the server 100 may generate a message in a natural language form.

The server 100 transmits a message to the user terminal 200 (S880), and the user terminal 200 outputs the received message (S890).
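The division of work in this sequence can be sketched with two in-process classes standing in for the user terminal 200 and the server 100. The sketch covers only the failure path, replaces speech recognition and text analysis with placeholders, and uses hypothetical class, method, and table names throughout.

```python
# Hypothetical pre-stored alternatives; the entry is illustrative only.
ALTERNATIVES = {"photo transfer": "capture screen transfer"}


class Server:
    """Stand-in for the server 100; only the failure path is sketched."""

    def speech_to_text(self, voice_signal: bytes) -> str:
        # S830: placeholder for real speech recognition.
        return voice_signal.decode("utf-8")

    def determine_operation(self, text: str) -> str:
        # S840, S850: placeholder for target/parameter analysis.
        return "photo transfer" if "picture" in text else "unknown"

    def handle_voice(self, voice_signal: bytes) -> str:
        text = self.speech_to_text(voice_signal)
        operation = self.determine_operation(text)
        alternative = ALTERNATIVES.get(operation)  # S860
        if alternative is None:
            return f"'{operation}' cannot be performed."
        # S870: generate the guide message.
        return f"'{operation}' cannot be performed. Try '{alternative}' instead?"


class UserTerminal:
    """Stand-in for the user terminal 200, acting as an input/output device."""

    def __init__(self, server):
        self.server = server
        self.outputs = []

    def on_voice(self, voice_signal: bytes) -> str:
        message = self.server.handle_voice(voice_signal)  # S820 send, S880 receive
        self.outputs.append(message)                      # S890: output the message
        return message


terminal = UserTerminal(Server())
terminal.on_voice(b"send the picture taken yesterday")
```

In a real deployment the method call on the server would be a network request; the direct call here only keeps the sketch self-contained.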

According to the embodiments of the present disclosure described above, guiding the user to an alternative operation when an operation cannot be performed allows even first-time or new users who are unfamiliar with the intelligent assistant function to use it more easily and naturally.

FIG. 9 is a block diagram of a processor 130 according to some embodiments of the present disclosure. Referring to FIG. 9, the processor 130 according to some embodiments may include a data learner 131 and an alternative operation determiner 132.

The data learner 131 may learn a criterion for determining an alternative operation. The processor 130 may analyze the input motion according to the learned criterion to determine an alternative motion that may replace the motion corresponding to the user's voice. The data learner 131 may determine which data (or parameter component) to use to determine the replacement operation. In addition, the data learner 131 may acquire the data to be used for learning, and apply the acquired data to an alternative operation determination model to be described later to learn the criteria for the alternative operation.

The alternative operation determiner 132 may determine, from predetermined data, an alternative operation that may replace the operation corresponding to the user's voice using the previously trained alternative operation determination model. The alternative operation determiner 132 may obtain predetermined data (e.g., at least one of the target component and the parameter component of the determined operation) according to a criterion established through learning, and may use the alternative operation determination model with the obtained data as an input value. In addition, the alternative operation determiner 132 may apply the input data to the alternative operation determination model to obtain a result value for the alternative operation. In addition, the alternative operation determiner 132 may update the alternative operation determination model based on user feedback on the input value and the output value.

At least one of the data learner 131 and the alternative operation determiner 132 may be manufactured in the form of one or a plurality of hardware chips and mounted on the electronic device 100. For example, at least one of the data learner 131 and the alternative operation determiner 132 may be manufactured in the form of a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as part of a conventional general-purpose processor (e.g., a CPU or an application processor) or an IP for a specific function, and may be mounted on the various electronic devices 100 described above.

In the embodiment of FIG. 9, the data learner 131 and the replacement operation determiner 132 are both mounted on the electronic device 100, but they may be mounted on separate devices. For example, one of the data learner 131 and the substitute operation determiner 132 may be included in the electronic device 100, and the other may be included in the user terminal 200. In addition, the data learning unit 131 and the replacement operation determination unit 132 are connected to each other by wire or wirelessly, and the information on the replacement operation determination model built by the data learning unit 131 is provided to the replacement operation determination unit 132. The data input to the substitute operation determiner 132 may be provided to the data learner 131 as additional learning data.

Meanwhile, at least one of the data learner 131 and the alternative operation determiner 132 may be implemented as a software module. When at least one of the data learner 131 and the alternative operation determiner 132 is implemented as a software module (or a program module including instructions), the software module may be stored in a non-transitory computer readable recording medium. At least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, some of the at least one software module may be provided by the OS, and some of the at least one software module may be provided by a predetermined application.

FIG. 10A is a block diagram of a data learner 131 according to some embodiments of the present disclosure. Referring to FIG. 10A, the data learner 131 may include a data acquirer 131-1, a preprocessor 131-2, a training data selector 131-3, a model learner 131-4, and a model evaluator 131-5.

The data acquirer 131-1 may acquire data necessary for determining the replacement operation. In particular, the data acquirer 131-1 may acquire data for determining an operation corresponding to the user's voice as the training data. For example, at least one of a signal corresponding to a user voice input through the input unit 110, a text data corresponding to the user voice, a target component determined from the text data, and a parameter component may be input.

The preprocessor 131-2 may preprocess the acquired data so that the obtained data may be used for learning to determine an alternative operation. The preprocessor 131-2 may process the acquired data in a predetermined format so that the model learner 131-4 to be described later uses the acquired data for learning to determine an alternative operation.

For example, the preprocessor 131-2 may extract a section that is a recognition target for the input user voice. The preprocessor 131-2 may perform noise reduction, feature extraction, and the like, on the signal corresponding to the user's voice, and convert the signal into text data.

As another example, the preprocessor 131-2 may generate voice data to be suitable for speech recognition by analyzing a frequency component of the input user voice to reinforce some frequency components and suppress the remaining frequency components.
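One common way to reinforce some frequency components while suppressing others is a first-order pre-emphasis filter, which boosts high frequencies relative to low ones. The disclosure does not name a specific filter, so the following is a swapped-in illustration; the function name and the coefficient 0.97 are assumptions.

```python
def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to a list of audio samples.

    Slowly varying (low-frequency) content is attenuated, while rapidly
    varying (high-frequency) content is amplified.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])           # constant signal is suppressed
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])  # fast alternation is boosted
```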

The training data selector 131-3 may select data necessary for learning from the preprocessed data. The selected data may be provided to the model learner 131-4. The training data selector 131-3 may select data necessary for learning from the preprocessed data according to a predetermined criterion for determining an alternative operation. In addition, the training data selector 131-3 may select data according to a criterion established through learning by the model learner 131-4 to be described later. For example, the training data selector 131-3 may select only a target component and a parameter component from the input text data.

The model learner 131-4 may learn a criterion on how to determine an alternative operation based on the training data. In addition, the model learner 131-4 may learn a criterion about what training data should be used to determine an alternative operation.

The model learner 131-4 may train the alternative motion determination model used for the alternative motion determination using the training data. In this case, the alternative motion determination model may be a previously built model. For example, the alternative motion determination model may be a model built in advance by receiving basic training data. As another example, the alternative motion determination model may be a model built in advance using big data.

The alternative motion determination model may be constructed in consideration of the application field of the recognition model, the purpose of learning, or the computer performance of the device. The alternative motion determination model may be, for example, a model based on a neural network. For example, a model such as a deep neural network (DNN), a recurrent neural network (RNN), and a bidirectional recurrent deep neural network (BRDNN) may be used as an alternative operation determination model, but is not limited thereto.

According to various embodiments of the present disclosure, when there are a plurality of previously built alternative operation determination models, the model learner 131-4 may select, as the alternative operation determination model to be trained, the model whose basic training data is most highly correlated with the input training data. In this case, the basic training data may be pre-classified by type of data, and an alternative operation determination model may be built in advance for each type of data. For example, the basic training data may be pre-classified based on various criteria such as the region where the training data was generated, the time at which the training data was generated, the size of the training data, the genre of the training data, the creator of the training data, and the types of objects in the training data.
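Choosing among pre-built models by how closely their basic training data matches the input data can be sketched as follows. The disclosure does not define the correlation measure, so this sketch substitutes a simple count of matching metadata attributes; the model names and attribute values are hypothetical.

```python
def select_model(models, input_meta):
    """Pick the model whose basic-training-data metadata best matches the input.

    `models` maps a model name to the metadata of its basic training data.
    The match count is an assumed stand-in for the correlation in the text.
    """
    def overlap(model_meta):
        return sum(1 for key, value in input_meta.items() if model_meta.get(key) == value)

    return max(models, key=lambda name: overlap(models[name]))


prebuilt = {
    "kr_messaging": {"region": "KR", "genre": "messaging"},
    "us_calendar": {"region": "US", "genre": "calendar"},
}
chosen = select_model(prebuilt, {"region": "KR", "genre": "messaging"})
```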

Also, the model learner 131-4 may train the alternative operation determination model using a learning algorithm including, for example, error back-propagation or gradient descent.

For example, the model learner 131-4 may train the alternative operation determination model through supervised learning using the training data as an input value. As another example, the model learner 131-4 may train the alternative operation determination model through unsupervised learning, which discovers a criterion for determining the alternative operation by learning, without additional guidance, the type of data necessary for that determination. As another example, the model learner 131-4 may train the alternative operation determination model through reinforcement learning using feedback on whether the result of the alternative operation determination according to the learning is correct.
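Supervised training with gradient descent can be illustrated with a tiny logistic regression. This is a generic sketch, not the model of the disclosure: the single feature (photo count scaled to 0..1), the labels (1 when more than five photos are requested), and all function names are assumptions chosen to echo the earlier photo example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, lr=0.5, steps=5000):
    """Fit weight w and bias b by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in samples:
            err = sigmoid(w * x + b) - y  # derivative of the log loss w.r.t. the logit
            grad_w += err * x / n
            grad_b += err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy labeled data: feature = photo count / 10, label = 1 if the send would fail.
DATA = [(count / 10, 1 if count > 5 else 0) for count in range(1, 11)]
W, B = train(DATA)


def needs_alternative(photo_count):
    """Predict whether an alternative operation is likely to be needed."""
    return sigmoid(W * photo_count / 10 + B) > 0.5
```

A neural network trained with error back-propagation follows the same update rule, with the gradients computed layer by layer.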

In addition, when the alternative motion determination model is trained, the model learner 131-4 may store the learned alternative motion determination model. In this case, the model learner 131-4 may store the learned substitute motion determination model in the memory 160 of the electronic device 100.

In this case, the memory 160 in which the learned substitute operation determination model is stored may also store commands or data related to at least one other element of the electronic device 100. The memory 160 may also store software and / or programs. For example, the program may include a kernel, middleware, an application programming interface (API) and / or an application program (or “application”), and the like.

The model evaluator 131-5 may input evaluation data to the alternative operation determination model, and may cause the model learner 131-4 to train the model again if the determination result output for the evaluation data does not satisfy a predetermined criterion. In this case, the evaluation data may be preset data for evaluating the alternative operation determination model.

For example, the model evaluator 131-5 may evaluate that the predetermined criterion is not satisfied when, among the determination results of the trained alternative operation determination model for the evaluation data, the number or ratio of evaluation data with inaccurate determination results exceeds a preset threshold. For example, when the predetermined criterion is defined as a ratio of 2% and the trained alternative operation determination model outputs incorrect determination results for more than 20 out of a total of 1000 evaluation data, the model evaluator 131-5 may evaluate the trained alternative operation determination model as unsuitable.

On the other hand, when there are a plurality of trained alternative operation determination models, the model evaluator 131-5 may evaluate whether each trained model satisfies the predetermined criterion, and may determine a model satisfying the predetermined criterion as the final alternative operation determination model. In this case, when a plurality of models satisfy the predetermined criterion, the model evaluator 131-5 may determine, as the final alternative operation determination model, any one model or a preset number of models in descending order of evaluation score.
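The evaluation rule and the final model selection can be sketched as follows, using the 2% ratio from the example (20 errors out of 1000 still passes, 21 does not). The function names, model names, and scores are hypothetical.

```python
def satisfies_criterion(num_incorrect, num_total, max_error_ratio=0.02):
    """Pass when the share of incorrect results does not exceed the threshold."""
    return num_incorrect / num_total <= max_error_ratio


def select_final_models(scores, error_counts, num_total, top_k=1):
    """Keep the passing models, then take the top_k by evaluation score."""
    passing = [
        name for name in scores
        if satisfies_criterion(error_counts[name], num_total)
    ]
    return sorted(passing, key=lambda name: scores[name], reverse=True)[:top_k]


scores = {"model_a": 0.90, "model_b": 0.95, "model_c": 0.99}
errors = {"model_a": 10, "model_b": 15, "model_c": 30}  # out of 1000 evaluation data
final = select_final_models(scores, errors, 1000)
```

Here `model_c` has the highest score but is excluded because 30 errors out of 1000 exceeds the 2% threshold; a model that fails the criterion would instead be sent back for retraining.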

Meanwhile, at least one of the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 in the data learner 131 may be manufactured in the form of at least one hardware chip and mounted on the electronic device. For example, at least one of the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 may be manufactured in the form of a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as part of an existing general-purpose processor (e.g., a CPU or an application processor) or an IP for a specific function, and may be mounted on the electronic device 100.

In addition, the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 may be mounted on one electronic device, or may be mounted on separate electronic devices, respectively. For example, some of the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 may be included in the electronic device 100, and the rest may be included in the server 200.

Meanwhile, at least one of the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 may be implemented as a software module. When at least one of the data acquirer 131-1, the preprocessor 131-2, the training data selector 131-3, the model learner 131-4, and the model evaluator 131-5 is implemented as a software module (or a program module including instructions), the software module may be stored on a non-transitory computer readable recording medium. At least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, some of the at least one software module may be provided by the OS, and some of the at least one software module may be provided by a predetermined application.

FIG. 10B is a block diagram of an alternative operation determiner 132 according to some embodiments of the present disclosure. Referring to FIG. 10B, the alternative operation determiner 132 according to some embodiments may include a data acquirer 132-1, a preprocessor 132-2, a data selector 132-3, a determination result provider 132-4, and a model updater 132-5.

The data acquirer 132-1 may acquire data necessary for determining the alternative operation, and the preprocessor 132-2 may preprocess the acquired data so that it can be used for determining the alternative operation. The preprocessor 132-2 may process the acquired data into a predetermined format so that the determination result provider 132-4, which will be described later, can use the acquired data for determining the alternative operation.

The data selector 132-3 may select data necessary for determining the alternative operation from the preprocessed data. The selected data may be provided to the determination result provider 132-4. The data selector 132-3 may select some or all of the preprocessed data according to a predetermined criterion for determining the alternative operation. In addition, the data selector 132-3 may select data according to a criterion established through learning by the model learner 131-4 described above.

The determination result provider 132-4 may apply the selected data to the alternative operation determination model to determine an alternative operation that may replace the operation corresponding to the user's voice. The determination result provider 132-4 may apply the selected data to the alternative operation determination model by using the data selected by the data selector 132-3 as an input value. In addition, the determination result may be determined by the alternative operation determination model. For example, the determination result provider 132-4 may determine an operation that can substitute for the operation corresponding to the user's voice by inputting, into the alternative operation determination model, data from which the operation corresponding to the user's voice can be determined.

The model updater 132-5 may cause the alternative operation determination model to be updated based on an evaluation of the determination result provided by the determination result provider 132-4. For example, the model updater 132-5 may provide the determination result provided by the determination result provider 132-4 to the model learner 131-4 so that the model learner 131-4 updates the alternative operation determination model.

Meanwhile, the data acquisition unit 132-1, the preprocessor 132-2, the data selection unit 132-3, the determination result providing unit 132-4, and the model updater in the alternative operation determination unit 132 ( At least one of the 132-5 may be manufactured in the form of at least one hardware chip and mounted on the electronic device. For example, at least one of the data obtaining unit 132-1, the preprocessor 132-2, the data selecting unit 132-3, the determination result providing unit 132-4, and the model updating unit 132-5. One may be manufactured in the form of a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as an existing general purpose processor (eg, a CPU or an application processor) or part of an IP for a specific function. It may be mounted on the electronic device 100.

In addition, one electronic device of the data acquirer 132-1, the preprocessor 132-2, the data selector 132-3, the determination result providing unit 132-4, and the model updater 132-5 It may be mounted on, or may be mounted on separate electronic devices, respectively. For example, some of the data obtaining unit 132-1, the preprocessor 132-2, the data selecting unit 132-3, the determination result providing unit 132-4, and the model updating unit 132-5. May be included in the electronic device 100, and the remaining part may be included in a server interworking with the electronic watch 100.

Meanwhile, at least one of the data acquisition unit 132-1, the preprocessor 132-2, the data selection unit 132-3, the determination result providing unit 132-4, and the model updater 132-5 may be implemented as a software module. When at least one of the data acquisition unit 132-1, the preprocessor 132-2, the data selection unit 132-3, the determination result providing unit 132-4, and the model updater 132-5 is implemented as a software module (or a program module including instructions), the software module may be stored on a non-transitory computer readable recording medium. The at least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, some of the at least one software module may be provided by the OS, and the rest may be provided by a predetermined application.

The methods described above may be embodied in the form of program instructions that can be executed by various computing means and may be recorded on a computer readable medium. The computer readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present disclosure, or they may be of a kind well known and available to those skilled in the computer software arts. Examples of computer readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the operations of the present disclosure, and vice versa.

Although the present disclosure has been described with reference to limited embodiments and drawings, the present disclosure is not limited to the above embodiments, and those skilled in the art to which the present disclosure pertains may make various modifications and variations based on this description. Therefore, the scope of the present disclosure should not be limited to the described embodiments, but should be determined not only by the claims below but also by the equivalents of the claims.

Claims

A control method of an electronic device, the method comprising:

receiving a user voice;

obtaining text data from the user voice, and determining a target component and a parameter component from the obtained text data;

determining an operation corresponding to the user voice based on the target component and the parameter component;

if it is determined that the determined operation cannot be performed, determining an alternative operation to replace the determined operation based on at least one of the target component and the parameter component; and

providing a message for guiding the alternative operation.
The method of claim 1,

wherein the determining of the operation corresponding to the user voice comprises:

determining a type of the operation corresponding to the user voice based on the determined target component, and determining content of the operation corresponding to the user voice based on the parameter component.
The method of claim 2, further comprising:

when the type of the operation is determined based on the target component, determining whether the content of the operation determined based on the parameter component can be performed.
The method of claim 3,

wherein the determining of the alternative operation comprises:

when it is determined that the content of the determined operation cannot be performed, determining, based on the content of the determined operation, one of a plurality of alternative operations capable of replacing the determined operation as the alternative operation.
The method of claim 4,

wherein the determined operation and the plurality of alternative operations are matched with each other and stored in advance.
The method of claim 3,

wherein the determining of the alternative operation comprises:

when it is determined that the content of the determined operation cannot be performed, inputting the content of the determined operation into a trained alternative operation determination model to determine the alternative operation.
The method of claim 1,

wherein the message for guiding the alternative operation is processed in a natural language form.
An electronic device comprising:

an input unit configured to receive a user voice; and

a processor configured to obtain text data from the user voice input through the input unit, determine a target component and a parameter component from the obtained text data, determine an operation corresponding to the user voice based on the target component and the parameter component, and, when it is determined that the determined operation cannot be performed, determine an alternative operation to replace the determined operation based on at least one of the target component and the parameter component and provide a message for guiding the alternative operation.
The electronic device of claim 8,

wherein the processor is configured to:

determine a type of the operation corresponding to the user voice based on the determined target component, and determine content of the operation corresponding to the user voice based on the parameter component.
The electronic device of claim 9,

wherein the processor is configured to:

when the type of the operation is determined based on the target component, determine whether the content of the operation determined based on the parameter component can be performed.
The electronic device of claim 10,

wherein the processor is configured to:

when it is determined that the content of the determined operation cannot be performed, determine, based on the content of the determined operation, one of a plurality of alternative operations capable of replacing the determined operation as the alternative operation.
The electronic device of claim 11,

further comprising a memory configured to store the determined operation and the plurality of alternative operations matched with each other.
The electronic device of claim 10,

wherein the processor is configured to:

when it is determined that the content of the determined operation cannot be performed, input the content of the determined operation into a trained alternative operation determination model to determine the alternative operation.
The electronic device of claim 8,

wherein the processor is configured to:

provide the message for guiding the alternative operation in a natural language form.
A non-transitory computer readable recording medium storing a program for executing a control method of an electronic device,

wherein the control method of the electronic device comprises:

receiving a user voice;

obtaining text data from the user voice, and determining a target component and a parameter component from the obtained text data;

determining an operation corresponding to the user voice based on the target component and the parameter component;

if it is determined that the determined operation cannot be performed, determining an alternative operation to replace the determined operation based on at least one of the target component and the parameter component; and

providing a message for guiding the alternative operation.