WO2020096172A1 - Electronic device for processing user utterance and controlling method thereof
- Publication number
- WO2020096172A1 (PCT/KR2019/009716)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- nlu
- processor
- user terminal
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
Definitions
- the client module 151 may receive the plan corresponding to the received voice input.
- the client module 151 may display the result of executing a plurality of actions of an app in the display 140 depending on the plan.
- the client module 151 may sequentially display the execution result of a plurality of actions in a display.
- the user terminal 100 may display only a part of results (e.g., the result of the last action) of executing a plurality of actions, on the display.
- the client module 151 may receive a request for obtaining information necessary to calculate the result corresponding to a voice input, from the intelligent server 200.
- the information necessary to calculate the result may be the state information of the user terminal 100.
- the client module 151 may transmit the necessary information to the intelligent server 200 in response to the request.
- the client module 151 may include a voice recognition module. According to an embodiment, the client module 151 may recognize a voice input to perform the limited function, via the voice recognition module. For example, the client module 151 may launch an intelligent app that processes a voice input for performing an organic action, via a specified input (e.g., wake up!).
- the intelligent server 200 may receive the information associated with a user's voice input from the user terminal 100 over a communication network. According to an embodiment, the intelligent server 200 may change the data associated with the received voice input to text data. According to an embodiment, the intelligent server 200 may generate a plan for performing a task corresponding to a user voice input, based on the text data.
- the intelligent server 200 may transmit the result calculated depending on the generated plan to the user terminal 100 or may transmit the generated plan to the user terminal 100.
- the user terminal 100 may display the result calculated depending on the plan, on a display.
- the user terminal 100 may display the result of executing the action according to the plan, on the display.
- the planner module 225 may determine the relationship between the plurality of actions and the plurality of concepts stepwise (or hierarchically). For example, the planner module 225 may determine the execution sequence of the plurality of actions, which are determined based on a user's intent, based on the plurality of concepts. In other words, the planner module 225 may determine the execution sequence of the plurality of actions, based on the parameters necessary to perform the plurality of actions and the result output by the execution of the plurality of actions. As such, the planner module 225 may generate a plan including information (e.g., ontology) of the relationship between a plurality of actions and a plurality of concepts. The planner module 225 may generate the plan, using the information stored in the capsule DB 230 storing a set of relationships between concepts and actions.
- the service server 300 may provide the user terminal 100 with a specified service (e.g., food order or hotel reservation).
- the service server 300 may be a server operated by the third party.
- the service server 300 may include a first service server 301, a second service server 302, and a third service server 305, which are operated by different third parties.
- the service server 300 may provide the intelligent server 200 with information for generating a plan corresponding to the received voice input.
- the provided information may be stored in the capsule DB 230.
- the service server 300 may provide the intelligent server 200 with result information according to the plan.
- FIG. 2 is a diagram illustrating the form in which relationship information between a concept and an action is stored in a database, according to various embodiments.
- FIG. 3 is a view illustrating a screen in which a user terminal processes a received voice input through an intelligent app, according to various embodiments.
- the first ASR module 221 may generate text data corresponding to the received voice input.
- the first NLU module 223 may determine the user's intent and a parameter, using the text data.
- the first planner module 225 may generate the plan corresponding to the received voice input.
- the plan may be determined based on the determined intent and the determined parameter.
- the intelligent server 200 may calculate the result using the generated plan and may transmit the calculated result to the user terminal 100. Furthermore, the intelligent server 200 may directly transmit the generated plan to the user terminal 100.
- the user terminal 100 may sequentially perform specified actions based on the plan.
- the NLU management module 290 may manage the second NLU module 173 of the user terminal 100.
- the NLU management module 290 may manage an NLU module (e.g., the second NLU module 173) of at least one electronic device.
- the user data manager module 295 may store the information associated with the user received from the user terminal 100, in the user history DB 296.
- the information associated with the user may include at least one of the user's voice input, information of the user terminal 100, or the user's preference information.
- the user terminal 100 may be a device that is logged in with a user account.
- the information of the user terminal 100 may include information (e.g., identification information or setting information) of the logged-in user.
- the user data manager module 295 may store the information of the user terminal 100 in the user history DB 296.
- the user data manager module 295 may store processed information of the received voice input, in the user history DB 296.
- the user data manager module 295 may analyze information associated with the user stored in the user history DB 296. For example, the user data manager module 295 may identify the intent processed by the user terminal 100 by analyzing the user log. For example, as illustrated in Table 1, the user log may include identification information of a plan, information about the name of an app, information about a user utterance, or the like. The user data manager module 295 may determine the recognized intent, using the identification information of a plan included in log information.
- the intelligent server 200 may train the criterion for extracting the intent, using artificial intelligence (AI).
- the criterion for extracting the intent in the user terminal 100 may be updated through machine learning.
- the intelligent server 200 may generate a natural language recognition model for recognizing the intent included in the determined range.
- the intelligent server 200 may implement the personalized voice processing system by changing (or updating) the intent recognized by the user terminal 100 using user data.
- the user terminal 100 may receive a user input to select at least one intent through the intent list 621.
- the user terminal 100 may display the selected intent in the intent list 621.
- the user terminal 100 may display the selected app (e.g., "order americano", "add whipping cream", and "make a payment with Samsung Pay") through indicators 621a_1, 621a_2, and 621a_3.
- the display device 860 may visually provide information to the outside (e.g., a user) of the electronic device 801.
- the display device 860 may include, for example, a display, a hologram device, or a projector and control circuitry to control a corresponding one of the display, hologram device, and projector.
- the display device 860 may include touch circuitry adapted to detect a touch, or sensor circuitry (e.g., a pressure sensor) adapted to measure the intensity of force incurred by the touch.
- a connecting terminal 878 may include a connector via which the electronic device 801 may be physically connected with the external electronic device (e.g., the electronic device 802).
- the connecting terminal 878 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
- the wireless communication module 892 may identify and authenticate the electronic device 801 in a communication network, such as the first network 898 or the second network 899, using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module 896.
- the instructions may cause the processor to select at least one of the plurality of NLU models when the number of times that the specified voice input of the user is received is not less than a specified value during a specified period.
- the instructions may cause the processor to determine an intent corresponding to the voice input and, when a specified intent is determined more than a specified number of times, to select the at least one NLU model based on at least one voice input corresponding to the specified intent.
- the instructions may cause the processor to select the at least one NLU model corresponding to an intent for performing a function of a specified application program.
- the selecting of the at least one of the plurality of NLU models may include determining an intent corresponding to the voice input and, when a specified intent is determined more than a specified number of times, selecting the at least one NLU model based on at least one voice input corresponding to the specified intent.
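The count-based conditions in the last four items can be illustrated with a minimal sketch. It assumes a hypothetical usage log of `(timestamp, intent)` pairs and treats the condition as "not less than a specified value during a specified period"; the function name, the log format, and the intent labels are illustrative assumptions, not details taken from the disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def frequent_intents(
    log: Iterable[Tuple[datetime, str]],
    now: datetime,
    period: timedelta,
    min_count: int,
) -> List[str]:
    """Return intents seen at least `min_count` times within `period` before `now`."""
    start = now - period
    # Count only the log entries that fall inside the specified period.
    counts = Counter(intent for ts, intent in log if start <= ts <= now)
    # Keep intents whose count is not less than the specified value.
    return sorted(intent for intent, n in counts.items() if n >= min_count)


# Hypothetical log: three coffee orders in the last week, one recent alarm, one old alarm.
EXAMPLE_NOW = datetime(2020, 1, 8)
EXAMPLE_LOG = [
    (datetime(2020, 1, 7), "order_coffee"),
    (datetime(2020, 1, 6), "order_coffee"),
    (datetime(2020, 1, 5), "order_coffee"),
    (datetime(2020, 1, 7), "set_alarm"),
    (datetime(2019, 12, 1), "set_alarm"),
]
```

The intents returned here would then be the candidates for which an NLU model is selected; how the model itself is chosen is outside the scope of this sketch.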
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A system includes at least one communication interface, at least one processor operatively connected to the at least one communication interface, and at least one memory operatively connected to the at least one processor and storing a plurality of natural language understanding (NLU) models. The at least one memory stores instructions that, when executed, cause the processor to receive first information associated with a user from an external electronic device associated with a user account, using the at least one communication interface, to select at least one of the plurality of NLU models, based on at least part of the first information, and to transmit the selected at least one NLU model to the external electronic device, using the at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
Description
The disclosure relates to a technology for processing a user utterance.
In addition to conventional input schemes using a keyboard or a mouse, electronic devices have recently come to support various other input schemes, such as voice input. For example, an electronic device such as a smartphone or a tablet PC may recognize a user's voice input while a speech recognition service is running, and may either execute an action corresponding to the voice input or provide a result found based on the voice input.
Nowadays, speech recognition services are being developed based on natural language processing technology, that is, technology that grasps the intent of a user utterance and provides the user with a result suited to that intent.
The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.
Due to hardware limitations, a user terminal may process only a limited number of voice inputs. The user terminal may transmit any voice input outside that limited set to an external server, may receive a response, and may process the voice input using the received response. The limited set may be configured as the voice inputs that a user is expected to enter frequently, so that the user terminal can increase the overall voice input processing speed. However, because the frequently entered voice inputs differ from user to user, and change over time even for the same user, the overall voice input processing speed may not increase for every user.
The user terminal according to various embodiments of the present disclosure may provide a user with a personalized voice input processing system, using user information.
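The split described above, where a small set of inputs is resolved on the device and everything else goes to an external server, can be sketched as follows. This is a minimal illustration only: the lookup table, the intent labels, and the `ask_server` callback are hypothetical stand-ins, and a real terminal would use an NLU model rather than exact string matching.

```python
from typing import Callable, Optional, Tuple

# Hypothetical on-device table: the limited set of inputs the terminal resolves itself.
LOCAL_INTENTS = {
    "set an alarm": "alarm.create",
    "send a message": "message.send",
}


def recognize_locally(utterance: str) -> Optional[str]:
    """Return an intent if the utterance is in the limited on-device set, else None."""
    return LOCAL_INTENTS.get(utterance.strip().lower())


def process_voice_input(
    utterance: str, ask_server: Callable[[str], str]
) -> Tuple[str, str]:
    """Try the on-device path first; otherwise forward the input to the server."""
    intent = recognize_locally(utterance)
    if intent is not None:
        return ("device", intent)
    # Not in the limited set: pay the network round trip.
    return ("server", ask_server(utterance))


def fake_server(utterance: str) -> str:
    """Stand-in for the external server (assumption for this sketch)."""
    return "server.handled"
```

The speed benefit depends entirely on how often a given user's inputs hit the local table, which is the motivation for personalizing the set per user.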
Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below.
In accordance with an aspect of the disclosure, a system may include at least one communication interface, at least one processor operatively connected to the at least one communication interface, and at least one memory operatively connected to the at least one processor and storing a plurality of natural language understanding (NLU) models. The at least one memory may store instructions that, when executed, cause the processor to receive first information associated with a user from an external electronic device associated with a user account, using the at least one communication interface, to select at least one of the plurality of NLU models, based on at least part of the first information, and to transmit the selected at least one NLU model to the external electronic device, using the at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
In accordance with another aspect of the disclosure, a controlling method of a system for updating an NLU model may include receiving first information associated with a user from an external electronic device associated with a user account, selecting at least one of the plurality of NLU models, based on at least part of the first information, and transmitting the selected at least one NLU model to the external electronic device, using at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
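As a rough illustration of the receive, select, and transmit steps in the two aspects above, the following sketch assumes that the "first information" carries a list of recently used intents and that each stored NLU model is tagged with the intents it covers. The `NLUModel` type, the `recent_intents` key, and the `transmit` callback are assumptions made for this example, not structures defined in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class NLUModel:
    """Hypothetical stored model, tagged with the intents it can recognize."""
    name: str
    intents: FrozenSet[str]


def select_models(
    first_info: Dict[str, List[str]], models: List[NLUModel]
) -> List[NLUModel]:
    """Select every stored model covering at least one intent in the user's information."""
    wanted = set(first_info.get("recent_intents", []))
    return [m for m in models if m.intents & wanted]


def update_device(
    first_info: Dict[str, List[str]],
    models: List[NLUModel],
    transmit: Callable[[NLUModel], None],
) -> List[str]:
    """Select models from the received information and send each one to the device."""
    selected = select_models(first_info, models)
    for model in selected:
        transmit(model)  # stand-in for the communication interface
    return [m.name for m in selected]


# Hypothetical model store on the server side.
MODELS = [
    NLUModel("coffee", frozenset({"order_coffee", "add_cream"})),
    NLUModel("alarm", frozenset({"set_alarm"})),
]
```

Passing a list's `append` method as `transmit` is enough to exercise the flow without a network, for example `update_device({"recent_intents": ["order_coffee"]}, MODELS, sent.append)`.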
According to embodiments disclosed in the disclosure, the integrated intelligence system may provide a personalized voice input recognition system by using user data to change (or update) the natural language understanding model with which a user terminal recognizes a limited number of intents. As such, the integrated intelligence system may provide a rapid response corresponding to a voice input.
Besides, a variety of effects directly or indirectly understood through the disclosure may be provided.
While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.
Although the present disclosure has been described with various embodiments, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating an integrated intelligence system, according to an embodiment;
FIG. 2 is a diagram illustrating the form in which relationship information between a concept and an action is stored in a database, according to an embodiment;
FIG. 3 is a view illustrating a user terminal displaying a screen of processing a received voice input through an intelligent app, according to an embodiment;
FIG. 4a illustrates an intelligence system including a plurality of natural language platforms, according to an embodiment;
FIG. 4b illustrates another example of an intelligence system including a plurality of natural language platforms, according to an embodiment;
FIG. 5 is a flowchart illustrating a method of changing (or updating) an intent recognition model of a user terminal, according to an embodiment;
FIG. 6a is a view illustrating a screen for setting the intent recognized by a user terminal depending on an app installed in the user terminal, according to an embodiment;
FIG. 6b is a view illustrating a screen for setting the intent processed by a user terminal depending on the intent for performing the function of an app of a user terminal, according to an embodiment;
FIG. 7 is a view illustrating a screen for providing a user with information about the intent capable of being recognized by a user terminal, according to an embodiment; and
FIG. 8 illustrates a block diagram of an electronic device in a network environment, according to various embodiments.
Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses various embodiments of the disclosure.
Before undertaking the BEST MODE below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or," is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation, such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely.
Moreover, various functions described below can be implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The terms "application" and "program" refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer readable program code. The phrase "computer readable program code" includes any type of computer code, including source code, object code, and executable code. The phrase "computer readable medium" includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A "non-transitory" computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
Definitions for certain words and phrases are provided throughout this patent document. Those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior, as well as future uses of such defined words and phrases.
FIGS. 1 through 8, discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged system or device.
Hereinafter, various embodiments of the disclosure will be described with reference to accompanying drawings. However, those of ordinary skill in the art will recognize that modification, equivalent, and/or alternative on various embodiments described herein can be variously made without departing from the scope and spirit of the disclosure.
FIG. 1 is a block diagram illustrating an integrated intelligence system, according to an embodiment.
Referring to FIG. 1, an integrated intelligence system 10 according to an embodiment may include a user terminal 100, an intelligent server 200, and a service server 300.
The user terminal 100 according to an embodiment may be a terminal device (or an electronic device) capable of connecting to the Internet, and may be, for example, a mobile phone, a smartphone, a personal digital assistant (PDA), a notebook computer, a TV, a white household appliance, a wearable device, a head-mounted display (HMD), or a smart speaker.
According to an embodiment, the user terminal 100 may include a communication interface 110, a microphone 120, a speaker 130, a display 140, a memory 150, and a processor 160. The listed components may be operatively or electrically connected to one another.
According to an embodiment, the communication interface 110 may be configured to transmit or receive data to or from an external device. According to an embodiment, the microphone 120 may receive a sound (e.g., a user utterance) to convert the sound into an electrical signal. According to an embodiment, the speaker 130 may output the electrical signal as a sound (e.g., voice). According to an embodiment, the display 140 may be configured to display an image or a video. According to an embodiment, the display 140 may display the graphic user interface (GUI) of the running app (or an application program).
According to an embodiment, the memory 150 may store a client module 151, a software development kit (SDK) 153, and a plurality of apps 155. The client module 151 and the SDK 153 may constitute a framework (or a solution program) for performing general-purpose functions. Furthermore, the client module 151 or the SDK 153 may constitute the framework for processing a voice input.
According to an embodiment, the plurality of apps 155 in the memory 150 may be programs for performing specified functions. According to an embodiment, the plurality of apps 155 may include a first app 155_1 and a second app 155_3. According to an embodiment, each of the plurality of apps 155 may include a plurality of actions for performing the specified function. For example, the plurality of apps 155 may include at least one of an alarm app, a message app, or a schedule app. According to an embodiment, the plurality of apps 155 may be executed by the processor 160 to sequentially execute at least part of the plurality of actions.
According to an embodiment, the processor 160 may control overall operations of the user terminal 100. For example, the processor 160 may be electrically connected to the communication interface 110, the microphone 120, the speaker 130, the display 140, and the memory 150 to perform a specified action.
According to an embodiment, the processor 160 may also execute the program stored in the memory 150 to perform the specified function. For example, the processor 160 may execute at least one of the client module 151 or the SDK 153 to perform the following actions for processing a voice input. The processor 160 may control the actions of the plurality of apps 155 via the SDK 153. The following actions described as the actions of the client module 151 or the SDK 153 may be the action by the execution of the processor 160.
According to an embodiment, the client module 151 may receive a voice input. For example, the client module 151 may receive a voice signal corresponding to a user utterance detected via the microphone 120. The client module 151 may transmit the received voice input to the intelligent server 200. According to an embodiment, the client module 151 may transmit the state information of the user terminal 100 together with the received voice input, to the intelligent server 200. For example, the state information may be the execution state information of an app.
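The transmission described above can be pictured as a single message that carries the voice data together with the execution state of an app. The following is a minimal sketch only; the message shape, the field names, and the `build_request` helper are assumptions introduced for illustration and are not defined in this disclosure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VoiceRequest:
    """Hypothetical message from the client module to the intelligent server."""
    voice_data_hex: str   # audio of the user utterance, hex-encoded for JSON
    foreground_app: str   # state information: which app is being executed
    app_state: str        # state information: current state of that app


def build_request(voice_data: bytes, foreground_app: str, app_state: str) -> str:
    """Bundle the voice input with the state information of the terminal."""
    request = VoiceRequest(voice_data.hex(), foreground_app, app_state)
    return json.dumps(asdict(request), sort_keys=True)


# Example: an utterance captured while a schedule app is running.
payload = build_request(b"\x01\x02", "schedule", "week_view")
print(payload)
```

Sending the state information in the same message lets the server interpret the utterance against the app that is currently being executed without a second round trip.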
According to an embodiment, the client module 151 may receive the result corresponding to the received voice input. For example, the client module 151 may receive the result corresponding to the received voice input from the intelligent server 200. The client module 151 may display the received result in the display 140.
According to an embodiment, the client module 151 may receive the plan corresponding to the received voice input. The client module 151 may display the result of executing a plurality of actions of an app in the display 140 depending on the plan. For example, the client module 151 may sequentially display the execution result of a plurality of actions in a display. For another example, the user terminal 100 may display only a part of results (e.g., the result of the last action) of executing a plurality of actions, on the display.
According to an embodiment, the client module 151 may receive a request for obtaining information necessary to calculate the result corresponding to a voice input, from the intelligent server 200. For example, the information necessary to calculate the result may be the state information of the user terminal 100. According to an embodiment, the client module 151 may transmit the necessary information to the intelligent server 200 in response to the request.
According to an embodiment, the client module 151 may transmit information about the result of executing a plurality of actions depending on the plan, to the intelligent server 200. The intelligent server 200 may determine that the received voice input is processed correctly, through the result information.
According to an embodiment, the client module 151 may include a voice recognition module. According to an embodiment, the client module 151 may recognize a voice input to perform the limited function, via the voice recognition module. For example, the client module 151 may launch an intelligent app that processes a voice input for performing an organic action, via a specified input (e.g., wake up!).
According to an embodiment, the intelligent server 200 may receive the information associated with a user's voice input from the user terminal 100 over a communication network. According to an embodiment, the intelligent server 200 may change the data associated with the received voice input to text data. According to an embodiment, the intelligent server 200 may generate a plan for performing a task corresponding to a user voice input, based on the text data.
According to an embodiment, the plan may be generated by an artificial intelligence (AI) system. The AI system may be a rule-based system, or may be a neural network-based system (e.g., a feedforward neural network (FNN) or a recurrent neural network (RNN)). Alternatively, the AI system may be a combination of the above-described systems or an AI system different from the above-described system. According to an embodiment, the plan may be selected from a set of predefined plans or may be generated in real time in response to a user request. For example, the AI system may select at least one plan of the plurality of predefined plans.
According to an embodiment, the intelligent server 200 may transmit the result calculated depending on the generated plan to the user terminal 100 or may transmit the generated plan to the user terminal 100. According to an embodiment, the user terminal 100 may display the result calculated depending on the plan, on a display. According to an embodiment, the user terminal 100 may display the result of executing the action according to the plan, on the display.
The intelligent server 200 according to an embodiment may include a front end 210, a natural language platform 220, a capsule DB 230, an execution engine 240, an end user interface 250, a management platform 260, a big data platform 270, and an analytic platform 280.
According to an embodiment, the front end 210 may receive a voice input received from the user terminal 100. The front end 210 may transmit a response corresponding to the voice input.
According to an embodiment, the natural language platform 220 may include an automatic speech recognition (ASR) module 221, a natural language understanding (NLU) module 223, a planner module 225, a natural language generator (NLG) module 227, and a text-to-speech (TTS) module 229.
According to an embodiment, the ASR module 221 may convert the voice input received from the user terminal 100 to text data. According to an embodiment, the NLU module 223 may grasp the intent of the user, using the text data of the voice input. For example, the NLU module 223 may grasp the intent of the user by performing syntactic analysis or semantic analysis. According to an embodiment, the NLU module 223 may grasp the meaning of words extracted from the voice input by using linguistic features (e.g., syntactic elements) such as morphemes or phrases and may determine the intent of the user by matching the grasped meaning of the words to an intent.
According to an embodiment, the planner module 225 may generate the plan by using the intent and a parameter, which are determined by the NLU module 223. According to an embodiment, the planner module 225 may determine a plurality of domains necessary to perform a task, based on the determined intent. The planner module 225 may determine a plurality of actions included in each of the plurality of domains determined based on the intent. According to an embodiment, the planner module 225 may determine the parameter necessary to perform the determined plurality of actions or the result value output by the execution of the plurality of actions. The parameter and the result value may be defined as a concept associated with the specified form (or class). As such, the plan may include the plurality of actions and a plurality of concepts determined by the intent of the user. The planner module 225 may determine the relationship between the plurality of actions and the plurality of concepts stepwise (or hierarchically). For example, the planner module 225 may determine the execution sequence of the plurality of actions, which are determined based on a user's intent, based on the plurality of concepts. In other words, the planner module 225 may determine the execution sequence of the plurality of actions, based on the parameters necessary to perform the plurality of actions and the result output by the execution of the plurality of actions. As such, the planner module 225 may generate a plan including information (e.g., ontology) of the relationship between a plurality of actions and a plurality of concepts. The planner module 225 may generate the plan, using the information stored in the capsule DB 230 storing a set of relationships between concepts and actions.
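One way to picture how an execution sequence follows from the concepts is to treat each action as consuming parameter concepts and producing a result concept, and then to order the actions so that every action runs after the actions that produce its parameters. The sketch below uses a topological sort for this; the action names, the concept names, and the `execution_sequence` helper are hypothetical, and the disclosure does not state that the planner module 225 is implemented this way.

```python
from graphlib import TopologicalSorter

# Hypothetical actions: each consumes input concepts and produces one output concept.
ACTIONS = {
    "find_restaurant": {"inputs": ["location"], "output": "restaurant"},
    "get_location": {"inputs": [], "output": "location"},
    "book_table": {"inputs": ["restaurant", "time"], "output": "reservation"},
}


def execution_sequence(actions: dict, given: set) -> list:
    """Order actions so each runs after the actions producing its input concepts."""
    producer = {spec["output"]: name for name, spec in actions.items()}
    graph = {}
    for name, spec in actions.items():
        deps = set()
        for concept in spec["inputs"]:
            if concept in given:
                continue  # supplied directly as a parameter of the utterance
            if concept not in producer:
                raise ValueError(f"no action produces concept {concept!r}")
            deps.add(producer[concept])
        graph[name] = deps
    # static_order() yields every action after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


order = execution_sequence(ACTIONS, given={"time"})
print(order)
```

Here the "time" concept is assumed to come from the utterance itself, so only "location" and "restaurant" create ordering constraints between actions.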
According to an embodiment, the NLG module 227 may change the specified information into information in the text form. The information changed to the text form may be a form of a natural language utterance. The TTS module 229 according to an embodiment may change information of the text form to information of a voice form.
According to an embodiment, the capsule DB 230 may store information about the relationship between the actions and the plurality of concepts corresponding to a plurality of domains. For example, the capsule DB 230 may store a plurality of capsules including a plurality of action objects (or action information) and concept objects (or concept information) of the plan. According to an embodiment, the capsule DB 230 may store the plurality of capsules in the form of a concept action network (CAN). According to an embodiment, the plurality of capsules may be stored in the function registry included in the capsule DB 230.
According to an embodiment, the capsule DB 230 may include a strategy registry that stores strategy information necessary to determine a plan corresponding to a voice input. The strategy information may include reference information for determining a single plan when there are a plurality of plans corresponding to the voice input. According to an embodiment, the capsule DB 230 may include a follow up registry that stores the information of the follow-up action for suggesting a follow-up action to the user in the specified context. For example, the follow-up action may include a follow-up utterance. According to an embodiment, the capsule DB 230 may include a layout registry for storing layout information of the information output via the user terminal 100. According to an embodiment, the capsule DB 230 may include a vocabulary registry that stores vocabulary information included in the capsule information. According to an embodiment, the capsule DB 230 may include a dialog registry that stores information about dialog (or interaction) with the user.
According to an embodiment, the capsule DB 230 may update the stored object via a developer tool. For example, the developer tool may include a function editor for updating an action object or a concept object. The developer tool may include a vocabulary editor for updating the vocabulary. The developer tool may include a strategy editor that generates and registers a strategy for determining the plan. The developer tool may include a dialog editor that creates a dialog with the user. The developer tool may include a follow up editor capable of activating the follow-up target and editing the follow-up utterance for providing a hint. The follow-up target may be determined based on the currently set target, the preference of the user, or environment condition.
According to an embodiment, the capsule DB 230 may be implemented in the user terminal 100. In other words, the user terminal 100 may include the capsule DB 230 storing information for determining the action corresponding to the voice input.
According to an embodiment, the execution engine 240 may calculate the result, using the generated plan. According to an embodiment, the end user interface 250 may transmit the calculated result to the user terminal 100. As such, the user terminal 100 may receive the result and may provide the user with the received result. According to an embodiment, the management platform 260 may manage information used by the intelligent server 200. According to an embodiment, the big data platform 270 may collect data of the user. According to an embodiment, the analytic platform 280 may manage the quality of service (QoS) of the intelligent server 200. For example, the analytic platform 280 may manage the component and processing speed (or efficiency) of the intelligent server 200.
According to an embodiment, the service server 300 may provide the user terminal 100 with a specified service (e.g., food order or hotel reservation). According to an embodiment, the service server 300 may be a server operated by the third party. For example, the service server 300 may include a first service server 301, a second service server 302, and a third service server 305, which are operated by different third parties. According to an embodiment, the service server 300 may provide the intelligent server 200 with information for generating a plan corresponding to the received voice input. For example, the provided information may be stored in the capsule DB 230. Furthermore, the service server 300 may provide the intelligent server 200 with result information according to the plan.
In the above-described integrated intelligence system 10, the user terminal 100 may provide the user with various intelligent services in response to a user input. The user input may include, for example, an input through a physical button, a touch input, or a voice input.
According to an embodiment, the user terminal 100 may provide a speech recognition service via an intelligent app (or a speech recognition app) stored therein. In this case, for example, the user terminal 100 may recognize the user utterance or the voice input received via the microphone and may provide the user with a service corresponding to the recognized voice input.
According to an embodiment, the user terminal 100 may perform a specified action, based on the received voice input, exclusively, or together with the intelligent server and/or the service server. For example, the user terminal 100 may execute an app corresponding to the received voice input and may perform the specified action via the executed app.
According to an embodiment, when the user terminal 100 provides a service together with the intelligent server 200 and/or the service server, the user terminal may detect a user utterance, using the microphone 120 and may generate a signal (or voice data) corresponding to the detected user utterance. The user terminal may transmit the voice data to the intelligent server 200, using the communication interface 110.
According to an embodiment, the intelligent server 200 may generate a plan for performing a task corresponding to the voice input or the result of performing an action depending on the plan, as the response to the voice input received from the user terminal 100. For example, the plan may include a plurality of actions for performing the task corresponding to the voice input of the user and a plurality of concepts associated with the plurality of actions. The concept may define a parameter to be input for the execution of the plurality of actions or a result value output by the execution of the plurality of actions. The plan may include relationship information between a plurality of actions and a plurality of concepts.
According to an embodiment, the user terminal 100 may receive the response, using the communication interface 110. The user terminal 100 may output the voice signal generated in the user terminal 100, to the outside using the speaker 130 or may output an image generated in the user terminal 100, to the outside using the display 140.
FIG. 2 is a diagram illustrating the form in which relationship information between a concept and an action is stored in a database, according to various embodiments.
The capsule database (e.g., the capsule DB 230) of the intelligent server 200 may store a plurality of capsules in the form of a concept action network (CAN) 400. The capsule database may store an action for processing a task corresponding to a voice input and a parameter necessary for the action, in the CAN form. The CAN may indicate an organic relationship between the action and a concept defining the parameter necessary to perform the action.
The capsule database may store a plurality of capsules (e.g., capsule A 401 and capsule B 404) respectively corresponding to a plurality of domains (e.g., applications). According to an embodiment, a single capsule (e.g., the capsule A 401) may correspond to one domain (e.g., an application). Furthermore, the single capsule may correspond to at least one service provider (e.g., CP 1 402, CP 2 403, CP 3 406, or CP 4 405) for performing the function of the domain associated with the capsule. According to an embodiment, the single capsule may include one or more actions 410 and one or more concepts 420 for performing a specified function.
According to an embodiment, the natural language platform 220 may generate a plan for performing a task corresponding to the received voice input, using the capsule stored in the capsule database. For example, the planner module 225 of the natural language platform may generate a plan, using the capsule stored in the capsule database. For example, a plan 407 may be generated using actions 4011 and 4013 and concepts 4012 and 4014 of the capsule A 401 and an action 4041 and a concept 4042 of the capsule B 404.
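The relationship between capsules and a plan can be sketched as a small data structure in which each capsule holds the actions and concepts of one domain and a plan collects selected actions and concepts from several capsules. The class names, the `build_plan` helper, and the textual descriptions attached to each numeral are assumptions for illustration; only the reference numerals are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Actions and concepts of one domain, keyed by reference numeral."""
    domain: str
    actions: dict = field(default_factory=dict)   # action id -> description
    concepts: dict = field(default_factory=dict)  # concept id -> description


@dataclass
class Plan:
    """Selected (domain, id) pairs for the actions and concepts of a plan."""
    actions: list = field(default_factory=list)
    concepts: list = field(default_factory=list)


def build_plan(selection: list) -> Plan:
    """Collect chosen actions and concepts from several capsules into one plan.

    selection: list of (capsule, action_ids, concept_ids) tuples.
    """
    plan = Plan()
    for capsule, action_ids, concept_ids in selection:
        for action_id in action_ids:
            if action_id not in capsule.actions:
                raise KeyError(f"action {action_id} not in capsule {capsule.domain}")
            plan.actions.append((capsule.domain, action_id))
        for concept_id in concept_ids:
            if concept_id not in capsule.concepts:
                raise KeyError(f"concept {concept_id} not in capsule {capsule.domain}")
            plan.concepts.append((capsule.domain, concept_id))
    return plan


# Descriptions are placeholders; the disclosure does not name these actions.
capsule_a = Capsule("A", {4011: "action", 4013: "action"}, {4012: "concept", 4014: "concept"})
capsule_b = Capsule("B", {4041: "action"}, {4042: "concept"})
plan = build_plan([(capsule_a, [4011, 4013], [4012, 4014]), (capsule_b, [4041], [4042])])
print(plan)
```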
FIG. 3 is a view illustrating a screen in which a user terminal processes a received voice input through an intelligent app, according to various embodiments.
The user terminal 100 may execute an intelligent app to process a user input through the intelligent server 200.
According to an embodiment, in screen 310, when recognizing a specified voice input (e.g., wake up!) or receiving an input via a hardware key (e.g., the dedicated hardware key), the user terminal 100 may launch an intelligent app for processing a voice input. For example, the user terminal 100 may launch an intelligent app in a state in which a schedule app is being executed. According to an embodiment, the user terminal 100 may display an object (e.g., an icon) 311 corresponding to the intelligent app, in the display 140. According to an embodiment, the user terminal 100 may receive a voice input by a user utterance. For example, the user terminal 100 may receive a voice input saying "Let me know the schedule of this week!". According to an embodiment, the user terminal 100 may display a user interface (UI) 313 (e.g., an input window) of an intelligent app, in which text data of the received voice input is displayed, in the display.
According to an embodiment, in screen 320, the user terminal 100 may display the result corresponding to the received voice input, in the display. For example, the user terminal 100 may receive the plan corresponding to the received user input and may display 'the schedule of this week' in the display depending on the plan.
FIG. 4a illustrates an intelligence system including a plurality of natural language platforms, according to an embodiment.
Referring to FIG. 4a, the integrated intelligence system 10 may include the user terminal 100 and the intelligent server 200.
According to an embodiment, the user terminal 100 and the intelligent server 200 may include natural language platforms 170 and 220, respectively. In other words, in addition to the first natural language platform 220 of the intelligent server 200, the user terminal 100 may include the second natural language platform 170 for processing the received voice input. For example, the user terminal 100 may include an on-device natural language understanding (NLU) module 173. According to an embodiment, the natural language platforms 170 and 220 of the user terminal 100 and the intelligent server 200 may complementarily process the received voice input. For example, the second natural language platform 170 of the user terminal 100 may process a part of the voice inputs capable of being processed by the first natural language platform 220 of the intelligent server 200. In other words, the second natural language platform 170 of the user terminal 100 may process a limited range of voice inputs, compared with the first natural language platform 220 of the intelligent server 200.
According to an embodiment, the intelligent server 200 may process the voice input received from the user terminal 100. Furthermore, the intelligent server 200 may change (or upgrade) the voice input processed by the second natural language platform 170 of the user terminal 100.
According to an embodiment, the intelligent server 200 may include the front end 210, the first natural language platform 220, and an NLU management module 290. In FIG. 4a, the intelligent server 200 is illustrated with a part of the components of the intelligent server 200 of FIG. 1 omitted. In other words, the intelligent server 200 may further include the remaining components of the intelligent server 200 of FIG. 1.
According to an embodiment, the intelligent server 200 may include a communication interface, a memory, and a processor. The processor may transmit or receive data (or information) to or from an external electronic device (e.g., the user terminal 100) through the communication interface. The processor may execute instructions stored in the memory to perform the actions of the front end 210, the first natural language platform 220, and the NLU management module 290.
According to an embodiment, the front end 210 is connected to the user terminal 100 to receive information associated with a user. For example, the information associated with the user may include at least one of the user's voice input, information of the user terminal 100, or the user's preference information.
According to an embodiment, the first natural language platform 220 may process the user's voice input. The first natural language platform 220 may not be limited to a specific voice input and may process various voice inputs. According to an embodiment, the first natural language platform 220 may include the first ASR module 221, the first NLU module 223, the first planner module 225, and a first TTS module 229.
According to an embodiment, the first ASR module 221 may generate text data corresponding to the received voice input. The first NLU module 223 may determine the user's intent and a parameter, using the text data. The first planner module 225 may generate the plan corresponding to the received voice input. The plan may be determined based on the determined intent and the determined parameter. According to an embodiment, the intelligent server 200 may calculate the result using the generated plan and may transmit the calculated result to the user terminal 100. Furthermore, the intelligent server 200 may directly transmit the generated plan to the user terminal 100. The user terminal 100 may sequentially perform specified actions based on the plan.
According to an embodiment, the first TTS module 229 may generate a voice signal for interacting with a user. According to an embodiment, the first TTS module 229 may convert the text data into a voice signal. According to an embodiment, the user terminal 100 may receive the voice signal from the intelligent server 200 to output guide information.
According to an embodiment, the NLU management module 290 may manage the second NLU module 173 of the user terminal 100. For example, the NLU management module 290 may manage an NLU module (e.g., the second NLU module 173) of at least one electronic device.
According to an embodiment, the NLU management module 290 may select at least one of NLU models based on at least part of information associated with a user and may transmit the selected at least one NLU model to the user terminal 100.
According to an embodiment, the NLU management module 290 may include an NLU management module 291, an NLU modeling module 292, a model training system 293, an NLU model database (DB) 294, a user data manager module 295, and a user history DB 296.
According to an embodiment, the NLU management module 291 may determine whether to change (or update) an NLU model used by the second NLU module 173 of the user terminal 100. The NLU management module 291 may receive at least one voice input from the user data manager module 295 and may determine whether to change the NLU model based on the received voice input.
According to an embodiment, the NLU management module 291 may include a model generating manager 291a and an update manager 291b. According to an embodiment, when the model generating manager 291a determines to change the NLU model of the user terminal 100, the model generating manager 291a may transmit an NLU model generation request to the NLU modeling module 292. According to an embodiment, the update manager 291b may transmit the generated NLU model to the user terminal 100.
According to an embodiment, when receiving the NLU model generation request, the NLU modeling module 292 may generate an NLU model for recognizing a specified intent through the model training system 293. According to an embodiment, the model training system 293 may repeatedly perform the training of a model for recognizing the specified intent. As such, the model training system 293 may generate an NLU model for accurately recognizing the specified intent. According to an embodiment, the generated NLU model may include an intent set for recognizing a plurality of intents. In other words, the generated NLU model may correspond to a specified number of intents. According to an embodiment, the generated NLU model may be stored in the NLU model DB 294. According to an embodiment, the update manager 291b of the NLU management module 291 may transmit the NLU model stored in the NLU model DB 294 to the user terminal 100.
According to an embodiment, the user data manager module 295 may store the information associated with the user received from the user terminal 100, in the user history DB 296. For example, the information associated with the user may include at least one of the user's voice input, information of the user terminal 100, or the user's preference information. For example, the user terminal 100 may be a device that is logged in with a user account. As such, the information of the user terminal 100 may include information (e.g., identification information or setting information) of the logged-in user. According to an embodiment, the user data manager module 295 may store the information of the user terminal 100 in the user history DB 296. According to an embodiment, the user data manager module 295 may store processed information of the received voice input, in the user history DB 296. For example, the user data manager module 295 may store information associated with the intent of the recognized voice input, in the user history DB 296. For example, the information associated with the intent may include user log information. According to an embodiment, the user data manager module 295 may store preference information. For example, the preference information may include the app selected by a user or information about an intent.
According to an embodiment, the user data manager module 295 may analyze information associated with the user stored in the user history DB 296. For example, the user data manager module 295 may identify the intent processed by the user terminal 100 by analyzing the user log. For example, as illustrated in Table 1, the user log may include identification information of a plan, information about the name of an app, information about a user utterance, or the like. The user data manager module 295 may determine the recognized intent, using the identification information of a plan included in log information.
Table 1
{
  "body": {
    "commandType": 1,
    "pathRule": {
      "apps": [],
      "isRoot": true,
      "pathRuleId": "Gallery_101",  // the identifier of a plan
      "seqNums": 3,
      "states": [
        {
          "appName": "Gallery",  // the name of an app
          "parameters": [
            {"parameterName": "title", "slotName": "title", "slotNum": 0, "slotValue": "document"},
            {"parameterName": "searchContentType", "slotName": "searchContentType", "slotNum": 1, "slotValue": "image"}
          ],
          "seqNum": 3,
          "stateId": "SearchViewResult"
        }
      ],
      "utterance": "find document pictures in a gallery"  // user utterance
    },
    "category": "pathrule_result"
  },
  "header": {
    "appName": "",
    "appVersion": "",
    "specVersion": "0.72",
    "timestamp": 1498809446662,
    "lang": "ko_KR",
    "tpo_app": "com.sec.android.app.launcher",
    "tpo_dofw": "6",
    "tpo_hour": "16",
    "tpo_plc_geohash": "wyd7gn",
    "tpo_plc_id": "WORK,DAILY_LIVING_AREA,HOME_COUNTRY",
    "tpo_yyyymmdd": "20170630"
  }
}
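As a rough illustration of how the identification information of a plan could be read from such a log, the sketch below parses a trimmed copy of the Table 1 record and pulls out the plan identifier, the app name, and the user utterance. The `summarize_log` helper and its output keys are hypothetical; the field names are those shown in Table 1, with the inline comments removed so that the text parses as strict JSON.

```python
import json

# Trimmed copy of the Table 1 log, without the inline comments.
LOG_TEXT = """
{"body": {"commandType": 1,
          "pathRule": {"pathRuleId": "Gallery_101",
                       "states": [{"appName": "Gallery",
                                   "stateId": "SearchViewResult"}],
                       "utterance": "find document pictures in a gallery"},
          "category": "pathrule_result"},
 "header": {"specVersion": "0.72", "lang": "ko_KR"}}
"""


def summarize_log(log_text: str) -> dict:
    """Extract the plan identifier, app names, and utterance from one log record."""
    path_rule = json.loads(log_text)["body"]["pathRule"]
    return {
        "plan_id": path_rule["pathRuleId"],
        "apps": [state["appName"] for state in path_rule["states"]],
        "utterance": path_rule["utterance"],
    }


summary = summarize_log(LOG_TEXT)
print(summary)
```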
According to an embodiment, the user data manager module 295 may extract information about the intent corresponding to a specified condition, from the user history DB 296. For example, the user data manager module 295 may extract information (e.g., top 20 intents) about at least one intent, which is recognized at a high frequency during a specified period (e.g., one week). For another example, the user data manager module 295 may extract information about an intent used at a specific location (or place). For another example, the user data manager module 295 may extract information about the intent included in a domain corresponding to a specific app (or an application program). For another example, the user data manager module 295 may extract information about the intent for performing a function associated with the connection to a network (e.g., Wireless Fidelity (Wi-Fi)). According to an embodiment, the user data manager module 295 may extract intents of the specified number. For example, the specified number may be selected by the user. According to an embodiment, the user data manager module 295 may generate an intent set including the extracted intent.
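The frequency-based condition described above (e.g., the top 20 intents during one week) can be sketched with a simple counter over timestamped history records. The record shape of (timestamp, intent) pairs, the intent names, and the `top_intents` helper are assumptions for illustration; the other conditions (location, app domain, network function) would be additional filters on the same records.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_intents(log, now, period=timedelta(weeks=1), limit=20):
    """Return up to `limit` intents recognized most often within `period` before `now`.

    log: iterable of (timestamp, intent) pairs; this record shape is an assumption.
    """
    cutoff = now - period
    counts = Counter(intent for ts, intent in log if cutoff <= ts <= now)
    return [intent for intent, _ in counts.most_common(limit)]


now = datetime(2019, 8, 5, 12, 0)
history = [
    (now - timedelta(days=1), "alarm.set"),
    (now - timedelta(days=2), "alarm.set"),
    (now - timedelta(days=3), "gallery.search"),
    (now - timedelta(days=30), "wifi.connect"),  # outside the one-week period
]
intent_set = top_intents(history, now, limit=2)
print(intent_set)
```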
According to an embodiment, the intelligent server 200 may train the criterion for extracting the intent, using artificial intelligence (AI). In other words, the criterion for extracting the intent in the user terminal 100 may be updated through machine learning.
According to an embodiment, the user data manager module 295 may transmit the information about the extracted intent to the NLU management module 291. For example, the extracted information may include at least one voice input corresponding to the extracted intent. According to an embodiment, the NLU management module 291 may generate an NLU model for recognizing a specified intent, using the received voice input and may provide the generated NLU model to the user terminal 100. As such, the intelligent server 200 may provide the user terminal 100 with the personalized natural language recognition model.
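The flow from extracted intents to a personalized model can be sketched as two steps: decide whether the model on the terminal needs to change, and if so, generate the next model from the voice inputs corresponding to the extracted intents. In the sketch below, a fixed lookup table stands in for a trained NLU model, and the update criterion (the intent set has changed) is an assumption; the disclosure does not specify either, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NluModel:
    """Stand-in for a trained model: a fixed utterance-to-intent lookup table."""
    version: int
    intent_by_utterance: tuple  # tuple of (utterance, intent) pairs

    @property
    def intents(self) -> frozenset:
        return frozenset(intent for _, intent in self.intent_by_utterance)


def needs_update(current: NluModel, extracted_intents: set) -> bool:
    """Assumed criterion: change the model when the extracted intent set differs."""
    return current.intents != frozenset(extracted_intents)


def generate_model(current: NluModel, samples: list) -> NluModel:
    """Build the next model version from (utterance, intent) samples."""
    return NluModel(current.version + 1, tuple(sorted(samples)))


current = NluModel(1, (("set an alarm", "alarm.set"),))
samples = [("set an alarm", "alarm.set"), ("find pictures", "gallery.search")]
extracted = {intent for _, intent in samples}

# Generate and "transmit" a new model only when the intent set has changed.
model = generate_model(current, samples) if needs_update(current, extracted) else current
print(model.version, sorted(model.intents))
```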
According to an embodiment, the user terminal 100 may include the second natural language platform 170. According to an embodiment, the second natural language platform 170 may include a second ASR module 171, the second NLU module 173, a second planner module 175, and a second TTS module 177. For example, the second ASR module 171, the second NLU module 173, the second planner module 175, and the second TTS module 177 may be embedded modules for performing a specified function. According to an embodiment, the user terminal 100 may be similar to the user terminal 100 of FIG. 1. For example, the user terminal 100 may additionally include the configuration of the user terminal 100 of FIG. 1, as well as the configuration illustrated in FIG. 4a.
According to an embodiment, the user terminal 100 may receive a voice input. According to an embodiment, the user terminal 100 may process the received voice input through the second ASR module 171, the second NLU module 173, and the second planner module 175. For example, the second ASR module 171, the second NLU module 173, and second planner module 175 of the user terminal 100 may process the voice input, similarly to the first ASR module 221, the first NLU module 223, and the first planner module 225 of the intelligent server 200. According to an embodiment, the second NLU module 173 may determine the intent of the received voice input. The second NLU module 173 may determine the intent corresponding to the voice input, using the NLU model.
According to an embodiment, the user terminal 100 may process only the voice input corresponding to the intents of the limited number, through the second ASR module 171, the second NLU module 173, and the second planner module 175. For example, the intent capable of being recognized by the user terminal 100 may be a part of intents capable of being recognized by the intelligent server 200.
According to an embodiment, when the user terminal 100 directly processes the received voice input, the user terminal 100 may process the received voice input more rapidly than in the case where the voice input is processed through the intelligent server 200. However, the user terminal 100 may process only the voice inputs of the specified number due to the limitation of hardware performance. According to an embodiment, the user terminal 100 may complementarily process the received voice input together with the intelligent server 200. For example, the user terminal 100 may directly process the voice input corresponding to the intent recognized at a high frequency; for another example, the user terminal 100 may directly process the voice input corresponding to the intent selected by a user. According to an embodiment, the user terminal 100 may process the voice input corresponding to the remaining intents through the intelligent server 200.
According to an embodiment, the intent capable of being recognized by the user terminal 100 may be changed (or updated) through the intelligent server 200. According to an embodiment, the intent capable of being recognized by the user terminal 100 may be determined based on the usage history of the user. In other words, the changed intent may be determined based on the usage history of the user. According to an embodiment, the user terminal 100 may receive the NLU model corresponding to the determined intent, from the intelligent server 200. According to an embodiment, the user terminal 100 may store the received NLU model in a database. For example, the user terminal 100 may store the NLU model corresponding to the determined intent in the database instead of the previously stored NLU model.
According to an embodiment, the user terminal 100 may generate a voice signal for interacting with a user through the second TTS module 177. The second TTS module 177 of the user terminal 100 may generate the voice signal, similarly to the first TTS module 229 of the intelligent server 200.
As such, the integrated intelligence system 10 may provide the user with the personalized voice input processing service by changing (or updating) the intent capable of being recognized by the user terminal 100 using user data. The user terminal 100 may rapidly provide the response corresponding to the voice input, which is frequently used or selected by the user.
FIG. 4b illustrates another example of an intelligence system including a plurality of natural language platforms, according to an embodiment. The configuration of the user terminal 100 illustrated in FIG. 4b is one possible configuration, and the user terminal 100 may further include at least one of components illustrated in FIG. 1, 4a, or 8, in addition to the configuration illustrated in FIG. 4b.
Referring to FIG. 4b, an integrated intelligence system 20 may further include an edge server 1400 between the user terminal 100 and the intelligent server 200. According to an embodiment, the edge server 1400 may include at least one of a mobile edge computing (MEC) server or a fog computing server. The edge server 1400 may be positioned at a location that is geographically closer to the user terminal 100 than the intelligent server 200. For example, the edge server 1400 may be positioned inside or around the base station that provides wireless communication to the user terminal 100. When the user terminal 100 requires low latency, the user terminal 100 may transmit or receive data to and from the edge server 1400 located at a geographically close location, instead of transmitting or receiving data to and from the intelligent server 200.
According to an embodiment, the edge server 1400 may include a third natural language platform 1470 including the function of an on-device natural language platform (e.g., the second natural language platform 170 of FIG. 4a). The third natural language platform 1470 may perform a function the same as or similar to the function of the second natural language platform 170. The third natural language platform 1470 may include a third ASR module 1471, a third NLU module 1473, a third planner module 1475, and a third TTS module 1477. The edge server 1400 may compensate for the limitation of the hardware performance of the user terminal 100 while providing data at a low latency, by replacing the function of the second natural language platform 170. Although not illustrated in FIG. 4b, the edge server 1400 may further include a module performing the function of a front end (e.g., 201 of FIG. 4b) configured to transmit data to the user terminal 100 or the intelligent server 200.
For example, when the user terminal 100 receives a user utterance through the microphone 120, the user terminal 100 may generate a voice signal corresponding to the received user utterance. The user terminal 100 may make a request for processing of the voice input, by transmitting the voice input to the edge server 1400 through the communication interface 110. The edge server 1400 may process the voice input received through the third natural language platform 1470. For example, the third ASR module 1471 may convert the voice input received from the user terminal 100 to text data. The third NLU module 1473 may determine the user's intent corresponding to the voice input; the third planner module 1475 may generate a plan according to the determined intent; the third TTS module 1477 may generate the voice signal for interacting with the user. The user terminal 100 may receive the voice signal and then may output guide information through the speaker 130.
According to an embodiment, the edge server 1400 may quickly process the voice signal compared with the intelligent server 200, while replacing the function of the user terminal 100. However, because the hardware performance of the edge server 1400 is limited compared to the hardware performance of the intelligent server 200, the number of voice inputs capable of being processed by the third natural language platform 1470 may be limited. In this case, the edge server 1400 may induce the user terminal 100 to process a voice signal through the intelligent server 200.
For example, when the confidence of the intent determined by the third NLU module 1473 is less than a specified level, the edge server 1400 may determine that the recognition of the intent fails. The specified level may be referred to as the confidence level. For example, the specified level may be a specified probability (e.g., 50%). When the confidence of the determined intent is less than the specified level, the edge server 1400 may make a request for the processing of the voice signal to the intelligent server 200. For another example, the edge server 1400 may transmit information indicating that the confidence of the determined intent is less than the specified level, to the user terminal 100. The user terminal 100 may make a request for the processing of the voice signal to the intelligent server 200. The intelligent server 200 may process the voice signal through the first natural language platform 220 and may transmit the processed result to the user terminal 100.
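The fallback decision described above can be sketched as below, purely as an illustration. The `NluResult` type, the function `choose_processor`, and the return labels are hypothetical names introduced for this sketch; the 0.5 default mirrors the 50% example given above.

```python
from dataclasses import dataclass


@dataclass
class NluResult:
    intent: str
    confidence: float  # probability in [0, 1] reported by the edge NLU module


def choose_processor(result: NluResult, threshold: float = 0.5) -> str:
    """Return "edge" when the edge result is confident enough, else "intelligent_server"."""
    if result.confidence < threshold:
        # Recognition is treated as failed; processing is requested from the server.
        return "intelligent_server"
    return "edge"


confident = choose_processor(NluResult("order_coffee", 0.92))
uncertain = choose_processor(NluResult("order_coffee", 0.31))
```

Whether the edge server forwards the request itself or notifies the user terminal so that the terminal makes the request is left outside this sketch; both variants share the same threshold comparison.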
FIG. 5 is a flowchart illustrating a method of changing (or updating) an intent recognition model of a user terminal, according to an embodiment.
Referring to FIG. 5, the intelligent server 200 may change (or update) the voice recognition model of a user terminal (e.g., the user terminal 100 of FIG. 4a) for processing the voice inputs of the limited number in a user history, based on the recognition frequency of the intent corresponding to the voice input.
According to an embodiment, in operation 510, the intelligent server 200 (e.g., the user data manager module 295 of FIG. 4a) may analyze a user pattern. For example, the intelligent server 200 may analyze the pattern of the voice input. The intelligent server 200 may determine whether the specified intent is recognized more than the specified number of times.
According to an embodiment, in operation 520, the intelligent server 200 (e.g., the NLU management module 291 of FIG. 4a) may determine whether the update of the NLU module (e.g., the second NLU module 173 of FIG. 4a) of a user terminal is needed. For example, when the increment of the recognition frequency of the specified intent is not less than a specified value, the intelligent server 200 may determine that the update of the user terminal is needed. For another example, when preference information of the user is changed, the intelligent server 200 may determine that the update of the user terminal is needed. For example, when a user's favorite app or intent is changed (e.g., deleted or registered), the intelligent server 200 may receive the changed information. For another example, when the information associated with the user terminal 100 is changed, the intelligent server 200 may determine that the update of the user terminal is needed. For example, when another device logged in with the same user account is connected or when the same device is logged in with another user account, the intelligent server 200 may receive the changed information.
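A minimal sketch of the determination in operation 520 is given below, under stated assumptions: the `UpdateSignals` type, its field names, and the default `min_increment` of 5 are hypothetical and are not taken from the disclosure. The three conditions correspond to the three examples above.

```python
from dataclasses import dataclass


@dataclass
class UpdateSignals:
    frequency_increment: int   # growth in the recognition count of a specified intent
    preference_changed: bool   # e.g., a favorite app or intent was registered or deleted
    device_info_changed: bool  # e.g., another device or another account was logged in


def update_needed(signals: UpdateSignals, min_increment: int = 5) -> bool:
    """Return True when any of the example conditions for an update holds."""
    return (
        signals.frequency_increment >= min_increment  # "not less than a specified value"
        or signals.preference_changed
        or signals.device_info_changed
    )


no_change = update_needed(UpdateSignals(0, False, False))
more_frequent = update_needed(UpdateSignals(5, False, False))
new_favorite = update_needed(UpdateSignals(0, True, False))
```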
According to an embodiment, when there is no need for the update of the second NLU module 173 (No), the intelligent server 200 may terminate a procedure for changing the NLU model of the user terminal.
According to an embodiment, when the update of the second NLU module 173 is needed (Yes), in operation 530, the intelligent server 200 (e.g., the NLU management module 291 of FIG. 4a) may determine the range of the intent capable of being processed (or recognized) by the user terminal. For example, the intelligent server 200 may determine the intents of the specified number, which have the high recognition frequency, as the range of the intent capable of being recognized by the user terminal. For another example, the intelligent server 200 may determine all or part of intents included in the domain corresponding to an app, as the range of the intent capable of being recognized by the user terminal. For another example, the intelligent server 200 may determine the intent associated with the function to limit the connection to an external device, as the range of the intent capable of being recognized by the user terminal. For another example, the intelligent server 200 may determine the intent recognized by the specific kind of electronic device (e.g., an electronic device not including a display), within the range of the intent capable of being recognized by the user terminal.
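Two of the range criteria in operation 530 (the domain of an app and the kind of electronic device) might be sketched as a simple filter, as follows. The `Intent` record, the `needs_display` flag, and the "Settings" domain are assumptions made for illustration; the disclosure does not state how an intent is associated with a kind of device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    domain: str          # app domain the intent belongs to
    needs_display: bool  # assumed flag: carrying out the intent requires a screen


def intents_in_range(catalog, app=None, has_display=True):
    """Return names of intents the terminal may recognize under the given criteria."""
    selected = []
    for intent in catalog:
        if app is not None and intent.domain != app:
            continue  # restrict to the domain corresponding to the app
        if intent.needs_display and not has_display:
            continue  # skip intents unsuited to a device without a display
        selected.append(intent.name)
    return selected


catalog = [
    Intent("order americano", "Starbucks", False),
    Intent("identify my order", "Starbucks", True),
    Intent("connect wifi", "Settings", False),
]
app_scoped = intents_in_range(catalog, app="Starbucks")
displayless = intents_in_range(catalog, has_display=False)
```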
According to an embodiment, in operation 540, the intelligent server 200 (e.g., the NLU modeling module 292 of FIG. 4a) may generate a natural language recognition model for recognizing the intent included in the determined range.
According to an embodiment, in operation 550, the intelligent server 200 (e.g., the model training system 293 of FIG. 4a) may repeatedly train the model for recognizing the specified intent. As such, the intelligent server 200 may generate an NLU model for accurately recognizing the specified intent.
According to an embodiment, in operation 560, the intelligent server 200 (e.g., the NLU management module 291 of FIG. 4a) may transmit the generated NLU model to the user terminal 100. For example, the NLU model may be the personalized NLU model.
As such, the intelligent server 200 may implement the personalized voice processing system by changing (or updating) the intent recognized by the user terminal 100 using user data.
FIG. 6a is a view illustrating a screen for setting the intent recognized by a user terminal depending on an app installed in the user terminal, according to an embodiment.
Referring to FIG. 6a, the user terminal 100 may receive a user input to select the intent capable of being recognized by the user terminal 100. The user terminal 100 may generate preference information of a user associated with an app, based on the user input.
According to an embodiment, the user terminal 100 may display a first user interface (UI) 610 for receiving the user input, on a display (e.g., the display 140 of FIG. 1).
According to an embodiment, the user terminal 100 may display at least one app list 611 or 613 on the first UI 610. For example, the user terminal 100 may display the first app list 611, which is separated for each service, on the UI 610. For example, the first app list 611 may include an app (e.g., Starbucks and Hollys) 611a associated with coffee & beverages, an app (e.g., Domino's, Pizzahut, and TGIF) 611b associated with restaurants, and an app (e.g., Gmarket) 611c associated with shopping. The user terminal 100 may display the second app list 613 displayed based on a usage frequency, on the UI 610. The second app list 613 may include an app (e.g., Trip Advisor and Starbucks) 613a that is executed more than the specified number of times during a specified period. According to an embodiment, the apps included in the first app list 611 and the second app list 613 may be duplicated. As such, the user terminal 100 may receive a user input to select an app through the list 611 or 613 of an app.
According to an embodiment, the user terminal 100 may display the intent capable of being recognized by each app, in the list 611 or 613 of an app. For example, the user terminal 100 may display a representative utterance (e.g., identify my order) corresponding to a part of the recognizable intent. As such, the user terminal 100 may provide information about the intent capable of being recognized by the selected app.
According to an embodiment, the user terminal 100 may receive a user input to select an app through the app list 611 or 613. For example, the user terminal 100 may receive a user input to select an app (e.g., Starbucks or Domino's) for each service in the first app list 611. Because intents for performing the similar function are duplicated, one app may be selected for each service. According to an embodiment, the user terminal 100 may display the selected app in the app list 611 or 613. For example, the user terminal 100 may display the selected app (e.g., Starbucks or Domino's) through indicators 611a_1, 611b_1, and 613a_1.
According to an embodiment, the user terminal 100 may transmit information about the selected app to the intelligent server 200. In other words, the user terminal 100 may transmit preference information to the intelligent server 200. According to an embodiment, the intelligent server 200 may generate an NLU model for recognizing all or part of the intents included in the domain corresponding to the selected app and may transmit the generated NLU model to the user terminal 100. For example, the generated NLU model may be the personalized NLU model.
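For illustration, the preference information built from the app selection might be modeled as one chosen app per service, from which the intents to be covered by the personalized NLU model are collected. The function names, the mapping layout, and the "order pizza" intent are assumptions of this sketch; the app names and the two Starbucks utterances are taken from the examples in this description.

```python
def select_app(selection, service, app):
    """Return a copy of `selection` with `app` as the single choice for `service`."""
    updated = dict(selection)
    updated[service] = app  # a later choice replaces the earlier one for that service
    return updated


def intents_for_selection(selection, domains):
    """Collect the intents in the domain of every selected app."""
    intents = []
    for app in selection.values():
        intents.extend(domains.get(app, []))
    return intents


# Hypothetical domain contents, keyed by app name.
domains = {
    "Starbucks": ["order americano", "add whipping cream"],
    "Hollys": ["order americano"],
    "Domino's": ["order pizza"],
}

selection = {}
selection = select_app(selection, "coffee & beverages", "Hollys")
selection = select_app(selection, "coffee & beverages", "Starbucks")
selection = select_app(selection, "restaurants", "Domino's")
requested_intents = intents_for_selection(selection, domains)
```

Keying the selection by service reflects the statement above that one app may be selected for each service, so duplicated intents for a similar function are not requested twice.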
As such, the user terminal 100 may process a voice input for performing the function of the app selected by the user. In other words, the user terminal 100 may directly recognize the intent included in the domain corresponding to the selected app.
FIG. 6b is a view illustrating a screen for setting the intent processed by a user terminal depending on the intent for performing the function of an app of a user terminal, according to an embodiment.
Referring to FIG. 6b, the user terminal 100 may receive a user input to select the intent capable of being recognized by the user terminal 100. The user terminal 100 may generate preference information of a user associated with the intent, based on the user input.
According to an embodiment, the user terminal 100 may display a second UI 620 for receiving a user input, on a display.
According to an embodiment, the user terminal 100 may display an intent list 621 for performing the specified function of the specified app, on the second UI 620. According to an embodiment, the specified app may be an app selected by a user. For example, the intent list 621 may include a representative utterance (e.g., "order americano" or "add whipping cream") 621a corresponding to at least one intent. According to an embodiment, when receiving a user input to select an app through the app list 611 or 613 of the first UI 610 of FIG. 6a, the user terminal 100 may display the intent list 621 of the selected app on the second UI.
According to an embodiment, the user terminal 100 may receive a user input to select at least one intent through the intent list 621. According to an embodiment, the user terminal 100 may display the selected intent in the intent list 621. For example, the user terminal 100 may display the selected intent (e.g., "order americano", "add whipping cream", and "make a payment with Samsung Pay") through indicators 621a_1, 621a_2, and 621a_3.
According to an embodiment, the user terminal 100 may transmit information about the selected intent to the intelligent server 200. For example, the selected intent may be the intent for performing the function of the app selected by a user. As such, the user terminal 100 may transmit the information about the selected app as well as the information about the selected intent to the intelligent server 200. According to an embodiment, the intelligent server 200 may generate an NLU model for recognizing the selected intent and may transmit the generated NLU model to the user terminal 100. For example, the generated NLU model may be the personalized NLU model.
As such, the user terminal 100 may process a voice input for performing the function selected by the user. In other words, the user terminal 100 may directly recognize the intent corresponding to the selected function.
FIG. 7 is a view illustrating a screen for providing a user with information about the intent capable of being recognized by a user terminal, according to an embodiment.
Referring to FIG. 7, the user terminal 100 may provide information 711 about a voice input processing system through a UI 710.
According to an embodiment, the user terminal 100 may provide state information of the voice input processing system. For example, the user terminal 100 may provide information about terms of service of the voice input processing system, service policy information, license information, and update information.
According to an embodiment, the user terminal 100 may provide information 711a associated with the intent capable of being directly recognized by the user terminal 100. For example, the user terminal 100 may provide the information 711a of an app performing the function corresponding to the intent capable of being directly recognized by the user terminal 100. For example, the information of the app may be the name of an app.
According to embodiments disclosed in the disclosure, the integrated intelligence system 10 or 20 described with reference to FIGS. 1 to 7 may provide the personalized voice input recognizing system by changing (or updating) a natural language understanding model of a user terminal for recognizing intents of the limited number by using user data. As such, the integrated intelligence system 10 or 20 may provide a rapid response corresponding to a voice input.
FIG. 8 is a block diagram illustrating an electronic device 801 in a network environment 800 according to various embodiments. Referring to FIG. 8, the electronic device 801 in the network environment 800 may communicate with an electronic device 802 via a first network 898 (e.g., a short-range wireless communication network), or an electronic device 804 or a server 808 via a second network 899 (e.g., a long-range wireless communication network). According to an embodiment, the electronic device 801 may communicate with the electronic device 804 via the server 808. According to an embodiment, the electronic device 801 may include a processor 820, memory 830, an input device 850, a sound output device 855, a display device 860, an audio module 870, a sensor module 876, an interface 877, a haptic module 879, a camera module 880, a power management module 888, a battery 889, a communication module 890, a subscriber identification module (SIM) 896, or an antenna module 897. In some embodiments, at least one (e.g., the display device 860 or the camera module 880) of the components may be omitted from the electronic device 801, or one or more other components may be added in the electronic device 801. In some embodiments, some of the components may be implemented as single integrated circuitry. For example, the sensor module 876 (e.g., a fingerprint sensor, an iris sensor, or an illuminance sensor) may be implemented as embedded in the display device 860 (e.g., a display).
The processor 820 may execute, for example, software (e.g., a program 840) to control at least one other component (e.g., a hardware or software component) of the electronic device 801 coupled with the processor 820, and may perform various data processing or computation. According to one embodiment, as at least part of the data processing or computation, the processor 820 may load a command or data received from another component (e.g., the sensor module 876 or the communication module 890) in volatile memory 832, process the command or the data stored in the volatile memory 832, and store resulting data in non-volatile memory 834. According to an embodiment, the processor 820 may include a main processor 821 (e.g., a central processing unit (CPU) or an application processor (AP)), and an auxiliary processor 823 (e.g., a graphics processing unit (GPU), an image signal processor (ISP), a sensor hub processor, or a communication processor (CP)) that is operable independently from, or in conjunction with, the main processor 821. Additionally or alternatively, the auxiliary processor 823 may be adapted to consume less power than the main processor 821, or to be specific to a specified function. The auxiliary processor 823 may be implemented as separate from, or as part of the main processor 821.
The auxiliary processor 823 may control at least some of functions or states related to at least one component (e.g., the display device 860, the sensor module 876, or the communication module 890) among the components of the electronic device 801, instead of the main processor 821 while the main processor 821 is in an inactive (e.g., sleep) state, or together with the main processor 821 while the main processor 821 is in an active state (e.g., executing an application). According to an embodiment, the auxiliary processor 823 (e.g., an image signal processor or a communication processor) may be implemented as part of another component (e.g., the camera module 880 or the communication module 890) functionally related to the auxiliary processor 823.
The memory 830 may store various data used by at least one component (e.g., the processor 820 or the sensor module 876) of the electronic device 801. The various data may include, for example, software (e.g., the program 840) and input data or output data for a command related thereto. The memory 830 may include the volatile memory 832 or the non-volatile memory 834.
The program 840 may be stored in the memory 830 as software, and may include, for example, an operating system (OS) 842, middleware 844, or an application 846.
The input device 850 may receive a command or data to be used by another component (e.g., the processor 820) of the electronic device 801, from the outside (e.g., a user) of the electronic device 801. The input device 850 may include, for example, a microphone, a mouse, a keyboard, or a digital pen (e.g., a stylus pen).
The sound output device 855 may output sound signals to the outside of the electronic device 801. The sound output device 855 may include, for example, a speaker or a receiver. The speaker may be used for general purposes, such as playing multimedia or playing a recording, and the receiver may be used for incoming calls. According to an embodiment, the receiver may be implemented as separate from, or as part of the speaker.
The display device 860 may visually provide information to the outside (e.g., a user) of the electronic device 801. The display device 860 may include, for example, a display, a hologram device, or a projector and control circuitry to control a corresponding one of the display, hologram device, and projector. According to an embodiment, the display device 860 may include touch circuitry adapted to detect a touch, or sensor circuitry (e.g., a pressure sensor) adapted to measure the intensity of force incurred by the touch.
The audio module 870 may convert a sound into an electrical signal and vice versa. According to an embodiment, the audio module 870 may obtain the sound via the input device 850, or output the sound via the sound output device 855 or a headphone of an external electronic device (e.g., an electronic device 802) directly (e.g., wiredly) or wirelessly coupled with the electronic device 801.
The sensor module 876 may detect an operational state (e.g., power or temperature) of the electronic device 801 or an environmental state (e.g., a state of a user) external to the electronic device 801, and then generate an electrical signal or data value corresponding to the detected state. According to an embodiment, the sensor module 876 may include, for example, a gesture sensor, a gyro sensor, an atmospheric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.
The interface 877 may support one or more specified protocols to be used for the electronic device 801 to be coupled with the external electronic device (e.g., the electronic device 802) directly (e.g., wiredly) or wirelessly. According to an embodiment, the interface 877 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, a secure digital (SD) card interface, or an audio interface.
A connecting terminal 878 may include a connector via which the electronic device 801 may be physically connected with the external electronic device (e.g., the electronic device 802). According to an embodiment, the connecting terminal 878 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (e.g., a headphone connector).
The haptic module 879 may convert an electrical signal into a mechanical stimulus (e.g., a vibration or a movement) or electrical stimulus which may be recognized by a user via his tactile sensation or kinesthetic sensation. According to an embodiment, the haptic module 879 may include, for example, a motor, a piezoelectric element, or an electric stimulator.
The camera module 880 may capture a still image or moving images. According to an embodiment, the camera module 880 may include one or more lenses, image sensors, image signal processors, or flashes.
The power management module 888 may manage power supplied to the electronic device 801. According to one embodiment, the power management module 888 may be implemented as at least part of, for example, a power management integrated circuit (PMIC).
The battery 889 may supply power to at least one component of the electronic device 801. According to an embodiment, the battery 889 may include, for example, a primary cell which is not rechargeable, a secondary cell which is rechargeable, or a fuel cell.
The communication module 890 may support establishing a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 801 and the external electronic device (e.g., the electronic device 802, the electronic device 804, or the server 808) and performing communication via the established communication channel. The communication module 890 may include one or more communication processors that are operable independently from the processor 820 (e.g., the application processor (AP)) and support a direct (e.g., wired) communication or a wireless communication. According to an embodiment, the communication module 890 may include a wireless communication module 892 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 894 (e.g., a local area network (LAN) communication module or a power line communication (PLC) module). A corresponding one of these communication modules may communicate with the external electronic device via the first network 898 (e.g., a short-range communication network, such as Bluetooth™, wireless-fidelity (Wi-Fi) direct, or infrared data association (IrDA)) or the second network 899 (e.g., a long-range communication network, such as a cellular network, the Internet, or a computer network (e.g., LAN or wide area network (WAN))). These various types of communication modules may be implemented as a single component (e.g., a single chip), or may be implemented as multiple components (e.g., multiple chips) separate from each other. The wireless communication module 892 may identify and authenticate the electronic device 801 in a communication network, such as the first network 898 or the second network 899, using subscriber information (e.g., international mobile subscriber identity (IMSI)) stored in the subscriber identification module 896.
The antenna module 897 may transmit or receive a signal or power to or from the outside (e.g., the external electronic device) of the electronic device 801. According to an embodiment, the antenna module 897 may include an antenna including a radiating element composed of a conductive material or a conductive pattern formed in or on a substrate (e.g., PCB). According to an embodiment, the antenna module 897 may include a plurality of antennas. In such a case, at least one antenna appropriate for a communication scheme used in the communication network, such as the first network 898 or the second network 899, may be selected, for example, by the communication module 890 (e.g., the wireless communication module 892) from the plurality of antennas. The signal or the power may then be transmitted or received between the communication module 890 and the external electronic device via the selected at least one antenna. According to an embodiment, another component (e.g., a radio frequency integrated circuit (RFIC)) other than the radiating element may be additionally formed as part of the antenna module 897.
At least some of the above-described components may be coupled mutually and communicate signals (e.g., commands or data) therebetween via an inter-peripheral communication scheme (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)).
According to an embodiment, commands or data may be transmitted or received between the electronic device 801 and the external electronic device 804 via the server 808 coupled with the second network 899. Each of the electronic devices 802 and 804 may be a device of the same type as, or a different type from, the electronic device 801. According to an embodiment, all or some of operations to be executed at the electronic device 801 may be executed at one or more of the external electronic devices 802, 804, or 808. For example, if the electronic device 801 should perform a function or a service automatically, or in response to a request from a user or another device, the electronic device 801, instead of, or in addition to, executing the function or the service, may request the one or more external electronic devices to perform at least part of the function or the service. The one or more external electronic devices receiving the request may perform the at least part of the function or the service requested, or an additional function or an additional service related to the request, and transfer an outcome of the performing to the electronic device 801. The electronic device 801 may provide the outcome, with or without further processing of the outcome, as at least part of a reply to the request. To that end, a cloud computing, distributed computing, or client-server computing technology may be used, for example.
As described above, a system may include at least one communication interface, at least one processor operatively connected to the at least one communication interface, and at least one memory operatively connected to the at least one processor and storing a plurality of natural language understanding (NLU) models. The at least one memory may store instructions that, when executed, cause the processor to receive first information associated with a user from an external electronic device associated with a user account, using the at least one communication interface, to select at least one of the plurality of NLU models, based on at least part of the first information, and to transmit the selected at least one NLU model to the external electronic device, using the at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
According to an embodiment, the first information may include at least one of a voice input of the user, information of the external electronic device, or preference information of the user.
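As a rough illustration of the receive, select, and transmit flow summarized above, the following Python sketch may help. It is not the claimed implementation: the `FirstInformation` container, the `NLU_MODELS` table, the keyword-matching rule, and the `outbox` list standing in for the communication interface are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FirstInformation:
    """Hypothetical container for the kinds of first information named above."""
    voice_input: Optional[str] = None
    device_info: Dict[str, str] = field(default_factory=dict)
    preferences: List[str] = field(default_factory=list)


# Hypothetical model store: domain keyword -> model identifier.
NLU_MODELS: Dict[str, str] = {
    "coffee": "nlu-coffee-v1",
    "music": "nlu-music-v1",
    "alarm": "nlu-alarm-v1",
}


def select_models(info: FirstInformation) -> List[str]:
    """Pick every stored model whose domain appears in the voice input or preferences."""
    text = (info.voice_input or "").lower()
    selected = []
    for domain, model_id in NLU_MODELS.items():
        if domain in text or domain in info.preferences:
            selected.append(model_id)
    return selected


def transmit(models: List[str], outbox: List[str]) -> None:
    """Stand-in for sending the selected models over the communication interface."""
    outbox.extend(models)
```

Keyword matching is used only to keep the sketch short; the disclosure leaves the selection criterion open beyond "based on at least part of the first information".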
According to an embodiment, the instructions may cause the processor to select at least one of the plurality of NLU models when the number of times that a specified voice input of the user is received during a specified period is not less than a specified value.
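The frequency condition can be sketched as a sliding-window counter. This is one possible reading, assuming timestamps in seconds and a window that ends at the most recent reception; the class name `UtteranceCounter` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque


class UtteranceCounter:
    """Counts receptions of one specified voice input inside a time window."""

    def __init__(self, threshold: int, period_seconds: float) -> None:
        self.threshold = threshold
        self.period_seconds = period_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one reception; return True when selection should be triggered."""
        self._timestamps.append(timestamp)
        # Drop receptions that fall outside the specified period.
        while self._timestamps and timestamp - self._timestamps[0] > self.period_seconds:
            self._timestamps.popleft()
        # "Not less than a specified value" means count >= threshold.
        return len(self._timestamps) >= self.threshold
```

For example, with `threshold=3` and `period_seconds=10.0`, receptions at 0, 4, and 8 seconds trigger selection on the third call, whereas receptions at 0, 4, and 20 seconds do not.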
According to an embodiment, the instructions may cause the processor to select at least one of the plurality of NLU models when the information associated with the external electronic device is changed.
According to an embodiment, the instructions may cause the processor to select at least one of the plurality of NLU models when the preference information of the user is changed.
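The two change-based triggers above (changed device information and changed preference information) can be sketched as a comparison against the last value seen for a user account. The class `ChangeTrigger` is hypothetical, and treating the first report as "no change" is a choice made only for this example.

```python
from typing import Any, Dict, Optional


class ChangeTrigger:
    """Reports whether reported information differs from the last value seen."""

    def __init__(self) -> None:
        self._last: Dict[str, Dict[str, Any]] = {}

    def changed(self, account: str, info: Dict[str, Any]) -> bool:
        previous: Optional[Dict[str, Any]] = self._last.get(account)
        # Store a copy so later mutation by the caller does not hide a change.
        self._last[account] = dict(info)
        # The first report for an account is treated as "no change" in this sketch.
        return previous is not None and previous != info
```

One instance could track device information and a second instance preference information, so that either change can prompt a new model selection.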
According to an embodiment, the instructions may cause the processor to generate text data by processing voice data of the user received from the external electronic device using an automatic speech recognition (ASR) model.
According to an embodiment, the instructions may cause the processor to determine an intent corresponding to the voice input and, when a specified intent is determined more than a specified count, to select the at least one NLU model based on at least one voice input corresponding to the specified intent.
According to an embodiment, the instructions may cause the processor to select an NLU model corresponding to a specified number of intents.
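The intent-count condition can be sketched as follows. The `classify` callable is a placeholder for whatever intent determination the server performs, and `IntentTracker` is a hypothetical name; the sketch only shows the bookkeeping of grouping voice inputs by intent and flagging an intent determined more than a specified count.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional, Tuple


class IntentTracker:
    """Groups utterances by intent and flags intents seen more than a specified count."""

    def __init__(self, classify: Callable[[str], str], specified_count: int) -> None:
        self.classify = classify  # hypothetical intent classifier
        self.specified_count = specified_count
        self._by_intent: DefaultDict[str, List[str]] = defaultdict(list)

    def observe(self, utterance: str) -> Optional[Tuple[str, List[str]]]:
        """Return (intent, utterances) once the intent exceeds the count, else None."""
        intent = self.classify(utterance)
        self._by_intent[intent].append(utterance)
        utterances = self._by_intent[intent]
        if len(utterances) > self.specified_count:
            # These voice inputs are what model selection would be based on.
            return intent, list(utterances)
        return None
```

The returned utterances correspond to the "at least one voice input corresponding to the specified intent" on which the selection is based.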
According to an embodiment, the instructions may cause the processor to select the at least one NLU model corresponding to an intent for performing a function of a specified application program.
According to an embodiment, the instructions may cause the processor to select the at least one NLU model corresponding to an intent selected by the user.
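The three selection variants above can be compared side by side in a short sketch. All table contents and function names are hypothetical, and reading "a specified number" as "the N most frequently determined intents" is an assumption made for this example rather than something the text states.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical lookup tables; none of these names come from the disclosure.
INTENT_TO_MODEL: Dict[str, str] = {
    "order_coffee": "nlu-coffee-v1",
    "play_music": "nlu-music-v1",
    "set_alarm": "nlu-alarm-v1",
}
INTENT_TO_APP: Dict[str, str] = {
    "order_coffee": "cafe",
    "play_music": "player",
    "set_alarm": "clock",
}


def top_n_intents(counts: Counter, n: int) -> List[str]:
    """Variant 1 (assumed reading): the n most frequently determined intents."""
    return [intent for intent, _ in counts.most_common(n)]


def intents_for_app(app: str) -> List[str]:
    """Variant 2: intents that perform a function of one specified application."""
    return [intent for intent, owner in INTENT_TO_APP.items() if owner == app]


def models_for(intents: Iterable[str]) -> List[str]:
    """Map intents (including user-selected ones, variant 3) to stored models."""
    return [INTENT_TO_MODEL[i] for i in intents if i in INTENT_TO_MODEL]
```

For the user-selected variant, the list of intents chosen by the user is passed directly to `models_for`; unknown intents are skipped.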
As described above, a controlling method of a system for updating an NLU model may include receiving first information associated with a user from an external electronic device associated with a user account, selecting at least one of the plurality of NLU models, based on at least part of the first information, and transmitting the selected at least one NLU model to the external electronic device, using at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
According to an embodiment, the first information may include at least one of a voice input of the user, information of the external electronic device, or preference information of the user.
According to an embodiment, the selecting of the at least one of the plurality of NLU models may include selecting at least one of the plurality of NLU models when the number of times that a specified voice input of the user is received during a specified period is not less than a specified value.
According to an embodiment, the selecting of the at least one of the plurality of NLU models may include selecting at least one of the plurality of NLU models when the information of the external electronic device is changed.
According to an embodiment, the selecting of the at least one of the plurality of NLU models may include selecting at least one of the plurality of NLU models when the preference information of the user is changed.
According to an embodiment, the method may further include generating text data by processing voice data of the user received from the external electronic device using an ASR model.
According to an embodiment, the selecting of the at least one of the plurality of NLU models may include determining an intent corresponding to the voice input and, when a specified intent is determined more than a specified count, selecting the at least one NLU model based on at least one voice input corresponding to the specified intent.
According to an embodiment, the selecting of the at least one NLU model based on the at least one voice input corresponding to the specified intent may include selecting an NLU model corresponding to a specified number of intents.
According to an embodiment, the selecting of the at least one NLU model based on the at least one voice input corresponding to the specified intent may include selecting the at least one NLU model corresponding to an intent for performing a function of a specified application program.
According to an embodiment, the selecting of the at least one NLU model based on the at least one voice input corresponding to the specified intent may include selecting the at least one NLU model corresponding to an intent determined by the user.
The electronic device according to various embodiments may be one of various types of electronic devices. The electronic devices may include, for example, a portable communication device (e.g., a smartphone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance. According to an embodiment of the disclosure, the electronic devices are not limited to those described above.
It should be appreciated that various embodiments of the present disclosure and the terms used therein are not intended to limit the technological features set forth herein to particular embodiments and include various changes, equivalents, or replacements for a corresponding embodiment. With regard to the description of the drawings, similar reference numerals may be used to refer to similar or related elements. It is to be understood that a singular form of a noun corresponding to an item may include one or more of the things, unless the relevant context clearly indicates otherwise. As used herein, each of such phrases as "A or B," "at least one of A and B," "at least one of A or B," "A, B, or C," "at least one of A, B, and C," and "at least one of A, B, or C," may include any one of, or all possible combinations of, the items enumerated together in a corresponding one of the phrases. As used herein, such terms as "1st" and "2nd," or "first" and "second" may be used to simply distinguish a corresponding component from another, and do not limit the components in other aspects (e.g., importance or order). It is to be understood that if an element (e.g., a first element) is referred to, with or without the term "operatively" or "communicatively", as "coupled with," "coupled to," "connected with," or "connected to" another element (e.g., a second element), it means that the element may be coupled with the other element directly (e.g., wiredly), wirelessly, or via a third element.
As used herein, the term "module" may include a unit implemented in hardware, software, or firmware, and may interchangeably be used with other terms, for example, "logic," "logic block," "part," or "circuitry". A module may be a single integral component, or a minimum unit or part thereof, adapted to perform one or more functions. For example, according to an embodiment, the module may be implemented in a form of an application-specific integrated circuit (ASIC).
Various embodiments as set forth herein may be implemented as software (e.g., the program 840) including one or more instructions that are stored in a storage medium (e.g., internal memory 836 or external memory 838) that is readable by a machine (e.g., the electronic device 801). For example, a processor (e.g., the processor 820) of the machine (e.g., the electronic device 801) may invoke at least one of the one or more instructions stored in the storage medium, and execute it, with or without using one or more other components under the control of the processor. This allows the machine to be operated to perform at least one function according to the at least one instruction invoked. The one or more instructions may include a code generated by a compiler or a code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, the term "non-transitory" simply means that the storage medium is a tangible device, and does not include a signal (e.g., an electromagnetic wave), but this term does not differentiate between where data is semi-permanently stored in the storage medium and where the data is temporarily stored in the storage medium.
According to an embodiment, a method according to various embodiments of the disclosure may be included and provided in a computer program product. The computer program product may be traded as a product between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or be distributed (e.g., downloaded or uploaded) online via an application store (e.g., PlayStore™), or between two user devices (e.g., smart phones) directly. If distributed online, at least part of the computer program product may be temporarily generated or at least temporarily stored in the machine-readable storage medium, such as memory of the manufacturer's server, a server of the application store, or a relay server.
According to various embodiments, each component (e.g., a module or a program) of the above-described components may include a single entity or multiple entities. According to various embodiments, one or more of the above-described components may be omitted, or one or more other components may be added. Alternatively or additionally, a plurality of components (e.g., modules or programs) may be integrated into a single component. In such a case, according to various embodiments, the integrated component may still perform one or more functions of each of the plurality of components in the same or similar manner as they are performed by a corresponding one of the plurality of components before the integration. According to various embodiments, operations performed by the module, the program, or another component may be carried out sequentially, in parallel, repeatedly, or heuristically, or one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
Claims (15)
- A server comprising: a communication interface; a processor operatively connected to the communication interface; and a memory operatively connected to the processor and configured to store a plurality of natural language understanding (NLU) models and instructions that, when executed by the processor, cause the processor to: receive first information associated with a user from an external electronic device associated with a user account using the communication interface; identify at least one of the plurality of NLU models based on at least part of the first information; and transmit the selected at least one of the plurality of NLU models to the external electronic device using the communication interface such that the external electronic device uses the selected at least one of the plurality of NLU models for natural language processing.
- The server of claim 1, wherein the first information includes at least one of a voice input of the user, information of the external electronic device, or preference information of the user.
- The server of claim 2, wherein the memory stores instructions that, when executed by the processor, cause the processor to: based on a number of times that a specified voice input of the user is received not being less than a specified value during a specified period, select at least one of the plurality of NLU models.
- The server of claim 2, wherein the memory stores instructions that, when executed by the processor, cause the processor to: based on the information associated with the external electronic device being changed, select at least one of the plurality of NLU models.
- The server of claim 2, wherein the memory stores instructions that, when executed by the processor, cause the processor to: based on the preference information of the user being changed, select at least one of the plurality of NLU models.
- The server of claim 1, wherein the memory stores instructions that, when executed by the processor, cause the processor to: generate text data by processing voice data of the user received from the external electronic device using an automatic speech recognition (ASR) model.
- The server of claim 2, wherein the memory stores instructions that, when executed by the processor, cause the processor to: determine an intent corresponding to the voice input; and based on a specified intent being determined more than a specified count, select the at least one NLU model based on at least one voice input corresponding to the specified intent.
- The server of claim 7, wherein the memory stores instructions that, when executed by the processor, cause the processor to: identify an NLU model corresponding to an intent of a specified number.
- The server of claim 7, wherein the memory stores instructions that, when executed by the processor, cause the processor to: select the at least one NLU model corresponding to an intent for performing a function of a specified application program.
- The server of claim 7, wherein the memory stores instructions that, when executed by the processor, cause the processor to: select the at least one NLU model corresponding to an intent selected by the user.
- A controlling method of a system for updating an NLU model, the method comprising: receiving first information associated with a user from an external electronic device associated with a user account; identifying at least one of a plurality of natural language understanding (NLU) models based on at least part of the first information; and transmitting the selected at least one of the plurality of NLU models to the external electronic device using at least one communication interface such that the external electronic device uses the selected at least one of the plurality of NLU models for natural language processing.
- The method of claim 11, wherein the first information includes at least one of a voice input of the user, information of the external electronic device, or preference information of the user.
- The method of claim 12, wherein the selecting of the at least one of the plurality of NLU models includes: based on a number of times that a specified voice input of the user is received not being less than a specified value during a specified period, selecting at least one of the plurality of NLU models.
- The method of claim 12, wherein the selecting of the at least one of the plurality of NLU models includes: based on the information of the external electronic device being changed, selecting at least one of the plurality of NLU models.
- The method of claim 12, wherein the selecting of the at least one of the plurality of NLU models includes: based on the preference information of the user being changed, selecting at least one of the plurality of NLU models.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201980072576.XA CN112970059B (en) | 2018-11-07 | 2019-08-05 | Electronic device for processing user utterance and control method thereof |
EP19882344.5A EP3850620B1 (en) | 2018-11-07 | 2019-08-05 | Electronic device for processing user utterance and controlling method thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2018-0135771 | 2018-11-07 | ||
KR1020180135771A KR102725793B1 (en) | 2018-11-07 | Electronic apparatus for processing user utterance and controlling method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020096172A1 (en) | 2020-05-14 |
Family
ID=70457804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2019/009716 WO2020096172A1 (en) | 2018-11-07 | 2019-08-05 | Electronic device for processing user utterance and controlling method thereof |
Country Status (4)
Country | Link |
---|---|
US (2) | US10699704B2 (en) |
EP (1) | EP3850620B1 (en) |
CN (1) | CN112970059B (en) |
WO (1) | WO2020096172A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020096172A1 (en) * | 2018-11-07 | 2020-05-14 | Samsung Electronics Co., Ltd. | Electronic device for processing user utterance and controlling method thereof |
US11423910B2 (en) * | 2019-03-19 | 2022-08-23 | Servicenow, Inc. | Systems and method for third party natural language understanding service integration |
CN114365143A (en) * | 2019-09-04 | 2022-04-15 | 布莱恩科技有限责任公司 | Real-time morphing interface for display on a computer screen |
US12086541B2 (en) | 2019-09-04 | 2024-09-10 | Brain Technologies, Inc. | Natural query completion for a real-time morphing interface |
US11935521B2 (en) * | 2019-09-12 | 2024-03-19 | Oracle International Corporation | Real-time feedback for efficient dialog processing |
EP3980992A4 (en) * | 2019-11-01 | 2022-05-04 | Samsung Electronics Co., Ltd. | Hub device, multi-device system including the hub device and plurality of devices, and operating method of the hub device and multi-device system |
US12080289B2 (en) | 2020-12-22 | 2024-09-03 | Samsung Electronics Co., Ltd. | Electronic apparatus, system comprising electronic apparatus and server and controlling method thereof |
US11978449B2 (en) * | 2021-03-02 | 2024-05-07 | Samsung Electronics Co., Ltd. | Electronic device for processing user utterance and operation method therefor |
KR20220123932A (en) * | 2021-03-02 | 2022-09-13 | 삼성전자주식회사 | Electronic device for processing user utterance and method for thereof |
CN114005443A (en) * | 2021-10-28 | 2022-02-01 | 北京欧珀通信有限公司 | Equipment control method, device, server and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002091477A (en) * | 2000-09-14 | 2002-03-27 | Mitsubishi Electric Corp | Voice recognition system, voice recognition device, acoustic model control server, language model control server, voice recognition method and computer readable recording medium which records voice recognition program |
WO2005010868A1 (en) * | 2003-07-29 | 2005-02-03 | Mitsubishi Denki Kabushiki Kaisha | Voice recognition system and its terminal and server |
US8949266B2 (en) * | 2007-03-07 | 2015-02-03 | Vlingo Corporation | Multiple web-based content category searching in mobile search application |
KR101694011B1 (en) * | 2015-06-17 | 2017-01-06 | 포항공과대학교 산학협력단 | Method and apparatus for voice dialogue on multiple application softwares |
US20180068663A1 (en) * | 2016-09-07 | 2018-03-08 | Samsung Electronics Co., Ltd. | Server and method for controlling external device |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9076448B2 (en) | 1999-11-12 | 2015-07-07 | Nuance Communications, Inc. | Distributed real time speech recognition system |
AU2001250050A1 (en) | 2000-03-24 | 2001-10-08 | Eliza Corporation | Remote server object architecture for speech recognition |
ATE439665T1 (en) * | 2005-11-25 | 2009-08-15 | Swisscom Ag | METHOD FOR PERSONALIZING A SERVICE |
WO2008004663A1 (en) | 2006-07-07 | 2008-01-10 | Nec Corporation | Language model updating device, language model updating method, and language model updating program |
WO2013150526A1 (en) | 2012-04-02 | 2013-10-10 | Dixilang Ltd. | A client-server architecture for automatic speech recognition applications |
US9070366B1 (en) * | 2012-12-19 | 2015-06-30 | Amazon Technologies, Inc. | Architecture for multi-domain utterance processing |
US9818407B1 (en) | 2013-02-07 | 2017-11-14 | Amazon Technologies, Inc. | Distributed endpointing for speech recognition |
US9361289B1 (en) | 2013-08-30 | 2016-06-07 | Amazon Technologies, Inc. | Retrieval and management of spoken language understanding personalization data |
US9530416B2 (en) * | 2013-10-28 | 2016-12-27 | At&T Intellectual Property I, L.P. | System and method for managing models for embedded speech and language processing |
US10978052B2 (en) | 2014-04-16 | 2021-04-13 | Facebook, Inc. | Email-like user interface for training natural language systems |
US10726831B2 (en) * | 2014-05-20 | 2020-07-28 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
KR102225404B1 (en) * | 2014-05-23 | 2021-03-09 | 삼성전자주식회사 | Method and Apparatus of Speech Recognition Using Device Information |
US20150371628A1 (en) * | 2014-06-23 | 2015-12-24 | Harman International Industries, Inc. | User-adapted speech recognition |
US9508339B2 (en) * | 2015-01-30 | 2016-11-29 | Microsoft Technology Licensing, Llc | Updating language understanding classifier models for a digital personal assistant based on crowd-sourcing |
US10152968B1 (en) * | 2015-06-26 | 2018-12-11 | Iconics, Inc. | Systems and methods for speech-based monitoring and/or control of automation devices |
KR102386863B1 (en) * | 2015-09-09 | 2022-04-13 | 삼성전자주식회사 | User-based language model generating apparatus, method and voice recognition apparatus |
US10360910B2 (en) * | 2016-08-29 | 2019-07-23 | Garmin Switzerland Gmbh | Automatic speech recognition (ASR) utilizing GPS and sensor data |
KR102691541B1 (en) * | 2016-12-19 | 2024-08-02 | 삼성전자주식회사 | Method and Apparatus for Voice Recognition |
CN108665890B (en) * | 2017-03-28 | 2023-10-13 | 三星电子株式会社 | Method for operating voice recognition service, electronic device and system supporting the same |
US11170768B2 (en) * | 2017-04-17 | 2021-11-09 | Samsung Electronics Co., Ltd | Device for performing task corresponding to user utterance |
KR102414122B1 (en) * | 2017-04-19 | 2022-06-29 | 삼성전자주식회사 | Electronic device for processing user utterance and method for operation thereof |
US10504513B1 (en) * | 2017-09-26 | 2019-12-10 | Amazon Technologies, Inc. | Natural language understanding with affiliated devices |
US10685669B1 (en) * | 2018-03-20 | 2020-06-16 | Amazon Technologies, Inc. | Device selection from audio data |
US10755706B2 (en) * | 2018-03-26 | 2020-08-25 | Midea Group Co., Ltd. | Voice-based user interface with dynamically switchable endpoints |
US10714085B2 (en) * | 2018-06-06 | 2020-07-14 | Amazon Technologies, Inc. | Temporary account association with voice-enabled devices |
WO2020096172A1 (en) * | 2018-11-07 | 2020-05-14 | Samsung Electronics Co., Ltd. | Electronic device for processing user utterance and controlling method thereof |
2019
- 2019-08-05 WO PCT/KR2019/009716 patent/WO2020096172A1/en unknown
- 2019-08-05 EP EP19882344.5A patent/EP3850620B1/en active Active
- 2019-08-05 CN CN201980072576.XA patent/CN112970059B/en active Active
- 2019-08-08 US US16/536,226 patent/US10699704B2/en active Active
2020
- 2020-06-29 US US16/946,604 patent/US11538470B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3850620A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3850620A4 (en) | 2021-12-08 |
KR20200052612A (en) | 2020-05-15 |
EP3850620B1 (en) | 2024-07-10 |
US10699704B2 (en) | 2020-06-30 |
EP3850620C0 (en) | 2024-07-10 |
US11538470B2 (en) | 2022-12-27 |
US20200335094A1 (en) | 2020-10-22 |
CN112970059B (en) | 2023-11-24 |
CN112970059A (en) | 2021-06-15 |
US20200143798A1 (en) | 2020-05-07 |
EP3850620A1 (en) | 2021-07-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19882344; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2019882344; Country of ref document: EP; Effective date: 20210413 |
| NENP | Non-entry into the national phase | Ref country code: DE |