WO2021158040A1 - Electronic device providing an utterance corresponding to the context of a conversation, and operating method therefor - Google Patents

Electronic device providing an utterance corresponding to the context of a conversation, and operating method therefor

Info

Publication number
WO2021158040A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
module
electronic device
token
transformation module
Prior art date
Application number
PCT/KR2021/001490
Other languages
English (en)
Korean (ko)
Inventor
김도현
김태훈
최승혁
Original Assignee
삼성전자 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자 주식회사
Publication of WO2021158040A1

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Definitions

  • One embodiment relates to an electronic device that provides an utterance corresponding to the context of a conversation, and an operating method thereof.
  • the electronic device may provide a voice recognition service.
  • The voice recognition service may be a service that receives a user's utterance and provides a response corresponding to the received utterance as voice (or text), or that provides a function corresponding to the utterance (e.g., alarm reservation, place recommendation).
  • Examples of the voice recognition service include a voice recognition assistant service and a chatbot service.
  • In order to provide a smooth voice recognition service, the electronic device must accurately recognize the user's utterance and provide the user with a response corresponding to that utterance.
  • An electronic device according to an embodiment includes a memory storing instructions and a processor including a first transformation module and a second transformation module. When the processor executes the instructions, the electronic device obtains at least one intermediate output for an input based on the first transformation module, obtains a final output for the at least one intermediate output based on the second transformation module, and performs a function of the electronic device based on the final output. The at least one intermediate output may include a first output based on a vector represented by a first token at a predetermined position among the outputs of a layer with a predetermined order number among at least one layer of the first transformation module, a second output based on the vectors represented by each of at least one token positioned in the same row as the first token, a third output based on the vectors represented by each of at least one token positioned in the same column as the first token, or a combination thereof.
  • A method of operating an electronic device according to an embodiment includes: acquiring at least one intermediate output for an input based on a first transformation module of the electronic device; obtaining a final output for the at least one intermediate output based on a second transformation module of the electronic device; and performing a function of the electronic device based on the final output. The at least one intermediate output may include a first output based on a vector indicated by a first token at a predetermined position among the outputs of a layer with a predetermined order number among at least one layer of the first transformation module, a second output based on the vectors indicated by each of at least one token located in the same row as the first token, a third output based on the vectors indicated by each of at least one token located in the same column as the first token, or a combination thereof.
  • An electronic device and an operating method thereof according to an embodiment obtain various outputs from one transformation module and obtain a final output from another transformation module based on the obtained outputs, and can thereby provide the user with a response related to the overall context of a conversation.
  • An electronic device and an operating method thereof according to an embodiment can provide the user with a response related to the overall context of a conversation by using, in another transformation module, information about features of the conversation that are not used as input in one transformation module.
  • FIG. 1 is a block diagram of an electronic device in a network environment according to an embodiment.
  • FIG. 2 is a block diagram of a processor according to an embodiment.
  • FIG. 3 illustrates an example of an input input to the first transformation module according to an embodiment.
  • FIG. 4 shows an example of processing of an input in the first transformation module according to an embodiment.
  • FIG. 5A is a block diagram of a second transformation module according to an embodiment.
  • FIG. 5B is a block diagram of a second transformation module according to an embodiment.
  • FIG. 5C is a block diagram of a second transformation module according to an embodiment.
  • FIG. 5D is a block diagram of a second transformation module according to an embodiment.
  • FIG. 6 is a flowchart illustrating an operation of performing a function corresponding to an input in an electronic device according to an exemplary embodiment.
  • FIG. 7 is a flowchart illustrating an operation of learning transformation modules in an electronic device according to an embodiment.
  • FIG. 1 is a block diagram of an electronic device 101 in a network environment 100, according to an embodiment.
  • Referring to FIG. 1, the electronic device 101 may communicate with the electronic device 102 through a first network 198 (e.g., a short-range wireless communication network), or may communicate with the electronic device 104 or the server 108 through a second network 199 (e.g., a long-range wireless communication network). According to an embodiment, the electronic device 101 may communicate with the electronic device 104 through the server 108.
  • The electronic device 101 may include a processor 120, a memory 130, an input device 150, a sound output device 155, a display device 160, an audio module 170, a sensor module 176, an interface 177, a haptic module 179, a camera module 180, a power management module 188, a battery 189, a communication module 190, a subscriber identification module 196, or an antenna module 197. In some embodiments, at least one of these components (e.g., the display device 160 or the camera module 180) may be omitted, or one or more other components may be added to the electronic device 101. In some embodiments, some of these components may be implemented as a single integrated circuit. For example, the sensor module 176 (e.g., a fingerprint sensor, an iris sensor, or an illuminance sensor) may be implemented embedded in the display device 160 (e.g., a display).
  • The processor 120 may, for example, execute software (e.g., the program 140) to control at least one other component (e.g., a hardware or software component) of the electronic device 101 connected to the processor 120, and may perform various data processing or operations. According to an embodiment, as at least part of the data processing or operations, the processor 120 may load commands or data received from another component (e.g., the sensor module 176 or the communication module 190) into the volatile memory 132, process the commands or data stored in the volatile memory 132, and store the resulting data in the non-volatile memory 134.
  • The processor 120 may include a main processor 121 (e.g., a central processing unit or an application processor) and an auxiliary processor 123 (e.g., a graphics processing unit, an image signal processor, a sensor hub processor, or a communication processor) that can operate independently of or together with the main processor 121. Additionally or alternatively, the auxiliary processor 123 may be configured to use less power than the main processor 121 or to be specialized for a designated function. The auxiliary processor 123 may be implemented separately from, or as a part of, the main processor 121.
  • The auxiliary processor 123 may control at least some of the functions or states related to at least one of the components of the electronic device 101 (e.g., the display device 160, the sensor module 176, or the communication module 190), on behalf of the main processor 121 while the main processor 121 is in an inactive (e.g., sleep) state, or together with the main processor 121 while the main processor 121 is in an active (e.g., application-executing) state.
  • According to an embodiment, the auxiliary processor 123 (e.g., an image signal processor or a communication processor) may be implemented as a part of another component (e.g., the camera module 180 or the communication module 190) that is functionally related to it.
  • the memory 130 may store various data used by at least one component (eg, the processor 120 or the sensor module 176 ) of the electronic device 101 .
  • the data may include, for example, input data or output data for software (eg, the program 140 ) and instructions related thereto.
  • the memory 130 may include a volatile memory 132 or a non-volatile memory 134 .
  • the program 140 may be stored as software in the memory 130 , and may include, for example, an operating system 142 , middleware 144 , or an application 146 .
  • the input device 150 may receive a command or data to be used by a component (eg, the processor 120 ) of the electronic device 101 from the outside (eg, a user) of the electronic device 101 .
  • the input device 150 may include, for example, a microphone, a mouse, a keyboard, or a digital pen (eg, a stylus pen).
  • the sound output device 155 may output a sound signal to the outside of the electronic device 101 .
  • the sound output device 155 may include, for example, a speaker or a receiver.
  • the speaker can be used for general purposes such as multimedia playback or recording playback, and the receiver can be used to receive incoming calls. According to an embodiment, the receiver may be implemented separately from or as a part of the speaker.
  • The display device 160 may visually provide information to the outside (e.g., a user) of the electronic device 101.
  • The display device 160 may include, for example, a display, a hologram device, or a projector, and a control circuit for controlling the corresponding device.
  • According to an embodiment, the display device 160 may include touch circuitry configured to sense a touch, or a sensor circuit (e.g., a pressure sensor) configured to measure the intensity of force generated by a touch.
  • The audio module 170 may convert a sound into an electrical signal or, conversely, convert an electrical signal into a sound. According to an embodiment, the audio module 170 may acquire a sound through the input device 150, or may output a sound through the sound output device 155 or through an external electronic device 102 (e.g., a speaker or headphones) connected directly or wirelessly with the electronic device 101.
  • The sensor module 176 may detect an operating state (e.g., power or temperature) of the electronic device 101 or an external environmental state (e.g., a user's state), and may generate an electrical signal or data value corresponding to the detected state.
  • According to an embodiment, the sensor module 176 may include, for example, a gesture sensor, a gyro sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a grip sensor, a proximity sensor, a color sensor, an infrared (IR) sensor, a biometric sensor, a temperature sensor, a humidity sensor, or an illuminance sensor.
  • the interface 177 may support one or more specified protocols that may be used by the electronic device 101 to directly or wirelessly connect with an external electronic device (eg, the electronic device 102 ).
  • the interface 177 may include, for example, a high definition multimedia interface (HDMI), a universal serial bus (USB) interface, an SD card interface, or an audio interface.
  • the connection terminal 178 may include a connector through which the electronic device 101 can be physically connected to an external electronic device (eg, the electronic device 102 ).
  • the connection terminal 178 may include, for example, an HDMI connector, a USB connector, an SD card connector, or an audio connector (eg, a headphone connector).
  • the haptic module 179 may convert an electrical signal into a mechanical stimulus (eg, vibration or movement) or an electrical stimulus that the user can perceive through tactile or kinesthetic sense.
  • the haptic module 179 may include, for example, a motor, a piezoelectric element, or an electrical stimulation device.
  • the camera module 180 may capture still images and moving images. According to an embodiment, the camera module 180 may include one or more lenses, image sensors, image signal processors, or flashes.
  • the power management module 188 may manage power supplied to the electronic device 101 .
  • the power management module 188 may be implemented as, for example, at least a part of a power management integrated circuit (PMIC).
  • the battery 189 may supply power to at least one component of the electronic device 101 .
  • the battery 189 may include, for example, a non-rechargeable primary cell, a rechargeable secondary cell, or a fuel cell.
  • The communication module 190 may support establishment of a direct (e.g., wired) communication channel or a wireless communication channel between the electronic device 101 and an external electronic device (e.g., the electronic device 102, the electronic device 104, or the server 108), and communication through the established channel.
  • the communication module 190 may include one or more communication processors that operate independently of the processor 120 (eg, an application processor) and support direct (eg, wired) communication or wireless communication.
  • The communication module 190 may include a wireless communication module 192 (e.g., a cellular communication module, a short-range wireless communication module, or a global navigation satellite system (GNSS) communication module) or a wired communication module 194 (e.g., a local area network (LAN) communication module or a power line communication module).
  • A corresponding one of these communication modules may communicate with an external electronic device via the first network 198 (e.g., a short-range communication network such as Bluetooth, WiFi Direct, or infrared data association (IrDA)) or the second network 199 (e.g., a long-range communication network such as a cellular network, the Internet, or a computer network (e.g., a LAN or WAN)).
  • These various types of communication modules may be integrated into one component (eg, a single chip) or may be implemented as a plurality of components (eg, multiple chips) separate from each other.
  • The wireless communication module 192 may identify and authenticate the electronic device 101 within a communication network, such as the first network 198 or the second network 199, using subscriber information (e.g., an International Mobile Subscriber Identity (IMSI)) stored in the subscriber identification module 196.
  • the antenna module 197 may transmit or receive a signal or power to the outside (eg, an external electronic device).
  • the antenna module may include one antenna including a conductor formed on a substrate (eg, a PCB) or a radiator formed of a conductive pattern.
  • According to an embodiment, the antenna module 197 may include a plurality of antennas. In this case, at least one antenna suitable for a communication method used in a communication network, such as the first network 198 or the second network 199, may be selected from the plurality of antennas by, for example, the communication module 190. A signal or power may be transmitted or received between the communication module 190 and an external electronic device through the selected at least one antenna.
  • According to an embodiment, components (e.g., an RFIC) other than the radiator may be additionally formed as a part of the antenna module 197.
  • At least some of the above components may be connected to each other through an inter-peripheral communication method (e.g., a bus, general purpose input and output (GPIO), serial peripheral interface (SPI), or mobile industry processor interface (MIPI)) and may exchange signals (e.g., commands or data) with each other.
  • the command or data may be transmitted or received between the electronic device 101 and the external electronic device 104 through the server 108 connected to the second network 199 .
  • Each of the electronic devices 102 and 104 may be the same or a different type of the electronic device 101 .
  • all or part of the operations performed by the electronic device 101 may be executed by one or more of the external electronic devices 102 , 104 , or 108 .
  • For example, when the electronic device 101 needs to perform a function or service, the electronic device 101 may, instead of or in addition to executing the function or service itself, request one or more external electronic devices to perform at least a part of the function or service.
  • the one or more external electronic devices that have received the request may execute at least a part of the requested function or service, or an additional function or service related to the request, and transmit a result of the execution to the electronic device 101 .
  • The electronic device 101 may process the result as-is or process it additionally, and may provide it as at least a part of a response to the request.
  • To this end, cloud computing, distributed computing, or client-server computing technology may be used, for example.
  • An electronic device may be a device of various types.
  • the electronic device may include, for example, a portable communication device (eg, a smart phone), a computer device, a portable multimedia device, a portable medical device, a camera, a wearable device, or a home appliance device.
  • Terms such as "first" and "second" may simply be used to distinguish a component from other components, and do not limit the components in other aspects (e.g., importance or order). When one (e.g., first) component is referred to as being "coupled" or "connected" to another (e.g., second) component, with or without the terms "functionally" or "communicatively", it means that the one component can be connected to the other component directly (e.g., by wire), wirelessly, or through a third component.
  • The term "module" may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit.
  • A module may be an integrally formed part, or a minimum unit that performs one or more functions, or a portion thereof.
  • the module may be implemented in the form of an application-specific integrated circuit (ASIC).
  • An embodiment of this document may be implemented as software (e.g., the program 140) including one or more instructions stored in a storage medium readable by a machine (e.g., the electronic device 101). For example, the processor (e.g., the processor 120) of the device (e.g., the electronic device 101) may call at least one of the one or more stored instructions from the storage medium and execute it, which enables the device to be operated to perform at least one function according to the called instruction.
  • the one or more instructions may include code generated by a compiler or code executable by an interpreter.
  • the device-readable storage medium may be provided in the form of a non-transitory storage medium.
  • Here, 'non-transitory' only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); this term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where it is stored temporarily.
  • the method according to an embodiment disclosed in this document may be provided by being included in a computer program product.
  • Computer program products may be traded between sellers and buyers as commodities.
  • The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online, either via an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones).
  • In the case of online distribution, at least a part of the computer program product may be temporarily stored, or temporarily created, in a machine-readable storage medium such as a memory of a manufacturer's server, an application store's server, or a relay server.
  • each component eg, a module or a program of the above-described components may include a singular or a plurality of entities.
  • one or more components or operations among the above-described corresponding components may be omitted, or one or more other components or operations may be added.
  • According to an embodiment, a plurality of components (e.g., modules or programs) may be integrated into a single component. In this case, the integrated component may perform one or more functions of each of the plurality of components identically or similarly to how the corresponding component among the plurality of components performed them prior to the integration.
  • According to an embodiment, operations performed by a module, program, or other component may be executed sequentially, in parallel, repeatedly, or heuristically; one or more of the operations may be executed in a different order or omitted, or one or more other operations may be added.
  • FIG. 2 is a block diagram of a processor 200 according to an embodiment.
  • FIG. 3 illustrates an example of an input 300 input to the first transformation module 220 according to an embodiment.
  • FIG. 4 shows an example of processing of the input 300 in the first transformation module 220 according to an embodiment.
  • FIGS. 5A to 5D are block diagrams of the second transformation module 240 according to an embodiment.
  • the processor 200 may correspond to the processor 120 of FIG. 1 .
  • The processor 200 may include a data augmentation module 205, a pre-processing module 210, a first transformation module 220, a feature extraction module 230, a second transformation module 240, or a combination thereof.
  • The first transformation module 220 may be a transformation module based on Embeddings from Language Models (ELMo), Generative Pre-Training (GPT), XLNet, Bidirectional Encoder Representations from Transformers (BERT), or a combination thereof.
  • The data augmentation module 205, the pre-processing module 210, the first transformation module 220, the feature extraction module 230, the second transformation module 240, or a combination thereof may be implemented as a program executable by the processor 200.
  • Data for the data augmentation module 205, the pre-processing module 210, the first transformation module 220, the feature extraction module 230, the second transformation module 240, or a combination thereof may be stored in a memory (e.g., the memory 130 of FIG. 1).
  • The processor 200 may train the first transformation module 220 and the second transformation module 240 based on corpus data. In an embodiment, the processor 200 may train the second transformation module 240 after training the first transformation module 220 based on the corpus data. In an embodiment, when training the second transformation module 240, the processor 200 may refrain from further training the already-trained first transformation module 220 based on the error derived through the second transformation module 240. In an embodiment, training may mean changing weights of at least one of the transformation modules 220 and 240. In an embodiment, the corpus data may be data obtained in advance. In an embodiment, the corpus data may include unlabeled text data (e.g., Wikipedia, news), text data labeled for a specific task (e.g., utterance prediction) (e.g., conversation logs), or a combination thereof.
  • In an embodiment, the processor 200 may pre-train the first transformation module 220 based on the unlabeled text data, and may then fine-tune the pre-trained first transformation module 220 based on the text data labeled for the specific task.
  • Pre-training and fine-tuning may be examples of training.
  • When the training of the first transformation module 220 is completed (that is, when pre-training and fine-tuning are completed), the processor 200 may train the second transformation module 240 based on the text data labeled for the specific task.
  • The processor 200 may train the second transformation module 240 using the output of the first transformation module 220.
  • Because the training of the second transformation module 240 uses the already-trained first transformation module 220, it may also be referred to as transfer learning; a code sketch of this two-stage scheme follows below.
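  • The following is a minimal sketch of the two-stage training just described, assuming PyTorch; the two nn.Linear layers are hypothetical stand-ins for the first and second transformation modules, which the document does not specify at this level of detail.

        import torch
        from torch import nn

        first = nn.Linear(768, 768)   # stand-in for the trained first transformation module
        second = nn.Linear(768, 2)    # stand-in for the second module (e.g., correct vs. incorrect response)

        # Freeze the first module: the error derived through the second module
        # must not further train the first module, as described above.
        for p in first.parameters():
            p.requires_grad = False

        opt = torch.optim.Adam(second.parameters(), lr=1e-4)
        x = torch.randn(8, 768)        # dummy batch of inputs
        y = torch.randint(0, 2, (8,))  # dummy labels for the specific task

        with torch.no_grad():
            h = first(x)               # intermediate output; no gradient flows into `first`
        loss = nn.functional.cross_entropy(second(h), y)
        loss.backward()                # gradients flow only into `second`
        opt.step()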
  • When the first transformation module 220 and the second transformation module 240 have been trained, the processor 200 may obtain a final output (e.g., a response corresponding to an input) for an input (e.g., voice, text, or a combination thereof) using the trained first transformation module 220 and the trained second transformation module 240.
  • the processor 200 may perform a function of the electronic device 101 based on the obtained final output.
  • the function of the electronic device 101 may be to provide an output (eg, a response corresponding to an input) to the user.
  • The processor 200 may provide the output (e.g., a response corresponding to the input) to the user using an audio output device (the sound output device 155 of FIG. 1), a display device (the display device 160 of FIG. 1), or a combination thereof.
  • For example, the processor 200 may display a visual object representing "How about watching a movie?" on the display device 160, and/or may output a voice signal representing "How about watching a movie?" through the sound output device 155.
  • the data augmentation module 205 may augment data to be input to the first transformation module 220 based on the corpus data during learning.
  • Table 1 below may represent an example of corpus data.
  • Table 1 may represent corpus data representing a conversation between a speaker A and a speaker B.
  • The data augmentation module 205 may select a predetermined first number of consecutive conversations and combine the selected conversations with a correct-answer or incorrect-answer conversation, thereby creating data to be input to the pre-processing module 210.
  • The first predetermined number may be an integer of 1 or more. In an embodiment, the first predetermined number may be 3, 5, 7, and/or 8.
  • For example, when the first predetermined number is 3, the data augmentation module 205 may select the 1st to 3rd conversations, the 2nd to 4th conversations, the 3rd to 5th conversations, the 4th to 6th conversations, or the 5th to 7th conversations.
  • The data augmentation module 205 may generate data by combining the selected conversations with one correct-answer conversation and a predetermined second number of incorrect-answer conversations.
  • For example, the data augmentation module 205 may generate data to be input to the pre-processing module 210 by combining the correct-answer conversation (i.e., the 4th conversation) with the 1st to 3rd conversations.
  • The data augmentation module 205 may also generate data to be input to the first transformation module 220 by combining an incorrect-answer conversation (e.g., the 6th conversation, or a newly generated random sentence) with the 1st to 3rd conversations.
  • The second number may be a natural number of 1 or more. In an embodiment, the second number may be 4. In an embodiment, when the second number is 4, the ratio of data combining correct-answer conversations to data combining incorrect-answer conversations may be 1:4.
  • the selected conversations may be referred to as a first sentence.
  • a dialogue (eg, a correct answer or incorrect answer) coupled to the selected conversations may be referred to as a second sentence.
  • the data augmentation module 205 may provide the generated data to the preprocessing module 210 .
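  • As an illustration, the windowing and 1:4 correct-to-incorrect pairing described above can be sketched as follows; the function and variable names are hypothetical, and random sampling of incorrect answers is one plausible choice.

        import random

        def augment(turns, first_number=3, second_number=4):
            """Pair each window of `first_number` consecutive turns with its true
            next turn (label 1) and with `second_number` wrong turns (label 0)."""
            samples = []
            for i in range(len(turns) - first_number):
                window = turns[i:i + first_number]      # e.g., the 1st to 3rd conversations
                correct = turns[i + first_number]       # e.g., the 4th conversation
                samples.append((window, correct, 1))
                wrong_pool = [t for t in turns if t != correct]
                for wrong in random.sample(wrong_pool, second_number):
                    samples.append((window, wrong, 0))  # incorrect-answer pairing
            return samples

        turns = ["hi everyone", "how can I share an account?", "use a family account",
                 "thanks, that worked", "same. without switching user.",
                 "glad it helped", "see you"]
        print(len(augment(turns)))  # 4 windows x (1 correct + 4 incorrect) = 20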
  • the pre-processing module 210 may process input data (eg, data provided from the data augmentation module 205 and/or input).
  • the input may be data obtained by combining a predetermined first number (eg, 3) of conversations between the user and the electronic device 101 and candidate conversations.
  • the candidate conversation may be a conversation selected from data including pre-obtained candidate conversations.
  • the candidate conversation may be a conversation that is likely to be used as a response to the user's last conversation.
  • The user's utterance in a conversation between the user and the electronic device 101 may be acquired through an input device (e.g., the input device 150 of FIG. 1), an audio module (e.g., the audio module 170 of FIG. 1), or a combination thereof.
  • the utterance of the electronic device 101 in a conversation between the user and the electronic device 101 may be a response to a previous utterance of the user.
  • the utterance of the electronic device 101 may be a response finally selected based on a final output among a plurality of candidate conversations for a conversation between the user and the electronic device 101 .
  • Table 2 below may represent a conversation between a user of the electronic device 101 and the electronic device 101 .
  • When a conversation is made between the user and the electronic device 101, the input may be data obtained by combining a predetermined first number of conversations (e.g., the first to third conversations) with candidate conversations.
  • the candidate conversation may include "How about watching a movie?", "Make a bucket list of things you want to do this time.”, or a combination thereof.
  • the first number of conversations may be referred to as a first sentence.
  • a candidate conversation that is combined with the first number of conversations may be referred to as a second sentence.
  • The pre-processing module 210 may process the input data based on typos in the input data. In an embodiment, when a typo is identified in at least one conversation of the input data, the pre-processing module 210 may correct the identified typo. For example, if the data provided from the data augmentation module 205 includes the fifth conversation of Table 1 (i.e., "same. without swtiching user."), the pre-processing module 210 may correct the typo "swtiching" to "switching".
  • The pre-processing module 210 may process the input data based on the number of semantic units included in the input data. In an embodiment, when the number of semantic units included in the input data exceeds a predetermined third number (e.g., 256), the pre-processing module 210 may remove as many semantic units from the input data as the excess. In an embodiment, the pre-processing module 210 may process the input data by limiting the number of semantic units included in the input data to the predetermined third number (e.g., 256). In an embodiment, the predetermined third number (e.g., 256) may correspond to the number of tokens that can be processed by the first transformation module 220.
  • the predetermined third number (eg, 256) may increase or decrease according to the number of tokens that can be processed by the first conversion module 220 .
  • The semantic unit may be an entity that serves to differentiate meaning in a sentence, for example a word, a predefined string (e.g., a period (.), comma (,), question mark (?), semicolon (;), or exclamation point (!)), or a suffix (e.g., -ing or -ed).
  • The pre-processing module 210 may remove some of the conversations included in the input data. In an embodiment, the conversations removed from the input data may be those that came earliest in order; a sketch of this truncation follows below.
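  • The following is a minimal sketch of this length-limiting step, assuming Python; the whitespace-and-punctuation tokenizer is only a rough stand-in for the document's semantic units, and dropping whole earliest turns is one way to realize the removal described above.

        import re

        MAX_UNITS = 256  # the predetermined third number (token capacity of the first module)

        def truncate(turns, max_units=MAX_UNITS):
            tokenize = lambda s: re.findall(r"\w+|[.,?;!]", s)  # crude semantic units
            kept, total = [], 0
            for turn in reversed(turns):          # walk from the newest turn backward
                units = len(tokenize(turn))
                if total + units > max_units:
                    break                         # drop this turn and all earlier ones
                kept.append(turn)
                total += units
            return list(reversed(kept))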
  • the preprocessing module 210 may provide the processed data to the first transformation module 220 , the feature extraction module 230 , or a combination thereof.
  • the first conversion module 220 may generate an output based on the data provided from the pre-processing module 210 .
  • the output of the first conversion module 220 may also be referred to as an intermediate output, and/or an intermediate input.
  • the first transformation module 220 may generate the first intermediate input 300 based on data provided from the preprocessing module 210 . In an embodiment, the first conversion module 220 may generate an output corresponding to the first intermediate input 300 .
  • the first intermediate input 300 may be configured with a third predetermined number (eg, 256) of tokens 301 to 317 .
  • each of the tokens 301 to 317 may be a vector having a predetermined fourth number (eg, 768) dimension.
  • the predetermined fourth number may be increased or decreased according to the degree to which the semantic unit is subdivided.
  • tokens 301 - 317 may be the sum of token embeddings 330 , segment embeddings 350 , and position embeddings 370 .
  • the tokens 301 to 309 may constitute a first sentence, and the tokens 311 to 317 may constitute a second sentence.
  • Each of the token embeddings (TE_CLS, TE_1, TE_2, ..., TE_{n-2}, TE_SEP, TE_n, ..., TE_{m-3}, TE_{m-2}, and TE_SEP) constituting the token embeddings 330 may indicate a vector allocated to each of the semantic units included in the data provided from the pre-processing module 210.
  • m may correspond to a third predetermined number.
  • n may be a natural number between 1 and m.
  • The token embedding TE_CLS may be a vector used to indicate the degree of relevance between the first sentence and the second sentence included in the data provided from the pre-processing module 210.
  • The token embedding TE_SEP may be a vector used to indicate the completion of a sentence (e.g., the first sentence or the second sentence).
  • The token embeddings TE_1, TE_2, TE_{n-2}, TE_n, TE_{m-3}, and TE_{m-2} may each be a vector assigned to a semantic unit included in the data provided from the pre-processing module 210.
  • For example, the token embedding TE_1 may be the vector assigned to "hi", TE_2 the vector assigned to "everyone", TE_{n-2} the vector assigned to "account", TE_{m-3} the vector assigned to "same", and TE_{m-2} the vector assigned to "computer".
  • Each of the segment embeddings SE_A and SE_B constituting the segment embeddings 350 may be a vector representing, respectively, the first sentence or the second sentence included in the data provided from the pre-processing module 210.
  • Each of the position embeddings PE_0 to PE_{m-1} constituting the position embeddings 370 may have a vector allocated according to the respective order of the tokens 301 to 317.
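  • A minimal sketch of how the three embeddings sum into the input tokens, assuming PyTorch; the vocabulary size and the 160/96 sentence split are illustrative assumptions.

        import torch
        from torch import nn

        VOCAB, M, DIM = 30000, 256, 768          # assumed vocab; third number; fourth number

        token_emb = nn.Embedding(VOCAB, DIM)     # TE_*: one vector per semantic unit
        segment_emb = nn.Embedding(2, DIM)       # SE_A / SE_B: first vs. second sentence
        position_emb = nn.Embedding(M, DIM)      # PE_0 .. PE_{m-1}: one vector per position

        token_ids = torch.randint(0, VOCAB, (1, M))            # dummy ids incl. CLS/SEP
        segment_ids = torch.cat([torch.zeros(1, 160, dtype=torch.long),
                                 torch.ones(1, 96, dtype=torch.long)], dim=1)
        positions = torch.arange(M).unsqueeze(0)

        # Each input token 301..317 is the elementwise sum of the three embeddings.
        tokens = token_emb(token_ids) + segment_emb(segment_ids) + position_emb(positions)
        print(tokens.shape)  # torch.Size([1, 256, 768])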
  • In an embodiment, the first transformation module 220 may perform self-attention processing a predetermined number of times; the predetermined number of times may be one or more (e.g., 12).
  • In an embodiment, the self-attention processing may include a process of obtaining a query, a key, and a value for each of the tokens (e.g., the tokens 301 to 317), a process of obtaining a self-attention vector for each of the tokens using the queries, keys, and values of the tokens, and a process of obtaining an output based on the self-attention vectors of the tokens.
  • the predetermined number of times may correspond to the number of layers of the first transformation module 220 .
  • The process of obtaining the query, key, and value of each of the tokens may include obtaining a predetermined fifth number of queries using the fifth number of query weight matrices, obtaining the fifth number of keys using the fifth number of key weight matrices, and obtaining the fifth number of values using the fifth number of value weight matrices.
  • the fifth number may be an integer of 1 or more. In an embodiment, the fifth number may be 12. In one embodiment, the fifth number may also be referred to as the number of attention heads.
  • Each of the weight matrices may have weights adjusted by the previous training of the first transformation module 220.
  • Each of the weight matrices (i.e., the query weight matrices, key weight matrices, and value weight matrices) may be a fourth number (e.g., 768) × k matrix.
  • Each of the queries, keys, and values may be a k-dimensional vector.
  • k may be a natural number of 1 or more.
  • The first transformation module 220 may obtain a query for each of the tokens (e.g., the tokens 301 to 317) using Equation 1 below:
    Q(l, j) = X_l · W_Qj (Equation 1)
  • X_l may represent the vector of the l-th token among the predetermined third number (e.g., 256) of tokens.
  • W_Qj may represent the j-th query weight matrix among the fifth number (e.g., 12) of query weight matrices.
  • Q(l, j) may represent the query obtained by multiplying the vector of the l-th token by the j-th query weight matrix.
  • l is an integer greater than or equal to 1 and less than or equal to the predetermined third number (e.g., 256).
  • j is an integer greater than or equal to 1 and less than or equal to the fifth number (e.g., 12).
  • The first transformation module 220 may obtain a key for each of the tokens (e.g., the tokens 301 to 317) using Equation 2 below:
    K(l, j) = X_l · W_Kj (Equation 2)
  • W_Kj may represent the j-th key weight matrix among the fifth number (e.g., 12) of key weight matrices.
  • K(l, j) may represent the key obtained by multiplying the vector of the l-th token by the j-th key weight matrix.
  • The first transformation module 220 may obtain a value for each of the tokens (e.g., the tokens 301 to 317) using Equation 3 below:
    V(l, j) = X_l · W_Vj (Equation 3)
  • W_Vj may represent the j-th value weight matrix among the fifth number (e.g., 12) of value weight matrices.
  • V(l, j) may represent the value obtained by multiplying the vector of the l-th token by the j-th value weight matrix.
  • The process of obtaining the self-attention vector of each of the tokens (e.g., the tokens 301 to 317) using the queries, keys, and values of the tokens may include a process of calculating scores for each of the tokens based on its query, a process of obtaining an activation value for each of the tokens based on the scores, and a process of obtaining the self-attention vector by multiplying the activation value and the value of each of the tokens.
  • The process of calculating the scores of each of the tokens may be a process of calculating, for each token, as many scores as there are tokens (e.g., the tokens 301 to 317), by taking the dot product of that token's query with the key of each of the tokens.
  • The first transformation module 220 may calculate the scores of each of the tokens (e.g., the tokens 301 to 317) using Equation 4 below:
    sc(x, y, j) = Q(x, j) · K(y, j) (Equation 4)
  • In Equation 4, Q(x, j) may represent the query obtained by multiplying the vector of the x-th token by the j-th query weight matrix.
  • K(y, j) may represent the key obtained by multiplying the vector of the y-th token by the j-th key weight matrix.
  • sc(x, y, j) may represent the score obtained by the dot product of Q(x, j) and K(y, j).
  • For example, when calculating the scores of the token 301, the first transformation module 220 may take the dot product of the query of the token 301 with the key of each of the tokens 301 to 317, thereby calculating as many scores for the token 301 as there are tokens (i.e., the predetermined third number (e.g., 256)).
  • The process of obtaining the activation value of each of the tokens (e.g., the tokens 301 to 317) based on the scores may include a process of dividing the scores of each of the tokens by a predetermined number (e.g., 8), a process of obtaining the output values of a predetermined function (e.g., the softmax function) for the divided scores, and a process of taking, as each token's own activation value, the output value based on that token's own query and key.
  • The predetermined number may correspond to the square root (i.e., 8) of the value (i.e., 64) obtained by dividing the fourth number (e.g., 768) by the fifth number (e.g., 12).
  • Using Equation 5 below, the first transformation module 220 may obtain the output values of the softmax function for the scores of each of the tokens (e.g., the tokens 301 to 317) divided by the predetermined number (e.g., 8):
    softmax(x, z, j) = exp(sc(x, z, j) / a) / Σ_{y=1..b} exp(sc(x, y, j) / a) (Equation 5)
  • a may correspond to the square root (i.e., 8) of the value (i.e., 64) obtained by dividing the fourth number (e.g., 768) by the fifth number (e.g., 12).
  • b may correspond to the predetermined third number (i.e., 256).
  • softmax(x, z, j) may represent the exponential of the scaled score obtained by the dot product of Q(x, j) and K(z, j), divided by the sum over y of the exponentials of the scaled scores obtained by the dot product of Q(x, j) and K(y, j).
  • Based on the output values of the softmax function for the scores of each of the tokens (e.g., the tokens 301 to 317) divided by the predetermined number (e.g., 8), the first transformation module 220 may take, as each token's activation value, the output value based on that token's own query and key (i.e., the output value when x and z represent the same token).
  • That is, a token's own activation value may be softmax(x, x, j).
  • Because there is a fifth number (e.g., 12) of attention heads, a fifth number of output values based on the token's own query and key may be obtained.
  • For example, when obtaining the activation value of the token 301, the first transformation module 220 may divide the scores of the token 301 (i.e., as many scores as there are tokens 301 to 317, that is, the predetermined third number (e.g., 256)) by the predetermined number (e.g., 8), obtain the outputs of the predetermined function (e.g., softmax) for the divided scores, and take, as the activation value of the token 301, the output based on the token 301's own query and key.
  • The first transformation module 220 may obtain the self-attention vector of each of the tokens (e.g., the tokens 301 to 317) by multiplying the activation value and the value of each of the tokens, using Equation 6 below:
    Z(x, j) = softmax(x, x, j) × V(x, j) (Equation 6)
  • Z(x, j) may represent the j-th self-attention vector of the x-th token among the predetermined third number (e.g., 256) of tokens.
  • softmax(x, x, j) may represent the j-th activation value of the x-th token.
  • V(x, j) may represent the value obtained by multiplying the vector of the x-th token by the j-th value weight matrix.
  • The first transformation module 220 may concatenate the fifth number (e.g., 12) of self-attention vectors of each of the tokens (e.g., the tokens 301 to 317), and may obtain an intermediate output (e.g., the first intermediate output 420) by multiplying the concatenated self-attention vector by a weight matrix.
  • Each self-attention vector may have dimension k.
  • The concatenated self-attention vector may have dimension 12k.
  • The weight matrix may be a 12k × fourth number (e.g., 768) matrix.
  • For example, the first transformation module 220 may concatenate the fifth number (e.g., 12) of self-attention vectors of the token 301, and may obtain the token 421 by multiplying the concatenated self-attention vector of the token 301 by the weight matrix; the whole per-layer computation of Equations 1 to 6 is sketched in code below.
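  • A minimal sketch of one self-attention layer exactly as Equations 1 to 6 describe it, assuming PyTorch; note that Equation 6 keeps only each token's own (diagonal) softmax value, which differs from standard attention, where the softmax row weights a sum over all value vectors.

        import torch

        M, DIM, HEADS = 256, 768, 12          # third, fourth, and fifth numbers
        K_DIM = DIM // HEADS                  # k = 64; scores are divided by sqrt(k) = 8

        X = torch.randn(M, DIM)                          # token vectors X_l
        Wq = torch.randn(HEADS, DIM, K_DIM)              # W_Qj (Equation 1)
        Wk = torch.randn(HEADS, DIM, K_DIM)              # W_Kj (Equation 2)
        Wv = torch.randn(HEADS, DIM, K_DIM)              # W_Vj (Equation 3)
        Wo = torch.randn(HEADS * K_DIM, DIM)             # final 12k x 768 weight matrix

        Q = torch.einsum('ld,jdk->jlk', X, Wq)           # Q(l, j)
        K = torch.einsum('ld,jdk->jlk', X, Wk)           # K(l, j)
        V = torch.einsum('ld,jdk->jlk', X, Wv)           # V(l, j)

        sc = torch.einsum('jxk,jyk->jxy', Q, K)          # sc(x, y, j), Equation 4
        soft = torch.softmax(sc / K_DIM ** 0.5, dim=-1)  # Equation 5, with a = 8

        act = soft.diagonal(dim1=-2, dim2=-1)            # softmax(x, x, j): own activation value
        Z = act.unsqueeze(-1) * V                        # Z(x, j), Equation 6

        out = torch.cat(list(Z), dim=-1) @ Wo            # concatenate 12 heads, then project
        print(out.shape)  # torch.Size([256, 768]): one layer's intermediate output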
  • In an embodiment, the first transformation module 220 may obtain the first intermediate output 420 by performing the first self-attention processing 401 on the first intermediate input 300.
  • The first intermediate output 420 may consist of tokens 421 to 437.
  • The first intermediate input 300, the first self-attention processing 401, and the first intermediate output 420 may also be referred to as the first layer of the first transformation module 220.
  • The first intermediate input 300 may be referred to as the input of the first layer.
  • The first intermediate output 420 may be referred to as the output of the first layer.
  • Each of the tokens 421 to 437 of the first intermediate output 420 may also be referred to as the output vector of the corresponding one of the tokens 301 to 317 according to the self-attention processing based on the first intermediate input 300.
  • For example, the token 421 may be referred to as the output vector of the token 301 according to the self-attention processing based on the first intermediate input 300.
  • Likewise, the token 423 may be referred to as the output vector of the token 303 according to the self-attention processing based on the first intermediate input 300.
  • The first transformation module 220 may obtain the second intermediate output 440 by performing the second self-attention processing 403 on the first intermediate output 420.
  • The second intermediate output 440 may consist of tokens 441 to 457.
  • The first intermediate output 420, the second self-attention processing 403, and the second intermediate output 440 may also be referred to as the second layer of the first transformation module 220.
  • The first intermediate output 420 may be referred to as the input of the second layer.
  • The second intermediate output 440 may be referred to as the output of the second layer.
  • Each of the tokens 441 to 457 of the second intermediate output 440 may also be referred to as the output vector of the corresponding one of the tokens 421 to 437 according to the self-attention processing based on the first intermediate output 420.
  • For example, the token 441 may be referred to as the output vector of the token 421 according to the self-attention processing based on the first intermediate output 420.
  • Likewise, the token 443 may be referred to as the output vector of the token 423 according to the self-attention processing based on the first intermediate output 420.
  • The first transformation module 220 may obtain the third intermediate output by performing the third self-attention processing 405 on the second intermediate output 440.
  • The second intermediate output 440, the third self-attention processing 405, and the third intermediate output may also be referred to as the third layer of the first transformation module 220.
  • The second intermediate output 440 may be referred to as the input of the third layer.
  • The third intermediate output may be referred to as the output of the third layer.
  • The first transformation module 220 may obtain the (u-1)-th intermediate output 460 by performing the (u-1)-th self-attention processing 407 on the (u-2)-th intermediate output.
  • u may correspond to the predetermined number of times.
  • The (u-1)-th intermediate output 460 may be composed of tokens 461 to 477.
  • The (u-2)-th intermediate output, the (u-1)-th self-attention processing 407, and the (u-1)-th intermediate output 460 may also be referred to as the (u-1)-th layer of the first transformation module 220.
  • The (u-2)-th intermediate output may be referred to as the input of the (u-1)-th layer.
  • The (u-1)-th intermediate output 460 may be referred to as the output of the (u-1)-th layer.
  • The first transformation module 220 may obtain the u-th intermediate output 480 by performing the u-th self-attention processing 409 on the (u-1)-th intermediate output 460.
  • The u-th intermediate output 480 may be composed of tokens 481 to 497.
  • The (u-1)-th intermediate output 460, the u-th self-attention processing 409, and the u-th intermediate output 480 may also be referred to as the u-th layer of the first transformation module 220.
  • The (u-1)-th intermediate output 460 may be referred to as the input of the u-th layer.
  • The u-th intermediate output 480 may be referred to as the output of the u-th layer.
  • Each of the tokens 481 to 497 of the u-th intermediate output 480 is the output vector of the corresponding one of the tokens 461 to 477 according to the self-attention processing based on the (u-1)-th intermediate output 460.
  • For example, the token 481 may be referred to as the output vector of the token 461 according to the self-attention processing based on the (u-1)-th intermediate output 460.
  • Likewise, the token 483 may be referred to as the output vector of the token 463 according to the self-attention processing based on the (u-1)-th intermediate output 460.
  • In an embodiment, the first transformation module 220 may obtain, as an output, the vector indicated by the first token at a predetermined position among the outputs of the layer with a predetermined order number.
  • The layer with the predetermined order number may be the layer in the last order (i.e., the u-th layer).
  • The first token at the predetermined position may be the token 481 located in the column in which the token embedding TE_CLS is located.
  • When the first transformation module 220 is being trained, the vector indicated by the token 481 (located in the column of the token embedding TE_CLS) of the u-th intermediate output 480 may be obtained as the output 413.
  • When the first transformation module 220 is being trained, the first transformation module 220 may adjust the values (i.e., weights) of the elements of the matrices used in the first to u-th self-attention processings 401 to 409 based on the output 413. In an embodiment, the first transformation module 220 may adjust those weights so that the error indicated by the output 413 is minimized. In an embodiment, the values (i.e., weights) of the elements of the matrices used in the first to u-th self-attention processings 401 to 409 may differ from one another.
  • When the training of the first transformation module 220 is completed (i.e., when the first transformation module 220 is in use), the first transformation module 220 may obtain at least one output.
  • The at least one output may be obtained from the first token at a predetermined position among the outputs of the layer with a predetermined order number, from at least one token located in the same row as the first token, from at least one token located in the same column as the first token, or from a combination thereof.
  • The layer with the predetermined order number may be the layer in the last order (i.e., the u-th layer).
  • The token at the predetermined position may be the token 481 located in the column in which the token embedding TE_CLS is located.
  • When the training of the first transformation module 220 is completed, the first transformation module 220 may obtain the output 413 based on the vector indicated by the token 481 located in the column of the token embedding TE_CLS of the u-th intermediate output 480.
  • When the training of the first transformation module 220 is completed, the first transformation module 220 may obtain the output 415 based on the vectors indicated by the tokens 481 to 497 located in the same row as the token 481.
  • When the training of the first transformation module 220 is completed, the first transformation module 220 may obtain the output 411 based on the vectors indicated by the tokens 421, 441, 461, and 481 located in the same column as the token 481.
  • The first transformation module 220 may obtain the output 411, which has the fourth number (e.g., 768) of dimensions, by averaging (or weighted-averaging) the vectors of the tokens 421, 441, 461, and 481.
  • Alternatively, the first transformation module 220 may obtain the output 411 by performing self-attention processing based on the vectors indicated by the tokens 421, 441, 461, and 481 located in the same column as the token 481.
  • When the output 411 is obtained by performing self-attention processing, the output 411 may be the output vector of the token 481 according to the self-attention processing based on the tokens 421, 441, 461, and 481.
  • the first transformation module 220 may obtain an output 415 having a fourth number (eg, 768) of dimensions by averaging (or weighted-averaging) the vectors of the tokens 481 to 497.
  • the first transformation module 220 may obtain the output 415 by performing self-attention processing based on the vectors indicated by the tokens 481 to 497 located in the same row as the token 481.
  • when the output 415 is obtained by performing the self-attention processing, the output 415 may be the output vector of the token 481 according to the self-attention processing based on the tokens 481 to 497.
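To make the three intermediate outputs concrete, the following is a minimal Python (PyTorch) sketch. It assumes a BERT-style encoder that exposes the hidden states of all u layers, and that the "row" is the token sequence of the last layer while the "column" is the same token position across layers; this row/column reading and all names below are illustrative assumptions, not details fixed by the description above.

```python
import torch

def intermediate_outputs(hidden_states):
    # hidden_states: list of u tensors, each of shape [seq_len, 768],
    # e.g. obtained with output_hidden_states=True in common transformer libraries
    last = hidden_states[-1]          # layer of the last order (u-th layer)
    out_413 = last[0]                 # vector of the token at the TE CLS position (token 481)
    out_415 = last.mean(dim=0)        # average over the same row (tokens 481 to 497)
    cls_column = torch.stack([h[0] for h in hidden_states])  # tokens 421, 441, 461, 481
    out_411 = cls_column.mean(dim=0)  # average over the same column
    return out_411, out_413, out_415
```

The averaging shown is the plain-average variant; the weighted-average and self-attention variants described above would replace the two mean(dim=0) calls.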
  • when the learning of the first transformation module 220 is completed (that is, when the first transformation module 220 is in use), the first transformation module 220 may provide the obtained at least one output (411, 413, or 415) to the second transformation module 240.
  • the feature extraction module 230 may extract features from data provided from the preprocessing module 210 .
  • the feature may include the form of a sentence (eg, interrogative, declarative, exclamatory, or imperative) included in the data provided from the preprocessing module 210, the number of speakers, the number of speaker transitions, or a combination thereof.
  • the feature extraction module 230 may provide information on the extracted feature to the second transformation module 240 .
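As one hedged illustration of the feature information passed to the second transformation module, the sketch below encodes the three features named above (sentence form, number of speakers, number of speaker transitions) as one scalar each. The concrete encoding, the function names, and the question-mark heuristic are assumptions for illustration, not details from the description.

```python
import torch

SENTENCE_FORMS = {"declarative": 0, "interrogative": 1, "exclamatory": 2, "imperative": 3}

def extract_features(turns):
    # turns: list of (speaker, text) pairs provided by the preprocessing module
    last_text = turns[-1][1]
    form = SENTENCE_FORMS["interrogative"] if last_text.endswith("?") else SENTENCE_FORMS["declarative"]
    speakers = [speaker for speaker, _ in turns]
    n_speakers = len(set(speakers))
    n_transitions = sum(a != b for a, b in zip(speakers, speakers[1:]))
    return torch.tensor([form, n_speakers, n_transitions], dtype=torch.float)  # vector 531 (size 3)
```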
  • the second transformation module 240 may obtain a final output based on the at least one output 411, 413, or 415, the information about the feature, or a combination thereof. In an embodiment, the second transformation module 240 may obtain the final output for the at least one output 411, 413, or 415, the information about the feature, or a combination thereof, using at least one sub-transform module.
  • the second transform module 240 may obtain a final output for at least one output 411 , 413 , or 415 based on the first sub-transform module 501 .
  • the second transformation module 240 applies a linear product 511 and an activation 514 to the output 411, a linear product 512 and an activation 515 to the output 413, and a linear product 513 and an activation 516 to the output 415.
  • the linear product 511 of the output 411 may be the dot product of the output 411, which has a fourth number (eg, 768) of dimensions, and the matrix of the linear product 511.
  • the linear product 512 of the output 413 may be the dot product of the output 413, which has a fourth number (eg, 768) of dimensions, and the matrix of the linear product 512.
  • the linear product 513 of the output 415 may be the dot product of the output 415, which has a fourth number (eg, 768) of dimensions, and the matrix of the linear product 513.
  • the matrix of the linear product 511 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the matrix of the linear product 512 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the matrix of the linear product 513 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the activation 514 for the output to which the linear product 511 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 511 is applied.
  • the activation 515 for the output to which the linear product 512 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 512 is applied.
  • the activation 516 for the output to which the linear product 513 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 513 is applied.
  • the predetermined activation function may include a hyperbolic tangent function, a step function, a sigmoid function, a Rectified Linear Unit (ReLU), or a combination thereof.
  • the second transformation module 240 may apply a combination 517 to an output to which activation 514 is applied, an output to which activation 515 is applied, and an output to which activation 516 is applied.
  • the output to which the combination 517 is applied may be a vector having a dimension equal to three times the fourth number (eg, 2304).
  • the second transformation module 240 may apply a dropout 518 to the output to which the combination 517 is applied.
  • the dropout 518 may be to change some values of the output to which the combination 517 is applied to 0 based on a predetermined probability.
  • the predetermined probability may have a value greater than or equal to 0 and less than 1.
  • the predetermined probability may have a value of 0.1.
  • the second transform module 240 may apply a linear product 519 to the output to which the dropout 518 is applied.
  • the linear product 519 of the output to which the dropout 518 is applied may be the dot product of the output to which the dropout 518 is applied, which has a dimension equal to three times the fourth number (eg, 2304), and the matrix of the linear product 519.
  • the matrix of the linear product 519 may be a (number corresponding to three times the fourth number (eg, 2304)) × 2 matrix.
  • the second transform module 240 may obtain a two-dimensional vector by applying a linear product 519 to the output to which the dropout 518 is applied.
  • the obtained two-dimensional vector may be the final output.
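Read as a whole, the first sub-transform module 501 can be sketched as the following PyTorch module. The choice of tanh (one of the activation options listed above), the dropout probability of 0.1, and all identifiers are illustrative assumptions, not a definitive implementation.

```python
import torch
from torch import nn

class FirstSubTransform(nn.Module):
    # sketch of sub-transform module 501: per-output linear product + activation,
    # combination to 3 * 768 = 2304 dims, dropout, then a 2304 x 2 linear product
    def __init__(self, dim=768, p=0.1):
        super().__init__()
        self.lin_511 = nn.Linear(dim, dim)    # applied to output 411
        self.lin_512 = nn.Linear(dim, dim)    # applied to output 413
        self.lin_513 = nn.Linear(dim, dim)    # applied to output 415
        self.act = nn.Tanh()                  # activations 514, 515, 516 (assumed tanh)
        self.drop_518 = nn.Dropout(p)
        self.lin_519 = nn.Linear(3 * dim, 2)  # final two-dimensional output

    def forward(self, out_411, out_413, out_415):
        h = torch.cat([self.act(self.lin_511(out_411)),
                       self.act(self.lin_512(out_413)),
                       self.act(self.lin_513(out_415))], dim=-1)  # combination 517
        return self.lin_519(self.drop_518(h))                     # dropout 518, linear product 519
```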
  • when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 511, 512, 513, and 519, based on the obtained two-dimensional vector. In an embodiment, when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 511, 512, 513, and 519 so that the error indicated by the obtained two-dimensional vector is minimized.
  • when the learning of the second transformation module 240 is completed (that is, when the second transformation module 240 is in use), the second transformation module 240 may provide the obtained two-dimensional vector (that is, the final output) to components of the electronic device 101 (eg, an application (the application 146 of FIG. 1), the sound output device 155, the display device 160, or a combination thereof).
  • the second transform module 240 may obtain a final output for at least one output 411 , 413 , or 415 based on the second sub-transform module 503 .
  • the second transformation module 240 may apply a combination 521 to the output 411 , the output 413 , and the output 415 .
  • the output to which the combination 521 is applied may be a vector having a dimension equal to three times the fourth number (eg, 2304).
  • the second transformation module 240 may apply a linear product 522 to the output to which the combination 521 is applied.
  • the linear product 522 of the output to which the combination 521 is applied may be the dot product of the output to which the combination 521 is applied, which has a dimension equal to three times the fourth number (eg, 2304), and the matrix of the linear product 522.
  • the matrix of the linear product 522 may be a (number corresponding to three times the fourth number (eg, 2304)) × (number corresponding to three times the fourth number (eg, 2304)) matrix.
  • the second transformation module 240 may apply a dropout 523 to the output to which the linear product 522 is applied.
  • the second transformation module 240 may apply a linear product 524 to the output to which the dropout 523 is applied.
  • the matrix of the linear product 524 may be a (number corresponding to three times the fourth number (eg, 2304)) × (number corresponding to three times the fourth number (eg, 2304)) matrix.
  • the second transformation module 240 may apply a dropout 525 to the output to which the linear product 524 is applied.
  • the second transformation module 240 may apply a linear product 526 to the output to which the dropout 525 is applied.
  • the matrix of the linear product 526 may be a (number corresponding to three times the fourth number (eg, 2304)) × (number corresponding to three times the fourth number (eg, 2304)) matrix.
  • the second transformation module 240 may apply a dropout 527 to the output to which the linear product 526 is applied.
  • the second transformation module 240 may apply a linear product 528 to the output to which the dropout 527 is applied.
  • the matrix of the linear product 528 may be a (number corresponding to three times the fourth number (eg, 2304)) × 2 matrix.
  • the two-dimensional vector obtained by applying the linear product 528 may be the final output.
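Under the same caveats (illustrative names, assumed 0.1 dropout), the second sub-transform module 503 can be sketched as a stack of 2304 × 2304 linear products alternating with dropouts:

```python
import torch
from torch import nn

class SecondSubTransform(nn.Module):
    # sketch of sub-transform module 503: combination 521, then linear products
    # 522/524/526 alternating with dropouts 523/525/527, closed by linear product 528
    def __init__(self, dim=768, p=0.1):
        super().__init__()
        d = 3 * dim  # 2304
        self.body = nn.Sequential(
            nn.Linear(d, d), nn.Dropout(p),
            nn.Linear(d, d), nn.Dropout(p),
            nn.Linear(d, d), nn.Dropout(p),
            nn.Linear(d, 2),  # final two-dimensional output
        )

    def forward(self, out_411, out_413, out_415):
        return self.body(torch.cat([out_411, out_413, out_415], dim=-1))  # combination 521
```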
  • when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 522, 524, 526, and 528, based on the obtained two-dimensional vector. In an embodiment, when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 522, 524, 526, and 528 so that the error indicated by the obtained two-dimensional vector is minimized.
  • when the learning of the second transformation module 240 is completed (that is, when the second transformation module 240 is in use), the second transformation module 240 may provide the obtained two-dimensional vector (that is, the final output) to components of the electronic device 101 (eg, the application 146, the sound output device 155, the display device 160, or a combination thereof).
  • the second transformation module 240 may obtain a final output for the at least one output 411, 413, or 415 and the information on the feature, based on the third sub-transform module 505.
  • the second transformation module 240 applies a linear product 541 and an activation 544 to the output 411, a linear product 542 and an activation 545 to the output 413, and a linear product 543 and an activation 546 to the output 415.
  • the matrix of the linear product 541 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the matrix of the linear product 542 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the matrix of the linear product 543 may be a fourth number (eg, 768) × fourth number (eg, 768) matrix.
  • the activation 544 for the output to which the linear product 541 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 541 is applied.
  • the activation 545 for the output to which the linear product 542 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 542 is applied.
  • the activation 546 for the output to which the linear product 543 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 543 is applied.
  • the second transformation module 240 may apply a combination 547 to an output to which activation 544 is applied, an output to which activation 545 is applied, and an output to which activation 546 is applied.
  • the output to which the combination 547 is applied may be a vector having a dimension equal to three times the fourth number (eg, 2304).
  • the second transformation module 240 may apply a dropout 548 to the combined output.
  • the second transformation module 240 may apply a linear product 549 and activation 550 to the output to which the dropout 548 is applied.
  • the matrix of the linear product 549 may be a (number corresponding to three times the fourth number (eg, 2304)) × 100 matrix.
  • the activation 550 for the output to which the linear product 549 is applied may be to obtain an output of a predetermined activation function for the output to which the linear product 549 is applied.
  • the second transformation module 240 may apply a linear product 551 and activation 552 to a vector 531 representing information on features.
  • the matrix of the linear product 551 may be a (size of the vector 531 (eg, 3)) × 100 matrix.
  • the activation 552 for the vector to which the linear product 551 is applied may be to obtain an output of a predetermined activation function for the vector to which the linear product 551 is applied.
  • the second transformation module 240 may apply a combination 553 to the output to which the activation 552 is applied and the output to which the activation 550 is applied.
  • the output to which the combination 553 is applied may be a 200-dimensional vector.
  • the second transformation module 240 may apply a linear product 554 to the output to which the combination 553 is applied.
  • the matrix of the linear product 554 may be a 200 × 2 matrix.
  • the two-dimensional vector obtained by applying the linear product 554 may be the final output.
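The feature-aware sub-transform module described above can be sketched the same way. The tanh activation is again an assumed choice among the listed options, and the 3-dimensional feature size follows the vector 531 described above; everything else is illustrative.

```python
import torch
from torch import nn

class FeatureSubTransform(nn.Module):
    # sketch of the sub-transform module that also consumes the feature vector 531:
    # per-output 768 x 768 linear products (541-543) with activations (544-546),
    # combination 547, dropout 548, a 2304 x 100 projection (549) with activation 550,
    # a 3 x 100 projection of the features (551) with activation 552,
    # combination 553 to 200 dims, and a 200 x 2 linear product (554)
    def __init__(self, dim=768, feat_dim=3, p=0.1):
        super().__init__()
        self.lins = nn.ModuleList([nn.Linear(dim, dim) for _ in range(3)])
        self.act = nn.Tanh()
        self.drop_548 = nn.Dropout(p)
        self.lin_549 = nn.Linear(3 * dim, 100)
        self.lin_551 = nn.Linear(feat_dim, 100)
        self.lin_554 = nn.Linear(200, 2)

    def forward(self, outs, feat_531):
        # outs: (out_411, out_413, out_415); feat_531: tensor of shape [3]
        h = torch.cat([self.act(lin(o)) for lin, o in zip(self.lins, outs)], dim=-1)
        h = self.act(self.lin_549(self.drop_548(h)))
        f = self.act(self.lin_551(feat_531))
        return self.lin_554(torch.cat([h, f], dim=-1))  # combination 553, linear product 554
```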
  • when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 541, 542, 543, 549, 551, and 554, based on the obtained two-dimensional vector. In an embodiment, when the second transformation module 240 is learning, the second transformation module 240 may adjust the values (ie, weights) of the elements of the matrices of the linear products 541, 542, 543, 549, 551, and 554 so that the error indicated by the obtained two-dimensional vector is minimized.
  • when the learning of the second transformation module 240 is completed (that is, when the second transformation module 240 is in use), the second transformation module 240 may provide the obtained two-dimensional vector (that is, the final output) to components of the electronic device 101 (eg, the application 146, the sound output device 155, the display device 160, or a combination thereof).
  • in an embodiment, the second transformation module 240 may obtain, based on the ensemble 570, a final output for the inputs 509 (ie, at least one of the outputs 411, 413, or 415 and/or the vector 531 representing the information about the feature).
  • the ensemble 570 may obtain the final output by applying an average (or weighted average) to the output of the first sub-transform module 501, the output of the second sub-transform module 503, the output of the third sub-transform module 505, or a combination thereof.
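A minimal sketch of the ensemble 570 follows, assuming equal weights by default; the description above also allows a weighted average, so a weights argument is included.

```python
import torch

def ensemble_570(outputs, weights=None):
    # outputs: list of two-dimensional final outputs, one per sub-transform module
    stacked = torch.stack(outputs)             # shape [n_modules, 2]
    if weights is None:
        return stacked.mean(dim=0)             # plain average
    w = torch.tensor(weights, dtype=stacked.dtype).unsqueeze(-1)
    return (w * stacked).sum(dim=0) / w.sum()  # weighted average
```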
  • when the second transformation module 240 is learning, the second transformation module 240 may adjust, based on the final output to which the ensemble 570 is applied, the values (ie, weights) of the elements of the matrices included in the first sub-transform module 501, the second sub-transform module 503, the third sub-transform module 505, or a combination thereof.
  • when the learning of the second transformation module 240 is completed (ie, when the second transformation module 240 is in use), the second transformation module 240 may provide the final output to which the ensemble 570 is applied to components of the electronic device 101 (eg, the application 146, the sound output device 155, the display device 160, or a combination thereof).
  • the components of the electronic device 101 that have received the final output may perform a function of the electronic device 101 corresponding to the final output.
  • the processor 200 that has received the final output may select a final candidate conversation from among a plurality of candidate conversations, based on the final outputs of the second transformation module 240 for a plurality of inputs obtained by combining a predetermined first number of conversations (eg, the first to third conversations in Table 2) with each of the plurality of candidate conversations. In an embodiment, the processor 200 may provide the selected final candidate conversation to the user. In an embodiment, the processor 200 may output the selected final candidate conversation through the sound output device 155, the display device 160, or a combination thereof.
  • the processor 200 may select a candidate conversation corresponding to the final output having the highest value among the final outputs of the second transformation module 240 as the final candidate conversation. In an embodiment, the processor 200 may select a candidate conversation corresponding to a final output exceeding a reference value among softmax values of each of the final outputs of the second transformation module 240 as the final candidate conversation.
  • the processor 200 may provide the user with a candidate conversation, "How about watching a movie?"
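The candidate-selection step can be sketched as follows. Treating the second component of the two-dimensional output as the "match" score and using 0.5 as the reference value are assumptions, not details fixed by the description above.

```python
import torch

def select_final_candidate(final_outputs, reference=0.5):
    # final_outputs: tensor of shape [n_candidates, 2], one final output per candidate
    probs = final_outputs.softmax(dim=-1)[:, 1]  # softmax "match" probability per candidate
    best = int(probs.argmax())
    return best if probs[best] > reference else None  # index of the final candidate conversation
```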
  • FIG. 6 is a flowchart illustrating an operation of performing a function corresponding to an input in an electronic device (eg, the electronic device 101 of FIG. 1 ) according to an embodiment.
  • the operations of FIG. 6 may be performed by a processor (eg, the processor 120 of FIG. 1 or the processor 200 of FIG. 2).
  • the operations of FIG. 6 may be performed by the processor based on the first transformation module 220 whose learning is completed and the second transformation module 240 whose learning is completed.
  • the processor 200 may obtain at least one output (eg, the output 411, 413, or 415) of the first transformation module 220 based on the input.
  • the input may be data obtained by combining a first number of preset conversations (eg, first to third conversations in Table 2) and each of a plurality of candidate conversations.
  • the processor 200 may acquire the at least one output using a first operation of obtaining a first output based on a vector indicated by a first token at a predetermined position among the outputs of a layer of a predetermined order among the at least one layer of the first transformation module 220, a second operation of obtaining a second output based on a vector indicated by each of at least one token located in the same row as the first token, a third operation of obtaining a third output based on a vector indicated by each of at least one token located in the same column as the first token, or a combination thereof.
  • the processor 200 may obtain an output of the second transformation module 240 based on at least one output of the first transformation module 220 .
  • the processor 200 may obtain an output for the at least one output of the first transformation module 220 using one sub-transform module of the second transformation module 240 (eg, the first sub-transform module 501, the second sub-transform module 503, or the third sub-transform module 505).
  • the processor 200 may obtain an output for the at least one output of the first transformation module 220 using two or more sub-transform modules of the second transformation module 240 (eg, at least two of the first sub-transform module 501, the second sub-transform module 503, or the third sub-transform module 505).
  • in this case, the processor 200 may obtain, as the output for the at least one output of the first transformation module 220, the average (or weighted average) of the outputs of each of the two or more sub-transform modules.
  • the processor 200 may perform a function of the electronic device 101 based on the output of the second conversion module 240 .
  • the processor 200 may select a final candidate conversation from among a plurality of candidate conversations based on the output of the second transformation module 240 , and provide the selected final candidate conversation to the user. In an embodiment, the processor 200 may provide the selected final candidate dialogue to the user by outputting the selected final candidate dialogue through the sound output device 155 , the display device 160 , or a combination thereof.
  • FIG. 7 is a flowchart illustrating an operation of learning transformation modules in an electronic device (eg, the electronic device 101 of FIG. 1 ) according to an embodiment.
  • the operations of FIG. 7 may be performed by a processor (eg, the processor 120 of FIG. 1 or the processor 200 of FIG. 2).
  • the processor 200 may learn the first transformation module 220 based on predetermined data.
  • the predefined data may be corpus data.
  • the processor 200 may pre-train the first transformation module 220 based on unclassified text data, and then fine-tune the pre-trained first transformation module 220 based on text data classified according to a specific task.
  • the processor 200 may learn the first transformation module 220 as many times as the first predetermined number of learning, based on the predetermined data.
  • the first number of times of learning may be 3 times.
  • the processor 200 may determine whether learning of the first conversion module 220 is completed.
  • the processor 200 may determine that the learning of the first transformation module 220 has been completed when the first transformation module 220 is learned by a predetermined first number of learning times.
  • if the learning of the first transformation module 220 is completed, the processor 200 may perform operation 730.
  • if the learning of the first transformation module 220 is not completed, the processor 200 may perform operation 710 again.
  • the processor 200 may learn the second transformation module 240 based on the output of the learned first transformation module 220.
  • the processor 200 may learn the second transformation module 240 based on the output of the learned first transformation module 220 for the text data classified according to the specific task.
  • the processor 200 may learn the second transformation module 240 as many times as the second predetermined number of learning.
  • the second learning number may be higher than the first learning number.
  • the second number of learning times may be 10 times.
  • the processor 200 may determine whether learning of the second transformation module is completed.
  • the processor 200 may determine that the learning of the second transformation module 240 is completed when the second transformation module 240 has been learned as many times as the second predetermined number of learning.
  • if the learning of the second transformation module 240 is completed, the processor 200 may end the operation of FIG. 7.
  • if the learning of the second transformation module 240 is not completed, the processor 200 may perform operation 730 again.
  • the processor 200 may obtain a final output (eg, a response corresponding to the input) for an input (eg, voice, text, or a combination thereof) using the learned first transformation module 220 and the learned second transformation module 240, as sketched below.
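The two-stage learning of FIG. 7 can be sketched as below, with the first number of learning (3) and the second number of learning (10) taken from the description, and the first transformation module frozen during the second stage, as described later in this disclosure. The optimizer, loss function, and the use of a single data loader for both stages are simplifying assumptions.

```python
import torch
from torch import nn

def train_modules(mod_220: nn.Module, mod_240: nn.Module, loader, loss_fn):
    # stage 1: learn the first transformation module (operation 710)
    opt1 = torch.optim.Adam(mod_220.parameters())
    for _ in range(3):                       # first number of learning
        for x, y in loader:
            loss_fn(mod_220(x), y).backward()
            opt1.step()
            opt1.zero_grad()

    for p in mod_220.parameters():           # do not adjust module 220's weights below
        p.requires_grad = False

    # stage 2: learn the second transformation module (operation 730)
    opt2 = torch.optim.Adam(mod_240.parameters())
    for _ in range(10):                      # second number of learning
        for x, y in loader:
            with torch.no_grad():
                inter = mod_220(x)           # intermediate outputs of the learned module
            loss_fn(mod_240(inter), y).backward()
            opt2.step()
            opt2.zero_grad()
```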
  • according to an embodiment, the electronic device 101 may include a memory 130 storing instructions and a processor 200 including a first transformation module 220 and a second transformation module 240. The processor 200 may be configured to execute the instructions so that the electronic device 101 obtains at least one intermediate output for an input based on the first transformation module 220, obtains a final output for the at least one intermediate output based on the second transformation module 240, and performs a function of the electronic device 101 based on the final output. The at least one intermediate output may include a first output based on a vector indicated by a first token at a predetermined position among the outputs of a layer of a predetermined order among at least one layer of the first transformation module 220, a second output based on a vector indicated by each of at least one token located in the same row as the first token, a third output based on a vector indicated by each of at least one token located in the same column as the first token, or a combination thereof.
  • the first transformation module 220 is a transformation module based on Embeddings from Language Model (ELMo), Generative Pre-Training (GPT), XLNet, Bidirectional Encoder Representations from Transformers (BERT), or a combination thereof.
  • the second transformation module 240 may include at least one sub-transform module, and the processor 200 may be configured to execute the instructions so that the electronic device 101 obtains outputs for the at least one intermediate output based on the at least one sub-transform module, and obtains the final output based on a weighted average of the outputs.
  • the predetermined position may be a position allocated to a token indicating a relationship between two sentences included in the input among a plurality of tokens of the at least one layer.
  • the layer of the predetermined order may be a layer of the last order of the at least one layer.
  • the processor 200 may be configured to execute the instructions so that the electronic device 101 obtains, as the at least one intermediate output, information indicating a characteristic of a sentence included in the input.
  • the input may include a first sentence including a plurality of conversations and a second sentence including a candidate conversation.
  • the processor 200 may be configured to execute the instructions so that the electronic device 101 deletes a partial conversation among the plurality of conversations included in the input, based on the number of semantic units included in the input, and obtains, based on the first transformation module, the at least one intermediate output for the input from which the partial conversation has been deleted.
  • the processor 200 may be configured to execute the instructions so that the electronic device 101 corrects typos in the plurality of conversations included in the input, and obtains, based on the first transformation module, the at least one intermediate output for the input with the typos corrected.
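As a hedged illustration of the semantic-unit-based deletion, the sketch below drops the oldest conversations until the combined input fits. The 512-unit limit and the oldest-first policy are assumptions based on common encoder constraints, not details from the description.

```python
def truncate_conversations(conversations, candidate, tokenize, max_units=512):
    # conversations: list of conversation strings, oldest first;
    # candidate: the candidate conversation combined with them;
    # tokenize: function splitting text into semantic units (assumed supplied)
    kept = list(conversations)
    while kept and sum(len(tokenize(t)) for t in kept + [candidate]) > max_units:
        kept.pop(0)  # delete the oldest partial conversation
    return kept
```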
  • the processor 200 executes the instructions so that the electronic device 101 learns the first transformation module 220 based on predetermined data, and the learned first transformation Based on the intermediate output obtained from the module 220 , it may be configured to learn the second transform module 240 .
  • the processor 200 may be configured to execute the instructions so that the electronic device 101 does not adjust the weights of the first transformation module 220 while learning the second transformation module 240.
  • according to an embodiment, the electronic device further includes a display (eg, the display device 160 of FIG. 1), and the processor 200 may be configured to execute the instructions so that the electronic device 101 selects one candidate conversation among a plurality of candidate conversations based on the final output, and displays the selected candidate conversation through the display.
  • according to an embodiment, a method of operating the electronic device 101 may include an operation of obtaining at least one intermediate output for an input based on the first transformation module 220 of the electronic device 101, an operation of obtaining a final output for the at least one intermediate output based on the second transformation module 240 of the electronic device 101, and an operation of performing a function of the electronic device based on the final output. The at least one intermediate output may include a first output based on a vector indicated by a first token at a predetermined position among the outputs of a layer of a predetermined order among at least one layer of the first transformation module 220, a second output based on a vector indicated by each of at least one token located in the same row as the first token, a third output based on a vector indicated by each of at least one token located in the same column as the first token, or an output of a combination thereof.
  • the predetermined position may be a position allocated to a token indicating a relationship between two sentences included in the input among a plurality of tokens of the at least one layer.
  • the layer of the predetermined order may be a layer of the last order of the at least one layer.
  • the obtaining of the at least one intermediate output may further include obtaining information indicating a characteristic of a sentence included in the input as the at least one intermediate output.
  • the input may include a first sentence including a plurality of conversations and a second sentence including a candidate conversation.
  • the acquiring of the at least one intermediate output may include deleting a partial conversation among the plurality of conversations included in the input, based on the number of semantic units included in the input, and obtaining, based on the first transformation module, the at least one intermediate output for the input from which the partial conversation has been deleted.
  • the method may further include an operation of learning the first transformation module based on predetermined data, and an operation of learning the second transformation module based on an intermediate output obtained from the learned first transformation module.
  • the performing of the function of the electronic device may further include selecting one candidate conversation among a plurality of candidate conversations based on the final output, and displaying the selected candidate conversation through the display of the electronic device.
  • a computer-readable storage medium storing one or more programs (software modules) may be provided.
  • One or more programs stored in the computer-readable storage medium are configured for execution by one or more processors in an electronic device (device).
  • One or more programs include instructions for causing an electronic device to execute methods according to embodiments described in a claim or specification of the present disclosure.
  • Such programs may be stored in random access memory, non-volatile memory including flash memory, read only memory (ROM), electrically erasable programmable ROM (EEPROM), a magnetic disc storage device, a compact disc ROM (CD-ROM), digital versatile discs (DVDs), other types of optical storage devices, or magnetic cassettes. Alternatively, they may be stored in a memory composed of a combination of some or all of these, and a plurality of each constituent memory may be included.
  • The program may be stored in an attachable storage device that can be accessed through a communication network such as the Internet, an intranet, a local area network (LAN), a wide LAN (WLAN), a storage area network (SAN), or a combination thereof. Such a storage device may be connected to a device implementing an embodiment of the present disclosure through an external port. In addition, a separate storage device on the communication network may be connected to the device implementing the embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method of operating an electronic device according to an embodiment comprises: obtaining at least one intermediate output for an input based on a first conversion module of the electronic device; obtaining a final output for the at least one intermediate output based on a second conversion module of the electronic device; and performing a function of the electronic device based on the final output, wherein the at least one intermediate output may comprise a first output based on a vector indicated by a first token at a predetermined position among the outputs of a layer of a predetermined order among a plurality of layers of the first conversion module, a second output based on a vector indicated by at least one token located in the same row as the first token, a third output based on a vector indicated by at least one token located in the same column as the first token, or an output of a combination thereof.
PCT/KR2021/001490 2020-02-06 2021-02-04 Dispositif électronique fournissant un énoncé correspondant au contexte d'une conversation, et procédé d'utilisation associé WO2021158040A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0014435 2020-02-06
KR1020200014435A KR20210100446A (ko) 2020-02-06 2020-02-06 대화의 맥락에 대응하는 발화를 제공하는 전자 장치 및 이의 동작 방법

Publications (1)

Publication Number Publication Date
WO2021158040A1 true WO2021158040A1 (fr) 2021-08-12

Family

ID=77200290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/001490 WO2021158040A1 (fr) 2020-02-06 2021-02-04 Dispositif électronique fournissant un énoncé correspondant au contexte d'une conversation, et procédé d'utilisation associé

Country Status (2)

Country Link
KR (1) KR20210100446A (fr)
WO (1) WO2021158040A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102596711B1 (ko) * 2022-09-21 2023-11-02 주식회사 비트리 홀로그램 메모리얼 장치 및 그 방법

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160144384A (ko) * 2014-04-14 2016-12-16 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 딥 러닝 모델을 이용한 상황 의존 검색 기법
KR20190071527A (ko) * 2017-12-14 2019-06-24 삼성전자주식회사 발화의 의미를 분석하기 위한 전자 장치 및 그의 동작 방법
KR20190122457A (ko) * 2018-04-20 2019-10-30 삼성전자주식회사 음성 인식을 수행하는 전자 장치 및 전자 장치의 동작 방법
WO2020013428A1 (fr) * 2018-07-13 2020-01-16 삼성전자 주식회사 Dispositif électronique pour générer un modèle asr personnalisé et son procédé de fonctionnement
US20200035215A1 (en) * 2019-08-22 2020-01-30 Lg Electronics Inc. Speech synthesis method and apparatus based on emotion information


Also Published As

Publication number Publication date
KR20210100446A (ko) 2021-08-17

Similar Documents

Publication Publication Date Title
WO2020105856A1 (fr) Appareil électronique pour traitement d'énoncé utilisateur et son procédé de commande
WO2022019538A1 (fr) Modèle de langage et dispositif électronique le comprenant
WO2020130447A1 (fr) Procédé de fourniture de phrases basé sur un personnage et dispositif électronique de prise en charge de ce dernier
WO2020167006A1 (fr) Procédé de fourniture de service de reconnaissance vocale et dispositif électronique associé
WO2020085784A1 (fr) Dispositif électronique et système qui fournissent un service sur la base d'une reconnaissance vocale
WO2018097439A1 (fr) Dispositif électronique destiné à la réalisation d'une traduction par le partage d'un contexte d'émission de parole et son procédé de fonctionnement
WO2022010157A1 (fr) Procédé permettant de fournir un écran dans un service de secrétaire virtuel à intelligence artificielle, et dispositif de terminal d'utilisateur et serveur pour le prendre en charge
WO2021158040A1 (fr) Dispositif électronique fournissant un énoncé correspondant au contexte d'une conversation, et procédé d'utilisation associé
WO2019190062A1 (fr) Dispositif électronique destiné au traitement d'une entrée vocale utilisateur
WO2019240434A1 (fr) Dispositif électronique et procédé de commande correspondant
WO2020180000A1 (fr) Procédé d'expansion de langues utilisées dans un modèle de reconnaissance vocale et dispositif électronique comprenant un modèle de reconnaissance vocale
WO2023113502A1 (fr) Dispositif électronique et procédé de recommandation de commande vocale associé
WO2020076086A1 (fr) Système de traitement d'énoncé d'utilisateur et son procédé de fonctionnement
WO2022131566A1 (fr) Dispositif électronique et procédé de fonctionnement de dispositif électronique
WO2022163963A1 (fr) Dispositif électronique et procédé de réalisation d'instruction de raccourci de dispositif électronique
WO2022010279A1 (fr) Dispositif électronique permettant de convertir une écriture manuscrite en texte et son procédé
WO2020171545A1 (fr) Dispositif électronique et système de traitement de saisie d'utilisateur et procédé associé
WO2024029851A1 (fr) Dispositif électronique et procédé de reconnaissance vocale
WO2024080745A1 (fr) Procédé d'analyse de la parole d'un utilisateur sur la base d'une mémoire cache de parole, et dispositif électronique prenant en charge celui-ci
WO2024043670A1 (fr) Procédé d'analyse de la parole d'un utilisateur, et dispositif électronique prenant celui-ci en charge
WO2021086130A1 (fr) Dispositif électronique de traitement d'un énoncé d'utilisateur et son procédé d'opération
WO2023132470A1 (fr) Serveur et dispositif électronique pour traiter un énoncé d'utilisateur et procédé d'action associé
WO2022211590A1 (fr) Dispositif électronique de traitement d'énoncé d'utilisateur et son procédé de commande
WO2023008819A1 (fr) Dispositif électronique et son procédé de fonctionnement
WO2024085461A1 (fr) Dispositif électronique et procédé destiné à fournir un service de traduction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21750197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21750197

Country of ref document: EP

Kind code of ref document: A1