WO2022060050A1 - System, device and method for conducting a conversation - Google Patents

System, device and method for conducting a conversation

Info

Publication number
WO2022060050A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
response
conversation
personality
learning
Prior art date
Application number
PCT/KR2021/012484
Other languages
English (en)
Korean (ko)
Inventor
장윤나
임희석
Original Assignee
고려대학교 산학협력단
Priority date
Filing date
Publication date
Application filed by 고려대학교 산학협력단 filed Critical 고려대학교 산학협력단
Publication of WO2022060050A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3322Query formulation using system suggestions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • The present invention relates to a system, apparatus, and method for conducting a conversation.
  • A dialogue system is hardware or software that enables humans and machines to communicate with each other.
  • Such a conversation system may be divided into a general small-talk conversation system (e.g., a chatbot) and a virtual assistant conversation system.
  • The small-talk dialogue system is a system that converses with humans by exchanging light remarks, as if chatting.
  • However, the small-talk dialogue system is not as natural as human dialogue and lacks specificity.
  • Recent conversation systems are implemented using conversation-related learning models. Since such models are mostly trained on large, generic data sets, a conversation system cannot generate dialogue according to its own characteristics or personality and tends to produce only generic, impersonal utterances.
  • An object of the present invention is to provide a system, apparatus, and method for conducting a conversation that can converse with a human more specifically and intimately based on its own unique characteristics.
  • A conversation performing apparatus according to an embodiment may include an embedding processing unit that obtains combination data by connecting and combining personality data, a dialogue history, and response data, the personality data including information that can represent a specific personality, and a learning processing unit that performs learning using at least one learning model based on the combination data.
  • A dialogue performing system according to an embodiment may include a terminal device that receives personality data, a dialogue history, and response data, the personality data including information capable of representing a specific personality, and a server device that performs learning using at least one learning model based on combination data obtained from the personality data, the dialogue history, and the response data, together with segment data and position data.
  • A conversation performing method according to an embodiment may include obtaining combination data by connecting and combining personality data, a conversation history, and response data, the personality data including information that can represent a specific personality; generating segment embedding data indicating the source of each piece of data in the combination data and position data indicating the position of each piece of data in the combination data; and performing learning using at least one learning model based on the combination data, the segment embedding data, and the position data.
  • FIG. 1 is a block diagram of an embodiment of an apparatus for performing a conversation.
  • FIG. 2 is a diagram for explaining an example of word embedding.
  • FIG. 3 is a diagram for explaining an example of a word token.
  • FIG. 4 is a diagram illustrating an example of data input to a learning processing unit.
  • FIG. 6 is a diagram for explaining an example in which erroneous response data is input.
  • FIG. 7 is a diagram of an embodiment of a system for performing a conversation.
  • FIG. 8 is a flowchart of an embodiment of a method for performing a conversation.
  • A term to which 'unit' is appended, as used below, may be implemented in software or hardware. According to an embodiment, one 'unit' may be implemented as one physical or logical part, a plurality of 'units' may be implemented as one physical or logical part, or one 'unit' may be implemented as a plurality of physical or logical parts.
  • When a certain part is said to be connected to another part, this may mean that the two parts are physically connected or electrically connected.
  • When a part is said to include another element, this does not exclude further elements unless otherwise stated, and it means that other elements may be further included according to the designer's choice.
  • Hereinafter, an embodiment of an apparatus for performing a conversation will be described with reference to FIGS. 1 to 6.
  • FIG. 1 is a block diagram of an embodiment of an apparatus for performing a conversation.
  • The conversation performing apparatus 100 may include an input unit 101, a storage unit 102, an output unit 103, and a processor 110.
  • The input unit 101, the storage unit 102, the output unit 103, and the processor 110 may be electrically connected through circuit lines or cables to enable data transmission.
  • At least one of the input unit 101, the storage unit 102, and the output unit 103 may be omitted if necessary.
  • The input unit 101 may receive the data 10 to 40 and transmit the received data 10 to 40 to at least one of the storage unit 102 and the processor 110 through a circuit or a cable.
  • The input unit 101 may include, for example, a keyboard device, a mouse device, a tablet, a touch screen, a touch pad, a microphone, a track ball, or a track pad; a data input/output terminal capable of receiving data from an external device (e.g., a memory device); and/or a communication module connected to an external device through a wired or wireless communication network (for example, a LAN card, a short-range communication module, or a mobile communication module that is built into the motherboard or can be installed separately).
  • The data 10 to 40 input to the input unit 101 may include the personality data 10 and the conversation history 20 and, according to an embodiment, may further include at least one of the response data 30 and the incorrect response data 40.
  • The personality data 10 may include information that can indicate a specific personality.
  • The personality data 10 may be data on the personality of the person who uses the conversation performing apparatus 100 (hereinafter referred to as a user) and/or data on the personality of someone other than the user, preset by the user or a designer (for example, another human, an animal, or a virtual figure such as a character).
  • Here, the user's personality means things that can represent the user, such as the user's character, tastes, preferences, habits, condition, physique, diseases, experiences, family or close relationships, occupation, status, career, and/or recent behavior. Similarly, the personality of a person other than the user means various things that can represent that person.
  • The personality data 10 may include at least one word or sentence reflecting the user's personality or a preset personality, for example, four to five sentences. These sentences may or may not be part of a dialogue.
  • The conversation history 20 means data on a conversation between at least two parties and may include a plurality of sentences or words exchanged between them.
  • For example, the conversation history 20 may include conversation text(s) previously exchanged between the user and the conversation performing apparatus 100.
  • The conversation history 20 may also include conversation text(s) between the user and a third party.
  • The response data 30 is data including a response to a sentence (a declarative, exclamatory, imperative, or interrogative sentence) mentioned in the conversation; for example, it may be data including the content of a response to at least one sentence of the conversation history 20 (e.g., the last sentence).
  • The response data 30 may include at least one sentence, and the at least one sentence may include one or more blanks.
  • Each blank corresponds to a missing word in the sentence.
  • For example, the response data 30 may be given as [I am fine, what about (blank)].
  • At least one sentence of the response data 30 may consist only of blanks, may partially consist of words or symbols with the remaining parts left blank, or may consist only of words or symbols without any blanks.
  • In other words, the response data 30 may include a sentence in which every word position is blank, a sentence in which only some parts are blank, or a complete sentence without blanks. In the latter case, the response data 30 may be used for training a learning model.
  • The incorrect response data 40 is an inappropriate response (e.g., a choice that is an incorrect answer among a plurality of options given as candidate responses) to at least one sentence (e.g., the last sentence) of the conversation history 20.
  • The incorrect response data 40 may be formed of at least one word or sentence. For example, if the last sentence of the conversation history 20 is [Hello, How are you?], the incorrect response data 40 may contain words or sentences that do not fit the context, such as [Oh, I am sorry to hear that]. An illustrative sketch of these inputs follows.
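  • For illustration only, the four kinds of input data described above might be represented as follows; the sentences and variable names are hypothetical examples, not part of the disclosed embodiment.

```python
# Hypothetical example of the four kinds of input data described above.
personality_data = [            # (10) sentences representing a specific personality
    "I like to play the piano.",
    "I have two dogs.",
    "I work as a nurse.",
    "I enjoy hiking on weekends.",
]
conversation_history = [        # (20) prior turns between two speakers
    "Hello, How are you?",
]
response_data = "I am fine, what about [BLANK]"          # (30) response with a blank (39)
incorrect_response_data = "Oh, I am sorry to hear that"  # (40) response that does not fit the context
```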
  • The storage unit 102 may temporarily or non-temporarily store at least one piece of data (10 to 40, etc.) necessary for the operation of the conversation performing apparatus 100.
  • The storage unit 102 may store at least one of the personality data 10, the conversation history 20, the response data 30, and the incorrect response data 40, and may transfer at least one of the stored data 10 to 40 to the processor 110 in response to a call from the processor 110.
  • The storage unit 102 may also receive a processing result (e.g., a response determined by the processor 110 or a parameter related to a learning model trained by the processor 110) from the processor 110 and store the received processing result.
  • The storage unit 102 may store one or more programs (which may be referred to as apps, applications, or software) for the operation of the conversation performing apparatus 100, and more specifically, for example, a learning model used or trained by the processor 110. Here, a program may be written or updated directly by a designer and stored in the storage unit 102, may be received through a data input/output terminal or the like and then stored in the storage unit 102, and/or may be obtained through an electronic software distribution network accessible over a wired or wireless communication network.
  • The storage unit 102 may include at least one of a main memory device and an auxiliary memory device, and the main memory device or the auxiliary memory device may be implemented using a semiconductor device, a magnetic disk, a compact disk, or the like.
  • The output unit 103 may visually or aurally output, to the outside, data input to the input unit 101 or stored in the storage unit 102, or data obtained as a processing result of the processor 110.
  • For example, the processor 110 may determine a response (which may include a word or sentence) corresponding to a given sentence (e.g., the last utterance in the conversation history 20) and transmit it to the output unit 103.
  • The output unit 103 may receive the response determined by the processor 110, output the determined response on a screen using symbols or characters, and/or output a voice corresponding to the determined response through a speaker device.
  • The output unit 103 may also output data acquired during the processing of the processor 110 (e.g., a word embedding result (122 in FIG. 3)) to the outside.
  • The output unit 103 may include, for example, a display panel, a speaker device, a data input/output terminal, and/or a communication module, but is not limited thereto.
  • The processor 110 receives predetermined data (10 to 40, etc.) from at least one of the input unit 101 and the storage unit 102 and is provided to perform operations such as computation, determination, and/or control processing based on the received data 10 to 40. According to an embodiment, the processor 110 may run a program stored in the storage unit 102 to perform at least one processing operation on these data (10 to 40, etc.).
  • The processor 110 may be implemented using, for example, a central processing unit (CPU), an application processor (AP), a micro controller unit (MCU), an electronic control unit (ECU), and/or another electronic device capable of performing various calculations and generating control signals.
  • The processor 110 may include an embedding processing unit 120 and a learning processing unit 130, and the embedding processing unit 120 may include at least one of a word embedding processing unit 121, a segment embedding processing unit 124, and a position embedding processing unit 127. Also, according to an embodiment, the processor 110 may further include a classification unit 140.
  • The embedding processing unit 120, the learning processing unit 130, and the classification unit 140 may be physically or logically separated. When physically separated, at least two of the embedding processing unit 120, the learning processing unit 130, and the classification unit 140 may be implemented using separate semiconductor chips or the like.
  • The word embedding processing unit 121, the segment embedding processing unit 124, and the position embedding processing unit 127 may also be physically or logically separated according to an embodiment.
  • FIG. 2 is a diagram for explaining an example of word embedding.
  • FIG. 3 is a diagram for explaining an example of a word token.
  • The embedding processing unit 120 converts the data 10 to 40 transmitted by at least one of the input unit 101 and the storage unit 102 to generate data for learning processing, and then transfers it to the learning processing unit 130.
  • The word embedding processing unit 121 may receive the personality data 10, the conversation history 20, and the response data 30 from at least one of the input unit 101 and the storage unit 102, combine the received personality data 10, conversation history 20, and response data 30 to obtain at least one piece of data 122 (hereinafter referred to as combination data), and transmit the obtained combination data 122 to the learning processing unit 130.
  • Specifically, the word embedding processing unit 121 may concatenate the personality data 10, the conversation history 20, and the response data 30 sequentially or in a predefined order to obtain at least one piece of combination data 122.
  • For example, the word embedding processing unit 121 may sequentially concatenate all words or sentences of the personality data 10, then sequentially concatenate all words or sentences of the conversation history 20, and thereafter concatenate the words, sentences, or blank(s) 39 of the response data 30, thereby chaining the personality data 10, the conversation history 20, and the response data 30 into one piece of combination data 122.
  • Before obtaining the combination data 122, the word embedding processing unit 121 may tokenize the words or sentences of each of the data 10 to 30 to acquire tokenized data (t10, t20, t30, hereinafter tokens) corresponding to each of the data 10 to 30.
  • Referring to FIG. 2, an input sentence s10 may be divided into tokens (t10: t11 to t19).
  • Here, some tokens (t11 to t14, t16 to t19) may correspond to individual words in the sentence (e.g., [I] or [like]), and some tokens (t15) may correspond to punctuation marks (e.g., a period).
  • According to an embodiment, an abbreviation containing a period or an apostrophe may be treated as one token.
  • In addition, at least one token t39 corresponding to each of the at least one blank 39 of the response data 30 may be provided.
  • The word embedding processing unit 121 obtains one or more tokenized words or sentences by performing such tokenization on each word or sentence of the personality data 10, the conversation history 20, and the response data 30, and may subsequently obtain the combination data 122 by sequentially concatenating and combining the tokens t10, t20, and t30 obtained from the respective data 10, 20, and 30.
  • For example, the word embedding processing unit 121 may obtain one or more pieces of combination data 122 by sequentially connecting a token set t10 formed by sequentially connecting the token(s) t11 to t19 obtained from the personality data 10, a token set t20 formed by sequentially connecting the token(s) obtained from the conversation history 20, and a token set t30 formed by sequentially connecting the token(s) obtained from the response data 30, as sketched below.
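  • The following is a minimal sketch of this tokenize-and-concatenate step, assuming a GPT-2-style subword tokenizer from the Hugging Face transformers library and reusing the illustrative inputs defined earlier; the function name and the absence of separator tokens are assumptions, not prescribed by the embodiment.

```python
# Sketch: build the combination data (122) by tokenizing each input and
# concatenating the token sequences in order (personality -> history -> response).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")

def build_combination_data(personality, history, response):
    persona_tokens = [t for sent in personality for t in tokenizer.tokenize(sent)]   # token set t10
    history_tokens = [t for sent in history for t in tokenizer.tokenize(sent)]       # token set t20
    response_tokens = tokenizer.tokenize(response)                                   # token set t30 (may include blanks t39)
    combination = persona_tokens + history_tokens + response_tokens                  # combination data 122
    return tokenizer.convert_tokens_to_ids(combination)

input_ids = build_combination_data(personality_data, conversation_history, response_data)
```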
  • The word embedding processing unit 121 may also combine the personality data 10, the conversation history 20, and the incorrect response data 40 transmitted from at least one of the input unit 101 and the storage unit 102 to obtain at least one piece of data (122a of FIG. 6, hereinafter incorrect response combination data). More specifically, the word embedding processing unit 121 may obtain at least one piece of incorrect response combination data 122a by concatenating the personality data 10, the conversation history 20, and the incorrect response data 40 sequentially or according to a predefined order.
  • In this case, the word embedding processing unit 121 may first perform the above-described tokenization on each of the personality data 10, the conversation history 20, and the incorrect response data 40, and then obtain at least one piece of incorrect response combination data 122a by combining the token(s) corresponding to each of the personality data 10, the conversation history 20, and the incorrect response data 40.
  • The at least one piece of incorrect response combination data 122a is used for training the at least one learning model (131 in FIG. 5) used by the learning processing unit 130; more specifically, it enables the learning processing unit 130 to acquire more appropriate processing results (for example, a sentence or dialogue to follow the last sentence of the conversation history 20, or a word for a blank 39 of the response data 30).
  • At least one of the combination data 122 and the incorrect response combination data 122a may be transmitted and input to the learning processing unit 130.
  • FIG. 4 is a diagram illustrating an example of data input to a learning processing unit.
  • The segment embedding processing unit 124 and the position embedding processing unit 127 may generate information 125 and 128 about the combination data 122 generated by the word embedding processing unit 121, and the generated information 125 and 128 may be transmitted and input to the learning processing unit 130.
  • The segment embedding processing unit 124 may generate information on which data each token in the combination data 122 belongs to. Specifically, as shown in FIG. 4, the segment embedding processing unit 124 may generate data 125 (hereinafter, segment embedding data) indicating the source of each piece of data (token).
  • The segment embedding data 125 may include at least one zone corresponding to each of the at least one token in the combination data 122, and in each zone, source data e1, e2, and e3 indicating whether the corresponding token is a token of the personality data 10, a token of the conversation history 20, or a token of the response data 30 may be recorded.
  • The source data e1, e2, and e3 may have values such as letters, symbols, and/or numbers.
  • The position embedding processing unit 127 may generate data 128 (hereinafter, position data) indicating the position of each token in the combination data 122.
  • The position data 128 includes a zone corresponding to each of the at least one token in the combination data 122, and position information (p1 to p22, etc.) of each token may be recorded in each zone.
  • The position information p1 to p22 of each token may be implemented using values such as letters, symbols, and/or numbers.
  • The learning processing unit 130 can determine the position of a specific token by using the position data 128; a minimal sketch of the segment and position data follows.
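  • The sketch below illustrates, under the same assumptions as before, how the segment embedding data (125) and the position data (128) could be derived from the token sets; the particular source-id values (0/1/2) are arbitrary illustrative choices, not values prescribed by the embodiment.

```python
# Sketch: one source id per token (segment embedding data 125) and one position
# id per token (position data 128) for the concatenated combination data (122).
def build_segments_and_positions(persona_tokens, history_tokens, response_tokens):
    segment_ids = ([0] * len(persona_tokens)      # e1: token of the personality data (10)
                   + [1] * len(history_tokens)    # e2: token of the conversation history (20)
                   + [2] * len(response_tokens))  # e3: token of the response data (30)
    position_ids = list(range(len(segment_ids)))  # p1 .. pN, one position per token
    return segment_ids, position_ids
```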
  • At least one of the segment embedding processing unit 124 and the position embedding processing unit 127 may be omitted.
  • The learning processing unit 130 may train a learning model 131 (a learning algorithm) and/or obtain a processing result 150 based on the learning model 131.
  • The processing result 150 may include, for example, a response to the last sentence of the conversation history 20 and, according to an embodiment, may include at least one word, symbol, phrase, or sentence corresponding to each of the at least one blank 39 in the response data 30.
  • The learning processing unit 130 may receive the combination data 122, or may further receive at least one of the segment embedding data 125 and the position data 128, and may train the learning model 131 and/or obtain the processing result 150 using the learning model 131 with the received data 122, 125, and 128.
  • The combination data 122, the segment embedding data 125, and the position data 128 may be input to the learning processing unit 130 sequentially or simultaneously.
  • The learning model 131 used by the learning processing unit 130 may be, for example, a transformer, BERT (implemented based on the encoder of the transformer), and/or GPT-2 or GPT-3 (implemented based on the decoder of the transformer).
  • The learning model 131 may also be a learning model obtained based on at least one of the transformer, BERT, GPT-2, and GPT-3 (e.g., a small model of DialoGPT).
  • DialoGPT is a learning model implemented based on the decoder of the transformer.
  • DialoGPT may generate a word or the like for each of the one or more blanks 39, sequentially or randomly, one or more times in succession. For example, when a plurality of regions t39 corresponding to the blanks 39 exist in the combination data 122, a learning model such as DialoGPT may select words, phrases, or symbols corresponding to the plurality of regions t39.
  • The learning model 131 may also be a new model obtained by fine-tuning the above-described transformer, BERT, GPT-2, GPT-3, or a learning model developed based on them.
  • Alternatively, the learning model 131 may be implemented using one or more predetermined learning algorithms, such as a deep neural network (DNN), a recurrent neural network (RNN), a convolutional neural network (CNN), a long short-term memory (LSTM), and/or a deep reinforcement learning algorithm. A fine-tuning sketch follows.
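  • As a minimal sketch (an assumption, not the disclosed training procedure), a decoder-based model such as DialoGPT-small could be fine-tuned on the combination data with a standard causal language-modeling objective and then used to generate tokens for the blank(s); the model name, optimizer, and learning rate below are illustrative choices.

```python
# Sketch: fine-tune a decoder-based model (e.g., DialoGPT-small) on the
# combination data (122) and generate tokens for the blank(s) (39).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

ids = torch.tensor([input_ids])              # combination data (122) as a batch of size 1
outputs = model(input_ids=ids, labels=ids)   # causal LM (cross-entropy) loss over next tokens
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

# After training, the word(s) for the blank position(s) can be generated autoregressively.
generated = model.generate(ids[:, :-1], max_new_tokens=10)
```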
  • The above-described learning model 131 may be trained in advance.
  • The learning processing unit 130 may also be designed to perform transfer learning based on the previously trained learning model 131.
  • The learning model 131 processed by the learning processing unit 130 may include at least one hidden state (not shown; it may also be referred to as a hidden layer).
  • The learning processing unit 130 may be designed to input the resulting data to a decoder (not shown), thereby obtaining a final result 150 according to the learning process.
  • FIG. 6 is a diagram for explaining an example in which erroneous response data is input.
  • The classification unit 140 is provided to classify and determine whether a response is appropriate (i.e., whether the response is an incorrect response). Referring to FIGS. 5 and 6, the classification unit 140 receives at least one hidden state (e.g., the last hidden state 132) as an input value and, based on this, may determine whether the input data corresponds to the incorrect response data 40.
  • In detail, the classification unit 140 receives the input and determines whether the learning processing was performed based on the combination data 122 or on the incorrect response combination data 122a generated by the word embedding processing unit 121 combining the incorrect response data 40; through this determination, the accuracy of the processing result of the learning processing unit 130 may be improved.
  • The classification unit 140 may be trained in advance, or trained at the same time as it performs the determination, to determine whether the incorrect response data 40 is present.
  • A region 139 for next sentence prediction (hereinafter, a next sentence prediction region) may be further added to the last hidden state 132 of the at least one hidden state, and the classification unit 140 may receive the last hidden state 132 to which the next sentence prediction region 139 has been added and perform its determination based on it. In the next sentence prediction region 139, a value such as a symbol or character indicating an incorrect response may be recorded.
  • The classification unit 140 may obtain a classification loss by calculating a score based on a classification probability for each piece of input combination data 122 and 122a.
  • The classification loss may be used to calculate a total loss together with a cross-entropy loss such as a language-model loss.
  • For example, the total loss can be given as the sum of the classification loss and the language-model loss.
  • In this way, a multi-task loss for the learning model 131 can be obtained, as expressed below.
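  • Written out explicitly (with notation chosen here for readability; the disclosure states the relationship only in words), the multi-task objective is:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{LM}}
```

  • Here, \(\mathcal{L}_{\text{LM}}\) is the cross-entropy language-model loss and \(\mathcal{L}_{\text{cls}}\) is the classification loss obtained by the classification unit 140.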
  • The above-described classification unit 140 may determine whether the response determined by the learning processing unit 130 is appropriate based on such a loss.
  • According to an embodiment, the classification unit 140 may be implemented using a predetermined learning model, for example as sketched below.
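  • The following sketch, continuing the earlier assumed fine-tuning example, shows one way such a classifier could be realized: a small linear head over the last hidden state (132) that scores whether the combined input ends in a correct or an incorrect response, with the classification loss added to the language-model loss to form the multi-task loss. The head architecture and the use of the final token position as the next-sentence-prediction region are assumptions.

```python
# Sketch: classification head over the last hidden state (132) plus multi-task loss.
import torch
import torch.nn as nn

class ResponseClassifier(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.head = nn.Linear(hidden_size, 2)      # two classes: correct vs. incorrect response

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # Score the final position, treated here as the next-sentence-prediction region (139).
        return self.head(last_hidden_state[:, -1, :])

classifier = ResponseClassifier(hidden_size=model.config.hidden_size)

out = model(input_ids=ids, labels=ids, output_hidden_states=True)
cls_logits = classifier(out.hidden_states[-1])                         # classification scores
cls_loss = nn.functional.cross_entropy(cls_logits, torch.tensor([1]))  # label 1 = correct response (example)
total_loss = out.loss + cls_loss                                       # multi-task loss (LM loss + classification loss)
total_loss.backward()
```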
  • The conversation performing apparatus 100 described with reference to FIGS. 1 to 6 may be implemented using at least one information processing apparatus capable of computing and processing data.
  • The at least one information processing apparatus may include a desktop computer, a laptop computer, a smart phone, a tablet PC, a wearable device (a smart watch, a smart band, a head mounted display (HMD), etc.), a navigation device, a personal digital assistant (PDA), a handheld game console, an artificial intelligence speaker device, a digital television, a set-top box, a consumer electronics device, a vehicle, a manned or unmanned aerial vehicle, a robot, a mechanical device, construction equipment, and/or an electronic device specially designed for conducting conversations.
  • In addition to these, the conversation performing apparatus 100 may be implemented using various devices capable of performing learning on received data.
  • FIG. 7 is a diagram of an embodiment of a system for performing a conversation.
  • Referring to FIG. 7, in an embodiment, the conversation performing system 200 may include at least one terminal device 210 and at least one server device 220 capable of transmitting and receiving data to and from the at least one terminal device 210 through a communication network 201.
  • The communication network 201 may include a wired communication network, a wireless communication network, or a combination thereof.
  • The wireless communication network may be implemented using at least one of a short-range communication network (such as Wi-Fi or Bluetooth) and a mobile communication network (such as a network implemented based on a mobile communication standard of the 3GPP, 3GPP2, or WiMAX series).
  • The at least one terminal device 210 may receive at least one of the aforementioned personality data 10, conversation history 20, response data 30, and incorrect response data 40 from a user or manager. According to an embodiment, the at least one terminal device 210 may transmit at least one of the personality data 10, the conversation history 20, the response data 30, and the incorrect response data 40 to the at least one server device 220, either as received or after some processing. Alternatively, the at least one terminal device 210 may generate at least one of the combination data 122, the segment embedding data 125, and the position data 128 based on at least one of the personality data 10, the conversation history 20, the response data 30, and the incorrect response data 40, and may transmit at least one of the generated combination data 122, segment embedding data 125, and position data 128 to the server device 220.
  • According to an embodiment, the at least one terminal device 210 may perform all or part of the operation of the learning processing unit 130 described above.
  • The at least one terminal device 210 may receive a processing result (for example, a word or sentence acquired according to the learning) from the at least one server device 220, and may provide the received processing result to the user through a display, a speaker device, or the like.
  • The at least one terminal device 210 may include, for example, a smart phone, a tablet PC, a wearable device, a desktop computer, a laptop computer, an artificial intelligence speaker device, a navigation device, a black box device, a portable game machine, a digital television, a home appliance, a robot, a vehicle, a construction machine, an electronic billboard, and/or at least one other electronic device capable of communicating with the server device 220.
  • The server device 220 may receive at least one of the personality data 10, the conversation history 20, the response data 30, and the incorrect response data 40, or at least one of the combination data 122, the segment embedding data 125, and the position data 128, from the at least one terminal device 210, and may perform a predetermined operation based on the received data.
  • For example, the server device 220 may perform the overall operation of the processor 110 (e.g., the generation of the combination data 122, the segment embedding data 125, and the position data 128, and the operation of the learning processing unit 130), or may perform only the operation of the learning processing unit 130 of the processor 110.
  • According to an embodiment, the server device 220 may be designed to perform only a part of the operation performed by the learning processing unit 130 (e.g., the operation of the classification unit 140).
  • The processing result of the server device 220 may be transmitted to the at least one terminal device 210 through the communication network 201.
  • In this case, the server device 220 may transmit the processing result to the terminal device 210 that transmitted the data 10 to 40, 122, 125, and 128, and/or may transmit the processing result to another terminal device, according to an embodiment.
  • The server device 220 may be implemented using one or more server computing devices. When implemented using two or more server computing devices, each device may perform the same operation or different operations.
  • In addition to a server computing device, the server device 220 may be implemented using various information processing devices capable of performing arithmetic processing on data, such as a desktop computer, a laptop computer, a smart phone, a tablet PC, a navigation device, and/or a black box device.
  • FIG. 8 is a flowchart of an embodiment of a method for performing a conversation.
  • Referring to FIG. 8, first, personality data and a conversation history may be input sequentially or at the same time (300).
  • According to an embodiment, at least one of response data and incorrect response data may also be input, sequentially or simultaneously, along with these.
  • The personality data is information capable of representing a specific personality and may be formed of one or more words or sentences.
  • The conversation history may be formed of a conversation carried out between at least two parties, which may include the user and the conversation performing apparatus, but is not limited thereto.
  • The conversation history may also consist of at least one word or sentence.
  • The response data may be a response to at least one sentence (e.g., the last sentence) of the conversation history.
  • The response data may include at least one blank; it may consist only of blanks, only a part of the whole sentence may be replaced with blanks, or it may not include any blanks at all.
  • The incorrect response data may include sentences or words that are not appropriate as a response to at least one sentence of the conversation history.
  • Subsequently, tokenization of the personality data and the conversation history may be performed and, if necessary, tokenization of at least one of the response data and the incorrect response data may also be performed (302).
  • Then, combination data may be generated by combining the personality data and the conversation history and, according to an embodiment, by further using one of the response data and the incorrect response data (304).
  • The combination data may be created by sequentially concatenating the tokens obtained from each of the personality data and the conversation history.
  • The combination data may further be obtained by additionally concatenating and combining the token(s) obtained from the response data or the token(s) obtained from the incorrect response data.
  • At least one of the segment embedding data and the position data may be generated together with, or sequentially after, the generation of the combination data.
  • The segment embedding data is data indicating the source of the corresponding data (e.g., a token), and the position data is data indicating the absolute or relative position of the corresponding data (e.g., a token).
  • The above-described combination data may include at least one region; each of the segment embedding data and the position data may include at least one region corresponding to each region of the combination data, and a value corresponding to a source or position may be recorded in each region.
  • Learning processing may then be performed based on the combination data including the response data or the combination data including the incorrect response data (306).
  • According to an embodiment, the segment embedding data and the position data may also be used for the learning processing.
  • The learning model may include a transformer, BERT, GPT-2, GPT-3, or another learning model built on these models (e.g., DialoGPT).
  • The learning model may also be obtained by fine-tuning at least one of the learning models exemplified above.
  • Alternatively, the learning model may include a deep neural network, a convolutional neural network, or the like.
  • The learning model may include at least one hidden state as needed, and at least one of the at least one hidden state (e.g., the last hidden state) may be used to determine whether a response is appropriate.
  • The last hidden state may further include at least one region for predicting the next sentence, and at least one value for making the determination on the last hidden state or for determining whether the response is incorrect may be recorded in the at least one region.
  • Accordingly, the predetermined learning model may be trained and/or a processing result according to the predetermined learning model may be obtained (308).
  • The trained learning model or the processing result may be stored in a storage unit and/or output to the outside visually or aurally, according to an embodiment. An end-to-end sketch of steps 300 to 308 follows.
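  • Tying the illustrative pieces above together (and therefore inheriting all of their assumptions about the tokenizer, model, classifier, and optimizer), a single training iteration over steps 300 to 308 might look like this sketch:

```python
# Sketch: one training iteration covering input (300), tokenization (302),
# combination (304), learning (306), and obtaining the trained model / result (308).
def training_step(personality, history, response, response_is_correct: bool):
    token_ids = build_combination_data(personality, history, response)       # 302, 304
    batch = torch.tensor([token_ids])
    out = model(input_ids=batch, labels=batch, output_hidden_states=True)    # 306: LM loss
    cls_logits = classifier(out.hidden_states[-1])                           # 306: classification score
    cls_target = torch.tensor([1 if response_is_correct else 0])
    loss = out.loss + nn.functional.cross_entropy(cls_logits, cls_target)    # multi-task loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()                                                       # 308

# Example: one step with the correct response and one with the incorrect response.
training_step(personality_data, conversation_history, response_data, True)
training_step(personality_data, conversation_history, incorrect_response_data, False)
```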
  • The above-described conversation performing method may be implemented in the form of a program that can be run by a computer device.
  • The program may include program instructions, data files, data structures, and the like, alone or in combination.
  • The program may be designed and produced using machine code or high-level language code.
  • The program may be specially designed to implement the above-described method, or may be implemented using various functions or definitions that are known and available to those skilled in the art of computer software.
  • The computer device may include a processor or a memory that enables the functions of the program to be realized, and may further include a communication device if necessary.
  • A program for implementing the above-described conversation performing method may be recorded on a computer-readable recording medium.
  • The computer-readable recording medium may include, for example, at least one type of physical device capable of storing a specific program executed in response to a call from a computer, such as a semiconductor storage device (e.g., a solid state drive, ROM, RAM, or flash memory), a magnetic disk storage medium (e.g., a hard disk or floppy disk), an optical recording medium (e.g., a compact disk or DVD), a magneto-optical recording medium, and magnetic tape.
  • The system, apparatus, and method for performing a conversation are not limited to the above-described embodiments.
  • Various devices or methods that a person skilled in the art can implement by modifying the above-described embodiments may also be examples of the above-described conversation performing system, apparatus, and method.
  • Even if the described techniques are performed in an order different from that described, and/or the described components of the system, structure, apparatus, circuit, and the like are combined in a form different from that described or are replaced or substituted by other components or equivalents, the result may still be an embodiment of the above-described system, apparatus, and method for performing a conversation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a system, device, and method for conducting a conversation. The device for conducting a conversation may include: an embedding processing unit for acquiring combination data by connecting and combining personality data, a conversation history, and response data, the personality data including information capable of indicating a specific personality; and a learning processing unit for performing learning using at least one learning model on the basis of the combination data.
PCT/KR2021/012484 2020-09-17 2021-09-14 Système, dispositif et procédé pour mener une conversation WO2022060050A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200120077A KR102491931B1 (ko) 2020-09-17 2020-09-17 대화 수행 시스템, 장치 및 방법
KR10-2020-0120077 2020-09-17

Publications (1)

Publication Number Publication Date
WO2022060050A1 true WO2022060050A1 (fr) 2022-03-24

Family

ID=80777144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/012484 WO2022060050A1 (fr) 2020-09-17 2021-09-14 Système, dispositif et procédé pour mener une conversation

Country Status (2)

Country Link
KR (1) KR102491931B1 (fr)
WO (1) WO2022060050A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100918644B1 (ko) * 2009-06-17 2009-09-25 김정중 대화 시스템 및 대화 문장 생성 방법
US20140250195A1 (en) * 2009-01-08 2014-09-04 Mycybertwin Group Pty Ltd Chatbots
KR20180021444A (ko) * 2016-08-22 2018-03-05 최준영 기계 학습 기반으로 언어를 처리하는 방법 및 장치
KR20190133579A (ko) * 2018-05-23 2019-12-03 한국과학기술원 사용자와 대화하며 내면 상태를 이해하고 긴밀한 관계를 맺을 수 있는 감성지능형 개인비서 시스템
KR20200103151A (ko) * 2019-02-12 2020-09-02 주식회사 자이냅스 대화 서비스 제공을 위한 문장 의미 관계 학습 장치

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3949365B2 (ja) 1999-10-05 2007-07-25 株式会社リコー 電子写真感光体及びそれを用いた電子写真装置
KR20060026636A (ko) 2004-09-21 2006-03-24 이형영 인터넷을 이용한 인성계발 및 일반학습 시스템 및 방법
KR101522837B1 (ko) 2010-12-16 2015-05-26 한국전자통신연구원 대화 방법 및 이를 위한 시스템
KR101851793B1 (ko) 2017-12-22 2018-04-24 주식회사 마인드셋 멀티 도메인 자연어 처리를 위한 도메인 매칭 장치 및 방법


Also Published As

Publication number Publication date
KR20220037297A (ko) 2022-03-24
KR102491931B1 (ko) 2023-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21869672

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21869672

Country of ref document: EP

Kind code of ref document: A1