WO2021153830A1 - Procédé de reconnaissance de personnalités à partir d'une conversation et système associé - Google Patents

Procédé de reconnaissance de personnalités à partir d'une conversation et système associé

Info

Publication number
WO2021153830A1
WO2021153830A1 (PCT/KR2020/001499; KR2020001499W)
Authority
WO
WIPO (PCT)
Prior art keywords
personality
emotion
speech
speaker
category
Prior art date
Application number
PCT/KR2020/001499
Other languages
English (en)
Korean (ko)
Inventor
박종철
박한철
송호윤
이희제
Original Assignee
한국과학기술원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국과학기술원 filed Critical 한국과학기술원
Priority to PCT/KR2020/001499 priority Critical patent/WO2021153830A1/fr
Publication of WO2021153830A1 publication Critical patent/WO2021153830A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • The following description relates to a dialogue utterance personality recognition system and method, and more specifically to technology that automatically predicts and recognizes the speaker's personality expressed in an utterance by applying a natural language processing system to the input dialogue text and analyzing the dependency between the speaker's emotion and personality categories.
  • One embodiment proposes a method and system for predicting and recognizing, based on an artificial neural network model, the personality expressed by the speaker who uttered an utterance in a dialogue, by applying a personality model used in psychological theory (Big Five, MBTI, etc.).
  • embodiments suggest a method and system for recognizing the speaker's personality by predicting the speaker's emotions and personality categories in an utterance, and analyzing the dependence between the predicted emotions and personality categories.
  • One embodiment provides a method and system for analyzing the dependence between the parameters used in each process by utilizing the self-attention technique in the process of predicting the speaker's emotion and personality category and in the process of recognizing the personality.
  • One embodiment proposes a method and system for reflecting context information of an utterance in the process of predicting the speaker's emotion and personality category.
  • One embodiment proposes a method and system for training the artificial neural network model that performs the process of predicting the speaker's emotion and personality category, through a multi-task learning method that uses the emotion recognition result as a feature.
  • According to one embodiment, a computer-implemented dialogue utterance personality recognition system includes at least one processor implemented to execute computer-readable instructions, wherein the at least one processor includes: a pre-processing unit for distinguishing, from an input dialogue, a target utterance that is the target of personality recognition and a speaker who uttered the target utterance; an emotion information prediction unit for predicting the emotion of the speaker who uttered the target utterance, based on the target utterance; a speech personality prediction unit for predicting the personality category of the speaker based on the target utterance; and an emotion-personality dependency analysis unit for recognizing the speaker's personality by analyzing the dependency between the predicted emotion and the predicted personality category.
  • the emotion information prediction unit may predict the emotion of the speaker by reflecting context information of the target utterance in the target utterance.
  • The emotion information prediction unit may include: a dialogue embedding coupling layer that converts the target utterance into an embedding vector, combines the embedding vector with a delimiter representing the speaker, and outputs the result; a self-attention layer that outputs a context embedding vector in which context information of the embedding vector is reflected in the embedding vector; and a linear layer that predicts and outputs the speaker's emotion by analyzing the context embedding vector.
  • the self-attention layer may extract context information of the embedding vector by analyzing a dependency relationship between language tokens in the target utterance.
  • the speech personality predicting unit may predict the personality category of the speaker by reflecting context information of the target utterance in the target utterance.
  • the speech personality prediction unit may share at least some components with the emotion information prediction unit.
  • The speech personality prediction unit shares the dialogue embedding coupling layer and the self-attention layer of the emotion information prediction unit, and may include a linear layer that predicts and outputs the personality category of the speaker by analyzing the context embedding vector output from the self-attention layer.
  • The linear layer may predict and output the personality category of the speaker by independently determining whether each personality element constituting a preset personality category is present in the context embedding vector.
  • The emotion-personality dependence analysis unit may include: a self-attention layer that analyzes a dependency relationship between the predicted emotion and the predicted personality category; and a linear-active layer for recognizing and outputting the personality of the speaker as a result of the analysis.
  • the self-attention layer may be characterized by further analyzing a dependency relationship between the predicted personality categories.
  • the dialogue speech personality recognition system may include: a speech unit emotion annotation database used by the emotion information prediction unit to perform emotion prediction training based on the predicted emotion; and a speech-unit personality annotation database used by the speech personality prediction unit to perform personality category prediction training based on the predicted personality category, and the emotion-personality dependence analysis unit to perform personality recognition training based on the recognized personality.
  • a method for recognizing conversational utterance personality performed by a computer may include: distinguishing a target utterance that is a target of personality recognition and a speaker who uttered the target utterance from an inputted dialogue; predicting an emotion of a speaker who has uttered the target utterance based on the target utterance; predicting a personality category of the speaker based on the target utterance; and recognizing the speaker's personality by analyzing the dependence between the predicted emotion and the predicted personality category.
  • the dialogue speech personality recognition method includes: distinguishing a speaker who has uttered the target utterance; predicting an emotion of a speaker who has uttered the target utterance based on the target utterance; predicting a personality category of the speaker based on the target utterance; and recognizing the speaker's personality by analyzing the dependence between the predicted emotion and the predicted personality category.
  • One embodiment may propose a method and system for predicting and recognizing, based on an artificial neural network model, the personality expressed by the speaker who uttered an utterance in a dialogue, by applying a personality model used in psychological theory (Big Five, MBTI, etc.).
  • embodiments may propose a method and system for recognizing the speaker's personality by predicting the speaker's emotion and personality category in the utterance, and analyzing the dependence between the predicted emotion and personality category.
  • One embodiment may provide a method and system for analyzing the dependence between the parameters used in each process by utilizing the self-attention technique in the process of predicting the speaker's emotion and personality category and in the process of recognizing the personality.
  • Embodiments may propose a method and system for reflecting context information of an utterance in the process of predicting the speaker's emotion and personality category.
  • The exemplary embodiments may propose a method and system for training an artificial neural network model that performs the process of predicting a speaker's emotion and personality category through a multi-task learning method that uses the emotion recognition result as a feature.
  • One embodiment may propose technology that helps in understanding the flow of a conversation by accurately predicting and recognizing personality characteristics, which may appear together with the relationship between the speaker's emotion and personality category, from the utterances included in the dialogue.
  • the exemplary embodiments may propose a technique that helps to generate a dialogue text for an appropriate response according to the characteristics of each speaker.
  • FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment.
  • FIG. 2 is a block diagram for explaining the internal configuration of an electronic device and a server according to an embodiment.
  • FIG. 3 is a block diagram for explaining a conversational speech personality recognition system configured by a processor of a server according to an exemplary embodiment.
  • FIG. 4 is a flowchart illustrating a method for recognizing conversational speech personality performed by the dialogue speech personality recognition system shown in FIG. 3 .
  • FIG. 5 is a conceptual diagram for explaining in more detail the operation of the dialogue speech personality recognition system shown in FIG. 3 .
  • The embodiments described below predict the speaker's emotion and personality category in an utterance by applying a personality model used in psychological theory, and recognize the speaker's personality by analyzing, based on an artificial neural network model, the dependence between the predicted emotion and personality category.
  • FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment.
  • the network environment of FIG. 1 shows an example including a plurality of electronic devices 110 , 120 , 130 , 140 , a server 150 , and a network 160 .
  • FIG. 1 is an example for explaining the invention, and the number of electronic devices and the number of servers are not limited to those shown in FIG. 1.
  • The plurality of electronic devices 110, 120, 130, and 140 may be mobile terminals implemented as computer devices. Examples of the plurality of electronic devices 110, 120, 130, 140 include a smartphone, a mobile phone, a tablet PC, a navigation device, a computer, a laptop computer, a digital broadcasting terminal, a PDA (Personal Digital Assistant), a PMP (Portable Multimedia Player), and the like.
  • the first electronic device 110 may communicate with other electronic devices 120 , 130 , 140 and/or the server 150 through the network 160 using a wireless or wired communication method.
  • The communication method is not limited, and may include not only communication methods using a communication network that the network 160 may include (e.g., a mobile communication network, wired Internet, wireless Internet, or a broadcasting network), but also short-range wireless communication between devices.
  • For example, the network 160 may include any one or more of networks such as a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet.
  • In addition, the network 160 may include any one or more of network topologies including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.
  • The server 150 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, 140 through the network 160 to provide commands, code, files, content, services, and the like.
  • the server 150 may provide a file for installing an application to the first electronic device 110 connected through the network 160 .
  • the first electronic device 110 may install an application using a file provided from the server 150 .
  • The first electronic device 110 may access the server 150 under the control of an operating system (OS) or at least one program (e.g., a browser or the installed application) included in the first electronic device 110, and may be provided with services or content offered by the server 150.
  • For example, the server 150 may transmit a code corresponding to a service request message to the first electronic device 110, and the first electronic device 110 may provide content by configuring and displaying a screen according to the code under the control of the application.
  • FIG. 2 is a block diagram for explaining the internal configuration of an electronic device and a server according to an embodiment.
  • In FIG. 2, the internal configuration will be described taking the first electronic device 110 as an example of one electronic device and the server 150 as an example of one server.
  • Other electronic devices 120 , 130 , and 140 may also have the same or similar internal configuration.
  • the first electronic device 110 and the server 150 may include memories 211 and 221 , processors 212 and 222 , communication modules 213 and 223 , and input/output interfaces 214 and 224 .
  • the memories 211 and 221 are computer-readable recording media and may include random access memory (RAM), read only memory (ROM), and permanent mass storage devices such as disk drives.
  • an operating system or at least one program code (eg, a code for an application installed and driven in the first electronic device 110 ) may be stored in the memories 211 and 221 .
  • These software components may be loaded from a computer-readable recording medium separate from the memories 211 and 221 .
  • the separate computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card.
  • the software components may be loaded into the memories 211 and 221 through the communication modules 213 and 223 instead of a computer-readable recording medium.
  • For example, the at least one program may be loaded into the memories 211 and 221 based on a program (e.g., the above-described application) installed from files provided through the network 160 by a file distribution system (e.g., the above-described server 150) that distributes installation files of developers or applications.
  • the processors 212 and 222 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations.
  • the instructions may be provided to the processors 212 and 222 by the memories 211 and 221 or the communication modules 213 and 223 .
  • the processors 212 and 222 may be configured to execute received instructions according to program codes stored in a recording device such as the memories 211 and 221 .
  • The communication modules 213 and 223 may provide a function for the first electronic device 110 and the server 150 to communicate with each other through the network 160, and may provide a function for communicating with another electronic device (e.g., the second electronic device 120) or another server.
  • For example, a request (e.g., a search request) generated by the processor 212 of the first electronic device 110 according to program code stored in a recording device such as the memory 211 may be transmitted to the server 150 through the network 160 under the control of the communication module 213.
  • Conversely, a control signal, command, content, file, or the like provided under the control of the processor 222 of the server 150 may be received by the first electronic device 110 through the communication module 223 and the network 160.
  • For example, a control signal or command of the server 150 received through the communication module 213 may be transferred to the processor 212 or the memory 211, and content or files may be stored in the first electronic device 110.
  • the input/output interface 214 may be a means for interfacing with the input/output device 215 .
  • the input device may include a device such as a keyboard or mouse
  • the output device may include a device such as a display for displaying a communication session of an application.
  • the input/output interface 214 may be a means for an interface with a device in which functions for input and output are integrated into one, such as a touch screen.
  • In processing a command of the computer program loaded in the memory 211, the processor 212 of the first electronic device 110 may display, on the display through the input/output interface 214, a service screen or content configured using data provided by the server 150 or the second electronic device 120.
  • Likewise, the input/output interface 224 may output information configured using data provided by the server 150 when the processor 222 of the server 150 processes a command of the computer program loaded in the memory 221.
  • the first electronic device 110 and the server 150 may include more components than those of FIG. 2 .
  • For example, the first electronic device 110 may be implemented to include at least a portion of the above-described input/output device 215, or may further include other components such as a transceiver, a global positioning system (GPS) module, an image sensor (camera), a voice sensor (microphone), and a database.
  • For example, when the first electronic device 110 is a smartphone, various components generally included in a smartphone, such as an acceleration sensor or a gyro sensor, an image sensor, a voice sensor, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration, may be implemented to be further included in the first electronic device 110.
  • FIG. 3 is a block diagram illustrating a dialogue speech personality recognition system configured by a processor of a server according to an embodiment.
  • FIG. 4 is a flowchart illustrating the dialogue speech personality recognition method performed by the dialogue speech personality recognition system shown in FIG. 3.
  • FIG. 5 is a conceptual diagram for explaining the operation of the dialogue speech personality recognition system shown in FIG. 3 in more detail.
  • the server 150 may include a computer-implemented conversational speech personality recognition system 300 .
  • a conversational speech personality recognition system may be configured as the processor 222 of the server 150 .
  • The server 150 provides a conversational speech personality recognition service to the plurality of electronic devices 110, 120, 130, and 140, which are clients (hereinafter, providing a conversational speech personality recognition service to a specific electronic device means performing the conversational speech personality recognition method for that electronic device), and may provide a conversational speech personality recognition service corresponding to a service request made through a dedicated application installed on the electronic devices 110, 120, 130, 140 or through access to a web/mobile site related to the server 150.
  • More specifically, as shown in FIG. 3, the dialogue speech personality recognition system 300 configured by the processor 222 of the server 150 may include, as components, a preprocessor 310, an emotion information prediction unit 320, a speech personality prediction unit 330, an emotion-personality dependency analysis unit 340, a speech-unit emotion annotation database 350, and a speech-unit personality annotation database 360.
  • The dialogue speech personality recognition system 300 predicts the speaker's emotion and personality category for the target utterance 371 that is the target of personality recognition in the input dialogue 370, as shown in FIG. 3, and recognizes the speaker's personality by analyzing the dependency between the predicted emotion and personality category, so that the target utterance 381 including the recognized speaker's personality characteristic may be output. In addition, the dialogue speech personality recognition system 300 may perform personality recognition on each of the utterances 371 and 372 constituting the dialogue 370, and may output the resulting dialogue 380.
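  • The following is a minimal, illustrative sketch (in Python) of this overall data flow: pre-processing, emotion prediction, personality category prediction, and emotion-personality dependency analysis, applied to every utterance of the input dialogue. The function and field names are assumptions made for illustration only, not the reference implementation of the system 300.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Utterance:
    speaker: str                                    # e.g. "A" or "B"
    text: str                                       # raw utterance text
    emotion: str = ""                               # filled in by the emotion information prediction unit
    personality: Dict[str, bool] = field(default_factory=dict)  # filled in by the dependency analysis

def preprocess(dialogue: List[Utterance], target_index: int):
    """Pre-processing unit 310: isolate the target utterance, its speaker, and the context."""
    target = dialogue[target_index]
    context = dialogue[:target_index]
    return target, target.speaker, context

def predict_emotion(target: Utterance, context: List[Utterance]) -> str:
    """Emotion information prediction unit 320 (placeholder)."""
    return "Happiness"                              # one of Ekman's six categories

def predict_personality(target: Utterance, context: List[Utterance]) -> Dict[str, float]:
    """Speech personality prediction unit 330 (placeholder, Big Five scores)."""
    return {"O": 0.7, "C": 0.2, "E": 0.9, "A": 0.5, "N": 0.1}

def analyze_dependency(emotion: str, categories: Dict[str, float]) -> Dict[str, bool]:
    """Emotion-personality dependency analysis unit 340 (placeholder)."""
    return {trait: score > 0.5 for trait, score in categories.items()}

def recognize(dialogue: List[Utterance]) -> List[Utterance]:
    """Run personality recognition for every utterance (dialogue 370 -> dialogue 380)."""
    for i, utterance in enumerate(dialogue):
        target, _speaker, context = preprocess(dialogue, i)
        utterance.emotion = predict_emotion(target, context)
        utterance.personality = analyze_dependency(utterance.emotion, predict_personality(target, context))
    return dialogue

for utt in recognize([Utterance("A", "I finally got the job!"), Utterance("B", "That is wonderful news.")]):
    print(utt.speaker, utt.emotion, utt.personality)
```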
  • components of the processor 222 may be selectively included in or excluded from the processor 222 .
  • the components of the processor 222 may be separated or combined to express the functions of the processor 222 .
  • at least some of the components of the processor 222 may be implemented in the processor 212 included in the first electronic device 110 .
  • the processor 222 and the components of the processor 222 may control the server 150 to perform the steps S410 to S440 included in the method for recognizing conversational speech personality of FIG. 4 .
  • the processor 222 and the components of the processor 222 may be implemented to execute instructions according to the code of the operating system included in the memory 221 and the code of at least one program.
  • the components of the processor 222 may be expressions of different functions of the processor 222 performed by the processor 222 according to instructions provided by program code stored in the server 150 .
  • For example, the preprocessor 310 may be used as a functional representation of the processor 222 that controls the server 150 according to a command to distinguish, from the input dialogue, the target utterance 371 that is the target of personality recognition and the speaker who uttered the target utterance 371.
  • Hereinafter, the processor 222 and the components of the processor 222 will be described as the dialogue speech personality recognition system 300 and the components of the dialogue speech personality recognition system 300, respectively.
  • Before step S410, the dialogue speech personality recognition system 300 may read a necessary command from the memory 221, into which commands related to control of the server 150 are loaded (not shown as a separate step in the drawing).
  • the read command may include a command for controlling the dialogue speech personality recognition system 300 to execute steps S410 to S440 to be described later.
  • step S410 the preprocessor 310 may distinguish a target utterance 371 , which is the target of personality recognition, and a speaker who uttered the target utterance 371 , from the input dialogue 370 .
  • More specifically, the pre-processing unit 310 may receive the text-based dialogue 370 and distinguish the utterances 371 and 372 included in the dialogue 370 and the speakers who uttered the utterances 371 and 372.
  • To this end, the preprocessor 310 may recognize the input dialogue 370 as a sequence formed of delimiters representing the speakers (e.g., A, B) and the utterances 371 and 372 (e.g., s1, s2), and may convert the dialogue 370 using delimiters (e.g., <CLS>, <SEP>) that indicate the starting utterance of the dialogue 370 and divide the target utterance 371 from the context dialogue representing context information about the target utterance 371 among the utterances 371 and 372, a delimiter (e.g., <p>) indicating the target utterance 371, and delimiters (e.g., <A>, <B>) representing the speakers.
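  • As a concrete illustration of this conversion, the sketch below builds such a delimiter-and-token sequence from a two-speaker dialogue. The delimiters <CLS>, <SEP>, <p>, <A>, and <B> follow the examples given in the text; how they are interleaved, and the whitespace tokenization, are assumptions for illustration.

```python
from typing import List, Tuple

def build_input_sequence(dialogue: List[Tuple[str, str]], target_index: int) -> List[str]:
    """dialogue: list of (speaker, utterance) pairs, e.g. [("A", "s1"), ("B", "s2")].

    Returns a flat token sequence marking the start of the dialogue, each speaker,
    and the target utterance whose speaker's personality is to be recognized."""
    tokens = ["<CLS>"]                      # marks the starting utterance of the dialogue
    for i, (speaker, utterance) in enumerate(dialogue):
        if i == target_index:
            tokens.append("<p>")            # marks the target utterance
        tokens.append(f"<{speaker}>")       # speaker delimiter, e.g. <A>, <B>
        tokens.extend(utterance.split())    # whitespace tokens; BPE or unigram subwords also possible
        tokens.append("<SEP>")              # separates the utterances of the dialogue
    return tokens

print(build_input_sequence([("A", "how was your day"), ("B", "it was great")], target_index=1))
```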
  • the emotion information prediction unit 320 may predict the emotion of the speaker who uttered the target utterance 371 based on the target utterance 371 .
  • the emotion information prediction unit 320 may predict the speaker's emotion by reflecting the context information of the target utterance 371 in the target utterance 371 .
  • More specifically, the emotion information prediction unit 320 may determine, within a preset emotion category (e.g., Ekman's six emotion categories: Anger, Disgust, Fear, Happiness, Sadness, Surprise), the most influential emotion of the speaker appearing in the target utterance 371, and may predict that most influential emotion as the emotion of the speaker who uttered the target utterance 371.
  • the emotion information prediction unit 320 may perform emotion prediction training using the speech unit emotion annotation database 350 .
  • For example, the emotion information prediction unit 320 may compare information on the emotion predicted in step S420 with the emotion information annotated in the speech unit emotion annotation database 350, evaluate whether the emotion predicted in step S420 was predicted correctly, and reflect the evaluation result to perform emotion prediction training for the next emotion prediction process.
  • the emotion information prediction unit 320 may improve emotion prediction accuracy by performing the same training process not only for the current dialogue text 370 but also for a plurality of dialogue texts for training.
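  • The sketch below (PyTorch) illustrates this training cycle: a predicted emotion distribution is compared against the annotation from the speech unit emotion annotation database 350 with a cross-entropy loss, and the error is propagated back into the model. The placeholder model, feature dimension, learning rate, and the random stand-in for the database are illustrative assumptions.

```python
import torch
import torch.nn as nn

EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise"]

# Placeholder emotion predictor: any module mapping an utterance representation to six logits.
model = nn.Linear(128, len(EMOTIONS))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the speech unit emotion annotation database 350:
# (utterance representation, annotated emotion index) pairs.
annotation_db = [(torch.randn(128), torch.tensor(EMOTIONS.index("Happiness")))]

for representation, gold_emotion in annotation_db:
    logits = model(representation)                                   # predicted emotion
    loss = loss_fn(logits.unsqueeze(0), gold_emotion.unsqueeze(0))   # compare with the annotation
    optimizer.zero_grad()
    loss.backward()                                                  # reflect the evaluation result
    optimizer.step()
print(float(loss))
```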
  • To this end, the emotion information prediction unit 320 may include a dialogue embedding coupling layer 321, a self-attention layer 322, and a linear layer 323.
  • the dialog embedding coupling layer 321 may convert the target utterance 371 into an embedding vector, combine the embedding vector with a delimiter indicating the speaker, and output the same.
  • The self-attention layer 322 may output a context embedding vector in which context information is reflected in the embedding vector. Specifically, the self-attention layer 322 may extract the context information of the embedding vector by analyzing the dependency relationships between the language tokens in the target utterance 371, taking into account the language tokens of the surrounding utterances related to the target utterance 371. For this purpose, a Transformer encoder may be used as the self-attention layer 322. As a result, the context embedding vector output by the self-attention layer 322 may represent the meaning of the target utterance 371.
  • The language tokens in the target utterance 371, between which the self-attention layer 322 analyzes dependency relationships, are the units representing the sentence; words divided by whitespace, or subword units produced by Byte-Pair Encoding (BPE) or a Unigram Language Model, may be used.
  • the linear layer 323 may predict and output the emotion of the speaker by analyzing the context embedding vector.
  • More specifically, the linear layer 323 may analyze the context embedding vector to identify, within a preset emotion category (e.g., Ekman's six emotion categories: Anger, Disgust, Fear, Happiness, Sadness, Surprise), the most likely emotion of the speaker appearing in the target utterance 371, and may predict and output that emotion as the emotion of the speaker who uttered the target utterance 371.
  • As the linear layer 323, a feed-forward layer may be used.
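  • Putting the three layers together, a minimal PyTorch sketch of the emotion information prediction unit 320 could look as follows; the vocabulary size, hidden dimension, number of encoder layers, and the use of the <CLS> position for pooling are assumptions for illustration, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class EmotionInformationPredictor(nn.Module):
    """Sketch of unit 320: layers 321 (embedding), 322 (self-attention), 323 (linear)."""

    def __init__(self, vocab_size: int = 30000, dim: int = 256, num_emotions: int = 6):
        super().__init__()
        # (321) dialogue embedding coupling layer: tokens (incl. speaker delimiters) -> embedding vectors
        self.embedding = nn.Embedding(vocab_size, dim)
        # (322) self-attention layer: Transformer encoder reflecting context information
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.self_attention = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # (323) linear layer: feed-forward layer predicting Ekman's six emotion categories
        self.linear = nn.Linear(dim, num_emotions)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, sequence) ids of the target utterance plus context and delimiters
        embedded = self.embedding(token_ids)              # embedding vectors
        contextual = self.self_attention(embedded)        # context embedding vectors
        pooled = contextual[:, 0]                         # read out the <CLS> position (assumption)
        return self.linear(pooled)                        # emotion logits

logits = EmotionInformationPredictor()(torch.randint(0, 30000, (1, 12)))
print(logits.shape)   # torch.Size([1, 6])
```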
  • the speech personality predicting unit 330 may predict the speaker's personality category based on the target speech 371 .
  • the speech personality prediction unit 330 may predict the speaker's personality category by reflecting the context information of the target utterance 371 in the target utterance 371 .
  • More specifically, the speech personality prediction unit 330 may predict and output the speaker's personality category by independently determining, for each personality element and based on the context information of the target utterance 371, whether the personality elements constituting a preset personality category (e.g., the five personality elements (OCEAN: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) of the Big Five, a personality model used in psychological theory) are present in the target utterance 371.
  • To this end, the speech personality prediction unit 330 may be configured with as many artificial neural networks as the number of personality elements constituting the preset personality category. A detailed description thereof will be provided below.
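  • As an illustration of this point, the sketch below (assumed dimensions, PyTorch) uses one small feed-forward head per Big Five personality element, each independently deciding whether that element appears in the context embedding vector of the target utterance.

```python
import torch
import torch.nn as nn

BIG_FIVE = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism"]

class PersonalityCategoryHeads(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # one feed-forward head per personality element, so each has its own parameters
        self.heads = nn.ModuleDict({trait: nn.Linear(dim, 1) for trait in BIG_FIVE})

    def forward(self, context_embedding: torch.Tensor) -> dict:
        # context_embedding: (batch, dim) vector for the target utterance
        return {trait: torch.sigmoid(head(context_embedding)).squeeze(-1)   # presence probability
                for trait, head in self.heads.items()}

scores = PersonalityCategoryHeads()(torch.randn(1, 256))
print({trait: round(float(score), 3) for trait, score in scores.items()})
```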
  • The speech personality prediction unit 330 may perform personality category prediction training using the speech unit personality annotation database 360. For example, the speech personality prediction unit 330 may compare the personality category information predicted in step S430 with the personality category information annotated in the speech unit personality annotation database 360, evaluate whether the personality category predicted in step S430 was predicted correctly, and reflect the evaluation result to perform personality category prediction training for the next personality category prediction process. Also, the speech personality prediction unit 330 may improve personality category prediction accuracy by performing the same training process not only for the current dialogue 370 but also for a plurality of training dialogues.
  • The speech unit personality annotation database 360 may be used not only for the personality category prediction training of the speech personality prediction unit 330 as described above, but also for the personality recognition training of the emotion-personality dependency analysis unit 340, which will be described later.
  • That is, the emotion-personality dependency analysis unit 340 may compare information on the speaker's personality finally predicted and recognized in step S440, which will be described later, with the personality information annotated in the speech unit personality annotation database 360, evaluate whether the predicted and recognized personality of the speaker is correct, and reflect the evaluation result to perform personality recognition training for the next personality prediction and recognition process.
  • the emotion-personality dependency analysis unit 340 may improve personality prediction and recognition accuracy by performing the same training process not only for the current dialogue sentence 370 but also for a plurality of dialogue sentences for training.
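  • A hedged sketch of such joint training is shown below: a loss computed against the speech unit emotion annotation database 350 and a loss computed against the speech unit personality annotation database 360 are summed so that the shared parameters are updated for both tasks. The stand-in encoder, feature sizes, example labels, and the equal loss weighting are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the shared layers (321 + 322) and the two task-specific heads.
shared_encoder = nn.Linear(128, 256)        # placeholder for embedding + self-attention layers
emotion_head = nn.Linear(256, 6)            # linear layer 323: Ekman's six categories
trait_head = nn.Linear(256, 5)              # linear layer(s) 331: Big Five elements
params = list(shared_encoder.parameters()) + list(emotion_head.parameters()) + list(trait_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

# One training example carrying both kinds of annotation (illustrative values).
features = torch.randn(1, 128)                        # utterance features
gold_emotion = torch.tensor([3])                      # annotated emotion index (database 350)
gold_traits = torch.tensor([[1., 0., 1., 0., 0.]])    # annotated personality elements (database 360)

hidden = shared_encoder(features)
emotion_loss = F.cross_entropy(emotion_head(hidden), gold_emotion)
trait_loss = F.binary_cross_entropy_with_logits(trait_head(hidden), gold_traits)
loss = emotion_loss + trait_loss                      # multi-task objective over the shared parameters
optimizer.zero_grad()
loss.backward()
optimizer.step()
```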
  • the speech personality prediction unit 330 may include a dialogue embedding coupling layer 321 , a self-attention layer 322 , and a linear layer 331 .
  • the speech personality prediction unit 330 may be characterized in that it shares at least some components with the emotion information prediction unit 320 (eg, the dialogue embedding coupling layer 321 and the self-attention layer 322).
  • That is, the emotion information prediction unit 320 and the speech personality prediction unit 330 may proceed with multi-task learning in which the initial layers of the artificial neural network (the dialogue embedding coupling layer 321 and the self-attention layer 322) are shared, so that emotion information analysis and personality information analysis can be performed at the same time.
  • the dialogue embedding coupling layer 321 and the self-attention layer 322 used in the operation of the speech personality prediction unit 330 may share parameters used in the operation of the emotion information prediction unit 320 .
  • the dialog embedding coupling layer 321 may convert the target utterance 371 into an embedding vector, combine the embedding vector with a delimiter indicating the speaker, and output the same.
  • The self-attention layer 322 may output a context embedding vector in which context information is reflected in the embedding vector. Specifically, the self-attention layer 322 may extract the context information of the embedding vector by analyzing the dependency relationships between the language tokens in the target utterance 371, taking into account the language tokens of the surrounding utterances related to the target utterance 371. For this purpose, a Transformer encoder may be used as the self-attention layer 322. As a result, the context embedding vector output by the self-attention layer 322 may represent the meaning of the target utterance 371.
  • The language tokens in the target utterance 371, between which the self-attention layer 322 analyzes dependency relationships, are the units representing the sentence; words divided by whitespace, or subword units produced by Byte-Pair Encoding (BPE) or a Unigram Language Model, may be used.
  • the linear layer 331 may predict and output the speaker's personality category by analyzing the context embedding vector.
  • More specifically, the linear layer 331 may analyze the context embedding vector and predict and output the speaker's personality category by independently determining whether each of the personality elements constituting the preset personality category (e.g., the five personality elements (OCEAN) of the Big Five, a personality model used in psychological theory) is present.
  • As the linear layer 331, a feed-forward layer may be used.
  • However, the linear layer 331 is composed of as many artificial neural networks as the number of personality elements constituting the preset personality category, and thus may have independent parameters representing each personality category.
  • In this way, the emotion information prediction unit 320 and the speech personality prediction unit 330 include the shared dialogue embedding coupling layer 321 and self-attention layer 322 together with the independent linear layers 323 and 331, respectively, so that emotion prediction and personality category prediction can be output independently, and training for emotion prediction and for personality category prediction can be performed independently by using a different database for each of the linear layers 323 and 331.
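  • The following PyTorch sketch (assumed sizes) shows how the two prediction units can literally share one set of parameters for the dialogue embedding coupling layer 321 and the self-attention layer 322 while keeping the independent linear layers 323 and 331, so that emotion logits and per-element personality logits are produced from the same context embedding vector.

```python
import torch
import torch.nn as nn

class SharedDialogueEncoder(nn.Module):
    """Layers 321 + 322, shared by both prediction units."""

    def __init__(self, vocab_size: int = 30000, dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.self_attention = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.self_attention(self.embedding(token_ids))[:, 0]   # context embedding at <CLS>

class EmotionAndPersonalityPredictor(nn.Module):
    def __init__(self, dim: int = 256, num_emotions: int = 6, num_traits: int = 5):
        super().__init__()
        self.encoder = SharedDialogueEncoder(dim=dim)                         # one shared parameter set
        self.emotion_linear = nn.Linear(dim, num_emotions)                    # linear layer 323
        self.trait_linears = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_traits)])  # linear layers 331

    def forward(self, token_ids: torch.Tensor):
        context = self.encoder(token_ids)                                     # shared context embedding
        emotion_logits = self.emotion_linear(context)
        trait_logits = torch.cat([head(context) for head in self.trait_linears], dim=-1)
        return emotion_logits, trait_logits

emotion_logits, trait_logits = EmotionAndPersonalityPredictor()(torch.randint(0, 30000, (1, 16)))
print(emotion_logits.shape, trait_logits.shape)   # torch.Size([1, 6]) torch.Size([1, 5])
```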
  • the emotion-personality dependency analysis unit 340 may recognize the speaker's personality by analyzing the dependence between the predicted emotion and the predicted personality category.
  • the emotion-personality dependency analyzer 340 may include a self-attention layer 341 and a linear-active layer 342 .
  • the self-attention layer 341 may analyze a dependency relationship between the predicted emotion and the predicted personality category. In this case, the self-attention layer 341 may analyze not only the dependency relationship between the predicted emotion and the predicted personality category, but also the dependency relationship between the predicted personality categories.
  • the linear-active layer 342 may recognize and output the personality of the speaker as a result of the analysis.
  • As the linear-active layer 342, a combination of a linear layer implemented as a feed-forward layer and an activation layer using a softmax function may be used.
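  • A minimal sketch of the emotion-personality dependency analysis unit 340 is given below, with assumed dimensions and projections: the predicted emotion and the predicted personality categories are treated as a short sequence, the self-attention layer 341 models the dependencies among them (including those between personality categories), and the linear-active layer 342 (a linear layer followed by softmax) outputs the recognized personality.

```python
import torch
import torch.nn as nn

class EmotionPersonalityDependencyAnalyzer(nn.Module):
    def __init__(self, dim: int = 64, num_emotions: int = 6, num_traits: int = 5):
        super().__init__()
        # project the emotion prediction and each personality-category score into a common space
        self.emotion_proj = nn.Linear(num_emotions, dim)
        self.trait_proj = nn.Linear(1, dim)
        # (341) self-attention over the sequence [emotion, trait_1, ..., trait_5]
        self.self_attention = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
        # (342) linear-active layer: feed-forward linear layer followed by a softmax
        self.linear = nn.Linear(dim, 1)

    def forward(self, emotion_logits: torch.Tensor, trait_scores: torch.Tensor) -> torch.Tensor:
        # emotion_logits: (batch, 6); trait_scores: (batch, 5)
        emotion_vec = self.emotion_proj(emotion_logits).unsqueeze(1)        # (batch, 1, dim)
        trait_vecs = self.trait_proj(trait_scores.unsqueeze(-1))            # (batch, 5, dim)
        sequence = torch.cat([emotion_vec, trait_vecs], dim=1)              # (batch, 6, dim)
        attended, _ = self.self_attention(sequence, sequence, sequence)     # dependency analysis
        trait_logits = self.linear(attended[:, 1:]).squeeze(-1)             # drop the emotion slot -> (batch, 5)
        return torch.softmax(trait_logits, dim=-1)                          # recognized personality

out = EmotionPersonalityDependencyAnalyzer()(torch.randn(1, 6), torch.rand(1, 5))
print(out)   # probabilities over the five personality elements
```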
  • the speaker's personality predicted and recognized through step S440 may be expressed as a personality characteristic, and may be output by the emotion-personality dependency analysis unit 340 as a dialogue text 380 including the personality characteristic.
  • the device described above may be implemented as a hardware component, a software component, and/or a combination of the hardware component and the software component.
  • The devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
  • the processing device may execute an operating system (OS) and one or more software applications running on the operating system.
  • the processing device may also access, store, manipulate, process, and generate data in response to execution of the software.
  • It can also be seen that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.
  • The software may comprise a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively.
  • The software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device.
  • the software may be distributed over networked computer systems, and stored or executed in a distributed manner.
  • Software and data may be stored in one or more computer-readable recording media.
  • the method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium.
  • The medium may continuously store a program executable by a computer, or may temporarily store the program for execution or download.
  • In addition, the medium may be any of various recording means or storage means in the form of a single piece of hardware or a combination of several pieces of hardware; it is not limited to a medium directly connected to a certain computer system, and may exist distributed over a network.
  • Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape, optical recording media such as CD-ROM and DVD, magneto-optical media such as floptical disks, and devices configured to store program instructions, including ROM, RAM, and flash memory.
  • examples of other media may include recording media or storage media managed by an app store for distributing applications, sites for supplying or distributing other various software, and servers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for recognizing personalities from a conversation, and to an associated system. According to one embodiment, a computer-implemented system for recognizing personality from a conversation comprises at least one processor for executing computer-readable instructions, the processor(s) comprising: a preprocessing unit that distinguishes, from an input conversation, a target utterance from which the personality is to be recognized and a speaker who uttered the target utterance; an emotion information prediction unit that predicts the emotion of the speaker who uttered the target utterance, based on the target utterance; an utterance personality prediction unit that predicts the speaker's personality category based on the target utterance; and an emotion-personality dependency analysis unit that analyzes the dependency between the predicted emotion and the predicted personality category so as to recognize the speaker's personality.
PCT/KR2020/001499 2020-01-31 2020-01-31 Procédé de reconnaissance de personnalités à partir d'une conversation et système associé WO2021153830A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2020/001499 WO2021153830A1 (fr) 2020-01-31 2020-01-31 Procédé de reconnaissance de personnalités à partir d'une conversation et système associé

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2020/001499 WO2021153830A1 (fr) 2020-01-31 2020-01-31 Procédé de reconnaissance de personnalités à partir d'une conversation et système associé

Publications (1)

Publication Number Publication Date
WO2021153830A1 true WO2021153830A1 (fr) 2021-08-05

Family

ID=77078400

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/001499 WO2021153830A1 (fr) 2020-01-31 2020-01-31 Procédé de reconnaissance de personnalités à partir d'une conversation et système associé

Country Status (1)

Country Link
WO (1) WO2021153830A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8825764B2 (en) * 2012-09-10 2014-09-02 Facebook, Inc. Determining user personality characteristics from social networking system communications and characteristics
US20160300570A1 (en) * 2014-06-19 2016-10-13 Mattersight Corporation Personality-based chatbot and methods
US9812151B1 (en) * 2016-11-18 2017-11-07 IPsoft Incorporated Generating communicative behaviors for anthropomorphic virtual agents based on user's affect
KR102054042B1 (ko) * 2014-04-17 2019-12-09 소프트뱅크 로보틱스 유럽 로봇과의 대화를 핸들링하는 방법 및 시스템

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8825764B2 (en) * 2012-09-10 2014-09-02 Facebook, Inc. Determining user personality characteristics from social networking system communications and characteristics
KR102054042B1 (ko) * 2014-04-17 2019-12-09 소프트뱅크 로보틱스 유럽 로봇과의 대화를 핸들링하는 방법 및 시스템
US20160300570A1 (en) * 2014-06-19 2016-10-13 Mattersight Corporation Personality-based chatbot and methods
US9812151B1 (en) * 2016-11-18 2017-11-07 IPsoft Incorporated Generating communicative behaviors for anthropomorphic virtual agents based on user's affect

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HONGHAO WEI; FUZHENG ZHANG; NICHOLAS JING YUAN; CHUAN CAO; HAO FU; XING XIE; YONG RUI; WEI-YING MA: "Beyond the Words", Web Search and Data Mining, ACM, 2-10 February 2017, pages 305-314, XP058316649, ISBN: 978-1-4503-4675-7, DOI: 10.1145/3018661.3018717 *

Similar Documents

Publication Publication Date Title
WO2018151464A1 (fr) Système de codage et procédé de codage utilisant la reconnaissance vocale
CN112001175B (zh) 流程自动化方法、装置、电子设备及存储介质
WO2018074716A1 (fr) Procédé et système pour recommander une interrogation à l'aide d'un contexte de recherche
WO2011074771A2 (fr) Appareil et procédé permettant l'étude d'une langue étrangère
WO2018208026A1 (fr) Procédé et système de traitement de commande d'utilisateur permettant de régler un volume de sortie d'un son à délivrer, sur la base d'un volume d'entrée d'une entrée vocale reçue
WO2015005679A1 (fr) Procédé, appareil et système de reconnaissance vocale
WO2018174314A1 (fr) Procédé et système de production d'une séquence vidéo d'histoire
WO2021132797A1 (fr) Procédé de classification d'émotions de parole dans une conversation à l'aide d'une incorporation d'émotions mot par mot, basée sur un apprentissage semi-supervisé, et d'un modèle de mémoire à court et long terme
JP7113047B2 (ja) 人工知能基盤の自動応答方法およびシステム
WO2021162362A1 (fr) Procédé d'apprentissage de modèle de reconnaissance vocale et dispositif de reconnaissance vocale entraîné au moyen de ce procédé
WO2021251539A1 (fr) Procédé permettant de mettre en œuvre un message interactif en utilisant un réseau neuronal artificiel et dispositif associé
WO2022045651A1 (fr) Procédé et système pour appliquer une parole synthétique à une image de haut-parleur
CN108924218A (zh) 用于推送信息的方法和装置
WO2022050724A1 (fr) Dispositif, procédé et système de détermination de réponses à des requêtes
JP2021068455A (ja) 写真に基づいてユーザの顔を認識して活用する方法およびコンピュータシステム
WO2024090713A1 (fr) Système de gestion de psychologie d'utilisateur par l'intermédiaire d'un service de robot conversationnel basé sur la psychologie empathique
WO2019156536A1 (fr) Procédé et dispositif informatique pour construire ou mettre à jour un modèle de base de connaissances pour un système d'agent ia interactif en marquant des données identifiables mais non apprenables, parmi des données d'apprentissage, et support d'enregistrement lisible par ordinateur
WO2023163383A1 (fr) Procédé et appareil à base multimodale pour reconnaître une émotion en temps réel
KR20190109651A (ko) 인공지능 기반의 음성 모방 대화 서비스 제공 방법 및 시스템
WO2019066231A1 (fr) Génération d'image représentative
KR102319013B1 (ko) 대화문 발화 성격 인식 방법 및 시스템
WO2021153830A1 (fr) Procédé de reconnaissance de personnalités à partir d'une conversation et système associé
KR20190133579A (ko) 사용자와 대화하며 내면 상태를 이해하고 긴밀한 관계를 맺을 수 있는 감성지능형 개인비서 시스템
WO2020149621A1 (fr) Système et procédé d'évaluation de l'expression orale en anglais
CN109887490A (zh) 用于识别语音的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20916768

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20916768

Country of ref document: EP

Kind code of ref document: A1