RU2762702C2 - System and method for automated assessment of intentions and emotions of users of dialogue system

Info

Publication number: RU2762702C2
Application number: RU2020117653A
Authority: RU
Inventors: Алена Сергеевна Феногенова; Татьяна Олеговна Шаврина
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Priority date: 2020-04-28
Filing date: 2020-04-28
Publication date: 2021-12-22
Also published as: EA202092856A1; RU2020117653A3; RU2020117653A

Abstract

FIELD: computer technology.

SUBSTANCE: the invention relates to the field of computer technology. The result is achieved as follows: the input text data is processed by cleaning, normalizing and tokenizing it; at least one sentence vector is formed from the resulting tokens; sentiment analysis is performed, in which the sentence type is determined from said sentence vectors by a machine learning model, the sentence type being negative, positive, neutral or conversational; dialog acts are extracted, in which the general intention in the incoming sentences is determined by processing said sentence vector with a machine learning model; and the sentence tokens obtained after the sentiment and dialog act analysis are processed to identify at least one of a subject, an object, an action, or a combination thereof in each sentence, and to determine a specific intention and/or cause, as well as the emotions of the object and the subject, based on the processing of said sentence tokens.

EFFECT: invention is aimed at providing real-time automated analysis of user messages to select the most relevant response for an automatic response from the dialogue system.

14 cl, 3 dwg

Description

TECHNICAL FIELD

[0001] The present invention relates to the field of computer technology, in particular to solutions for implementing dialogue systems with automated assessment of the user's intentions and emotions.

BACKGROUND

[0002] Determining the user's intentions and emotions yields information that can significantly improve the quality of message processing in dialogue systems. Such analysis mechanisms are now present in many popular chatbots, for example Gunrock (Chen C.Y. et al. Gunrock: Building A Human-Like Social Bot By Leveraging Large Scale Real User Data. - 2018. https://pdfs.semanticscholar.org/b402/b85ad45e3ac51flda8ee718373082ce24f47.pdf).

[0003] The architecture of that dialogue system includes submodules responsible for classifying and extracting predefined user commands (intent), emotions (sentiment) and the topics of the user's conversation (topic modelling). The system shows high quality metrics, but it operates only on predefined sets of emotions, commands and topics, whereas user behavior is not deterministic and can potentially cover an unlimited number of topics.

[0004] Another example of this kind of system is Xiaoice (Zhou L. et al. The design and implementation of Xiaoice, an empathetic social chatbot // arXiv preprint arXiv:1812.08989. - 2018. https://arxiv.org/abs/1812.08989). This dialogue system also uses submodules responsible for classifying user commands (intent) and emotions (sentiment), extracting the topics of the user's conversation (topic modelling), and extracting a predefined set of intentions, or dialog acts. It also has a dedicated system for storing the user's opinions and related objects.

[0005] The system shows high quality metrics and regularly updates its architecture thanks to a large volume of user feedback. However, it too operates only on predefined sets of emotions, commands and topics, whereas user behavior is not deterministic and can potentially cover an unlimited number of topics.

[0006] At the same time, there are neural network models with an open set of possible inferences (potentially including intentions and emotions), in particular the Event2Mind model (Rashkin H. et al. Event2mind: Commonsense inference on events, intents, and reactions // arXiv preprint arXiv:1805.06939. - 2018. https://arxiv.org/abs/1805.06939). This system solves an inference task and generates possible emotional states of a person (an open list) that correspond to the events described in a natural-language text.

[0007] The system is given a text describing some event (for example, "Person X drinks coffee in the morning") and generates the intentions behind the event or its cause ("Person X wants to perk up/wake up" or "Person X is falling asleep") as well as the psycho-emotional state and reactions of the participants in the event ("Person X feels tired"). Once trained, the system can potentially relate the expressed emotions and states to the subject and the object (if they are present in the message). The authors of this solution collected an English-language dataset containing about 25,000 events with the corresponding reactions and intentions of the event participants. The system is able to generate a relevant response and shows high quality metrics on the test set. However, no such dataset exists for Russian; moreover, Russian morphology is richer, which can potentially cause problems for generation.

SUMMARY OF THE INVENTION

[0008] To solve the existing technical problem, a system architecture and a method of its operation are proposed that can analyze a user's message in real time not only for the factual meaning of the utterance but also for emotions, sub-emotions, causes of the events in the utterance and an open list of intentions, and that can use this information to respond and sustain the conversation with the user more validly and empathetically.

[0009] The technical result is real-time automated analysis of user messages for selecting the most relevant reaction for an automatic reply by the dialogue system.

[0010] The claimed result is achieved by a system for automated assessment of the intentions and emotions of users of a dialogue system, the system comprising:

at least one processor;

at least one memory means;

a text preprocessing module configured to process input data by cleaning, normalizing and tokenizing the text data;

a vectorization module that forms a sentence vector from the tokens passed from the text preprocessing module;

a sentiment analysis module configured to determine the sentence type from the resulting vector using a machine learning model, the sentence type being negative, positive, neutral or conversational;

a dialog act extraction module configured to determine the general intention in incoming sentences by processing said sentence vector with a machine learning model;

an event processing module configured to:

process the sentence tokens to identify at least one of a subject, an object, an action, or a combination thereof in each sentence;

determine a specific intention and/or cause, as well as the emotions of the object and the subject, based on the processing of said sentence tokens.
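The data flow between the modules listed above can be sketched as follows. This is only an illustrative skeleton: the names `SentenceAssessment` and `assess`, and the idea of passing each module in as a plain callable, are assumptions made for the example and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class SentenceAssessment:
    """Result for one sentence (hypothetical container, not from the source)."""
    tokens: List[str]
    sentence_type: str          # negative / positive / neutral / conversational
    dialog_acts: List[str]      # general intention(s)
    event: Dict[str, object]    # subject / object / action, intention, emotions


def assess(
    text: str,
    preprocess: Callable[[str], List[List[str]]],
    vectorize: Callable[[List[str]], Sequence[float]],
    sentiment: Callable[[Sequence[float]], str],
    dialog_acts: Callable[[Sequence[float]], List[str]],
    events: Callable[[List[str]], Dict[str, object]],
) -> List[SentenceAssessment]:
    """Run every sentence of `text` through the five modules in order."""
    results = []
    for tokens in preprocess(text):      # cleaning, normalization, tokenization
        vector = vectorize(tokens)       # one vector per sentence
        results.append(
            SentenceAssessment(
                tokens=tokens,
                sentence_type=sentiment(vector),   # works on the vector
                dialog_acts=dialog_acts(vector),   # works on the same vector
                event=events(tokens),              # works on the tokens
            )
        )
    return results
```

Note that the sentiment and dialog act modules consume the sentence vector, while the event processing module consumes the tokens, as in the list above.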

[0011] In one particular embodiment of the system, tokenization splits the input data into words, numbers and/or punctuation marks.

[0012] In another particular embodiment of the system, sentences are split on spaces and punctuation marks, taking abbreviations and initials into account.

[0013] In another particular embodiment of the system, data cleaning removes tokens that are not letters of the alphabet, punctuation marks and/or digits.

[0014] In another particular embodiment of the system, the text preprocessing module corrects typos in tokens during normalization.

[0015] In another particular embodiment of the system, the text preprocessing module lemmatizes tokens during normalization.

[0016] In another particular embodiment of the system, the text preprocessing module builds, during normalization, a syntactic graph for each sentence to determine the binary relations of government, agreement and adjunction between the words of the sentence.

[0017] The claimed technical result is also achieved by a method for automated assessment of the intentions and emotions of users of a dialogue system, which is performed by a processor and comprises the steps of:

processing input text data, the processing comprising cleaning, normalizing and tokenizing the data;

forming at least one sentence vector from the resulting data tokens;

performing sentiment analysis, in which the sentence type is determined from said sentence vectors by processing them with a machine learning model, the sentence type being negative, positive, neutral or conversational;

extracting dialog acts, in which the general intention in the incoming sentences is determined by processing said sentence vector with a machine learning model;

processing said sentence tokens to identify at least one of a subject, an object, an action, or a combination thereof in each sentence; and

determining a specific intention and/or cause, as well as the emotions of the object and the subject, based on the processing of said sentence tokens.

[0018] In one particular embodiment of the method, tokenization splits the input data into words, numbers and/or punctuation marks.

[0019] In another particular embodiment of the method, sentences are split on spaces and punctuation marks, taking abbreviations and initials into account.

[0020] In another particular embodiment of the method, data cleaning removes tokens that are not letters of the alphabet, punctuation marks and/or digits.

[0021] In another particular embodiment of the method, typos in tokens are corrected during text normalization.

[0022] In another particular embodiment of the method, morphological analysis and lemmatization of tokens are performed during text normalization.

[0023] In another particular embodiment of the method, a syntactic graph is built for each sentence during text normalization to determine the binary relations of government, agreement and adjunction between the words of the sentence.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] FIG. 1 illustrates a general view of the claimed system.

[0025] FIG. 2 illustrates the architecture of the event processing module.

[0026] FIG. 3 illustrates a general view of a computing device.

IMPLEMENTATION OF THE INVENTION

[0027] FIG. 1 shows the architecture of the claimed system (100) for automated assessment of the intentions and emotions of users of a dialogue system, which can be implemented on a computing device such as a personal computer, a server, etc. The system (100) contains a set of modules that process incoming utterances from a user (10) who interacts with the dialogue system (110).

[0028] The text preprocessing module (101) performs the initial processing of user messages; in particular, it carries out the mandatory sequential procedures for cleaning and preparing the text that arrives from the dialogue system during a conversation with the user. The preprocessing module (101) splits the incoming text into sentences and tokenizes them. Tokenization means splitting the text into minimal functional units, tokens, which are words, numbers and punctuation marks.

[0029] Input data is supplied to the module (101) in txt format. The user (10) may interact with the dialogue system (110) by voice commands, for example if the dialogue system (110) is a voice assistant. In that case the user's (10) voice commands are converted to text using solutions known in the art, for example Google Speech-to-Text. Using open Python 3 libraries, the resulting text is split into sentences (with https://pypi.org/project/rusenttokenize/), and the sentences are split into tokens by breaking them on spaces and separating punctuation marks from the words.
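The two splitting steps can be illustrated with a short standard-library sketch. It is not the implementation described here: the regular-expression sentence splitter is a naive stand-in for rusenttokenize (it does not handle abbreviations or initials), and the function names `split_sentences`, `tokenize` and `preprocess` are hypothetical.

```python
import re
from typing import List

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# Assumption: this only stands in for a real sentence splitter.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# A token is either a run of word characters or a single punctuation mark.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at whitespace after end punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace and detach punctuation marks from words."""
    return _TOKEN.findall(sentence)


def preprocess(text: str) -> List[List[str]]:
    """Return one list of tokens per sentence."""
    return [tokenize(sentence) for sentence in split_sentences(text)]
```

For the input "Сократ - человек. Следовательно, Сократ смертен." this returns two token lists, with "-", "," and "." as separate tokens. A pattern this simple also splits decimal numbers and hyphenated words, which a production tokenizer would need to treat more carefully.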

[0030] After the initial text processing, the module (101) produces a list of sentences, each of which is a list of that sentence's tokens.

Example:

"Все люди смертны. Сократ - человек. Следовательно, Сократ смертен." ("All people are mortal. Socrates is a man. Therefore, Socrates is mortal.") → [["все", "люди", "смертны", "."], ["Сократ", "-", "человек", "."], ["Следовательно", ",", "Сократ", "смертен", "."]].

In this system a token is not a value of the string data type but an object of the Token class, which has the following attributes:

asis - the token text stored "as is";

morph - the result of morphological analysis of the token (normal form of the word, part of speech, case, etc.);

spellcheck - the result of spell-checking the token. If the spell-checking module found no error, this attribute is empty; otherwise it contains the most probable correction;

start_index, end_index - the positions in the text of the first and last characters of the token;

coref - a field for coreference (the link between the token and other tokens within the text, if the token is a pronoun). The field is empty by default.

Pseudocode:
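A minimal Python sketch of such a Token class, built only from the attribute list above. The field types, the defaults and the helper `make_tokens` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Token:
    asis: str                       # token text "as is"
    start_index: int                # position of the first character in the text
    end_index: int                  # position of the last character (inclusive)
    morph: List[Dict[str, Any]] = field(default_factory=list)  # morphological analysis
    spellcheck: Optional[str] = None   # most probable correction, empty if no error
    coref: Optional[str] = None        # coreference link, empty by default


def make_tokens(text: str) -> List[Token]:
    """Hypothetical helper: wrap each word or punctuation mark in a Token."""
    return [
        Token(asis=m.group(), start_index=m.start(), end_index=m.end() - 1)
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]
```

The later processing stages (typo correction, morphological analysis) would then fill in `spellcheck` and `morph` on these objects rather than returning new strings.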

[0031] Next, the module (101) cleans the text of special characters. In this procedure, incoming tokens are filtered to remove special characters that are not Cyrillic or Latin letters, digits, or symbols found on a standard 105-key keyboard.
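A minimal sketch of this filtering step, assuming that the symbols of a standard keyboard can be approximated by ASCII punctuation (`string.punctuation`). The function name `clean_tokens` and the choice to drop tokens that become empty are illustrative assumptions.

```python
import re
import string
from typing import List

# Anything that is NOT a Latin letter, a Cyrillic letter, a digit
# or an ASCII punctuation mark is treated as a special character.
_DISALLOWED = re.compile(
    r"[^A-Za-zА-Яа-яЁё0-9" + re.escape(string.punctuation) + r"]"
)


def clean_tokens(tokens: List[str]) -> List[str]:
    """Strip disallowed characters; drop tokens that end up empty."""
    cleaned = []
    for token in tokens:
        kept = _DISALLOWED.sub("", token)
        if kept:
            cleaned.append(kept)
    return cleaned
```

For example, a token consisting only of an emoji is removed entirely, while a word with an embedded zero-width space keeps its letters.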

[0032] After the special characters are removed, the module (101) corrects typos. At this stage, incoming tokens are checked for typos and corrected. Open technologies are used: a model based on a Damerau-Levenshtein distance of 1 (the "Damerau Levenshtein 1 + lm" model from the DeepPavlov repository). The typo correction procedure processes the list of Token objects and, as a result of the check, fills in the spellcheck attribute of those tokens that may contain a typo.
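The idea of searching for corrections within edit distance 1 can be sketched as follows. This is not the DeepPavlov model: the distance below is the optimal string alignment variant of Damerau-Levenshtein distance, and a hypothetical word-frequency dictionary `vocab_freq` stands in for the language model that ranks candidates.

```python
from typing import Dict, Optional


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def suggest(word: str, vocab_freq: Dict[str, int]) -> Optional[str]:
    """Return the most frequent vocabulary word within distance 1,
    or None if the word is known or has no close candidate."""
    if word in vocab_freq:
        return None
    candidates = [w for w in vocab_freq if osa_distance(word, w) <= 1]
    if not candidates:
        return None
    return max(candidates, key=lambda w: vocab_freq[w])
```

The return value mirrors the spellcheck attribute described above: empty (`None`) when no error is detected, otherwise the most probable correction.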

[0033] The module (101) then determines parts of speech (morphological analysis), during which tokens are normalized. Normalization reduces tokens to the nominative singular for nouns, the infinitive for verbs, and so on, and records the morphological characteristics of the token's form (case, number, person, gender, etc.). The analysis uses an open technology, the RnnMorph library based on recurrent neural networks (https://pypi.org/project/rnnmorph/). This neural network architecture provides the best analysis quality for the Russian language. Based on the analysis, the following information is stored for each token:

• Part of speech (POS);

• Morphological characteristics (fulltag);

• Normal form, or lemma (lemma).

Analysis example:

asis:

"токены" ("tokens")

morph:

[{"grammar": {'POS': 'NOUN', 'fulltag': 'Case=Nom|Gender=Masc|Number=Plur', 'wordform': 'токены', 'lemma': 'токен'}}].

[0034] Next, the module (101) performs syntactic analysis. This stage is a higher-level analysis of the incoming text: a syntactic graph is built for each sentence, determining the binary relations of government, agreement and adjunction between words. The open UDPipe library (https://pypi.org/project/ufal.udpipe/) is used for syntactic analysis. The analysis produces lists of the vertices and edges of the syntactic graph for the sentence assembled from the input list of tokens. The result of the processing is a row-by-row matrix of the syntactic graph edges over the original list of tokens, as a String. The parse information can be used additionally and is stored linked to the sentence's token list.
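How lists of vertices and edges might be assembled from a dependency parse can be sketched as below. The sketch assumes the parser output has already been reduced to a 1-based head index and a relation label per token (as in the CoNLL-U format that UDPipe emits); it does not call UDPipe, and the function names and the tab-separated one-edge-per-line string layout are illustrative assumptions.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str]  # (head index, dependent index, relation), 1-based


def build_graph(heads: List[int], relations: List[str]) -> Tuple[List[int], List[Edge]]:
    """Vertices are token positions; each non-root token adds one edge
    from its head to itself. A head of 0 marks the root."""
    if len(heads) != len(relations):
        raise ValueError("heads and relations must have the same length")
    vertices = list(range(1, len(heads) + 1))
    edges = [
        (head, dependent, relation)
        for dependent, (head, relation) in enumerate(zip(heads, relations), start=1)
        if head != 0
    ]
    return vertices, edges


def edges_to_string(tokens: List[str], edges: List[Edge]) -> str:
    """Serialize edges one per line as: head <TAB> relation <TAB> dependent."""
    return "\n".join(
        f"{tokens[head - 1]}\t{relation}\t{tokens[dependent - 1]}"
        for head, dependent, relation in edges
    )


# Hand-written annotation for illustration only, not real parser output.
tokens = ["Сократ", "-", "человек", "."]
vertices, edges = build_graph([3, 3, 0, 3], ["nsubj", "punct", "root", "punct"])
graph_string = edges_to_string(tokens, edges)
```

Keeping the edges as index triples alongside the token list is one way to keep the parse linked to the sentence's tokens, as the paragraph above requires.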

[0035] The vectorization module (102) receives the text processed by the module (101) and converts it into a vector using the ELMo model. The ELMo model forms vectors for Russian-language sentences. The resulting vector, which carries contextual information about the order of the words in the text and their importance, is then passed to the inputs of the modules (103) and (104).
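The source does not say how token-level outputs are combined into a single sentence vector. One common choice, shown here purely as an assumption, is mean pooling over per-token vectors. The sketch uses plain Python lists and takes the token vectors as given; it does not load or call ELMo.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into one fixed-size sentence vector."""
    if not token_vectors:
        raise ValueError("cannot pool an empty sentence")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    count = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / count for i in range(dim)]
```

Whatever pooling is used, the result has the same dimension for every sentence, so the same vector can feed both downstream classifiers.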

[0036] Модель векторизации текста ELMo (https://allennlp.org/elmo) для русского языка предобучена на огромных корпусах данных и представляет собой архитектуру, хранящую многоуровневую информацию о языке - символьную, лексическую и семантическую (информацию о контексте). За счет этого возможно делать высокоуровневые обобщения о свойствах входящих фраз даже при отсутствии большого количества примеров при обучении классификатора.[0036] The ELMo text vectorization model (https://allennlp.org/elmo) for the Russian language is pre-trained on huge data bodies and is an architecture that stores multi-level information about the language - symbolic, lexical and semantic (information about the context). Due to this, it is possible to make high-level generalizations about the properties of incoming phrases even in the absence of a large number of examples when training the classifier.

[0037] Модуль анализа тональности (103) выполняется на основе нейросетевой модели классификатора, обученную определять эмоции сообщения. Определение выполняется по заданному списку классов, распознаваемых моделью модуля (103). Применения модуля (102) позволяет более эмпатично отвечать на сообщения пользователя в ходе его взаимодействия с диалоговой системой, накапливать его мнения о различных вопросах, менять тональность сообщения в ответ, воздействовать на негатив, стараясь перевести на более позитивные аспекты и т.д. В одном из примеров реализации системы (100) может применяться модель анализа тональности из открытого репозитория DeepPavlov (https://github.eom/deepmipt/DeepPavlov/blob/0.2.0/deeppavlov/configs/classifiers/rusentiment_elmo.json).[0037] The sentiment analysis module (103) is executed based on the neural network model of the classifier, trained to determine the emotions of the message. The determination is performed according to a given list of classes recognized by the module model (103). The use of the module (102) allows you to more empathically respond to the user's messages in the course of his interaction with the dialogue system, accumulate his opinions on various issues, change the tone of the message in response, influence the negative, trying to translate into more positive aspects, etc. One of the examples of system implementation (100) can use the sentiment analysis model from the open repository DeepPavlov (https: //github.eom/deepmipt/DeepPavlov/blob/0.2.0/deeppavlov/configs/classifiers/rusentiment_elmo.json).

[0038] The convolutional neural network of module (103) is trained to assign to a sentence vector one class selected from the group: 'positive', 'negative', 'neutral', 'speech', 'skip'.

• Positive, neutral, negative - for clearly expressed positive, neutral and negative emotions

• Speech - for neutral conversational phrases ("Hello", "bye", "thanks")

• Skip - for cases that are unclear to the model (for which classification cannot be performed).
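The single-label choice among these five classes can be illustrated with a minimal sketch. It assumes the classifier emits one raw score per class and that the most probable class is taken; the source does not describe the output layer, and the scores below are made-up numbers, not results of the actual model.

```python
import math

# Class names as listed in paragraph [0038].
SENTIMENT_CLASSES = ["positive", "negative", "neutral", "speech", "skip"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_sentiment(scores):
    """Return the single most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_CLASSES[best], probs[best]

# Hypothetical raw scores for one sentence vector (illustrative only).
label, prob = pick_sentiment([0.2, 2.5, 0.1, -1.0, 0.3])
print(label, round(prob, 3))
```

Because the classes are mutually exclusive here, a softmax followed by an argmax is a natural fit; 'skip' is treated as an ordinary class that the model itself can output.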

[0039] The neural network model in module (103) is trained on the open RuSentiment dataset (Russian-language social media texts). Because the model works with vectors of whole sentences and short texts, its generalization ability lets it interpret the text as a whole instead of being tied to individual words, whose meaning can change considerably in context: warm (positive sentiment) + beer (neutral sentiment) = warm beer (negative sentiment).

[0040] The dialog act extraction module (104) is likewise a neural network classifier, which can assign to a message from 1 to 13 different dialog acts, that is, classes describing the intentions in the user's message: informative question, opinion question, personal opinion, cause, effect, argument, agreement, disagreement, apology, response to an apology, gratitude, response to gratitude, greeting, farewell, etc. Module (104) takes as input a sentence vectorized by module (102). The vectorized sentence is then processed by the classifier to recognize the user's general intention in the incoming sentences.

[0041] The classifier model of module (104) is based on a convolutional neural network trained to assign from 1 to 13 class labels to a sentence vector (the labels are not mutually exclusive, i.e. this is multi-label classification).
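Since the dialog act labels are not mutually exclusive, each label can be scored independently. The following minimal sketch assumes one raw score per label, a sigmoid per label and a 0.5 threshold, with a fallback to the best label so that at least one is always returned. The threshold, the fallback and the English label identifiers are assumptions for illustration, and only a subset of the acts from paragraph [0040] is shown.

```python
import math

# Hypothetical identifiers for a subset of the dialog acts named in [0040].
DIALOG_ACTS = [
    "informative_question",
    "opinion_question",
    "personal_opinion",
    "gratitude",
    "greeting",
]

def sigmoid(x):
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def pick_dialog_acts(scores, threshold=0.5):
    """Return every act whose probability reaches the threshold.

    If no act qualifies, fall back to the single highest-scoring act,
    so the result always contains at least one label.
    """
    probs = [sigmoid(s) for s in scores]
    chosen = [act for act, p in zip(DIALOG_ACTS, probs) if p >= threshold]
    if not chosen:
        best = max(range(len(probs)), key=probs.__getitem__)
        chosen = [DIALOG_ACTS[best]]
    return chosen

# Made-up scores: two labels pass the threshold at once.
acts = pick_dialog_acts([-2.0, 1.2, 0.8, -3.0, -1.5])
print(acts)
```

Unlike the softmax used for mutually exclusive classes, the per-label sigmoid lets several acts be active for the same sentence.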

[0042] The model is trained on examples of user dialogues accumulated during testing of dialogue systems and on open dialogue data from OpenSubtitles, annotated by rules and by trained annotators. After module (104) processes the sentence vector, a list of labels for the sentence is produced.

[0043] The dialog act extraction module (104) makes it possible to process information about the pragmatics of an utterance more accurately (with 92% accuracy) and to choose a correct response to the user's message. For example, the utterance "how much does this phone cost?" calls for a precise answer; the utterance "how much could a phone like this possibly cost?" calls for an evaluative judgment in reply (possibly about the phone's characteristics rather than its price); and the utterance "what is going on, that phones cost this much!" calls for a reaction to negativity. For these examples, the classifier model of module (104) outputs different classes: informative question, opinion question and personal opinion, respectively.

[0044] The event processing module (105), built on the architecture of the event2mind model, extracts an event from the text and generates a relevant description of the causes of the event and of the psycho-emotional state of the event's participants at that moment. FIG. 2 shows an example architecture of this module.

[0045] After processing the tokens of the sentences of the user (10), module (105) is able to determine the specific intention and/or reason, as well as the emotions of the object and subject of the sentence. Events are extracted with a syntactic parser (the UDPipe model for Russian trained on SynTagRus, http://ufal.mff.cuni.cz/udpipe) using templates of the form "subject" (subj) + "action" (verb) + "object" (obj). Events of this type ("I want coffee" → "X (subj) is tired") are extracted from the user's text and fed to the neural network of module (105). Given the text of an event as input, the model aims to generate three pragmatic inferences: the intention/causes of the event, the reaction of Person X, and the reactions of the other participants in the event (Person Y).
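The template matching described above can be sketched on an already parsed sentence. The example below does not call UDPipe; it assumes the parser output has been converted into simple token records with Universal Dependencies relation names (`nsubj`, `obj`), and the `Token` structure and `extract_events` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface form of the word
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head

def extract_events(tokens):
    """Collect (subject, verb, object) triples around each verb.

    The subject or object is None when the verb has no such dependent;
    verbs with neither are skipped.
    """
    events = []
    for verb in tokens:
        if verb.upos != "VERB":
            continue
        subj = obj = None
        for tok in tokens:
            if tok.head != verb.idx:
                continue
            if tok.deprel == "nsubj" and subj is None:
                subj = tok.form
            elif tok.deprel == "obj" and obj is None:
                obj = tok.form
        if subj is not None or obj is not None:
            events.append((subj, verb.form, obj))
    return events

# Hand-written parse of "Я хочу кофе" ("I want coffee").
parse = [
    Token(1, "Я", "PRON", 2, "nsubj"),
    Token(2, "хочу", "VERB", 0, "root"),
    Token(3, "кофе", "NOUN", 2, "obj"),
]
events = extract_events(parse)
print(events)
```

Each extracted triple corresponds to one event text that would then be passed to the neural network of module (105).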

[0046] The neural network of module (105) operates as follows: 1) a description of the event is supplied as text; 2) the event is encoded into a vector representation $h_E$ by an autoencoder; 3) several RNN decoders then take the vector representation of the event as input and generate text entities. For example, the intention/cause sequence of the event, $v_i = v_i^{(0)} v_i^{(1)} \dots$, is computed as follows: $v_i^{(t+1)} = \mathrm{softmax}\left(W_i \, \mathrm{RNN}\left(v_i^{(t)}, h_{i,\mathrm{dec}}^{(t)}\right) + b_i\right)$.
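A single decoding step of the form softmax(W · RNN(v, h) + b) can be illustrated with a small pure-Python sketch. It assumes a plain tanh recurrent cell, one-hot token vectors, and the encoded event vector used as the initial decoder state; none of these details are specified in the source, and all weights are made-up toy values.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_cell(token_vec, hidden, w_in, w_rec):
    """Plain tanh cell: new_hidden = tanh(w_in * token + w_rec * hidden)."""
    a = matvec(w_in, token_vec)
    b = matvec(w_rec, hidden)
    return [math.tanh(x + y) for x, y in zip(a, b)]

def decoder_step(token_vec, hidden, w_in, w_rec, w_out, b_out):
    """One step of softmax(W * RNN(v_t, h_t) + b).

    Returns the distribution over the next token and the new hidden state.
    """
    new_hidden = rnn_cell(token_vec, hidden, w_in, w_rec)
    logits = [s + b for s, b in zip(matvec(w_out, new_hidden), b_out)]
    return softmax(logits), new_hidden

# Toy sizes: vocabulary of 3 tokens (one-hot), hidden state of 2 units.
W_IN = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
W_REC = [[0.1, 0.2], [-0.3, 0.4]]
W_OUT = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
B_OUT = [0.0, 0.1, -0.1]

h_event = [0.2, -0.1]      # stands in for the encoded event vector h_E
start = [1.0, 0.0, 0.0]    # one-hot start token
probs, hidden = decoder_step(start, h_event, W_IN, W_REC, W_OUT, B_OUT)
print(probs)
```

Repeating the step, each time feeding back the chosen token and the new hidden state, would produce the output sequence token by token; one such decoder would be run for each of the three inferences.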

[0047] The event processing module (105) can both take the outputs of modules (103) and (104) as training features and autonomously generate new utterances reflecting the emotional state of the user (10). For module (105) to work, an annotated Russian-language dataset describing events and the pragmatic inferences drawn from them is required. The neural encoder-decoder model has shown that it can successfully build vector representations of events not previously present in the training set and generate probabilistic intentions and reactions of event participants.

[0048] The system (100) can receive as input not only the preprocessed text of the user (10) but also the context, i.e. the preceding messages in the conversation. It is possible either to produce a reply directly through the event processing module (105), which clarifies the person's emotional state (an empathetic message), or to use the information obtained from modules (103) - (105) to select a reply more accurately, to implement various proactivity modules in the dialogue system (110), to accumulate knowledge about events in the user's life, and so on.

[0049] The claimed system (100) can potentially be embedded in modern dialogue system architectures (110) to complement them and improve the quality of their operation; the system can also operate autonomously, and its results from analyzing a large body of user feedback can be useful for analytics, trend tracking, assisting call-center operators, etc.

[0050] Various solutions can serve as the dialogue system (110), for example voice assistants, chat bots, robotic call centers, and other technologies that implement an automated process of communicating with a user.

[0051] FIG. 3 shows a general view of a computing device (200). The device (200) can serve as the basis for a user device for communicating with the dialogue system (110), for the system (100) implementing the claimed solution, and for other devices not shown that may take part in the overall information architecture of the claimed solution.

[0052] In general, the computing device (200) comprises, connected by a common data bus, one or more processors (201), memory such as RAM (202) and ROM (203), input/output interfaces (204), input/output devices (205), and a network interface device (206).

[0053] The processor (201) (or several processors, or a multi-core processor) can be selected from the range of devices in wide use today, for example from Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™ and the like. The processor (201) can also include a graphics processor or work together with a graphics accelerator, for example Nvidia, AMD Radeon, etc., which can be used for the computations involved in running machine learning algorithms.

[0054] RAM (202) is random access memory intended for storing machine-readable instructions executed by the processor (201) to perform the required logical data processing operations. RAM (202) typically contains executable instructions of the operating system and of the corresponding software components (applications, software modules, etc.).

[0055] ROM (203) is one or more persistent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0056] Various types of I/O interfaces (204) are used to connect the components of the device (200) and external peripheral devices. The choice of interfaces depends on the specific design of the computing device; they can be, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/Audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0057] Various I/O devices (205) are used for user interaction with the computing device (200), for example a keyboard, display (monitor), touch display, touchpad, joystick, mouse, light pen, stylus, touch panel, trackball, speakers, microphone, augmented reality devices, optical sensors, tablet, indicator lights, projector, camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

[0058] The network interface device (206) enables the device (200) to transmit data over an internal or external computer network, for example an intranet, the Internet, a LAN, and the like. One or more of the following can be used as the device (206), without limitation: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc.

[0059] In addition, satellite navigation facilities can be used as part of the device (200), for example GPS, GLONASS, BeiDou, Galileo.

[0060] The application materials presented disclose preferred examples of implementing the technical solution and should not be construed as limiting other, particular embodiments that do not go beyond the scope of the legal protection sought and that are obvious to those skilled in the relevant art.

Claims

1. A system for the automated assessment of the intentions and emotions of users of a dialogue system, comprising:

at least one processor;

at least one memory means;

a text preprocessing module capable of processing input data, in which text data is cleaned, normalized and tokenized;

a vectorization module that forms a sentence vector based on tokens transmitted from the text preprocessing module;

a sentiment analysis module, configured to determine the type of sentence based on the resulting vector using a machine learning model, wherein the type of sentence is: negative, positive, neutral or conversational;

a module for extracting dialog acts, configured to determine the general intention in incoming sentences by processing said sentence vector with a machine learning model;

an event processing module, configured to:

process tokens of sentences received from the sentiment analysis module and the module for extracting dialog acts in order to identify at least one of: subject, object, action or their combinations in each sentence; and

determine a specific intention and/or reason, as well as the emotions of the object and the subject, based on the processing of said sentence tokens.

2. The system according to claim 1, characterized in that during tokenization the input data is divided into words, numbers and/or punctuation marks.

3. The system according to claim 2, characterized in that the division of sentences is carried out by spaces and punctuation marks, taking into account abbreviations and initials.

4. The system according to claim 1, characterized in that when cleaning the data, tokens that are not letters of the alphabet, punctuation marks and/or numbers are deleted.

5. The system according to claim 1, characterized in that the text preprocessing module corrects typos in tokens during normalization.

6. The system according to claim 1, characterized in that the text preprocessing module performs lemmatization of tokens during normalization.

7. The system according to claim 1, characterized in that the text preprocessing module during normalization builds a syntactic graph for each sentence to determine the binary relations of government, agreement and adjunction between the words of the sentences.

8. A method for the automated assessment of the intentions and emotions of users of a dialogue system, performed by a processor and comprising the steps of:

processing input text data, in which the data is cleaned, normalized and tokenized;

forming at least one sentence vector based on the obtained data tokens;

performing sentiment analysis, in which the type of sentence is determined by processing said sentence vectors with a machine learning model, wherein the type of sentence is: negative, positive, neutral or conversational;

performing dialog act extraction, in which the general intention in the incoming sentences is determined by processing said sentence vector with a machine learning model;

processing said sentence tokens obtained as a result of the sentiment analysis and the dialog act extraction in order to identify at least one of: subject, object, action or their combinations in each sentence; and

determining a specific intention and/or reason, as well as the emotions of the object and the subject, based on the processing of said sentence tokens.

9. The method according to claim 8, characterized in that during tokenization, the input data is broken into words, numbers and / or punctuation marks.

10. The method according to claim 9, characterized in that the division of sentences is carried out by spaces and punctuation marks, taking into account abbreviations and initials.

11. The method according to claim 8, characterized in that when cleaning the data, tokens that are not letters of the alphabet, punctuation marks and/or numbers are deleted.

12. The method according to claim 8, characterized in that during the text normalization, typos in tokens are corrected.

13. The method according to claim 8, characterized in that during the normalization of the text, lemmatization of tokens is performed.

14. The method according to claim 8, characterized in that during the normalization of the text, a syntactic graph is built for each sentence to determine the binary relations of government, agreement and adjunction between the words of the sentences.