WO2019231346A1

WO2019231346A1 - Method and system for creating a dialog with a user in a channel convenient for said user

Info

Publication number: WO2019231346A1
Application number: PCT/RU2018/000352
Authority: WO
Inventors: Никита Александрович КУЗНЕЦОВ; Денис Павлович КИРЬЯНОВ; Андрей Сергеевич ЧЕРНОПЯТОВ; Кристина Сергеевна ДОМАНСКАЯ
Original assignee: Публичное Акционерное Общество "Сбербанк России"
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2019-12-05
Also published as: EA201891077A1; RU2688758C1

Abstract

The present technical solution generally relates to automated information systems, and more specifically to technology for intelligently generating dialog messages for creating a dialog with a user. The present method for creating a dialog with a user in a channel convenient for said user includes receiving user input in a graphic user interface, then pre-processing the received text using a syntactic parser, as well as employing lemmatisation, correcting orthographic errors, then classifying the user input for assignment to a particular dialog scenario, and finally generating an answer for the user. The technical result is faster service for a user.

Description

METHOD AND SYSTEM FOR BUILDING A DIALOGUE WITH A USER IN A CHANNEL CONVENIENT FOR A USER

FIELD OF TECHNOLOGY

[001] This technical solution relates, in general, to automated information systems and, more specifically, to technology for the intelligent and proactive generation of dialogue messages to build a dialogue with the user.

BACKGROUND

[002] The technical solutions described in this section may be implemented, but do not necessarily represent methods and systems that have been previously considered or implemented. Therefore, unless otherwise indicated, it should not be assumed that any of the technical solutions described in this section qualifies as prior art only by virtue of their inclusion in this section.

[003] A chatbot, as an automated information system, is a computer-based agent with a graphical interface adapted for humans to access and manage information. Traditionally, a chatbot can interact with users in natural language to simulate intelligent communication and provide personalized assistance. For example, users can ask a chatbot questions such as “Where is the nearest hotel?” or “What is the weather like now?” and receive corresponding answers. Users can also ask a chatbot to perform certain functions, including, for example, creating emails, making phone calls, searching for information, receiving data, forwarding user requests, directing the user, providing notes and reminders, and so on. Chatbots and personal digital assistant systems are widely used, provide considerable help to computer users, and are especially useful for owners of portable electronic devices such as smartphones, tablets, game consoles, and so on.

[004] The term “chatbot” is also known and used as “conversational dialogue system”, “dialogue system”, “communication agent”, “robot interlocutor”, “bot interlocutor”, “chat agent”, “digital personal assistant / agent”, “automated online assistant” and so on. All these terms fall within the scope of the present description of the technical solution.

[005] Essentially, chatbot users can ask a wide variety of questions and request a wide range of information related to world and local news, weather, email content, calendar appointments, scheduled events, and any other searchable content. A chatbot can be useful not only for accessing certain information, but also for generating content, scheduling events, writing emails, navigation, and much more. On the other hand, it is often difficult for users to understand what type of information can be requested through the chat bot at a particular point in time. For example, novice users may have difficulty understanding or knowing the principles of the chatbot or its specific functionality. Users may not understand that several features of the chat bot application can very often be used to solve their daily tasks and needs. Therefore, there is still a need for the development of chat bots and, in particular, there is a need to improve the interface of human interaction with the chat bot.

[006] In financial institutions, the load on call centers is constantly growing, with the number of calls and dialogs with operators amounting to millions per year. Most of the questions are predictable and do not require a unique answer, and can be closed in real time automatically.

ESSENCE OF TECHNICAL SOLUTION

[007] This technical solution aims to eliminate the disadvantages inherent in existing solutions known in the art.

[008] The technical task (or technical problem) addressed by this technical solution is the formation of a dialogue with the user that allows accurate answers to be given to the user's questions.

[009] The technical result manifested in solving the above problem is to increase the speed of user service.

[0010] An additional technical result is the reduction in the computational load on the call center information system by reducing the number of user calls and the number of conversations in the support service.

[0011] The specified technical result is achieved by implementing a method for building a dialogue with a user in a channel convenient for the user, in which: user input data is obtained by means of a processor operably connected to a database; the user input is pre-processed by dividing it into sentences and words and by correcting the user's spelling errors using a typo correction module; each word of the user input is lemmatized; the structure of dependencies between the words of the user input is formed using a syntactic parser; a vector model of the words of the user input is formed; at least part of the user input is classified by means of a user dialogue module in order to form a response; and at least one response to the recognized user input is provided by the processor.

[0012] In some embodiments, when preprocessing the user input, typos in the user input are additionally corrected by a typo correction submodule.

[0013] In some embodiments, when preprocessing user input, tokenization of numerals from user input is additionally performed.

[0014] In some embodiments, the text parser uses recurrent neural networks to parse the user text.

[0015] In some embodiments, the user dialogue module uses a slot filling algorithm to classify user input.

[0016] In some embodiments, when user input is classified by the user dialogue module, linear or non-linear dialogue scenarios are determined.

[0017] In some embodiments, the CBOW and/or Skip-gram model is used when generating the vector model of the user input words.

[0018] In some embodiments, when generating a response for the user to display in the graphical user interface, the class to which the user input relates is determined.

[0019] In some embodiments, when generating a response for the user to display in the graphical user interface, if there is no answer in the dialogue, the dialogue is switched to an operator.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] The features and advantages of the present invention will become apparent from the following detailed description of the invention and the accompanying drawings, in which:

[0021] FIG. 1 shows an example implementation of a system for building a dialogue with a user in a user-friendly channel;

[0022] FIG. 2 shows an example implementation of a text preprocessing module;

[0023] FIG. 3 shows an example of transforming one phrase into another according to the Word Mover's Distance approach;

[0024] FIG. 4 shows an example implementation of a method for building a dialogue with a user in a user-friendly channel;

[0025] FIG. 5 shows an example implementation of the word2vec approach, which makes it possible to evaluate the semantic proximity of words;

[0026] FIG. 6 shows an example implementation of a system for building a dialogue with a user in a user-friendly channel, implemented through a set of computing components.

DETAILED DESCRIPTION OF THE INVENTION

[0027] This technical solution can be implemented on a computer, in the form of an automated information system (AIS) or a machine-readable medium containing instructions for performing the above method.

[0028] The technical solution may be implemented as a distributed computer system.

[0029] In this solution, a system means a computer system, a computer (electronic computing machine), a CNC (computer numerical control) system, a PLC (programmable logic controller), computerized control systems, and any other devices that can perform a given, well-defined sequence of computing operations (actions, instructions).

[0030] An instruction processing device is understood to mean an electronic unit or an integrated circuit (microprocessor) executing machine instructions (programs).

[0031] The instruction processing device reads and executes machine instructions (programs) from one or more data storage devices. Data storage devices may include, but are not limited to, hard disks (HDDs), flash memory, ROM (read-only memory), solid state drives (SSDs), and optical drives.

[0032] A program is a sequence of instructions for execution by a computer control device or an instruction processing device.

[0033] The terms and concepts necessary for the implementation of this technical solution are described below.

[0034] A virtual interlocutor, an interlocutor program, a chatbot is a computer program that imitates a person’s speech behavior when communicating with one or more interlocutors.

[0035] Lemmatization is the process of reducing a word form to a lemma - its normal (vocabulary) form. For example, “cats” after lemmatization is transformed into “cat”.

[0036] A database (DB) is a collection of data organized according to a conceptual structure describing the characteristics of these data and the relationships between them, where such a collection of data supports one or more applications (ISO/IEC 2382:2015, 2121423 “database”).

[0037] As shown in FIG. 1, a system 100 for building a dialogue with a user in a user-friendly data transmission channel may comprise the modules described below, between which data is exchanged.

[0038] First, the text processing module 120 receives a request for a service (or some modification thereof) from a user, who generates it through the graphical user interface 110, for example, in a mobile application on a mobile communication device.

[0039] The service that the user requests may include, for example, obtaining a loan, issuing plastic cards, deposits, leasing, servicing a current account, operations with foreign currency, etc.

[0040] At this step 401, as shown in FIG. 4, user input is obtained, which can be a character string or a set of character strings. The user input can include either a single question or a set of questions. The user input data is stored in the database 150. The string can be, without limitation, for example, the following: “Hello, chat bot! Transfer 100 rubles to Denis Ivanov from my account to his phone.”

[0041] The basic text processing module 120 is responsible for reducing the variety of possible message texts in order to simplify the work of the subsequent modules of the system 100. This module is designed to split incoming user input into sentences and words, and to perform morphological analysis, syntactic analysis and semantic typing of tokens.

[0042] The tokenization phase involves extracting the basic text elements (tokens), delimited on both sides by separator characters such as spaces or punctuation marks. The elements here are words, numbers, dates, abbreviations, acronyms, compound prepositions, etc. Tokenization makes it possible to select discrete units of text, which are the basis for further work at the stages of morphological and syntactic analysis. As a result of tokenization, each element is assigned the corresponding type: word, number, date, address, etc.
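By way of a non-limiting illustration, the tokenization phase described above can be sketched as a regular-expression scanner that assigns a type to each token. The token types and patterns below are illustrative assumptions and are not taken from the described system.

```python
import re

# Hypothetical token types and patterns (assumptions for illustration only).
# Order matters: a date must be tried before a plain number.
TOKEN_PATTERNS = [
    ("date", r"\d{1,2}\.\d{1,2}\.\d{4}"),
    ("number", r"\d+(?:[.,]\d+)?"),
    ("word", r"[^\W\d_]+(?:-[^\W\d_]+)*"),
    ("punct", r"[^\w\s]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Return (token, type) pairs in order of appearance; whitespace is skipped."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


tokens = tokenize("Transfer 100 rubles on 31.05.2018!")
```

Each token carries its type, so later stages (morphological and syntactic analysis) can treat numbers and dates differently from ordinary words.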

[0043] The purpose of text preprocessing is to prepare the request for high-quality classification. To this end, the request goes through many stages of preprocessing (step 402) and enrichment with various information, the main stages being:

[0044] typo correction;

[0045] lemmatization with removal of morphological homonymy;

[0046] enrichment with the results of the syntactic parser, including separate extraction of a set of subject + action + object (SAO) triples;

[0047] conversion of numerals in the text into a digital representation (“one thousand three hundred” is replaced by 1300), extraction of names, times, etc.
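By way of a non-limiting illustration, the conversion of numerals into a digital representation can be sketched as follows. The sketch handles English numeral words only and splits on whitespace; both are simplifying assumptions, since the described system targets Russian text and its actual rules are not disclosed here.

```python
# Illustrative vocabularies (assumption: English numerals up to millions).
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}
NUMERAL_WORDS = set(UNITS) | set(TENS) | set(SCALES) | {"hundred"}


def words_to_number(words):
    """Convert a run of numeral words, e.g. ['one', 'thousand'], to an int."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a numeral word: {word}")
    return total + current


def replace_numerals(text):
    """Replace each maximal run of numeral words in the text with digits."""
    out, run = [], []
    for token in text.split():
        if token.lower() in NUMERAL_WORDS:
            run.append(token.lower())
            continue
        if run:
            out.append(str(words_to_number(run)))
            run = []
        out.append(token)
    if run:
        out.append(str(words_to_number(run)))
    return " ".join(out)
```

For example, `replace_numerals("transfer one thousand three hundred rubles")` yields `"transfer 1300 rubles"`.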

[0048] As shown in FIG. 2, the text preprocessing module 120 includes, without limitation, a syntactic parser 210 of the text, a typo correction submodule 220, a morphological analyzer 230 of the Russian language with support for removing morphological ambiguity, a submodule 240 for replacing synonyms and collocations, and a submodule 250 for replacing numerals, sums and phone numbers.

[0049] Typo correction submodule 220 is used to correct spelling errors of users, which helps to correctly classify the request.

[0050] The principle of operation of this submodule 220 is as follows:

[0051] break the sentence into words;

[0052] for each word:

a. if the word is longer than 2 - return it;

b. if the dictionary contains a typical correction for the error, return that correction;

c. if the word does not consist solely of letters, return it;

d. if the word is known to the frequency dictionary, return it;

e. if the word is not known, return the most likely correction:

i. generate all correction candidates that differ from the word by one change: deleting, adding or replacing a single character, or swapping two adjacent characters;

ii. keep only the candidates known to the frequency dictionary and compute their relative share (p1);

iii. if there is only one candidate, return it;

iv. determine the probability of the specific error (p2) (for example, the error е/и is more probable than the error е/ь);

v. determine the probability of each remaining candidate occurring after the previous word (p3);

vi. calculate the cumulative probability of each candidate, $p = p_1^{\lambda_1} \cdot p_2^{\lambda_2} \cdot p_3^{\lambda_3}$, where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are tunable parameters, and return the best candidate.
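By way of a non-limiting illustration, the steps above can be sketched as follows. Several points are assumptions: the Latin alphabet is used instead of Russian; the dictionaries and the `error_prob` and `bigram_prob` callbacks are hypothetical inputs supplied by the caller; and step (a) is read as passing very short words through unchanged (parameter `min_len`), since a literal reading of "longer than 2" would leave the remaining steps unreachable for most words.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: Latin letters for illustration


def edits1(word):
    """All strings one edit away: delete, swap adjacent, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def correct(word, prev_word, freq, typical, error_prob, bigram_prob,
            lambdas=(1.0, 1.0, 1.0), min_len=3):
    """Return the most likely correction of `word` (hypothetical helper)."""
    if len(word) < min_len:          # step a (assumed reading, see lead-in)
        return word
    if word in typical:              # step b: dictionary of typical corrections
        return typical[word]
    if not word.isalpha():           # step c: not only letters
        return word
    if word in freq:                 # step d: known to the frequency dictionary
        return word
    candidates = [c for c in edits1(word) if c in freq]   # steps e.i and e.ii
    if not candidates:
        return word
    if len(candidates) == 1:         # step e.iii
        return candidates[0]
    total = sum(freq[c] for c in candidates)
    l1, l2, l3 = lambdas

    def score(candidate):
        p1 = freq[candidate] / total            # share among known candidates
        p2 = error_prob(word, candidate)        # step e.iv
        p3 = bigram_prob(prev_word, candidate)  # step e.v
        return p1 ** l1 * p2 ** l2 * p3 ** l3   # step e.vi

    return max(candidates, key=score)
```

With all exponents equal to 1 the three factors are weighted equally; raising one exponent makes the corresponding factor more decisive in the ranking.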

[0053] The morphological analyzer 230 of the Russian language with support for removing morphological ambiguity is a set of algorithms that matches individual words and word forms against a dictionary (a lexicon, to be precise) and determines the grammatical characteristics of the words. Marking up the source text with grammatical information greatly facilitates the subsequent writing of rules for the syntactic parser 210 of the text. The work of the morphological analyzer 230 is completed by establishing the morphological features of the words of the text. This task can be regarded as marking up, or tagging, the text, i.e. assigning tags (morphological features). The set of features assigned in this case depends directly on the language. In some implementations, the morphological analyzer 230 can perform morphological analysis that is either dictionary-based (with a dictionary of stems and endings or a dictionary of word forms) or dictionary-free (with only a dictionary of endings, which can be embedded in the algorithm of the morphological analyzer 230). The dictionary-free method is used only to determine variable morphological information (not always unambiguously), and the dictionary-based method is used in all other cases. Algorithms and a morphological model of the morphological analyzer 230 that generates and identifies word forms, with examples for various natural languages, are known from the information source [3].

[0054] The synonym and collocation substitution submodule 240 is configured to identify synonyms and collocations in the user input and in the existing database 150 based on statistical measures (MI, t-score, log-likelihood), which are most often used to determine the degree of association between the components of phrases in a corpus. The MI measure makes it possible to single out stable phrases, proper names and low-frequency special terms. Words for which the MI score is highest are less frequent and have limited combinability. The t-score measure also takes into account the frequency of co-occurrence of the keyword and the collocate, answering the question of how non-random the strength of association between the collocates is. Words with the highest t-score are frequent and combine with many units. In some embodiments, the synonym and collocation substitution submodule 240 may use any one of these measures or several of them in combination.
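By way of a non-limiting illustration, the MI and t-score measures can be computed from corpus counts as sketched below. The text above does not give the formulas; the definitions used here are the ones commonly applied in corpus linguistics and are an assumption, as are the function and parameter names.

```python
import math


def mi_score(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected co-occurrence.

    f_xy: co-occurrence count of the pair, f_x and f_y: counts of each word,
    n: corpus size in tokens.
    """
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) co-occurrence over sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


# A rare pair that almost always occurs together scores high on MI,
# while a frequent pair scores high on t-score.
rare_pair = (8, 16, 32, 1024)
print(mi_score(*rare_pair), t_score(*rare_pair))
```

The two measures rank collocations differently, which is consistent with using them individually or in combination.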

[0055] The result of the operation of the text parser 210 is the structure of dependencies between the words of the user input in the given text (in its individual sentences). For example, from the phrase “transfer one hundred rubles to your mom”, the system 100 determines that “mom” is an indirect object (the recipient).

[0056] The syntactic parser 210 of the text selects the syntactic structure of the sentence, which is a dependency tree, in the nodes of which are the words of the sentence, and the branches are marked with the names of the syntactic relations.

[0057] The operation of the text parser 210 already constitutes recognition of the user input (step 403), as shown in FIG. 4.

[0058] In some embodiments, a recurrent neural network (RNN) may be used to train the text parser 210 for parsing user input. Recurrent neural networks are a class of machine learning models that use previous states of the network to compute the current one. Each character of the source text, individual words, punctuation marks and even whole phrases can all serve as atomic elements of the input sequence for the neural network. In some implementations, gated recurrent units (GRU) can be used. The update gate determines how much information is kept from the past state and how much is taken from the previous layer. The reset gate works much like a forget gate.

[0059] For training the neural networks, corpora, namely treebanks, are used as the training set. In linguistics, a corpus is a collection of texts selected and processed according to certain rules and used as a basis for language research. Corpora are used for statistical analysis, for testing statistical hypotheses and for confirming linguistic rules in a given language. A treebank is a collection of parsed sentences (i.e. parse graphs) prepared in advance, manually or automatically. Treebanks are divided into phrase-structure treebanks and dependency treebanks. The following treebanks or corpora for the Russian language can be used in this technical solution, without limitation: SynTagRus (1,107 thousand tokens), PUD (19 thousand tokens), GSD (99 thousand tokens), Taiga (20 thousand tokens), Dependency Treebanks, etc.

[0060] Next, the transition-based dependency parsing approach, commonly known in the art, is applied. This approach consists in predicting the sequence of actions (transitions) leading from some initial configuration of a phrase or user request to the final one, as a result of which the desired parse tree is obtained; this makes it possible to achieve sufficiently high accuracy and fairly high speed when processing text.

[0061] Arc-standard system is one of the most popular approaches for implementing a transition-based system. The system is described by a configuration consisting of three parts: c = (s, b, A),

[0062] where: s is the data stack;

[0063] b is the data buffer;

[0064] A is a plurality of dependencies.

[0065] Initially, before processing, the configuration for the input sequence $w_1, \ldots, w_n$ is as follows:

[0066] s = [ROOT] - on the stack, one service symbol;

[0067] $b = [w_1, \ldots, w_n]$ - the entire input sequence is in the buffer;

[0068] $A = \emptyset$ - the set of dependencies is empty.

[0069] The final configuration after processing is as follows:

[0070] s = [ROOT] - on the stack, one service symbol;

[0071] b is empty;

[0072] A - contains the desired parsing tree.

[0073] We denote by $s_i$, i = (1, 2, ...), the i-th element from the top of the stack, and by $b_i$, i = (1, 2, ...), the i-th element of the data buffer.

[0074] The Arc-standard system approach has 4 types of operations:

[0075] SHIFT - removes $b_1$ from the buffer and adds it to the stack;

[0076] LEFT_ARC - adds to A a link from $s_1$ to $s_2$ with a specific link-type label, and removes $s_2$ from the stack;

[0077] RIGHT_ARC - similar to LEFT_ARC, only with $s_1$ and $s_2$ interchanged;

[0078] SWAP - returns the second element of the stack to the buffer.

[0079] Thus, there are in total $|T| = 2N_l + 1$ possible actions, where $N_l$ is the number of types of dependency labels. The purpose of the text parser 210 is to select the most appropriate action for a given configuration.
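By way of a non-limiting illustration, the configuration c = (s, b, A) and the four operations can be sketched as follows. The sequence of actions is supplied by hand here; in the described solution it would be predicted by the trained parser. The function names, the example sentence and the label names are assumptions.

```python
def apply_action(config, action, label=None):
    """Apply one transition to a configuration (stack, buffer, arcs).

    The top of the stack is the last list element; arcs are stored as
    (head, label, dependent) triples.
    """
    stack, buffer, arcs = (list(part) for part in config)
    if action == "SHIFT":
        stack.append(buffer.pop(0))          # move b1 onto the stack
    elif action == "LEFT_ARC":
        s1, s2 = stack[-1], stack[-2]
        arcs.append((s1, label, s2))         # link from s1 to s2
        del stack[-2]                        # remove s2
    elif action == "RIGHT_ARC":
        s1, s2 = stack[-1], stack[-2]
        arcs.append((s2, label, s1))         # link from s2 to s1
        stack.pop()                          # remove s1
    elif action == "SWAP":
        buffer.insert(0, stack.pop(-2))      # return s2 to the buffer
    else:
        raise ValueError(f"unknown action: {action}")
    return stack, buffer, arcs


def parse(tokens, actions):
    """Run a sequence of (action, label) pairs from the initial configuration."""
    config = (["ROOT"], list(tokens), [])
    for action, label in actions:
        config = apply_action(config, action, label)
    return config


stack, buffer, arcs = parse(
    ["I", "send", "money"],
    [("SHIFT", None), ("SHIFT", None), ("LEFT_ARC", "nsubj"),
     ("SHIFT", None), ("RIGHT_ARC", "obj"), ("RIGHT_ARC", "root")],
)
```

After the last action the stack holds only ROOT, the buffer is empty, and `arcs` contains the dependency tree, which matches the final configuration described above.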

[0080] To train an artificial neural network, it is necessary to generate the most suitable sequence of actions based on available data. At each step, the configuration will contain the necessary data, and the action is the answer.

[0081] The result of the operation of the text parser 210 is a parse tree, where for each element of the phrase its parent and the type of dependency are indicated. The following metrics can be used to evaluate the results of a syntactic parser:

[0082] Unlabeled Attachment Score (UAS) - the ratio of the number of elements with a correctly specified parent to the total number of elements;

[0083] Labeled Attachment Score (LAS) is the ratio of the number of elements with the correct parent and type of relationship to the total number of elements.
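By way of a non-limiting illustration, both metrics can be computed by comparing a predicted parse with a reference parse token by token. The representation of a parse as a list of (parent index, dependency type) pairs and the label names are assumptions made for this sketch.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two parses of the same sentence.

    Each parse is a list with one (parent_index, dependency_type) pair
    per token.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("parses must be non-empty and of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / len(gold), las_hits / len(gold)


gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "amod")]
uas, las = attachment_scores(gold, pred)  # three correct parents, two fully correct
```

LAS can never exceed UAS, because a token counted for LAS must also have the correct parent.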

[0084] An example of the operation of the text processing module 120 may be as follows:

[0085] Input: the request “a person wants to take a loan for 100 million rubles tomorrow”

[0086] Output:

[0087] The user is not always able to accurately formulate his request for information that he needs. Moreover, even after receiving this information, its subsequent analytical processing is required to determine its usefulness and suitability for solving the task. The difficulties associated with the solution of this problem lie in the variety of possible forms of expression of the same idea, thought, which is especially characteristic of Russian-language texts.

[0088] To solve this problem, in this technical solution, a vector model of the words of the user input is generated. The main advantage of the vector model is the ability to search and rank documents by similarity, that is, by their proximity in the vector space, determined by the distance between words.

[0089] The coordinates of the vectors are formed so that the cosine or Euclidean distance between the vectors of words that are close in meaning is smaller than the distance between the vectors of words that are distant in meaning. In this case, individual components of a vector may reflect some specific category: for example, the first component may carry information about time (past - present - future), the second about physical size (small - large), the third about cost (expensive - cheap), and so on.
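By way of a non-limiting illustration, the two distances mentioned above can be computed as follows. The three-component vectors are invented toy values, not output of any trained model.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy vectors (assumed values): "loan" and "credit" are close in meaning,
# "weather" is not.
loan = [0.9, 0.1, 0.8]
credit = [0.8, 0.2, 0.9]
weather = [0.1, 0.9, 0.0]

print(cosine_distance(loan, credit), cosine_distance(loan, weather))
```

Under both distances the pair of related words ends up closer together than the unrelated pair.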

[0090] Then, at least a portion of the user input is classified by the user dialogue module 130 in order to generate a response.

[0091] The system 100 comprises a user dialogue module 130 responsible for maintaining a multi-stage user dialogue, reusing the dialogue context, or using data from available external services.

[0092] This module 130 is used so that individual scenarios of dialogue with the user on new topics can be created without a single additional line of program code, or with a small and uncomplicated amount of code, so that this solution:

[0093] allows non-programmers to add new topics;

[0094] is easier to maintain.

[0095] Moreover, the architecture of module 130 allows embedding custom (architecture-compatible) routines that correspond to individual topics and can be of arbitrary complexity.

[0096] In order to classify user requests and, accordingly, to maintain the context of a conversation with the user and to interview a client on a given set of questions, module 130 may use an approach based on the slot (or form) filling algorithm (Slot Filling), widely known in the art.

[0097] When using this approach, two types of scenarios can be defined:

• linear dialogue scenarios, in which the information necessary to respond to a user request is collected from the database 150 and/or external sources of information, and

• non-linear scenarios, in which the information collected depends on the user's previous responses.

[0098] With this approach, each linear scenario is presented in the form of a form consisting of a set of fields, the storage time of the form, and the prepared response. The answer can be a chain of arbitrary actions and is performed after filling all the required fields of the form.

[0099] Each form field is defined as a set of properties:

• default value;

• a clarifying question to fill out this field;

• Mandatory filling (if the field is not filled, but required - a clarifying question will be asked in the dialogue to the user);

• a function for filling (one of the pre-prepared ones, which extracts the value for filling out the field from the result of text preprocessing or using auxiliary external services / databases);

• the ability to use the value of the form field from the context to other forms and the ability to fill the field with the value from the context;

• the lifetime of the field value (after which it will be cleared) and others.
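By way of a non-limiting illustration, a linear scenario can be sketched as a form whose fields carry some of the properties listed above (default value, clarifying question, mandatory flag, filling function). All class, field and function names are hypothetical, and the lifetime of field values is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FormField:
    name: str
    question: str                                # clarifying question
    extractor: Callable[[dict], Optional[str]]   # filling function
    required: bool = True
    default: Optional[str] = None


@dataclass
class Form:
    fields: list
    respond: Callable[[dict], str]               # prepared response


def fill(form, parsed, values):
    """Fill empty fields from the preprocessed request.

    Returns a clarifying question while a required field is empty,
    otherwise the prepared response. `values` is the dialogue context
    and is kept between turns.
    """
    for f in form.fields:
        if values.get(f.name) is None:
            extracted = f.extractor(parsed)
            values[f.name] = extracted if extracted is not None else f.default
    for f in form.fields:
        if f.required and values[f.name] is None:
            return f.question
    return form.respond(values)


transfer = Form(
    fields=[
        FormField("amount", "How much should be transferred?",
                  lambda p: p.get("amount")),
        FormField("recipient", "Who should receive the transfer?",
                  lambda p: p.get("recipient")),
        FormField("currency", "In which currency?",
                  lambda p: p.get("currency"), required=False, default="rubles"),
    ],
    respond=lambda v: f"Transferring {v['amount']} {v['currency']} to {v['recipient']}.",
)
```

Calling `fill` once per user turn with the same `values` dictionary keeps already collected fields, so the dialogue asks only for what is still missing.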

[00100] With this approach, non-linear scenarios are defined as a directed acyclic graph of forms, with the initial form of the scenario indicated. All forms are similar to the forms of linear scenarios and are filled in the same way, but transitions between them are possible. The possibility of a transition from filling out one form to another is specified as restrictions imposed on the field values of the first form.

[00101] Thus, the module 130 for the classification of requests operates according to the following algorithm:

[00102] 1. It receives the text that was processed (step 402) and recognized (step 403) at the previous steps by the text processing module 120;

[00103] 2. It compiles an ordered list of scenarios to go through: first the most recent ones for this user, and then all the rest. Each scenario can be presented as a single form and relates to one topic / request / required service;

[00104] 3. The conditions for entering a particular scenario are specified in the form of rules, embedded arbitrary classifiers, examples used for classification by the platform method (which is also used in the module 140 for answering frequently asked questions), or combinations thereof. The user's most recent scenarios are checked for whether their form fields can be filled from the client request, and if so, control is transferred to them. Otherwise, hierarchical classification can be used to compile a list of the most probable scenarios to check, after which the specific conditions for entering each scenario are checked;

[00105] 4. Control is transferred to the most likely scenario among those that have passed the confidence thresholds required by and specific to each scenario. In this case, the user is returned the response of this scenario, produced by filling out the scenario form. If there is no significant difference between the probabilities of two different scenarios that have passed, the user may be asked a clarifying question as to which one is meant.
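By way of a non-limiting illustration, step 4 of the algorithm can be sketched as follows. The scenario names, the scores, the per-scenario thresholds and the `margin` that defines a "significant difference" are all assumptions; the text above does not specify how they are obtained.

```python
def choose_scenario(scores, thresholds, margin=0.1):
    """Pick a scenario from classifier scores.

    Returns ("run", name) for a clear winner, ("clarify", [a, b]) when the
    two best passing scenarios are too close, and ("fallback", None) when
    no scenario passes its own confidence threshold.
    """
    passed = sorted(
        ((score, name) for name, score in scores.items()
         if score >= thresholds.get(name, 1.0)),
        reverse=True,
    )
    if not passed:
        return ("fallback", None)
    if len(passed) > 1 and passed[0][0] - passed[1][0] < margin:
        return ("clarify", [passed[0][1], passed[1][1]])
    return ("run", passed[0][1])


decision = choose_scenario(
    {"transfer": 0.9, "loan": 0.4},
    {"transfer": 0.6, "loan": 0.5},
)
```

In this sketch the "fallback" outcome stands for handing the request over to another component, for example a module for answering frequently asked questions.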

[00106] In cases where the dialogue module 130 does not include any scenario corresponding to the user's request, the module 140 for answering frequently asked questions is used. This module 140 is used when neither a dialogue nor client-dependent answers are required, but only answers to frequently asked questions. Adding new topics requires minimal effort: only a list of the questions themselves, the answers to them, and possible paraphrases of the questions (although the search tolerates variation, a large number of examples still improves quality).

[00107] An example of operation:

Input: “Hello”

Output: “Hi-Hi. :)”

Input: “what I have overdue”

Output: “To clarify information about the presence of arrears, you need to call the phone number 8-800-333-31-38”

[00108] The necessary data for the operation of this module 140 may be as follows:

[00109] the reference name of the question for navigation,

[00110] a response for the user,

[00111] data channels in which this question is available,

[00112] positive and negative examples of queries (positive ones are queries that should lead to this answer, negative ones are similar to positive queries that shouldn't lead to this answer).

[00113] The classification of the request by user topics from module 130 and module 140 may be combined in some implementations. After determining the topics, control is transferred to the appropriate module.

[00114] In some embodiments, to search for the desired scenario, the normalized text of the user input is converted into a vector of words (a vector model of words is formed) using the TF-IDF statistical measure.

[00115] TF-IDF is a statistical measure used to assess the importance of a word in the context of a document that is part of a collection of documents or corpus. The weight of a word is proportional to the number of times the word is used in the document and inversely proportional to the frequency of use of the word in other documents of the collection.

[00116] The TF-IDF measure is often used in text analysis and information retrieval tasks, for example, as one of the criteria for document relevance to a search query, when calculating the proximity measure of documents during clustering.

[00117] TF (term frequency) is the ratio of the number of occurrences of a word to the total number of words in a document. The significance of a word within a single document can thus be determined by the following characteristic:

$$\mathrm{tf}(t, d) = \frac{n_t}{\sum_k n_k},$$

[00118] where $n_t$ is the number of occurrences of the word $t$ in document $d$, and $\sum_k n_k$ is the total number of words in the given user query and/or document.

[00119] IDF (inverse document frequency) is a value inversely proportional to the frequency with which a certain word occurs in collection documents.

[00120] Taking IDF into account reduces the weight of commonly used words. For each unique word within a particular collection of documents, there is only one IDF value. The IDF characteristic is defined by the following relation:

$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{\, d_i \in D : t \in d_i \,\}|},$$

[00121] where $|D|$ is the number of documents in the corpus, and $|\{\, d_i \in D : t \in d_i \,\}|$ is the number of documents in which $t$ occurs.

[00122] Thus, the TF-IDF measure is the product of two factors: $\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$.

[00123] Words with a high frequency within a particular document and with a low frequency of use in other documents receive high weight in the TF-IDF measure.

[00124] The TF-IDF measure is often used to represent collection documents as numerical vectors reflecting the importance of using each word from a certain set of words (the number of words in a set determines the dimension of the vector) in each document. Such a model is called a vector model and makes it possible to compare texts by comparing the vectors representing them in any metric (Euclidean distance, cosine measure, Manhattan distance, Chebyshev distance, etc.), that is, by performing cluster analysis.
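By way of a non-limiting illustration, documents can be turned into TF-IDF vectors as sketched below. The use of the natural logarithm, the toy documents and the function names are assumptions made for this sketch.

```python
import math


def tf(term, doc):
    """Share of the document's words that are `term`."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Log of (number of documents / number of documents containing term).

    Assumes the term occurs in at least one document of the collection.
    """
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)


def tfidf_vector(doc, docs, vocabulary):
    """One weight per vocabulary word, in vocabulary order."""
    return [tf(t, doc) * idf(t, docs) for t in vocabulary]


docs = [
    ["transfer", "money", "to", "card"],
    ["block", "card"],
    ["transfer", "money", "abroad"],
]
vocabulary = sorted({t for d in docs for t in d})
vectors = [tfidf_vector(d, docs, vocabulary) for d in docs]
```

In the second document, "block" receives a higher weight than "card", because "card" also occurs in another document of the collection. The resulting vectors can then be compared with any of the metrics listed above.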

[00125] In this technical solution, for generating output data (step 404, as shown in Fig. 4), a dialog can be used

Word2vec - software tool for analyzing semantics of natural languages, which is a technology that is based on distributive semantics and vector representation of words.

[00126] Word2vec finds relationships between the semantics of words under the assumption that words occurring in similar contexts tend to mean similar things, i.e., to be semantically close, as shown in FIG. 5. For example, word2vec allows “arithmetic” on words: “king” - “man” + “woman” = “queen”. More formally, the task is to maximize the cosine similarity (computed through the scalar product of the vectors) between the vectors of words that appear next to each other, and to minimize the cosine similarity between the vectors of words that do not appear next to each other. “Next to each other” in this case means in close contexts.

[00127] For example, the words “analysis” and “research” are often found in similar contexts. The phrases “Scientists conducted an analysis of the algorithms” or “Scientists conducted a study of the algorithms” are quite similar. Word2vec analyzes these contexts and concludes that the words “analysis” and “research” are close in meaning.

[00128] For example, for the word “coffee,” word2vec may produce the following 15 nearest neighbors in the format “word - cosine similarity”:

[00129] coffee 0.734483;

[00130] tea 0.690234;

[00131] tea 0.688656;

[00132] cappuccino 0.666638;

[00133] coffee shop 0.636362;

[00134] cocoa 0.619801;

[00135] espresso 0.599390;

[00136] coffee 0.595211;

[00137] chicory 0.594247;

[00138] coffee 0.593993;

[00139] copuccino 0.587324;

[00140] chocolate 0.585655;

[00141] cappuccino 0.580286;

[00142] cardamom 0.566781;

[00143] latte 0.563224.

[00144] To formulate a response to the user (step 404), in some embodiments of the invention the vectors of all the words included in the user input are taken, their average (centroid) is determined, and then the cosine distance between the words of the user input and the centroid is determined.

[00145] In some embodiments, other vector models are used instead of word2vec, including but not limited to GloVe and FastText.

[00146] In this technical solution, the CBOW (Continuous Bag-of-Words) and Skip-gram models, without limitation, can be used for the vector representation of words.

[00147] Then, using the multi-index of the question vectors from the FAQ (a database of questions frequently asked by users), which is located in the database 150, the several most similar questions are ranked by Euclidean distance or cosine similarity. A multi-index is a generalization of the concept of an integer index to a vector index, which has found application in various areas of mathematics related to functions of many variables. Cosine similarity is a measure of similarity between two vectors of a pre-Hilbert space, equal to the cosine of the angle between them.
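The ranking step can be sketched as follows. This is an illustration only: it scans all FAQ vectors by brute force rather than using a multi-index, and the function names, the 2-dimensional vectors, and the question texts are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_faq(query_vector, faq_vectors, top_k=3):
    """Return the `top_k` FAQ questions closest to the query, nearest first."""
    scored = [
        (question, euclidean_distance(query_vector, vector))
        for question, vector in faq_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical question vectors standing in for the contents of database 150.
faq_vectors = {
    "how to block a card": [0.9, 0.1],
    "how to get a loan": [0.1, 0.9],
    "how to open a deposit": [0.5, 0.5],
}

ranked = rank_faq([1.0, 0.0], faq_vectors, top_k=2)
```

Swapping `euclidean_distance` for a cosine-based score would give the other ranking criterion named in the paragraph.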

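[00148] If two feature vectors A and B are given (in this case, two word vectors), then the cosine similarity, $\cos(\theta)$, can be represented using their scalar product and norms:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$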

[00149] In some embodiments, to determine the most similar questions that are known in advance, together with their answers, the Minkowski distance (for example, the L1 distance), the Euclidean distance (L2), or the Jaccard measure can be used as a metric.
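Two of the alternative metrics named here can be sketched briefly. The function names and sample data are hypothetical, and the L1 distance is shown as the order-1 special case of the Minkowski distance.

```python
def minkowski_distance(a, b, p=1):
    """Minkowski distance of order `p`; p=1 gives L1, p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def jaccard_measure(tokens_a, tokens_b):
    """Size of the intersection of two token sets divided by the size of their union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty inputs are treated as dissimilar
    return len(set_a & set_b) / len(union)


# Hypothetical lemmatized inputs.
l1 = minkowski_distance([1, 2], [3, 5], p=1)
overlap = jaccard_measure(["how", "block", "card"], ["how", "unblock", "card"])
```

Here the two token lists share two of four distinct words, so the Jaccard measure is 0.5.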

[00150] For example, consider the phrase “Jessica loves oranges and tangerines,” which can be represented as a word vector A (A: [1, 1, 0, 0, 0, 1, 1, 1]), and the phrase “Jesse likes citrus fruits,” which can be represented as a vector B (B: [0, 0, 1, 1, 1, 0, 0, 0]). Because the phrases have no words in common, the cosine similarity is 0. However, it should be borne in mind that the two phrases are semantically similar, since “citrus” is the parent of the words “oranges” and “tangerines” in the parse tree.
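The example can be checked numerically with a short sketch that computes the cosine of the angle between the two vectors as their scalar product divided by the product of their norms. The function name is an assumption; the vectors are the ones given in the example.

```python
import math


def cosine_similarity(a, b):
    """Scalar product of `a` and `b` divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Bag-of-words vectors A and B from the example.
A = [1, 1, 0, 0, 0, 1, 1, 1]
B = [0, 0, 1, 1, 1, 0, 0, 0]

similarity = cosine_similarity(A, B)
```

Since no position is non-zero in both vectors, the scalar product and hence the similarity are zero, which is why a purely lexical comparison misses the semantic closeness of the two phrases.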

[00151] Next, the result is formatted. Depending on the distance from the question to the best option and on the general cut-off threshold, the user is either given a pre-prepared answer (step 405, as shown in Fig. 4), or given a (question, answer) pair (if the chance that the answer is irrelevant is significant), or switched to the operator.
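The three outcomes can be sketched as a simple threshold check. The two threshold values, the function name, and the shape of the returned dictionary are all hypothetical; the description states only that the choice depends on the distance to the best option and a cut-off threshold.

```python
def format_result(best_question, best_answer, distance,
                  confident_threshold=0.3, cutoff_threshold=0.6):
    """Choose the response type from the distance to the best FAQ match.

    Both thresholds are illustrative assumptions, not values from the description.
    """
    if distance <= confident_threshold:
        # Close match: return the prepared answer directly.
        return {"type": "answer", "text": best_answer}
    if distance <= cutoff_threshold:
        # Uncertain match: show the matched question together with its answer.
        return {"type": "question_answer",
                "question": best_question, "text": best_answer}
    # Beyond the cut-off threshold: hand the dialogue over to an operator.
    return {"type": "operator"}


close = format_result("how to block a card", "Use the mobile app.", 0.1)
unsure = format_result("how to block a card", "Use the mobile app.", 0.5)
far = format_result("how to block a card", "Use the mobile app.", 0.9)
```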

[00152] Potentially, the system 100 may decide to display a list of these questions at the appropriate place for the user to choose from, if the graphical user interface allows it or a separate command requires it. To support this mode, it is enough to pass an additional parameter to the module 140, which instructs it to return a list of the best options for formatting instead of the text itself.

[00153] In other embodiments of the invention, a list of frequently asked questions is generated in the database in advance, and then, when a new user input arrives, the Word Mover’s Distance method proposed in source [1] is applied.

[00154] Using this approach, for each word from one phrase, the closest neighbor from another is found. Word Mover’s Distance (WMD) is the minimum distance to go from one phrase to another, as shown in FIG. 3.

[00155] The minimum cumulative distance in this approach can be determined as follows.

[00156] Let there be a set of word vectors and the distances between words for two phrases: for example, after text preprocessing the question reads “hello how to buy a car loan”, and the answer should be “hi give a loan to a car”.

[00157] Next, module 140 takes the minimum value of the distance between words in each column and in each row; these minima are summed and divided by the total number of words (normalized), giving the “cost of switching from one phrase to another”:

[00158] (0.64 + 0.90 + 0.00 + 0.39 + 0.64 + 0.82 + 0.88 + 0.39 + 0.00) / 9 = 0.52
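The calculation above can be sketched as follows. This illustrates the row-and-column minimum procedure as described here, which is a simplification and not the full optimal-transport formulation of Word Mover’s Distance in source [1]. The function name and the small distance matrix are hypothetical, because the underlying word-to-word distances are not listed in the text; only the nine minima quoted above are taken from the description.

```python
def phrase_transition_cost(distances):
    """Normalized sum of per-row and per-column minima of a distance matrix.

    distances[i][j] is the distance between word i of the first phrase
    and word j of the second phrase.
    """
    row_minima = [min(row) for row in distances]
    column_minima = [min(column) for column in zip(*distances)]
    total_words = len(row_minima) + len(column_minima)
    return (sum(row_minima) + sum(column_minima)) / total_words


# Hypothetical 2 x 3 matrix: a two-word phrase against a three-word phrase.
distances = [
    [0.0, 0.8, 0.9],
    [0.7, 0.4, 0.6],
]
cost = phrase_transition_cost(distances)

# The nine minima quoted in the description, normalized by the nine words.
source_minima = [0.64, 0.90, 0.00, 0.39, 0.64, 0.82, 0.88, 0.39, 0.00]
source_cost = round(sum(source_minima) / len(source_minima), 2)
```

With the quoted minima the sum is 4.66, and 4.66 / 9 rounds to 0.52, in agreement with the value given above.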

[00159] In some embodiments, the normalization may be performed by the total number of unique tokens or by the number of unique tokens in a phrase, without limitation.

[00160] In some implementations, the topic “How to block cards” may be mistakenly matched to the input “I have secured a credit card, how to cancel?”. Such negative examples are stored in the database 150 together with their semantic relation to each other, and this is taken into account in subsequent operation.

[00161] In some embodiments of the invention, not only the distance between words is used, but also additional attributes, including, but not limited to: the frequency of occurrence of each word in the database 150, morphological features of each word, syntactic information.

[00162] In some embodiments of the invention, not only the distance between words is used, but also an individual weight of each word, for example, its frequency of occurrence.

[00163] If x is the proportion of all topics in which the word W occurs, and y is the proportion of all examples in the given topic in which the word W occurs, the distance between words is multiplied by 1 - x ^y for normalization.

[00164] A further problem with the traditional WMD approach is that it does not take word order into account, so that the phrases “ATM ate a card” and “card ate an ATM” are identical after lemmatization. To solve this problem, syntactic information from the syntactic parser 210 of the text, which outputs the parse tree of the words, can be used. In a particular embodiment, the parser 210 reports an error because the construction “card ate ATM” cannot be used. To correct this situation, a cost matrix for transitions from one class to another is introduced, and the Euclidean distances between the phrases are multiplied by the corresponding values.

[00165] To move antonyms farther apart and synonyms closer together in the word2vec space (in terms of, for example, Euclidean distance) without significantly distorting the distances between other words, approaches described in the prior art (source [2]), such as retrofitting, are used.

[00166] If none of the previous steps (the operation of module 130 and module 140) found an answer for the user in the dialog of the graphical user interface, the user is notified of this or is switched to the operator. In some cases, the user may be offered a switch to an operator for communication over another data channel.

[00167] In a specific embodiment, without limitation, an operating mode can be selected that simulates the absence of the chat bot itself and smoothly transfers the user to the operator on almost any unclear user request (this is done because of the small amount of chat bot content):

[00168] if the chat bot expects an answer to a question it has asked and receives a message that neither module 130 nor module 140 can recognize, it gives the user another attempt (informing the user of this) and, if the failure is repeated, transfers the user to the operator;

[00169] if the chat bot does not expect a response but is simply waiting for a request from the user and receives an unrecognizable message, it immediately transfers the user to the operator.

[00170] In general, the behavior of the module is configured by a special parameter of the form v = [2, 3, 6]. In the given example, the chat bot will behave as follows:

[00171] upon receipt of the 1st unrecognized message, the fact of its existence is forgotten after v [1] = 2 recognized messages;

[00172] upon receipt of the 2nd unrecognized message, the fact of its existence is forgotten after v [2] = 3 recognized messages;

[00173] upon receipt of the 3rd unrecognized message, the fact of its existence is forgotten after v [3] = 6 recognized messages;

[00174] receiving the 4th unrecognized message takes the user to the operator (or, more gently, invites him to do this in the graphical user interface).
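One possible reading of this behavior can be sketched as a small state machine. The class and attribute names are hypothetical, and the interpretation is an assumption: each remembered unrecognized message is "forgotten" once the corresponding number of recognized messages has arrived since the last change in state. The description does not specify whether the recognized messages must be consecutive or how the counters interact.

```python
class EscalationTracker:
    """Tracks unrecognized messages and decides when to hand over to an operator."""

    def __init__(self, v=(2, 3, 6)):
        self.v = list(v)
        self.strikes = 0          # unrecognized messages currently remembered
        self.recognized_run = 0   # recognized messages since the last change in strikes

    def on_message(self, recognized):
        """Return "operator" when the user should be transferred, else "continue"."""
        if recognized:
            if self.strikes > 0:
                self.recognized_run += 1
                # v uses 1-based indices in the description: v[1] applies to the 1st strike.
                if self.recognized_run >= self.v[self.strikes - 1]:
                    self.strikes -= 1
                    self.recognized_run = 0
            return "continue"
        self.strikes += 1
        self.recognized_run = 0
        if self.strikes > len(self.v):
            self.strikes = 0  # assumption: the state is reset after the handover
            return "operator"
        return "continue"


tracker = EscalationTracker(v=[2, 3, 6])
outcomes = [tracker.on_message(recognized=False) for _ in range(4)]
```

With v = [2, 3, 6], four unrecognized messages in a row lead to a handover on the fourth, while a single unrecognized message followed by two recognized ones returns the tracker to its initial state.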

[00175] In the general case, if a transfer to the operator is not required, one random message is displayed from a set of templates with the same meaning, informing the user that nothing was found.

[00176] The illustrative embodiments described herein may be embodied in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. Computer-executable instructions may be written in a computer programming language or may be embodied in hardware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on many hardware platforms and for interfaces to many operating systems. Although not limited thereto, computer programs for implementing the present method can be written in any number of suitable programming languages, such as, for example, Hypertext Markup Language (HTML), Dynamic HTML, Extensible Markup Language (XML), Extensible Stylesheet Language (XSL), Document Style Semantics and Specification Language (DSSSL), Cascading Style Sheets (CSS), Synchronized Multimedia Integration Language (SMIL), Wireless Markup Language (WML), Java™, Jini™, C, C++, Perl, UNIX Shell, Visual Basic or Visual Basic Script, Virtual Reality Markup Language (VRML), ColdFusion™, or other compilers, assemblers, interpreters, or other computer languages or platforms.

[00177] The system 100 may be implemented with the architecture shown in FIG. 6 and include the components described below, including a processor 610. In a specific embodiment of the present technical solution, the processor 610 may include one or more processors and/or one or more microcontrollers configured to execute instructions for performing operations related to the aforementioned method for building a dialogue with the user in a user-friendly channel. In various embodiments of the present technical solution, the processor 610 may be implemented as single-chip, multi-chip, and/or other electrical components, including one or more integrated circuits and printed circuit boards. The processor 610 may optionally comprise a cache block (not shown) for temporary local storage of instructions, data, or computer addresses. For example, the processor 610 may include one or more processors or controllers dedicated to specific tasks, or a single multifunction processor or controller.

[00178] The processor 610 is operatively coupled to a data input/output module 620 and an audio module 630.

[00179] In the presented embodiment of the present technical solution, the data input/output module 620 can be implemented as a touch screen that serves both as an input device (by registering user commands in the form of touches) and as a user output device (i.e., a display). In other words, the touch screen is a display that detects the presence and position of the user's touch input. In alternative embodiments of the present technical solution, the input/output module 620 can be implemented as a separate display and a separate input device. In still other alternative embodiments of the present technical solution, the input/output module 620 may include a physical keyboard (containing one or more physical buttons) in addition to the touch screen.

[00180] The processor 610 is further coupled to a memory module 640, which contains the database 150. The memory module 640 may span one or more media and generally provides storage space for computer code (e.g., software and/or hardware) implementing the aforementioned method for building a dialogue with the user. For example, the memory module 640 may include various tangible computer-readable media, including read-only memory (ROM) and/or random access memory (RAM). As is well known to specialists in this field of technology, ROM transfers data and instructions to the processor 610 unidirectionally, and RAM is usually used to transfer data and instructions bidirectionally.

[00181] The memory module 640 may also include one or more fixed storage devices in the form of, for example, a hard disk drive (HDD), a solid state drive (SSD), or a flash memory card (e.g., a Secure Digital (SD) card or an eMMC multimedia card), along with other types of memory, bidirectionally connected to the processor 610. The information may also be located on one or more removable media loaded or installed in the system 100 when necessary. For example, any of a number of suitable memory cards (e.g., SD cards) can be loaded into the system 100 on a temporary or permanent basis (using, for example, one or more additional ports).

[00182] The memory module 640 may store, among other things, a series of machine-readable instructions upon execution of which the processor 610 (as well as other components of the system 100) are configured to perform various operations described herein.

[00183] In various specific embodiments, the system 100 may further comprise a wireless module 650 and a sensor module 660, both of which are coupled to the processor 610 to facilitate various functions of the system 100.

[00184] The wireless module 650 may be designed to operate over one or more wireless networks, such as a wireless personal area network (WPAN) (for example, a BLUETOOTH WPAN or an infrared personal area network), a WI-FI network (for example, an 802.11a/b/g/n WI-FI network or another 802.11 standards network), a WI-MAX network, or a mobile cellular network. The mobile cellular network may be, for example, a Global System for Mobile Communications (GSM) network, an Enhanced Data rates for GSM Evolution (EDGE) network, a Universal Mobile Telecommunications System (UMTS) network, or a Long-Term Evolution (LTE) network. Additionally, the wireless module 650 may include hosting protocols, so that the system 100 can be configured as a base station for wireless devices.

[00185] The sensor module 660 may include one or more sensor devices to provide additional input and facilitate the various functions of the system 100. Some implementations of the sensor module 660 may include one or more of the following devices: an accelerometer, a device for measuring ambient temperature, a device for measuring the force of gravity, a gyroscope, a device for measuring illumination, a device for measuring the acceleration force, a device for measuring the surrounding geomagnetic field, a device for measuring the degree of rotation, a device for measuring atmospheric pressure, a device for measuring relative humidity, a device for measuring the orientation of the device, and so on. It should be noted that some of these devices can be implemented as hardware, software, or a combination of both.

[00186] A power supply module 670 is also provided for supplying power to one or more components of the system 100. In some embodiments of the present technical solution, the power supply module 670 may be implemented as a lithium-ion battery. However, other types of rechargeable (and non-rechargeable) batteries may be used. In other embodiments of the present technical solution, in addition to or as an alternative to a battery, the power supply module 670 may be implemented as a mains power connector configured to connect the system 100 to a mains power source, for example, a standard power cable and plug.

[00187] In some embodiments of the present technical solution, various components of the system 100 can be connected to each other via one or more buses (including hardware and/or software); these buses are not numbered. By way of non-limiting example, the one or more buses may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) bus, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), a Universal Asynchronous Receiver/Transmitter (UART) interface, an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a Secure Digital (SD) memory interface, a MultiMediaCard (MMC) memory interface, a Memory Stick (MS) memory interface, a Secure Digital Input Output (SDIO) interface, a Multichannel Buffered Serial Port (McBSP) bus, a Universal Serial Bus (USB), a General-Purpose Memory Controller (GPMC) bus, an SDRAM Controller (SDRC) bus, a General-Purpose Input/Output (GPIO) bus, a Separate Video (S-Video) bus, a Display Serial Interface (DSI) bus, an Advanced Microcontroller Bus Architecture (AMBA) bus, or any other suitable bus or a combination of two or more buses.

[00188] In some embodiments, the system 100, using speech-to-text and text-to-speech modules, can be used for voice service (IVR) and/or voice control of a mobile application.

[00189] Those skilled in the art will understand that in the present description, the expression “receiving data” from a user means that the electronic device receives data from the user in the form of an electronic (or other) signal. In addition, those skilled in the art will understand that displaying data to a user through a graphical user interface component (for example, an electronic device screen and the like) may include transmitting a signal to the graphical user interface component, where the signal contains data that can be processed, and at least a portion of this data may be displayed to the user through the graphical user interface component.

[00190] Some of these steps, as well as signal transmission-reception, are well known in the art and therefore, have been omitted in specific parts of this description for simplicity. Signals can be transmitted-received using optical means (for example, fiber optic connection), electronic means (for example, wired or wireless connection) and mechanical means (for example, based on pressure, temperature or other suitable parameter).

[00191] Modifications and improvements to the above-described embodiments of the present technical solution will be apparent to those skilled in the art. The preceding description is provided as an example only and does not impose any limitations. Thus, the scope of the present technical solution is limited only by the scope of the attached claims.

USED INFORMATION SOURCES

[00192] 1. Kusner M. et al. From word embeddings to document distances // International Conference on Machine Learning. - 2015. - pp. 957-966.

[00193] 2. Faruqui M. et al. Retrofitting word vectors to semantic lexicons // arXiv preprint arXiv:1411.4166. - 2014.

[00194] 3. Prutskov A. V. Generation and definitions of word forms of natural languages based on their successive transformations // Bulletin of the Ryazan State Radio Engineering University. - 2009. - No. 27. - p. 51.

Claims

FORMULA

1. A method for building a dialogue with a user in a user-friendly channel, comprising the following steps:

• receiving user input data by means of a processor functionally connected to a database;

• pre-processing the user input by dividing it into sentences and words, wherein:

o spelling errors of the user are corrected by means of a typo correction module;

o each word of the user input is lemmatized;

o a structure of the dependencies of the words on each other in the user input is formed by means of a syntactic parser;

• forming a vector model of the words of the user input;

• classifying, by means of a module for conducting a dialogue with the user, at least part of the user input to form a response;

• forming a response for the user to display in a graphical user interface;

• providing, by the processor, at least one response to the recognized user input.

2. The method according to claim 1, characterized in that during the preprocessing of user input, typos in the user input are additionally corrected by means of a typo correction submodule.

3. The method according to claim 1, characterized in that when performing preprocessing of user input, tokenization of numerals from the user input is additionally performed.

4. The method according to claim 1, characterized in that the syntactic parser of the text uses recurrent neural networks to analyze the user text.

5. The method according to claim 1, characterized in that when classifying user input, the dialogue module uses a slot filling algorithm.

6. The method according to claim 1, characterized in that during the classification, linear dialog scripts or non-linear dialog scripts are determined by means of the module for conducting a dialogue with the user.

7. The method according to claim 1, characterized in that when forming a vector model of user input words, the CBOW and/or Skip-gram model is used.

8. The method according to claim 1, characterized in that when forming the response for the user to display in the graphical user interface, the class to which the user input relates is determined.

9. The method according to claim 1, characterized in that when forming the response for the user to display in the graphical user interface, in the absence of an answer in the dialog, the user is switched to the operator.

10. A system for building a dialogue with a user in a user-friendly channel, comprising:

• a text preprocessing module, configured to receive user input, lemmatize it, correct typos in it, tokenize it, and parse it;

• a module for conducting a dialogue with the user, configured to receive the processed text of the user input from the text preprocessing module and to classify at least a portion of the user input to form a response;

• a module for answering frequently asked questions, configured to receive user input from the module for conducting a dialogue with the user and to generate an answer for the user for display in the graphical user interface