WO2023211305A1

WO2023211305A1 - Method and system for generating text for a digital assistant

Info

Publication number: WO2023211305A1
Application number: PCT/RU2022/000147
Authority: WO
Inventors: Мария Ивановна ТИХОНОВА
Original assignee: Публичное Акционерное Общество "Сбербанк России"
Priority date: 2022-04-29
Filing date: 2022-04-29
Publication date: 2023-11-02

Abstract

The claimed technical solution relates generally to the field of computing, and more particularly to a method and system for generating text for a digital assistant in dialogue systems. The technical result of implementing the claimed solution is an increase in the semantic accuracy with which stylized text is generated from a source text. This technical result is achieved using a method for automatically generating text for a digital assistant in dialogue systems comprising the steps of: receiving incoming data containing natural language source text and the target style of the utterances of a digital assistant; encoding the source text; vectorizing tokens; processing vector representations of the source text tokens using a neural network-based machine learning model trained on digital assistant text utterances stylized in accordance with the given target style, with an array of vectorized stylized texts being formed during processing; decoding each vectorized stylized text from the array, wherein decoding at least includes the conversion of a vectorized stylized text into tokens and detokenization; filtering the array of stylized texts; ranking the filtered stylized texts and selecting the best stylized text, wherein the best text is selected on the basis of the pairwise distance between the source text and each of the possible stylized texts; and sending the stylized text to the dialogue system.

Description

METHOD AND SYSTEM OF TEXT GENERATION FOR DIGITAL ASSISTANT

TECHNICAL FIELD

[0001] The claimed technical solution generally relates to the field of computer technology, and in particular to a method and system for generating text for a digital assistant in dialogue systems.

BACKGROUND OF THE ART

[0002] As language has evolved, distinct varieties of it have developed, each belonging to a particular style of text presentation characterized by its own features, linguistic means, genres, etc. For example, a text published in a scientific journal has the features of a scientific style, whereas a text used in informal communication has the features of a conversational style, such as informal “you” forms of address, simple sentence construction, and the use of slang.

[0003] With the development of information technology, speech style transfer has been actively developed within natural language processing (NLP), and attempts are now being made to integrate it into a variety of areas. Automating all or part of the process of rendering text in a particular style can significantly increase efficiency in areas such as journalism, publishing (for example, in text editors), and content creation for media platforms and virtual assistants.

[0004] However, despite the demand for this technology, a number of difficulties prevent, for example, the generation of stylized text with high accuracy. The problems of text stylization include preserving the original information, ensuring that no new facts appear in the generated texts, and preserving the meaning of the source text. Another important problem is versatility: the technology should be able to generate text not in one style only, but in several styles, depending on the scope of application. Therefore, creating an effective and accurate way to automatically generate text in given styles is an essential task.

[0005] Thus, a method for transferring text style, disclosed in the source [1], is known from the prior art. This method provides the ability to generate stylized text from source text by solving a machine translation problem. In this case, the “language” into which the source text needs to be translated is the text style.

[0006] The disadvantages of this solution include the high complexity of implementation and narrow focus of this solution, due to the huge set of required training data, and the inability to adapt to different styles due to the peculiarities of language translation technology. In addition, this solution also does not provide high accuracy, because in the process of such “translation” the meaning of the original phrase may be lost due to changes in all the words of the source text.

[0007] Also known from the prior art is a method for providing logical answers that imitate the user's speech style, disclosed in RF patent No. RU 2693332 C1 (LIMITED LIABILITY COMPANY "YANDEX"), publ. 07/02/2019. This method makes it possible to select a contextual answer to a question, depending on the context of the question, by analyzing the vector representation of the contextual question and searching for the closest answer in a set of answers stored in a database.

[0008] The disadvantages of this solution are the inability to generate stylized text from a source text, the high computational cost and large amount of memory required to build the database (DB), the restriction of stylization to the patterns stored in the DB, and the low accuracy of stylized text generation, since stylistic responses are selected from those created in advance and stored in the DB.

[0009] The general disadvantage of existing solutions is the lack of an effective way to generate stylized text with high accuracy while preserving the original information and introducing no new facts into the generated texts. Such a method must also preserve the meaning of the source text. In addition, it should make text stylization versatile, allowing text to be generated not in one style only, but in several styles, depending on the scope of application.

DISCLOSURE OF INVENTION

[0010] The claimed technical solution proposes a new approach to generating text for a digital assistant in dialogue systems. This solution uses a machine learning algorithm that allows the generation of stylized text for the digital assistant from the source text with high accuracy, ensuring the semantic proximity of the source and stylized text and excluding the distortion of the stylized text by new facts.

[0011] Thus, the technical problem of providing the ability to generate stylized text is solved.

[0012] The technical result achieved by solving this problem is to increase the semantic accuracy of generating stylized text from the source text.

[0013] An additional technical result that appears when solving the above problem is the ability to generate multiple stylized texts from one source.

[0014] These technical results are achieved by a computer-implemented method for automatically generating text for a digital assistant in dialogue systems, performed by at least one computing device and comprising the steps of:
a) receiving input data containing source text in natural language and the target style of the digital assistant's utterances;
b) encoding the source text, wherein at least tokenization of the text data is performed during encoding;
c) vectorizing the tokens obtained in step b);
d) processing the vector representations of the source text tokens obtained in step c) using a neural network-based machine learning model trained on text utterances of the digital assistant stylized in accordance with the given target style, wherein an array of vectorized stylized texts is formed during processing;
e) decoding each vectorized stylized text from the array obtained in step d), wherein decoding at least includes converting the vectorized stylized text into tokens and detokenization;
f) filtering the array of stylized texts obtained in step e);
g) ranking the filtered stylized texts and selecting the best stylized text, wherein the best text is selected on the basis of the pairwise distance between the source text and each of the possible stylized texts;
h) sending the stylized text obtained in step g) to the dialogue system.

[0015] In one of the particular embodiments of the method, filtering an array of stylized texts is performed using regular expressions and a morphological analyzer.

[0016] In another particular embodiment of the method, filtering an array of stylized texts is performed based on the match of proper names in the source text and each stylized text from the array.

[0017] In another particular embodiment of the method, the matching of proper names is determined using the number of immutable named entities contained in the source text and the stylized text.

[0018] In another particular embodiment of the method, the number of immutable named entities is determined based on the recognition of named entities in the source text and each stylized text from the array.

[0019] In another particular embodiment, the method further comprises the step of checking for the presence of text stylization.

[0020] In another particular embodiment of the method, the presence of text stylization is checked based on determining whether the target style indicators have been replaced.

[0021] In addition, the claimed technical results are achieved through an automatic text generation system for a digital assistant in dialogue systems, comprising: at least one processor; and at least one memory coupled to the processor that contains computer-readable instructions that, when executed by the at least one processor, provide a method for generating text for digital assistants in dialogue systems.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] Features and advantages of the present invention will become apparent from the following detailed description of the invention and the accompanying drawings.

[0023] FIG. 1 illustrates a general block diagram of the proposed system.

[0024] FIG. 2 illustrates a block diagram of the claimed method.

[0025] FIG. 3 illustrates an example of a general view of a computing device that provides implementation of the claimed solution.

IMPLEMENTATION OF THE INVENTION

[0026] The concepts and terms necessary to understand this technical solution will be described below.

[0027] A model in machine learning (ML) is a set of artificial intelligence methods, the characteristic feature of which is not the direct solution of a problem, but learning in the process of applying solutions to many similar problems.

[0028] Named-entity recognition (NER) is an information extraction subtask that aims to find and classify mentions of named entities in unstructured text into predefined categories such as proper names, character names, organizations, locations, monetary values, percentages, etc.

[0029] Word embeddings are a general name for various approaches to language modeling and representation learning in natural language processing that aim to map words (and possibly phrases) from some dictionary to vectors in the n-dimensional real space R^n.

[0030] Tokenization is the process of breaking text into text units or tokens (most often these units are words, but they can also be letters, parts of a sentence, combinations of words, etc.).

[0031] A language model is a probability distribution over sequences of vocabulary items. In this patent, the term “language model” is used to describe neural network language models that are designed to model a language by estimating the probability of a particular sequence of characters.
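In the usual autoregressive formulation (a standard textbook factorization, given here as background rather than taken from the application), the probability of a token sequence w_1, ..., w_n decomposes by the chain rule, and perplexity is the exponentiated average negative log-probability per token:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

\mathrm{PPL}(w_1, \dots, w_n) = \exp\!\left( -\frac{1}{n} \sum_{i=1}^{n} \log P(w_i \mid w_1, \dots, w_{i-1}) \right)
```

A lower perplexity means that the model assigns, on average, a higher probability to each observed next token.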

[0032] The claimed technical solution offers a new approach that improves the semantic accuracy of stylized text generation, which consists in ensuring the preservation of semantic content in the stylized text, as well as eliminating the addition of stylized text with new facts and logical units of language. One of the features of the claimed technical solution is the ability to generate multiple stylized texts from a single source text, which also ensures automation of the process of generating stylized texts and significantly reduces the time of text stylization compared to manual generation of each individual stylized text.

[0033] The claimed technical solution can be implemented on a computer, in the form of an automated information system (AIS) or a machine-readable medium containing instructions for performing the above method.

[0034] The technical solution may also be implemented as a distributed computer system or computing device.

[0035] In this solution, a system means a computer system, computer (electronic computer), CNC (computer numerical control), PLC (programmable logic controller), computerized control systems and any other devices capable of performing a given, clearly defined sequence of computing operations (actions, instructions).

[0036] A command processing device means an electronic unit or an integrated circuit (microprocessor) that executes machine instructions (programs).

[0037] An instruction processing device reads and executes machine instructions (programs) from one or more data storage devices, such as random access memory (RAM) and/or read-only memory (ROM). The ROM may be, without limitation, hard disk drives (HDD), flash memory, solid-state drives (SSD), optical storage media (CD, DVD, BD, MD, etc.), and the like.

[0038] Program - a sequence of instructions intended for execution by a computer control device or command processing device.

[0039] The term "instructions" as used in this application may refer generally to software instructions or software commands that are written in a given programming language to perform a specific function, such as, for example, text encoding and decoding, filtering, ranking, transmission of texts to a dialogue system, etc. Instructions can be implemented in a variety of ways, including, for example, object-oriented methods. For example, instructions can be implemented using the Python, C++, or Java programming languages, various libraries (for example, MFC, Microsoft Foundation Classes), etc. Instructions that implement the processes described in this solution can be transmitted over either wired or wireless data transmission channels, such as Wi-Fi, Bluetooth, USB, WLAN, LAN, etc.

[0040] FIG. 1 shows a general view of a text generation system 100 for a digital assistant in dialogue systems. The system 100 includes basic functional elements, such as: an encoding/decoding module 101, a style transfer module 102, a stylized text filtering module 103, and a stylized text ranking module 104. The elements of the system 100 are disclosed in more detail in FIG. 3.

[0041] The dialogue system can be system 120 and can represent various solutions, for example, voice assistants, chat bots, robotic call centers, and other technologies that implement an automated process of communication with a user, such as user 110. It is worth noting that in this solution a dialogue system should be understood as any automated human-machine system operating in dialogue mode, in which the system responds to every user command and asks the user for information as needed.

[0042] Digital assistants (virtual digital assistants) can be systems for automating user interaction, implemented on the basis of artificial intelligence in a dialog format (chatbot, skills for voice assistant, etc.). Thus, in one particular embodiment, the digital assistant may be a computer system that simulates a conversation with users in a conversational format.

[0043] Text stylization in this solution refers to the generation of text by transforming the received source text into text that has stylistic speech features. Such stylistic features can be the emotional coloring of the text (cheerful, sad, etc.). In another particular embodiment, stylistic speech features may reflect the conditions and goals of communication in some area of public activity, for example, official business, journalistic, conversational, or artistic communication. It is worth noting that stylization can also mean giving the text the characteristic features of the way particular individuals, characters, literary heroes, etc. communicate, without limitation. Thus, in another particular embodiment, text stylization may be the transformation of the source text into a stylized one in accordance with the specified communication style of a particular digital assistant that has conversational style features, for example, the use of certain pronouns emphasizing an informal or formal style of communication (the informal and formal forms of “you”), grammatical gender, number, etc.

[0044] The encoding/decoding module 101 may be implemented on at least one computing device equipped with appropriate software and may include a set of models for tokenizing and detokenizing text, vectorizing tokenized text, and converting tokens to text, such as one or more machine learning models for converting text information into vector form, for example, BERT, ELMo, ULMFiT, XLNet, RoBERTa, RuGPT3 and others. In one particular embodiment, module 101 may be implemented on system 300, which is described in more detail in FIG. 3. It is worth noting that the specific method of tokenization and vectorization depends on the language model on which module 102 is based. For example, when using the RuGPT3 model, tokenization is carried out using the BPE (Byte Pair Encoding) method, and subsequent vectorization is carried out by replacing each token with its index in the language model dictionary compiled at the stage of initial model training. Additionally, in yet another particular embodiment, word tokenization may be used as the tokenization method. Example of tokenization by words and encoding of words by indexes in the dictionary:

[0045] Module 102 can be implemented on the basis of at least one neural network pre-trained on specific sets of stylized texts in accordance with specified styles. A generative language model such as RuGPT3, XLNet, etc. can be used as the machine learning model that generates stylized texts in accordance with a given style. In one particular embodiment, when implementing the claimed solution, the machine learning model was the Russian-language generative language model RuGPT3-Large. The model was trained on sources from different domains: Wikipedia, books, news, Russian Common Crawl, etc. The model was trained for 14 days on 128 GPUs with a context window of 1024 and for a few additional days on 16 GPUs with a context window of 2048. The final model has a perplexity of 13.8 on the test data set. At this stage of training, the language model acquired the ability to predict the probability of the next token from the preceding text fragment. Thus, if the model often encountered a certain phrase in the training data, then when predicting the token that follows a known part of that phrase, the model will with high probability predict exactly the token that continues the phrase in the training data set.

[0046] Next, to generate stylized text from the source text, the trained model was additionally trained using a fine-tuning procedure. At this stage, the weights of the trained model were adjusted in accordance with the problem being solved. Thus, when a source sentence is fed to the model, the changed weights make the most likely output a paraphrase of this sentence in a certain style, i.e. the probability of continuing a text fragment in the format in which the stylized text should be generated is increased. Datasets with various stylized utterances of digital assistants were used for fine-tuning. In one particular embodiment, the problem of stylizing the source text was solved for three speech styles of digital assistants. The initial data for fine-tuning contained 2174, 2436 and 32242 utterances in the style of each assistant, respectively. The initial data contained only utterances in the style of specific assistants and did not contain the source text from which such stylized utterances are generated.

[0047] To build a dataset (training data set) in a format suitable for fine-tuning the model, consisting of pairs “original utterance” - “assistant-style utterance”, a paraphraser was applied to the stylized utterances, for example, a paraphraser based on the RuT5 generative model. The paraphraser model is known from the prior art and is disclosed, for example, in a source available via the Internet link: https://huggingface.co/cointegrated/rut5-base-paraphraser. For each stylized utterance, 10 paraphrase variants were generated, from which the 2 closest to the stylized utterance were then selected as original utterances based on a semantic metric, for example, the LaBSE metric. This metric evaluates the cosine similarity between the vector representations of sentences obtained using the model, which corresponds to semantic proximity. Thus, the final datasets for fine-tuning RuGPT3 for the style transfer task contained 4348, 4872 and 64484 pairs of sentences (“original utterance” - “utterance in the assistant’s style”). In addition, a tag identifying the specific digital assistant and its corresponding speech style was added to the data, so that the style in which the stylized text will be generated can be specified. The model was fine-tuned on the resulting data for 5 epochs.
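The pair-construction step can be sketched as follows. This is a minimal illustration, assuming sentence embeddings are already available as plain lists of floats; the function names `cosine_similarity`, `select_closest_paraphrases` and `build_training_pairs`, the record layout, and the toy vectors are hypothetical and stand in for a real paraphraser and a real sentence encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_closest_paraphrases(styled_vec, paraphrases, keep=2):
    """Return the `keep` paraphrase texts closest to the stylized utterance.

    `paraphrases` is a list of (text, embedding) tuples.
    """
    ranked = sorted(
        paraphrases,
        key=lambda item: cosine_similarity(styled_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:keep]]


def build_training_pairs(styled_text, styled_vec, paraphrases, style_tag):
    """Build (style tag, original utterance, stylized utterance) records."""
    originals = select_closest_paraphrases(styled_vec, paraphrases)
    return [(style_tag, original, styled_text) for original in originals]
```

With two paraphrases kept per stylized utterance, each source set doubles in size, which matches the reported counts (for example, 2174 utterances yield 4348 pairs).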

[0048] The resulting model was evaluated on a test set separately for each assistant. The test set comprised 1097 utterances for each assistant, drawn from that assistant's data. The following metrics were used to assess quality:
1) BLEU (Papineni, K.; Roukos, S.; Ward, T.; Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. ACL-2002: 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318. CiteSeerX 10.1.1.19.9416). The BLEU algorithm compares phrases of the candidate text with the phrases found in the reference text and counts the matches in a weighted manner. These matches are independent of position. A higher degree of match indicates greater similarity to the reference and a higher score. Intelligibility and grammatical correctness are not taken into account.
2) The number of common N-grams, where sequences of words of length from 3 to 8 were taken as N-grams.
3) Levenshtein distance, also called edit distance (V.I. Levenshtein. Binary codes with correction of deletions, insertions and substitutions of symbols. Reports of the Academy of Sciences of the USSR, 1965. 163.4:845-848). This metric measures the difference between two sequences of characters. It is defined as the minimum number of single-character operations (namely insertion, deletion, replacement) required to transform one sequence of characters into the other.
4) Jaccard index (Jaccard P. Distribution de la flore alpine dans le Bassin des Dranses et dans quelques regions voisines // Bull. Soc. Vaudoise sci. Natur. 1901. V. 37. Bd. 140. S. 241-272), which was calculated as the number of tokens common to the stylized utterance and the original sentence divided by the number of tokens in the union of these two text fragments.
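Two of the token-overlap metrics above can be sketched in a few lines. This is an illustrative sketch only: whitespace splitting stands in for the tokenizer actually used, the function names are hypothetical, and the common N-gram measure is shown as a raw count, although the application does not specify how it is normalized.

```python
def jaccard_index(source, styled):
    """Size of the token intersection divided by the size of the token union."""
    a, b = set(source.split()), set(styled.split())
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def word_ngrams(text, n):
    """Set of word n-grams of length n (empty if the text is too short)."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_ngram_count(source, styled, min_n=3, max_n=8):
    """Number of distinct word n-grams (lengths min_n..max_n) shared by both texts."""
    return sum(
        len(word_ngrams(source, n) & word_ngrams(styled, n))
        for n in range(min_n, max_n + 1)
    )
```

For instance, two five-word sentences that differ in a single word share 4 of 6 distinct tokens, giving a Jaccard index of about 0.67.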

[0049] The model evaluation results for three digital assistants, each with their own communication style, are shown in Table 1.

Table 1

[0050] Table 1 shows that the BLEU score, the number of common n-grams and the Jaccard index all exceed 95%. The Levenshtein distance is also less than 2 on average for all assistants, which indicates that the generated sentences differ from the reference answers, on average, by fewer than 2 single-character edits.

[0051] Module 103 may be implemented on at least one computing device and include a morphological parser, such as a Unigram parser, N-gram parser, regular expression parser, etc. Also, module 103 may contain a neural network trained to solve the NER (named entity recognition) problem, for example, Slovnet BERT NER, DeepPavlov BERT NER, etc., but not limited to. In one particular embodiment, module 103 may contain a heavy model with BERT architecture, and be trained on a small manually annotated dataset.

[0052] The stylized text ranking module 104 may be implemented on at least one computing device equipped with appropriate software for calculating the Levenshtein distance. Thus, module 104 is configured to implement an algorithm for calculating pairwise distances between the source text and the stylized texts.
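A ranking step of this kind might look as follows. This is a minimal sketch, not the module's actual implementation: it assumes that the candidate with the smallest character-level Levenshtein distance to the source text is treated as the best one, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rank_candidates(source, candidates):
    """Sort candidates by ascending distance to the source text."""
    return sorted(candidates, key=lambda text: levenshtein(source, text))


def select_best(source, candidates):
    """Pick the candidate closest to the source (assumed selection rule)."""
    return rank_candidates(source, candidates)[0]
```

The two-row dynamic programming table keeps memory proportional to the length of one string, which is sufficient because only the final distance is needed.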

[0053] It will be apparent to one skilled in the art that, although the modules described above are presented as separate devices, these modules can also be combined within a single device, such as system 300.

[0054] FIG. 2 shows a block diagram of a method 200 for automatically generating text for a digital assistant in dialogue systems, which is described step by step in more detail below. Method 200 consists of steps for processing various digital data. Processing is typically performed by a system, such as system 100, which may also represent, for example, a server, computer, mobile device, computing device, etc.

[0055] At step 210, the system 100 receives input data containing the natural language source text and the target speech style of the digital assistant. Thus, input data may be received from the dialog system 120 over data links such as the Internet. The source text received from the dialog system 120 may represent, for example, a dialog response such as an answer to a user question, a question to the user, etc. The target style of the digital assistant's remarks may represent certain characteristic features of the speech style, for example, a conversational style of communication, which is characterized, for example, by the presence of informal addresses to the user, a business style of communication, which is characterized by formal addresses, etc. Also, in one particular embodiment, the target style of remarks may indicate the emotional color of the text, for example, sad, happy, neutral, etc. The target style of the digital assistant's remarks may be determined by the settings of the conversational system 120. Thus, a conversational system, such as system 120, may contain a set of digital assistants, each of which has a different communication style. When starting a dialogue, the user can select a specific assistant in the system settings. The specified data about the selected assistant is also transmitted to the system 100.

[0056] At step 220, the source text is encoded, and during encoding, at least tokenization of the text data is performed. This step 220 may be performed by module 101. The input text may be divided into tokens. In this solution, a token should be understood as a sequence of characters in a text that is meaningful for analysis. In yet another particular embodiment, text tokenization can be performed using the BPE (Byte Pair encoding) algorithm. In yet another particular embodiment, tokenization may involve breaking text into words based on the space between words. Next, a dictionary of tokens of a fixed size (for example, 30,000 tokens) is compiled, where each token is associated with its index in the dictionary.

Example of tokenization for words:

[..., '<application>']
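Word-level tokenization and dictionary construction can be sketched as follows. This is a minimal illustration that assumes splitting on whitespace and a dictionary filled in order of first appearance; the function names and the sample inputs are hypothetical, and the default size limit only mirrors the figure of 30,000 tokens mentioned above.

```python
def tokenize_by_words(text):
    """Split text into word tokens on whitespace."""
    return text.split()


def build_vocabulary(texts, max_size=30000):
    """Map each distinct token to an index, in order of first appearance,
    until the dictionary reaches `max_size` entries."""
    vocab = {}
    for text in texts:
        for token in tokenize_by_words(text):
            if token not in vocab and len(vocab) < max_size:
                vocab[token] = len(vocab)
    return vocab
```

A subword scheme such as BPE would replace `tokenize_by_words` while leaving the token-to-index dictionary idea unchanged.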

[0057] At step 230, vectorization of the tokenized text is performed. As mentioned above, the tokenization and vectorization method depends on the language model used in module 102 at step 240. For example, when using the RuGPT3 language model, each token is associated with its index in the dictionary. Thus, after vectorization, a tokenized text fragment (a list of tokens) is mapped to a vector of the indices of those tokens in the dictionary.

An example of vectorization when tokenizing by words:
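A minimal sketch of this mapping, assuming a small hypothetical word-level dictionary and a reserved index for tokens missing from it (the unknown-token handling is an assumption, not something stated in the application):

```python
def vectorize(tokens, vocab, unk_index=0):
    """Replace each token with its dictionary index.

    Tokens absent from `vocab` are mapped to `unk_index` (assumed convention).
    """
    return [vocab.get(token, unk_index) for token in tokens]


# Hypothetical dictionary; index 0 is reserved for unknown tokens.
example_vocab = {"<unk>": 0, "what": 1, "I": 2, "found": 3}
example_vector = vectorize(["what", "I", "found", "today"], example_vocab)
```

Here `example_vector` holds one integer per token, with the out-of-dictionary token "today" mapped to the reserved index.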

[0058] Method 200 then proceeds to step 240.

[0059] At step 240, vector representations of the source text tokens are processed using module 102, during which an array of vectorized stylized texts is generated. As mentioned above, a machine learning model based on a neural network was fine-tuned on text utterances of digital assistants stylized in accordance with the specified target style. At step 240, the model input is a vector representation of the source text. At the output, the model generates several variants of stylized texts (candidates) in the form of vector representations. These vector representations are saved into an array of stylized texts. The number of candidates generated by the model depends on the sampling methods used. In one particular embodiment, the following sampling settings were used for the model: top-p sampling (top_p=0.92), top-k sampling (top_k=50), and temperature sampling (temperature=0.85). These sampling methods are disclosed in more detail in the source found on the Internet at the link: https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277. The general principle of the model is the ability to predict the probability of the next token in a certain context. Thus, after the first stage of training, the language model is able to predict the probability of the next token from the preceding text fragment. To enable the model to generate stylized texts, its weights are then changed so that the next-token probabilities it predicts correspond to the task of text stylization. This change in weights is carried out on the basis of the training data set. The fine-tuned model is then able to generate from the source text a set of stylized texts whose tokens have different probabilities. Since the generated set of stylized texts can be very large, and the probability of some of them will be too low, such candidates (generated stylized texts) can be cut off based on the sampling settings specified above. As a result, an array of stylized texts is formed at the output of the model.
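How the three sampling settings interact when choosing a single next token can be sketched as follows. This is a simplified, library-free illustration under stated assumptions: the input is a plain list of logits, top-k filtering is applied before top-p filtering, and the function names are hypothetical; a production system would typically rely on the generation utilities of its model library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a temperature below 1 sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs, top_k=50, top_p=0.92):
    """Keep the top_k most likely tokens, then the smallest prefix of them whose
    cumulative probability reaches top_p; return renormalized probabilities."""
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}


def sample_next_token(logits, top_k=50, top_p=0.92, temperature=0.85, rng=random):
    """Draw one token index from the filtered, renormalized distribution."""
    dist = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    indices = list(dist)
    weights = [dist[index] for index in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Repeating the draw token by token, and repeating the whole generation several times, yields different candidates, which is what makes an array of stylized texts possible.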

[0060] It is worth noting that generating an array of stylized texts, rather than a single stylized text, increases the accuracy of text stylization, because it makes it possible to check and process all variants of the stylized text and to select from the array the one semantically closest to the source text. In addition, generating an array of stylized texts makes it possible, through further processing of these texts, to exclude candidates that add new facts and/or incorrectly change certain words.

[0061] At step 250, the generated array of vectorized stylized texts is supplied to module 101. At said step 250, each vectorized stylized text from the array is decoded, and during decoding, at least the vectorized stylized text is converted into tokens and detokenized. Thus, for example, during this process, each vector of a fixed length, based on its dimension, is mapped to a token by a dictionary index, which allows each vector to be represented as a token. The process of detokenization is the reverse process of tokenization and consists of combining tokens into text. As a result of this step 250, the array of vectorized stylized texts is converted into an array of stylized natural language texts. Example:

Original text: ['Here's what I found based on your request']

An array of stylized texts: ['This is what I found from your application', 'This is what I found from your application', 'This is what I found from your application', 'What I found from your application'...]
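The decoding step can be sketched as the inverse of index-based vectorization. This is a minimal illustration assuming a word-level dictionary and detokenization by joining tokens with spaces; the function names are hypothetical, and a subword tokenizer would need its own merge rules instead of a plain join.

```python
def decode(indices, index_to_token):
    """Map dictionary indices back to tokens and join them into text."""
    tokens = [index_to_token[index] for index in indices]
    return " ".join(tokens)


def decode_all(array_of_indices, vocab):
    """Decode every vectorized candidate in the array.

    `vocab` maps token -> index; it is inverted once and reused.
    """
    index_to_token = {index: token for token, index in vocab.items()}
    return [decode(indices, index_to_token) for indices in array_of_indices]
```

Applied to the whole array, this turns the model output into natural language candidates that the filtering and ranking steps can compare with the source text.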

[0062] At step 260, module 103 filters the array of stylized texts. At this step, candidates that do not satisfy the specified characteristic style features are excluded. For example, when generating a stylized utterance for a digital assistant that simulates informal conversational communication as a female character, at least the following criteria apply: grammatical gender (feminine) and form of address (informal “you”). Thus, when generating a stylized text for a female character who adheres to a business style of communication (that is, one with the characteristic stylistic features of business communication), the candidate “This is what I found based on your application” is excluded. This filtering also improves the accuracy of text stylization. Correctness is checked using regular expressions written with the re library and the MorphAnalyzer morphological analyzer from the pymorphy2 library. The output of this module is a filtered list of candidates that have passed the correctness check. Example: ['This is what I found from the application', 'What I found from your application'...].
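The regular-expression part of such a filter might be sketched as follows. This is an illustrative sketch only: the pronoun lists, the style names and the function name are hypothetical, it covers the form-of-address criterion alone, and it omits the morphological check (for example, of grammatical gender) that the description assigns to a morphological analyzer.

```python
import re

# Hypothetical patterns for Russian second-person pronoun forms.
INFORMAL_ADDRESS = re.compile(
    r"\b(ты|тебя|тебе|тобой|твой|твоя|твоё|твои|твоей|твоему)\b", re.IGNORECASE
)
FORMAL_ADDRESS = re.compile(
    r"\b(вы|вас|вам|вами|ваш|ваша|ваше|ваши|вашей|вашему)\b", re.IGNORECASE
)

# For each target style, the pattern that must NOT occur in a valid candidate.
FORBIDDEN_BY_STYLE = {
    "formal": INFORMAL_ADDRESS,
    "informal": FORMAL_ADDRESS,
}


def filter_by_address(candidates, target_style):
    """Drop candidates that use the form of address opposite to the target style."""
    forbidden = FORBIDDEN_BY_STYLE[target_style]
    return [text for text in candidates if not forbidden.search(text)]
```

Keeping the forbidden pattern per style in one table means a new assistant style can be added without changing the filtering function itself.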

[0063] Additionally, in one particular embodiment, the array of stylized texts is filtered based on matches of named entities (e.g., proper names) in the source text and each stylized text in the array. To do this, a named entity recognition algorithm is executed on the source text and on each stylized text, for example using module 103, and the number of unchanged named entities contained in the source text and the stylized text is compared. As discussed above, an array of stylized texts is generated at step 240. To increase semantic accuracy between the stylized and source text, the stylized text should not distort facts and should not be supplemented with new logical connections. However, the array may contain stylized texts that are relevant, i.e., satisfy the characteristic stylistic features, yet are incorrect, for example because they change facts stated in the source text. Example:

Source text: I think you should read the work of Alexander Sergeevich Pushkin “Eugene Onegin”

Incorrect stylized text: I think you should read the work of Alexander Vasilyevich Pushkin “Eugene Onegin”

[0064] As the example shows, the generated text satisfies the stylistic features of the official style (the formal address “You”), i.e., it is relevant; however, it changes a fact in the text (the name of the author of the work).

[0065] To solve this problem, an approach is proposed that compares named entities in the source text with those in each stylized text from the array. First, named entities (for example, personal names, toponyms, and organization names) are recognized in the source text and stored in memory, for example in the memory of the system 100, in the form of a data file. Next, named entities are recognized in each stylized text from the array. The unchanged entities in the source and stylized texts are then compared. Candidates whose named entities differ from those of the source text are discarded, which further increases the semantic accuracy of text stylization by excluding changes or additions of facts in the stylized text.
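The named-entity comparison can be sketched as follows. The source does not name a recognition algorithm, so the recognizer is passed in as a parameter; `toy_entities` is a hypothetical stand-in that treats runs of capitalized words as entities and is not a real recognizer. The second candidate is invented for illustration.

```python
import re
from collections import Counter
from typing import Callable, List


def toy_entities(text: str) -> Counter:
    """Stand-in recognizer (assumption): a run of capitalized words is one entity."""
    return Counter(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*", text))


def filter_by_entities(
    source: str,
    candidates: List[str],
    recognize: Callable[[str], Counter] = toy_entities,
) -> List[str]:
    """Discard candidates whose named entities differ from those of the source."""
    source_entities = recognize(source)  # recognized once and kept in memory
    return [c for c in candidates if recognize(c) == source_entities]


source = "I think you should read the work of Alexander Sergeevich Pushkin “Eugene Onegin”"
candidates = [
    # Changes a fact (the patronymic), so it should be discarded.
    "I think you should read the work of Alexander Vasilyevich Pushkin “Eugene Onegin”",
    # Rephrased but with unchanged entities, so it should be kept.
    "I believe you ought to read the work of Alexander Sergeevich Pushkin “Eugene Onegin”",
]
kept = filter_by_entities(source, candidates)
```

Comparing multisets of entities rejects candidates that drop or alter an entity as well as candidates that add a new one.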

[0066] Thus, at step 260, the array of stylized texts is filtered.

[0067] At step 270, the filtered stylized texts are ranked and the best stylized text is selected based on the pairwise distance between the source text and each candidate stylized text. The filtered stylized texts are ranked by character-level proximity to the source text using the Levenshtein distance: pairwise distances between the original neutral phrase and each candidate are calculated, and the candidate with the minimum Levenshtein distance is selected as the best one. The Levenshtein distance is calculated using the distance function from the Levenshtein library. The result is a stylized utterance of the digital assistant that changes the source text the least, which increases the accuracy of the entire stylization algorithm and reduces the likelihood of adding new facts.
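The ranking of step 270 can be sketched as follows. To keep the example self-contained, the distance is computed by a plain dynamic-programming function instead of the distance function from the Levenshtein library named in the text; `select_best` and the sample strings are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def select_best(source: str, candidates: List[str]) -> str:
    """Return the candidate with the minimum distance to the source text."""
    return min(candidates, key=lambda candidate: levenshtein(source, candidate))


source = "Here is what I found"
candidates = ["Here is what I have found", "Look what I found for you"]
best = select_best(source, candidates)
```

`select_best` raises `ValueError` on an empty list, so a caller would need to handle the case in which filtering removed every candidate.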

[0068] In addition, in one particular embodiment of the claimed solution, the text obtained as a result of ranking can be additionally checked for stylization. The check determines whether target style indicators have been replaced. For example, system 100 may further include a stylization check module implemented on the computing device. For a pair (original utterance, stylized utterance), this module checks whether stylization replaced any target style indicators relative to the original phrase, for example gender/number, or number in the form of address (informal/formal “you”); there are many neutral phrases for which no replacement is required. The module outputs the stylized utterance only if at least one such replacement has occurred; if there is no replacement, it outputs the original utterance, which is neutral and suitable for any style.
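The stylization check can be sketched as follows, assuming Russian-language utterances and treating only forms of address as target style indicators. The pattern `INDICATORS`, the function names, and the sample phrases are illustrative assumptions; a fuller check would also cover gender and number.

```python
import re
from typing import Callable, List

# Illustrative indicator pattern (assumption): informal and formal forms of address.
INDICATORS = re.compile(
    r"\b(?:ты|тебя|тебе|тобой|тво\w+|вы|вас|вам|вами|ваш\w*)\b", re.IGNORECASE
)


def style_indicators(text: str) -> List[str]:
    """Collect the style indicators found in the text, in order of appearance."""
    return [match.lower() for match in INDICATORS.findall(text)]


def choose_output(
    original: str,
    stylized: str,
    extract: Callable[[str], List[str]] = style_indicators,
) -> str:
    """Return the stylized utterance only if a style indicator was replaced."""
    return stylized if extract(original) != extract(stylized) else original


# The address form changes from formal to informal, so the stylized text is returned.
changed = choose_output(
    "Вот что нашлось по вашему запросу", "Вот что нашлось по твоему запросу"
)
# No indicator is present in either text, so the neutral original is returned.
unchanged = choose_output("Добрый день", "Добрый день!")
```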

[0069] At step 280, the stylized text is transmitted to the dialog system. At this step 280, the stylized replica of the digital assistant may be stored in system memory as a file and sent to the dialog system via, for example, a data link for subsequent display to the user, for example, using the I/O interface of the dialog system 120.

[0070] Thus, the above materials described a system and method for generating text for digital assistants in dialog systems, ensuring high semantic accuracy in generating stylized text.

[0071] In addition, the claimed solution makes text stylization universal: the source text can be stylized not in one style only but in several styles, depending on the field of application. This eliminates the need to separately create a unique stylized utterance for each style; one source text can be submitted and different stylized texts generated from it.

[0072] An example implementation of the claimed technical solution is considered below.

[0073] One possible use of the system is to stylize an initially neutral response to match the style of a given assistant. As mentioned above, dialogue systems can contain a set of digital assistants, each with its own communication style. In response to a user request, the dialogue system generates an initially neutral response (source text). Method 200 makes it possible to generate, from this neutral response, a stylized response (stylized text) in accordance with the specified style of the digital assistant. This eliminates the need to generate multiple stylized utterances for each digital assistant. Another advantage of this approach is flexibility in choosing a style: generating a text response is a computationally complex operation that requires training a large model, whereas the claimed solution allows the generated initial response to be stylized in different styles using system 100.

[0074] FIG. 3 shows an example of a general view of a computer system (300) that implements the claimed method or forms part of a computer system, for example modules 101-103, a server, a personal computer, or part of a computing cluster that processes the data required to implement the claimed technical solution.

[0075] In general, the system (300) includes components such as: one or more processors (301), at least one memory (302), data storage means (303), input/output interfaces (304), I/O means (305), and network interaction means (306), which are connected via a common bus.

[0076] The processor (301) performs the basic computational operations necessary to process data when executing the method (200). The processor (301) executes the necessary machine-readable instructions contained in the RAM (302).

[0077] The memory (302) is typically in the form of RAM and contains the necessary software logic to provide the required functionality.

[0078] The data storage means (303) can be in the form of HDDs, SSDs, a RAID array, flash memory, optical storage devices (CD, DVD, MD, Blu-ray discs), etc. The means (303) provide long-term storage of various types of information, such as generated stylized utterances, user IDs, digital assistant IDs, etc.

[0079] Various types of I/O interfaces (304) are used to organize the operation of the components of the system (300) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device; they can be, but are not limited to: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/Audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0080] The choice of interfaces (304) depends on the specific design of the system (300), which can be implemented on the basis of a wide class of devices, for example, a personal computer, mainframe, laptop, server cluster, thin client, smartphone, server, etc.

[0081] The following can be used as I/O means (305): keyboard, joystick, display (touch display), monitor, touchpad, mouse, light pen, stylus, trackball, speakers, microphone, augmented reality tools, optical sensors, tablet, light indicators, projector, camera, biometric identification tools (retina scanner, fingerprint scanner, voice recognition module), etc.

[0082] Network interaction means (306) are selected from devices that provide network reception and transmission of data, for example, an Ethernet card, WLAN/Wi-Fi module, Bluetooth module, BLE module, NFC module, IrDA, RFID module, GSM modem, etc. The means (306) provide data exchange between, for example, the system (300), presented in the form of a server, and the user's computing device, on which the received data (the generated stylized utterance of the digital assistant) can be displayed, via a wired or wireless data transmission channel, for example WAN, PAN, LAN, Intranet, Internet, WLAN, WMAN or GSM.

[0083] The specific selection of elements of the system (300) for implementing various software and hardware architectural solutions may vary, provided the required functionality is maintained.

[0084] The submitted application materials disclose preferred examples of implementing the technical solution and should not be interpreted as limiting other, particular examples of its implementation that do not go beyond the scope of the requested legal protection and are obvious to specialists in the relevant field of technology. Thus, the scope of the present technical solution is limited only by the scope of the appended claims.

INFORMATION SOURCES

1. Empirical Study on Multi-Task Learning for Text Style Transfer and Paraphrase Generation, Pawel Bujnowski, Kseniia Ryzhova, Hyungtak Choi, Katarzyna Witkowska, Jarosław Piersa, Tymoteusz Krumholc and Katarzyna Beksa. Found on the Internet at: https://aclanthology.Org/2020.coling-industry.6.pdf. 04/20/2022.

Claims

1. A method for automatically generating text for a digital assistant in dialogue systems, performed by at least one computing device and comprising the steps of:

a) obtaining input data containing a natural language source text and the target style of the digital assistant's utterances;

b) encoding the source text, wherein at least tokenization of the text data is performed during encoding;

c) vectorizing the tokens obtained in step b);

d) processing the vector representations of the source text tokens obtained in step c) using a machine learning model based on a neural network trained on text utterances of the digital assistant stylized in accordance with the given target style, wherein an array of vectorized stylized texts is formed during processing;

e) decoding each vectorized stylized text from the array obtained in step d), wherein decoding at least includes converting the vectorized stylized text into tokens and detokenization;

f) filtering the array of stylized texts obtained in step e);

g) ranking the filtered stylized texts and selecting the best stylized text, the selection of the best text being based on the pairwise distance between the source text and each of the possible stylized texts;

h) sending the stylized text obtained in step g) to the dialogue system.

2. The method according to claim 1, characterized in that the array of stylized texts is filtered using regular expressions and a morphological analyzer.

3. The method according to claim 1, characterized in that the array of stylized texts is filtered based on the coincidence of proper names in the source text and each stylized text from the array.

4. The method according to claim 3, characterized in that the coincidence of proper names is determined using the number of unchanged named entities contained in the source text and the stylized text.

5. The method according to claim 4, characterized in that the number of unchanged named entities is determined based on the recognition of named entities in the source text and each stylized text from the array.

6. The method according to claim 1, characterized in that it further comprises the step of checking for the presence of text stylization.

7. The method according to claim 6, characterized in that the presence of text stylization is checked based on determining whether the target style indicators have been replaced.

8. A system for automatically generating text for a digital assistant in dialogue systems, containing:

• at least one processor;

• at least one memory coupled to the processor, which contains machine-readable instructions that, when executed by the at least one processor, enable execution of the method according to any one of claims 1-7.