CN112069813A - Text processing method, device and equipment and computer readable storage medium

Info

Publication number
CN112069813A
Authority
CN
China
Prior art keywords: vector, word, text, processed, gating
Legal status: Granted
Application number
CN202010944900.7A
Other languages
Chinese (zh)
Other versions
CN112069813B (en)
Inventor
王兴光
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010944900.7A
Publication of CN112069813A
Application granted
Publication of CN112069813B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a text processing method, apparatus, device and computer readable storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: dividing the word vector of each word in a text to be processed to form at least a global information sub-vector and a local information sub-vector of the word vector; performing attention calculation on the corresponding word through the global information sub-vector of each word to obtain the attention value of the corresponding word; accumulating the local information sub-vector of the corresponding word with the attention value to obtain the weighted word vector of the corresponding word, and merging the weighted word vectors to form a merged vector; and determining the merged vector as the feature vector of the text to be processed, and performing text processing on the text to be processed using the feature vector. With the method and the apparatus, the feature vector of the text to be processed can be obtained accurately, which improves the accuracy of the processing result in the subsequent text processing.

Description

Text processing method, device and equipment and computer readable storage medium
Technical Field
The embodiments of the application relate to the technical field of the internet, and relate to, but are not limited to, a text processing method, a text processing apparatus, a text processing device and a computer readable storage medium.
Background
In the field of artificial intelligence, when a text is processed, for example, when any one of the text processes such as translation of the text, question-and-answer matching of the text, and search of the text is performed, it is generally necessary to process a vector corresponding to the text in advance to obtain a processed feature vector, and then implement processing of the text based on the processed feature vector.
In the related art, the processing of the vector corresponding to the text is usually implemented with ordered neurons (Ordered Neurons) or a self-attention structure (Self-Attention).
However, none of the vector processing methods in the related art can describe the semantic hierarchical relationship between symbols in a text, and Self-Attention assumes by default that the embedded representation vector (Embedding) of the current symbol must interact fully with all other symbols, so the accuracy of the processing result obtained in the subsequent text processing process is low.
Disclosure of Invention
The embodiment of the application provides a text processing method, a text processing device, text processing equipment and a computer readable storage medium, and relates to the technical field of artificial intelligence. Because the word vector of each word in the text to be processed is divided, at least the global information sub-vector and the local information sub-vector of the word vector are formed, and attention calculation is performed based on the global information sub-vector and the local information sub-vector, the weighted word vector of each word can be accurately obtained, and the accuracy of the processing result in the subsequent text processing process is improved.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a text processing method, which comprises the following steps:
dividing a word vector of each word in a text to be processed to at least form a global information sub-vector and a local information sub-vector of the word vector;
performing attention calculation on corresponding words through the global information subvectors of each word to obtain the attention value of the corresponding words;
accumulating the local information sub-vectors of the corresponding words and the attention values to obtain weighted word vectors of the corresponding words;
merging the weighted word vectors of at least one word in the text to be processed to form a merged vector;
and determining the merged vector as a feature vector of the text to be processed, and performing text processing on the text to be processed by adopting the feature vector.
An embodiment of the present application provides a text processing apparatus, including:
the dividing module is used for dividing the word vector of each word in the text to be processed to at least form a global information sub-vector and a local information sub-vector of the word vector;
the attention calculation module is used for carrying out attention calculation on corresponding words through the global information subvectors of the words to obtain the attention values of the corresponding words;
the accumulation processing module is used for carrying out accumulation processing on the local information sub-vectors of the corresponding words and the attention values to obtain weighted word vectors of the corresponding words;
a merging module, configured to merge the weighted word vector of at least one word in the text to be processed to form a merged vector;
and the processing module is used for determining the merged vector as a feature vector of the text to be processed and performing text processing on the text to be processed by adopting the feature vector.
An embodiment of the present application provides a text processing apparatus, including:
a memory for storing executable instructions; and a processor for implementing the text processing method described above when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the executable instructions to implement the text processing method described above.
The embodiment of the application has the following beneficial effects: the word vectors of each word in the text to be processed are divided to at least form a global information sub-vector and a local information sub-vector of the word vectors, attention calculation is carried out on the corresponding word based on the global information sub-vector to obtain the attention value of the corresponding word, accumulation processing is carried out on the local information sub-vector and the attention value of the corresponding word to obtain the weighted word vector of the corresponding word, and therefore the feature vector of the text to be processed is determined finally according to the weighted word vector. Therefore, the weighted word vector of each word can be accurately obtained through the global information sub-vector and the local information sub-vector, so that the feature vector of the text to be processed can be accurately obtained, and the accuracy of the processing result in the subsequent text processing process is improved.
Drawings
FIG. 1 is an alternative architectural diagram of a text processing system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a Self-Attention model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart of an alternative text processing method provided in the embodiments of the present application;
FIG. 5 is a schematic flow chart of an alternative text processing method provided in the embodiments of the present application;
FIG. 6 is a schematic flow chart of an alternative text processing method according to an embodiment of the present disclosure;
FIG. 7 is an alternative flow chart of a text processing method provided by an embodiment of the present application;
fig. 8 is a structural diagram of a modified Self-Attention model provided in an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
Before explaining a text processing method according to an embodiment of the present application, a text processing method in the related art is first explained:
In the related art, the processing of the vector corresponding to the text is usually realized with ordered neurons or a self-attention structure. The ordered-neuron approach provides a sequence model based on a Long Short-Term Memory network (LSTM), weights the hidden state vectors and models the hierarchical relationship between states at different positions of the vector, thereby obtaining the processed feature vector of the text. In the Self-Attention approach, a model (e.g., the pre-trained language representation model BERT) models the positional relationship between symbols (Token) by introducing Position Embedding, which has shown excellent results on many tasks.
However, none of the vector processing methods in the related art can describe the semantic hierarchical relationships between symbols in a text, such as the context relationships in syntactic analysis, and Self-Attention assumes by default that the embedded representation vector of the current symbol must interact fully with all other symbols, so the local information of the symbol itself is not fully considered and the accuracy of the processing result obtained in the subsequent text processing process is low.
In order to solve at least one of the problems of the text processing methods in the related art, an embodiment of the present application provides a text processing method, which is a Self-Attention calculation method that considers the local and global information of each word in the text to be processed, and introduces a new activation function to divide the vector corresponding to each symbol of Self-Attention (including words and punctuation) into a local part and a global part. The global part takes part in the normal Self-Attention calculation, and the local part is accumulated onto the Self-Attention output in a manner similar to a residual connection.
The embodiment of the application provides a text processing method, which comprises the steps of firstly, dividing a word vector of each word in a text to be processed to at least form a global information sub-vector and a local information sub-vector of the word vector; then, carrying out attention calculation on the corresponding word through the global information subvector of each word to obtain the attention value of the corresponding word; accumulating the local information sub-vectors and the attention values of the corresponding words to obtain weighted word vectors of the corresponding words; combining the weighted word vectors of at least one word in the text to be processed to form a combined vector; and finally, determining the merged vector as a feature vector of the text to be processed, and performing text processing on the text to be processed by adopting the feature vector. Therefore, the weighted word vector of each word can be accurately obtained through the global information sub-vector and the local information sub-vector, so that the feature vector of the text to be processed can be accurately obtained, and the accuracy of the processing result in the subsequent text processing process is improved.
An exemplary application of the text processing device according to the embodiment of the present application is described below. In one implementation, the text processing device according to the embodiment of the present application may be implemented as any terminal such as a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device) or an intelligent robot; in another implementation, the text processing device may also be implemented as a server. Next, an exemplary application in which the text processing device is implemented as a server will be explained.
Referring to fig. 1, fig. 1 is a schematic diagram of an alternative architecture of a text processing system 10 according to an embodiment of the present application. In order to implement text processing on a text to be processed, the text processing system 10 provided in this embodiment of the present application includes a terminal 100, a network 200, and a server 300. A text processing application (for example, a translation application or a text search application) runs on the terminal 100; here, the text processing application is translation software and the text to be processed is a text to be translated. A user can input a text to be translated on the client of the translation software of the terminal 100. After the terminal 100 obtains the text to be translated, it sends the text to be translated to the server 300 through the network 200. The server 300 divides the word vector of each word in the text to be translated to form at least a global information sub-vector and a local information sub-vector of the word vector; performs attention calculation on the corresponding word through the global information sub-vector of each word to obtain the attention value of the corresponding word; accumulates the local information sub-vector of the corresponding word with the attention value to obtain the weighted word vector of the corresponding word; merges the weighted word vectors of at least one word in the text to be translated to form a merged vector; and determines the merged vector as the feature vector of the text to be translated and translates the text to be translated using the feature vector to obtain the translated text. After obtaining the translated text, the server 300 transmits the translated text to the terminal 100 through the network 200, and the terminal 100 displays the translated text on the current interface 100-1.
The text processing method provided by the embodiment of the application also relates to the technical field of artificial intelligence, and can be realized through a natural language processing technology and a machine learning technology in the artificial intelligence technology. Wherein,
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs and the like.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer can simulate or implement human learning behaviour in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
In the embodiment of the application, the text processing method is realized through the natural language processing technology and the machine learning technology in artificial intelligence technology. It should be noted that, in the text processing method of the embodiment of the present application, the step of obtaining the feature vector of the text to be processed may be implemented by Self-Attention. Self-Attention uses an attention mechanism (Attention) to calculate the association between each word and the other words in the text to be processed, that is, to calculate an attention value (Attention Score) between each word and the other words, obtains a weighted vector representation of each word using its attention value, and then feeds the weighted vector representation into a feedforward neural network to obtain a new vector representation which takes the context information in the text to be processed into account.
Fig. 2 is a schematic structural diagram of a Self-Attention model provided in an embodiment of the present application. As shown in fig. 2, the inputs of Self-Attention are the vectors Q (Query), K (Key) and V (Value), all of which come from the same text. First, the dot product between Q and K is calculated by a matrix multiplication (MatMul) 201; it is then scaled by a scaling module (Scale) 202 to prevent the dot product from becoming too large; Mask processing 203 is then applied; the result is normalized into a probability distribution by a Softmax function in processing 204; finally, the probabilities are multiplied with the vector V by a matrix multiplication 205 to obtain a weighted sum, i.e., the weighted vector representation of each word.
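For reference, the flow of fig. 2 can be summarized in the following minimal NumPy sketch (the function and variable names are illustrative and not taken from the patent; the mask handling shown is a common convention and an assumption here):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q, K, V: [T, d] matrices derived from the same input text
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # MatMul 201 + Scale 202
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # Mask processing 203
    weights = softmax(scores, axis=-1)         # Softmax 204: normalize to a probability distribution
    return weights @ V                         # MatMul 205: weighted sum over V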
Fig. 3 is a schematic structural diagram of a server 300 according to an embodiment of the present application, where the server 300 shown in fig. 3 includes: at least one processor 310, memory 350, at least one network interface 320, and a user interface 330. The various components in server 300 are coupled together by a bus system 340. It will be appreciated that the bus system 340 is used to enable communications among the components connected. The bus system 340 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 340 in fig. 3.
The Processor 310 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 352 for communicating with other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates a text processing apparatus 354 stored in the memory 350, where the text processing apparatus 354 may be a text processing apparatus in the server 300, and may be software in the form of programs and plug-ins, and the like, and includes the following software modules: the partitioning module 3541, attention calculation module 3542, accumulation processing module 3543, merging module 3544, and processing module 3545 are logical and thus may be arbitrarily combined or further separated depending on the functionality implemented. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the text processing method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The text processing method provided by the embodiment of the present application will be described below with reference to an exemplary application and implementation of the server 300 provided by the embodiment of the present application. Referring to fig. 4, fig. 4 is an alternative flowchart of a text processing method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 4.
Step S401, dividing the word vector of each word in the text to be processed, and at least forming a global information sub-vector and a local information sub-vector of the word vector.
Here, the text to be processed may be any type of text, and may be, for example, a text to be translated, a text to be retrieved, a question text to be matched with an answer, or the like. The text to be processed comprises at least one word, wherein the word comprises not only words capable of representing text semantic information, but also words such as language words and language auxiliary words which are not used for representing text semantic information. The text to be processed may be a text of any language type, for example, a chinese text, an english text, or the like.
In the embodiment of the application, after the text to be processed is obtained, the text to be processed is segmented to form at least one word, and each word can be a multi-character word or a single character. After the at least one word is obtained, the word vector of each word is matched in a preset word vector library. Here, a word vector (word embedding) is the collective name for a set of language modelling and feature learning techniques in Natural Language Processing (NLP) in which words or phrases from a vocabulary are mapped to vectors of real numbers; conceptually, it involves a mathematical embedding from a space with one dimension per word to a continuous vector space of much lower dimension. That is to say, the preset word vector library stores the word vector of each word, and after the at least one word is obtained by segmentation, the word vector of each word can be matched from the preset word vector library in sequence.
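A minimal sketch of such a lookup, assuming a toy in-memory word vector library (the words and values are made up for illustration):

import numpy as np

# Hypothetical preset word vector library: each word is stored with its real-valued vector.
word_vector_library = {
    "cold": np.array([0.12, -0.45, 0.88, 0.03]),
    "hot":  np.array([0.10,  0.52, -0.31, 0.77]),
}

def lookup_word_vectors(words):
    # Match the word vector of each word, in order, from the preset library.
    return [word_vector_library[w] for w in words]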
In the embodiment of the application, the word vector of each word is further divided into at least two parts, the two parts are the global information sub-vector and the local information sub-vector of the word vector respectively, wherein the word vector is formed after the global information sub-vector and the local information sub-vector are combined, that is, elements of a certain dimension in the word vector of each word are divided to form the global information sub-vector, and part or all of the elements in the remaining dimension in the word vector of the word are divided to form the local information sub-vector.
Step S402, calculating the attention of the corresponding word through the global information subvector of each word to obtain the attention value of the corresponding word.
Here, the attention calculation may be implemented using an attention model: the global information sub-vector of each word is input into the attention model to obtain the attention value of the word, where the attention value represents the weight that the word carries in the whole text to be processed. If the attention value is higher, the word has a higher weight in the whole text to be processed, which indicates that the word is more important and needs more attention in the subsequent processing of the whole text processing model; if the attention value is lower, the word has a lower weight in the whole text to be processed, which indicates that the word is less important and does not need as much attention in the subsequent processing of the whole text processing model.
And step S403, accumulating the local information sub-vectors and the attention values of the corresponding words to obtain weighted word vectors of the corresponding words.
Here, the addition of the local information subvector of the corresponding word and the attention value corresponds to the residual processing of the corresponding word, and the addition of the calculated attention value to the local information subvector obtains a weighted word vector of the corresponding word, which is a vector to which the weight of the word is added. It should be noted that, by performing the accumulation processing on the local information sub-vectors and the attention values of the corresponding words, the weight of the vector of the important word in the text to be processed can be made higher than that of the vector of the unimportant word.
Step S404, merging the weighted word vectors of at least one word in the text to be processed to form a merged vector.
Here, merging the weighted word vector of at least one word in the text to be processed means that the weighted word vector of each word is sequentially connected with the weighted word vector of the next word to form a merged vector with a higher dimension. For example, the text to be processed includes two words a and B, where the weighted word vector of a is an n-dimensional vector, and the weighted word vector of B is an m-dimensional vector, so that after the weighted word vectors of a and B are combined, an n + m-dimensional combined vector is formed, and the elements in the combined vector are the elements in the weighted word vectors of a and B. In short, the weighted word vectors of a and B are merged, that is, the elements in the weighted word vectors of a and B are spliced to form a merged vector with a higher dimension.
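A toy sketch of the accumulation in step S403 and the merging in step S404, with made-up dimensions and values:

import numpy as np

attention_value_a = np.array([0.2, 0.4, 0.1])       # attention output for word A (n = 3 dims, illustrative)
local_subvector_a = np.array([0.0, 0.3, 0.5])       # local information sub-vector of word A
weighted_a = attention_value_a + local_subvector_a  # residual-like accumulation (step S403)

weighted_b = np.array([0.7, 0.1])                   # weighted word vector of word B (m = 2 dims)

merged = np.concatenate([weighted_a, weighted_b])   # n + m = 5-dimensional merged vector (step S404)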
Step S405, determining the merged vector as a feature vector of the text to be processed, and performing text processing on the text to be processed by adopting the feature vector.
Here, the merge vector is a vector that can represent information of the text to be processed, and attention calculation is performed on the merge vector according to the importance of each word, that is, the merge vector is a vector to which a weight is given to a word vector of each word. In this way, if the subsequent calculation of the text processing is performed by merging vectors, the calculation is performed according to the importance of each word in the whole text to be processed, and therefore, the subsequent text processing also takes into account the different importance of each word in the text to be processed.
In the embodiment of the application, the merged vector is determined as the feature vector of the text to be processed, and the feature vector can be input into any text processing model and used as an input value of the text processing model to perform relevant calculation of text processing.
According to the text processing method provided by the embodiment of the application, the word vector of each word in the text to be processed is divided, at least the global information sub-vector and the local information sub-vector of the word vector are formed, attention calculation is performed on the corresponding word based on the global information sub-vector, the attention value of the corresponding word is obtained, accumulation processing is performed on the local information sub-vector and the attention value of the corresponding word, the weighted word vector of the corresponding word is obtained, and therefore the feature vector of the text to be processed is determined finally according to the weighted word vector. Therefore, the weighted word vector of each word can be accurately obtained through the global information sub-vector and the local information sub-vector, so that the feature vector of the text to be processed can be accurately obtained, and the accuracy of the processing result in the subsequent text processing process is improved.
In some embodiments, the text processing system includes a terminal and a server, and a text processing application runs on the terminal, and the text processing application may be, for example, translation software, a text retrieval application, a question and answer matching application, and the text processing application is described as a question and answer matching application. The method comprises the steps that a question-answer matching application runs on a terminal, a server of the question-answer matching application is used as the server, a user inputs questions through a client of the question-answer matching application on the terminal, the server matches answer texts corresponding to the questions in a text base according to the questions input by the user, and the matched answer texts are output to the user.
Fig. 5 is an alternative flowchart of a text processing method provided in an embodiment of the present application, and as shown in fig. 5, the method includes the following steps:
step S501, the terminal obtains a text to be processed input by the user, wherein the text to be processed comprises a question of an answer to be matched.
Here, the terminal may acquire the text to be processed in any manner, for example, a user may input the text to be processed through a text input module on the terminal, where the text input module may be a touch screen input module, or a physical input module such as a keyboard or a mouse; or, the user can input voice information in a voice input mode, and the terminal analyzes the voice information of the user to obtain a text to be processed; or, the user can also input gesture information in a gesture input mode, and the terminal analyzes the gesture information of the user to obtain the text to be processed.
Step S502, the terminal encapsulates the text to be processed in the text processing request.
Here, the text processing request is used to request processing of the text to be processed, that is, the text processing request is used to request matching of an answer to the question.
In step S503, the terminal sends a text processing request to the server.
Step S504, the server analyzes the text processing request to obtain the question of the answer to be matched.
In the embodiment of the application, after the question of the answer to be matched is analyzed, the text of the question is divided to obtain at least one word.
In step S505, the server obtains a word vector of each word in the question from a preset word vector library.
Here, the preset word vector library includes word vectors of at least one word, and after the at least one word is obtained by division, the word vectors of each word may be sequentially matched in the preset word vector library.
In step S506, the server divides the word vector of each word to at least form a global information sub-vector and a local information sub-vector of the word vector.
Step S507, the server performs attention calculation on the corresponding word through the global information subvector of each word to obtain the attention value of the corresponding word.
Step S508, the server performs an accumulation process on the local information sub-vectors and the attention values of the corresponding words to obtain weighted word vectors of the corresponding words.
In step S509, the server merges the weighted word vectors of all words in the question to form a merged vector.
It should be noted that steps S506 to S509 correspond to steps S401 to S404, please refer to the detailed explanation in steps S401 to S404, and the process of steps S506 to S509 is not repeated in this embodiment of the present application.
Step S510, the server determines the merged vector as a feature vector of the text to be processed, and performs question-answer matching on the question by using the feature vector, so as to match a reply text corresponding to the question in the text library.
Here, since the text processing model is a question-answer matching model, after the merged vector of the text to be processed is obtained, the merged vector is input into the question-answer matching model as a feature vector of the text to be processed, and the feature vector of the text to be processed is processed through the question-answer matching model, so that the answer text corresponding to the question is matched in the text library.
In the embodiment of the application, the text library comprises at least one text, each text corresponds to a specific field, and each text is a solution result corresponding to at least one question. Each text corresponds to a text vector, which is a vector used to represent text information. Therefore, by calculating the similarity between the feature vector of the text to be processed and the text vector of each text, or calculating the matching degree between the feature vector of the text to be processed and the text vector of each text, the target text most relevant or matched with the text to be processed, that is, the reply text of the question corresponding to the text to be processed, can be determined.
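A minimal sketch of such a matching step, assuming cosine similarity is used as the relevance measure (the patent does not fix a particular similarity function):

import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_reply_text(feature_vector, text_library):
    # text_library: list of (reply_text, text_vector) pairs; return the most similar reply text.
    return max(text_library, key=lambda item: cosine_similarity(feature_vector, item[1]))[0]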
In step S511, the server transmits the reply text to the terminal as an answer to the question.
And S512, displaying the answer of the question on the current interface by the terminal.
The text processing method provided by the embodiment of the application realizes question and answer matching of the questions corresponding to the text to be processed. The method comprises the steps that a to-be-processed text acquired by a terminal is sent to a server through interaction between the terminal and the server, so that the server is requested to perform text processing on the to-be-processed text, before the text processing, the server divides word vectors of each word in the text to at least form a global information sub-vector and a local information sub-vector of the word vectors, attention calculation is performed on corresponding words based on the global information sub-vectors to obtain attention values of the corresponding words, accumulation processing is performed on the local information sub-vectors and the attention values of the corresponding words to obtain weighted word vectors of the corresponding words, and therefore feature vectors of the to-be-processed text can be finally determined according to the weighted word vectors. Therefore, the weighted word vector of each word can be accurately obtained through the global information sub-vector and the local information sub-vector, so that the feature vector of the text to be processed can be accurately obtained, and the accuracy of the matched reply text in the subsequent text matching process is improved.
Based on fig. 4, fig. 6 is an optional flowchart of the text processing method provided in the embodiment of the present application, and as shown in fig. 6, step S401 may be implemented by the following steps:
step S601, determining a gating vector, wherein the gating vector at least comprises a non-zero interval.
Here, the gating vector is a vector for determining a division position when dividing the word vector, and the gating vector is a vector obtained by a preset setting or a vector transformed by a vector of a device in advance.
In some embodiments, the determining the gating vector in step S601 may be implemented by:
step S6011, a first gating vector and a second gating vector are obtained.
Here, the sum of all elements of the first gating vector is 1, and the elements in the first gating vector are arranged in an increasing order; the sum of all elements of the second gating vector is 1, and the elements in the second gating vector are arranged according to a descending order; the first gating vector has the same dimensions as the second gating vector. That is, the elements in the first gating vector and the second gating vector are arranged in the reverse order, one in increasing order and one in decreasing order.
Step S6012, sequentially multiplying the element at each position in the first gating vector with the element at the corresponding position in the second gating vector to obtain the product at the corresponding position.
Step S6013, sequentially adding the product at each position to a new vector according to the order of the positions in the first gating vector, so as to generate the gating vector.
For example, the first gating vector may be an increasing sequence [0, … …, 1], and the dimension of the first gating vector is N, where each element has a value less than or equal to 1 and the sum of the N elements is 1. When constructing the first gating vector, starting from the first element, the sum of the first element and the second element may be assigned to the second element, the sum of the (updated) second element and the third element to the third element, and so on, so that the n-th number in the first gating vector represents the sum of the first n numbers and the value of the last element is 1; that is, the first gating vector is a vector that increases to 1. Conversely, the second gating vector may be a decreasing sequence [1, … …, 0], and the dimension of the second gating vector is also N, where each element has a value less than or equal to 1 and the sum of the N elements is 1; the second gating vector is a vector that decreases to 0.
Here, assume that the first 10 elements of the first gating vector are 0 (or a very small number close to 0, for example 10^-5) and its last element is 1 (or a number close to but less than 1), and that the last 10 elements of the second gating vector are 0, i.e., the sequence of the entire vector tapers to 0. The first gating vector and the second gating vector may then be multiplied position by position, i.e., the element at position 1 of the first gating vector with the element at position 1 of the second gating vector, the element at position 2 of the first gating vector with the element at position 2 of the second gating vector, and so on, until the element at position N of the first gating vector is multiplied with the element at position N of the second gating vector. Since the elements at the first 10 positions of the first gating vector and at the last 10 positions of the second gating vector are all 0, after the multiplication the first 10 positions and the last 10 positions of the resulting gating vector are 0, and only the middle positions are non-zero.
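A small NumPy sketch of this construction (dimension 8, random illustrative inputs), mirroring the cumulative-sum-of-softmax gates used in the code listing later in this description:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores_l = np.random.randn(8)   # illustrative per-position scores for the first gate
scores_g = np.random.randn(8)   # illustrative per-position scores for the second gate

first_gating_vector  = np.cumsum(softmax(scores_l))        # increasing towards 1
second_gating_vector = 1.0 - np.cumsum(softmax(scores_g))  # decreasing towards 0

gating_vector = first_gating_vector * second_gating_vector  # non-zero only on the overlapping middle interval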
Step S602, determining the non-zero interval as a global position interval.
Here, the non-zero interval in the middle of the gating vector is determined as the global position interval. For example, if the first 10 positions and the last 10 positions of the gating vector obtained by multiplying the element at each position of the first gating vector with the element at the corresponding position of the second gating vector are 0, then the interval corresponding to the positions other than the first 10 and the last 10 positions is determined as the global position interval.
Step S603, determining the subinterval after the non-zero interval in the gating vector as a local position interval. Here, assuming that the start position of the non-zero interval is i and the end position is j, the local position interval is the interval from position j to the last position of the gating vector.
Step S604, dividing the word vector of each word according to the global position interval and the local position interval, and forming at least a global information sub-vector and a local information sub-vector of the word vector.
In some embodiments, step S604 may be implemented by:
step S6041 determines the position of the first element in the global position interval in the gating vector as an initial position.
Step S6042, determine the position of the last element in the global position interval in the gating vector as the termination position.
Step S6043, the word vectors of each word are divided according to the initial position and the end position, and at least the global information sub-vector and the local information sub-vector of the word vectors are formed.
In some embodiments, the method may further comprise the steps of:
step S61, a first number corresponding to the vector dimension of the gating vector is obtained.
Step S62, equally dividing the word vector into a first number of subintervals according to the sequence of the elements in the word vector of each word; each subinterval in the first number of subintervals corresponds to a position in the gating vector in turn.
Correspondingly, step S6043 may be implemented by:
step S6043a combines a first subinterval with an initial position corresponding to the first number of subintervals, a second subinterval with a termination position corresponding to the first number of subintervals, and other subintervals between the first subinterval and the second subinterval to form a global information subvector. In step S6043b, the remaining subintervals after the second subinterval are combined to form a local information subvector.
For example, assume that the word vector X of the input word is a 128-dimensional vector while the first gating vector and the second gating vector are both 8-dimensional, i.e., they are not 128-dimensional; then the gating vector obtained after the first gating vector and the second gating vector interact is also 8-dimensional. In this case, the word vector X may be reshaped into an 8-by-16 representation, i.e., every 16 consecutive elements form one sub-interval, each sub-interval corresponds to one position, and the whole word vector X corresponds to 8 positions; in other words, every 16 elements of the word vector X correspond to one element of the gating vector. Finally, the first position of the non-zero interval of the gating vector is determined as the initial position, the last position as the end position, and the sub-interval corresponding to the initial position in the word vector X, the sub-interval corresponding to the end position in the word vector X, and the elements between these two sub-intervals are combined to form the global information sub-vector. For example, if the non-zero interval is positions 3 to 5 of the gating vector, then the 128-dimensional word vector X is equally divided into 8 sub-intervals, the elements from the 3rd sub-interval to the 5th sub-interval form the global information sub-vector, and the elements after the 5th sub-interval form the local information sub-vector.
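A minimal NumPy sketch of this example (the non-zero interval spanning positions 3 to 5 is assumed, as in the text above):

import numpy as np

X = np.random.randn(128)      # word vector of one word (illustrative values)
chunks = X.reshape(8, 16)     # 8 sub-intervals of 16 consecutive elements, one per gating position

# Assume the non-zero interval of the gating vector runs from position 3 to position 5 (1-based).
global_sub = chunks[2:5].reshape(-1)   # elements of the 3rd to 5th sub-intervals -> global information sub-vector
local_sub  = chunks[5:].reshape(-1)    # elements after the 5th sub-interval -> local information sub-vector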
Based on fig. 4, fig. 7 is an optional flowchart of the text processing method provided in the embodiment of the present application, and as shown in fig. 7, step S402 may be implemented by the following steps:
step S701, regarding each word in the text to be processed, taking the global information subvector of the corresponding word and the word vector of each word as input values, and inputting the input values into the self-attention model.
In step S702, the attention value of the corresponding word is calculated by the self-attention model.
Here, the self-attention model (Self-Attention) may be the model shown in fig. 2 or a variant obtained by modifying the model shown in fig. 2. In the embodiment of the application, the attention value of each word in the whole text to be processed can be calculated through the self-attention model, so that each word is weighted and the more important words carry a higher weight in the whole text to be processed.
With continued reference to fig. 7, in some embodiments, after forming the feature vectors in step S405, the method may further include the steps of:
step S703, determining the merged vector as a feature vector of the text to be processed.
Step S704, the feature vector is divided to form at least a global feature sub-vector and a local feature sub-vector of the feature vector.
Step S705, the attention of the text to be processed is calculated through the global feature sub-vector, and the text attention value of the text to be processed is obtained.
And step S706, performing accumulation processing on the local feature sub-vectors and the text attention values to obtain weighted text vectors of the text to be processed.
It should be noted that, steps S703 to S706 are processes of performing self-attention calculation again on the merged vector determined by the method in the embodiment of the present application, where the self-attention calculation is a process of calculating an attention value and obtaining a weighted word vector in the embodiment of the present application. That is to say, the model for self-attention calculation provided in the embodiment of the present application may be used at any position in the entire text processing model, may perform one self-attention calculation when the entire text processing model is initially input, and may perform one or more self-attention calculations on the obtained intermediate vector at an intermediate position of the text processing model.
Correspondingly, the text processing of the text to be processed by using the feature vector in step S405 can be implemented by the following steps: and step S707, performing text processing on the text to be processed by adopting the weighted text vector.
The text processing method provided by the embodiment of the application can apply the provided self-attention calculation method at any position in the text processing model; that is, a multi-layer self-attention calculation model can be added to the text processing model, or the self-attention calculation model can be used only at selected (spaced) layers of the text processing model. Therefore, according to the text processing requirements, self-attention calculation based on global and local information can be performed on the output vectors under different processing conditions, ensuring that the weighted word vector obtained at each layer better conforms to the semantic representation of the text to be processed and improving the accuracy of the processing result of the whole text processing model.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application provides a text processing method which is suitable for all possible algorithms using Self-Attention.
A word often contains two parts of information: local information and global information. Local information is information whose semantics can be expressed clearly without any context. For example, "cold" and "hot" carry polarity information: without any context, a person can associate "cold" with a low temperature or an aloof character, and "hot" with a high temperature and the like. Global information is information that needs context to further clarify the semantics; for example, whether the word "cold" refers to an aloof character can only be determined from the surrounding context, as when "cold" describes how someone treats others. Based on this assumption about local and global information, the embodiment of the application provides a locally and globally sensitive Self-Attention calculation method.
In the traditional method, the semantics of a word are often directly coded into a vector, and the local information and the global information of the word are not distinguished. The Attention algorithm proposed in recent years, including Self-Attention, ignores local information of words, directly models global information of words, or models global information of sentences with respect to entire sentences of text. This solution does not distinguish between the intrinsic properties of the word itself and the contextual properties of the word, i.e., the local information and the global information mentioned in the embodiments of the present application.
The core idea of the embodiment of the application is an algorithm that distinguishes local information from global information. The algorithm flow can be realized by code such as the following:

import tensorflow as tf

def cumsum_active(X, n_chunks):
    # X: [N, T, D] symbol representations; n_chunks: number of chunks the last dimension is split into
    shape = X.get_shape()
    # per-chunk split distributions for the l gate and the g gate
    l = tf.nn.softmax(tf.layers.dense(X, n_chunks))  # N, T, NChunks
    g = tf.nn.softmax(tf.layers.dense(X, n_chunks))  # N, T, NChunks
    # l gate: cumulative sum, monotonically increasing towards 1
    l = tf.expand_dims(tf.math.cumsum(l, axis=-1), axis=-1)       # N, T, NChunks, 1
    # g gate: 1 - cumulative sum, monotonically decreasing towards 0
    g = tf.expand_dims(1. - tf.math.cumsum(g, axis=-1), axis=-1)  # N, T, NChunks, 1
    # non-zero only where the two gates overlap (the global interval)
    w = l * g
    # align each chunk of X with one gate position
    X = tf.stack(tf.split(X, n_chunks, axis=2), axis=2)  # N, T, NChunks, C/NChunks
    X_context = tf.reshape(X * w, shape)  # global (context) part
    X_local = tf.reshape(X * l, shape)    # local part
    return X_context, X_local
It can be seen from the above code that, for the input X, two gates, an l gate and a g gate, are calculated, where the l gate corresponds to the first gating vector above and the g gate corresponds to the second gating vector above.
Through cumulative summation, the l gate is finally represented as an increasing sequence [0, … …, 1], and the g gate, in contrast to the l gate, is represented as a decreasing sequence [1, … …, 0]. During the calculation a meaningless representation area is introduced. The l gate interacts with the g gate to obtain a non-zero interval; assume the starting position and the ending position of this interval are position i and position j respectively. For an original vector X, its global information is the vector X[i:j], i.e., the elements between position i and position j of the vector X are taken as the vector representation of the global information; the local information is defined as X[j:], i.e., the elements from position j to the end of the vector X are taken as the vector representation of the local information; and the remainder of the vector, X[0:i], i.e., the elements from the initial position to position i, is defined as the meaningless representation area. It should be noted that, since the positions i and j differ from word to word, the meaningless area is computed entirely from the vectors of the other words.
In the embodiment of the application, two vectors can be obtained through the cumsum-activate operation: x_context and x_local, where the x_context[i:j] positions are non-zero and the x_local[j:] positions are non-zero.
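To make this concrete, the following numpy sketch uses hand-picked gate values (in the method itself the gates are computed from the word vectors, as in the code above); the gate values, the resulting non-zero interval (positions 1 and 2), and the 6-dimensional toy vector are illustrative assumptions only:

import numpy as np
l = np.array([0.0, 0.1, 0.5, 1.0, 1.0, 1.0])   # increasing l gate (cumulative sum of a softmax)
g = np.array([1.0, 0.9, 0.5, 0.0, 0.0, 0.0])   # decreasing g gate (1 minus a cumulative sum)
w = l * g                                       # [0., 0.09, 0.25, 0., 0., 0.]: non-zero only on positions 1..2
x = np.arange(1.0, 7.0)                         # a toy 6-dimensional word vector
x_context = x * w                               # non-zero on x[1:3], i.e. the global information part
x_local = x * l                                 # close to x on x[3:], i.e. the local part; x[0:1] stays meaningless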
A detailed explanation of the meaningless part: for some words, a full D-dimensional vector may not be needed for their representation, and fewer dimensions can describe all the information; the vector of such a word is therefore divided into three parts. The meaningless part is obtained entirely through context calculation and does not affect the Attention weight calculation; the global part is also obtained through context calculation, but directly influences the Attention weight calculation; the local part does not participate in context calculation and is the local information inherent to each symbol.
In the embodiment of the present application, the modified Self-Attention method can be applied to a Transformer structure, which is equivalent to modifying the model at two positions. The first position is that, before the Attention weights are calculated, the vector is processed by the cumsum-activate method to extract x_context, and the subsequent Attention operations are executed based on x_context. The second position is that, after the new representation of the word vector's Q (i.e., Query) is obtained, the original representation of the Query is no longer accumulated (the original residual connection), and the local representation of the Query is used instead.
For the modified Self-Attention calculation method, refer to the schematic diagram of the modified Self-Attention model structure shown in FIG. 8, which is based on the Self-Attention model structure of FIG. 2 above. In FIG. 8, the thick solid line portions 801 are the newly added operations, corresponding respectively to the cumsum-activate calculation 81 and the new residual path. The dashed portion 802 represents the original residual connection, which is discarded. In addition, in FIG. 8, Q_l represents the local representation of Q, Q_g represents the global representation of Q, K_g represents the global representation of K (i.e., Key), and V_g represents the global representation of V (i.e., Value); the operations on K and V are the same as those on Q, and the final output result is Q_a.
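Under the description of FIG. 8 above, a minimal single-head sketch of how these pieces could be wired together is given below; it reuses the cumsum_active function and the TensorFlow import shown earlier, and the function name local_global_self_attention, the scaled dot-product form, and the explicit d_k scaling are illustrative assumptions rather than the definitive implementation of the embodiment:

def local_global_self_attention(Q, K, V, n_chunks, d_k):
    # Q, K, V: [N, T, C] representations of Query, Key and Value
    Q_g, Q_l = cumsum_active(Q, n_chunks)   # global and local representations of Q
    K_g, _ = cumsum_active(K, n_chunks)     # global representation of K
    V_g, _ = cumsum_active(V, n_chunks)     # global representation of V
    scores = tf.matmul(Q_g, K_g, transpose_b=True) / (float(d_k) ** 0.5)  # weights computed from global parts only
    weights = tf.nn.softmax(scores, axis=-1)
    Q_a = tf.matmul(weights, V_g) + Q_l     # the local representation of Q replaces the original residual
    return Q_a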
According to the method provided by the embodiment of the application, the input vector is segmented through cumsum-activate and represented as three regions (meaningless, global, and local), so that the global information and the local information contained in the input vector are better preserved. The global information participates in computing the Attention weights, the resulting weights are applied to the meaningless and global parts of the vector, and the local part replaces the original residual structure. The algorithm provided by the embodiment of the application can be applied as a direct modification to existing Self-Attention calculations and can bring a good improvement on some tasks.
It should be noted that there may be other positions worth trying for cumsum-activate within the Self-Attention calculation; for example, cumsum-activate may be recalculated after the weight calculation, that is, the cumsum-activate structure provided in the embodiment of the present application may be superimposed multiple times; cumsum-activate may also be changed into a multilayer structure, or used at separate positions throughout the whole text processing model, and the like.
The three-segment information representation based on Self-Attention and its subsequent use provided by the embodiment of the application, in particular the replacement of the original representation in the residual calculation with the local representation, belong to the core protection scope of the embodiment of the application. The embodiment of the application does not restrict the internal implementation of Self-Attention.
Continuing with the exemplary structure, provided in the embodiments of the present application, of the text processing device 354 implemented as software modules, in some embodiments, as shown in FIG. 3, the software modules stored in the text processing device 354 of the memory 350 may constitute a text processing device in the server 300, including:
a dividing module 3541, configured to divide a word vector of each word in a text to be processed, so as to form at least a global information sub-vector and a local information sub-vector of the word vector;
an attention calculation module 3542, configured to perform attention calculation on a corresponding word through the global information subvector of each word, so as to obtain an attention value of the corresponding word;
an accumulation processing module 3543, configured to perform accumulation processing on the local information sub-vectors of the corresponding words and the attention values to obtain weighted word vectors of the corresponding words;
a merging module 3544, configured to merge the weighted word vectors of at least one word in the text to be processed to form a merged vector;
a processing module 3545, configured to determine the merged vector as a feature vector of the text to be processed, and perform text processing on the text to be processed by using the feature vector.
In some embodiments, the partitioning module is further configured to:
determining a gating vector, wherein the gating vector at least comprises a non-zero interval;
determining the non-zero interval as a global position interval;
determining a subinterval after the non-zero interval in the gating vector as a local position interval;
and dividing the word vector of each word according to the global position interval and the local position interval to at least form a global information sub-vector and a local information sub-vector of the word vector.
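As a minimal sketch of this embodiment (the gating-vector values are assumed, reusing the toy values from the sketch above):

import numpy as np
gate = np.array([0.0, 0.09, 0.25, 0.0, 0.0, 0.0])   # assumed gating vector (product of the l and g gates)
nonzero = np.nonzero(gate)[0]                        # positions forming the non-zero interval
i, j = nonzero[0], nonzero[-1]                       # initial position 1, termination position 2
global_interval = list(range(i, j + 1))              # global position interval: positions 1 and 2
local_interval = list(range(j + 1, len(gate)))       # local position interval: positions 3, 4 and 5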
In some embodiments, the partitioning module is further configured to:
acquiring a first gating vector and a second gating vector; the sum of all elements of the first gating vector is 1, and the elements in the first gating vector are arranged according to a sequentially increasing order; the sum of all elements of the second gating vector is 1, and the elements in the second gating vector are arranged according to a descending order; the dimension of the first gating vector is the same as the dimension of the second gating vector;
sequentially multiplying elements of each position in the first gating vector with elements of a corresponding position in the second gating vector to obtain a product of the corresponding positions;
and sequentially adding the product of each position to a new vector according to the sequence of each position in the first gating vector to generate the gating vector.
In some embodiments, the partitioning module is further configured to:
determining the position of a first element in the global position interval in the gating vector as an initial position;
determining the position of the last element in the global position interval in the gating vector as a termination position;
and dividing the word vector of each word according to the initial position and the termination position to at least form a global information sub-vector and a local information sub-vector of the word vector.
In some embodiments, the apparatus further comprises:
a first quantity obtaining module, configured to obtain a first quantity corresponding to a vector dimension of the gating vector;
an equally dividing module, configured to equally divide the word vector into the subintervals of the first number according to an order of elements in the word vector of each word; wherein each subinterval of the first number of subintervals sequentially corresponds to a position in the gating vector;
the partitioning module is further configured to:
combining a first subinterval corresponding to the initial position in the first number of subintervals, a second subinterval corresponding to the termination position in the first number of subintervals, and the other subintervals between the first subinterval and the second subinterval, to form the global information subvector;
and combining the rest subintervals after the second subinterval to form the local information subvector.
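A minimal sketch of this chunk-wise combination follows; the 12-dimensional word vector, the first quantity of 4, and the subinterval positions are illustrative assumptions only:

import numpy as np
word_vec = np.arange(12.0)                     # a toy 12-dimensional word vector
n_chunks = 4                                   # the first quantity, equal to the gating-vector dimension
chunks = np.split(word_vec, n_chunks)          # 4 equal subintervals of 3 elements each
i, j = 1, 2                                    # assumed initial and termination positions in the gating vector
global_sub = np.concatenate(chunks[i:j + 1])   # subintervals i..j form the global information sub-vector
local_sub = np.concatenate(chunks[j + 1:])     # the remaining subintervals form the local information sub-vector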
In some embodiments, the attention calculation module is further to:
for each word in the text to be processed, the global information subvector corresponding to the word and the word vector of each word are used as input values and input into a self-attention model;
calculating an attention value of the corresponding word through the self-attention model.
In some embodiments, the apparatus further comprises:
the characteristic vector dividing module is used for dividing the characteristic vectors to at least form global characteristic sub-vectors and local characteristic sub-vectors of the characteristic vectors;
the first attention calculation module is used for performing attention calculation on the text to be processed through the global feature sub-vector to obtain a text attention value of the text to be processed;
the first accumulation processing module is used for carrying out accumulation processing on the local feature sub-vectors and the text attention values to obtain weighted text vectors of the text to be processed;
correspondingly, the processing module is further configured to:
and performing text processing on the text to be processed by adopting the weighted text vector.
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method of the embodiment of the present application.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 4.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM), and the like; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A method of text processing, comprising:
dividing a word vector of each word in a text to be processed to at least form a global information sub-vector and a local information sub-vector of the word vector;
performing attention calculation on corresponding words through the global information subvectors of each word to obtain the attention value of the corresponding words;
accumulating the local information sub-vectors of the corresponding words and the attention values to obtain weighted word vectors of the corresponding words;
merging the weighted word vectors of at least one word in the text to be processed to form a merged vector;
and determining the merged vector as a feature vector of the text to be processed, and performing text processing on the text to be processed by adopting the feature vector.
2. The method according to claim 1, wherein the dividing the word vector of each word in the text to be processed into at least a global information sub-vector and a local information sub-vector of the word vector comprises:
determining a gating vector, wherein the gating vector at least comprises a non-zero interval;
determining the non-zero interval as a global position interval;
determining a subinterval after the non-zero interval in the gating vector as a local position interval;
and dividing the word vector of each word according to the global position interval and the local position interval to at least form a global information sub-vector and a local information sub-vector of the word vector.
3. The method of claim 2, wherein determining the gating vector comprises:
acquiring a first gating vector and a second gating vector; the sum of all elements of the first gating vector is 1, and the elements in the first gating vector are arranged according to a sequentially increasing order; the sum of all elements of the second gating vector is 1, and the elements in the second gating vector are arranged according to a descending order; the dimension of the first gating vector is the same as the dimension of the second gating vector;
sequentially multiplying elements of each position in the first gating vector with elements of a corresponding position in the second gating vector to obtain a product of the corresponding positions;
and sequentially adding the product of each position to a new vector according to the sequence of each position in the first gating vector to generate the gating vector.
4. The method of claim 2, wherein said dividing the word vector for each word according to the global position interval and the local position interval to form at least a global information sub-vector and a local information sub-vector of the word vector comprises:
determining the position of a first element in the global position interval in the gating vector as an initial position;
determining the position of the last element in the global position interval in the gating vector as a termination position;
and dividing the word vector of each word according to the initial position and the termination position to at least form a global information sub-vector and a local information sub-vector of the word vector.
5. The method of claim 4, further comprising:
acquiring a first quantity corresponding to the vector dimension of the gating vector;
equally dividing the word vector into the first number of subintervals according to the order of the elements in the word vector for each word; wherein each subinterval of the first number of subintervals sequentially corresponds to a position in the gating vector;
the dividing the word vector of each word according to the initial position and the end position to form at least a global information sub-vector and a local information sub-vector of the word vector includes:
combining a first subinterval corresponding to the initial position in the first number of subintervals, a second subinterval corresponding to the termination position in the first number of subintervals, and the other subintervals between the first subinterval and the second subinterval, to form the global information subvector;
and combining the rest subintervals after the second subinterval to form the local information subvector.
6. The method of claim 1, wherein said performing an attention calculation on a corresponding word through the global information subvector for each word to obtain an attention value of the corresponding word comprises:
for each word in the text to be processed, the global information subvector corresponding to the word and the word vector of each word are used as input values and input into a self-attention model;
calculating an attention value of the corresponding word through the self-attention model.
7. The method according to any one of claims 1 to 6, further comprising:
dividing the feature vectors to form at least global feature sub-vectors and local feature sub-vectors of the feature vectors;
performing attention calculation on the text to be processed through the global feature sub-vector to obtain a text attention value of the text to be processed;
performing accumulation processing on the local feature sub-vectors and the text attention values to obtain weighted text vectors of the text to be processed;
correspondingly, the text processing on the text to be processed by adopting the feature vector comprises the following steps:
and performing text processing on the text to be processed by adopting the weighted text vector.
8. A text processing apparatus, comprising:
the dividing module is used for dividing the word vector of each word in the text to be processed to at least form a global information sub-vector and a local information sub-vector of the word vector;
the attention calculation module is used for carrying out attention calculation on corresponding words through the global information subvectors of the words to obtain the attention values of the corresponding words;
the accumulation processing module is used for carrying out accumulation processing on the local information sub-vectors of the corresponding words and the attention values to obtain weighted word vectors of the corresponding words;
a merging module, configured to merge the weighted word vector of at least one word in the text to be processed to form a merged vector;
and the processing module is used for determining the merged vector as a feature vector of the text to be processed and performing text processing on the text to be processed by adopting the feature vector.
9. A text processing apparatus characterized by comprising:
a memory for storing executable instructions; a processor for implementing the text processing method of any one of claims 1 to 7 when executing executable instructions stored in the memory.
10. A computer-readable storage medium having stored thereon executable instructions for causing a processor to perform the text processing method of any one of claims 1 to 7 when the executable instructions are executed.
CN202010944900.7A 2020-09-10 2020-09-10 Text processing method, device, equipment and computer readable storage medium Active CN112069813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010944900.7A CN112069813B (en) 2020-09-10 2020-09-10 Text processing method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010944900.7A CN112069813B (en) 2020-09-10 2020-09-10 Text processing method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112069813A true CN112069813A (en) 2020-12-11
CN112069813B CN112069813B (en) 2023-10-13

Family

ID=73663341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010944900.7A Active CN112069813B (en) 2020-09-10 2020-09-10 Text processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112069813B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370316A1 (en) * 2017-06-22 2019-12-05 Tencent Technology (Shenzhen) Company Limited Information processing method and related device
CN109086423A (en) * 2018-08-08 2018-12-25 北京神州泰岳软件股份有限公司 A kind of text matching technique and device
CN111368536A (en) * 2018-12-07 2020-07-03 北京三星通信技术研究有限公司 Natural language processing method, apparatus and storage medium therefor
CN111460248A (en) * 2019-01-19 2020-07-28 北京嘀嘀无限科技发展有限公司 System and method for online-to-offline services
CN109978060A (en) * 2019-03-28 2019-07-05 科大讯飞华南人工智能研究院(广州)有限公司 A kind of training method and device of natural language element extraction model
CN110263122A (en) * 2019-05-08 2019-09-20 北京奇艺世纪科技有限公司 A kind of keyword acquisition methods, device and computer readable storage medium
CN110096711A (en) * 2019-05-09 2019-08-06 中国科学技术大学 The natural language semantic matching method of the concern of the sequence overall situation and local dynamic station concern
CN110597991A (en) * 2019-09-10 2019-12-20 腾讯科技(深圳)有限公司 Text classification method and device, computer equipment and storage medium
CN111144097A (en) * 2019-12-25 2020-05-12 华中科技大学鄂州工业技术研究院 Modeling method and device for emotion tendency classification model of dialog text
CN111639186A (en) * 2020-06-05 2020-09-08 同济大学 Multi-class multi-label text classification model and device dynamically embedded with projection gate

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
JIN ZHENG ET AL: "A Hybrid Bidirectional Recurrent Convolutional Neural Network Attention-Based Model for Text Classification", IEEE ACCESS, no. 7, pages 106673 *
MATTHEW E. PETERS ET AL: "Dissecting Contextual Word Embeddings: Architecture and Representation", HTTPS://ARXIV.ORG/ABS/1808.08949, pages 1 - 17 *
VICTOR ZHONG ET AL: "Global-Locally Self-Attentive Dialogue State Tracker", PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, pages 1458 *
YALE SONG ET AL: "Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval", PROCEEDINGS OF THE IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, pages 1979 - 1988 *
HOU ZHENZHEN: "Research on Semantic Understanding Methods for Short Texts Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology, no. 1, pages 138 - 2684 *
WU QIONG: "Research on Image-Text Sentiment Analysis Technology Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology, no. 2, pages 138 - 2844 *
WANG YONGGUI ET AL: "word2vec-ACV: A Word Vector Generation Model for Contextual Meanings of OOV Words", Application Research of Computers, vol. 36, no. 6, pages 1623 - 1628 *
TAO YONGCAI ET AL: "A Text Classification Model Combining Squeeze-and-Excitation Blocks and CNN", Journal of Chinese Computer Systems, vol. 41, no. 9, pages 1925 - 1929 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287698A (en) * 2020-12-25 2021-01-29 北京百度网讯科技有限公司 Chapter translation method and device, electronic equipment and storage medium
CN112287698B (en) * 2020-12-25 2021-06-01 北京百度网讯科技有限公司 Chapter translation method and device, electronic equipment and storage medium
CN113297835A (en) * 2021-06-24 2021-08-24 中国平安人寿保险股份有限公司 Text similarity calculation method, device and equipment and storage medium
CN113297835B (en) * 2021-06-24 2024-03-29 中国平安人寿保险股份有限公司 Text similarity calculation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112069813B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
Daniluk et al. Frustratingly short attention spans in neural language modeling
US9922025B2 (en) Generating distributed word embeddings using structured information
US20240046043A1 (en) Multi-turn Dialogue Response Generation with Template Generation
Bojanowski et al. Alternative structures for character-level RNNs
CN109635253B (en) Text style conversion method and device, storage medium and computer equipment
CN114565104A (en) Language model pre-training method, result recommendation method and related device
US11755909B2 (en) Method of and system for training machine learning algorithm to generate text summary
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN112364660B (en) Corpus text processing method, corpus text processing device, computer equipment and storage medium
CN111930942B (en) Text classification method, language model training method, device and equipment
US11775770B2 (en) Adversarial bootstrapping for multi-turn dialogue model training
EP3411835B1 (en) Augmenting neural networks with hierarchical external memory
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
Shao et al. Collaborative learning for answer selection in question answering
CN112069813B (en) Text processing method, device, equipment and computer readable storage medium
JP2022503812A (en) Sentence processing method, sentence decoding method, device, program and equipment
CN117121016A (en) Granular neural network architecture search on low-level primitives
Thomas et al. Chatbot using gated end-to-end memory networks
CN111368531A (en) Translation text processing method and device, computer equipment and storage medium
CN111881292A (en) Text classification method and device
CN109948163B (en) Natural language semantic matching method for dynamic sequence reading
US20210173837A1 (en) Generating followup questions for interpretable recursive multi-hop question answering
CN117453925A (en) Knowledge migration method, apparatus, device, readable storage medium and program product
CN116975212A (en) Answer searching method and device for question text, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant