CN112084331A - Text processing method, text processing device, model training method, model training device, computer equipment and storage medium

Info

Publication number: CN112084331A
Application number: CN202010881097.7A
Authority: CN (China)
Prior art keywords: vector, target, entity, text, training
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘知远, 苏裕胜, 韩旭, 张正彦, 林衍凯, 李鹏, 孙茂松, 周杰
Assignees (current and original): Tsinghua University; Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tsinghua University and Tencent Technology (Shenzhen) Co., Ltd.
Priority claimed from CN202010881097.7A

Classifications

    • G06F16/353 - Information retrieval of unstructured textual data; clustering or classification into predefined classes
    • G06F16/345 - Information retrieval of unstructured textual data; summarisation for human users
    • G06F18/25 - Pattern recognition; fusion techniques
    • G06F40/211 - Natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/268 - Natural language analysis; morphological analysis
    • G06F40/295 - Natural language analysis; named entity recognition
    • G06F40/30 - Handling natural language data; semantic analysis
    • G06F40/58 - Processing or translation of natural language; use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N3/045 - Neural networks; combinations of networks
    • G06N3/08 - Neural networks; learning methods

Abstract

The application relates to a text processing method and device, a model training method and device, computer equipment and a storage medium. The text processing method comprises the following steps: acquiring a target text to be processed, and coding the target text to obtain a target text coding vector; acquiring a target entity in the target text, and determining a first associated entity corresponding to the target entity; determining a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and the corresponding attention weight; fusing the target text coding vector and the target knowledge representation vector corresponding to the target entity to obtain a target fusion result; and determining a text processing result corresponding to the target text according to the target fusion result. The text processing result of the embodiments of the application can be obtained by processing with an artificial-intelligence-based text processing model, and adopting the method can improve the accuracy of the obtained text processing result.

Description

Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for text processing and model training, a computer device, and a storage medium.
Background
With the development of computer and internet technologies, there are many cases where text needs to be processed, such as translation of text or named entity recognition of text.
At present, texts can be processed based on an artificial intelligence text processing model to obtain a text processing result. However, the text processing results produced by such a model are often inaccurate; that is, the accuracy of the text processing is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a text processing method, a model training method, an apparatus, a computer device and a storage medium.
A method of text processing, the method comprising: acquiring a target text to be processed, and encoding the target text to obtain a target text encoding vector, wherein the target text encoding vector comprises a target semantic vector corresponding to the target text; acquiring a target entity in the target text, and determining a first associated entity corresponding to the target entity; determining a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and a corresponding attention weight, wherein the attention weight is obtained according to the association degree between the target semantic vector and an association representation vector, and the association representation vector is a vector representing entity association; fusing the target text coding vector and a target knowledge representation vector corresponding to the target entity to obtain a target fusion result; and determining a text processing result corresponding to the target text according to the target fusion result.
A text processing apparatus, the apparatus comprising: the target text coding module is used for acquiring a target text to be processed, coding the target text and obtaining a target text coding vector, wherein the target text coding vector comprises a target semantic vector corresponding to the target text; a first entity obtaining module, configured to obtain a target entity in the target text, and determine a first associated entity corresponding to the target entity; a target knowledge representation vector determining module, configured to determine a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and a corresponding attention weight, where the attention weight is obtained according to a degree of association between the target semantic vector and an association representation vector, and the association representation vector is a vector representing an entity association; the first fusion module is used for fusing the target text coding vector and a target knowledge representation vector corresponding to the target entity to obtain a target fusion result; and the text processing result determining module is used for determining a text processing result corresponding to the target text according to the target fusion result.
In some embodiments, the target knowledge representation vector determination module comprises: a relationship network graph obtaining unit, configured to obtain a relationship network graph formed by the first associated entity and the target entity; an incidence relation representation vector obtaining unit, configured to obtain, for a network graph entity in the relational network graph, an incidence relation representation vector representing an incidence relation between the network graph entity and an adjacent entity; the attention weight determining unit is used for obtaining a vector association degree according to the association relation expression vector and the target semantic vector and determining the attention weight corresponding to the adjacent entity according to the vector association degree; a target knowledge representation vector determining unit, configured to determine a target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity; and the extracting unit is used for extracting target knowledge representation vectors corresponding to the target entities from the target knowledge representation vectors corresponding to the network graph entities of the relational network graph.
In some embodiments, the target knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model comprising at least one target hidden layer, the target knowledge representation vector determination unit being configured to: inputting the entity representation vector of the adjacent entity and the incidence relation representation vector into the target hidden layer for processing to obtain a first knowledge representation vector corresponding to the network graph entity; and determining a target knowledge representation vector corresponding to the network graph entity according to the first knowledge representation vector corresponding to the network graph entity and the attention weight corresponding to the adjacent entity.
In some embodiments, the target knowledge representation vector determination unit is to: determining a target calculation direction according to the entity association relationship between the network graph entity and the adjacent entity, wherein the target calculation direction is addition or subtraction; calculating the entity representation vector of the adjacent entity and the incidence relation representation vector according to the target calculation direction to obtain a calculation representation vector corresponding to the network graph entity; and processing the calculation expression vector by using hidden layer parameters in the target hidden layer to obtain a first knowledge expression vector corresponding to the network graph entity.
In some embodiments, the target knowledge representation vector determination unit is to: acquiring an output expression vector corresponding to the adjacent entity, which is output by a previous hidden layer corresponding to the target hidden layer in the knowledge vector determination model; and the target hidden layer processes the calculation expression vector and the output expression vector by using a first hidden layer parameter to obtain a first knowledge expression vector corresponding to the network graph entity.
In some embodiments, the target knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, and the attention weight determination unit is configured to: processing the incidence relation expression vector by using a second hidden layer parameter in the target hidden layer to obtain a key vector; processing the target semantic vector by using a third hidden layer parameter in the target hidden layer to obtain a query vector; calculating to obtain a vector association degree according to the key vector and the query vector; and determining the attention weight corresponding to the adjacent entity according to the vector association degree, wherein the vector association degree and the attention weight corresponding to the adjacent entity form a positive correlation relationship.
In some embodiments, the target text comprises a plurality of participles, the target text encoding vector comprises a sequence of participle encoding vectors, and the sequence of participle encoding vectors comprises a participle encoding vector corresponding to each participle; the first fusion module includes: a knowledge fusion coding vector obtaining unit, configured to perform knowledge fusion processing on a participle coding vector corresponding to a target participle according to a target knowledge representation vector corresponding to the target entity, so as to obtain a knowledge fusion coding vector corresponding to the target participle; an updating unit, configured to use the knowledge fusion coding vector corresponding to the target participle to update the participle coding vector corresponding to the target participle in the participle coding vector sequence, so as to obtain an updated participle coding vector sequence; and a fusion unit, configured to perform fusion processing on the updated participle coding vector sequence and the target semantic vector by using a fusion model, so as to obtain a fused participle coding vector sequence and a fused target semantic vector.
In some embodiments, the text processing result determination module is to: and inputting the fused target semantic vector into a trained text classification model to obtain a text classification result corresponding to the target text.
In some embodiments, the knowledge fusion coding vector obtaining unit is configured to: perform vector splicing processing on the target knowledge representation vector corresponding to the target entity and the participle coding vector corresponding to the target participle, so as to obtain the knowledge fusion coding vector corresponding to the target participle.
A computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the corresponding steps of the above text processing method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps corresponding to the above-mentioned text processing method.
The text processing method and device, the computer equipment and the storage medium fuse the target knowledge representation vector into the target text coding vector, and the target knowledge representation vector is obtained according to the entity representation vector of the first associated entity corresponding to the target entity in the target text and the attention weight. Since the attention weight is obtained from the degree of association between the target semantic vector and the association relation representation vector, the importance of the entity representation vector of an associated entity to the representation vector of the target entity can be determined according to the semantics of the target text, and the attention weight is determined according to that importance. The target knowledge vector obtained based on the attention weight and the entity representation vector therefore promotes the understanding of the semantics of the target text; the text processing result corresponding to the target text is obtained based on the target fusion result, and the accuracy of the text processing result is thereby improved.
A method of text processing model training, the method comprising: acquiring a training text and a standard text processing result corresponding to the training text; inputting the training text into a text coding model to obtain a training text coding vector, wherein the training text coding vector comprises a training semantic vector corresponding to the training text; acquiring a training entity corresponding to the training text, and determining a second associated entity corresponding to the training entity; inputting the entity representation vector corresponding to the second associated entity into a knowledge vector determination model, and determining a training knowledge representation vector corresponding to the training entity according to the entity representation vector and a corresponding attention weight, wherein the attention weight is obtained according to the association degree between the training semantic vector and an association representation vector, and the association representation vector is a vector representing entity association; inputting the training text coding vector and a training knowledge representation vector corresponding to the training entity into a fusion model for fusion processing to obtain a training fusion result; processing the training fusion result according to a task processing model to obtain a training processing result; and adjusting parameters of the task processing model or adjusting parameters of the task processing model and a language model according to the training processing result and the standard text processing result, wherein the language model comprises the text coding model, the knowledge vector determination model and the fusion model.
A text processing model training apparatus, the apparatus comprising: the training text acquisition module is used for acquiring a training text and a standard text processing result corresponding to the training text; a training text coding vector obtaining module, configured to input the training text into a text coding model to obtain a training text coding vector, where the training text coding vector includes a training semantic vector corresponding to the training text; the second entity acquisition module is used for acquiring a training entity corresponding to the training text and determining a second associated entity corresponding to the training entity; a training knowledge representation vector determination module, configured to input an entity representation vector corresponding to the second associated entity into a knowledge vector determination model, and determine a training knowledge representation vector corresponding to the training entity according to the entity representation vector and a corresponding attention weight, where the attention weight is obtained according to a degree of association between the training semantic vector and an association representation vector, and the association representation vector is a vector representing an entity association; a training fusion result obtaining module, configured to input the training text coding vector and the training knowledge representation vector corresponding to the training entity into a fusion model for fusion processing, so as to obtain a training fusion result; a training processing result obtaining module, configured to process the training fusion result according to the task processing model to obtain a training processing result; and the adjusting module is used for adjusting parameters of the task processing model according to the training processing result and the standard text processing result, or adjusting parameters of the task processing model and a language model, wherein the language model comprises the text coding model, the knowledge vector determining model and the fusion model.
A computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the corresponding steps of the above text processing model training method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the corresponding steps of the above-mentioned text processing model training method.
According to the text processing model training method and device, the computer equipment and the storage medium, the language model comprises the text coding model, the knowledge vector determination model and the fusion model, and the knowledge vector determination model can determine the training knowledge representation vector corresponding to the training entity according to the entity representation vector and the corresponding attention weight. The attention weight is obtained according to the degree of association between the training semantic vector and the association relation representation vector, where the association relation representation vector is a vector representing an entity association relation. Because the attention weight is obtained according to this degree of association, the importance of the entity representation vector of an associated entity to the representation vector of the training entity can be determined according to the semantics of the training text, and the attention weight is determined according to that importance. The training knowledge vector obtained in this way can better promote the understanding of the semantics of the training text, so the text comprehension capability of the obtained language model and task processing model is improved, and the accuracy of the text processing result is improved.
Drawings
FIG. 1 is a diagram of an application environment of a text processing method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for text processing in one embodiment;
FIG. 3A is a schematic diagram of the encoding principle of the BERT-based text encoding model in one embodiment;
FIG. 3B is a diagram illustrating an embodiment of determining a text processing result corresponding to a target text based on a target fusion result;
FIG. 3C is a diagram illustrating an embodiment of determining a text processing result corresponding to a target text based on a target fusion result;
FIG. 3D is a diagram illustrating an embodiment of determining a text processing result corresponding to a target text based on a target fusion result;
FIG. 4 is a schematic flow chart illustrating the process of determining a target knowledge representation vector corresponding to a target entity according to an entity representation vector of a first associated entity and a corresponding attention weight in another embodiment;
FIG. 5 is a schematic diagram of a relationship network graph in one embodiment;
FIG. 6 is a flowchart illustrating a method for training a text processing model according to one embodiment;
FIG. 7 is an interface diagram for text translation in one embodiment;
FIG. 8 is a diagram illustrating the determination of a knowledge representation vector based on knowledge in a knowledge-graph in one embodiment;
FIG. 9 is a diagram illustrating computation of a knowledge representation vector in one embodiment;
FIG. 10 is a block diagram showing a configuration of a text processing apparatus according to an embodiment;
FIG. 11 is a block diagram showing the construction of a text processing model training apparatus according to an embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e. the language that people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-domain cross discipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specially studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental approach for giving computers intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the machine learning technology of artificial intelligence and natural language processing, and is specifically explained by the following embodiment:
the text processing method and the text processing model training method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. When text processing is required, the terminal 102 may send a text processing request to the server 104, the server 104 may be deployed with a text processing model, the text processing model includes a language model and a task processing model, the text processing model may be used to process a target text to be processed to obtain a text processing result, and the server 104 may send the text processing result to the terminal 102 or may store the text processing result. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a text processing method is provided, which is exemplified by the application of the method to the server 104 in fig. 1, and includes the following steps:
step S202, a target text to be processed is obtained, the target text is coded, and a target text coding vector is obtained and comprises a target semantic vector corresponding to the target text.
The target text is a text to be processed, and the language of the target text can be determined according to actual needs; for example, the target text may be a Chinese sentence or a Japanese sentence. One target text may include a plurality of participles, and the target text may be segmented to obtain a plurality of participles (tokens). Plural means at least two. The segmentation mode can be a dictionary-based or statistics-based segmentation mode. For example, assuming that the target text is "today is sunday", the segmented word segmentation sequence may be "today/is/sunday".
Encoding refers to converting text into vectors for representation. The target text encoding vector is a vector obtained by encoding the target text. The target semantic vector is a vector representing the semantics of the target text. The target semantic vector is obtained by coding according to each participle of the target text, and semantic information of each participle in the text is fused. The target text encoding vector can also comprise a word segmentation encoding vector sequence, the word segmentation encoding vector sequence comprises word segmentation encoding vectors corresponding to word segmentation, and the word segmentation encoding vectors refer to vectors obtained by encoding word segmentation. And sequencing the vectors obtained by coding the participles according to the sequence of the corresponding participles in the target text to form a participle coding vector sequence.
Specifically, the server may obtain a target text to be processed, segment the target text into a participle sequence with semantic rationality, and encode the target text by using a text coding model to obtain a target text coding vector, where the target text coding vector includes the participle coding vector sequence and a target semantic vector. The text coding model may be a BERT (Bidirectional Encoder Representations from Transformers) based bidirectional coding model. When a target text is given and the target text comprises N participles, the formula for obtaining the participle coding vector sequence can be represented as formula (1), where T-Encoder denotes the text coding model and w_j denotes the jth participle; that is, each participle represented in text form is encoded by the text coding model into a vector.
{t_1, t_2, …, t_N} = T-Encoder({w_1, w_2, …, w_N})    (1)
For example, FIG. 3A is a schematic diagram of the encoding principle of the BERT-based text encoding model provided for some embodiments. The server may segment the target text to obtain N tokens, where N is a positive integer, denoted Tok1, Tok2, …, TokN. Before Tok1, a [CLS] mark is added, where [CLS] represents classification; E represents an embedded vector, for example the embedded vector of [CLS] is E[CLS]; T represents an encoded vector obtained by encoding; and C is a semantic vector, the semantic representation corresponding to [CLS], namely the semantic encoding vector. That is, the server may input a target text comprising N participles into the text coding model, which outputs the semantic representation C of [CLS] corresponding to the text (called the target semantic vector) and the participle coding vector T corresponding to each participle.
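As a minimal sketch of this encoding step, the Hugging Face transformers library can produce both outputs; the library, the model name and the pooling choice are illustrative assumptions, not part of the disclosed embodiments, which only specify a BERT-based bidirectional coding model:

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("今天是星期日", return_tensors="pt")  # "today is sunday", the example above
outputs = encoder(**inputs)

token_vectors = outputs.last_hidden_state     # one coding vector T per token, including [CLS]
target_semantic_vector = token_vectors[:, 0]  # the vector at [CLS]: the semantic vector C
```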
Step S204, a target entity in the target text is obtained, and a first associated entity corresponding to the target entity is determined.
Specifically, an Entity (Entity) refers to something having a specific meaning, and may include at least one of a person name, a place name, an organization name, a proper noun, or the like, for example. The target entities are entities in the target text, and one target text may include one or more target entities. For example, assuming that the target text is "monkey likes to eat banana", the target entities may include "monkey" and "banana".
The first associated entity refers to an entity having an association relationship with the target entity. The association relationship may be, for example, an affiliation or a membership relationship. The associated entity corresponding to the target entity may be obtained according to a Knowledge Graph. The knowledge graph can describe the association relationships between entities, so the associated entities having an association relationship with the target entity can be obtained from the knowledge graph. The first associated entity may include at least one of an entity in the knowledge graph that has a direct association relationship with the target entity and an entity that has an indirect association relationship. A direct association relationship means that the target entity and the first associated entity are connected by an edge; an indirect association relationship means that there is an intermediate associated entity between the target entity and the first associated entity. For example, assuming that the target entity is A, A's daughter is B, and B's son is C, that is, there is an edge between A and B and an edge between B and C, then B is a first associated entity having a direct association relationship with A, and C is a first associated entity having an indirect association relationship with A. The degree of the association relationship between entities can be represented by the order: an entity having a direct association relationship with the target entity is called a first-order associated entity of the target entity, and an entity having a direct association relationship with a first-order associated entity is called a second-order associated entity of the target entity. The first associated entity may be an associated entity whose association order with the target entity is within a preset association order, and the preset order may be set as needed, for example, to 2.
Specifically, the server may perform Named Entity Recognition (NER) on the target text to obtain the target Entity. The server may obtain, as the first associated entity, an associated entity in the knowledge graph whose associated order with the target entity is within a preset order.
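A sketch of this retrieval over a toy knowledge graph; the triples, the entity names and the preset order of 2 are illustrative assumptions:

```python
from collections import deque

# Toy knowledge graph as (head, relation, tail) triples.
TRIPLES = [
    ("monkey", "likes_to_eat", "banana"),
    ("banana", "is_a", "fruit"),
    ("fruit", "contains", "sugar"),
]

def associated_entities(target, max_order=2):
    """Collect entities whose association order with `target` is within
    `max_order`, following edges in either direction."""
    neighbors = {}
    for h, _, t in TRIPLES:
        neighbors.setdefault(h, set()).add(t)
        neighbors.setdefault(t, set()).add(h)
    seen, frontier, result = {target}, deque([(target, 0)]), set()
    while frontier:
        entity, order = frontier.popleft()
        if order == max_order:
            continue  # do not expand beyond the preset association order
        for nxt in neighbors.get(entity, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.add(nxt)
                frontier.append((nxt, order + 1))
    return result

print(associated_entities("monkey"))  # {'banana', 'fruit'}
```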
Step S206, determining a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and the corresponding attention weight, where the attention weight is obtained according to the association degree between the target semantic vector and the association representation vector, and the association representation vector is a vector representing the entity association.
Specifically, the entity representation vector refers to an embedded vector for representing an entity, where an embedded vector (Embedding) is the result of mapping from a semantic space to a vector space; that is, an entity can be represented by a low-dimensional vector, such as [0.2, 0.3, …, 0.6]. The association relation representation vector is a vector representing a relation between entities. The entity representation vector and the association relation representation vector may be obtained by a knowledge representation learning method, for example the TransE method. Knowledge representation learning is representation learning oriented to the entities and relations in a knowledge base, and TransE regards the relation in a triple (head, relation, tail) as a translation from the head entity to the tail entity.
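The TransE intuition can be sketched with toy vectors; the dimension and values are illustrative assumptions:

```python
import numpy as np

# TransE models a triple (h, r, t) as a translation h + r ≈ t, so the
# plausibility of a triple can be scored by -||h + r - t||.
rng = np.random.default_rng(0)
dim = 8
h = rng.normal(size=dim)                       # head-entity representation vector
r = rng.normal(size=dim)                       # association relation representation vector
t = h + r + rng.normal(scale=0.01, size=dim)   # a tail entity consistent with (h, r)

score = -np.linalg.norm(h + r - t)  # close to 0 => plausible triple
print(round(score, 4))
```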
In the attention mechanism, one output can be obtained from a plurality of inputs, and the attention weight represents how much attention is paid to an input when the output is determined: the larger the attention weight corresponding to an input, the more attention is paid to that input when determining the output. That is, the attention weight represents the influence of the input on the output, and the attention weight is positively correlated with that influence. The greater the attention weight corresponding to an input, the greater its impact on the output and the more critical the input is to the output; conversely, the smaller the attention weight of an input, the less critical that input is to the output. The attention weight may be derived from an Attention Model. The target knowledge representation vector is an output, and the entity representation vector corresponding to the first associated entity may be used as an input, or an input may be obtained from the entity representation vector corresponding to the first associated entity. For example, the entity representation vector corresponding to the first associated entity may be input into the knowledge vector determination model to obtain a first knowledge representation vector, and the first knowledge representation vector may be used as the input.
The attention weight is obtained according to the degree of association between the target semantic vector and the association relation representation vector; the attention weight is positively correlated with the degree of association, so the greater the degree of association, the greater the corresponding attention weight. The degree of association between the target semantic vector and the association relation representation vector may be a direct degree of association or an indirect degree of association. A direct degree of association means calculating a degree of association, for example a similarity, between the target semantic vector and the association relation representation vector as the degree of association between them. An indirect degree of association means further processing the target semantic vector and the association relation representation vector, and obtaining the degree of association based on the processed target semantic vector and the processed association relation representation vector. For example, the target semantic vector and the association relation representation vector may be input into a trained model, and the model parameters used to process the target semantic vector and the association relation representation vector.
The association relation representation vector may be a vector representing the association relation between the first associated entity and the target entity, or a vector representing a relation between first associated entities. The attention weight corresponding to a first associated entity may be obtained from the attention weights of the edges on the shortest path from the first associated entity to the target entity, for example by multiplying the attention weights of the edges on that path. For example, assuming A → B → C with target entity A, the attention weight of the first associated entity C is calculated as follows: the attention weight obtained based on the degree of association between the association relation representation vector of B and C and the target semantic vector is multiplied by the attention weight obtained based on the degree of association between the association relation representation vector of A and B and the target semantic vector, giving the attention weight corresponding to the first associated entity C.
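A minimal numeric sketch of this weighting, assuming a dot product as the degree of association and a softmax as the positive mapping to attention weights (both assumptions; the embodiments leave the exact functions open):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
dim = 8
target_semantic_vector = rng.normal(size=dim)   # C from the text coding model
relation_vectors = rng.normal(size=(3, dim))    # association relation representation vectors of 3 edges

# Association degree as a dot product; attention weights are positively
# correlated with it via the softmax.
edge_weights = softmax(relation_vectors @ target_semantic_vector)

# For an indirectly associated entity whose shortest path to the target
# entity passes over edges 0 and 1, multiply the per-edge attention weights.
indirect_entity_weight = edge_weights[0] * edge_weights[1]
print(edge_weights, indirect_entity_weight)
```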
The knowledge representation vector is a vector obtained based on knowledge and is used for representing the knowledge. The associated entities corresponding to the target entity and the association relationships between the target entity and the associated entities are knowledge and can be obtained based on a knowledge graph, so a vector obtained according to the entity representation vector of the associated entity corresponding to the target entity and the association relation representation vector may be called a knowledge representation vector.
Specifically, the server may input an entity representation vector of the first associated entity, an association representation vector between the first associated entity and the target entity, and an association representation vector between the first associated entity into the knowledge vector determination model, and the knowledge vector determination model determines the attention weight corresponding to each first associated entity according to the association degree between the target semantic vector and the association representation vector. That is, the degree of association between the target semantic vector and the association relation expression vector can express the influence of the relation between the entities on the semantics of the target text, and for the relation more related to the semantics, the relation is more useful knowledge, and the associated entities corresponding to the relation need to be focused on. The knowledge vector determination model may be a Graph Neural Network (GNN) model, among others.
In some embodiments, a relational network graph composed of the first associated entity and the target entity may be obtained; the entity representation vector corresponding to each network graph entity in the relational network graph and the association relation representation vectors representing the associations between the network graph entities are obtained and input into the graph neural network model. The manner of determining the knowledge representation vector of a network graph entity by the graph neural network model includes: for any network graph entity, processing the entity representation vector of an adjacent entity of the network graph entity and the association relation representation vector representing the relation between the network graph entity and the adjacent entity, based on the model parameters, to obtain a first knowledge representation vector corresponding to the network graph entity. When there are multiple adjacent entities, multiple first knowledge representation vectors corresponding to the network graph entity are obtained, so the attention weight corresponding to each first knowledge representation vector can be obtained, and weighted calculation is performed according to the first knowledge representation vectors and the corresponding attention weights to obtain the target knowledge representation vector corresponding to the network graph entity. For example, the formula for determining the target knowledge representation vector corresponding to the target entity can be represented as formula (2), where e_j on the left side of the equation represents the target knowledge representation vector of the jth network graph entity, e_j on the right side represents the entity representation vector of the jth network graph entity, DK-Encoder (Dynamic Knowledge Context Encoder) represents the knowledge vector determination model, the target text comprises N participles, and M is the number of entities.

{e_1, e_2, …, e_M} = DK-Encoder({e_1, e_2, …, e_M}, {t_1, t_2, …, t_N})    (2)
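A sketch of one such aggregation step for a single network graph entity, assuming addition as the combination of entity and relation vectors, a single tanh hidden layer, and a dot-product/softmax attention (all illustrative assumptions within the embodiments' options):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
dim = 8
target_semantic_vector = rng.normal(size=dim)
neighbor_entity_vecs = rng.normal(size=(4, dim))  # entity representation vectors of adjacent entities
relation_vecs = rng.normal(size=(4, dim))         # association relation representation vectors
W = rng.normal(size=(dim, dim)) * 0.1             # hidden layer parameters of the target hidden layer

# First knowledge representation vectors: each neighbor's entity vector is
# combined with its relation vector and passed through the hidden layer.
first_knowledge_vecs = np.tanh((neighbor_entity_vecs + relation_vecs) @ W)

# Attention weights from the association degree between the target semantic
# vector and the relation vectors; the weighted sum gives the target
# knowledge representation vector of the network graph entity.
attn = softmax(relation_vecs @ target_semantic_vector)
target_knowledge_vector = attn @ first_knowledge_vecs
```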
And S208, fusing the target text coding vector and the target knowledge representation vector corresponding to the target entity to obtain a target fusion result.
Fusion refers to combining together, and the fusion manner may be set as needed; for example, it may be at least one of splicing or weighted addition. For example, splicing may be performed and the splicing result input into a fusion model for processing; the fusion model may be, for example, a multilayer perceptron model, a recurrent neural network model, or a convolutional neural network model. For example, the fusion model may include P aggregators, each of which mixes two different heterogeneous features, namely the target text coding vector and the target knowledge representation vector corresponding to the target entity, through an MLP (multilayer perceptron); P may be set as needed.
In some embodiments, the target text coding vector includes a participle coding vector sequence, and fusing the target text coding vector and the target knowledge representation vector corresponding to the target entity to obtain the target fusion result includes: performing knowledge fusion processing on the participle coding vector corresponding to the target participle according to the target knowledge representation vector corresponding to the target entity, to obtain a knowledge fusion coding vector corresponding to the target participle; using the knowledge fusion coding vector corresponding to the target participle to update the participle coding vector corresponding to the target participle in the participle coding vector sequence, obtaining an updated participle coding vector sequence; and performing fusion processing on the updated participle coding vector sequence and the target semantic vector by using a fusion model, to obtain the fusion-processed participle coding vector sequence and the fusion-processed target semantic vector.
The target participle refers to the participle corresponding to the target entity; since the participle coding vectors are the coding vectors corresponding to the participles in the target text and the target entity is an entity in the target text, the participles of the target text include the participle corresponding to the target entity, and therefore the participle coding vector corresponding to the target participle can be obtained. For example, for the target text "monkey likes to eat banana", the participles include "monkey", "likes", "eat" and "banana", and the participle coding vector sequence comprises the participle coding vectors corresponding to "monkey", "likes", "eat" and "banana". After the target knowledge representation vector corresponding to the target entity "monkey" is obtained, knowledge fusion processing can be performed on the participle coding vector corresponding to "monkey" and the target knowledge representation vector corresponding to "monkey".
Knowledge fusion processing refers to fusing the target knowledge representation vector into the participle coding vector, and may be splicing or weighted summation, such as vector addition. For example, the target knowledge representation vector corresponding to the target entity and the participle coding vector corresponding to the target participle may be spliced: they may be spliced directly to obtain the knowledge fusion coding vector corresponding to the target participle, or they may first be processed and then spliced. For example, the fusion model may include two multi-head self-attention models, one used to process the participle coding vectors in the participle coding vector sequence and the other used to process the target knowledge representation vector; the participle coding vector obtained through multi-head self-attention processing and the target knowledge representation vector obtained through multi-head self-attention processing are spliced to obtain the knowledge fusion coding vector.
For example, the splicing manner may be horizontal splicing. Assume that the participle coding vector corresponding to "monkey" is a k-dimensional vector denoted (a_1, a_2, …, a_k), and the target knowledge representation vector corresponding to "monkey" is a j-dimensional vector denoted (b_1, b_2, …, b_j); the knowledge fusion vector obtained by horizontal splicing is then the (k + j)-dimensional vector (a_1, a_2, …, a_k, b_1, b_2, …, b_j).
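A one-step sketch of this horizontal splicing with toy values (k = 3, j = 2, values illustrative):

```python
import numpy as np

token_vec = np.array([0.1, 0.2, 0.3])   # participle coding vector for "monkey" (k = 3)
knowledge_vec = np.array([0.4, 0.5])    # target knowledge representation vector (j = 2)

# Horizontal splicing yields the (k + j)-dimensional knowledge fusion coding vector.
fused = np.concatenate([token_vec, knowledge_vec])
print(fused)  # [0.1 0.2 0.3 0.4 0.5]
```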
Specifically, after the server obtains the knowledge fusion coding vector corresponding to the target participle, the knowledge fusion coding vector is used to replace the participle coding vector corresponding to the target participle in the participle coding vector sequence, and the updated participle coding vector sequence is obtained. The server can input the updated participle coding vector sequence and the target semantic vector into a fusion layer of the fusion model, and fusion processing is performed to obtain at least one of the fusion-processed participle coding vector sequence and the fusion-processed target semantic vector. The fusion model can be a BERT-based model, and the updated participle coding vector sequence and the target semantic vector can be processed by continuing to use the BERT-based model. For example, the fusion model may include a multilayer perceptron (MLP). The formula by which the fusion model obtains the target fusion result can be expressed as formula (3), where K-Encoder (Knowledge Fusion Encoder) represents the fusion model, t'_j represents a fusion-processed participle coding vector, and e'_j represents a fusion-processed knowledge representation vector; the fusion-processed participle coding vector corresponding to an entity and the fusion-processed knowledge representation vector are spliced together as the vector representation corresponding to that entity.

{t'_1, …, t'_N}, {e'_1, …, e'_M} = K-Encoder({t_1, …, t_N}, {e_1, …, e_M})    (3)
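A sketch of one MLP aggregator mixing the two heterogeneous features for a single entity; the parameter shapes and tanh nonlinearity are assumptions, not the patent's specification:

```python
import numpy as np

rng = np.random.default_rng(3)
k, j = 6, 4
W_t = rng.normal(size=(k + j, k)) * 0.1  # aggregator parameters (assumed shapes)
W_e = rng.normal(size=(k + j, j)) * 0.1

token_vec = rng.normal(size=k)        # participle coding vector of an entity's participle
knowledge_vec = rng.normal(size=j)    # its target knowledge representation vector

# Mix the two heterogeneous features through an MLP and read out a
# fusion-processed vector of each kind.
mixed = np.concatenate([token_vec, knowledge_vec])
fused_token_vec = np.tanh(mixed @ W_t)       # fusion-processed participle coding vector
fused_knowledge_vec = np.tanh(mixed @ W_e)   # fusion-processed knowledge representation vector

# Spliced together as the vector representation corresponding to the entity.
entity_representation = np.concatenate([fused_token_vec, fused_knowledge_vec])
```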
And step S210, determining a text processing result corresponding to the target text according to the target fusion result.
The text processing result may be determined according to an application scenario, for example, the text processing result may include at least one of a text labeling result, a text classification result, a sentence relation determination result, or a text generation result. Text annotation refers to labeling a text, for example, performing Named Entity Recognition (NER) on the text or performing part-of-speech Recognition on a participle in the text. The text classification result refers to classifying the text, such as sentiment classification or spam classification. Emotion classification classifies text as either text expressing positive emotions or text expressing negative emotions. The spam classification classifies mail as either spam or non-spam. The sentence relation determination result may be used to determine a relation between sentences, for example, the relation between sentences may be a relation between a question and an answer, or a relation between contexts, for example, to determine whether the second sentence is next to the first sentence. The text generation result is, for example, translation of the text or generation of a summary of the text.
Specifically, the target fusion result may be a fusion-processed participle coding vector sequence or a fusion-processed target semantic vector. The target fusion result is determined according to a specific scene. The server can input the target fusion result into the task processing model, and the task processing model processes the target fusion result to obtain a text processing result.
In some embodiments, for the text classification task, the target semantic vector after the fusion processing is a target fusion result, and the target semantic vector after the fusion processing may be input into the text classification model to obtain a text classification result. For example, as shown in fig. 3B, for a general (normal) natural language processing task, such as a text classification task, a text processing result may be determined according to the expression of [ CLS ].
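A minimal sketch of such a classification head over the fusion-processed [CLS] vector; the linear-plus-softmax form, dimensions and label meanings are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(4)
dim, num_classes = 8, 2
fused_semantic_vector = rng.normal(size=dim)    # fusion-processed [CLS] representation
W = rng.normal(size=(dim, num_classes)) * 0.1   # classification layer parameters
b = np.zeros(num_classes)

class_probs = softmax(fused_semantic_vector @ W + b)
predicted = int(np.argmax(class_probs))  # e.g. 0 = negative emotion, 1 = positive emotion
```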
In some embodiments, for the text generation result, the segmentation coding vector sequence after the fusion processing may be input into the decoding model, so as to obtain the text generation result.
In some embodiments, for named entity recognition, the fusion-processed participle coding vector sequence is the target fusion result, and an entity identifier may be used to mark the participle coding vectors corresponding to entities in the fusion-processed participle coding vector sequence, so that named entity recognition is performed according to the participle coding vectors corresponding to the entities. For example, an [ENT] character can be added before and after each entity in the target text to serve as the entity identifier, and during prediction of the downstream task, the vector at the [ENT] identifier is used as the entity representation vector of the entity for named entity recognition. For example, as shown in FIG. 3C, assuming that the target text is "Steph Curry and Klay Thompson led the Warriors to the 2015 NBA Championship" and the target entity is "Steph Curry", after adding the entity identifiers the text is denoted as "[ENT] Steph Curry [ENT] and [ENT] Klay Thompson [ENT] led the Warriors to the 2015 NBA Championship".
In some embodiments, for entity relationship extraction, the fusion-processed participle coding vector sequence is the target fusion result, and relationship identifiers may be used to mark the participle coding vectors corresponding to entities in the fusion-processed participle coding vector sequence, so that when entity relationship extraction is performed, the participle coding vector corresponding to the head entity (head) and the participle coding vector corresponding to the tail entity (tail) are extracted according to the relationship identifiers and spliced. The relationship identifiers include a head entity identifier and a tail entity identifier; for each entity, the head entity identifier, e.g., [HD], may be added before the entity, and the tail entity identifier, e.g., [TL], may be added after the entity. During prediction of the downstream task, the participle coding vector corresponding to the head entity [HD] and the participle coding vector corresponding to the tail entity [TL] can be extracted and spliced together as the final representation from which the relation is extracted, thereby obtaining the relation between the entities. For example, as shown in FIG. 3D, assuming that the target text is "Steph Curry and Klay Thompson led the Warriors to the 2015 NBA Championship", after adding the relationship identifiers it is denoted as "[HD] Steph Curry [TL] and [HD] Klay Thompson [TL] led the Warriors to the 2015 NBA Championship".
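A sketch of the marker-then-splice step; the vectors are random stand-ins for the fusion-processed participle coding vectors, and reading each entity's representation from its [HD] position is an illustrative assumption:

```python
import numpy as np

tokens = ["[HD]", "Steph", "Curry", "[TL]", "and",
          "[HD]", "Klay", "Thompson", "[TL]",
          "led", "the", "Warriors", "to", "the", "2015", "NBA", "Championship"]

rng = np.random.default_rng(5)
vecs = rng.normal(size=(len(tokens), 8))  # stand-ins for fusion-processed participle coding vectors

hd_positions = [i for i, tok in enumerate(tokens) if tok == "[HD]"]
head_vec = vecs[hd_positions[0]]          # vector at the first entity's [HD] identifier
tail_vec = vecs[hd_positions[1]]          # vector at the second entity's [HD] identifier
relation_repr = np.concatenate([head_vec, tail_vec])  # final representation for relation extraction
```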
In the text processing method, the target text coding vector is fused with the target knowledge representation vector, and the target knowledge representation vector is obtained from the entity representation vectors of the first associated entities corresponding to the target entity in the target text together with the attention weights. Since each attention weight is obtained from the degree of association between the target semantic vector and an association relation representation vector, the importance of an associated entity's representation vector to the representation of the target entity can be determined according to the semantics of the target text, and the attention weight is set according to that importance. The target knowledge vector obtained from the attention weights and the entity representation vectors therefore promotes the understanding of the semantics of the target text, so the text processing result obtained for the target text based on the target fusion result is more accurate.
In some embodiments, as shown in fig. 4, the step S206 of determining the target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and the corresponding attention weight includes the following steps:
step S402, acquiring a relationship network diagram formed by the first association entity and the target entity.
Specifically, the relational network graph includes nodes and edges. The nodes are entities, and the existence of edges between the entities indicates that direct association exists between the entities.
For example, assume that the first associated entity is an associated entity whose association order with the target entity is within a preset order, and that the preset order is 2. For target entity A, assume that A's daughter is B, B's son is C, and B plays for team H. A plays for team E, which also includes player F; teams E and H are competitors; and A has a friend D. The relationship network graph can then be as shown in FIG. 5. It is understood that the edges in the relational network graph may also be directed, and the relationship between entities may be represented by a triplet (h, r, t), with h being the head entity, r the relationship, and t the tail entity.
In some embodiments, associated entities whose association order with the target entity is within a preset order may be acquired from the knowledge graph to form the relationship network graph, which may be obtained according to formulas (4)-(8). Here $g_m$ denotes the relational network graph corresponding to the target entity $m$; $\mathcal{N}_t^{(i)}(m)$ denotes the tail entities whose association with entity $m$ is of order $i$; $\mathcal{N}_h^{(i)}(m)$ denotes the head entities whose association with entity $m$ is of order $i$; $h$ denotes a head entity, $t$ a tail entity, $r$ a relationship, and $g$ the knowledge graph; "$\wedge$" denotes the logical operator "and", "$\notin$" denotes non-membership, and "$\in$" denotes membership; $\mathcal{E}^{(i-1)}(m)$ denotes the set of associated entities from order 0 to order $i-1$, and "$\cup$" denotes the union. That is, for an entity $m$ in the text, the position of entity $m$ in the knowledge graph can be obtained, and the server can then collect the $i$-order neighbor entities of that entity. The $i$-th order neighbor set is defined as $\mathcal{N}^{(i)}(m)$, and the server may loop through formulas (4)-(6) to obtain the associated entities of order $i$; since entity $m$ may appear as either the head entity or the tail entity of a triplet, $\mathcal{N}^{(i)}(m)$ involves two differently oriented entity sets.

$$\mathcal{N}_t^{(i)}(m)=\{\,t \mid (h,r,t)\in g \ \wedge\ h\in\mathcal{E}^{(i-1)}(m) \ \wedge\ t\notin\mathcal{E}^{(i-1)}(m)\,\} \tag{4}$$

$$\mathcal{N}_h^{(i)}(m)=\{\,h \mid (h,r,t)\in g \ \wedge\ t\in\mathcal{E}^{(i-1)}(m) \ \wedge\ h\notin\mathcal{E}^{(i-1)}(m)\,\} \tag{5}$$

$$\mathcal{N}^{(i)}(m)=\mathcal{N}_t^{(i)}(m)\cup\mathcal{N}_h^{(i)}(m) \tag{6}$$

$$\mathcal{E}^{(i)}(m)=\bigcup_{j=0}^{i}\mathcal{N}^{(j)}(m),\qquad \mathcal{N}^{(0)}(m)=\{m\} \tag{7}$$

$$g_m=\{\,(h,r,t)\in g \mid h\in\mathcal{E}^{(K)}(m)\ \wedge\ t\in\mathcal{E}^{(K)}(m)\,\} \tag{8}$$

where $K$ is the preset order.
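By way of illustration, a minimal sketch of the loop implied by formulas (4)-(8) follows, collecting the associated entities of an entity m up to a preset order from a knowledge graph stored as (h, r, t) triplets; the data layout and function name are assumptions for illustration:

```python
def build_relation_subgraph(triples, m, max_order=2):
    """Collect associated entities of m up to max_order and return the induced triplets (formulas (4)-(8))."""
    seen = {m}            # E^(0): associated entities collected so far, starting from order 0
    frontier = {m}        # N^(i): neighbor entities discovered at the current order
    for _ in range(max_order):
        tails = {t for (h, r, t) in triples if h in frontier and t not in seen}   # formula (4)
        heads = {h for (h, r, t) in triples if t in frontier and h not in seen}   # formula (5)
        frontier = tails | heads                                                  # formula (6)
        seen |= frontier                                                          # formula (7)
    # formula (8): the subgraph g_m induced by the collected entity set
    return [(h, r, t) for (h, r, t) in triples if h in seen and t in seen]

triples = [("A", "daughter", "B"), ("B", "son", "C"), ("B", "plays_for", "H"),
           ("A", "plays_for", "E"), ("F", "plays_for", "E"),
           ("E", "competitor", "H"), ("A", "friend", "D")]
print(build_relation_subgraph(triples, "A", max_order=2))
```

Run on the FIG. 5 example above, this returns the triplets among A, B, C, D, E, F and H.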
Step S404, for the network graph entity in the relational network graph, acquiring an incidence relation representation vector representing the incidence relation between the network graph entity and the adjacent entity.
The network graph entity refers to an entity in the relational network graph; the target entity and the first associated entities are all network graph entities. An adjacent entity refers to an entity connected to the network graph entity by an edge. For example, in FIG. 5, for network graph entity A in the relational network graph, its adjacent entities include B.
Step S406, obtaining a vector association degree according to the association relation expression vector and the target semantic vector, and determining the attention weight corresponding to the adjacent entity according to the vector association degree.
The vector association degree refers to the degree of association between vectors and may be, for example, a similarity. The degree of association is positively correlated with the attention weight: the greater the degree of association, the greater the attention weight.
Specifically, the server may use the vector relevance degree as the attention weight, or may perform normalization processing on the vector relevance degree to obtain the attention weight.
In some embodiments, the knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model comprises at least one target hidden layer, and the attention weight corresponding to the knowledge representation vector can be constant or variable for different hidden layers. For example, obtaining a vector relevance degree according to the relevance relation expression vector and the target semantic vector, and determining the attention weight corresponding to the adjacent entity according to the vector relevance degree includes: processing the incidence relation expression vector by using a second hidden layer parameter in the target hidden layer to obtain a key vector; processing the target semantic vector by using a third hidden layer parameter in the target hidden layer to obtain a query vector; calculating to obtain a vector association degree according to the key vector and the query vector; and determining the attention weight corresponding to the adjacent entity according to the vector association degree, wherein the vector association degree and the attention weight corresponding to the adjacent entity form a positive correlation.
The attention mechanism may include a key vector (key) and a query vector (query), where the key vector corresponds to a value (the first knowledge vector). The attention weight corresponding to the adjacent entity may be determined based on the degree of association between the key vector and the query vector. The second hidden layer parameters are the model parameters in the target hidden layer that process the association relation representation vector, and the third hidden layer parameters are the model parameters in the target hidden layer that process the target semantic vector. The second and third hidden layer parameters differ between hidden layers, so the attention weight also varies by layer; since the target knowledge vector is obtained by integrating the attention weights produced by different hidden layers, the accuracy of the obtained target knowledge vector can be improved. The vector association degree may be a vector similarity, which may be derived from a similarity algorithm, such as a cosine similarity algorithm.
Specifically, the server may input the association relation representation vector and the target semantic vector into the target hidden layer, calculate a key vector through the second hidden layer parameters of the target hidden layer, calculate a query vector through the third hidden layer parameters of the target hidden layer, calculate the similarity between the query vector and the key vector, and normalize the similarity to obtain the attention weight. The query vector $q^{(i)}$ may be obtained as in formula (9), and the key vector $k_r^{(i)}$ as in formula (10), where $W_k^{(i)}$ and $b_k^{(i)}$ denote the second hidden layer parameters (a learned weight matrix and bias) in the $i$-th hidden layer, $W_q^{(i)}$ and $b_q^{(i)}$ denote the third hidden layer parameters (the learned matrix and bias) in the $i$-th hidden layer, $s$ is the target semantic vector, $r$ is the association relation representation vector, and $\sigma$ is an activation function, such as the tanh function.

$$q^{(i)}=\sigma\!\left(W_q^{(i)}\,s+b_q^{(i)}\right) \tag{9}$$

$$k_r^{(i)}=\sigma\!\left(W_k^{(i)}\,r+b_k^{(i)}\right) \tag{10}$$
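A minimal sketch of formulas (9) and (10) and the subsequent normalization follows, assuming PyTorch and illustrative dimensions: the association relation representation vectors are projected to key vectors, the target semantic vector to a query vector, and the normalized dot products give the attention weights of the adjacent entities.

```python
import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """Per-layer key/query projections of formulas (9)-(10) plus softmax normalization."""
    def __init__(self, sem_dim: int = 768, rel_dim: int = 100, attn_dim: int = 100):
        super().__init__()
        self.w_q = nn.Linear(sem_dim, attn_dim)   # third hidden layer parameters (weight + bias)
        self.w_k = nn.Linear(rel_dim, attn_dim)   # second hidden layer parameters (weight + bias)

    def forward(self, semantic_vec, relation_vecs):
        q = torch.tanh(self.w_q(semantic_vec))    # formula (9): query from the target semantic vector
        k = torch.tanh(self.w_k(relation_vecs))   # formula (10): keys from the relation vectors
        scores = k @ q                            # vector association degree (dot product)
        return torch.softmax(scores, dim=0)       # normalized attention weights of adjacent entities

# Usage: weights over five adjacent entities
weights = RelationAttention()(torch.randn(768), torch.randn(5, 100))
```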
Step S408, determining a target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity.
Specifically, one network graph entity may have one or more adjacent entities, and the server may perform a weighted summation of the entity representation vectors of the adjacent entities with the attention weights to obtain the knowledge representation vector corresponding to the network graph entity. Alternatively, the model parameters of the trained knowledge vector determination model may first process the entity representation vectors of the adjacent entities to obtain first knowledge representation vectors, and the first knowledge representation vectors may then be weighted by the attention weights corresponding to the adjacent entities to obtain the target knowledge representation vector corresponding to the network graph entity. The knowledge vector determination model may comprise one or more hidden layers, and for at least one of the hidden layers, the step of weighting the first knowledge representation vectors by the attention weights corresponding to the adjacent entities may be performed.
In some embodiments, the knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model includes at least one target hidden layer, and determining the target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity includes: inputting the entity representation vector and the incidence relation representation vector of the adjacent entity into a target hidden layer for processing to obtain a first knowledge representation vector corresponding to the network graph entity; and determining a target knowledge representation vector corresponding to the network graph entity according to the first knowledge representation vector corresponding to the network graph entity and the attention weight corresponding to the adjacent entity.
The target hidden layer may be one or more hidden layers. A hidden layer comprises model parameters obtained by model training, and the entity representation vector is processed based on these model parameters.
Specifically, the server may perform weighting processing on the first knowledge representation vector corresponding to the network graph entity and the attention weight corresponding to the adjacent entity, so as to obtain the knowledge representation vector corresponding to the network graph entity. Because the entity expression vectors and the incidence relation expression vectors of the adjacent entities are input into the hidden layer for processing, the knowledge expression vectors are determined by combining the entity expression vectors and the incidence relation expression vectors, so that the obtained knowledge expression vectors are more accurate.
In some embodiments, the target calculation direction may be determined according to an entity association relationship between the network graph entity and the corresponding adjacent entity, and the target calculation direction is addition or subtraction; calculating the entity expression vector and the incidence relation expression vector of the adjacent entity according to the target calculation direction to obtain a calculation expression vector corresponding to the network graph entity; and processing the calculation expression vector by using hidden layer parameters in the target hidden layer to obtain a first knowledge expression vector corresponding to the network graph entity.
Specifically, when the entity association relationship is that the network graph entity is the head entity and the adjacent entity is the tail entity, the target calculation direction is subtraction. When the entity association relationship is that the network graph entity is the tail entity and the adjacent entity is the head entity, the target calculation direction is addition. When determining the entity representation vectors and association relation representation vectors in the relational network graph, the relationship of an entity may be regarded as a translation operation from the head entity to the tail entity; that is, the tail entity may be obtained from the head entity and the association relationship, so the entity representation vector of the head entity plus the association relation representation vector can represent the tail entity, and the entity representation vector of the tail entity minus the association relation representation vector can represent the head entity. The calculation representation vector therefore refers to the vector representing the network graph entity obtained by calculating, according to the target calculation direction, from the entity representation vector of the adjacent entity and the association relation representation vector. Processing the calculation representation vector with the target hidden layer thus yields an accurate first knowledge representation vector for the network graph entity.
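Under this TransE-style reading (head + relation ≈ tail), the calculation representation vector derived from one adjacent entity can be sketched as follows; the function name is hypothetical, and the sign convention mirrors the target calculation direction described above:

```python
import torch

def calculation_representation(neighbor_vec: torch.Tensor,
                               relation_vec: torch.Tensor,
                               neighbor_is_head: bool) -> torch.Tensor:
    """Represent the network graph entity from one adjacent entity (reading: h + r ≈ t)."""
    if neighbor_is_head:
        # the network graph entity is the tail entity: t ≈ h + r, so add
        return neighbor_vec + relation_vec
    # the network graph entity is the head entity: h ≈ t - r, so subtract
    return neighbor_vec - relation_vec
```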
In some embodiments, the server may obtain an output representation vector corresponding to an adjacent entity output by a previous hidden layer corresponding to a target hidden layer in the knowledge vector determination model; and the target hidden layer processes the calculation expression vector and the output expression vector by using the first hidden layer parameter to obtain a first knowledge expression vector corresponding to the network graph entity.
Specifically, the output representation vector corresponding to an adjacent entity refers to the knowledge representation vector of that adjacent entity output by the previous hidden layer. The first hidden layer parameters are the parameters in the hidden layer used to determine the first knowledge representation vector. The knowledge vector determination model may comprise multiple hidden layers; for a target hidden layer, the knowledge representation vectors corresponding to the adjacent entities output by the previous hidden layer can be obtained and input into the target hidden layer, so that the target hidden layer continues processing on the basis of the previous hidden layer, and the knowledge vector becomes more and more accurate as the depth of the hidden layers increases.
For example, the knowledge vector determination model may include multiple target hidden layers. For any network graph entity $e$ in the relational network graph, at each target hidden layer the first knowledge representation vector may be calculated as in formula (11), and the target knowledge representation vector of the network graph entity as in formula (12). In formula (11), $\hat{e}_j^{(i)}$ denotes the first knowledge vector derived in the $i$-th hidden layer, for the network graph entity $e$, from the entity representation vector of the adjacent entity $e_j$; $W^{(i)}$ denotes the model parameters in the target hidden layer, obtained by model training; $e_j^{(i-1)}$ is the knowledge vector of $e_j$ obtained at the $(i-1)$-th hidden layer; and $[\,\cdot\,;\,\cdot\,]$ denotes horizontal concatenation, e.g. $e_j^{(i-1)}$ and $r_j$ are spliced in the horizontal direction. When the adjacent entity $e_j$ is the head entity of the triplet, i.e. when $(e_j, r_j, e)\in g_m$, the first case of formula (11) is used to calculate $\hat{e}_j^{(i)}$; when the adjacent entity $e_j$ is the tail entity of the triplet, i.e. when $(e, r_j, e_j)\in g_m$, the second case of formula (11) is used.

$$\hat{e}_j^{(i)}=\begin{cases} W^{(i)}\big[e_j^{(i-1)};\; r_j\big], & (e_j,r_j,e)\in g_m \\[4pt] W^{(i)}\big[e_j^{(i-1)};\; -r_j\big], & (e,r_j,e_j)\in g_m \end{cases} \tag{11}$$

$$e^{(i)}=f_i\big(\big\{\hat{e}_j^{(i)} \mid e_j\in N_e\big\}\big) \tag{12}$$

$$f_i(\cdot)=\sum_{e_j\in N_e}\frac{\exp\!\big(k_{r_j}^{(i)\top}\, q^{(i)}\big)}{\sum_{e_{j'}\in N_e}\exp\!\big(k_{r_{j'}}^{(i)\top}\, q^{(i)}\big)}\;\hat{e}_j^{(i)} \tag{13}$$

In formula (12), $e^{(i)}$ is the knowledge representation vector of the network graph entity $e$ output by the $i$-th hidden layer, $N_e$ is the set of nodes adjacent to $e$, and $f_i(\cdot)$ is the aggregation function in layer $i$, used for the weighted calculation based on the first knowledge vectors and the corresponding attention weights. That is, not all of the knowledge context $g_m$ is effective for understanding the input text, and the target knowledge vector is obtained by giving different weights, through the attention mechanism, to the knowledge representation vectors obtained from the adjacent entities. $f_i(\cdot)$ may be expressed as formula (13), in which $\top$ denotes the transpose, $k_{r_j}^{(i)}$ denotes the key vector corresponding to $r_j$, $q^{(i)}$ denotes the query vector, and $\exp$ denotes the exponential function with the natural constant as base.

The meaning of formulas (11), (12) and (13) can be explained as follows: in order to dynamically determine, according to the context of the text, the influence of the adjacent entities of a network graph entity on its knowledge representation vector, a knowledge vector determination model can be provided, which may be called S-GNN (semantics-driven graph neural network). The entity representation vectors and association relation representation vectors of the relational network graph, obtained for example with the TransE algorithm, can be input into the S-GNN, and the S-GNN determines, based on the target semantic vector, how important the association relationship between a network graph entity and an adjacent entity is to the semantics. Thus, when the knowledge representation vector of the network graph entity is determined from the entity representation vectors of the adjacent entities, the relationships more important to the target semantics contribute more, so the important knowledge can be selected to determine the knowledge representation vector of the target entity.
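Putting formulas (11)-(13) together, one target hidden layer of the knowledge vector determination model can be sketched as below, assuming PyTorch and illustrative dimensions: each adjacent entity's previous-layer vector is concatenated with its signed relation vector and projected to a first knowledge vector, and the key/query attention weights give the weighted aggregation.

```python
import torch
import torch.nn as nn

class SGNNLayer(nn.Module):
    """One target hidden layer of the knowledge vector determination model (formulas (11)-(13))."""
    def __init__(self, ent_dim: int = 100, rel_dim: int = 100, sem_dim: int = 768):
        super().__init__()
        self.W = nn.Linear(ent_dim + rel_dim, ent_dim)  # W^(i) of formula (11)
        self.w_q = nn.Linear(sem_dim, rel_dim)          # third hidden layer parameters, formula (9)
        self.w_k = nn.Linear(rel_dim, rel_dim)          # second hidden layer parameters, formula (10)

    def forward(self, neighbor_vecs, relation_vecs, neighbor_is_head, semantic_vec):
        # formula (11): sign the relation vector by triplet orientation, concatenate, project
        signs = neighbor_is_head.float().mul(2).sub(1).unsqueeze(-1)  # head -> +1, tail -> -1
        first_knowledge = self.W(torch.cat([neighbor_vecs, signs * relation_vecs], dim=-1))
        # formulas (9), (10) and (13): softmax(k^T q) attention weights over adjacent entities
        q = torch.tanh(self.w_q(semantic_vec))
        k = torch.tanh(self.w_k(relation_vecs))
        weights = torch.softmax(k @ q, dim=0)
        # formula (12): aggregate into the entity's knowledge representation vector e^(i)
        return (weights.unsqueeze(-1) * first_knowledge).sum(dim=0)

# Usage: one network graph entity with three adjacent entities
layer = SGNNLayer()
e_i = layer(torch.randn(3, 100), torch.randn(3, 100),
            torch.tensor([True, False, True]), torch.randn(768))
```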
Step S410, extracting target knowledge representation vectors corresponding to the target entities from the target knowledge representation vectors corresponding to the network graph entities of the relational network graph.
Specifically, since the relational network graph includes the target entity, that is, the target entity is one of the network graph entities, the target knowledge representation vector corresponding to the target entity can be extracted after the target knowledge representation vectors corresponding to the network graph entities are obtained.
In the embodiment of the application, a relational network graph is formed, and for each network graph entity the association relation representation vector representing its association relationship with an adjacent entity is obtained. The vector association degree obtained from the association relation representation vector and the target semantic vector can represent whether the association relationship between the network graph entity and the adjacent entity is related to the semantics of the target text, so that when the knowledge representation vector of the network graph entity is obtained from the attention weights and the entity representation vectors of the adjacent entities, the knowledge aggregation is related to the semantics of the target text. It is to be understood that the aggregation of the entity representation vectors of the adjacent entities into the knowledge representation vector of a network graph entity may be performed multiple times; at each aggregation, the knowledge representation vectors obtained by the previous aggregation and the entity representation vectors of the adjacent entities may be combined. For example, the aggregation may be performed by a knowledge vector determination model comprising multiple hidden layers, where the knowledge representation vectors output by an upper hidden layer, e.g. layer 2, together with the entity representation vectors of the adjacent entities, are further processed as inputs to the next hidden layer, e.g. layer 3, to obtain the knowledge representation vectors output by that layer.
In one embodiment, as shown in fig. 6, a text processing model training method is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:
step S602, a training text and a standard text processing result corresponding to the training text are obtained.
The standard text processing result refers to the standard, i.e. correct, processing result of the training text. For example, assuming that the training task is to predict words in a text, the training text may be a text subjected to word masking processing; some words in the complete text may be replaced with the symbol "[mask]" to obtain the training text, and the standard text processing result is the masked word. For instance, assume the complete text is "today is Friday" and the symbol "[mask]" replaces "is"; the training text is then "today [mask] Friday", and the standard text processing result is "is". For another example, if the training task is to classify the text, the standard processing result is the correct classification result of the text.
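A minimal sketch of constructing such a masked training pair follows; whitespace tokenization and the mask rate are assumptions for illustration:

```python
import random

def make_masked_pair(text: str, mask_rate: float = 0.15):
    """Replace a random subset of words with "[mask]"; return (training text, masked words)."""
    words = text.split()
    answers = {}
    for i, word in enumerate(words):
        if random.random() < mask_rate:
            answers[i] = word
            words[i] = "[mask]"
    return " ".join(words), answers

random.seed(0)
print(make_masked_pair("today is Friday", mask_rate=0.5))
# ('today is [mask]', {2: 'Friday'})
```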
Specifically, the server may obtain the initial text, process the initial text, for example, perform word masking processing, to obtain a training text, and obtain a standard text processing result corresponding to the training text.
Step S604, inputting the training text into the text coding model to obtain a training text coding vector, wherein the training text coding vector comprises a training semantic vector corresponding to the training text.
Specifically, the text coding model may encode the training text, for example, segment the training text to obtain a corresponding training participle sequence, and then encode each training participle in the training participle sequence to obtain the training participle coding vector corresponding to each training participle, as well as a training semantic vector representing the semantics of the training text.
Step S606, a training entity corresponding to the training text is obtained, and a second associated entity corresponding to the training entity is determined.
Specifically, the second associated entity refers to an entity having an association relationship with the training entity. The second associated entity may be, for example, an associated entity whose association order with the training entity is within a preset association order. The manner of obtaining the training entity and the second associated entity corresponding to the training entity may refer to the manner of step S204, and is not described herein again.
Step S608, inputting the entity representation vector corresponding to the second associated entity into the knowledge vector determination model, determining a training knowledge representation vector corresponding to the training entity according to the entity representation vector and the corresponding attention weight, where the attention weight is obtained according to the association degree between the training semantic vector and the association representation vector, and the association representation vector is a vector representing the entity association.
In particular, the knowledge vector determination model is used to derive the knowledge representation vector. For the manner in which the training knowledge representation vector is obtained, reference may be made to the manner in which the target knowledge representation vector is obtained.
For example, the server may obtain a training network graph composed of a training entity and a second associated entity, and for a network graph entity in the training network graph, obtain an association relationship representation vector representing an association relationship between the network graph entity and an adjacent entity; obtaining a vector association degree according to the association relation expression vector and the training semantic vector, and determining the attention weight corresponding to the adjacent entity according to the vector association degree; determining training knowledge representation vectors corresponding to all entities in a training network diagram according to attention weights corresponding to adjacent entities and entity representation vectors of the adjacent entities; and extracting training knowledge representation vectors corresponding to the training entities from the training knowledge representation vectors corresponding to the network graph entities of the training network graph.
For another example, the entity representation vectors of the adjacent entities and the association representation vectors may be input to the target hidden layer and processed to obtain first knowledge representation vectors corresponding to each network graph entity in the training network graph, and the training knowledge representation vectors corresponding to the network graph entities may be determined according to the first knowledge representation vectors corresponding to the network graph entities and the attention weights corresponding to the adjacent entities. In the training phase, the parameters of the target hidden layer can be continuously optimized.
And step S610, inputting the training text coding vector and the training knowledge expression vector corresponding to the training entity into a fusion model for fusion processing to obtain a training fusion result.
Specifically, the method for obtaining the training fusion result may refer to a method for obtaining the target fusion result, and details are not repeated herein.
For example, the server may perform knowledge fusion processing on the participle coding vectors corresponding to the training participles according to the training knowledge representation vectors corresponding to the training entities, to obtain knowledge fusion coding vectors corresponding to the training participles; update the participle coding vectors corresponding to the training participles in the training participle coding vector sequence with these knowledge fusion coding vectors, to obtain an updated participle coding vector sequence; and perform fusion processing on the updated participle coding vector sequence and the training semantic vector by using the fusion model, to obtain the participle coding vector sequence after the fusion processing and the training semantic vector after the fusion processing.
And step S612, processing the training fusion result according to the task processing model to obtain a training processing result.
Specifically, the task processing model is used for processing the text, and may be, for example, a translation model or a text classification model; it can be set according to different requirements.
In some embodiments, the text processing model training method provided in the embodiments of the present application may be performed in a pre-training stage of pre-training a pre-trained language model (Pre-trained Model), or in a fine-tuning (fine-tune) stage in which the pre-trained language model is adjusted according to a downstream text processing task after pre-training. The fine-tuning stage refers to adjusting the task processing model according to the downstream task on the basis of the pre-trained language model. In the pre-training stage, the task processing model may be a model for predicting the masked words or a sentence relation judgment model. In the fine-tuning stage, the task processing model of the pre-training task may be replaced according to requirements, and fine-tuning then performed; for example, the fine-tuning stage may replace it with a summary generation model or a translation model.
Step S614, adjusting parameters of the task processing model or adjusting parameters of the task processing model and the language model according to the training processing result and the standard text processing result, wherein the language model comprises a text coding model, a knowledge vector determination model and a fusion model.
Specifically, the training processing result refers to a processing result output by the model. A Language Model (LM) is a probability distribution of natural language text sequences that characterizes the likelihood of the existence of a particular sequence of text of a particular length. The pre-trained language model may be a pre-trained language model such as BERT or RoBERTa, which may be trained using a large amount of text data to project words in the text into a tensor space. The text processing model includes a language model and a task processing model, and the task processing model may be different according to different text processing tasks, for example, the task processing model may be a text classification model or a translation model. The language model includes a text coding model, a knowledge vector determination model, and a fusion model. In the model training, if the model is in the fine tuning stage, only the model parameters of the task processing model may be adjusted, or the model parameters of the task processing model and the model parameters of the language model may be adjusted at the same time. In the pre-training phase, the model parameters of the task processing model and the language model can be adjusted simultaneously.
In some embodiments, the server may obtain the model loss value according to a difference between the training processing result and the standard text processing result, and the larger the difference is, the larger the model loss value is. And adjusting the parameters of the model towards the direction of reducing the model loss value until the model converges to obtain the text processing model. Wherein the model convergence may refer to a model loss value being less than a preset loss value.
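By way of illustration, the parameter adjustment described here can be sketched as the following PyTorch-style loop; the optimizer choice, learning rate, and convergence threshold are assumptions rather than prescribed by the embodiments:

```python
import torch

def train_until_converged(model, loss_fn, batches,
                          preset_loss: float = 0.01, lr: float = 1e-4, max_steps: int = 10000):
    """Adjust parameters toward decreasing model loss until it falls below a preset value."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _, (inputs, standard_result) in zip(range(max_steps), batches):
        training_result = model(inputs)
        loss = loss_fn(training_result, standard_result)  # larger difference -> larger loss value
        if loss.item() < preset_loss:                     # model convergence
            break
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```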
In the pre-training stage, in order to better integrate knowledge into the pre-trained language model, entities in the training text can be randomly masked following the Denoising Entity Auto-encoder (DEA) method. The text processing model learns to use the entity expressions in the text to predict the masked entities, i.e. to predict, for the entities in the knowledge graph, the probability that each one is the masked entity; a cross entropy loss value is then calculated with the cross entropy function from the predicted probabilities, and the parameters of the model are adjusted according to the cross entropy loss value.
In some embodiments, the manner in which the model loss value is calculated may be determined by the base model employed; for example, BERT_BASE, RoBERTa_BASE or RoBERTa_LARGE may be used as the base model, the task processing model and the language model of the present application may be built on the base model, and the text encoding parameters may be initialized with the parameters of the base model. Different base models correspond to different loss functions. When BERT_BASE is the base model, the corresponding loss function can be expressed as formula (14), where $\mathcal{L}$ denotes the model loss value, $\mathcal{L}_{LM}$ denotes the loss value corresponding to the language model, $\mathcal{L}_{NSP}$ denotes the loss value of sentence relation prediction (for example, the expression of [CLS] may be used to predict the next sentence (NSP, Next Sentence Prediction) of a given text, and the loss value of sentence relation prediction is obtained from the prediction result and the standard result), and $\mathcal{L}_{DEA}$ denotes the loss value obtained based on the DEA algorithm. When RoBERTa_BASE or RoBERTa_LARGE is the base model, the corresponding loss function can be expressed as formula (15).

$$\mathcal{L}=\mathcal{L}_{LM}+\mathcal{L}_{NSP}+\mathcal{L}_{DEA} \tag{14}$$

$$\mathcal{L}=\mathcal{L}_{LM}+\mathcal{L}_{DEA} \tag{15}$$
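The two combinations can be sketched as plain sums; which terms appear depends on the base model, exactly as formulas (14) and (15) state:

```python
def total_loss(loss_lm: float, loss_dea: float, loss_nsp: float | None = None) -> float:
    """Formula (14) when loss_nsp is given (BERT_BASE base model); formula (15) otherwise (RoBERTa)."""
    return loss_lm + loss_dea + (loss_nsp if loss_nsp is not None else 0.0)
```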
In the text processing model training method, the language model comprises the text coding model, the knowledge vector determination model and the fusion model. The knowledge vector determination model determines the training knowledge representation vector corresponding to the training entity according to the entity representation vectors and the corresponding attention weights, where each attention weight is obtained from the degree of association between the training semantic vector and an association relation representation vector, i.e. a vector representing an entity association relationship. The importance of an associated entity's representation vector to the representation of the training entity can therefore be determined according to the semantics of the training text, and the attention weight is set according to that importance, so the obtained training knowledge vector better promotes the understanding of the semantics of the training text. This improves the text comprehension capability of the obtained language model and task processing model, and thus the accuracy of the text processing result.
The text processing method provided by the embodiments of the present application can be applied to tasks such as translating or classifying texts. As shown in fig. 7, an interface diagram for text translation in some embodiments, Chinese is translated into English: the terminal may receive an input sentence to be translated, and when a translation operation is received, for example clicking the "translate" control in the interface, a translation request may be sent to the server. The server inputs the text to be translated into the language model to obtain the corresponding target fusion result, the task processing model obtains a translation result based on the target fusion result, the server returns the translation result to the terminal, and the terminal displays the translation result on the interface.
According to the text processing method provided by the embodiments of the present application, with the pre-trained language model as a base, a subgraph (relational network graph) can be captured from the knowledge graph using the context of the entities in the target text, the knowledge representation vectors corresponding to the entities are obtained through calculation, and the knowledge representation vectors are migrated into the pre-trained language model, thereby enhancing the language model's ability to understand language. For example, from the target text "Steph Curry and Klay Thompson led the Warriors to the 2015 NBA Championship", a human can infer that Steph Curry and Klay Thompson play for the same team, but information that is intuitive for humans is not necessarily intuitive for a language model, so it can be obtained from the knowledge of the knowledge graph. In a knowledge graph, however, not all information is useful; some may even have negative effects. For example, referring to FIG. 8, the size of the circle surrounding an entity represents the importance of that entity to the target text; in FIG. 8, for the target entity "Stephen", the triple [Riley, daughter, Stephen] is less important in the given sentence than [Stephen, plays for, Warriors]. Therefore, the attention weight corresponding to an adjacent entity can be determined through the target semantic vector and the association relation representation vector, and the knowledge representation vector of each entity is determined based on the entity representation vectors of the adjacent entities and the attention weights, so that the knowledge more important to the target text is focused on when obtaining the knowledge representation vector of the target entity. By introducing human knowledge, screening out the necessary knowledge according to the context of the text, giving different weights through the attention mechanism, and finally aggregating the knowledge representation vectors corresponding to the target entities and embedding them into the pre-trained language model, the model's comprehension of the text is improved.
The following describes, with reference to fig. 9, a text processing method provided in an embodiment of the present application, including the following steps:
1. and acquiring a target text to be processed, and coding the target text to obtain a target text coding vector.
Specifically, the target text may be input into the text coding model for coding, to obtain the target text encoding vector. The target text encoding vector comprises a word segmentation encoding vector sequence and a target semantic vector. For example, assuming the target text is "Steph Curry and Klay Thompson led the Warriors to the 2015 NBA Championship", after word segmentation the result is "Steph Curry / and / Klay Thompson / led / the / Warriors / to / the / 2015 NBA Championship", and the text coding model encodes a word segmentation coding vector corresponding to each participle as well as a target semantic vector expressing the semantics of the target text. For example, the target semantic vector may be the semantic representation of [CLS].
2. Acquire a target entity in the target text, and determine the first associated entity corresponding to the target entity.
3. Acquire the relationship network graph formed by the first associated entity and the target entity.
Specifically, the target entities may include, for example, "Steph Curry" and "Klay Thompson"; "Steph Curry" is taken as an example below. According to the knowledge graph, "Steph Curry" plays for (play for) the "Warriors", and "Riley" is the daughter of "Steph Curry", so the first associated entities "Warriors" and "Riley" corresponding to "Steph Curry" can be obtained. Assuming the preset order is 2, the associated entities A and B corresponding to "Warriors" and the entities C and D corresponding to "Riley" can further be obtained to form the relationship network graph.
4. Obtain the vector association degree according to the association relation representation vector and the target semantic vector, and determine the attention weight corresponding to the adjacent entity according to the vector association degree.
5. Determine the target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity.
Specifically, as shown in fig. 9, the expression of [CLS] from the text coding model is input into the knowledge vector determination model. For each network graph entity in the relational network graph, a corresponding first knowledge representation vector can be obtained from the entity representation vectors of its adjacent entities, the attention weight of each first knowledge representation vector is determined based on the expression of [CLS] and the association relation representation vector, and the target knowledge representation vector of the network graph entity is determined based on the first knowledge representation vectors and the corresponding attention weights.
Referring back to fig. 9, for the knowledge representation vector of the network graph entity "Warriors", two first knowledge representation vectors may be determined from the entity representation vectors of its adjacent entities A and B, the attention weights corresponding to the two first knowledge representation vectors may be determined based on the expression of [CLS], and the knowledge representation vector of "Warriors" is obtained from the first knowledge representation vectors and the corresponding attention weights.
For the knowledge representation vector of the network graph entity "Steph Curry", the first knowledge vector a1 may be determined from the entity representation vector of "Warriors" and the first knowledge vector a2 from the entity representation vector of "Riley", and the attention weights corresponding to the two first knowledge representation vectors may be determined based on the expression of [CLS]; for example, in fig. 9, the attention weight corresponding to a1 is 0.9 and that corresponding to a2 is 0.1. That is, in the knowledge graph, for the text "Steph Curry and Klay Thompson led the Warriors to the 2015 NBA Championship", the knowledge that "Steph Curry" plays for the "Warriors" is more important than the knowledge that the daughter of "Steph Curry" is "Riley", so the former knowledge matters more when determining the knowledge representation vector corresponding to "Steph Curry".
6. Perform vector splicing according to the target knowledge representation vector corresponding to the target entity and the participle coding vector corresponding to the target participle, to obtain the knowledge fusion coding vector corresponding to the target participle.
For example, for the target entity "Steph Curry", the fusion model may splice the participle coding vector of "Steph Curry" obtained by the text coding model with the target knowledge representation vector of "Steph Curry", to obtain the knowledge fusion coding vector corresponding to "Steph Curry".
7. Update the participle coding vector corresponding to the target participle in the participle coding vector sequence with the knowledge fusion coding vector corresponding to the target participle, obtaining an updated participle coding vector sequence.
Specifically, the fusion model may replace the participle coding vector of "Steph Curry" in the participle coding vector sequence with the knowledge fusion coding vector corresponding to "Steph Curry".
8. Perform fusion processing on the updated participle coding vector sequence and the target semantic vector by using the fusion model, to obtain the participle coding vector sequence after the fusion processing and the target semantic vector after the fusion processing.
Specifically, the fusion model may be a BERT-based model, and may obtain a target semantic vector and an updated participle coding vector sequence, and continue encoding to obtain a fused participle coding vector sequence and a fused target semantic vector.
9. Obtain a target fusion result according to the participle coding vector sequence after the fusion processing and the target semantic vector after the fusion processing.

10. Determine a text processing result corresponding to the target text according to the target fusion result.
Specifically, the target fusion result may be at least one of a segmentation coding vector sequence after fusion processing or a target semantic vector after fusion processing, and is specifically selected according to actual needs. For example, for a text classification task, the target semantic vector after the fusion process may be used as a target fusion result.
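By way of illustration, steps 6 to 8 of the walkthrough above can be sketched as below, assuming PyTorch; the linear projection back to the encoder width and the stand-in Transformer encoder are assumptions, since the embodiments only require a BERT-based fusion model:

```python
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    """Splice a knowledge representation vector into one participle coding vector, then re-encode."""
    def __init__(self, hidden=768, know=100):
        super().__init__()
        self.project = nn.Linear(hidden + know, hidden)  # map the spliced vector back to encoder width
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)  # stands in for the BERT-based fusion model

    def forward(self, token_vecs, entity_index, knowledge_vec):
        # step 6: vector splicing -> knowledge fusion coding vector for the target participle
        fused = self.project(torch.cat([token_vecs[:, entity_index], knowledge_vec], dim=-1))
        # step 7: update the participle coding vector in the sequence
        token_vecs = token_vecs.clone()
        token_vecs[:, entity_index] = fused
        # step 8: fusion processing over the updated sequence (position 0 holds [CLS])
        return self.encoder(token_vecs)

# Usage: fuse knowledge for the participle at index 1 of a 12-token sequence
out = KnowledgeFusion()(torch.randn(2, 12, 768), 1, torch.randn(2, 100))
```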
It should be understood that, although the steps in the above flowcharts are shown sequentially in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, there is provided a text processing apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a target text encoding module 1002, a first entity obtaining module 1004, a target knowledge representation vector determining module 1006, a first fusion module 1008, and a text processing result determining module 1010, wherein:
the target text encoding module 1002 is configured to acquire a target text to be processed, and encode the target text to obtain a target text encoding vector, where the target text encoding vector includes a target semantic vector corresponding to the target text.
The first entity obtaining module 1004 is configured to obtain a target entity in the target text, and determine a first associated entity corresponding to the target entity.
A target knowledge representation vector determining module 1006, configured to determine a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and a corresponding attention weight, where the attention weight is obtained according to a degree of association between the target semantic vector and the association representation vector, and the association representation vector is a vector representing an entity association.
The first fusion module 1008 is configured to perform fusion processing on the target text encoding vector and the target knowledge representation vector corresponding to the target entity to obtain a target fusion result.
And the text processing result determining module 1010 is configured to determine a text processing result corresponding to the target text according to the target fusion result.
In some embodiments, the target knowledge representation vector determination module comprises: the relation network graph acquiring unit is used for acquiring a relation network graph formed by the first associated entity and the target entity; an incidence relation representation vector obtaining unit, configured to obtain, for a network graph entity in a relational network graph, an incidence relation representation vector representing an incidence relation between the network graph entity and an adjacent entity; the attention weight determining unit is used for obtaining a vector association degree according to the association relation expression vector and the target semantic vector and determining the attention weight corresponding to the adjacent entity according to the vector association degree; the target knowledge representation vector determining unit is used for determining a target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity; and the extracting unit is used for extracting target knowledge representation vectors corresponding to the target entities from the target knowledge representation vectors corresponding to the network graph entities of the relational network graph.
In some embodiments, the target knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model including at least one target hidden layer, the target knowledge representation vector determination unit being configured to: inputting the entity representation vector and the incidence relation representation vector of the adjacent entity into a target hidden layer for processing to obtain a first knowledge representation vector corresponding to the network graph entity; and determining a target knowledge representation vector corresponding to the network graph entity according to the first knowledge representation vector corresponding to the network graph entity and the attention weight corresponding to the adjacent entity.
In some embodiments, the target knowledge representation vector determination unit is to: determining a target calculation direction according to the entity incidence relation between the network graph entity and the adjacent entity, wherein the target calculation direction is addition or subtraction; calculating the entity expression vector and the incidence relation expression vector of the adjacent entity according to the target calculation direction to obtain a calculation expression vector corresponding to the network graph entity; and processing the calculation expression vector by using hidden layer parameters in the target hidden layer to obtain a first knowledge expression vector corresponding to the network graph entity.
In some embodiments, the target knowledge representation vector determination unit is to: acquiring an output expression vector which is output by a previous hidden layer corresponding to a target hidden layer and corresponds to an adjacent entity in a knowledge vector determination model; and the target hidden layer processes the calculation expression vector and the output expression vector by using the first hidden layer parameter to obtain a first knowledge expression vector corresponding to the network graph entity.
In some embodiments, the target knowledge representation vector corresponding to the network graph entity is output by the knowledge vector determination model, and the attention weight determination unit is configured to: processing the incidence relation expression vector by using a second hidden layer parameter in the target hidden layer to obtain a key vector; processing the target semantic vector by using a third hidden layer parameter in the target hidden layer to obtain a query vector; calculating to obtain a vector association degree according to the key vector and the query vector; and determining the attention weight corresponding to the adjacent entity according to the vector association degree, wherein the vector association degree and the attention weight corresponding to the adjacent entity form a positive correlation.
In some embodiments, the target text comprises a plurality of participles, the target text encoding vector comprises a sequence of participle encoding vectors, and the sequence of participle encoding vectors comprises a participle encoding vector corresponding to each participle; the first fusion module includes: a knowledge fusion coding vector obtaining unit, configured to perform knowledge fusion processing on a participle coding vector corresponding to a target participle according to a target knowledge representation vector corresponding to a target entity, so as to obtain a knowledge fusion coding vector corresponding to the target participle; the updating unit is used for fusing the knowledge corresponding to the target participle into the coding vector, updating the participle coding vector corresponding to the target participle in the participle coding vector sequence, and obtaining the updated participle coding vector sequence; and the fusion unit is used for performing fusion processing on the updated participle coding vector sequence and the target semantic vector by using the fusion model to obtain the participle coding vector sequence after the fusion processing and the target semantic vector after the fusion processing.
In some embodiments, the text processing result determination module is to: and inputting the target semantic vector subjected to fusion processing into a trained text classification model to obtain a text classification result corresponding to the target text.
In some embodiments, the knowledge-fused-to-code-vector derivation unit is configured to: and carrying out vector splicing treatment according to the target knowledge expression vector corresponding to the target entity and the participle coding vector corresponding to the target participle to obtain a knowledge fusion coding vector corresponding to the target participle.
In one embodiment, as shown in fig. 11, there is provided a text processing model training apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: a training text obtaining module 1102, a training text coding vector obtaining module 1104, a second entity obtaining module 1106, a training knowledge representation vector determining module 1108, a training fusion result obtaining module 1110, a training processing result obtaining module 1112, and an adjusting module 1114, wherein:
the training text obtaining module 1102 is configured to obtain a training text and a standard text processing result corresponding to the training text.
A training text coding vector obtaining module 1104, configured to input the training text into the text coding model to obtain a training text coding vector, where the training text coding vector includes a training semantic vector corresponding to the training text.
A second entity obtaining module 1106, configured to obtain a training entity corresponding to the training text, and determine a second associated entity corresponding to the training entity.
The training knowledge representation vector determination module 1108 is configured to input the entity representation vector corresponding to the second associated entity into the knowledge vector determination model, and determine a training knowledge representation vector corresponding to the training entity according to the entity representation vector and the corresponding attention weight, where the attention weight is obtained according to the association degree between the training semantic vector and the association relationship representation vector, and the association relationship representation vector is a vector representing the entity association relationship.
A training fusion result obtaining module 1110, configured to input the training text coding vector and the training knowledge representation vector corresponding to the training entity into the fusion model for fusion processing, so as to obtain a training fusion result.
A training processing result obtaining module 1112, configured to process the training fusion result according to the task processing model to obtain a training processing result.
The adjusting module 1114 is configured to adjust parameters of the task processing model according to the training processing result and the standard text processing result, or adjust parameters of the task processing model and a language model, where the language model includes a text coding model, a knowledge vector determination model, and a fusion model.
The specific limitations of the text processing device and the text processing model training device can be referred to the limitations of the text processing method above, and are not described herein again. The modules in the text processing device and the text processing model training device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the target text. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a text processing method or a text processing model training method.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that several variations and modifications can be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
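As a final illustration before the claims, the following sketch shows the addition-or-subtraction combination and hidden-layer transformation used when computing a first knowledge representation vector (see claims 4 and 5 below); the tanh activation and the layer shapes are assumptions made for the example.

import torch

def first_knowledge_vector(entity_vec, relation_vec, hidden_weight, direction: str = "add"):
    # Combine according to the target calculation direction (addition or subtraction).
    combined = entity_vec + relation_vec if direction == "add" else entity_vec - relation_vec
    # Process the calculation representation vector with the hidden layer parameters.
    return torch.tanh(combined @ hidden_weight)

vec = first_knowledge_vector(torch.randn(100), torch.randn(100), torch.randn(100, 100))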

Claims (15)

1. A method of text processing, the method comprising:
acquiring a target text to be processed, and encoding the target text to obtain a target text encoding vector, wherein the target text encoding vector comprises a target semantic vector corresponding to the target text;
acquiring a target entity in the target text, and determining a first associated entity corresponding to the target entity;
determining a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and a corresponding attention weight, wherein the attention weight is obtained according to the association degree between the target semantic vector and an association relation representation vector, and the association relation representation vector is a vector representing an entity association relation;
fusing the target text coding vector and a target knowledge representation vector corresponding to the target entity to obtain a target fusion result;
and determining a text processing result corresponding to the target text according to the target fusion result.
2. The method of claim 1, wherein determining a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and the corresponding attention weight comprises:
acquiring a relationship network graph formed by the first associated entity and the target entity;
for a network graph entity in the relational network graph, acquiring an association relation representation vector representing the association relation between the network graph entity and an adjacent entity;
obtaining a vector association degree according to the association relation representation vector and the target semantic vector, and determining the attention weight corresponding to the adjacent entity according to the vector association degree;
determining a target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity;
and extracting target knowledge representation vectors corresponding to the target entities from the target knowledge representation vectors corresponding to the network graph entities of the relational network graph.
3. The method of claim 2, wherein the target knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model comprises at least one target hidden layer, and the determining the target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity comprises:
inputting the entity representation vector of the adjacent entity and the association relation representation vector into the target hidden layer for processing to obtain a first knowledge representation vector corresponding to the network graph entity;
and determining a target knowledge representation vector corresponding to the network graph entity according to the first knowledge representation vector corresponding to the network graph entity and the attention weight corresponding to the adjacent entity.
4. The method according to claim 3, wherein the inputting the entity representation vector of the adjacent entity and the association relation representation vector into the target hidden layer for processing to obtain the first knowledge representation vector corresponding to the network graph entity comprises:
determining a target calculation direction according to the entity association relation between the network graph entity and the adjacent entity, wherein the target calculation direction is addition or subtraction;
combining the entity representation vector of the adjacent entity with the association relation representation vector according to the target calculation direction to obtain a calculation representation vector corresponding to the network graph entity;
and processing the calculation representation vector by using hidden layer parameters in the target hidden layer to obtain a first knowledge representation vector corresponding to the network graph entity.
5. The method according to claim 4, wherein the processing the calculation representation vector by using the hidden layer parameters in the target hidden layer to obtain the first knowledge representation vector corresponding to the network graph entity comprises:
acquiring an output representation vector corresponding to the adjacent entity, which is output by a previous hidden layer corresponding to the target hidden layer in the knowledge vector determination model;
and processing, by the target hidden layer, the calculation representation vector and the output representation vector by using a first hidden layer parameter to obtain a first knowledge representation vector corresponding to the network graph entity.
6. The method of claim 2, wherein the target knowledge representation vector corresponding to the network graph entity is output by a knowledge vector determination model, the knowledge vector determination model comprises at least one target hidden layer, and the obtaining a vector association degree according to the association relation representation vector and the target semantic vector and determining the attention weight corresponding to the adjacent entity according to the vector association degree comprises:
processing the association relation representation vector by using a second hidden layer parameter in the target hidden layer to obtain a key vector;
processing the target semantic vector by using a third hidden layer parameter in the target hidden layer to obtain a query vector;
calculating a vector association degree according to the key vector and the query vector;
and determining the attention weight corresponding to the adjacent entity according to the vector association degree, wherein the attention weight corresponding to the adjacent entity is positively correlated with the vector association degree.
7. The method of claim 1, wherein the target text comprises a plurality of participles, wherein the target text encoding vector comprises a sequence of participle encoding vectors, and wherein the sequence of participle encoding vectors comprises a participle encoding vector corresponding to each participle;
the fusing the target text encoding vector and the target knowledge representation vector corresponding to the target entity to obtain a target fusion result comprises:
performing knowledge fusion processing on the participle coding vector corresponding to the target participle according to the target knowledge representation vector corresponding to the target entity to obtain a knowledge fusion coding vector corresponding to the target participle;
updating, by using the knowledge fusion coding vector corresponding to the target participle, the participle coding vector corresponding to the target participle in the participle coding vector sequence to obtain an updated participle coding vector sequence;
and performing fusion processing on the updated participle coding vector sequence and the target semantic vector by using a fusion model to obtain a fused participle coding vector sequence and a fused target semantic vector.
8. The method according to claim 7, wherein the determining the text processing result corresponding to the target text according to the target fusion result comprises:
and inputting the fused target semantic vector into a trained text classification model to obtain a text classification result corresponding to the target text.
9. The method according to claim 7, wherein the performing knowledge fusion processing on the participle coding vector corresponding to the target participle according to the target knowledge representation vector corresponding to the target entity to obtain the knowledge fusion coding vector corresponding to the target participle comprises:
and performing vector concatenation processing on the target knowledge representation vector corresponding to the target entity and the participle coding vector corresponding to the target participle to obtain the knowledge fusion coding vector corresponding to the target participle.
10. A method for training a text processing model, the method comprising:
acquiring a training text and a standard text processing result corresponding to the training text;
inputting the training text into a text coding model to obtain a training text coding vector, wherein the training text coding vector comprises a training semantic vector corresponding to the training text;
acquiring a training entity corresponding to the training text, and determining a second associated entity corresponding to the training entity;
inputting the entity representation vector corresponding to the second associated entity into a knowledge vector determination model, and determining a training knowledge representation vector corresponding to the training entity according to the entity representation vector and a corresponding attention weight, wherein the attention weight is obtained according to the association degree between the training semantic vector and an association relation representation vector, and the association relation representation vector is a vector representing an entity association relation;
inputting the training text coding vector and a training knowledge representation vector corresponding to the training entity into a fusion model for fusion processing to obtain a training fusion result;
processing the training fusion result according to a task processing model to obtain a training processing result;
and adjusting parameters of the task processing model or adjusting parameters of the task processing model and a language model according to the training processing result and the standard text processing result, wherein the language model comprises the text coding model, the knowledge vector determination model and the fusion model.
11. A text processing apparatus, characterized in that the apparatus comprises:
the target text coding module is used for acquiring a target text to be processed, coding the target text and obtaining a target text coding vector, wherein the target text coding vector comprises a target semantic vector corresponding to the target text;
a first entity obtaining module, configured to obtain a target entity in the target text, and determine a first associated entity corresponding to the target entity;
a target knowledge representation vector determining module, configured to determine a target knowledge representation vector corresponding to the target entity according to the entity representation vector of the first associated entity and a corresponding attention weight, where the attention weight is obtained according to the degree of association between the target semantic vector and an association relation representation vector, and the association relation representation vector is a vector representing an entity association relation;
the first fusion module is used for fusing the target text coding vector and a target knowledge representation vector corresponding to the target entity to obtain a target fusion result;
and the text processing result determining module is used for determining a text processing result corresponding to the target text according to the target fusion result.
12. The apparatus of claim 11, wherein the target knowledge representation vector determination module comprises:
a relationship network graph obtaining unit, configured to obtain a relationship network graph formed by the first associated entity and the target entity;
an association relation representation vector obtaining unit, configured to obtain, for a network graph entity in the relational network graph, an association relation representation vector representing the association relation between the network graph entity and an adjacent entity;
the attention weight determining unit is used for obtaining a vector association degree according to the association relation representation vector and the target semantic vector, and determining the attention weight corresponding to the adjacent entity according to the vector association degree;
a target knowledge representation vector determining unit, configured to determine a target knowledge representation vector corresponding to the network graph entity according to the attention weight corresponding to the adjacent entity and the entity representation vector of the adjacent entity;
and the extracting unit is used for extracting target knowledge representation vectors corresponding to the target entities from the target knowledge representation vectors corresponding to the network graph entities of the relational network graph.
13. A text processing model training apparatus, the apparatus comprising:
the training text acquisition module is used for acquiring a training text and a standard text processing result corresponding to the training text;
a training text coding vector obtaining module, configured to input the training text into a text coding model to obtain a training text coding vector, where the training text coding vector includes a training semantic vector corresponding to the training text;
the second entity acquisition module is used for acquiring a training entity corresponding to the training text and determining a second associated entity corresponding to the training entity;
a training knowledge representation vector determination module, configured to input an entity representation vector corresponding to the second associated entity into a knowledge vector determination model, and determine a training knowledge representation vector corresponding to the training entity according to the entity representation vector and a corresponding attention weight, where the attention weight is obtained according to the degree of association between the training semantic vector and an association relation representation vector, and the association relation representation vector is a vector representing an entity association relation;
a training fusion result obtaining module, configured to input the training text coding vector and the training knowledge representation vector corresponding to the training entity into a fusion model for fusion processing, so as to obtain a training fusion result;
a training processing result obtaining module, configured to process the training fusion result according to the task processing model to obtain a training processing result;
and the adjusting module is used for adjusting parameters of the task processing model according to the training processing result and the standard text processing result, or adjusting parameters of the task processing model and a language model, wherein the language model comprises the text coding model, the knowledge vector determining model and the fusion model.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when executing the computer program.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
CN202010881097.7A 2020-08-27 2020-08-27 Text processing method, text processing device, model training method, model training device, computer equipment and storage medium Pending CN112084331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010881097.7A CN112084331A (en) 2020-08-27 2020-08-27 Text processing method, text processing device, model training method, model training device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112084331A (en) 2020-12-15

Family

ID=73729698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010881097.7A Pending CN112084331A (en) 2020-08-27 2020-08-27 Text processing method, text processing device, model training method, model training device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112084331A (en)


Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489740A (en) * 2020-12-17 2021-03-12 北京惠及智医科技有限公司 Medical record detection method, training method of related model, related equipment and device
CN113569584A (en) * 2021-01-25 2021-10-29 腾讯科技(深圳)有限公司 Text translation method and device, electronic equipment and computer readable storage medium
CN112819164A (en) * 2021-02-02 2021-05-18 京东数科海益信息科技有限公司 Inference method and device of affair map and computer equipment
CN112906361A (en) * 2021-02-09 2021-06-04 上海明略人工智能(集团)有限公司 Text data labeling method and device, electronic equipment and storage medium
CN113011172A (en) * 2021-03-15 2021-06-22 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN113011172B (en) * 2021-03-15 2023-08-22 腾讯科技(深圳)有限公司 Text processing method, device, computer equipment and storage medium
CN113221578B (en) * 2021-04-30 2022-11-25 平安科技(深圳)有限公司 Disease entity retrieval method, device, equipment and medium
CN113221578A (en) * 2021-04-30 2021-08-06 平安科技(深圳)有限公司 Disease entity retrieval method, device, equipment and medium
CN113127632A (en) * 2021-05-17 2021-07-16 同济大学 Text summarization method and device based on heterogeneous graph, storage medium and terminal
CN113449104A (en) * 2021-06-22 2021-09-28 上海明略人工智能(集团)有限公司 Label enhancement model construction method and system, electronic equipment and storage medium
CN113449038A (en) * 2021-06-29 2021-09-28 东北大学 Mine intelligent question-answering system and method based on self-encoder
CN113449038B (en) * 2021-06-29 2024-04-26 东北大学 Mine intelligent question-answering system and method based on self-encoder
CN113535976A (en) * 2021-07-09 2021-10-22 泰康保险集团股份有限公司 Path vectorization representation method and device, computing equipment and storage medium
CN113743118A (en) * 2021-07-22 2021-12-03 武汉工程大学 Entity relation extraction method in legal document based on fusion relation information coding
CN114330357A (en) * 2021-08-04 2022-04-12 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN113609840A (en) * 2021-08-25 2021-11-05 西华大学 Method and system for generating Chinese legal judgment abstract
CN113609840B (en) * 2021-08-25 2023-06-16 西华大学 Chinese law judgment abstract generation method and system
CN113722471A (en) * 2021-08-30 2021-11-30 上海明略人工智能(集团)有限公司 Text abstract generation method, system, electronic equipment and medium
CN113743121B (en) * 2021-09-08 2023-11-21 平安科技(深圳)有限公司 Long text entity relation extraction method, device, computer equipment and storage medium
CN113743121A (en) * 2021-09-08 2021-12-03 平安科技(深圳)有限公司 Long text entity relation extraction method and device, computer equipment and storage medium
CN114239834A (en) * 2021-11-17 2022-03-25 中国人民解放军军事科学院国防科技创新研究院 Adversary relationship reasoning method and device based on multi-round confrontation attribute sharing
CN114239834B (en) * 2021-11-17 2022-07-19 中国人民解放军军事科学院国防科技创新研究院 Adversary relationship reasoning method and device based on multi-round confrontation attribute sharing
CN114880551A (en) * 2022-04-12 2022-08-09 北京三快在线科技有限公司 Method and device for acquiring upper-lower relation, electronic equipment and storage medium
CN116610871B (en) * 2023-07-18 2024-01-26 腾讯科技(深圳)有限公司 Media data recommendation method, device, computer equipment and storage medium
CN116610871A (en) * 2023-07-18 2023-08-18 腾讯科技(深圳)有限公司 Media data recommendation method, device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination