WO2024007119A1 - Training method for text processing model, and text processing method and device


Info

Publication number
WO2024007119A1
Authority
WO
WIPO (PCT)
Prior art keywords
concept map
text
graph
nodes
training
Prior art date
Application number
PCT/CN2022/103682
Other languages
French (fr)
Chinese (zh)
Inventor
林雪玲
李昊阳
王路宁
曹琛
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2022/103682
Publication of WO2024007119A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to a training method for a text processing model, a text processing method and a device.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new class of intelligent machines that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Natural language processing is an important research direction in the field of artificial intelligence. Natural language processing tasks are usually performed based on the natural language text itself. The natural language text itself contains relatively limited features, and the text processing effect may not meet expectations. Some solutions use knowledge graphs as auxiliary information for text processing, but these solutions may introduce knowledge graph information that is completely irrelevant to the text content during the text processing process, thus affecting the effect of text processing.
  • This application provides a training method for a text processing model, a text processing method and a device, which can improve the effect of text processing.
  • In a first aspect, a training method for a text processing model is provided, including: obtaining training text; obtaining a knowledge graph; and determining an initial concept map of the training text based on the knowledge graph, where the nodes in the initial concept map include topic nodes, and the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
  • The edges between the nodes in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map; the initial concept map is input into the relational graph attention network (RGAT) model for training to obtain the target RGAT model.
  • In the (i+1)-th iteration of the training, the first concept map is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text, and on the weights of the edges in the second concept map, where i is a positive integer, the first concept map is a subgraph of the initial concept map, and the second concept map is a subgraph of the initial concept map.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • the knowledge graph can be a knowledge graph in the professional field to which the training text belongs.
  • The target noun phrase refers to a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph; if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, the noun phrase can be used as a target noun phrase.
  • the concept maps in the iterative process are all subgraphs of the initial concept map.
  • the correlation between the nodes in the concept map and the training text during the iterative process is the correlation between the same node in the initial concept map and the training text.
  • the correlation between the nodes in the initial concept map and the training text may be determined based on the importance of the nodes.
  • the importance of a node can be determined by the eigenvector centrality of the node.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • That is, the topic nodes may include all candidate entities corresponding to each target noun phrase in the knowledge graph, so that the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships.
  • This can provide comprehensive and complete knowledge related to the text for subsequent processing.
  • Using the solution of the embodiments of the present application to learn the knowledge-level representation of the text further ensures the accuracy of downstream tasks and avoids erroneous reasoning paths caused by missing part of the knowledge.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting edges in the second concept map as the edges of the first concept map in descending order of the ratio between the income of each edge in the second concept map and its first consumption.
  • The income of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first consumption of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of the second consumption of each connected subgraph, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase. An illustrative sketch of the income/consumption-based edge selection follows.
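  • As an illustration only, the edge selection described above can be sketched as a greedy ranking by income/consumption ratio. The function names and the exact income and consumption formulas below are assumptions for illustration, not the formulas of this application.

```python
# A minimal sketch (assumed formulas) of selecting edges for the next concept map
# by the ratio of each edge's income to its first consumption, in descending order.

def select_edges(edges, edge_weight, node_relevance, num_keep):
    """edges: list of (u, v) pairs in the second concept map;
    edge_weight[(u, v)]: weight of the edge in the i-th iteration;
    node_relevance[n]: correlation between node n and the training text."""
    def income(e):
        return edge_weight[e]  # positively correlated with the edge weight

    def consumption(e):
        u, v = e  # negatively correlated with the relevance of the two endpoints
        return 1.0 / (node_relevance[u] + node_relevance[v] + 1e-9)

    ranked = sorted(edges, key=lambda e: income(e) / consumption(e), reverse=True)
    return ranked[:num_keep]  # the kept edges form the first concept map
```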
  • In some implementations, the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph, where the nodes in the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the eigenvector centrality of a topic node is determined based on the initial importance of the topic node and the weight of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • In some implementations, the initial concept map further includes neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
  • That is, the initial concept map in the embodiments of the present application also includes the neighbor entities connected to the candidate entities (that is, the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of knowledge-level encoding.
  • In some implementations, the correlation between a neighbor node and the training text is determined based on the score of the strongly connected branch where the neighbor node is located on the information propagation graph. The nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
  • The score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch where a topic node is located to the downstream strongly connected branches according to the topological order. The initial score of the strongly connected branch where a topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • In a second aspect, a text processing method is provided, including: obtaining the text to be processed; obtaining a knowledge graph; determining the text encoding of the text to be processed; determining the concept map of the text to be processed based on the knowledge graph; and processing the concept map of the text to be processed through the target RGAT model to obtain the knowledge encoding of the text to be processed.
  • the target RGAT is obtained by inputting the initial concept map of the training text into RGAT for training.
  • The first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map, where i is a positive integer.
  • The first concept map is a subgraph of the initial concept map, and the second concept map is a subgraph of the initial concept map.
  • The nodes in the initial concept map include topic nodes, and the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
  • The edges between nodes in the initial concept map are used to represent the entity relationships between nodes in the initial concept map. The processing result of the text to be processed is determined based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • The topic nodes in the concept map of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.
  • That is, the topic nodes may include all candidate entities corresponding to each target noun phrase in the knowledge graph, so that the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships.
  • This can provide comprehensive and complete knowledge related to the text for subsequent processing, avoiding incorrect reasoning paths due to missing part of the knowledge.
  • The target RGAT model focuses on knowledge that is highly relevant to the text during the training process, further ensuring the accuracy of downstream tasks.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting edges in the second concept map as the edges of the first concept map in descending order of the ratio between the income of each edge in the second concept map and its first consumption.
  • The income of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first consumption of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of the second consumption of each connected subgraph, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase.
  • In some implementations, the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph, and the nodes in the topic correlation graph include the topic nodes.
  • The weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the eigenvector centrality of a topic node is determined based on the initial importance of the topic node and the weight of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • The nodes in the initial concept map also include neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
  • The neighbor nodes in the concept map of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
  • the correlation between the neighbor node and the training text is determined based on the score of the strongly connected branch where the neighbor node is located on the information propagation graph.
  • The nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
  • The score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch where a topic node is located to the downstream strongly connected branches according to the topological order. The initial score of the strongly connected branch where a topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • In some implementations, the method further includes: outputting a knowledge path in the concept map of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
  • a knowledge path refers to the path between two nodes in the concept map.
  • For example, the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, ..., r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes e_q and e_{q+1}, and so on.
  • q is a positive integer.
  • the knowledge path can improve the interpretability of the model and provide users with a basis for judging the processing results, that is, the complete logic of the processing results can be obtained, which is conducive to improving the user's trust.
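  • As a toy illustration of the tuple form above (the entity and relation names here are hypothetical, not taken from the application), a 2-hop knowledge path can be represented and checked as follows:

```python
# A 2-hop knowledge path (e_q, r_q, e_{q+1}, r_{q+1}, e_{q+2}) as a plain tuple.
path = ("Vitamin B12 (chemical)", "increase",            # hypothetical example values
        "hemoglobin (biological substance)", "related to",
        "anemia (disease)")
entities, relations = path[0::2], path[1::2]
assert len(entities) == len(relations) + 1  # a k-hop path has k+1 entities and k relations
```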
  • In a third aspect, a text processing model training device is provided, which includes a unit for executing the method in any implementation of the first aspect.
  • In a fourth aspect, a text processing device is provided, which includes a unit for executing the method in any implementation of the second aspect.
  • In a fifth aspect, a training device for a text processing model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory.
  • the processor is configured to execute the method in any implementation manner of the first aspect.
  • The processor in the fifth aspect mentioned above can be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor.
  • The neural network computing processor here can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like.
  • Among them, TPU is an application-specific integrated circuit fully customized by Google for machine learning, used as an artificial intelligence accelerator.
  • In a sixth aspect, a text processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in any implementation manner of the second aspect.
  • the processor in the sixth aspect may be a CPU or a combination of a CPU and a neural network computing processor.
  • the neural network computing processor here may include a GPU, NPU or TPU, etc.
  • In a seventh aspect, a computer-readable medium is provided, which stores program code for device execution. The program code includes instructions for executing the method in any implementation of any one of the first to second aspects.
  • In an eighth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, it causes the computer to execute the method in any implementation of any one of the above first to second aspects.
  • In a ninth aspect, a chip is provided.
  • The chip includes a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface and executes the method in any implementation of any one of the above first to second aspects.
  • the chip may further include a memory, in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any implementation manner of any one of the first aspect to the second aspect.
  • Figure 1 is a schematic block diagram of a natural language processing system provided by an embodiment of the present application.
  • Figure 2 is a schematic block diagram of a system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic block diagram of a text processing system provided by an embodiment of the present application.
  • Figure 4 is a schematic block diagram of a text processing model provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of a text processing model training method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of knowledge extraction provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the construction process of a concept map provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of a topic correlation graph provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an information propagation graph provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of another text processing model training method provided by an embodiment of the present application.
  • Figure 11 is a schematic flow chart of a text processing method provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of another text processing method provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of a text classification result provided by an embodiment of the present application.
  • Figure 14 is a schematic block diagram of a training device provided by an embodiment of the present application.
  • Figure 15 is a schematic block diagram of a text processing device provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of another training device provided by an embodiment of the present application.
  • Figure 17 is a schematic block diagram of another text processing device provided by an embodiment of the present application.
  • Natural language processing is an important research direction in the field of artificial intelligence, which enables humans and machines to interact through natural language. Natural language processing tasks are usually performed based on the natural language text itself.
  • the natural language text itself contains relatively limited features, and the text processing effect may not meet expectations.
  • some solutions introduce knowledge graphs as auxiliary information for text processing.
  • However, the introduction of knowledge graphs may bring other problems. For example, the ambiguity of entities in the knowledge graph and the noise in the knowledge graph may lead to the introduction of knowledge graph information that is completely irrelevant to the text content during text processing, so the accuracy of the text processing results cannot be guaranteed.
  • the embodiment of the present application provides a text processing method, which is beneficial to improving the effect of text processing.
  • Natural language is human language, and natural language processing (NLP) is the processing of human language. Natural language processing is the process of systematically analyzing, understanding and extracting information from text data in an intelligent and efficient way. NLP and its components can manage very large blocks of text data, perform a large number of automated tasks, and solve a variety of problems, such as automatic summarization, machine translation (MT), named entity recognition (NER), relationship extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering, topic segmentation, etc.
  • Knowledge graph is a knowledge base that integrates real-world facts through a graph-structured data model.
  • Knowledge graphs are often used to store entities that are related to each other. For example, a fact that represents the existence of some entity relationship between two entities can be expressed as a triple data structure in the form of (entity, entity relationship, entity).
  • Entities are represented as nodes in the knowledge graph and represent conceptual entities in the real world, for example, "Peking University (organization)", "vitamin B12 (medical element)" and "hemoglobin (medical element)". Entity relationships are represented by the edges between the nodes corresponding to two entities in the knowledge graph and represent the relationship between the two entities in the real world. For example, the entity relationship between "vitamin B12" and "hemoglobin" is "increase", and the triple (vitamin B12, increase, hemoglobin) indicates the fact that vitamin B12 increases hemoglobin.
  • the knowledge graph in the professional field refers to the knowledge graph containing entities, relationships and facts in the professional field.
  • knowledge graphs in the financial domain are used to indicate entities, relationships, and facts in the financial domain.
  • the knowledge graph in the medical field is used to indicate entities, relationships and facts in the medical field.
  • the triplet data structure extracted from natural language text that can express facts can be called the knowledge triplet of the text, in the form of (noun phrase, relational phrase, noun phrase).
  • a noun phrase can include one word or multiple words.
  • a relational phrase can include one word or multiple words.
  • a noun phrase can correspond to one or more candidate entities within the knowledge graph.
  • the Chinese noun phrase “apple” can correspond to candidate entities such as “apple (fruit)” or “apple (company)”.
  • the English noun phrase “anemia” can correspond to candidate entities such as “anemia(disease)", “anemia(symptom)” or “anemia(plant)”.
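  • A minimal sketch of these structures, using the example entities above (the dictionary layout is an assumption for illustration):

```python
# Knowledge-graph facts as (entity, entity relationship, entity) triples,
# and noun phrases mapped to their candidate entities in the knowledge graph.
facts = [
    ("vitamin B12 (medical element)", "increase", "hemoglobin (medical element)"),
]

candidate_entities = {
    "apple":  ["apple (fruit)", "apple (company)"],
    "anemia": ["anemia (disease)", "anemia (symptom)", "anemia (plant)"],
}

def is_target_noun_phrase(noun_phrase):
    # a noun phrase qualifies if it corresponds to at least one candidate entity
    return len(candidate_entities.get(noun_phrase, [])) >= 1
```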
  • Knowledge graph embedding refers to mapping the entities and entity relationships in the knowledge graph to a low-dimensional vector space to obtain the embedded representation of the knowledge graph and realize the semantic information representation of entities and entity relationships.
  • the embedded representation of knowledge graphs can be used for various tasks related to knowledge graphs.
  • the embedded representation of the knowledge graph may include at least one of the following: an embedded representation of an entity or an embedded representation of a relationship, etc.
  • the embedded representation of the knowledge graph can be obtained through the knowledge graph embedding model.
  • the knowledge graph embedding model can be implemented based on graph neural network (GNN).
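  • For intuition only, one classic (non-GNN) embedding objective scores a triple by translating the head entity by the relation vector. The application notes that a GNN-based embedding model may be used, so this TransE-style score is merely an illustration:

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style plausibility of the fact (head, relation, tail):
    the closer head + relation is to tail in the vector space, the higher the score."""
    return -np.linalg.norm(head + relation - tail)
```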
  • the k-hop neighbors of a node in the graph refer to the set of all nodes whose shortest path to the node is k-hops starting from the node. k is a positive integer.
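  • A breadth-first search makes this definition concrete (a sketch; the adjacency representation is an assumption):

```python
from collections import deque

def k_hop_neighbors(adjacency, start, k):
    """adjacency: dict mapping each node to an iterable of its one-hop neighbors.
    Returns the set of nodes whose shortest path from `start` is exactly k hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if distance[u] == k:
            continue  # no need to expand past k hops
        for v in adjacency.get(u, ()):
            if v not in distance:  # first visit yields the shortest-path distance
                distance[v] = distance[u] + 1
                queue.append(v)
    return {n for n, d in distance.items() if d == k}
```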
  • A directed acyclic graph (DAG) is a directed graph that contains no directed cycles.
  • Topological sorting of a directed acyclic graph G arranges all the nodes in G into a linear sequence such that, for any pair of nodes u and v in the graph, if the edge <u,v> belongs to the edge set E(G) (where <u,v> represents an edge from node u to node v, and E(G) represents the set of edges in G), then u appears before v in the linear sequence. Usually, such a linear sequence is called a sequence that satisfies the topological order, or a topological sequence for short.
  • A topological sequence needs to satisfy two conditions: each node appears in the sequence exactly once, and for every edge <u,v> in E(G), node u appears before node v.
  • a directed acyclic graph can have one or more topologically ordered sequences.
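  • Kahn's algorithm is one standard way to produce such a topological sequence (a sketch):

```python
from collections import deque

def topological_sort(nodes, edges):
    """edges: iterable of (u, v) pairs meaning u must appear before v."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v are already placed
                queue.append(v)
    return order  # one of the possibly many valid topological sequences
```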
  • If a directed graph G has a directed path from v to u and a directed path from u to v for any two nodes v and u, then the directed graph G is called a strongly connected graph.
  • In a directed graph G, if two nodes u and v have directed paths to each other in both directions, then u and v are said to be strongly connected.
  • If any two nodes of the directed graph G are strongly connected, G is said to be a strongly connected graph.
  • If a subgraph S of a directed graph G is a maximal strongly connected subgraph of G, then S is a strongly connected component of G, or, S can also be called a strongly connected branch of G.
  • Tarjan's algorithm can be used to solve strongly connected branches of directed graphs. Specifically, this algorithm can be used to calculate the size of each strongly connected branch in the directed graph, the nodes of each strongly connected branch, the total number of strongly connected branches, etc.
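  • For example, the strongly connected branches of a directed graph can be obtained with the networkx library, whose implementation is a nonrecursive variant of Tarjan's algorithm:

```python
import networkx as nx

G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c")])
branches = list(nx.strongly_connected_components(G))
print(len(branches))               # total number of strongly connected branches: 2
print(sorted(map(len, branches)))  # sizes of the branches: [1, 2]
```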
  • Any two nodes in a connected subgraph of an undirected graph are connected to each other by a path, and the nodes of the connected subgraph are not connected to any additional nodes of the larger graph (that is, the connected subgraph is maximal).
  • eigenvector centrality is a way to measure the influence of a node on a network.
  • The importance of a node usually depends on the number of the node's neighbor nodes (that is, the degree of the node) and the importance of those neighbor nodes: the more important the neighbor nodes connected to it are, the more important the node is.
  • The importance of a node can be expressed as a score; the higher the score, the more important the node. For nodes with the same number of connections, a node whose adjacent nodes have higher scores will itself have a higher score than a node whose adjacent nodes have lower scores. According to this principle, all nodes can be assigned corresponding scores. A high eigenvector score means that the node is connected to many nodes that themselves have high scores.
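  • The classic way to compute these scores is power iteration on the adjacency matrix (a generic sketch, not the specific computation of this application):

```python
import numpy as np

def eigenvector_centrality(adjacency_matrix, iterations=100):
    """adjacency_matrix: (N, N) array; entry (i, j) is nonzero if i and j are connected."""
    scores = np.ones(adjacency_matrix.shape[0])
    for _ in range(iterations):
        scores = adjacency_matrix @ scores        # accumulate neighbors' scores
        scores = scores / np.linalg.norm(scores)  # renormalize each step
    return scores  # converges to the principal eigenvector of the adjacency matrix
```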
  • the neural network can be composed of neural units.
  • The neural unit can refer to an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit can be:
  • h_{W,b}(x) = f(W^T x + b) = f(sum_{s=1}^{n} W_s x_s + b)
  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal of the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next layer.
  • the activation function can be ReLU, tanh or sigmoid function.
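  • Written out directly, the neural unit above is just a weighted sum followed by an activation (a sketch):

```python
import numpy as np

def neural_unit(x, W, b, f=np.tanh):
    """x: inputs x_s; W: weights W_s; b: bias; f: activation (e.g., ReLU, tanh, sigmoid)."""
    return f(np.dot(W, x) + b)  # h = f(sum_s W_s * x_s + b)
```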
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Graph neural network (GNN) is a general term for algorithms that use neural networks to learn graph-structured data, extracting and exploring features and patterns in graph-structured data to meet the needs of graph learning tasks such as clustering, classification, prediction, segmentation, and generation.
  • the graph neural network can aggregate the neighborhood information of each node to obtain the embedded representation of each node.
  • the attention mechanism allows a neural network to focus only on the information required for task learning.
  • the attention mechanism is introduced into GNN to form a graph attention network (GAT).
  • GAT focuses on nodes and edges that are more relevant to the task, which can improve the processing effect.
  • RGAT is a graph neural network that can model various relationships in the graph structure through the graph attention mechanism to obtain the vector expression of the nodes in the graph structure and the relationships between nodes in a low-dimensional space.
  • RGAT can be used to process the input knowledge graph to obtain the low-dimensional space vector of each entity and entity relationship on the knowledge graph.
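  • A minimal sketch of a relational graph attention layer is shown below. This is an assumed, simplified formulation for illustration (relation-specific projections plus per-node softmax attention), not the exact RGAT model used in this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGATLayer(nn.Module):
    """Simplified relation-aware graph attention: each relation type r has its own
    projection W_r and attention vector a_r; messages into a node are combined
    with softmax-normalized attention over its incoming edges."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.out_dim = out_dim
        self.W = nn.ModuleList(nn.Linear(in_dim, out_dim, bias=False)
                               for _ in range(num_relations))
        self.a = nn.ParameterList(nn.Parameter(torch.randn(2 * out_dim))
                                  for _ in range(num_relations))

    def forward(self, h, edges):
        """h: (N, in_dim) node features; edges: list of (src, dst, relation_id)."""
        out = torch.zeros(h.size(0), self.out_dim)
        incoming = {}
        for s, d, r in edges:  # group edges by destination node
            incoming.setdefault(d, []).append((s, r))
        for d, neighbors in incoming.items():
            scores, messages = [], []
            for s, r in neighbors:
                hs, hd = self.W[r](h[s]), self.W[r](h[d])
                scores.append(F.leaky_relu(torch.dot(self.a[r], torch.cat([hs, hd]))))
                messages.append(hs)
            alpha = torch.softmax(torch.stack(scores), dim=0)  # attention weights
            out[d] = (alpha.unsqueeze(1) * torch.stack(messages)).sum(dim=0)
        return F.elu(out)
```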
  • Figure 1(a) shows an application scenario of the natural language processing system.
  • the natural language processing system includes user equipment and data processing equipment.
  • User equipment includes smart terminals used by users, such as mobile phones, personal computers, or information processing centers.
  • User equipment is the initiator of natural language data processing. As the initiator of language question-and-answer or query requests, users usually initiate such requests through the user equipment.
  • Data processing equipment can be cloud servers, network servers, application servers, management servers and other devices or servers with data processing functions.
  • The data processing equipment receives questions such as query statements/voice/text from the smart terminal through an interactive interface, and then performs machine learning, deep learning, search, reasoning, decision-making, etc. through a memory that stores data and a processor that processes data.
  • Storage can be a general term that includes local storage and databases that store historical data.
  • the database can be located on the data processing device or on other network servers.
  • (b) of Figure 1 shows another application scenario of the natural language processing system.
  • the user device directly serves as a data processing device, directly receiving input from the user and processing it directly by the hardware of the user device itself.
  • The specific process is similar to that in (a) of Figure 1; please refer to the above description, which will not be repeated here.
  • (c) of Figure 1 shows a schematic diagram of related equipment of the natural language processing system provided by the embodiment of the present application.
  • the natural language processing system may include a local device 101, a local device 102, an execution device 110 and a data storage system 150, where the local device 101 and the local device 102 are connected to the execution device 110 through a communication network.
  • The execution device 110 is implemented by one or more servers, and optionally cooperates with other computing devices, such as data storage, routers, load balancers and other devices; the execution device 110 can be arranged on one physical site or distributed across multiple physical sites.
  • the execution device 110 can use the data in the data storage system 150 or call the program code in the data storage system 150 to implement the training method of the text processing model in the embodiment of the present application.
  • execution device 110 can also be called a cloud device, and in this case, the execution device 110 can be deployed in the cloud.
  • the execution device 110 may also be a terminal device. In this case, the execution device 110 may be deployed on the user terminal side, which is not limited in the embodiments of the present application.
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, game console, etc.
  • Each user's local device can interact with the execution device 110 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the data storage system 150 can be integrated on the execution device 110, the local device 101 or the local device 102, or can be set up on the cloud or other network servers.
  • the local device 101 or the local device 102 can obtain the relevant parameters of the text processing model from the execution device 110, and use the text processing model on the local device 101 or the local device 102 to obtain the execution result of the text processing task.
  • the text processing model can be deployed directly on the execution device 110.
  • the execution device 110 obtains the text to be processed from the local device 101 and the local device 102, and obtains the execution result of the text processing task through the text processing model.
  • The user equipment in (a) and (b) of Figure 1 may be the local device 101 or 102 in (c) of Figure 1.
  • The data processing device in (a) and (b) of Figure 1 may be the execution device 110 in (c) of Figure 1.
  • Figure 2 shows a system architecture 200 provided by an embodiment of the present application.
  • the data collection device 260 is used to collect training data and store it in the database 230.
  • the training device 220 generates a target model/rule 201 based on the training data maintained in the database 230, for example, the text processing model in the embodiment of the present application.
  • the model in the embodiment of this application may be a neural network model, or it may also be other models.
  • the training data may include training text and target processing results of the training text, such as labels of the training text.
  • the training data maintained in the database 230 may not necessarily be collected by the data collection device 260, but may also be received from other devices.
  • It should be noted that the training device 220 does not necessarily perform the training of the target model/rules 201 based entirely on the training data maintained by the database 230; it may also obtain training data from the cloud or elsewhere for model training.
  • The above description should not be regarded as a limitation on the embodiments of this application.
  • Figure 2 shows the functional module diagram in the data processing process.
  • client device 240 in FIG. 2 may be the user device of FIG. 1 .
  • the execution device 210 and the data storage system 250 in Figure 2 can be integrated into the user equipment in Figure 1 .
  • the execution device 210 and the data storage system 250 in Figure 2 can also be integrated on the data processing device in Figure 1 .
  • the database 230, training device 220 and data collection device 260 in Figure 2 can be integrated correspondingly on the data processing device in Figure 1, and can be set up on the cloud or other servers on the network.
  • The data collection device 260 may be a terminal device, or an input and output interface of a server or the cloud, that is, an interaction layer (interface) used to obtain user input and return processing results.
  • the target model/rule obtained by the training device 220 can be applied in different systems or devices.
  • the execution device 210 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, AR/VR, a vehicle-mounted terminal, etc., or it can also be a server or cloud.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with external devices. The "user" can input data to the I/O interface 212 through the client device 240.
  • When the execution device 210 preprocesses the input data, or when the calculation module 211 of the execution device 210 performs calculations and other related processing, the execution device 210 can call the data, code, etc. in the data storage system 250, and can also store the resulting data, instructions, etc. in the data storage system 250.
  • the I/O interface 212 returns the processing results to the client device 240 and provides them to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different data for different goals to provide users with better results.
  • the user can manually specify the data to be input into the execution device 210 , for example, by operating in the interface provided by the I/O interface 212 .
  • the client device 240 can automatically input data to the I/O interface 212 and obtain the results. If the client device 240 automatically inputs data and requires the user's authorization, the user can set corresponding permissions in the client device 240 .
  • the user can view the results output by the execution device 210 on the client device 240, and the specific presentation form may be display, sound, action, etc.
  • the client device 240 can also serve as a data collection terminal to store the collected data in the database 230.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in Figure 2, the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 can also be placed in the execution device 210.
  • Figure 3 shows a schematic block diagram of a text processing system provided by an embodiment of the present application.
  • The execution device 210 in Figure 2, the data processing devices in (a) and (b) of Figure 1, or the execution device 110 in (c) of Figure 1 can be deployed on the server in Figure 3.
  • the text processing model in the embodiment of the present application can be implemented through program code deployed on the hardware of the server.
  • the text processing model of the embodiment of the present application can be modified and implemented on the basis of the existing software platform.
  • The program code runs in the server's host storage (the host memory or disk shown in Figure 3) and in the memory of acceleration hardware (such as a GPU, FPGA or dedicated chip).
  • the dedicated chip may be a neural network operation processor, which can be used to perform operations on the neural network model.
  • Figure 4 shows a schematic structural diagram of a text processing model provided by an embodiment of the present application.
  • the text processing model 400 includes a knowledge extraction module 410 , a text encoding module 420 , a knowledge encoding module 430 and a task processing module 440 .
  • the knowledge extraction module 410 is used to extract knowledge from input data.
  • the input data can be text.
  • the input data can be training text, and during the inference process, the input data can be text to be processed.
  • the text encoding module 420 is used to encode the text to obtain text-level encoding of the text.
  • the knowledge encoding module 430 is used to process the knowledge extracted by the knowledge extraction module 410 through RGAT to generate knowledge-level encoding of the text.
  • the text-level encoding of the text can be used as an input to the knowledge encoding module 430 and participate in the process of the knowledge encoding module 430 generating the knowledge-level encoding.
  • the output of the knowledge encoding module 430 can be understood as predictive encoding of the knowledge level of the text.
  • the output of the knowledge encoding module 430 is the encoding of the knowledge level of the text.
  • the task processing module 440 is configured to output text processing results based on text-level coding and knowledge-level coding.
  • the text processing model shown in Figure 4 may be a text classification model for text classification.
  • the task processing module 440 may be configured to output the result of text classification, that is, predict the category of the text based on text-level coding and knowledge-level coding.
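  • The interaction of the four modules can be sketched as the following skeleton (class and method names are illustrative, not taken from this application):

```python
class TextProcessingModel:
    """Skeleton of the model in Figure 4; each module is a callable component."""

    def __init__(self, knowledge_extractor, text_encoder, knowledge_encoder, task_head):
        self.knowledge_extractor = knowledge_extractor  # module 410
        self.text_encoder = text_encoder                # module 420
        self.knowledge_encoder = knowledge_encoder      # module 430 (RGAT-based)
        self.task_head = task_head                      # module 440

    def process(self, text, knowledge_graph):
        concept_map = self.knowledge_extractor(text, knowledge_graph)
        text_encoding = self.text_encoder(text)
        # the text-level encoding also feeds the knowledge encoding step (see above)
        knowledge_encoding = self.knowledge_encoder(concept_map, text_encoding)
        return self.task_head(text_encoding, knowledge_encoding)  # e.g., a class label
```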
  • Figure 5 shows a schematic flow chart of a text processing model training method provided by an embodiment of the present application.
  • The method 500 shown in Figure 5 can be executed by an apparatus or device capable of performing the training of a text processing model.
  • the device can be a cloud service device or a terminal device, such as a computer, server, vehicle, mobile phone, etc. with computing capabilities.
  • The device that executes the training method of the text processing model may also be a system composed of a cloud service device and a terminal device.
  • the method 500 may be executed by any one of the execution device 110 in FIG. 1 , a local device, or the training device 220 in FIG. 2 .
  • the method 500 may be specifically executed by the training device 220 as shown in FIG. 2 , and the training data in the method 500 may be training data maintained in the database 230 as shown in FIG. 2 .
  • the text processing model in method 500 may be the text processing model shown in Figure 4.
  • the knowledge encoding module in the text processing model can be implemented through the RGAT model, and the training method of the text processing model can also be understood as the training method of RGAT.
  • Method 500 includes steps 510 to 540. Steps 510 to 540 are described below.
  • the nodes in the initial concept map include topic nodes, where the topic nodes include candidate entities in the knowledge map corresponding to the target noun phrases in the training text.
  • the edges between the nodes are used to represent the entity relationships between the nodes in the initial concept graph.
  • In the (i+1)-th iteration, the first concept map is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and on the weights of the edges in the second concept map, where i is a positive integer.
  • the first concept map is a subgraph of the initial concept map
  • the second concept map is a subgraph of the initial concept map.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • Knowledge with low textual relevance can also be understood as redundant knowledge or ambiguous knowledge.
  • Knowledge with greater text relevance can also be understood as key knowledge.
  • the knowledge graph can be represented as graph-structured data, used to represent entities and their relationships that exist in the real world.
  • Entities can be represented as nodes in the knowledge graph
  • entity relationships can be represented as edges in the knowledge graph.
  • nodes in the knowledge graph may also be called entity nodes in the knowledge graph or entities in the knowledge graph.
  • the edges in the knowledge graph can also be called relationship edges in the knowledge graph or entity relationships in the knowledge graph.
  • the knowledge graph may be an existing knowledge graph, or it may be a pre-constructed knowledge graph, which is not limited in the embodiments of the present application.
  • the knowledge graph in step 520 may be a knowledge graph in the professional field to which the training text belongs.
  • the text is data in the medical field
  • the knowledge graph can be a knowledge graph in the medical field.
  • the text is data in the financial field
  • the knowledge graph can be a knowledge graph in the financial field.
  • Knowledge graphs can be constructed based on corpus in professional fields.
  • the corpus can include website articles or books, etc.
  • Knowledge graphs in different professional fields can be constructed based on corpora in different professional fields.
  • the fact in the knowledge graph that represents the existence of an entity relationship between two entities can be represented as a triple data structure.
  • the construction process of the initial concept map may be performed by the knowledge encoding module 430 shown in FIG. 4 .
  • the initial concept map can be represented as graph-structured data that is used to indicate knowledge in the text.
  • the initial concept graph can be understood as a subgraph of the knowledge graph.
  • the nodes in the concept map can also be called entities in the concept map or entity nodes in the concept map.
  • The topic nodes in the initial concept map include candidate entities in the knowledge graph that correspond to the target noun phrases in the training text. It can also be understood that the topic nodes in the initial concept map correspond to the candidate entities in the knowledge graph that correspond to the target noun phrases in the training text, and there can be a one-to-one correspondence between topic nodes and candidate entities.
  • The edges in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map. It can also be understood that the edges in the initial concept map are used to represent the entity relationships, in the knowledge graph, between the entities corresponding to the nodes in the initial concept map. The edges between nodes in the initial concept map are determined based on the entity relationships in the knowledge graph: by connecting the nodes in the initial concept map according to the entity relationships in the knowledge graph, the edges between the nodes in the initial concept map can be obtained.
  • an edge between node A and node B in the initial concept map is used to represent the entity relationship between node A and node B.
  • Node A in the initial concept map corresponds to entity A in the knowledge map
  • node B in the initial concept map corresponds to entity B in the knowledge map.
  • the entity relationship between node A and node B is the entity relationship between entity A and entity B in the knowledge graph.
  • Entity A in the knowledge graph can also be called node A in the knowledge graph.
  • Entity B in the knowledge graph can also be called node B in the knowledge graph.
  • In other words, the entity relationship between node A and node B is the entity relationship between node A and node B in the knowledge graph.
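  • As an illustration (the function signature is an assumption), the initial concept map can be sketched as the knowledge-graph subgraph induced by the candidate entities and their k-hop neighbors:

```python
import networkx as nx

def build_initial_concept_map(knowledge_graph: nx.Graph, candidate_entities, k=1):
    """Topic nodes are the candidate entities; neighbor nodes are their k-hop
    neighbors in the knowledge graph; edges are inherited from the knowledge graph."""
    nodes = set(candidate_entities)
    frontier = set(candidate_entities)
    for _ in range(k):  # expand k hops outward from the topic nodes
        frontier = {n for u in frontier for n in knowledge_graph.neighbors(u)} - nodes
        nodes |= frontier
    return knowledge_graph.subgraph(nodes).copy()
```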
  • the target noun phrase refers to the noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph.
  • In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, the noun phrase can be used as a target noun phrase in the training text.
  • the target noun phrase in the text can be obtained based on the knowledge graph.
  • knowledge triples are extracted from the text based on the knowledge graph, that is, knowledge triples of the text.
  • the form of the knowledge triplet of the text can be (noun phrase, relational phrase, noun phrase).
  • the noun phrases in the knowledge triplet are phrases in the text data.
  • the relational phrases in the knowledge triplet are phrases in the text data.
  • the noun phrase in the knowledge triplet corresponds to at least one candidate entity in the knowledge graph.
  • the relational phrases in the knowledge triplet may correspond to at least one entity relationship in the knowledge graph.
  • a noun phrase can include one word or multiple words.
  • a relational phrase can include one word or multiple words.
  • the noun phrase in the knowledge triplet is the target noun phrase.
  • the noun phrases that have corresponding candidate entities in the knowledge graph can be used as the target noun phrases.
  • the relational phrases in the knowledge triplet can be used as the target relational phrases.
  • That is, the relational phrases that have corresponding entity relationships in the knowledge graph can be used as the target relational phrases. Extracting knowledge triples from text data can also be understood as identifying the noun phrases and relational phrases that constitute knowledge triples in the text data.
  • Extracting knowledge triples from text based on knowledge graphs can also be called extracting knowledge from texts based on knowledge graphs.
  • the process of extracting knowledge may be performed by the knowledge extraction module 410 in FIG. 4 .
  • Figure 6 shows a schematic diagram of knowledge extraction.
  • knowledge triples in the text data are identified based on the knowledge graph in the medical field.
  • the text is "Vitamin B 12 creates risk for anemia", that is, "Vitamin B12 increases the risk of anemia”.
  • the knowledge triplet extracted from this text is (Vitamin B12, creates risk for (+), anemia), where the noun phrase For "Vitamin B12" and "anemia", the related phrase is "creates risk for (+)”.
  • Figure 6 only takes the extraction of one knowledge triplet from the text as an example. In practical applications, there may be multiple knowledge triplets identified from the text. This is not the case in the embodiment of the present application. Make limitations.
  • The entity relationship corresponding to the relational phrase of a knowledge triple in the knowledge graph is not necessarily the entity relationship, in the knowledge graph, between the candidate entities corresponding to the noun phrases of that knowledge triple.
  • For example, the knowledge triple extracted from the text is (Vitamin B12, creates risk for (+), anemia).
  • The noun phrases "Vitamin B12" and "anemia" have corresponding candidate entities in the knowledge graph, and the relational phrase "creates risk for (+)" has a corresponding entity relationship in the knowledge graph.
  • However, the entity relationship between the candidate entities corresponding to "Vitamin B12" and "anemia" in the knowledge graph is not necessarily "creates risk for (+)".
  • Topic nodes may include candidate entities in the knowledge graph corresponding to all target noun phrases in the text data.
  • the topic node may include candidate entities corresponding to the noun phrases in all knowledge triples extracted from the text data.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • a target noun phrase may correspond to one or more candidate entities in the knowledge graph.
  • the target noun phrase “anemia” can correspond to multiple candidate entities such as “anemia(disease)", “anemia(symptom)” or “anemia(plant)”.
  • the topic node can include all candidate entities corresponding to each target noun phrase in the knowledge graph, and the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships. This can provide comprehensive and complete knowledge related to the text for subsequent processing and avoid incorrect reasoning paths due to missing partial knowledge.
  • the nodes in the initial concept graph also include neighbor nodes, and the neighbor nodes include neighbor entities in the knowledge graph of the candidate entities corresponding to the target noun phrase.
  • In other words, the neighbor nodes in the initial concept map correspond to the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases, and there can be a one-to-one correspondence between neighbor nodes and neighbor entities.
  • the neighbor nodes may include k-hop neighbors of the candidate entities corresponding to the target noun phrase in the knowledge graph.
  • k is a positive integer.
  • k may be a positive integer less than or equal to 3.
  • the neighbor nodes may include k-hop neighbors of all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • Neighbor nodes play an important role in knowledge reasoning.
  • That is, the initial concept map in the embodiments of the present application also includes the neighbor entities connected to the candidate entities (that is, the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of knowledge-level encoding.
  • the initial concept map may also include entity relationships corresponding to the target relationship phrases in the knowledge map.
  • Figure 7 shows a schematic diagram of an initial concept map.
  • a concept map corresponding to the text data is constructed based on the knowledge map in the medical field and the knowledge triples extracted from the text data.
  • the initial concept map shown in Figure 7 is an initial concept map corresponding to the text shown in Figure 6.
  • the topic nodes "anemia(disease)", “anemia(symptom)” and “anemia(plant)” in Figure 7 are all candidate entities in the knowledge graph corresponding to the target noun phrase "anemia” in the text shown in Figure 6
  • the topic node “Vitamin B 12 (chemical)” in Figure 7 is all candidate entities in the knowledge graph corresponding to the target noun phrase "Vitamin B 12" in the text shown in Figure 6.
  • the neighbor nodes in Figure 7 are one-hop neighbor entities of the above candidate entities in the knowledge graph.
  • "Vitamin B12(chemical)”, “anemia(disease)”, “anemia(symptom)” and “anemia(plant)” are the theme nodes in the concept map.
  • "hemoglobin(biological substance)”, “GI bleeding(biologic function)” and “plant(type)” are used as neighbor nodes in the concept map.
  • the edges between nodes in Figure 7 are used to represent the entity relationships between corresponding entities in the knowledge graph.
  • the correlation between the nodes in the initial concept map and the training text may be determined based on the importance of the nodes.
  • the relevance of a node in the initial concept map to the training text can be the importance of the node.
  • the correlation between the nodes in the initial concept map and the training text can be positively correlated with the importance of the node, that is, the greater the importance of the node, the higher the correlation between the node and the training text.
  • the importance of a node can be determined by a centrality measurement method, for example, by the PageRank algorithm or by degree centrality.
  • the importance of a node may be determined by the eigenvector centrality of the node.
  • the importance of a node can be the eigenvector centrality of the node.
  • the importance of a node can be positively correlated with the eigenvector centrality of the node.
  • the correlation between each topic node in the initial concept map and the training text can be the eigenvector centrality of the topic node on the topic correlation graph.
  • the nodes in the topic correlation graph are all topic nodes.
  • the eigenvector centrality of each topic node is determined based on the initial importance of each topic node and the weights of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • the edges in the topic correlation graph are connected based on the entity relationships between candidate entities in the knowledge graph.
  • the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the candidate entities corresponding to the two topic nodes connected by the edge, that is, the number of edges between the candidate entities.
  • for example, if node C and node D in the topic correlation graph correspond to candidate entity C and candidate entity D in the knowledge graph, and there are n edges between candidate entity C and candidate entity D in the knowledge graph, then an edge with weight n is constructed between node C and node D in the topic correlation graph.
  • n is a positive integer.
  • FIG. 8 shows a topic correlation graph provided by an embodiment of the present application.
  • the nodes in the topic correlation graph are the topic nodes in FIG. 7 .
  • when eigenvector centrality is used to calculate the correlation between a topic node and the training text, the importance of a node is determined based on the importance of its neighbor nodes, and the initial importance of a node is set based on the probability of the node appearing in the facts recorded in the knowledge graph, which can more accurately reflect the importance of the node; a minimal computation sketch follows.
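  • A minimal sketch of this computation, assuming a `networkx` representation of the topic correlation graph (all node names, weights and initial-importance values are illustrative):

```python
# Sketch: edge weights count the entity relationships between the two
# candidate entities in the knowledge graph; the starting vector encodes
# each topic node's probability of appearing in the recorded facts.
import networkx as nx

topic_graph = nx.Graph()
topic_graph.add_edge("anemia(disease)", "Vitamin B12(chemical)", weight=3)
topic_graph.add_edge("anemia(symptom)", "Vitamin B12(chemical)", weight=1)

initial_importance = {
    "anemia(disease)": 0.6,
    "anemia(symptom)": 0.3,
    "Vitamin B12(chemical)": 0.1,
}

centrality = nx.eigenvector_centrality(
    topic_graph, nstart=initial_importance, weight="weight"
)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))  # correlation of each topic node with the text
```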
  • the correlation between each neighbor node in the initial concept map and the training text can be the score of the strongly connected branch in which the neighbor node is located on the information propagation graph corresponding to the initial concept map.
  • the nodes in the information propagation graph corresponding to the initial concept map are the nodes in the initial concept map.
  • Figure 9 shows an information propagation graph provided by an embodiment of the present application.
  • the information propagation graph shown in Figure 9 is the information propagation graph corresponding to the initial concept map shown in Figure 7, or in other words, the information propagation graph corresponding to the training text shown in Figure 6.
  • each concept map can correspond to an information propagation graph.
  • the information propagation graph is a directed graph.
  • the edges in a directed graph are directed edges, that is, the edges in a directed graph have directionality.
  • the nodes in the information propagation graph corresponding to a concept map are the nodes in the concept map. If and only if node u in the concept map is a 1-hop neighbor of node v, a directed edge from node v to node u is constructed between node v and node u in the information propagation graph. Node u may be called the first node, and node v may be called the second node.
  • the score of each strongly connected branch on the information propagation graph may be obtained by propagating the initial score of the strongly connected branch where the topic node is located to the downstream strongly connected branch according to topological sorting.
  • the initial score of each strongly connected branch on the information propagation graph may be determined based on the maximum value of the importance of the nodes in each strongly connected branch.
  • the initial score of each strongly connected branch on the information propagation graph may be the maximum value of the importance of the nodes in each strongly connected branch.
  • the topological sorting can be obtained through a depth-first search of the strongly connected branches.
  • the topological sorting results can be understood as the depth-first search results of strongly connected branches.
  • "Propagation” can be understood as passing the score of the previous strongly connected branch to the subsequent strongly connected score according to topological sorting, or in other words, updating the score of the downstream strongly connected branch to the score of the upstream strongly connected branch.
  • for example, the topological sorting can be {C1, C2, C3}, where C1, C2 and C3 respectively represent three strongly connected branches.
  • "propagation" can then be understood as updating the score of C2 to the score of C1, and updating the score of C3 to the score of C2.
  • in step 540, the relational graph attention network (RGAT) is trained to learn the knowledge encoding expression corresponding to the training text.
  • the training process of RGAT can be carried out according to the following process.
  • the RGAT in the next iteration process is the adjusted RGAT in the current iteration process.
  • Step 1) to step 3) can be regarded as an iterative process.
  • step 1) may be performed by the knowledge encoding module 430 in FIG. 4 .
  • the knowledge encoding of the training text can also be called the encoding of the knowledge level of the training text.
  • Knowledge encoding can be expressed as knowledge embedding vector or knowledge feature vector, etc.
  • the knowledge encoding of the training text may include encoding of nodes in the concept graph and encoding of edges in the concept graph.
  • the knowledge encoding of the training text may include embedding vectors of nodes and embedding vectors of edges in the concept graph.
  • the text encoding of the training text can also be called the text-level encoding of the training text.
  • Text-level encoding refers to low-dimensional space vectors used to express text content and text arrangement sequences in text data.
  • Text encoding can be represented as text embedding vectors or text feature vectors.
  • the text encoding of the training text may include text encoding of the training text sequence and text encoding of the target phrases in the training text.
  • the target phrase may include a target noun phrase and a target relation phrase.
  • the text encoding of the training text sequence can also be called the text encoding of the training text itself.
  • the training text can be processed through a pre-trained language model to obtain the text encoding of the training text.
  • for example, the pre-trained language model can be a bidirectional encoder representations from transformers (BERT) model, a bidirectional gated recurrent unit (BiGRU) or a bidirectional long short-term memory (BiLSTM) network; a hedged encoding sketch follows.
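  • A hedged sketch of obtaining such text encodings with a BERT model, assuming the Hugging Face `transformers` library and an illustrative token span for the phrase encoding:

```python
# Sketch: encode the text with a pre-trained BERT model; the [CLS] vector
# serves as the sequence encoding, and a phrase encoding is obtained by
# mean-pooling the phrase's token vectors (the span 1:3 is illustrative).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Anemia may be caused by a lack of Vitamin B12."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # shape (1, seq_len, 768)

sequence_encoding = hidden[:, 0]                 # text encoding of the sequence
phrase_encoding = hidden[0, 1:3].mean(dim=0)     # text encoding of one phrase
```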
  • the process of obtaining the text encoding of the training text may be performed by the text encoding module 420 in FIG. 4 .
  • step 2) may be performed by the task processing module 440 in FIG. 4 .
  • the type of prediction result of the training text is related to the type of text processing task, that is, related to the type of downstream task.
  • the method 500 is used for text classification tasks, in which case the prediction result of the training text may be the predicted category of the training text.
  • for example, the text encoding and the predicted knowledge encoding of the training text are input into a classifier to obtain the predicted category of the training text.
  • another example is to fuse the text encoding of the training text with the predicted knowledge encoding, and input the fusion result into the classifier to obtain the predicted category of the training text.
  • the fusion method can be vector splicing (concatenation) of the text encoding of the training text and the predicted knowledge encoding to obtain the text fusion encoding.
  • the classifier can be a softmax function, as sketched below.
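  • A minimal sketch of the splice-and-classify step, with placeholder dimensions and a linear layer followed by softmax standing in for the classifier:

```python
# Sketch: concatenate (vector-splice) the text encoding with the predicted
# knowledge encoding and classify the fused vector.
import torch
import torch.nn as nn

text_enc = torch.randn(1, 768)        # text encoding of the training text
knowledge_enc = torch.randn(1, 128)   # predicted knowledge encoding from RGAT

fusion = torch.cat([text_enc, knowledge_enc], dim=-1)  # text fusion encoding
classifier = nn.Linear(768 + 128, 2)                   # e.g. two categories
probabilities = torch.softmax(classifier(fusion), dim=-1)
predicted_category = probabilities.argmax(dim=-1)
```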
  • in step 3), the parameters of RGAT can be adjusted with the goal of reducing the gap between the target processing result and the prediction result of the training text.
  • the parameters of RGAT can be adjusted with the goal of reducing the gap between the label of the training text and the predicted category of the training text.
  • the label of the training text is the target processing result of the training text.
  • the label of the training text is used to indicate the true value of the category corresponding to the training text, that is, the true category of the training text.
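  • A hedged sketch of step 3), using a cross-entropy loss and gradient descent to reduce the gap between the label and the predicted category (the linear layer is a stand-in for the RGAT plus classifier):

```python
# Sketch: one parameter-adjustment step; the adjusted model is then used in
# the next iteration, as described above.
import torch
import torch.nn as nn

model = nn.Linear(896, 2)             # placeholder for RGAT + classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

fusion = torch.randn(1, 896)          # fused text + predicted knowledge encoding
label = torch.tensor([1])             # true category of the training text

loss = criterion(model(fusion), label)  # gap between label and prediction
optimizer.zero_grad()
loss.backward()
optimizer.step()
```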
  • the method 500 can also be used for other natural language processing tasks, such as multi-hop reasoning question and answer tasks.
  • the embodiments of the present application do not limit this.
  • the concept map in step 1) can be the initial concept map.
  • the first concept map in the i+1 iteration can be understood as the concept map input to RGAT during the i+1 iteration.
  • RGAT performs forward propagation and back propagation based on the first concept map.
  • the second concept map in the i-th iteration can be understood as the concept map input to RGAT during the i-th iteration.
  • RGAT performs forward propagation and back propagation based on the second concept map.
  • the i-th iteration can be any iteration in the RGAT training process.
  • the concept map can be continuously optimized during the iterative training of RGAT, and the concept map in each iteration can be different; taking the above training process as an example, in different iterations, the concept map in step 1) can be different.
  • the concept map in each iteration can be determined based on the correlation between the nodes in the concept map in the previous iteration and the training text, and the weights of the edges in the concept map in the previous iteration.
  • the concept maps in the iterative process are all subgraphs of the initial concept map.
  • the correlation between the nodes in the concept map and the training text during the iterative process is the correlation between the same node in the initial concept map and the training text.
  • the concept map is optimized.
  • the direction of optimization can be understood as pruning the nodes that are less relevant to the training text and/or the edges with smaller weights, and retaining the nodes that are more relevant to the training text and/or the edges with larger weights.
  • the weight of the edge in the concept map can also be called the weight of the facts in the concept map, that is, the attention weight.
  • the first concept map belongs to the set of first subgraphs of the initial concept map; the first cost of each first subgraph is less than or equal to a threshold, and the benefit of the first concept map is greater than or equal to the benefit of the other first subgraphs in the set of first subgraphs. The first cost of a first subgraph is determined based on the first costs of the edges within the first subgraph, and the benefit of a first subgraph is determined based on the benefits of the edges within it. The benefit of an edge is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • any subgraph whose first cost is less than or equal to the threshold can be called a first subgraph.
  • the set of first subgraphs is the set of subgraphs whose first cost is less than or equal to the threshold.
  • the first concept map is an element of this set; in other words, the first concept map is a first subgraph.
  • the first subgraph with the greatest benefit is used as the first concept map.
  • in this way, the first concept map can be obtained.
  • the higher the average correlation between the two nodes connected by an edge and the training text, the smaller the first cost of the edge.
  • the first cost of a subgraph may be determined based on the first costs of all edges within the subgraph.
  • the first cost of a subgraph can be the sum of the first costs of all edges within the subgraph.
  • alternatively, the first cost of a subgraph may be the average of the first costs of all edges in the subgraph.
  • the benefit of a subgraph may be determined based on the benefits of all edges within the subgraph.
  • the benefit of a subgraph can be the sum of the benefits of all edges within the subgraph.
  • alternatively, the benefit of a subgraph can be the average of the benefits of all edges in the subgraph.
  • the first concept map in the i+1 iteration can be obtained by optimizing the second concept map in the i-th iteration, and the optimized concept map may be a subgraph of the concept map in the i-th iteration.
  • the first concept map may be a subgraph of the second concept map.
  • the second concept map is optimized according to the correlation between the nodes in the second concept map and the training text and the weights of the edges in the second concept map in the i-th iteration, to obtain the first concept map in the i+1 iteration.
  • the ratio between the benefit of the first edge and the first cost of the first edge is less than or equal to the ratio between the benefit of the second edge and the first cost of the second edge.
  • the first edge belongs to the second concept map but does not belong to the first concept map, and the second edge belongs to the first concept map.
  • the benefit of an edge is positively correlated with the weight of the edge in the i-th iteration.
  • the first cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • the first cost of the first concept map is less than or equal to the threshold.
  • the sum of the first cost of the first concept map and the first cost of the first edge is greater than the threshold.
  • the second edge belongs to the second concept map and also belongs to the first concept map.
  • the second edge is any edge in the first concept map.
  • the first edge is any edge in the second concept map that does not belong to the first concept map.
  • the ratio between the benefit and the first cost of any edge that does not belong to the first concept map is less than or equal to the ratio between the benefit and the first cost of any edge of the first concept map.
  • the first cost of the first concept map being less than or equal to the threshold may mean that the sum of the first costs of all edges in the first concept map is less than or equal to the threshold.
  • alternatively, the first cost of the first concept map being less than or equal to the threshold may mean that the average of the first costs of all edges in the first concept map is less than or equal to the threshold.
  • the first concept map in the i+1 iteration can be determined through the following steps.
  • S12: select the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than the threshold.
  • the first cost of the first concept map is less than or equal to the threshold.
  • the ratio between the benefit and the first cost of any edge of the first concept map is greater than or equal to the ratio between the benefit and the first cost of any edge in the second concept map that does not belong to the first concept map.
  • the threshold may be a tolerance value for the number of uncertain edges.
  • for example, the threshold W may be determined based on a tolerance ratio and the number N of uncertain edges in the initial concept map.
  • uncertain edges refer to edges with weights less than 1 in the initial concept map; N is less than or equal to the number of edges in the initial concept map, and N is a positive integer.
  • for example, step 2) can be understood as selecting the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges reaches the threshold W (for example, 30).
  • the first cost of an edge may be determined based on the average correlation between the two nodes connected by the edge and the training text.
  • the concept map can also be optimized in other ways to retain the edges with larger weights and the nodes with greater correlation to the training text in the concept map, so as to reduce the model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance to the text; a greedy sketch of the selection above follows.
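  • A minimal sketch of the greedy selection described above (the stopping rule, stopping before the budget is exceeded, and all numbers are assumptions):

```python
# Sketch: keep edges in descending order of benefit / first-cost ratio
# until the budget (threshold W) would be exceeded.
def prune_edges(edges, budget):
    """edges: iterable of (edge, benefit, first_cost); returns kept edges."""
    kept, spent = [], 0.0
    for edge, benefit, cost in sorted(edges, key=lambda e: e[1] / e[2], reverse=True):
        if spent + cost > budget:     # adding this edge would exceed W
            break
        kept.append(edge)
        spent += cost
    return kept

edges = [
    (("anemia(disease)", "hemoglobin"), 0.9, 0.2),   # high benefit, low cost
    (("anemia(plant)", "plant(type)"), 0.1, 0.8),    # low benefit, high cost
    (("anemia(disease)", "GI bleeding"), 0.6, 0.3),
]
print(prune_edges(edges, budget=0.6))
# -> [('anemia(disease)', 'hemoglobin'), ('anemia(disease)', 'GI bleeding')]
```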
  • the first concept map is determined based on a first subset of the set of connected subgraphs of the initial concept map, and the second cost of the first subset is less than or equal to the second cost of the second subset.
  • the second cost of the first subset is determined based on the second costs of the edges within the connected subgraphs of the first subset.
  • the second cost of the second subset is determined based on the second costs of the edges within the connected subgraphs of the second subset.
  • the first subset includes at least one candidate entity corresponding to the target noun phrase.
  • the second subset includes at least one candidate entity corresponding to the target noun phrase.
  • that a subset of the set of connected subgraphs includes at least one candidate entity corresponding to the target noun phrase can be understood as follows: at least one candidate entity corresponding to the target noun phrase exists on at least one connected subgraph in the subset. Candidate entities corresponding to different target noun phrases may exist on different connected subgraphs in the subset, or on the same connected subgraph in the subset.
  • the subset that contains at least one candidate entity corresponding to each target noun phrase and has the smallest second cost is regarded as the first subset.
  • in this way, the subset whose second cost is the smallest among the subsets containing at least one candidate entity corresponding to each target noun phrase is obtained; that is, the first subset is obtained, or in other words, the first concept map is obtained.
  • the first subset includes at least one entity relationship corresponding to the target relation phrase.
  • the second subset includes at least one entity relationship corresponding to the target relation phrase.
  • that a subset of the set of connected subgraphs includes at least one entity relationship corresponding to the target relation phrase can be understood as follows: at least one entity relationship corresponding to the target relation phrase exists on at least one connected subgraph in the subset. Entity relationships corresponding to different target relation phrases may exist on different connected subgraphs within the subset, or on the same connected subgraph within the subset.
  • the subset that contains at least one entity relationship corresponding to each target relation phrase and has the smallest second cost is regarded as the first subset.
  • in this way, the subset with the smallest second cost is obtained; that is, the first subset is obtained, or in other words, the first concept map is obtained.
  • the first subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
  • the second subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
  • the subset that contains at least one candidate entity corresponding to each target noun phrase and at least one entity relationship corresponding to each target relation phrase, and has the smallest second cost, is taken as the first subset.
  • the second cost of this subset is the smallest among the subsets containing at least one entity relationship corresponding to each target relation phrase and at least one candidate entity corresponding to each target noun phrase; that is, the first subset is obtained.
  • the higher the average correlation between the two nodes connected by an edge and the training text, the smaller the second cost of the edge.
  • the lower the average correlation between the two nodes connected by an edge and the training text, the greater the second cost of the edge.
  • the second cost of the first subset may be the sum of the second costs of all edges within all connected subgraphs in the first subset.
  • the second cost of the second subset may be the sum of the second costs of all edges within all connected subgraphs in the second subset.
  • the first concept map may be a subgraph of the second concept map.
  • the second cost of the first connected subgraph is greater than or equal to the second cost of the second connected subgraph.
  • the first connected subgraph belongs to the second concept map in the i-th iteration, and the first connected subgraph does not belong to the first concept map in the i+1 iteration.
  • the second connected subgraph is the connected subgraph with the largest second cost in the first concept map.
  • the second cost of the first connected subgraph is determined based on the second costs of the edges within the first connected subgraph, and the second cost of the second connected subgraph is determined based on the second costs of the edges within the second connected subgraph.
  • the second cost of an edge is negatively correlated with the weight of the edge in the i-th iteration, and the second cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • the nodes of the first concept map include at least one candidate entity corresponding to the target noun phrase. For example, the nodes of the second connected subgraph include at least one candidate entity corresponding to a first noun phrase among the target noun phrases, and the nodes of the other connected subgraphs in the first concept map do not include a candidate entity corresponding to the first noun phrase.
  • the second connected subgraph belongs to the first concept map, and accordingly, the second connected subgraph belongs to the second concept map.
  • the first connected subgraph is any connected subgraph in the second concept map that does not belong to the first concept map.
  • the second cost of any connected subgraph in the second concept map that does not belong to the first concept map is greater than or equal to the second cost of any connected subgraph of the first concept map.
  • the second cost of a connected subgraph may be the sum of the second costs of all edges in the connected subgraph.
  • alternatively, the second cost of a connected subgraph may be the average of the second costs of all edges in the connected subgraph.
  • for example, the concept map in the i+1 iteration can be determined through the following steps.
  • the initial subset is set as an empty set, and the connected subgraphs in the second concept map are selected and added to the subset in ascending order of their second cost, until the current subset includes at least one candidate entity corresponding to each target noun phrase in the training text.
  • the first concept map can be determined based on the current subset.
  • the second connected subgraph is the last connected subgraph added to the subset.
  • the first connected subgraph may be any connected subgraph in the second concept map that was not added to the subset; a sketch of this selection follows.
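  • A hedged sketch of this selection (coverage of noun phrases only; all costs and entity sets are made up):

```python
# Sketch: add connected subgraphs in ascending order of second cost until
# every target noun phrase is covered by at least one candidate entity.
def select_subgraphs(subgraphs, phrase_to_candidates):
    """subgraphs: list of (node_set, second_cost); returns the chosen subset."""
    chosen, covered = [], set()
    for nodes, cost in sorted(subgraphs, key=lambda s: s[1]):
        chosen.append((nodes, cost))
        covered |= {p for p, cands in phrase_to_candidates.items() if cands & nodes}
        if covered == set(phrase_to_candidates):
            break                      # every target noun phrase is covered
    return chosen

subgraphs = [
    ({"anemia(disease)", "hemoglobin"}, 0.2),
    ({"anemia(plant)", "plant(type)"}, 0.9),
    ({"Vitamin B12(chemical)"}, 0.4),
]
phrase_to_candidates = {
    "anemia": {"anemia(disease)", "anemia(symptom)", "anemia(plant)"},
    "Vitamin B12": {"Vitamin B12(chemical)"},
}
print(select_subgraphs(subgraphs, phrase_to_candidates))
```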
  • the target RGAT is the trained RGAT.
  • the target RGAT can be used to obtain the feature vector of the graph structure data input to the target RGAT.
  • Input the concept map of the text to be processed into the target RGAT, and the output data can be used as an embedded expression of the concept map of the text to be processed, that is, the encoding of the knowledge level of the text to be processed.
  • in the solution of the embodiment of the present application, the concept map is optimized according to the correlation between the entity nodes and the text and the weights of the edges. Pruning the nodes with smaller correlation to the training text and/or the edges with smaller weights helps reduce the model's attention to knowledge that is less relevant to the text and increase its focus on knowledge that is more relevant to the text, thus improving the expressive ability of RGAT and helping improve the accuracy of downstream tasks.
  • the nodes in the initial concept map can include all candidate entities corresponding to the target noun phrase, which is beneficial to ensuring that no text-related knowledge is omitted in the concept map and ensures the completeness of the knowledge; using the solution of the embodiment of the present application to learn the knowledge-level expression of the text further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths due to missing part of the knowledge.
  • the nodes in the initial concept map may also include k-hop neighbor entities of each candidate entity, which can further improve the integrity of the knowledge in the concept map.
  • FIG. 10 shows a training method 800 for a text processing model provided by an embodiment of the present application.
  • the text processing model may be the text processing model in FIG. 4 .
  • Method 800 can be regarded as a specific implementation of method 500. For simplicity of description, part of the description is appropriately omitted when describing the method 800.
  • Method 800 includes steps 810 to 850.
  • step 810 may be performed by the knowledge extraction module 410 in FIG. 4 .
  • the knowledge graph is a knowledge graph of a business domain related to text data.
  • method 800 will be described below by taking the business field as the medical field as an example, which does not limit the solutions of the embodiments of the present application.
  • step 810 may include: identifying the knowledge triplet Td in the training text d based on the knowledge graph to obtain the noun phrases Md and the relational phrases Pd in the knowledge triplet, that is, the target noun phrases and the target relation phrases in the training text, as shown in Figure 6.
  • step 820 may be performed by text encoding module 420 in FIG. 4 .
  • the text-level encoding of the training text may include the text-level encoding of each noun phrase Md in the training text d, the text-level encoding of each relational phrase Pd, and the text-level encoding of the training text sequence.
  • step 830 may be performed by the knowledge encoding module 430 in FIG. 4 .
  • the text-level coding of the training text can be used as an input to RGAT and participate in the process of generating knowledge-level predictive coding.
  • Prediction results are obtained based on text-level coding and knowledge-level predictive coding.
  • step 840 may be performed by the task processing module 440 in FIG. 4 .
  • step 840 may include: vector splicing the text-level encoding and the knowledge-level predictive encoding to obtain the predictive text fusion encoding, and inputting the predictive text fusion encoding into the classifier to obtain the predicted classification result of the training text.
  • an initial concept map of the training text may be constructed, and the correlation between the nodes in the initial concept map and the training text may be calculated.
  • all candidate entities corresponding to each noun phrase Md in the training text are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located; these serve as the nodes of the initial concept map.
  • the nodes in the initial concept map are connected according to the entity relationships between each pair of entities recorded in the knowledge graph to obtain the initial concept map.
  • This initial concept map contains complete knowledge.
  • the following steps can be used to construct an initial concept map and calculate the correlation between the nodes in the initial concept map and the training text.
  • the topic nodes are used as the nodes in the topic correlation graph, and the corresponding topic nodes on the topic correlation graph are connected based on the entity relationships between candidate entities recorded in the knowledge graph to obtain the edges in the topic correlation graph. Specifically, for a pair of nodes in the topic correlation graph, if there are n edges between the candidate entities corresponding to the pair of nodes in the knowledge graph, then the weight of the edge between the pair of topic nodes on the topic correlation graph is n.
  • the initial importance of each topic node is set based on the probability of each topic node appearing in the facts recorded in the knowledge graph.
  • the eigenvector centrality of each topic node is calculated on the topic correlation graph to obtain the importance of each topic node.
  • the importance of each topic node is taken as the correlation between the topic node and the training text.
  • for each topic node, locate the k-hop neighbor entities of the topic node in the knowledge graph as neighbor nodes.
  • V represents the set of nodes on the information propagation graph.
  • E represents the set of edges on the information propagation graph.
  • the initial score of the strongly connected branch where the topic node is located is propagated to the downstream strongly connected branches along the topological sorting result, thereby updating the score of each strongly connected branch.
  • the updated score of the strongly connected branch where each neighbor node is located is used as the correlation between the neighbor node and the training text.
  • the topic nodes and neighbor nodes are used as the nodes in the initial concept map, and the corresponding nodes are connected according to the entity relationships recorded in the knowledge graph to obtain the initial concept map, as shown in Figure 7; a construction sketch follows.
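  • A minimal construction sketch, assuming a `networkx` knowledge graph and a precomputed phrase-to-candidate mapping (entity linking itself is omitted; all names are illustrative):

```python
# Sketch: topic nodes are all candidate entities of each target noun phrase;
# neighbor nodes are their k-hop neighbors; edges are inherited from the
# knowledge graph's entity relationships.
import networkx as nx

def build_initial_concept_map(kg, phrase_to_candidates, k=1):
    nodes = set()
    for candidates in phrase_to_candidates.values():
        for entity in candidates:                      # topic nodes
            reach = nx.single_source_shortest_path_length(kg, entity, cutoff=k)
            nodes |= set(reach)                        # entity + k-hop neighbors
    return kg.subgraph(nodes).copy()                   # keep recorded relationships

kg = nx.Graph()
kg.add_edges_from([
    ("anemia(disease)", "hemoglobin(biological substance)"),
    ("anemia(plant)", "plant(type)"),
    ("Vitamin B12(chemical)", "anemia(disease)"),
])
concept_map = build_initial_concept_map(
    kg, {"anemia": {"anemia(disease)", "anemia(plant)"}}, k=1
)
print(concept_map.nodes)
```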
  • the concept map is optimized.
  • the optimization method of the concept map during the iterative process is illustratively explained below through Example 1 and Example 2; Example 1 is described first.
  • the solution to this optimization problem is the optimized concept map.
  • in Example 1, the edge with the largest ratio between the benefit and the first cost in the current concept map is selected first, and the selection stops when the sum of the first costs of the selected edges reaches the threshold.
  • the solution of the optimization problem can be obtained with linear time complexity, and the solution is the optimized concept map.
  • the optimized concept map is the concept map in the next iteration.
  • the optimization problem to be solved in Example 2 is: select a subset of the set of connected subgraphs of the initial concept map so that, for each noun phrase Md or relational phrase Pd in the training text, there is at least one corresponding node or edge included in this subset, and the total second cost of the connected subgraphs in this subset is the smallest among all subset selection methods.
  • the solution to this optimization problem is the optimized concept map.
  • the second cost of the connected subgraph is the average of the second costs of all edges within the connected subgraph.
  • the initial state of the subset is the empty set; the connected subgraph with the smallest second cost in the current concept map is added to the subset first, until, for every noun phrase Md or relational phrase Pd in the training text, there is at least one corresponding node or edge included in the subset.
  • the solution can likewise be obtained with linear time complexity; the result is the optimized concept map.
  • the optimized concept map is the concept map in the next iteration.
  • Figure 11 shows a schematic flowchart of a text processing method 900 provided by an embodiment of the present application.
  • the method can be executed by a device or device capable of text processing.
  • the device can be a cloud service device or a terminal device, for example, a computer, a server or another device with sufficient computing power to perform the text processing method; it may also be a system composed of a cloud service device and a terminal device.
  • the method 900 may be executed by the execution device 210 in FIG. 2 or the execution device 110 in FIG. 1 or a local device.
  • the method 900 may be specifically executed by the execution device 210 as shown in FIG. 2 , and the text to be processed in the method 900 may be input data provided by the client device 240 as shown in FIG. 2 .
  • the model used in the text processing method 900 in Figure 11 can be constructed by the method in Figure 5 or Figure 10 described above. Relevant descriptions may refer to the aforementioned method 500 or method 800. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing method 900 below.
  • the method 900 includes steps 910 to 960, which are described below.
  • the target RGAT is obtained by inputting the initial concept map of the training text into the RGAT for training.
  • the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text, and the weights of the edges in the second concept map; i is a positive integer.
  • the first concept map is a subgraph of the initial concept map
  • the second concept map is a subgraph of the initial concept map.
  • the nodes in the initial concept map include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text, and the edges between the nodes in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map.
  • correspondingly, the topic nodes in the concept map of the text to be processed may include candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed, and the edges between the nodes in the concept map of the text to be processed are used to represent the entity relationships between the nodes in that concept map.
  • in the solution of the embodiment of the present application, the concept map is optimized according to the correlation between the entity nodes and the text and the weights of the edges. Pruning the nodes with smaller correlation to the training text and/or the edges with smaller weights helps reduce the model's attention to knowledge that is less relevant to the text and increase its focus on knowledge that is more relevant to the text, thus improving the expressive ability of RGAT and helping improve the accuracy of downstream tasks.
  • the initial concept map also includes neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the training text.
  • correspondingly, the neighbor nodes in the concept map of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
  • the topic nodes in the initial concept graph include all candidate entities in the knowledge graph corresponding to the target noun phrase in the training text.
  • the topic nodes in the concept map of the text to be processed may include all candidate entities in the knowledge graph that correspond to the target noun phrases in the text to be processed.
  • that the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map may include: selecting the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than the threshold.
  • the benefit of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • alternatively, that the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map may include: selecting the connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of their second cost, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase.
  • the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph; the nodes in the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the correlation between a neighbor node and the training text is determined based on the score of the strongly connected branch in which the neighbor node is located on the information propagation graph; the nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node between the second node and the first node in the information propagation graph.
  • the score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch in which the topic node is located to the downstream strongly connected branches according to topological sorting; the initial score of the strongly connected branch in which the topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • the method 900 further includes: outputting a knowledge path in the concept map of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
  • a knowledge path refers to the path between two nodes in the concept map.
  • the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, ..., r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triplet, r_q represents the entity relationship between the two nodes, and so on.
  • q is a positive integer.
  • the knowledge path can improve the interpretability of the model, provide users with a basis for judging processing results, and help improve users' trust.
  • the concept map in the solution of the embodiment of the present application has comprehensive and accurate knowledge, which is conducive to ensuring the integrity and accuracy of the knowledge path.
  • the weight of the knowledge path is determined based on the attention weight of the triplet in the knowledge path.
  • the attention weight of a triplet is used to indicate the importance of the triplet in the reasoning process of RGAT.
  • the attention weight of the knowledge path is used to determine the importance of the knowledge path.
  • the weight of the knowledge path may be the average of the attention weights of the triples in the knowledge path.
  • in the attention weight formula for the triplet (j, r, i) (the formula itself is not reproduced here), one symbol represents the set of all triples in the knowledge graph and another represents the activation function; R' represents a set of relations, and l is an integer greater than or equal to 0.
  • the activation function can be a binary step function, a linear activation function, a Sigmoid function, a rectified linear unit (ReLU) or a leaky ReLU (LeakyReLU).
  • for example, LeakyReLU can be used as the activation function.
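  • An illustrative sketch only; the patent's exact formula is not reproduced. A common relational-attention pattern scores each triplet with a learned layer plus LeakyReLU and normalizes with softmax over the triplets pointing at the same node; the knowledge path weight is then the average of its triplets' attention weights:

```python
# Sketch: attention weights over triplets (j, r, i) and a knowledge path
# weight computed as the mean of its triplets' weights. Dimensions and the
# scoring layer are assumptions, not the patent's formula.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 16
score_layer = nn.Linear(3 * d, 1)         # learned scoring over [h_j, h_r, h_i]

def triple_score(h_j, h_r, h_i):
    return F.leaky_relu(score_layer(torch.cat([h_j, h_r, h_i])))

# three triplets that all point at the same node i:
triples = [tuple(torch.randn(d) for _ in range(3)) for _ in range(3)]
scores = torch.cat([triple_score(*t) for t in triples])
alphas = torch.softmax(scores, dim=0)     # attention weight per triplet

path_weight = alphas[:2].mean()           # e.g. weight of a 2-hop path
print(alphas, path_weight)
```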
  • Figure 12 shows a schematic flow chart of a text classification method according to an embodiment of the present application.
  • the method shown in Figure 12 can be regarded as a specific implementation of the method shown in Figure 11. For simplicity of description, part of the description is appropriately omitted when describing the method 1000.
  • Method 1000 includes steps 1010 to 1040.
  • step 1010 may be performed by the knowledge extraction module 410 in FIG. 4 .
  • the knowledge graph is a knowledge graph of a business domain related to text data.
  • method 1000 will be described below by taking the medical field as the business field as an example, which does not limit the solutions of the embodiments of the present application.
  • step 1010 may include: identifying the knowledge triplet Td in the text d to be processed based on the knowledge graph to obtain the noun phrases Md and the relational phrases Pd in the knowledge triplet, that is, the target noun phrases and the target relation phrases in the text to be processed.
  • step 1020 may be performed by text encoding module 420 in FIG. 4 .
  • the text-level encoding of the text to be processed may include the text-level encoding of each noun phrase Md in the text d to be processed, the text-level encoding of each relational phrase Pd, and the text-level encoding of the sequence of the text to be processed.
  • the text-level encoding of the text to be processed can participate in the process of the target RGAT generating the knowledge-level encoding of the text to be processed.
  • the target RGAT used in method 1000 can be trained by the method 800 shown in Figure 10; for the specific training method, please refer to the description of method 800, which will not be repeated here.
  • step 1030 may be performed by the knowledge encoding module 430 in FIG. 4 .
  • all candidate entities corresponding to each noun phrase Md in the text to be processed are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, as the nodes of the concept map of the text to be processed.
  • the nodes in the concept map of the text to be processed are connected according to the entity relationship between each pair of entities recorded in the knowledge map to obtain the concept map of the text to be processed.
  • the concept map of the text to be processed contains complete knowledge.
  • the specific construction method of the concept map of the text to be processed can refer to the construction method of the initial concept map of the training text described above; the training text in the relevant description only needs to be replaced with the text to be processed, and it will not be described again here.
  • step 1040 may be performed by the task processing module 440 in FIG. 4 .
  • Step 1040 may include vector splicing text-level coding and knowledge-level coding to obtain text fusion coding. Input the text fusion code into the classifier to obtain the classification result of the text to be processed.
  • Figure 13 shows a schematic diagram of a text classification result according to an embodiment of the present application.
  • the main text of the text is "...diabetes has become an epidemic, and the number of patients with type 2 diabetes is increasing at an alarming rate. We know that controlling diet and Western lifestyle can lead to type 2 diabetes and cardiovascular disease." Through the solutions of the embodiments of this application, it is determined that the text contains false information.
  • the primary basis for judgment is the knowledge path obtained from the concept map ('diet', 'reducesRiskFor', 'atherosclerosis', 'causes', 'cardiovascular diseases'), namely (controlling diet, reduces risk (-), arteriosclerosis, causes (+), cardiovascular disease), with a weight of 0.99998.
  • the secondary judgment basis is the knowledge path ('diet', 'alleviates', 'diabetes') obtained from the concept map, that is (controlling diet, relieving (-), diabetes), with a weight of 0.57651.
  • These two knowledge paths are contrary to the text semantics of "controlling diet can lead to type 2 diabetes and cardiovascular disease", and the text is judged to contain wrong information.
  • the solution of the embodiment of the present application can improve the accuracy of the classification task and at the same time generate a weighted knowledge path as an interpretable classification basis.
  • Table 1 shows the comparative results of performance indicators for text classification on the two data sets of diabetes and cancer, obtained with the scheme of the embodiment of the present application and the existing knowledge-guided graph attention network for detecting healthcare misinformation (DETERRENT). Table 1 shows four performance indicators, namely accuracy, precision, recall and F1 score.
  • the optimization scheme of the concept map used in the training process of the RGAT model used in the scheme of the embodiment of the present application in Table 1 is the scheme in Example 1.
  • the solution of the embodiment of the present application improves the above-mentioned indicators by 1-5 percentage points compared with the existing solution.
  • the solutions of the embodiments of the present application can effectively improve the accuracy of classification results.
  • Table 2 shows the comparison results of the performance indicators of text classification on the two data sets of diabetes and cancer through the scheme of the embodiment of the present application and the existing DETERRENT.
  • the optimization scheme of the concept map used in the training process of the RGAT model used in the scheme of the embodiment of the present application in Table 2 is the scheme in Example 2.
  • the solution of the embodiment of the present application improves the above-mentioned indicators by 1-58 percentage points compared with the existing solution.
  • the solutions of the embodiments of the present application can effectively improve the accuracy of classification results.
  • FIG. 14 is a schematic block diagram of a training device according to an embodiment of the present application.
  • the training device 3000 shown in FIG. 14 includes an acquisition unit 3010 and a processing unit 3020.
  • the training device can be used to perform the method 500 or the method 800 in the embodiment of the present application.
  • the acquisition unit 3010 can perform the above steps 510 and 520.
  • the processing unit 3020 may perform the above steps 530 to 540. It should be noted that the obtaining unit used to perform step 510 and the obtaining unit used to perform step 520 may be the same or different.
  • FIG. 15 is a schematic block diagram of a text processing device according to an embodiment of the present application.
  • the device 4000 shown in FIG. 15 includes an acquisition unit 4010 and a processing unit 4020.
  • the device 4000 may be used to perform the method 900 in the embodiment of the present application.
  • the acquisition unit 4010 can perform the above steps 910 and 920.
  • the processing unit 4020 may perform the above steps 930 to 960. It should be noted that the obtaining unit used to perform step 910 and the obtaining unit used to perform step 920 may be the same or different.
  • training device 3000 and device 4000 are embodied in the form of functional units.
  • the "unit" here can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of both that implements the above functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor or a group processor) and memory, merged logic circuitry, and/or other suitable components to support the described functionality.
  • the units of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG 16 is a schematic diagram of the hardware structure of a training device provided by an embodiment of the present application.
  • the training device 5000 shown in Figure 16 includes a memory 5001, a processor 5002, a communication interface 5003 and a bus 5004.
  • the memory 5001, the processor 5002, and the communication interface 5003 implement communication connections between each other through the bus 5004.
  • the memory 5001 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM).
  • the memory 5001 can store programs. When the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is used to execute various steps of the training method according to the embodiment of the present application. For example, the processor 5002 may execute the method 500 or the method 800 above.
  • the processor 5002 may be a general central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute relevant programs to implement the training method of the method embodiments of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the training method of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 5002.
  • the above-mentioned processor 5002 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the device shown in Figure 14, or executes the method 500 or method 800 of the method embodiments of the present application.
  • the communication interface 5003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 5000 and other devices or communication networks. For example, the training text and the knowledge graph can be obtained through the communication interface 5003.
  • Bus 5004 may include a path that carries information between various components of device 5000 (eg, memory 5001, processor 5002, communication interface 5003).
  • FIG. 17 is a schematic diagram of the hardware structure of a text processing device provided by an embodiment of the present application.
  • the device 6000 shown in Figure 17 includes a memory 6001, a processor 6002, a communication interface 6003 and a bus 6004.
  • the memory 6001, the processor 6002, and the communication interface 6003 implement communication connections between each other through the bus 6004.
  • Memory 6001 may be ROM, static storage device, dynamic storage device or RAM.
  • the memory 6001 can store programs. When the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 is used to execute various steps of the text processing method according to the embodiment of the present application. For example, the processor 6002 can execute the method 900 above.
  • the processor 6002 can use a general-purpose CPU, microprocessor, ASIC, GPU or one or more integrated circuits to execute relevant programs to implement the text processing method of the method embodiment of the present application.
  • the processor 6002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the text processing method of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 6002.
  • the above-mentioned processor 6002 can also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 6001.
  • the processor 6002 reads the information in the memory 6001, and combines its hardware to complete the functions required to be performed by the units included in the device shown in Figure 15, or to execute the method 900 of the method embodiment of the present application.
The communication interface 6003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 6000 and other devices or communication networks. For example, the text to be processed and the knowledge graph can be obtained through the communication interface 6003.
Bus 6004 may include a path that carries information between the various components of device 6000 (e.g., the memory 6001, the processor 6002, and the communication interface 6003).
Embodiments of the present application also provide a computer-readable medium that stores program code for execution by a device. The program code is used to execute any one of the text processing model training methods or text processing methods in the embodiments of the present application.
Embodiments of the present application also provide a computer program product containing instructions. When the computer program product is run on a computer, it causes the computer to execute any one of the text processing model training methods or text processing methods in the embodiments of the present application.
An embodiment of the present application also provides a chip. The chip includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface and executes any one of the text processing model training methods or text processing methods in the embodiments of the present application.
Optionally, the chip may also include a memory in which instructions are stored. The processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor executes any one of the text processing model training methods or text processing methods in the embodiments of the present application.
The memory may include a read-only memory and a random access memory, and provides instructions and data to the processor. A part of the processor may also include a non-volatile random access memory. For example, the processor may also store information about the device type.
It should be understood that the size of the sequence numbers of the above processes does not imply their order of execution. The execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should also be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative.
The division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. The coupling, direct coupling, or communication connection between components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit. If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
The aforementioned storage medium includes various media that can store program code, such as a Universal Serial Bus flash disk (UFD), a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

The present application provides a training method for a text processing model, and a text processing method and device. The training method for a text processing model comprises: determining an initial conceptual graph of training text on the basis of a knowledge graph; and inputting the initial conceptual graph into a relation-aware graph attention network (RGAT) model for training to obtain a target RGAT model, wherein during training, a first conceptual graph in an (i+1)-th iteration is determined according to the correlation between nodes in a second conceptual graph in an i-th iteration and the training text, as well as the weights of edges in the second conceptual graph, the first conceptual graph is a sub-graph of the initial conceptual graph, and the second conceptual graph is a sub-graph of the initial conceptual graph. According to the solution of the present application, the training effect of the RGAT model can be improved, and more accurate coding at a knowledge level can be learned, so that the accuracy of a downstream text processing task is improved.

Description

Text processing model training method, text processing method and device

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a training method for a text processing model, a text processing method, and a device.

Background

Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.

Natural language processing (NLP) is an important research direction in the field of artificial intelligence. Natural language processing tasks are usually performed based on the natural language text itself; the text itself contains relatively limited features, and the text processing effect may not meet expectations. Some solutions use knowledge graphs as auxiliary information for text processing, but these solutions may introduce knowledge graph information that is completely irrelevant to the text content during text processing, thus affecting the effect of text processing.

Summary of the Invention

This application provides a training method for a text processing model, a text processing method and a device, which can improve the effect of text processing.

In a first aspect, a training method for a text processing model is provided, including: obtaining training text; obtaining a knowledge graph; determining an initial concept graph of the training text based on the knowledge graph, where the nodes of the initial concept graph include topic nodes, the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes of the initial concept graph are used to represent entity relationships between those nodes; and inputting the initial concept graph into a relation-aware graph attention network (RGAT) model for training to obtain a target RGAT model, where during training the first concept graph in the (i+1)-th iteration is determined according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph, i is a positive integer, the first concept graph is a subgraph of the initial concept graph, and the second concept graph is a subgraph of the initial concept graph.

According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized according to the relevance of its nodes to the training text and the importance of its edges, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge of low relevance to the text and strengthen its attention to knowledge of high relevance, thereby improving the training effect of the RGAT model and learning a more accurate knowledge-level encoding, which in turn improves the accuracy of downstream text processing tasks.
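Purely as an illustration of this "train, then shrink the graph" loop, the sketch below replaces the RGAT training step with a stub that perturbs edge weights and keeps the edges whose weight, scaled by the relevance of their endpoints, is largest. All names, the selection rule, and the toy data are assumptions for exposition, not the implementation of this application.

```python
import random

def rgat_step(weights):
    # Stand-in for one RGAT training step: returns refreshed edge weights.
    return {e: max(0.0, w + random.uniform(-0.1, 0.1)) for e, w in weights.items()}

def select_subgraph(edges, weights, relevance, keep_ratio=0.8):
    # Keep the edges whose weight, scaled by the relevance of their two
    # endpoints, is largest: high weight and high relevance are retained.
    scored = sorted(edges,
                    key=lambda e: weights[e] * relevance[e[0]] * relevance[e[1]],
                    reverse=True)
    return scored[: max(1, int(len(scored) * keep_ratio))]

edges = [("anemia", "vitamin B12"), ("vitamin B12", "hemoglobin"), ("apple", "fruit")]
relevance = {"anemia": 0.9, "vitamin B12": 0.8, "hemoglobin": 0.7,
             "apple": 0.1, "fruit": 0.1}

graph = edges
weights = {e: 1.0 for e in edges}
for i in range(3):  # three iterations of "train, then shrink the graph"
    weights = rgat_step({e: weights[e] for e in graph})
    graph = select_subgraph(graph, weights, relevance)
print(graph)
```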
For example, the knowledge graph may be a knowledge graph of the professional field to which the training text belongs.

A target noun phrase refers to a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph. In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, that noun phrase can serve as a target noun phrase of the training text.

The concept graphs used during iteration are all subgraphs of the initial concept graph. The relevance of a node in a concept graph used during iteration to the training text is the relevance of the same node in the initial concept graph to the training text.

For example, the relevance of a node in the initial concept graph to the training text may be determined according to the importance of the node, and the importance of a node may be determined by its eigenvector centrality.

With reference to the first aspect, in some implementations of the first aspect, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.

According to the solution of the embodiments of this application, the topic nodes may include all candidate entities in the knowledge graph corresponding to each target noun phrase, so that the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing; learning the knowledge-level representation of the text with the solution of the embodiments of this application further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths caused by omitting part of the knowledge.

With reference to the first aspect, in some implementations of the first aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of that edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
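A minimal sketch of this greedy selection follows, assuming that the benefit is the edge weight itself and that the first cost is the inverse of the endpoints' summed relevance; the text only fixes the directions of the correlations, so these concrete choices are assumptions.

```python
def select_edges(edges, weight, relevance, threshold):
    def cost(e):
        # First cost of an edge: decreases as the endpoints' relevance grows.
        u, v = e
        return 1.0 / (relevance[u] + relevance[v] + 1e-9)

    # Rank edges by benefit/cost ratio, in descending order.
    ranked = sorted(edges, key=lambda e: weight[e] / cost(e), reverse=True)
    chosen, spent = [], 0.0
    for e in ranked:
        chosen.append(e)
        spent += cost(e)
        if spent > threshold:  # stop once the summed first cost exceeds the threshold
            break
    return chosen

edges = [("a", "b"), ("b", "c"), ("c", "d")]
weight = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.2}
relevance = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
print(select_edges(edges, weight, relevance, threshold=1.0))  # [('a', 'b'), ('b', 'c')]
```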
With reference to the first aspect, in some implementations of the first aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the connected subgraphs' second cost, until the selected connected subgraphs together include at least one candidate entity corresponding to the target noun phrases.

With reference to the first aspect, in some implementations of the first aspect, the relevance of a topic node to the training text is determined according to the eigenvector centrality of the topic node on a topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by that edge.

The eigenvector centrality of a topic node is determined according to the initial importance of the topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph.
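As a generic illustration of this computation, the sketch below runs power iteration on a small weighted topic correlation graph; the edge weights stand for the number of entity relationships between the endpoints' entities, and the initial scores stand for the nodes' prior importance. The data and the convergence settings are assumptions, not this application's exact procedure.

```python
def eigenvector_centrality(nodes, weighted_edges, init, iters=100, tol=1e-8):
    score = dict(init)
    for _ in range(iters):
        # Identity shift (A + I) keeps the same eigenvectors but stabilises
        # power iteration, e.g. on bipartite graphs.
        new = {n: score[n] for n in nodes}
        for (u, v), w in weighted_edges.items():  # undirected weighted edges
            new[u] += w * score[v]
            new[v] += w * score[u]
        norm = sum(s * s for s in new.values()) ** 0.5 or 1.0
        new = {n: s / norm for n, s in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

nodes = ["anemia (disease)", "anemia (symptom)", "vitamin B12"]
edges = {("anemia (disease)", "vitamin B12"): 3.0,
         ("anemia (symptom)", "vitamin B12"): 1.0}
init = {n: 1.0 for n in nodes}
print(eigenvector_centrality(nodes, edges, init))
```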
With reference to the first aspect, in some implementations of the first aspect, the initial concept graph further includes neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.

The initial concept graph in the embodiments of this application also includes the neighbor entities connected to the candidate entities (i.e., the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of the knowledge-level encoding.

With reference to the first aspect, in some implementations of the first aspect, the relevance of a neighbor node to the training text is determined according to the score, on an information propagation graph, of the strongly connected component in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.

With reference to the first aspect, in some implementations of the first aspect, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of a strongly connected component containing topic nodes is determined according to the maximum importance of the nodes in that component.
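The following is a hedged sketch of this propagation, assuming the strongly connected components and the condensation DAG have already been computed (for example with Tarjan's algorithm, see term (7) in the terminology section below), and assuming a downstream component inherits the maximum score among its upstream components; the inheritance rule and the toy data are assumptions.

```python
from collections import deque

def propagate_scores(components, dag_edges, node_importance, topic_components):
    # components: {comp_id: [member nodes]}; dag_edges: {comp_id: [successors]}
    score = {c: 0.0 for c in components}
    for c in topic_components:  # components that contain topic nodes
        score[c] = max(node_importance[n] for n in components[c])
    indeg = {c: 0 for c in components}
    for c, succs in dag_edges.items():
        for s in succs:
            indeg[s] += 1
    queue = deque(c for c in components if indeg[c] == 0)
    while queue:  # Kahn's algorithm yields a topological order
        c = queue.popleft()
        for s in dag_edges.get(c, []):
            score[s] = max(score[s], score[c])  # push the score downstream
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return score

components = {0: ["t1", "t2"], 1: ["n1"], 2: ["n2"]}
dag_edges = {0: [1], 1: [2]}
importance = {"t1": 0.6, "t2": 0.9, "n1": 0.0, "n2": 0.0}
print(propagate_scores(components, dag_edges, importance, topic_components=[0]))
# {0: 0.9, 1: 0.9, 2: 0.9}
```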
In a second aspect, a text processing method is provided, including: obtaining text to be processed; obtaining a knowledge graph; determining a text encoding of the text to be processed; determining a concept graph of the text to be processed based on the knowledge graph; processing the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, where the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, during training the first concept graph in the (i+1)-th iteration is determined according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph, i is a positive integer, the first concept graph is a subgraph of the initial concept graph, the second concept graph is a subgraph of the initial concept graph, the nodes of the initial concept graph include topic nodes, the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes of the initial concept graph are used to represent entity relationships between those nodes; and determining a processing result of the text to be processed based on the text encoding and the knowledge encoding of the text to be processed.

According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized according to the relevance of its nodes to the training text and the importance of its edges, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge of low relevance to the text and strengthen its attention to knowledge of high relevance, thereby improving the training effect of the RGAT model and learning a more accurate knowledge-level encoding, which in turn improves the accuracy of the processing results.

With reference to the second aspect, in some implementations of the second aspect, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.

The topic nodes in the concept graph of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.

According to the solution of the embodiments of this application, the topic nodes may include all candidate entities in the knowledge graph corresponding to each target noun phrase, so that the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing and avoids incorrect reasoning paths caused by omitting part of the knowledge; moreover, the target RGAT focuses during training on knowledge of high relevance to the text, which further ensures the accuracy of downstream tasks.

With reference to the second aspect, in some implementations of the second aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of that edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.

With reference to the second aspect, in some implementations of the second aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the connected subgraphs' second cost, until the selected connected subgraphs together include at least one candidate entity corresponding to the target noun phrases.

With reference to the second aspect, in some implementations of the second aspect, the relevance of a topic node to the training text is determined according to the eigenvector centrality of the topic node on a topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by that edge.

The eigenvector centrality of a topic node is determined according to the initial importance of the topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph.

With reference to the second aspect, in some implementations of the second aspect, the nodes of the initial concept graph further include neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.

The neighbor nodes in the concept graph of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.

With reference to the second aspect, in some implementations of the second aspect, the relevance of a neighbor node to the training text is determined according to the score, on an information propagation graph, of the strongly connected component in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.

With reference to the second aspect, in some implementations of the second aspect, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of a strongly connected component containing topic nodes is determined according to the maximum importance of the nodes in that component.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: outputting a knowledge path in the concept graph of the text to be processed based on the text encoding and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.

A knowledge path refers to a path between two nodes in the concept graph. For example, a k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, …, r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes, and so on; q is a positive integer.
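For instance, a 2-hop knowledge path in the (e_q, r_q, e_{q+1}, …) notation above can be decomposed into its triples as follows; the medical facts are illustrative only.

```python
# A 2-hop knowledge path as an alternating entity/relation tuple.
path = ("anemia", "treated_by", "vitamin B12", "increases", "hemoglobin")
# Each consecutive (entity, relation, entity) window is one triple.
triples = [(path[i], path[i + 1], path[i + 2]) for i in range(0, len(path) - 2, 2)]
print(triples)
# [('anemia', 'treated_by', 'vitamin B12'), ('vitamin B12', 'increases', 'hemoglobin')]
```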
The knowledge path can improve the interpretability of the model and provide the user with the basis for judging the processing result, that is, the complete logic by which the processing result was obtained, which helps improve the user's trust.

In a third aspect, a training device for a text processing model is provided. The device includes units for executing the method of any implementation of the first aspect.

In a fourth aspect, a text processing device is provided. The device includes units for executing the method of any implementation of the second aspect.

In a fifth aspect, a training device for a text processing model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method in any implementation of the first aspect.

The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.

In a sixth aspect, a text processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method of any implementation of the second aspect.

The processor in the sixth aspect may be a CPU, or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a GPU, an NPU, a TPU, and so on.

In a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of any one of the first to second aspects.

In an eighth aspect, a computer program product containing instructions is provided. When the computer program product is run on a computer, it causes the computer to execute the method in any implementation of any one of the first to second aspects.

In a ninth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory and executes the method in any implementation of any one of the first to second aspects.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored. The processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is used to execute the method in any implementation of any one of the first to second aspects.
Brief Description of the Drawings
Figure 1 is a schematic block diagram of a natural language processing system provided by an embodiment of the present application;

Figure 2 is a schematic block diagram of a system architecture provided by an embodiment of the present application;

Figure 3 is a schematic block diagram of a text processing system provided by an embodiment of the present application;

Figure 4 is a schematic block diagram of a text processing model provided by an embodiment of the present application;

Figure 5 is a schematic flow chart of a text processing model training method provided by an embodiment of the present application;

Figure 6 is a schematic diagram of knowledge extraction provided by an embodiment of the present application;

Figure 7 is a schematic diagram of the construction process of a concept graph provided by an embodiment of the present application;

Figure 8 is a schematic diagram of a topic correlation graph provided by an embodiment of the present application;

Figure 9 is a schematic diagram of an information propagation graph provided by an embodiment of the present application;

Figure 10 is a schematic diagram of another text processing model training method provided by an embodiment of the present application;

Figure 11 is a schematic flow chart of a text processing method provided by an embodiment of the present application;

Figure 12 is a schematic flow chart of another text processing method provided by an embodiment of the present application;

Figure 13 is a schematic diagram of a text classification result provided by an embodiment of the present application;

Figure 14 is a schematic block diagram of a training device provided by an embodiment of the present application;

Figure 15 is a schematic block diagram of a text processing device provided by an embodiment of the present application;

Figure 16 is a schematic block diagram of another training device provided by an embodiment of the present application;

Figure 17 is a schematic block diagram of another text processing device provided by an embodiment of the present application.
Detailed Description of Embodiments

The technical solutions of this application will be described below with reference to the accompanying drawings.

Natural language processing is an important research direction in the field of artificial intelligence, enabling humans and machines to interact through natural language. Natural language processing tasks are usually performed based on the natural language text itself; the text itself contains relatively limited features, and the text processing effect may not meet expectations. To improve the effect of text processing, some solutions introduce knowledge graphs as auxiliary information. However, introducing a knowledge graph may bring other problems: for example, the ambiguity of entities in the knowledge graph and the noise of the knowledge graph may lead to the introduction of knowledge graph information completely irrelevant to the text content during text processing, so that the accuracy of the text processing results cannot be guaranteed.

The embodiments of this application provide a text processing method that helps improve the effect of text processing.

To facilitate understanding of the embodiments of this application, the concepts of the relevant terms involved are first introduced below.

(1) Natural language processing

Natural language is human language, and natural language processing is the processing of human language. Natural language processing is the process of systematically analyzing, understanding and extracting information from text data in an intelligent and efficient way. NLP and its components can manage very large blocks of text data or perform a large number of automated tasks, and solve a variety of problems, such as automatic summarization, machine translation (MT), named entity recognition (NER), relation extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering and topic segmentation.

(2) Knowledge graph (KG)

A knowledge graph is a knowledge base that integrates real-world facts through a graph-structured data model. Knowledge graphs are often used to store entities that are related to each other. For example, a fact representing the existence of some entity relationship between two entities can be expressed as a triple data structure in the form (entity, entity relationship, entity).

Entities, represented as nodes in the knowledge graph, denote conceptual entities in the real world, for example, "Peking University (organization)", "vitamin B12 (medical element)" and "hemoglobin (medical element)". Entity relationships, represented as edges between the nodes corresponding to two entities, denote relationships between two real-world entities. For example, the entity relationship between "vitamin B12" and "hemoglobin" is "increases"; the fact indicated by (vitamin B12, increases, hemoglobin) is that vitamin B12 increases hemoglobin.
A knowledge graph of a professional field refers to a knowledge graph containing the entities, relationships and facts of that field. For example, a knowledge graph of the financial field is used to indicate entities, relationships and facts in the financial field, and a knowledge graph of the medical field is used to indicate entities, relationships and facts in the medical field.

A triple data structure extracted from natural language text that can express a fact may be called a knowledge triple of the text, in the form (noun phrase, relation phrase, noun phrase), where a noun phrase may include one or more words and a relation phrase may include one or more words.

A noun phrase may correspond to one or more candidate entities in the knowledge graph. For example, the Chinese noun phrase "苹果" (apple) may correspond to candidate entities such as "apple (fruit)" or "apple (company)". As another example, the English noun phrase "anemia" may correspond to candidate entities such as "anemia (disease)", "anemia (symptom)" or "anemia (plant)".
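A minimal sketch of facts stored as triples, together with a prefix-based lookup of a noun phrase's candidate entities, is shown below; the facts and the matching rule are illustrative assumptions, not how a knowledge graph is actually indexed.

```python
facts = [
    ("vitamin B12", "increases", "hemoglobin"),
    ("apple (fruit)", "is_a", "fruit"),
    ("apple (company)", "is_a", "organization"),
]

def candidates(noun_phrase, facts):
    # Every entity whose surface form starts with the noun phrase.
    entities = {e for head, _, tail in facts for e in (head, tail)}
    return sorted(e for e in entities if e.startswith(noun_phrase))

print(candidates("apple", facts))  # ['apple (company)', 'apple (fruit)']
```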
(3) Knowledge graph embedding

Knowledge graph embedding refers to mapping the entities and entity relationships of a knowledge graph into a low-dimensional vector space to obtain an embedded representation of the knowledge graph, realizing a semantic representation of the entities and entity relationships. The embedded representation of a knowledge graph can be used for various tasks related to knowledge graphs. For example, the embedded representation of a knowledge graph may include at least one of the following: embedded representations of entities, embedded representations of relationships, and so on.

The embedded representation of a knowledge graph can be obtained through a knowledge graph embedding model, which can be implemented based on a graph neural network (GNN).

(4) k-hop neighbours of a node

In graph theory, the k-hop neighbours of a node in a graph are the set of all nodes whose shortest path from that node is k hops, where k is a positive integer.
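The exact k-hop neighbours can be computed with a breadth-first search that records shortest-path distances; this is generic graph code, not specific to this application.

```python
from collections import deque

def k_hop_neighbours(adj, start, k):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # no need to expand beyond distance k
        for v in adj.get(u, []):
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return {n for n, d in dist.items() if d == k}

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(k_hop_neighbours(adj, "a", 2))  # {'d'}
```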
(5) Directed acyclic graph (DAG)

In graph theory, if a directed graph cannot return to any node by starting from that node and traversing several edges, the graph is a directed acyclic graph.

(6) Topological sorting

Topological sorting of a directed acyclic graph G arranges all the nodes of G into a linear sequence such that, for any pair of nodes u and v in the graph with an edge <u, v> (representing a path from node u to node v) in the edge set E(G) (the set of edges between the nodes of G), u appears before v in the linear sequence. Usually, such a linear sequence is called a sequence satisfying the topological order, or a topological sequence for short.

In other words, a topological sequence must satisfy two conditions:

1) Each node of the directed acyclic graph G appears in the topological sequence exactly once.

2) If there is a path from node u to node v in the directed acyclic graph G, then node u appears before node v in the topological sequence.

A directed acyclic graph may have one or more topological orderings.
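One standard way to obtain a topological sequence is Kahn's algorithm, sketched below for a small DAG; it also detects the case where no topological order exists because the graph contains a cycle.

```python
from collections import deque

def topological_sort(nodes, edges):
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle, no topological order exists")
    return order

print(topological_sort(["u", "v", "w"], [("u", "v"), ("v", "w"), ("u", "w")]))
# ['u', 'v', 'w']
```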
(7) Strongly connected component

If, for any two nodes v and u of a directed graph G, there exist directed paths from v to u and from u to v, then G is called a strongly connected graph. In a directed graph G, if directed paths exist in both directions between two nodes u and v, then u and v are said to be strongly connected.

In other words, if two nodes v and u of the directed graph G can reach each other, the two nodes are strongly connected. If every two nodes of the directed graph G are strongly connected, G is a strongly connected graph.

For a strongly connected subgraph S of a directed graph G that is not itself strongly connected, if adding any node to S would cause S to lose the property of strong connectivity, then S is a maximal strongly connected subgraph of G, also called a strongly connected component (or strongly connected branch) of G.

For example, Tarjan's algorithm can be used to find the strongly connected components of a directed graph. Specifically, the algorithm can be used to compute the size of each strongly connected component, the nodes of each strongly connected component, the total number of strongly connected components, and so on.
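Where the text names Tarjan's algorithm, the sketch below uses Kosaraju's two-pass algorithm instead, which is easier to follow and yields the same strongly connected components; this is generic graph code.

```python
def strongly_connected_components(nodes, adj):
    order, seen = [], set()

    def dfs(u):  # first pass: record nodes by increasing finish time
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen:
                dfs(v)
        order.append(u)

    for n in nodes:
        if n not in seen:
            dfs(n)

    radj = {n: [] for n in nodes}  # reversed graph for the second pass
    for u in nodes:
        for v in adj.get(u, []):
            radj[v].append(u)

    comps, assigned = [], set()
    for u in reversed(order):  # decreasing finish time
        if u in assigned:
            continue
        stack, members = [u], []
        assigned.add(u)
        while stack:
            x = stack.pop()
            members.append(x)
            for y in radj[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(members)
    return comps

adj = {"u": ["v"], "v": ["u", "w"], "w": []}
print(strongly_connected_components(["u", "v", "w"], adj))  # [['u', 'v'], ['w']]
```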
(8) Connected subgraph

In graph theory, a connected subgraph of an undirected graph is a subgraph in which any two nodes are connected to each other by a path and which is connected to no additional nodes in the supergraph.

(9) Eigenvector centrality

In graph theory, eigenvector centrality is a way of measuring a node's influence on a network.

The importance of a node usually depends on the number of its neighbor nodes (i.e., the node's degree) and the importance of those neighbors: the more important the neighbors connected to it, the more important the node.

For example, the importance of a node can be expressed as a score: the higher the score, the more important the node. Among nodes with the same number of connections, a node whose neighbors have high scores will score higher than a node whose neighbors have low scores; following this principle, all nodes can be assigned corresponding scores. A high eigenvector score means that the node is connected to many nodes that themselves have high scores.

(10) Neural network

A neural network can be composed of neural units. A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit can be as follows:
h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s x_s + b )
where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.

f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next layer. For example, the activation function may be a ReLU, tanh or sigmoid function.
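As a concrete instance of the formula above, the sketch below evaluates one neural unit with a sigmoid activation on toy inputs.

```python
import math

def neuron(x, w, b):
    # h = f(sum_s W_s * x_s + b), with f chosen here as the sigmoid.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

print(neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1))  # ≈ 0.525
```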
A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field; a local receptive field can be an area composed of several neural units.

(11) Graph neural network

Graph neural network is a general term for algorithms that use neural networks to learn graph-structured data, extract and discover features and patterns in graph-structured data, and meet the needs of graph learning tasks such as clustering, classification, prediction, segmentation and generation.

For graph-structured data, since each node is closely related to its neighbor nodes, a graph neural network can aggregate the neighborhood information of each node to obtain an embedded representation of each node.
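A generic illustration of one round of such neighborhood aggregation follows, using a simple mean over a node's own and its neighbours' embeddings; a real GNN learns its update, so this is only a schematic.

```python
def aggregate(embeddings, adj):
    out = {}
    for n, vec in embeddings.items():
        # Average the node's own embedding with its neighbours' embeddings.
        neigh = [embeddings[m] for m in adj.get(n, [])] + [vec]
        out[n] = [sum(dim) / len(neigh) for dim in zip(*neigh)]
    return out

emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(aggregate(emb, adj))  # e.g. 'b' becomes [0.666..., 0.666...]
```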
The attention mechanism allows a neural network to focus only on the information required for task learning. Introducing the attention mechanism into a GNN yields a graph attention network (GAT); a GAT focuses on the nodes and edges that are more relevant to the task, which can improve the processing effect.

(12) Relation-aware graph attention network (RGAT)

An RGAT is a graph neural network that can model multiple kinds of relationships in a graph structure through the graph attention mechanism, so as to obtain vector representations, in a low-dimensional space, of the nodes and of the relationships between nodes. For example, an RGAT can be used to process an input knowledge graph to obtain a low-dimensional space vector for each entity and entity relationship in the knowledge graph.
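A deliberately simplified stand-in for relation-aware attention at a single node is sketched below: the attention score of each neighbour depends on both the neighbour's embedding and the relation's embedding. The scoring function and the data are assumptions; this schematic does not reproduce the published RGAT formulation.

```python
import math

def attend(h_node, neighbours):
    # neighbours: list of (relation_embedding, neighbour_embedding) pairs.
    # Score each pair by the dot product of the centre node's embedding with
    # the element-wise sum of the relation and neighbour embeddings.
    scores = [sum(h * (r + n) for h, r, n in zip(h_node, rel, nb))
              for rel, nb in neighbours]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # New embedding: attention-weighted sum of the neighbour embeddings.
    return [sum(w * nb[d] for w, (_, nb) in zip(weights, neighbours))
            for d in range(len(h_node))]

h = [0.2, 0.4]
neighbours = [([0.1, 0.0], [1.0, 0.0]),   # (relation, neighbour) pair 1
              ([0.0, 0.3], [0.0, 1.0])]   # (relation, neighbour) pair 2
print(attend(h, neighbours))
```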
As mentioned above, the solutions of the embodiments of this application can be applied to natural language processing tasks. Figure 1(a) shows an application scenario of a natural language processing system. In this scenario, the natural language processing system includes a user device and a data processing device. The user device includes the user and intelligent terminals such as a mobile phone, a personal computer or an information processing center. The user device is the initiator of natural language data processing; as the initiator of requests such as language question answering or queries, the user usually initiates the request through the user device.

The data processing device may be a cloud server, a network server, an application server, a management server or another device or server with a data processing function. The data processing device receives question sentences such as query statements/speech/text from the intelligent terminal through an interactive interface, and then performs language data processing by means of machine learning, deep learning, search, reasoning, decision-making and so on, using a memory that stores data and a processor that processes data. The memory can be a general term that includes local storage and a database storing historical data; the database can be located on the data processing device or on another network server.

Figure 1(b) shows another application scenario of the natural language processing system. In this scenario, the user device directly serves as the data processing device, directly receiving input from the user, which is processed directly by the hardware of the user device itself. The specific process is similar to that of Figure 1(a); reference can be made to the above description, which is not repeated here.

Figure 1(c) is a schematic diagram of the related devices of the natural language processing system provided by an embodiment of this application. The natural language processing system may include a local device 101, a local device 102, an execution device 110 and a data storage system 150, where the local device 101 and the local device 102 are connected to the execution device 110 through a communication network.

The execution device 110 is implemented by one or more servers and optionally cooperates with other computing devices, such as data storage devices, routers and load balancers. The execution device 110 can be arranged at one physical site or distributed across multiple physical sites. The execution device 110 can use the data in the data storage system 150, or call the program code in the data storage system 150, to implement the text processing model training method of the embodiments of this application.

It should be noted that the above execution device 110 can also be called a cloud device, in which case the execution device 110 can be deployed in the cloud. Alternatively, the execution device 110 can also be a terminal device, in which case the execution device 110 can be deployed on the user terminal side; the embodiments of this application do not limit this.

Users can operate their respective user devices (for example, the local device 101 and the local device 102) to interact with the execution device 110. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box or a game console.

Each user's local device can interact with the execution device 110 through a communication network of any communication mechanism/standard. The communication network can be a wide area network, a local area network, a point-to-point connection or any combination thereof.

The data storage system 150 can be integrated on the execution device 110, the local device 101 or the local device 102, or it can be set up on the cloud or another network server.

In one implementation, the local device 101 or the local device 102 can obtain the relevant parameters of the text processing model from the execution device 110, and the text processing model is used on the local device 101 or the local device 102 to obtain the execution result of the text processing task.

In another implementation, the text processing model can be deployed directly on the execution device 110; the execution device 110 obtains the text to be processed from the local device 101 and the local device 102, and obtains the execution result of the text processing task through the text processing model.

The user device in Figure 1(a) and Figure 1(b) can be the local device 101 or 102 in Figure 1(c), and the data processing device in Figure 1(a) and Figure 1(b) can be the execution device 110 in Figure 1(c).
Figure 2 shows a system architecture 200 provided by an embodiment of this application. The data collection device 260 is configured to collect training data and store it in the database 230. The training device 220 generates a target model/rule 201 based on the training data maintained in the database 230, for example, the text processing model in the embodiments of this application. The model in the embodiments of this application may be a neural network model, or may be another type of model. The training data may include training text and the target processing result of the training text, for example, a label of the training text.
It should be noted that, in practical applications, the training data maintained in the database 230 is not necessarily all collected by the data collection device 260; it may also be received from other devices. It should further be noted that the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The foregoing description shall not be construed as a limitation on the embodiments of this application.
Figure 2 is a functional module diagram of the data processing process. For example, the client device 240 in Figure 2 may be the user equipment in Figure 1. When the user equipment in Figure 1 has relatively strong data processing capabilities, the execution device 210 and the data storage system 250 in Figure 2 may be integrated into the user equipment in Figure 1. In some embodiments, the execution device 210 and the data storage system 250 in Figure 2 may alternatively be integrated on the data processing device in Figure 1. The database 230, the training device 220, and the data collection device 260 in Figure 2 may be correspondingly integrated on the data processing device in Figure 1, or may be deployed on the cloud or on other servers on the network.
For example, the data collection device 260 may be a terminal device, or may be an input/output interface of a server or of the cloud, that is, an interaction layer (interface) used to obtain user input and return processing results.
The target model/rule obtained by the training device 220 can be applied in different systems or devices, for example, to the execution device 210 shown in Figure 2. The execution device 210 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal, or may be a server, the cloud, or the like. In Figure 2, the execution device 210 is configured with an I/O interface 212 for data interaction with external devices, and a "user" can input data to the I/O interface 212 through the client device 240.
When the execution device 210 preprocesses the input data, or when the computing module 211 of the execution device 210 performs computation or other related processing, the execution device 210 can call the data, code, and the like in the data storage system 250, and can also store data, instructions, and the like in the data storage system 250.
Finally, the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
It is worth mentioning that the training device 220 can generate, for different goals, corresponding target models/rules 201 based on different data, so as to provide better results for users.
In the situation shown in Figure 2, the user can manually specify the data to be input to the execution device 210, for example, by operating in an interface provided by the I/O interface 212. Alternatively, the client device 240 can automatically input data to the I/O interface 212 and obtain the result; if the client device 240 needs the user's authorization to input data automatically, the user can set the corresponding permission on the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the result can be presented in a specific form such as a display, a sound, or an action. The client device 240 can also serve as a data collection terminal and store the collected data in the database 230.
It is worth noting that Figure 2 is merely a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 may instead be placed inside the execution device 210.
Figure 3 shows a schematic block diagram of a text processing system provided by an embodiment of this application. The server in Figure 3 may be deployed with the execution device 210 in Figure 2, the data processing device in Figure 1(a) and Figure 1(b), or the execution device 110 in Figure 1(c).
The text processing model in the embodiments of this application can be implemented through program code deployed on the hardware of the server. In other words, the text processing model of the embodiments of this application can be implemented by modifying an existing software platform. Specifically, the program code runs in the host storage of the server (the host memory or disk shown in Figure 3) and in the memory of acceleration hardware (such as a GPU, an FPGA, or a dedicated chip). For example, the dedicated chip may be a neural network operation processor capable of executing the operations of a neural network model.
Figure 4 shows a schematic structural diagram of a text processing model provided by an embodiment of this application. As shown in Figure 4, the text processing model 400 includes a knowledge extraction module 410, a text encoding module 420, a knowledge encoding module 430, and a task processing module 440.
The knowledge extraction module 410 is configured to extract knowledge from input data. The input data may be text: during training, the input data may be the training text, and during inference, the input data may be the text to be processed.
The text encoding module 420 is configured to encode the text to obtain a text-level encoding of the text.
The knowledge encoding module 430 is configured to process the knowledge extracted by the knowledge extraction module 410 through an RGAT to generate a knowledge-level encoding of the text.
In one possible implementation, the text-level encoding of the text can be used as an input to the knowledge encoding module 430 and participate in the process in which the knowledge encoding module 430 generates the knowledge-level encoding.
During training, the output of the knowledge encoding module 430 can be understood as a predicted knowledge-level encoding of the text.
During inference, the output of the knowledge encoding module 430 is the knowledge-level encoding of the text.
The task processing module 440 is configured to output the processing result of the text based on the text-level encoding and the knowledge-level encoding.
For example, the text processing model shown in Figure 4 may be a text classification model used for text classification. In this case, the task processing module 440 may be configured to output the text classification result, that is, to predict the category of the text based on the text-level encoding and the knowledge-level encoding.
For the specific process by which the text processing model 400 processes text, refer to the methods described later in this document.
Figure 5 shows a schematic flowchart of a text processing model training method provided by an embodiment of this application. The method 500 shown in Figure 5 can be executed by an apparatus or device capable of performing the text processing model training process. For example, the apparatus may be a cloud service device, or a terminal device, such as a computer, a server, a vehicle, or a mobile phone, whose computing power is sufficient to execute the training method of the text processing model; it may also be a system composed of a cloud service device and a terminal device. For example, the method 500 may be executed by the execution device 110 in Figure 1, a local device, or the training device 220 in Figure 2.
For example, the method 500 may be specifically executed by the training device 220 shown in Figure 2, and the training data in the method 500 may be the training data maintained in the database 230 shown in Figure 2.
For example, the text processing model in the method 500 may be the text processing model shown in Figure 4. The knowledge encoding module in the text processing model can be implemented by an RGAT model, and the training method of the text processing model can thus also be understood as a training method of the RGAT.
The method 500 includes steps 510 to 540, which are described below.
510: Obtain training text.
520: Obtain a knowledge graph.
530: Determine an initial concept graph of the training text based on the knowledge graph. The nodes in the initial concept graph include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes in the initial concept graph are used to represent the entity relationships between the nodes in the initial concept graph.
540: Input the initial concept graph into the RGAT model for training to obtain a target RGAT model. During training, the first concept graph in the (i+1)-th iteration is determined based on the relevance between the nodes in the second concept graph in the i-th iteration and the training text, and on the weights of the edges in the second concept graph, where i is a positive integer. Both the first concept graph and the second concept graph are subgraphs of the initial concept graph.
According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized based on the relevance between the nodes of the concept graph and the training text, as well as the importance of the edges of the concept graph, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance, thereby improving the training effect of the RGAT model and enabling it to learn more accurate knowledge-level encodings, which in turn improves the accuracy of downstream text processing tasks. Knowledge with low relevance to the text can also be understood as redundant knowledge or ambiguous knowledge, and knowledge with high relevance to the text can be understood as key knowledge.
For example, the knowledge graph can be represented as graph-structured data used to represent entities that exist in the real world and the relationships between them. Entities can be represented as nodes in the knowledge graph, and entity relationships can be represented as edges in the knowledge graph. In the embodiments of this application, a node in the knowledge graph may also be called an entity node or an entity in the knowledge graph, and an edge in the knowledge graph may also be called a relationship edge or an entity relationship in the knowledge graph.
The knowledge graph may be an existing knowledge graph, or may be a pre-constructed knowledge graph; this is not limited in the embodiments of this application.
The knowledge graph in step 520 may be a knowledge graph of the professional domain to which the training text belongs.
For example, if the text is data in the medical domain, the knowledge graph may be a knowledge graph of the medical domain.
As another example, if the text is data in the financial domain, the knowledge graph may be a knowledge graph of the financial domain.
The knowledge graph may be constructed from a corpus of the professional domain. For example, the corpus may include website articles, books, and the like. Knowledge graphs of different professional domains can be constructed based on corpora of those domains.
For example, a fact in the knowledge graph representing the existence of an entity relationship between two entities can be represented as a triple data structure.
It should be understood that this is merely an example; facts in the knowledge graph may also be represented in forms other than triples, which is not limited in the embodiments of this application.
For example, the construction of the initial concept graph may be performed by the knowledge encoding module 430 shown in Figure 4.
The initial concept graph can be represented as graph-structured data used to indicate the knowledge in the text, and can be understood as a subgraph of the knowledge graph. The nodes in a concept graph may also be called entities or entity nodes in the concept graph.
The topic nodes in the initial concept graph include the candidate entities in the knowledge graph corresponding to the target noun phrases in the training text. This can also be understood as: the topic nodes in the initial concept graph correspond to the candidate entities in the knowledge graph that correspond to the target noun phrases in the training text. There may be a one-to-one correspondence between topic nodes and candidate entities.
The edges in the initial concept graph are used to represent the entity relationships between the nodes in the initial concept graph. This can also be understood as: the edges in the initial concept graph represent the entity relationships between the entities in the knowledge graph corresponding to the nodes in the initial concept graph. The edges between the nodes in the initial concept graph are determined based on the entity relationships in the knowledge graph: connecting the nodes of the initial concept graph according to the entity relationships in the knowledge graph yields the edges between the nodes of the initial concept graph.
For example, an edge between node A and node B in the initial concept graph represents the entity relationship between node A and node B. Node A in the initial concept graph corresponds to entity A in the knowledge graph, and node B in the initial concept graph corresponds to entity B in the knowledge graph. The entity relationship between node A and node B is the entity relationship between entity A and entity B in the knowledge graph. Entity A in the knowledge graph may also be called node A in the knowledge graph, and entity B may also be called node B in the knowledge graph; the entity relationship between node A and node B is then the entity relationship between node A and node B in the knowledge graph.
A target noun phrase is a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph. In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, that noun phrase can serve as a target noun phrase of the training text.
The target noun phrases in the text can be obtained based on the knowledge graph.
For example, knowledge triples, that is, the knowledge triples of the text, are extracted from the text based on the knowledge graph. A knowledge triple of the text may take the form (noun phrase, relational phrase, noun phrase), where both the noun phrases and the relational phrase are phrases in the text data. A noun phrase in a knowledge triple corresponds to at least one candidate entity in the knowledge graph, and a relational phrase in a knowledge triple may correspond to at least one entity relationship in the knowledge graph. A noun phrase may include one word or multiple words, and so may a relational phrase. The noun phrases in a knowledge triple are the target noun phrases; in other words, among the noun phrases in the text data, any noun phrase for which a corresponding candidate entity exists in the knowledge graph can serve as a target noun phrase. The relational phrase in a knowledge triple can serve as a target relational phrase; in other words, among the relational phrases in the text data, any relational phrase for which a corresponding entity relationship exists in the knowledge graph can serve as a target relational phrase. Extracting the knowledge triples in the text data can also be understood as identifying the noun phrases and relational phrases in the text data that constitute knowledge triples.
Extracting knowledge triples from text based on a knowledge graph may also be called extracting knowledge from text based on a knowledge graph.
For example, the knowledge extraction process may be performed by the knowledge extraction module 410 in Figure 4.
Figure 6 shows a schematic diagram of knowledge extraction. For example, as shown in Figure 6, the knowledge triple in the text data is identified based on the medical knowledge graph. The text is "Vitamin B12 creates risk for anemia", and the knowledge triple extracted from this text is (Vitamin B12, creates risk for (+), anemia), where the noun phrases are "Vitamin B12" and "anemia" and the relational phrase is "creates risk for (+)".
It should be understood that Figure 6 illustrates only the extraction of a single knowledge triple from the text as an example; in practical applications, multiple knowledge triples may be identified from the text, which is not limited in the embodiments of this application.
It should be noted that the entity relationship in the knowledge graph corresponding to the relational phrase of a knowledge triple is not necessarily the entity relationship between the candidate entities in the knowledge graph corresponding to the noun phrases of that knowledge triple. Taking Figure 6 as an example, the knowledge triple extracted from the text is (Vitamin B12, creates risk for (+), anemia), where the noun phrases "Vitamin B12" and "anemia" both have corresponding candidate entities in the knowledge graph, and the relational phrase "creates risk for (+)" has a corresponding entity relationship in the knowledge graph. However, in the knowledge graph, the entity relationship between the candidate entities corresponding to "Vitamin B12" and "anemia" is not necessarily "creates risk for (+)".
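The following Python sketch illustrates one way such triple extraction could work, assuming the knowledge graph exposes simple phrase-to-entity and phrase-to-relation lookup tables. The names entity_index and relation_index, and the n-gram matching strategy, are illustrative assumptions rather than the embodiment's algorithm:

```python
from itertools import combinations

def extract_knowledge_triples(tokens, entity_index, relation_index, max_len=4):
    """Identify (noun phrase, relational phrase, noun phrase) triples in a token list."""
    noun_spans, relation_spans = [], []
    # Scan all n-grams up to max_len and match them against the knowledge graph.
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in entity_index:        # phrase has >= 1 candidate entity
                noun_spans.append((i, i + n, phrase))
            elif phrase in relation_index:    # phrase has >= 1 entity relationship
                relation_spans.append((i, i + n, phrase))
    triples = []
    # Pair target noun phrases that flank a relational phrase, in textual order.
    for (s1, e1, np1), (s2, e2, np2) in combinations(sorted(noun_spans), 2):
        for rs, re, rp in relation_spans:
            if e1 <= rs and re <= s2:         # noun - relation - noun layout
                triples.append((np1, rp, np2))
    return triples

tokens = "Vitamin B12 creates risk for anemia".split()
entity_index = {"Vitamin B12": ["Vitamin B12 (chemical)"],
                "anemia": ["anemia (disease)", "anemia (symptom)", "anemia (plant)"]}
relation_index = {"creates risk for": ["creates risk for (+)"]}
print(extract_knowledge_triples(tokens, entity_index, relation_index))
# [('Vitamin B12', 'creates risk for', 'anemia')]
```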
The topic nodes may include the candidate entities in the knowledge graph corresponding to all target noun phrases in the text data. In other words, the topic nodes may include the candidate entities corresponding to the noun phrases of all knowledge triples extracted from the text data.
Optionally, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.
A target noun phrase may correspond to one or more candidate entities in the knowledge graph.
For example, the target noun phrase "anemia" may correspond to multiple candidate entities such as "anemia (disease)", "anemia (symptom)", and "anemia (plant)".
In the above solution, the topic nodes can include all candidate entities in the knowledge graph corresponding to each target noun phrase, so the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing and avoids incorrect reasoning paths caused by omitting part of the knowledge.
Optionally, the nodes in the initial concept graph further include neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
In other words, the neighbor nodes in the initial concept graph correspond to the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases. There may be a one-to-one correspondence between neighbor nodes and neighbor entities.
The neighbor nodes may include the k-hop neighbors, in the knowledge graph, of the candidate entities corresponding to the target noun phrases, where k is a positive integer. For example, k may be a positive integer less than or equal to 3.
For example, the neighbor nodes may include the k-hop neighbors, in the knowledge graph, of all candidate entities corresponding to the target noun phrases.
Neighbor nodes play an important role in knowledge reasoning. The initial concept graph in the embodiments of this application further includes the neighbor entities connected, in the knowledge graph, to the candidate entities (that is, the topic nodes), namely the neighbor nodes, which can provide still more comprehensive and complete knowledge and help improve the accuracy of the knowledge-level encoding.
Further, the initial concept graph may also include the entity relationships in the knowledge graph corresponding to the target relational phrases.
Figure 7 shows a schematic diagram of an initial concept graph. As shown in Figure 7, the concept graph corresponding to the text data is constructed based on the medical knowledge graph and the knowledge triples extracted from the text data. The initial concept graph shown in Figure 7 is an initial concept graph corresponding to the text shown in Figure 6. The topic nodes "anemia (disease)", "anemia (symptom)", and "anemia (plant)" in Figure 7 are all the candidate entities in the knowledge graph corresponding to the target noun phrase "anemia" in the text shown in Figure 6, and the topic node "Vitamin B12 (chemical)" in Figure 7 is all the candidate entities in the knowledge graph corresponding to the target noun phrase "Vitamin B12" in that text. The neighbor nodes in Figure 7 are the one-hop neighbor entities of the above candidate entities in the knowledge graph. "Vitamin B12 (chemical)", "anemia (disease)", "anemia (symptom)", and "anemia (plant)" serve as the topic nodes of the concept graph, while "hemoglobin (biological substance)", "GI bleeding (biologic function)", and "plant (type)" serve as the neighbor nodes. The edges between the nodes in Figure 7 represent the entity relationships between the corresponding entities in the knowledge graph.
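As a sketch of the construction just described, the following Python function builds an initial concept graph from the candidate entities and their k-hop neighbors. Representing the knowledge graph as a networkx graph and the entity_index lookup table are assumptions made for illustration:

```python
import networkx as nx

def build_initial_concept_graph(kg, entity_index, target_noun_phrases, k=1):
    """Build the initial concept graph as a subgraph of the knowledge graph kg."""
    # Topic nodes: all candidate entities of every target noun phrase.
    topic_nodes = set()
    for phrase in target_noun_phrases:
        topic_nodes.update(entity_index.get(phrase, []))

    # Neighbor nodes: the k-hop neighbors of the candidate entities in kg.
    nodes, frontier = set(topic_nodes), set(topic_nodes)
    for _ in range(k):
        frontier = {m for n in frontier for m in nx.all_neighbors(kg, n)} - nodes
        nodes |= frontier

    # Edges of the concept graph are the entity relationships of kg between kept nodes.
    concept_graph = kg.subgraph(nodes).copy()
    for n in concept_graph.nodes:
        concept_graph.nodes[n]["is_topic"] = n in topic_nodes
    return concept_graph
```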
For example, the relevance between a node in the initial concept graph and the training text may be determined based on the importance of the node. For example, the relevance between a node in the initial concept graph and the training text may be the importance of the node; alternatively, the relevance may be positively correlated with the importance of the node, that is, the greater the importance of a node, the higher the relevance between the node and the training text.
For example, the importance of a node can be determined by a centrality measurement, for example, by the PageRank algorithm or by degree centrality. However, these methods have certain limitations and cannot represent the importance of a node very accurately.
For example, the importance of a node may be determined by the eigenvector centrality of the node. For example, the importance of a node may be its eigenvector centrality, or the importance of a node may be positively correlated with its eigenvector centrality.
Optionally, the relevance between each topic node in the initial concept graph and the training text may be the eigenvector centrality of that topic node.
Further, the relevance between each topic node in the initial concept graph and the training text may be the eigenvector centrality of that topic node on a topic correlation graph. The nodes of the topic correlation graph are all topic nodes. The eigenvector centrality of each topic node is determined based on the initial importance of each topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph. The edges of the topic correlation graph are connected according to the entity relationships between the candidate entities in the knowledge graph. The weight of an edge in the topic correlation graph is determined by the number of entity relationships, in the knowledge graph, between the candidate entities corresponding to the two topic nodes connected by that edge, that is, by the number of edges between the candidate entities.
For example, if node C and node D in the topic correlation graph correspond to candidate entity C and candidate entity D in the knowledge graph respectively, and there are n edges between candidate entity C and candidate entity D in the knowledge graph, then an edge with weight n is constructed between node C and node D in the topic correlation graph, where n is a positive integer.
Figure 8 shows a topic correlation graph provided by an embodiment of this application; the nodes in this topic correlation graph are the topic nodes in Figure 7.
In the embodiments of this application, eigenvector centrality is used to compute the relevance between a topic node and the training text: the importance of a node is determined by the importance of its neighbor nodes, and the initial importance of a node is set according to the probability of the node appearing in the facts recorded in the knowledge graph, which can reflect the importance of a node relatively accurately.
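A minimal power-iteration sketch of such weighted eigenvector centrality is shown below. The iteration count, the normalization, and the toy adjacency values are illustrative assumptions; per the description above, the edge weight between two topic nodes is the number of relationships between their candidate entities, and the starting vector is the initial importance:

```python
import numpy as np

def topic_relevance(adj_weights, initial_importance, iters=50, eps=1e-9):
    """Weighted eigenvector centrality of topic nodes by power iteration.

    adj_weights[i][j]: weight of the edge between topic nodes i and j, i.e. the
    number of knowledge-graph relationships between their candidate entities.
    initial_importance[i]: probability of node i appearing in knowledge-graph facts.
    """
    A = np.asarray(adj_weights, dtype=float)
    x = np.asarray(initial_importance, dtype=float)
    x = x / (x.sum() + eps)
    for _ in range(iters):
        x = A @ x                              # accumulate neighbors' scores
        x = x / (np.linalg.norm(x) + eps)      # normalize for numeric stability
    return x

# Toy topic correlation graph: nodes 0 and 1 share two KG relationships, 0 and 2 one.
A = [[0, 2, 1],
     [2, 0, 0],
     [1, 0, 0]]
print(topic_relevance(A, initial_importance=[0.5, 0.3, 0.2]))
```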
Further, the relevance between each neighbor node in the initial concept graph and the training text may be the score of the strongly connected component in which that neighbor node is located on the information propagation graph corresponding to the initial concept graph. The nodes of the information propagation graph corresponding to the initial concept graph are the nodes of the initial concept graph. When a first node in the initial concept graph is a 1-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
Figure 9 shows an information propagation graph provided by an embodiment of this application. The information propagation graph shown in Figure 9 corresponds to the initial concept graph shown in Figure 7, or in other words, to the training text shown in Figure 6.
Each concept graph can correspond to one information propagation graph. The information propagation graph is a directed graph: its edges are directed edges, that is, the edges have directionality. The nodes of the information propagation graph corresponding to a concept graph are the nodes of that concept graph. If and only if node u in the concept graph is a 1-hop neighbor of node v, a directed edge pointing from node v to node u is constructed between node v and node u in the information propagation graph. Node u may be the first node, and node v may be the second node.
For example, the score of each strongly connected component on the information propagation graph may be obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological order. The initial score of each strongly connected component on the information propagation graph may be determined based on the maximum importance of the nodes within that component. For example, the initial score of each strongly connected component may be the maximum importance among the nodes within it.
The topological order can be obtained by performing a depth-first search over the strongly connected components, and the topological ordering result can be understood as the depth-first search result of the strongly connected components. "Propagation" can be understood as passing the score of a preceding strongly connected component to the subsequent strongly connected components according to the topological order, or in other words, updating the score of a downstream strongly connected component with the score of its upstream strongly connected component.
For example, the topological order may be {C1, C2, C3}, where C1, C2, and C3 represent three strongly connected components. "Propagation" can then be understood as updating the score of C2 with the score of C1, and updating the score of C3 with the score of C2.
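The following sketch computes such scores with networkx by condensing the information propagation graph into its strongly connected components (which yields a DAG) and propagating scores in topological order. Taking the max() during propagation is an illustrative assumption where the text simply says the downstream score is updated with the upstream one:

```python
import networkx as nx

def neighbor_relevance(info_graph, node_importance):
    """Score nodes via strongly connected components of the propagation graph."""
    # Condensing the graph turns each strongly connected component into a single
    # node of a DAG, so a topological order over components is well defined.
    cond = nx.condensation(info_graph)
    score = {c: max(node_importance.get(n, 0.0) for n in cond.nodes[c]["members"])
             for c in cond.nodes}

    # Propagate scores from upstream components to their downstream successors.
    for c in nx.topological_sort(cond):
        for succ in cond.successors(c):
            score[succ] = max(score[succ], score[c])

    # Each node inherits the score of the component that contains it.
    return {n: score[c] for c in cond.nodes for n in cond.nodes[c]["members"]}
```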
In step 540, the RGAT is trained to learn the knowledge-encoded representation corresponding to the training text.
For example, the RGAT training process may proceed as follows.
1) Input the concept graph into the RGAT for processing, to obtain the predicted knowledge encoding of the training text output by the RGAT.
2) Determine the prediction result of the training text based on the text encoding and the predicted knowledge encoding of the training text.
3) Adjust the parameters of the RGAT based on the prediction result of the training text.
4) Use the adjusted RGAT as the RGAT in step 1), and repeat steps 1) to 3) for iterative training. In other words, the RGAT in the next iteration is the adjusted RGAT of the current iteration.
Steps 1) to 3) can be regarded as one iteration.
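One possible shape of this loop, for the text classification case, is sketched below in PyTorch. The module interfaces (rgat, text_encoder, classifier, prune_fn) are hypothetical names introduced for illustration, not the embodiment's API:

```python
import torch
import torch.nn.functional as F

def train_rgat(rgat, text_encoder, classifier, concept_graph, text, label,
               optimizer, num_iters, prune_fn):
    """Sketch of the iterative flow in steps 1) to 4) for a classification task."""
    text_encoding = text_encoder(text)                  # text-level encoding
    for i in range(num_iters):
        # Step 1): forward pass of the RGAT over the current concept graph.
        knowledge_encoding = rgat(concept_graph, text_encoding)

        # Step 2): fuse the two encodings by concatenation and classify.
        fused = torch.cat([text_encoding, knowledge_encoding], dim=-1)
        logits = classifier(fused)

        # Step 3): reduce the gap between the prediction and the training label.
        loss = F.cross_entropy(logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # The concept graph of iteration i+1 is a subgraph chosen from the
        # node-to-text relevance and the edge (attention) weights of iteration i.
        concept_graph = prune_fn(concept_graph, rgat)
    return rgat
```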
For example, step 1) may be performed by the knowledge encoding module 430 in Figure 4.
The knowledge encoding of the training text may also be called the knowledge-level encoding of the training text. The knowledge encoding can be represented as a knowledge embedding vector, a knowledge feature vector, or the like.
For example, the knowledge encoding of the training text may include the encodings of the nodes and the encodings of the edges in the concept graph.
For example, the knowledge encoding of the training text may include the embedding vectors of the nodes and the embedding vectors of the edges in the concept graph.
In step 2), the text encoding of the training text may also be called the text-level encoding of the training text. The text-level encoding refers to a low-dimensional space vector used to express the textual content, the arrangement sequence of the text, and the like in the text data.
The text encoding can be represented as a text embedding vector or a text feature vector.
For example, the text encoding of the training text may include the text encoding of the training text sequence and the text encodings of the target phrases in the training text. The target phrases may include the target noun phrases and the target relational phrases. The text encoding of the training text sequence may also be called the text encoding of the training text itself.
Specifically, the training text can be processed by a pre-trained language model to obtain the text encoding of the training text.
For example, the training text can be processed by a bidirectional encoder representation from transformers (BERT) model, a bidirectional gated recurrent unit (BiGRU), or a bidirectional long short-term memory (BiLSTM) model to obtain the text encoding of the training text.
For example, the process of obtaining the text encoding of the training text may be performed by the text encoding module 420 in Figure 4.
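For instance, a text-level encoding could be obtained with a pre-trained BERT via the Hugging Face transformers library, as in the sketch below. The checkpoint name and the mean pooling over token states are illustrative choices, not mandated by the embodiments:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_text(text):
    """Return a sequence-level text encoding by mean-pooling BERT token states."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1)                          # (1, hidden_dim)

print(encode_text("Vitamin B12 creates risk for anemia").shape)  # torch.Size([1, 768])
```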
For example, step 2) may be performed by the task processing module 440 in Figure 4.
The type of the prediction result of the training text is related to the type of the text processing task, that is, to the type of the downstream task.
For example, the method 500 is used for a text classification task. In this case, the prediction result of the training text may be the predicted category of the training text. For example, the text encoding and the predicted knowledge encoding of the training text are input into a classifier to obtain the predicted category of the training text. As another example, the text encoding and the predicted knowledge encoding of the training text are fused, and the fused result is input into a classifier to obtain the predicted category of the training text. The fusion may be performed by vector concatenation of the text encoding and the predicted knowledge encoding to obtain a fused text encoding. The classifier may be a softmax function.
In step 3), the parameters of the RGAT may be adjusted with the goal of reducing the gap between the target processing result and the prediction result of the training text.
Taking the text classification task as an example, in step 3) the parameters of the RGAT may be adjusted with the goal of reducing the gap between the label of the training text and the predicted category of the training text. The label of the training text is the target processing result of the training text, and is used to indicate the ground truth of the category of the training text, that is, the true category of the training text.
It should be understood that the above is merely an example; the method 500 can also be used for other natural language processing tasks, for example, multi-hop reasoning question answering tasks. This is not limited in the embodiments of this application.
In the first iteration, the concept graph in step 1) may be the initial concept graph.
The first concept graph in the (i+1)-th iteration can be understood as the concept graph input to the RGAT during the (i+1)-th iteration. In other words, during the (i+1)-th iteration, the RGAT performs forward propagation and backpropagation based on the first concept graph.
The second concept graph in the i-th iteration can be understood as the concept graph input to the RGAT during the i-th iteration. During the i-th iteration, the RGAT performs forward propagation and backpropagation based on the second concept graph.
The i-th iteration may be any iteration in the RGAT training process.
It should be noted that the "first" in "first concept graph" and the "second" in "second concept graph" are only used to distinguish the RGAT input data of two iterations and have no other limiting effect.
In the embodiments of this application, the concept graph can be continuously optimized during the iterative training of the RGAT, and the concept graph may differ from one iteration to the next. Taking the above training flow as an example, the concept graph in step 1) may be different in different iterations.
For example, in the process in which the RGAT learns the representation of the concept graph, that is, during the RGAT training process, the concept graph in each iteration may be determined based on the relevance between the nodes of the concept graph in the previous iteration and the training text, and on the weights of the edges of the concept graph in the previous iteration.
The concept graphs used during the iterations are all subgraphs of the initial concept graph. The relevance between a node of the concept graph in an iteration and the training text is the relevance between the same node in the initial concept graph and the training text.
The concept graph is optimized during the iterative training of the RGAT. The direction of optimization can be understood as pruning the nodes with low relevance to the training text and/or the edges with small weights, while retaining the nodes with high relevance to the training text and/or the edges with large weights.
The weights of the edges in the concept graph may also be called the weights of the facts in the concept graph, that is, the attention weights.
In one possible implementation, the first concept graph belongs to a set of first subgraphs of the initial concept graph, where the first cost of each first subgraph is less than or equal to a threshold and the benefit of the first concept graph is greater than or equal to the benefits of the other first subgraphs in the set. The first cost of a first subgraph is determined based on the first costs of the edges within it, the benefit of the first concept graph is determined based on the benefits of the edges within it, and the benefit of a first subgraph is determined based on the benefits of the edges within it. The benefit of an edge is positively correlated with the weight of that edge during the i-th iteration, and the first cost of an edge is negatively correlated with the relevance between the two nodes connected by the edge and the training text.
Among the subgraphs of the initial concept graph, any subgraph whose first cost is less than or equal to the threshold can be called a first subgraph. The set of first subgraphs is the set of subgraphs whose first cost is less than or equal to the threshold, and the first concept graph is an element of this set. In other words, the first concept graph is itself a first subgraph.
Put differently, among the set of subgraphs of the initial concept graph whose first cost is less than or equal to the threshold, the first subgraph with the greatest benefit is taken as the first concept graph. That is, selecting a subgraph of the initial concept graph that maximizes the benefit subject to its first cost being less than or equal to the threshold yields the first concept graph.
The higher the relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge; the lower the relevance, the greater the first cost. For example, the higher the average relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge, and the lower the average relevance, the greater the first cost.
For example, the first cost of a subgraph may be determined based on the first costs of all edges within the subgraph: it may be the sum of the first costs of all edges within the subgraph, or the average of the first costs of all edges within the subgraph.
The greater the weight of an edge during the i-th iteration, the greater the benefit of that edge; the smaller the weight, the smaller the benefit.
For example, the benefit of a subgraph may be determined based on the benefits of all edges within the subgraph: it may be the sum of the benefits of all edges within the subgraph, or the average of the benefits of all edges within the subgraph.
For example, the first concept graph in the (i+1)-th iteration may be obtained by optimizing the second concept graph in the i-th iteration, and the optimized concept graph may be a subgraph of the concept graph in the i-th iteration. In other words, the first concept graph may be a subgraph of the second concept graph.
Specifically, the second concept graph is optimized based on the relevance between the nodes of the second concept graph and the training text and on the weights of the edges of the second concept graph in the i-th iteration, to obtain the first concept graph in the (i+1)-th iteration.
Optionally, the ratio between the benefit of a first edge and the first cost of the first edge is less than or equal to the ratio between the benefit of a second edge and the first cost of the second edge, where the first edge belongs to the second concept graph but does not belong to the first concept graph, and the second edge belongs to the first concept graph. The benefit of an edge is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge is negatively correlated with the relevance between the two nodes connected by the edge and the training text. The first cost of the first concept graph is less than or equal to the threshold, and the sum of the first cost of the first concept graph and the first cost of the first edge is greater than the threshold.
The second edge belongs to the second concept graph and also belongs to the first concept graph.
The second edge is any edge of the first concept graph, and the first edge is any edge of the second concept graph that does not belong to the first concept graph. In other words, in the second concept graph, the benefit-to-first-cost ratio of any edge that does not belong to the first concept graph is less than or equal to the benefit-to-first-cost ratio of any edge of the first concept graph.
The greater the weight of an edge in the i-th iteration, the greater the benefit of that edge. The higher the relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge. For example, the higher the average relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge.
For example, "the first cost of the first concept graph is less than or equal to the threshold" may mean that the sum of the first costs of all edges within the first concept graph is less than or equal to the threshold.
Alternatively, it may mean that the average of the first costs of all edges within the first concept graph is less than or equal to the threshold.
For example, before the (i+1)-th iteration starts, the first concept graph in the (i+1)-th iteration may be determined through the following steps.
S11: Obtain the first costs and the benefits of the edges of the second concept graph in the i-th iteration.
S12: Select edges of the second concept graph as edges of the first concept graph in descending order of the ratio between the benefit of an edge and its first cost, until the sum of the first costs of the selected edges is greater than the threshold. The first cost of the first concept graph is less than or equal to the threshold.
In other words, the benefit-to-first-cost ratio of any edge of the first concept graph is greater than or equal to the benefit-to-first-cost ratio of any edge of the second concept graph that does not belong to the first concept graph.
For example, the threshold may be a tolerance value for the number of uncertain edges. The threshold W may be determined based on a tolerance ratio θ and the number N of uncertain edges in the initial concept graph. For example, the tolerance threshold may be W = θN, where 0 < θ < 1. An uncertain edge is an edge of the initial concept graph whose weight is less than 1. N is less than or equal to the number of edges in the initial concept graph, and N is a positive integer.
For example, if the number N of uncertain edges in the initial concept graph is 60 and the tolerance ratio θ is 0.5, the threshold W is 30. In this case, step S12 can be understood as selecting edges of the second concept graph as edges of the first concept graph in descending order of the benefit-to-first-cost ratio until the sum of the first costs of the selected edges reaches 30. The first cost of an edge may be determined based on the average relevance between the two nodes connected by the edge and the training text.
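Steps S11 and S12 amount to a greedy budgeted selection, sketched below. Using the reciprocal of the mean node relevance as an edge's first cost and the attention weight as its benefit are illustrative assumptions consistent with the stated correlations (cost falls as relevance rises, benefit rises with weight):

```python
def prune_concept_graph(edges, relevance, attention, theta, n_uncertain):
    """Greedy benefit/cost selection of concept-graph edges for the next iteration.

    edges: (u, v) pairs of the second concept graph; relevance[n]: node-to-text
    relevance; attention[(u, v)]: edge weight from the i-th iteration.
    """
    eps = 1e-9
    threshold = theta * n_uncertain                  # W = theta * N

    def first_cost(u, v):
        # Falls as the mean relevance of the edge's endpoints rises.
        return 1.0 / ((relevance[u] + relevance[v]) / 2 + eps)

    def benefit(u, v):
        return attention[(u, v)]                     # rises with the edge weight

    # Descending order of benefit-to-first-cost ratio (step S12).
    ranked = sorted(edges, key=lambda e: benefit(*e) / first_cost(*e), reverse=True)

    kept, spent = [], 0.0
    for u, v in ranked:
        c = first_cost(u, v)
        if spent + c > threshold:                    # budget would be exceeded
            break
        kept.append((u, v))
        spent += c
    return kept                                      # edges of the first concept graph
```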
It should be understood that the above is merely an example; the concept graph may also be optimized in other ways, retaining the edges with larger weights and the nodes with higher relevance to the training text, so as to reduce the model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance to the text.
在一种可能的实现方式中,第一概念图谱是根据初始概念图谱的连通子图的集合的第一子集确定的,第一子集的第二消耗小于或等于第二子集的第二消耗。第一子集的第二消耗是根据第一子集的连通子图内的边的第二消耗确定的。第二子集的第二消耗是根据第二子集的连通子图内的边的第二消耗确定的。边的第二消耗和该边所连接的两个节点与训练文本的相关度呈负相关关系,边的第二消耗和该边在第i次迭代过程中的权重呈负相关关系。In a possible implementation, the first concept map is determined based on a first subset of a set of connected subgraphs of the initial concept map, and the second consumption of the first subset is less than or equal to the second consumption of the second subset. consumption. The second cost of the first subset is determined based on the second cost of the edges within the connected subgraph of the first subset. The second cost of the second subset is determined based on the second cost of the edges within the connected subgraph of the second subset. There is a negative correlation between the second consumption of an edge and the correlation between the two nodes connected by the edge and the training text, and there is a negative correlation between the second consumption of the edge and the weight of the edge in the i-th iteration process.
示例性地,第一子集包括目标名词词组对应的至少一个候选实体,第二子集包括目标名词词组对应的至少一个候选实体。Exemplarily, the first subset includes at least one candidate entity corresponding to the target noun phrase, and the second subset includes at least one candidate entity corresponding to the target noun phrase.
连通子图的集合的一个子集包括目标名词词组对应的至少一个候选实体,可以理解为,目标名词词组对应的至少一个候选实体存在于该子集内的至少一个连通子图上。不同的目标名词词组对应的候选实体可能存在于该子集内的不同连通子图上,也可能存在于该子集内的同一个连通子图上。A subset of the set of connected subgraphs includes at least one candidate entity corresponding to the target noun phrase. It can be understood that at least one candidate entity corresponding to the target noun phrase exists on at least one connected subgraph in the subset. Candidate entities corresponding to different target noun phrases may exist on different connected subgraphs in the subset, or may exist on the same connected subgraph in the subset.
换言之,在初始概念图谱的连通子图的集合的子集中,将包含了所有目标名词词组对应的至少一个候选实体,且第二消耗最小的子集作为第一子集。或者说,选取初始概念图谱的连通子图的集合的一个子集,使得所有目标名词词组都有至少一个对应的候选实体存在于该子集中,且该子集的第二消耗是包含所有目标名词词组对应的至少一个候选实体的子集中第二消耗最小的,即得到第一子集,或者说,得到第一概念图谱。In other words, among the subsets of the set of connected subgraphs of the initial concept map, the subset that contains at least one candidate entity corresponding to all target noun phrases and has the smallest second consumption will be regarded as the first subset. In other words, select a subset of the set of connected subgraphs of the initial concept map so that all target noun phrases have at least one corresponding candidate entity in the subset, and the second consumption of the subset is to include all target nouns The second smallest consumption among the subsets of at least one candidate entity corresponding to the phrase is obtained, that is, the first subset is obtained, or in other words, the first concept map is obtained.
Exemplarily, the first subset includes at least one entity relationship corresponding to the target relation phrase, and the second subset includes at least one entity relationship corresponding to the target relation phrase.
That a subset of the set of connected subgraphs includes at least one entity relationship corresponding to a target relation phrase can be understood as follows: at least one entity relationship corresponding to the target relation phrase exists on at least one connected subgraph within the subset. Entity relationships corresponding to different target relation phrases may exist on different connected subgraphs within the subset, or on the same connected subgraph within the subset.
In other words, among the subsets of the set of connected subgraphs of the initial concept graph, the subset that contains at least one entity relationship corresponding to every target relation phrase and has the smallest second cost is taken as the first subset. Put differently, a subset of the set of connected subgraphs of the initial concept graph is selected such that every target relation phrase has at least one corresponding entity relationship in the subset, and the second cost of the subset is the smallest among the subsets containing at least one entity relationship corresponding to every target relation phrase; this subset is the first subset, which determines the first concept graph.
Exemplarily, the first subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase, and the second subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
In other words, among the subsets of the set of connected subgraphs of the initial concept graph, the subset that contains at least one candidate entity corresponding to every target noun phrase and at least one entity relationship corresponding to every target relation phrase, and that has the smallest second cost, is taken as the first subset. Put differently, a subset of the set of connected subgraphs of the initial concept graph is selected such that every target relation phrase has at least one corresponding entity relationship in the subset and every target noun phrase has at least one corresponding candidate entity in the subset, and the second cost of the subset is the smallest among such subsets; this subset is the first subset, which determines the first concept graph.
The higher the relevance to the training text of the two nodes connected by an edge, and the larger the weight of the edge in the i-th iteration, the smaller the second cost of the edge. Conversely, the lower the relevance to the training text of the two nodes connected by an edge, and the smaller the weight of the edge in the i-th iteration, the larger the second cost of the edge. For example, the higher the average relevance to the training text of the two nodes connected by an edge, the smaller the second cost of the edge; the lower that average relevance, the larger the second cost of the edge.
Exemplarily, the second cost of the first subset may be the sum of the second costs of all edges within all connected subgraphs in the first subset, and the second cost of the second subset may be the sum of the second costs of all edges within all connected subgraphs in the second subset.
As described above, the first concept graph may be a subgraph of the second concept graph.
Optionally, the second cost of a first connected subgraph is greater than or equal to the second cost of a second connected subgraph, where the first connected subgraph belongs to the second concept graph in the i-th iteration but does not belong to the first concept graph in the (i+1)-th iteration, and the second connected subgraph is the connected subgraph with the largest second cost within the first concept graph. The second cost of the first connected subgraph is determined according to the second costs of the edges within the first connected subgraph, and the second cost of the second connected subgraph is determined according to the second costs of the edges within the second connected subgraph. The second cost of an edge is negatively correlated with the weight of the edge in the i-th iteration, and negatively correlated with the relevance, to the training text, of the two nodes connected by the edge. The nodes of the first concept graph include at least one candidate entity corresponding to the target noun phrases; the nodes of the second connected subgraph include at least one candidate entity corresponding to a first noun phrase among the target noun phrases, and the nodes of the other connected subgraphs in the first concept graph do not include any candidate entity corresponding to the first noun phrase.
The second connected subgraph belongs to the first concept graph and, accordingly, to the second concept graph.
The first connected subgraph may be any connected subgraph of the second concept graph that does not belong to the first concept graph. In other words, in the second concept graph, the second cost of any connected subgraph that does not belong to the first concept graph is greater than or equal to the second cost of any connected subgraph of the first concept graph.
Exemplarily, the second cost of a connected subgraph may be the sum of the second costs of all edges in the connected subgraph.
Alternatively, the second cost of a connected subgraph is the average of the second costs of all edges in the connected subgraph.
Exemplarily, before the (i+1)-th iteration starts, the concept graph used in the (i+1)-th iteration may be determined through the following steps.
S21: Obtain the second costs of the connected subgraphs in the second concept graph of the i-th iteration.
S22: Select connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of their second costs, until all the selected connected subgraphs together include at least one candidate entity corresponding to every target noun phrase of the training text.
For example, an initial subset is set to the empty set, and connected subgraphs of the second concept graph are added to the subset in ascending order of their second costs, until the current subset includes at least one candidate entity corresponding to every target noun phrase of the training text. The first concept graph can then be determined according to the current subset. The second connected subgraph is the last connected subgraph added to the subset, and the first connected subgraph may be any connected subgraph of the second concept graph that was not added to the subset.
It should be understood that the above is merely an example. For instance, "at least one candidate entity corresponding to the target noun phrase" in step S22 may also be replaced with "at least one entity relationship corresponding to the target relation phrase".
The target RGAT is the trained RGAT. It can be used to obtain feature vectors of graph-structured data input to it. When the concept graph of a text to be processed is input to the target RGAT, the output data can serve as the embedded representation of the concept graph of the text to be processed, that is, the knowledge-level encoding of the text to be processed.
According to the solution of the embodiments of this application, while the RGAT learns representations through iterative message passing between entity nodes, that is, during training, the concept graph is optimized according to the relevance of the entity nodes to the text and the weights of the edges; for example, nodes with lower relevance to the training text and/or edges with smaller weights are pruned. This helps reduce the model's attention to knowledge weakly related to the text and strengthen its attention to knowledge strongly related to the text, thereby improving the expressive ability of the RGAT and, in turn, the accuracy of downstream tasks.
In addition, in the solution of the embodiments of this application, the nodes of the initial concept graph may include all candidate entities corresponding to the target noun phrases. This helps ensure that no text-related knowledge is omitted from the concept graph and guarantees the completeness of the knowledge in the concept graph. Learning the knowledge-level representation of the text with the solution of the embodiments of this application further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths caused by omitted knowledge.
In addition, in the solution of the embodiments of this application, the nodes of the initial concept graph may also include the k-hop neighbor entities of each candidate entity, which can further improve the completeness of the knowledge in the concept graph.
Figure 10 shows a training method 800 for a text processing model provided by an embodiment of this application; the text processing model may be the text processing model in Figure 4. Method 800 can be regarded as a specific implementation of method 500. For brevity, some descriptions are omitted as appropriate when describing method 800.
Method 800 includes steps 810 to 850.
810: Perform knowledge extraction on the training text.
Exemplarily, step 810 may be performed by the knowledge extraction module 410 in Figure 4.
Exemplarily, the knowledge graph is a knowledge graph of the business domain related to the text data. For ease of understanding and description, method 800 is described below taking the medical domain as the business domain, which does not limit the solutions of the embodiments of this application.
As shown in Figure 8, step 810 may include: identifying the knowledge triples T_d in the training text d based on the knowledge graph, to obtain the noun phrases M_d and the relation phrases P_d in the knowledge triples, that is, the target noun phrases and target relation phrases in the training text, as shown in Figure 6.
820: Generate the text-level encoding of the training text through a natural language pre-training model.
Exemplarily, step 820 may be performed by the text encoding module 420 in Figure 4.
The text-level encoding of the training text may include the text-level encoding of each noun phrase M_d in the training text d, the text-level encoding of each relation phrase P_d, and the text-level encoding of the training text sequence.
830: Process the extracted knowledge through the RGAT to generate the knowledge-level predictive encoding of the training text.
Exemplarily, step 830 may be performed by the knowledge encoding module 430 in Figure 4.
In a possible implementation, the text-level encoding of the training text may serve as an input to the RGAT and participate in the process of generating the knowledge-level predictive encoding.
840: Obtain a prediction result based on the text-level encoding and the knowledge-level predictive encoding.
Exemplarily, step 840 may be performed by the task processing module 440 in Figure 4.
For example, the downstream task is a text classification task. Step 840 may include concatenating the text-level encoding and the knowledge-level predictive encoding into a single vector to obtain a predictive text fusion encoding, and inputting the predictive text fusion encoding into a classifier to obtain the predicted classification result of the training text.
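As an illustration only, the vector concatenation and classification described above might be sketched as follows in Python with PyTorch; the tensor dimensions, variable names, and the use of a single linear layer as the classifier are assumptions for illustration rather than elements of the embodiment.

```python
import torch
import torch.nn as nn

# Assumed dimensions for the two encodings of one training text.
text_enc = torch.randn(1, 768)       # text-level encoding
knowledge_enc = torch.randn(1, 256)  # knowledge-level predictive encoding

# Vector concatenation yields the predictive text fusion encoding.
fusion_enc = torch.cat([text_enc, knowledge_enc], dim=-1)  # shape (1, 1024)

# An assumed linear classifier over the fusion encoding (e.g. two classes).
classifier = nn.Linear(fusion_enc.shape[-1], 2)
logits = classifier(fusion_enc)
predicted_class = logits.argmax(dim=-1)  # predicted classification result
```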
850: Iteratively train the RGAT based on the prediction result.
Before the iterative training, that is, before step 830, the initial concept graph of the training text can be constructed, and the relevance between the nodes of the initial concept graph and the training text can be computed.
Exemplarily, all candidate entities corresponding to each noun phrase M_d in the training text are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, to serve as the nodes of the initial concept graph. The nodes of the initial concept graph are connected according to the entity relationship between each pair of entities recorded in the knowledge graph, to obtain the initial concept graph. This initial concept graph contains complete knowledge.
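A minimal sketch of this construction is given below, assuming the knowledge graph is held as a networkx graph; the function name, the form of the candidate-entity list, and the parameter k are illustrative assumptions.

```python
import networkx as nx

def build_initial_concept_graph(kg, candidate_entities, k):
    """kg: networkx graph whose nodes are the entities of the knowledge graph
    and whose edges record entity relationships; candidate_entities: all
    candidate entities located for the noun phrases M_d of the training text."""
    nodes = set(candidate_entities)
    for entity in candidate_entities:
        # Locate the k-hop neighbor entities of every candidate entity.
        reachable = nx.single_source_shortest_path_length(kg, entity, cutoff=k)
        nodes.update(reachable)
    # Connect the located nodes according to the entity relationships
    # recorded in the knowledge graph (induced subgraph).
    return kg.subgraph(nodes).copy()
```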
Specifically, the initial concept graph can be constructed, and the relevance between the nodes of the initial concept graph and the training text computed, through the following steps.
(1) Locate the topic nodes
For each noun phrase M_d obtained in step 810, all candidate entities corresponding to the noun phrase in the knowledge graph are located as topic nodes.
The topic nodes serve as the nodes of a topic correlation graph, and the corresponding topic nodes of the topic correlation graph are connected based on the entity relationships between the candidate entities recorded in the knowledge graph, to obtain the edges of the topic correlation graph. Specifically, for a pair of nodes in the topic correlation graph, if there are n edges between the corresponding candidate entities in the knowledge graph, the weight of the edge between this pair of topic nodes in the topic correlation graph is n.
(2) Compute the relevance between the topic nodes and the training text
The initial relevance of each topic node is set based on the probability of the topic node appearing in the facts recorded in the knowledge graph.
The eigenvector centrality of each topic node is computed on the topic correlation graph to obtain the importance of each topic node, and the importance of each topic node is taken as the relevance between the topic node and the training text.
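For example, steps (1) and (2) might be sketched as follows with networkx; relation_count, an assumed helper returning the number of entity relationships between two candidate entities recorded in the knowledge graph, and the convergence parameters are illustrative assumptions.

```python
import networkx as nx

def topic_relevance(topic_nodes, relation_count):
    # Build the topic correlation graph: the weight of an edge is the number
    # n of edges between the corresponding candidate entities in the
    # knowledge graph.
    g = nx.Graph()
    g.add_nodes_from(topic_nodes)
    for idx, u in enumerate(topic_nodes):
        for v in topic_nodes[idx + 1:]:
            n = relation_count(u, v)
            if n > 0:
                g.add_edge(u, v, weight=n)
    # Eigenvector centrality as the importance of each topic node, taken as
    # its relevance to the training text.
    return nx.eigenvector_centrality(g, weight="weight", max_iter=1000)
```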
(3) Locate the neighbor nodes
For each topic node, the k-hop neighbor entities of the topic node in the knowledge graph are located as neighbor nodes.
An information propagation graph is constructed according to the progressive relationship between hop layers.
(4) Locate the strongly connected components
All strongly connected components on the information propagation graph are located, and the initial score of each strongly connected component is computed according to the maximum importance of the nodes in the strongly connected component, with O(|V|+|E|) time complexity, where V denotes the set of nodes of the information propagation graph and E denotes the set of edges of the information propagation graph.
(5) Compute the relevance between the neighbor nodes and the training text
According to the topological ordering, the initial scores of the strongly connected components containing the topic nodes are propagated along the topological ordering to the downstream strongly connected components, thereby updating the score of each strongly connected component.
The updated score of the strongly connected component containing each neighbor node is taken as the relevance between the neighbor node and the training text.
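A sketch of steps (4) and (5) is given below, using the condensation of the information propagation graph provided by networkx; the rule for combining scores arriving from several upstream components (the maximum is taken here) is an assumption for illustration.

```python
import networkx as nx

def neighbor_relevance(info_graph, topic_importance):
    """info_graph: directed information propagation graph; topic_importance:
    dict mapping topic nodes to their importance from step (2)."""
    # Condense the graph: each node of 'dag' is a strongly connected
    # component, with its member nodes stored in the 'members' attribute.
    dag = nx.condensation(info_graph)
    score = {}
    for c in dag.nodes:
        members = dag.nodes[c]["members"]
        # Initial score: maximum importance of the topic nodes in the
        # component (0 if the component contains no topic node).
        score[c] = max((topic_importance.get(v, 0.0) for v in members),
                       default=0.0)
    # Propagate scores downstream along the topological ordering.
    for c in nx.topological_sort(dag):
        for succ in dag.successors(c):
            score[succ] = max(score[succ], score[c])  # assumed combination
    # Relevance of each node = score of its strongly connected component.
    return {v: score[c] for c in dag.nodes for v in dag.nodes[c]["members"]}
```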
(6) Construct the concept graph
The topic nodes and the neighbor nodes serve as the nodes of the initial concept graph, and the corresponding nodes are connected according to the entity relationships recorded in the knowledge graph, to obtain the initial concept graph, as shown in Figure 7.
During the iterative training, that is, in step 850, the concept graph is optimized.
Exemplarily, in the process of learning node representations through iterative message passing between nodes based on the RGAT, before each round of iteration, the RGAT's attention to knowledge weakly related to the text can be reduced, and its attention to knowledge strongly related to the text strengthened, according to the relevance between the nodes and the training text and the edge weights in the forward propagation of the RGAT.
The way the concept graph is optimized during the iterative process is illustrated below through Example 1 and Example 2.
Example 1
Define the optimization problem to be solved: select a subgraph of the initial concept graph such that the sum of the first costs of all edges in the subgraph is less than or equal to a threshold, and the sum of the benefits of all edges in the subgraph is maximized. The solution to this optimization problem is the optimized concept graph.
(1) Before each round of iteration, obtain the first cost and the benefit of each edge in the current concept graph.
Exemplarily, the lower the average relevance, to the training text, of the two nodes connected by an edge, the larger the first cost of the edge.
Exemplarily, the higher the weight of an edge in the previous round of iteration, the larger the benefit of the edge.
(2) Obtain the optimized concept graph using an approximation algorithm for the above optimization problem.
Specifically, the edge with the largest ratio between benefit and first cost in the current concept graph is preferentially selected, and selection stops once the sum of the first costs of the selected edges reaches the threshold. In this way, a solution of the optimization problem, that is, an optimized concept graph, can be obtained in O(|E|*log|E|) time. The optimized concept graph serves as the concept graph in the next round of iteration.
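A minimal sketch of this greedy selection is given below; the representation of the input as (edge, benefit, first_cost) triples and the exact treatment of the edge that would cross the threshold are illustrative assumptions.

```python
def select_edges(edges, threshold):
    """Greedy approximation for Example 1: pick edges in descending order of
    benefit / first_cost until the accumulated first cost reaches the
    threshold W. The sort dominates, giving O(|E| log |E|) time."""
    ranked = sorted(edges, key=lambda e: e[1] / e[2], reverse=True)
    selected, total_cost = [], 0.0
    for edge, benefit, first_cost in ranked:
        if total_cost + first_cost > threshold:
            break  # assumed: stop before the budget W is exceeded
        selected.append(edge)
        total_cost += first_cost
    return selected  # edges of the optimized concept graph
```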
Example 2
Define the optimization problem to be solved: select a subset of the set of connected subgraphs of the initial concept graph such that, for every noun phrase M_d or relation phrase P_d in the training text, at least one corresponding node or edge is included in the subset, and the second cost of the connected subgraphs in the subset is the smallest among all ways of selecting a subset. The solution to this optimization problem is the optimized concept graph.
(1) Before each round of iteration, obtain the second cost of each edge in the current concept graph.
Compute the second cost of each connected subgraph in the concept graph. Exemplarily, the second cost of a connected subgraph is the average of the second costs of all edges in the connected subgraph.
Exemplarily, the lower the average relevance, to the training text, of the two nodes connected by an edge, and the lower the weight of the edge in the previous round of iteration, the larger the second cost of the edge.
(2) Obtain the optimized concept graph using an approximation algorithm for the above optimization problem.
Specifically, the subset starts in the state of an empty set, and the connected subgraph with the smallest second cost in the current concept graph is preferentially added to the subset, until for every noun phrase M_d or relation phrase P_d in the training text at least one corresponding node or edge is included in the subset. In this way, an approximate solution of the optimization problem, that is, an optimized concept graph, can be obtained in O(|V|+|E|) linear time. The optimized concept graph serves as the concept graph in the next round of iteration.
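The following sketch illustrates this greedy covering step; the representation of the connected subgraphs as (subgraph, second_cost) pairs and the predicate covers, which tests whether a subgraph contains a node or edge corresponding to a phrase, are illustrative assumptions.

```python
def select_components(components, phrases, covers):
    """Greedy approximation for Example 2: starting from the empty subset,
    add connected subgraphs in ascending order of their second cost until
    every noun phrase M_d or relation phrase P_d in the training text has at
    least one corresponding node or edge in the subset."""
    subset = []
    uncovered = set(phrases)
    for subgraph, cost in sorted(components, key=lambda c: c[1]):
        if not uncovered:
            break
        subset.append(subgraph)
        # Remove the phrases now covered by the newly added subgraph.
        uncovered = {p for p in uncovered if not covers(subgraph, p)}
    return subset  # the optimized concept graph is the union of the subgraphs
```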
Figure 11 shows a schematic flowchart of a text processing method 900 provided by an embodiment of this application. The method can be executed by an apparatus or device capable of text processing. For example, the apparatus may be a cloud service device, or a terminal device such as a computer or server whose computing power is sufficient to execute the text processing method, or a system composed of a cloud service device and a terminal device. Exemplarily, method 900 may be executed by the execution device 210 in Figure 2, the execution device 110 in Figure 1, or a local device.
For example, method 900 may specifically be executed by the execution device 210 shown in Figure 2, and the text to be processed in method 900 may be input data given by the client device 240 shown in Figure 2.
The model used in the text processing method 900 in Figure 11 may be constructed by the method in Figure 5 or Figure 10 described above. For related descriptions, refer to the aforementioned method 500 or method 800; to avoid unnecessary repetition, repeated descriptions are omitted as appropriate when introducing method 900 below.
Method 900 includes steps 910 to 960, which are described below.
910: Obtain the text to be processed.
920: Obtain a knowledge graph.
930: Determine the text encoding of the text to be processed.
940: Determine the concept graph of the text to be processed based on the knowledge graph.
950: Input the concept graph into the target RGAT for processing, to obtain the knowledge encoding of the text to be processed.
960: Determine the processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
The target RGAT is obtained by inputting the initial concept graph of the training text into the RGAT for training. During the training process, the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text, and the weights of the edges in the second concept graph, where i is a positive integer. The first concept graph is a subgraph of the initial concept graph, and the second concept graph is a subgraph of the initial concept graph. The nodes of the initial concept graph include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text, and the edges between the nodes of the initial concept graph are used to represent the entity relationships between the nodes of the initial concept graph.
It should be understood that the step numbers in the embodiments of this application are used only for convenience of description and do not limit the order in which the steps are executed.
In step 940, the topic nodes in the concept graph of the text to be processed may include candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed, and the edges between the nodes in the concept graph of the text to be processed are used to represent the entity relationships between the nodes in the concept graph.
According to the solution of the embodiments of this application, while the RGAT learns representations through iterative message passing between entity nodes, that is, during training, the concept graph is optimized according to the relevance of the entity nodes to the text and the weights of the edges; for example, nodes with lower relevance to the training text and/or edges with smaller weights are pruned. This helps reduce the model's attention to knowledge weakly related to the text and strengthen its attention to knowledge strongly related to the text, thereby improving the expressive ability of the RGAT and, in turn, the accuracy of downstream tasks.
Optionally, the initial concept graph further includes neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the training text.
Correspondingly, the neighbor nodes in the concept graph of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
Optionally, the topic nodes in the initial concept graph include all candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
The topic nodes in the concept graph of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.
Optionally, that the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance, to the training text, of the two nodes connected by the edge.
Optionally, that the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of their second costs, until all the selected connected subgraphs include at least one candidate entity corresponding to the target noun phrase.
Optionally, the relevance between a topic node and the training text is determined according to the eigenvector centrality of the topic node on the topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships between the entities in the knowledge graph corresponding to the two nodes connected by the edge.
Optionally, the relevance between a neighbor node and the training text is determined according to the score of the strongly connected component, on the information propagation graph, in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, a directed edge pointing from the second node to the first node exists between the second node and the first node in the information propagation graph.
Optionally, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of the strongly connected component in which a topic node is located is determined according to the maximum importance of the nodes in that strongly connected component.
Optionally, method 900 further includes: outputting, based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, a knowledge path in the concept graph of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
A knowledge path refers to a path between two nodes in the concept graph. For example, the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, …, r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes, and so on; q is a positive integer.
Knowledge paths can improve the interpretability of the model and provide users with a basis for judging the processing result, which helps improve user trust. Moreover, the concept graph in the solution of the embodiments of this application contains comprehensive and accurate knowledge, which helps guarantee the completeness and accuracy of the knowledge paths.
Optionally, the weight of the knowledge path is determined according to the attention weights of the triples in the knowledge path.
The attention weight of a triple is used to indicate the importance of the triple in the reasoning process of the RGAT, and the attention weight of the knowledge path is used to indicate the importance of the knowledge path.
Exemplarily, the weight of the knowledge path may be the average of the attention weights of the triples in the knowledge path.
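As an illustration, given the attention weights of the triples, the weight of a knowledge path might be computed as follows; the dictionary attention mapping each triple (j, r, i) to its attention weight is an assumption.

```python
def knowledge_path_weight(path_triples, attention):
    """path_triples: the triples (e_q, r_q, e_{q+1}), ... along the path;
    attention: assumed dict mapping each triple to its attention weight.
    The path weight is the average of the triple attention weights."""
    weights = [attention[t] for t in path_triples]
    return sum(weights) / len(weights)
```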
Exemplarily, during the inference of the RGAT, the representation h_i^{(l+1)} of node i at layer l+1 can satisfy the following formula:

h_i^{(l+1)} = σ( Σ_{(j,r,i)∈ψ, r∈R'} α_{ij}^{r} W_r h_j^{(l)} )

where ψ denotes the set of all triples in the knowledge graph, and W_r denotes the transformation corresponding to relation r. σ denotes an activation function; for example, the activation function may be a binary step function, a linear activation function, a Sigmoid function, a rectified linear unit (ReLU), or a leaky ReLU (LeakyReLU). LeakyReLU may be used as the activation function in the embodiments of this application. h_j^{(l)} denotes the representation of node i's neighbor node j at layer l. α_{ij}^{r} denotes the importance of node j to node i under relation r; the attention weight of the triple (j, r, i) is α_{ij}^{r}. R' denotes the set of relations, and l is an integer greater than or equal to 0.
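A minimal numpy sketch of one such propagation layer is given below; the relation-specific square matrices W_r, the LeakyReLU slope, and the dictionary-based representation of node features and attention weights are assumptions for illustration rather than the exact implementation of the embodiment.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # LeakyReLU, used here as the activation function σ.
    return np.where(x > 0, x, slope * x)

def rgat_layer(h, triples, alpha, W):
    """One propagation layer of the formula above.
    h: dict node -> representation at layer l (vectors of equal dimension);
    triples: the triples (j, r, i) in ψ restricted to the concept graph;
    alpha: dict mapping a triple (j, r, i) to its attention weight α_ij^r;
    W: dict relation r -> assumed square transformation matrix W_r."""
    dim = next(iter(h.values())).shape[0]
    agg = {i: np.zeros(dim) for i in h}
    for (j, r, i) in triples:
        # Accumulate α_ij^r * W_r * h_j^(l) over the incoming triples of i.
        agg[i] += alpha[(j, r, i)] * (W[r] @ h[j])
    # Representation of every node at layer l + 1.
    return {i: leaky_relu(v) for i, v in agg.items()}
```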
Figure 12 shows a schematic flowchart of a text classification method according to an embodiment of this application. The method shown in Figure 12 can be regarded as a specific implementation of the method shown in Figure 11. For brevity, some descriptions are omitted as appropriate when describing method 1000.
Method 1000 includes steps 1010 to 1040.
1010: Perform knowledge extraction on the text to be processed.
Exemplarily, step 1010 may be performed by the knowledge extraction module 410 in Figure 4.
Exemplarily, the knowledge graph is a knowledge graph of the business domain related to the text data. For ease of understanding and description, method 1000 is described below taking the medical domain as the business domain, which does not limit the solutions of the embodiments of this application.
As shown in Figure 10, step 1010 may include: identifying the knowledge triples T_d in the text d to be processed based on the knowledge graph, to obtain the noun phrases M_d and the relation phrases P_d in the knowledge triples, that is, the target noun phrases and target relation phrases in the text to be processed.
1020: Generate the text-level encoding of the text to be processed.
Exemplarily, step 1020 may be performed by the text encoding module 420 in Figure 4.
The text-level encoding of the text to be processed may include the text-level encoding of each noun phrase M_d in the text d to be processed, the text-level encoding of each relation phrase P_d, and the text-level encoding of the text sequence to be processed.
1030: Generate the knowledge-level encoding of the text to be processed through the target RGAT.
In a possible implementation, the text-level encoding of the text to be processed may participate in the process in which the target RGAT generates the knowledge-level encoding of the text to be processed.
The target RGAT used in Figure 12 may be obtained through training by the method 800 shown in Figure 10; for the specific training method, refer to the description of method 800, which is not repeated here.
Exemplarily, step 1030 may be performed by the knowledge encoding module 430 in Figure 4.
Exemplarily, all candidate entities corresponding to each noun phrase M_d in the text to be processed are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, to serve as the nodes of the concept graph of the text to be processed. The nodes of the concept graph of the text to be processed are connected according to the entity relationship between each pair of entities recorded in the knowledge graph, to obtain the concept graph of the text to be processed. The concept graph of the text to be processed contains complete knowledge.
For the specific construction of the concept graph of the text to be processed, refer to the construction of the initial concept graph of the training text described above, replacing the training text in the relevant description with the text to be processed; details are not repeated here.
1040: Obtain the predicted classification result based on the text-level encoding and the knowledge-level encoding.
Exemplarily, step 1040 may be performed by the task processing module 440 in Figure 4.
Step 1040 may include concatenating the text-level encoding and the knowledge-level encoding into a single vector to obtain a text fusion encoding, and inputting the text fusion encoding into a classifier to obtain the classification result of the text to be processed.
Figure 13 shows a schematic diagram of a text classification result according to an embodiment of this application. As shown in Figure 13, the gist of the text is: "…diabetes has become an epidemic, and the number of patients with type 2 diabetes is increasing at an alarming rate. We know that controlling diet and a Western lifestyle lead to type 2 diabetes and cardiovascular disease…". The solution of the embodiments of this application determines that this text contains false information. The primary basis for the judgment is the knowledge path obtained from the concept graph ('diet', 'reducesRiskFor', 'atherosclerosis', 'causes', 'cardiovascular diseases'), that is, (diet, reduces risk for (-), atherosclerosis, causes (+), cardiovascular diseases), with a weight of 0.99998. The secondary basis for the judgment is the knowledge path obtained from the concept graph ('diet', 'alleviates', 'diabetes'), that is, (diet, alleviates (-), diabetes), with a weight of 0.57651. These two knowledge paths contradict the statement in the text that "controlling diet leads to type 2 diabetes and cardiovascular disease", so the text is judged to contain false information. The solution of the embodiments of this application can improve the accuracy of the classification task while generating weighted knowledge paths as an interpretable basis for the classification.
Table 1 shows a comparison of performance metrics for text classification on the diabetes and cancer datasets between the solution of the embodiments of this application and the existing knowledge guided graph attention network for detecting healthcare misinformation (DETERRENT). Four performance metrics are shown in Table 1: accuracy, precision, recall, and F1 score. In Table 1, the concept graph optimization scheme used during the training of the RGAT model in the solution of the embodiments of this application is the scheme of Example 1.
Table 1 (the table content is provided as image PCTCN2022103682-appb-000007 in the original publication)
As can be seen from Table 1, the solution of the embodiments of this application improves on the existing solution by 1 to 5 percentage points in each of the above metrics. The solution of the embodiments of this application can effectively improve the accuracy of classification results.
Table 2 shows a comparison of performance metrics for text classification on the diabetes and cancer datasets between the solution of the embodiments of this application and the existing DETERRENT. In Table 2, the concept graph optimization scheme used during the training of the RGAT model in the solution of the embodiments of this application is the scheme of Example 2.
Table 2 (the table content is provided as image PCTCN2022103682-appb-000008 in the original publication)
As can be seen from Table 2, the solution of the embodiments of this application improves on the existing solution by 1 to 58 percentage points in each of the above metrics. The solution of the embodiments of this application can effectively improve the accuracy of classification results.
The apparatus of the embodiments of this application is described in detail below with reference to the accompanying drawings. It should be understood that the apparatus described below can execute the foregoing methods of the embodiments of this application. To avoid unnecessary repetition, repeated descriptions are omitted as appropriate when introducing the apparatus of the embodiments of this application.
Figure 14 is a schematic block diagram of a training apparatus according to an embodiment of this application. The training apparatus 3000 shown in Figure 14 includes an acquisition unit 3010 and a processing unit 3020.
The training apparatus can be used to execute method 500 or method 800 of the embodiments of this application.
In a possible implementation, the acquisition unit 3010 may perform steps 510 and 520 described above, and the processing unit 3020 may perform steps 530 to 540 described above. It should be noted that the acquisition unit used to perform step 510 and the acquisition unit used to perform step 520 may be the same or different.
Figure 15 is a schematic block diagram of a text processing apparatus according to an embodiment of this application. The apparatus 4000 shown in Figure 15 includes an acquisition unit 4010 and a processing unit 4020.
The apparatus 4000 can be used to execute method 900 of the embodiments of this application.
In a possible implementation, the acquisition unit 4010 may perform steps 910 and 920 described above, and the processing unit 4020 may perform steps 930 to 960 described above. It should be noted that the acquisition unit used to perform step 910 and the acquisition unit used to perform step 920 may be the same or different.
It should be noted that the above training apparatus 3000 and apparatus 4000 are embodied in the form of functional units. The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and a memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components supporting the described functions.
Therefore, the units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
Figure 16 is a schematic diagram of the hardware structure of a training apparatus provided by an embodiment of this application. The training apparatus 5000 shown in Figure 16 (which may specifically be a computer device) includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004, where the memory 5001, the processor 5002, and the communication interface 5003 are communicatively connected to one another through the bus 5004.
The memory 5001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 5001 may store a program; when the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is configured to perform the steps of the training method of the embodiments of this application, for example, to execute method 500 or method 800 above.
The processor 5002 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the training method of the method embodiments of this application.
The processor 5002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the training method of this application can be completed by integrated logic circuits of hardware in the processor 5002 or by instructions in the form of software.
The above processor 5002 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the apparatus shown in Figure 14, or executes method 500 or method 800 of the method embodiments of this application.
The communication interface 5003 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the apparatus 5000 and other devices or communication networks. For example, the training text and the knowledge graph can be obtained through the communication interface 5003.
The bus 5004 may include a path for transferring information between the components of the apparatus 5000 (for example, the memory 5001, the processor 5002, and the communication interface 5003).
Figure 17 is a schematic diagram of the hardware structure of a text processing apparatus provided by an embodiment of this application. The apparatus 6000 shown in Figure 17 (which may specifically be a computer device) includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004, where the memory 6001, the processor 6002, and the communication interface 6003 are communicatively connected to one another through the bus 6004.
The memory 6001 may be a ROM, a static storage device, a dynamic storage device, or a RAM. The memory 6001 may store a program; when the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 is configured to perform the steps of the text processing method of the embodiments of this application, for example, to execute method 900 above.
The processor 6002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs to implement the text processing method of the method embodiments of this application.
The processor 6002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the text processing method of this application can be completed by integrated logic circuits of hardware in the processor 6002 or by instructions in the form of software.
The above processor 6002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 6001; the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions required to be performed by the units included in the apparatus shown in Figure 15, or executes method 900 of the method embodiments of this application.
The communication interface 6003 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the apparatus 6000 and other devices or communication networks. For example, the text to be processed and the knowledge graph can be obtained through the communication interface 6003.
The bus 6004 may include a path for transferring information between the components of the apparatus 6000 (for example, the memory 6001, the processor 6002, and the communication interface 6003).
An embodiment of this application further provides a computer-readable medium storing program code for execution by a device, where the program code includes instructions for performing any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
An embodiment of this application further provides a computer program product containing instructions; when the computer program product runs on a computer, the computer is caused to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
An embodiment of this application further provides a chip, where the chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
Optionally, as an implementation, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
It should also be understood that, in the embodiments of this application, the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor. A part of the processor may further include a non-volatile random access memory. For example, the processor may further store information about the device type.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation shall not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus flash disk (USB flash disk, UFD, also referred to as a USB drive or flash drive), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (29)

  1. A training method for a text processing model, comprising:
    obtaining training text;
    obtaining a knowledge graph;
    determining an initial concept graph of the training text based on the knowledge graph, wherein nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    inputting the initial concept graph into a relation-aware attention network (RGAT) model for training to obtain a target RGAT model, wherein during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, and the first concept graph and the second concept graph are both subgraphs of the initial concept graph.
  2. The training method according to claim 1, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  3. The training method according to claim 1 or 2, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
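By way of illustration only, the following Python sketch implements the greedy budgeted edge selection described in claim 3. The concrete benefit and cost functions, the data layout, and the example values are assumptions introduced here for illustration; the claim fixes only the direction of the correlations and the stopping condition.

```python
# Illustrative sketch of the edge selection in claim 3. Benefit is taken to be
# the edge weight itself, and cost the inverse of the summed relevance of the
# edge's endpoints; these are assumptions, since the claim only fixes the sign
# of the correlations.

def select_edges(edges, weights, relevance, budget):
    """edges: list of (u, v); weights[(u, v)]: edge weight in iteration i;
    relevance[n]: relevance of node n to the training text; budget: threshold."""
    def benefit(e):
        return weights[e]                           # positively correlated with weight
    def cost(e):
        u, v = e
        return 1.0 / (relevance[u] + relevance[v])  # negatively correlated with relevance
    ranked = sorted(edges, key=lambda e: benefit(e) / cost(e), reverse=True)
    selected, spent = [], 0.0
    for e in ranked:
        selected.append(e)
        spent += cost(e)
        if spent > budget:                          # stop once total first cost exceeds threshold
            break
    return selected

# Example: two topic nodes and one weakly relevant neighbor
edges = [("a", "b"), ("b", "c")]
weights = {("a", "b"): 0.9, ("b", "c"): 0.2}
relevance = {"a": 0.8, "b": 0.7, "c": 0.1}
print(select_edges(edges, weights, relevance, budget=1.0))
```

The sketch retains the edge that first pushes the accumulated cost past the threshold, matching the reading of "until the sum of the first costs of the selected edges is greater than a threshold."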
  4. The training method according to claim 1 or 2, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
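The subgraph selection of claim 4 admits a similarly compact sketch. The following Python fragment is illustrative only; the second-cost values, the subgraph representation, and the coverage test over target noun phrases are assumptions.

```python
# Illustrative sketch of the subgraph selection in claim 4, assuming each
# connected subgraph carries a precomputed "second cost" and a set of noun
# phrases for which it contains a candidate entity.

def select_subgraphs(subgraphs, costs, candidates_of, targets):
    """subgraphs: list of subgraph ids; costs[g]: second cost of subgraph g;
    candidates_of[g]: noun phrases with a candidate entity in g;
    targets: target noun phrases that must be covered."""
    selected, covered = [], set()
    for g in sorted(subgraphs, key=lambda g: costs[g]):  # ascending second cost
        selected.append(g)
        covered |= candidates_of[g]
        if covered >= targets:  # every target phrase has at least one candidate
            break
    return selected

print(select_subgraphs(
    subgraphs=["g1", "g2", "g3"],
    costs={"g1": 0.2, "g2": 0.5, "g3": 0.9},
    candidates_of={"g1": {"graph"}, "g2": {"text"}, "g3": {"graph", "text"}},
    targets={"graph", "text"},
))
```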
  5. The training method according to any one of claims 1 to 4, wherein the relevance of a topic node to the training text is determined based on the eigenvector centrality of the topic node on a topic-relevance graph, the nodes of the topic-relevance graph include the topic nodes, and the weight of an edge of the topic-relevance graph is determined based on the number of entity relationships in the knowledge graph between the entities corresponding to the two nodes connected by the edge.
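As an illustrative aid to claim 5, the sketch below computes eigenvector centrality by power iteration on a small topic-relevance graph. The iteration count, the dictionary-based graph representation, and the toy weights are assumptions, not the claimed implementation.

```python
# Power-iteration sketch of eigenvector centrality on the topic-relevance graph
# of claim 5. Edge weights are assumed to already count the entity relationships
# between the corresponding knowledge-graph entities.

def eigenvector_centrality(nodes, weight, iters=100):
    """weight[(u, v)]: symmetric edge weight; absent pairs default to 0."""
    x = {n: 1.0 for n in nodes}
    for _ in range(iters):
        y = {u: sum(weight.get((u, v), 0.0) * x[v] for v in nodes) for u in nodes}
        norm = sum(v * v for v in y.values()) ** 0.5 or 1.0
        x = {n: y[n] / norm for n in y}
    return x  # centrality per topic node, used as node-to-text relevance

w = {("a", "b"): 2.0, ("b", "a"): 2.0, ("b", "c"): 1.0, ("c", "b"): 1.0}
print(eigenvector_centrality(["a", "b", "c"], w))
```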
  6. The training method according to any one of claims 1 to 5, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  7. The training method according to claim 6, wherein the relevance of a neighbor node to the training text is determined based on the score of the strongly connected component of an information propagation graph in which the neighbor node is located, the nodes of the information propagation graph include the nodes of the initial concept graph, and, where a first node in the initial concept graph is a one-hop neighbor of a second node, the information propagation graph contains a directed edge pointing from the second node to the first node.
  8. The training method according to claim 7, wherein the scores of the strongly connected components of the information propagation graph are obtained by propagating, in topological order, the initial scores of the strongly connected components in which the topic nodes are located to downstream strongly connected components, and the initial score of a strongly connected component in which a topic node is located is determined based on the maximum importance of the nodes in that strongly connected component.
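For claims 7 and 8, the following sketch propagates scores over the condensation of the information propagation graph. It assumes the strongly connected components and their topological order are already available (for example, from Tarjan's algorithm), and the max-based propagation rule is an assumption; the claims fix only the initialization and the topological direction of the propagation.

```python
# Illustrative sketch of the score propagation in claims 7 and 8 over the
# condensation DAG of the information propagation graph. A component inherits
# the maximum of its own initial score and its upstream scores; this rule is
# an assumption for illustration.

def propagate_scores(topo_order, dag_edges, initial):
    """topo_order: SCC ids in topological order; dag_edges[c]: downstream SCCs;
    initial[c]: max node importance for SCCs containing topic nodes, else 0."""
    score = dict(initial)
    for c in topo_order:                                 # upstream components first
        for d in dag_edges.get(c, []):
            score[d] = max(score.get(d, 0.0), score[c])  # push score downstream
    return score

print(propagate_scores(
    topo_order=["C1", "C2", "C3"],
    dag_edges={"C1": ["C2"], "C2": ["C3"]},
    initial={"C1": 0.9, "C2": 0.0, "C3": 0.0},
))
```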
  9. A text processing method, comprising:
    obtaining text to be processed;
    obtaining a knowledge graph;
    determining a text encoding of the text to be processed;
    determining a concept graph of the text to be processed based on the knowledge graph;
    processing the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, wherein the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, and during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, the first concept graph and the second concept graph are both subgraphs of the initial concept graph, the nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    determining a processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
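As a toy illustration of the final step of claim 9, the sketch below fuses a text encoding and a knowledge encoding into a single processing result. The concatenation-plus-linear-head fusion rule and all values are assumptions; the claim does not fix how the two encodings are combined.

```python
# Illustrative fusion of the two encodings in claim 9: concatenate the text
# encoding and the knowledge encoding, then apply a linear head (dot product
# with a weight vector). The fusion rule and toy vectors are assumptions.

def process(text_enc, knowledge_enc, head_weights):
    fused = text_enc + knowledge_enc  # concatenate the two encodings
    return sum(w * x for w, x in zip(head_weights, fused))

print(process([0.2, 0.5], [0.1, 0.7], [1.0, -0.5, 0.3, 0.8]))
```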
  10. The method according to claim 9, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  11. The method according to claim 9 or 10, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  12. The method according to claim 9 or 10, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  13. The method according to any one of claims 9 to 12, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  14. A training apparatus for a text processing model, comprising:
    an obtaining unit, configured to:
    obtain training text; and
    obtain a knowledge graph; and
    a processing unit, configured to:
    determine an initial concept graph of the training text based on the knowledge graph, wherein nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    input the initial concept graph into a relation-aware attention network (RGAT) model for training to obtain a target RGAT model, wherein during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, and the first concept graph and the second concept graph are both subgraphs of the initial concept graph.
  15. The training apparatus according to claim 14, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  16. The training apparatus according to claim 14 or 15, wherein the processing unit is specifically configured to:
    select edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  17. The training apparatus according to claim 14 or 15, wherein the processing unit is specifically configured to:
    select connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  18. The training apparatus according to any one of claims 14 to 17, wherein the relevance of a topic node to the training text is determined based on the eigenvector centrality of the topic node on a topic-relevance graph, the nodes of the topic-relevance graph include the topic nodes, and the weight of an edge of the topic-relevance graph is determined based on the number of entity relationships in the knowledge graph between the entities corresponding to the two nodes connected by the edge.
  19. The training apparatus according to any one of claims 14 to 18, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  20. The training apparatus according to claim 19, wherein the relevance of a neighbor node to the training text is determined based on the score of the strongly connected component of an information propagation graph in which the neighbor node is located, the nodes of the information propagation graph include the nodes of the initial concept graph, and, where a first node in the initial concept graph is a one-hop neighbor of a second node, the information propagation graph contains a directed edge pointing from the second node to the first node.
  21. The training apparatus according to claim 20, wherein the scores of the strongly connected components of the information propagation graph are obtained by propagating, in topological order, the initial scores of the strongly connected components in which the topic nodes are located to downstream strongly connected components, and the initial score of a strongly connected component in which a topic node is located is determined based on the maximum importance of the nodes in that strongly connected component.
  22. A text processing apparatus, comprising:
    an obtaining unit, configured to:
    obtain text to be processed; and
    obtain a knowledge graph; and
    a processing unit, configured to:
    determine a text encoding of the text to be processed;
    determine a concept graph of the text to be processed based on the knowledge graph;
    process the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, wherein the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, and during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, the first concept graph and the second concept graph are both subgraphs of the initial concept graph, the nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    determine a processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
  23. The apparatus according to claim 22, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  24. The apparatus according to claim 22 or 23, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  25. The apparatus according to claim 22 or 23, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  26. The apparatus according to any one of claims 22 to 25, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  27. A text processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 8 or claims 9 to 13.
  28. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method according to any one of claims 1 to 8 or claims 9 to 13.
  29. A computer program product containing instructions, wherein, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 8 or claims 9 to 13.
PCT/CN2022/103682 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device WO2024007119A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Publications (1)

Publication Number Publication Date
WO2024007119A1 true WO2024007119A1 (en) 2024-01-11

Family

ID=89454678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Country Status (1)

Country Link
WO (1) WO2024007119A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017150820A1 (en) * 2016-02-29 2017-09-08 경기대학교 산학협력단 Knowledge base-based conceptual-graph expansion system
CN110727806A (en) * 2019-12-17 2020-01-24 北京百度网讯科技有限公司 Text processing method and device based on natural language and knowledge graph
CN113255918A (en) * 2021-04-13 2021-08-13 国家计算机网络与信息安全管理中心 General knowledge generation reasoning method for strengthening aggregation knowledge guidance
CN114138985A (en) * 2022-02-08 2022-03-04 深圳希施玛数据科技有限公司 Text data processing method and device, computer equipment and storage medium
CN114281956A (en) * 2021-09-30 2022-04-05 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949710

Country of ref document: EP

Kind code of ref document: A1