WO2024007119A1 - Training method for text processing model, and text processing method and device


Info

Publication number
WO2024007119A1
Authority
WO
WIPO (PCT)
Prior art keywords
concept map
text
graph
nodes
training
Prior art date
Application number
PCT/CN2022/103682
Other languages
French (fr)
Chinese (zh)
Inventor
林雪玲
李昊阳
王路宁
曹琛
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2022/103682
Publication of WO2024007119A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to a training method for a text processing model, a text processing method and a device.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new class of intelligent machines that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Natural language processing is an important research direction in the field of artificial intelligence. Natural language processing tasks are usually performed based on the natural language text itself. The natural language text itself contains relatively limited features, and the text processing effect may not meet expectations. Some solutions use knowledge graphs as auxiliary information for text processing, but these solutions may introduce knowledge graph information that is completely irrelevant to the text content during the text processing process, thus affecting the effect of text processing.
  • This application provides a training method for a text processing model, a text processing method and a device, which can improve the effect of text processing.
  • In a first aspect, a training method for a text processing model is provided, including: obtaining training text; obtaining a knowledge graph; and determining an initial concept map of the training text based on the knowledge graph, where the nodes in the initial concept map include topic nodes, and the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
  • The edges between the nodes in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map; the initial concept map is input into the relational graph attention network (RGAT) model for training to obtain the target RGAT model.
  • In the (i+1)-th iteration of the training, the first concept map is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text, and on the weights of the edges in the second concept map, where i is a positive integer, the first concept map is a subgraph of the initial concept map, and the second concept map is a subgraph of the initial concept map.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • the knowledge graph can be a knowledge graph in the professional field to which the training text belongs.
  • The target noun phrase refers to a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph; if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, the noun phrase can be used as a target noun phrase.
  • the concept maps in the iterative process are all subgraphs of the initial concept map.
  • the correlation between the nodes in the concept map and the training text during the iterative process is the correlation between the same node in the initial concept map and the training text.
  • the correlation between the nodes in the initial concept map and the training text may be determined based on the importance of the nodes.
  • the importance of a node can be determined by the eigenvector centrality of the node.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • That is, the topic nodes may include all candidate entities corresponding to each target noun phrase in the knowledge graph, so that the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships.
  • This can provide comprehensive and complete knowledge related to the text for subsequent processing.
  • Using the solution of the embodiments of the present application to learn the knowledge-level representation of the text further ensures the accuracy of downstream tasks and avoids erroneous reasoning paths caused by missing part of the knowledge.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting edges in the second concept map as the edges of the first concept map in descending order of the ratio between the income of each edge in the second concept map and its first consumption.
  • The income of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first consumption of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of the second consumption of each connected subgraph, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase. An illustrative sketch of the income/consumption-based edge selection follows.
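  • As an illustration only, the edge selection described above can be sketched as a greedy ranking by income/consumption ratio. The function names and the exact income and consumption formulas below are assumptions for illustration, not the formulas of this application.

```python
# A minimal sketch (assumed formulas) of selecting edges for the next concept map
# by the ratio of each edge's income to its first consumption, in descending order.

def select_edges(edges, edge_weight, node_relevance, num_keep):
    """edges: list of (u, v) pairs in the second concept map;
    edge_weight[(u, v)]: weight of the edge in the i-th iteration;
    node_relevance[n]: correlation between node n and the training text."""
    def income(e):
        return edge_weight[e]  # positively correlated with the edge weight

    def consumption(e):
        u, v = e  # negatively correlated with the relevance of the two endpoints
        return 1.0 / (node_relevance[u] + node_relevance[v] + 1e-9)

    ranked = sorted(edges, key=lambda e: income(e) / consumption(e), reverse=True)
    return ranked[:num_keep]  # the kept edges form the first concept map
```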
  • In some implementations, the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph, where the nodes in the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the eigenvector centrality of a topic node is determined based on the initial importance of the topic node and the weight of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • In some implementations, the initial concept map further includes neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
  • That is, the initial concept map in the embodiments of the present application also includes the neighbor entities connected to the candidate entities (that is, the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of knowledge-level encoding.
  • In some implementations, the correlation between a neighbor node and the training text is determined based on the score of the strongly connected branch where the neighbor node is located on the information propagation graph. The nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
  • The score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch where a topic node is located to the downstream strongly connected branches according to the topological order. The initial score of the strongly connected branch where a topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • In a second aspect, a text processing method is provided, including: obtaining the text to be processed; obtaining a knowledge graph; determining the text encoding of the text to be processed; determining the concept map of the text to be processed based on the knowledge graph; and processing the concept map of the text to be processed through the target RGAT model to obtain the knowledge encoding of the text to be processed.
  • the target RGAT is obtained by inputting the initial concept map of the training text into RGAT for training.
  • The first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map, where i is a positive integer.
  • The first concept map is a subgraph of the initial concept map, and the second concept map is a subgraph of the initial concept map.
  • The nodes in the initial concept map include topic nodes, and the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
  • The edges between nodes in the initial concept map are used to represent the entity relationships between nodes in the initial concept map. The processing result of the text to be processed is determined based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • The topic nodes in the concept map of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.
  • That is, the topic nodes may include all candidate entities corresponding to each target noun phrase in the knowledge graph, so that the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships.
  • This can provide comprehensive and complete knowledge related to the text for subsequent processing, avoiding incorrect reasoning paths due to missing part of the knowledge.
  • The target RGAT model focuses on knowledge that is highly relevant to the text during the training process, further ensuring the accuracy of downstream tasks.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting edges in the second concept map as the edges of the first concept map in descending order of the ratio between the income of each edge in the second concept map and its first consumption.
  • The income of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first consumption of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • In some implementations, that the first concept map in the (i+1)-th iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map includes: selecting connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of the second consumption of each connected subgraph, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase.
  • In some implementations, the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph, and the nodes in the topic correlation graph include the topic nodes.
  • The weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the eigenvector centrality of a topic node is determined based on the initial importance of the topic node and the weight of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • The nodes in the initial concept map also include neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
  • The neighbor nodes in the concept map of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
  • the correlation between the neighbor node and the training text is determined based on the score of the strongly connected branch where the neighbor node is located on the information propagation graph.
  • The nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
  • The score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch where a topic node is located to the downstream strongly connected branches according to the topological order. The initial score of the strongly connected branch where a topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • In some implementations, the method further includes: outputting a knowledge path in the concept map of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
  • a knowledge path refers to the path between two nodes in the concept map.
  • For example, the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, ..., r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes e_q and e_{q+1}, and so on.
  • q is a positive integer.
  • the knowledge path can improve the interpretability of the model and provide users with a basis for judging the processing results, that is, the complete logic of the processing results can be obtained, which is conducive to improving the user's trust.
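  • As a toy illustration of the tuple form above (the entity and relation names here are hypothetical, not taken from the application), a 2-hop knowledge path can be represented and checked as follows:

```python
# A 2-hop knowledge path (e_q, r_q, e_{q+1}, r_{q+1}, e_{q+2}) as a plain tuple.
path = ("Vitamin B12 (chemical)", "increase",            # hypothetical example values
        "hemoglobin (biological substance)", "related to",
        "anemia (disease)")
entities, relations = path[0::2], path[1::2]
assert len(entities) == len(relations) + 1  # a k-hop path has k+1 entities and k relations
```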
  • In a third aspect, a text processing model training device is provided, which includes a unit for executing the method in any implementation of the first aspect.
  • In a fourth aspect, a text processing device is provided, which includes a unit for executing the method in any implementation of the second aspect.
  • In a fifth aspect, a training device for a text processing model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory.
  • the processor is configured to execute the method in any implementation manner of the first aspect.
  • The processor in the fifth aspect mentioned above can be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor.
  • The neural network computing processor here can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like.
  • Among them, TPU is an application-specific integrated circuit fully customized by Google for machine learning, used as an artificial intelligence accelerator.
  • In a sixth aspect, a text processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in any implementation manner of the second aspect.
  • the processor in the sixth aspect may be a CPU or a combination of a CPU and a neural network computing processor.
  • the neural network computing processor here may include a GPU, NPU or TPU, etc.
  • In a seventh aspect, a computer-readable medium is provided, which stores program code for device execution. The program code includes instructions for executing the method in any implementation of any one of the first to second aspects.
  • In an eighth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, it causes the computer to execute the method in any implementation of any one of the above first to second aspects.
  • In a ninth aspect, a chip is provided.
  • The chip includes a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface and executes the method in any implementation of any one of the above first to second aspects.
  • the chip may further include a memory, in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any implementation manner of any one of the first aspect to the second aspect.
  • Figure 1 is a schematic block diagram of a natural language processing system provided by an embodiment of the present application.
  • Figure 2 is a schematic block diagram of a system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic block diagram of a text processing system provided by an embodiment of the present application.
  • Figure 4 is a schematic block diagram of a text processing model provided by an embodiment of the present application.
  • Figure 5 is a schematic flow chart of a text processing model training method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of knowledge extraction provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the construction process of a concept map provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of a topic correlation graph provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an information propagation graph provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of another text processing model training method provided by an embodiment of the present application.
  • Figure 11 is a schematic flow chart of a text processing method provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of another text processing method provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of a text classification result provided by an embodiment of the present application.
  • Figure 14 is a schematic block diagram of a training device provided by an embodiment of the present application.
  • Figure 15 is a schematic block diagram of a text processing device provided by an embodiment of the present application.
  • Figure 16 is a schematic block diagram of another training device provided by an embodiment of the present application.
  • Figure 17 is a schematic block diagram of another text processing device provided by an embodiment of the present application.
  • Natural language processing is an important research direction in the field of artificial intelligence, which enables humans and machines to interact through natural language. Natural language processing tasks are usually performed based on the natural language text itself.
  • the natural language text itself contains relatively limited features, and the text processing effect may not meet expectations.
  • some solutions introduce knowledge graphs as auxiliary information for text processing.
  • However, the introduction of knowledge graphs may bring other problems. For example, the ambiguity of entities in the knowledge graph and the noise in the knowledge graph may lead to the introduction of knowledge graph information that is completely irrelevant to the text content during text processing, so the accuracy of the text processing results cannot be guaranteed.
  • the embodiment of the present application provides a text processing method, which is beneficial to improving the effect of text processing.
  • Natural language is human language, and natural language processing (NLP) is the processing of human language. Natural language processing is the process of systematically analyzing, understanding and extracting information from text data in an intelligent and efficient way. NLP and its components can manage very large blocks of text data, perform a large number of automated tasks, and solve a variety of problems, such as automatic summarization, machine translation (MT), named entity recognition (NER), relationship extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering, topic segmentation, etc.
  • Knowledge graph is a knowledge base that integrates real-world facts through a graph-structured data model.
  • Knowledge graphs are often used to store entities that are related to each other. For example, a fact that represents the existence of some entity relationship between two entities can be expressed as a triple data structure in the form of (entity, entity relationship, entity).
  • Entities are represented as nodes in the knowledge graph and represent conceptual entities in the real world, for example, "Peking University (organization)", "vitamin B12 (medical element)" and "hemoglobin (medical element)". Entity relationships are represented by the edges between the nodes corresponding to two entities in the knowledge graph and represent the relationship between the two entities in the real world. For example, the entity relationship between "vitamin B12" and "hemoglobin" is "increase", and the triple (vitamin B12, increase, hemoglobin) indicates the fact that vitamin B12 increases hemoglobin.
  • the knowledge graph in the professional field refers to the knowledge graph containing entities, relationships and facts in the professional field.
  • knowledge graphs in the financial domain are used to indicate entities, relationships, and facts in the financial domain.
  • the knowledge graph in the medical field is used to indicate entities, relationships and facts in the medical field.
  • the triplet data structure extracted from natural language text that can express facts can be called the knowledge triplet of the text, in the form of (noun phrase, relational phrase, noun phrase).
  • a noun phrase can include one word or multiple words.
  • a relational phrase can include one word or multiple words.
  • a noun phrase can correspond to one or more candidate entities within the knowledge graph.
  • the Chinese noun phrase “apple” can correspond to candidate entities such as “apple (fruit)” or “apple (company)”.
  • the English noun phrase “anemia” can correspond to candidate entities such as “anemia(disease)", “anemia(symptom)” or “anemia(plant)”.
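  • A minimal sketch of these structures, using the example entities above (the dictionary layout is an assumption for illustration):

```python
# Knowledge-graph facts as (entity, entity relationship, entity) triples,
# and noun phrases mapped to their candidate entities in the knowledge graph.
facts = [
    ("vitamin B12 (medical element)", "increase", "hemoglobin (medical element)"),
]

candidate_entities = {
    "apple":  ["apple (fruit)", "apple (company)"],
    "anemia": ["anemia (disease)", "anemia (symptom)", "anemia (plant)"],
}

def is_target_noun_phrase(noun_phrase):
    # a noun phrase qualifies if it corresponds to at least one candidate entity
    return len(candidate_entities.get(noun_phrase, [])) >= 1
```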
  • Knowledge graph embedding refers to mapping the entities and entity relationships in the knowledge graph to a low-dimensional vector space to obtain the embedded representation of the knowledge graph and realize the semantic information representation of entities and entity relationships.
  • the embedded representation of knowledge graphs can be used for various tasks related to knowledge graphs.
  • the embedded representation of the knowledge graph may include at least one of the following: an embedded representation of an entity or an embedded representation of a relationship, etc.
  • the embedded representation of the knowledge graph can be obtained through the knowledge graph embedding model.
  • the knowledge graph embedding model can be implemented based on graph neural network (GNN).
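  • For intuition only, one classic (non-GNN) embedding objective scores a triple by translating the head entity by the relation vector. The application notes that a GNN-based embedding model may be used, so this TransE-style score is merely an illustration:

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style plausibility of the fact (head, relation, tail):
    the closer head + relation is to tail in the vector space, the higher the score."""
    return -np.linalg.norm(head + relation - tail)
```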
  • the k-hop neighbors of a node in the graph refer to the set of all nodes whose shortest path to the node is k-hops starting from the node. k is a positive integer.
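  • A breadth-first search makes this definition concrete (a sketch; the adjacency representation is an assumption):

```python
from collections import deque

def k_hop_neighbors(adjacency, start, k):
    """adjacency: dict mapping each node to an iterable of its one-hop neighbors.
    Returns the set of nodes whose shortest path from `start` is exactly k hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if distance[u] == k:
            continue  # no need to expand past k hops
        for v in adjacency.get(u, ()):
            if v not in distance:  # first visit yields the shortest-path distance
                distance[v] = distance[u] + 1
                queue.append(v)
    return {n for n, d in distance.items() if d == k}
```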
  • A directed acyclic graph (DAG) is a directed graph that contains no directed cycles.
  • Topological sorting of a directed acyclic graph G arranges all the nodes in G into a linear sequence such that, for any pair of nodes u and v in the graph, if the edge <u,v> belongs to the edge set E(G) (where <u,v> represents an edge from node u to node v, and E(G) represents the set of edges in G), then u appears before v in the linear sequence. Usually, such a linear sequence is called a sequence that satisfies the topological order, or a topological sequence for short.
  • A topological sequence needs to satisfy two conditions: each node appears in the sequence exactly once, and for every edge <u,v> in E(G), node u appears before node v.
  • a directed acyclic graph can have one or more topologically ordered sequences.
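  • Kahn's algorithm is one standard way to produce such a topological sequence (a sketch):

```python
from collections import deque

def topological_sort(nodes, edges):
    """edges: iterable of (u, v) pairs meaning u must appear before v."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v are already placed
                queue.append(v)
    return order  # one of the possibly many valid topological sequences
```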
  • If a directed graph G has a directed path from v to u and a directed path from u to v for any two nodes v and u, then the directed graph G is called a strongly connected graph.
  • In a directed graph G, if two nodes u and v have directed paths to each other in both directions, then u and v are said to be strongly connected.
  • If any two nodes of the directed graph G are strongly connected, G is said to be a strongly connected graph.
  • If a subgraph S of a directed graph G is a maximal strongly connected subgraph of G, then S is a strongly connected component of G, or, S can also be called a strongly connected branch of G.
  • Tarjan's algorithm can be used to solve strongly connected branches of directed graphs. Specifically, this algorithm can be used to calculate the size of each strongly connected branch in the directed graph, the nodes of each strongly connected branch, the total number of strongly connected branches, etc.
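  • For example, the strongly connected branches of a directed graph can be obtained with the networkx library, whose implementation is a nonrecursive variant of Tarjan's algorithm:

```python
import networkx as nx

G = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c")])
branches = list(nx.strongly_connected_components(G))
print(len(branches))               # total number of strongly connected branches: 2
print(sorted(map(len, branches)))  # sizes of the branches: [1, 2]
```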
  • Any two nodes in a connected subgraph of an undirected graph are connected to each other by a path, and the nodes of the connected subgraph are not connected to any additional nodes of the larger graph (that is, the connected subgraph is maximal).
  • eigenvector centrality is a way to measure the influence of a node on a network.
  • The importance of a node usually depends on the number of the node's neighbor nodes (that is, the degree of the node) and the importance of those neighbor nodes: the more important the neighbor nodes connected to it are, the more important the node is.
  • The importance of a node can be expressed as a score; the higher the score, the more important the node. For nodes with the same number of connections, a node whose adjacent nodes have higher scores will itself have a higher score than a node whose adjacent nodes have lower scores. According to this principle, all nodes can be assigned corresponding scores. A high eigenvector score means that the node is connected to many nodes that themselves have high scores.
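  • The classic way to compute these scores is power iteration on the adjacency matrix (a generic sketch, not the specific computation of this application):

```python
import numpy as np

def eigenvector_centrality(adjacency_matrix, iterations=100):
    """adjacency_matrix: (N, N) array; entry (i, j) is nonzero if i and j are connected."""
    scores = np.ones(adjacency_matrix.shape[0])
    for _ in range(iterations):
        scores = adjacency_matrix @ scores        # accumulate neighbors' scores
        scores = scores / np.linalg.norm(scores)  # renormalize each step
    return scores  # converges to the principal eigenvector of the adjacency matrix
```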
  • the neural network can be composed of neural units.
  • The neural unit can refer to an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit can be:
  • h_{W,b}(x) = f(W^T x + b) = f(sum_{s=1}^{n} W_s x_s + b)
  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal of the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next layer.
  • the activation function can be ReLU, tanh or sigmoid function.
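  • Written out directly, the neural unit above is just a weighted sum followed by an activation (a sketch):

```python
import numpy as np

def neural_unit(x, W, b, f=np.tanh):
    """x: inputs x_s; W: weights W_s; b: bias; f: activation (e.g., ReLU, tanh, sigmoid)."""
    return f(np.dot(W, x) + b)  # h = f(sum_s W_s * x_s + b)
```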
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Graph neural network (GNN) is a general term for algorithms that use neural networks to learn graph-structured data, extracting and exploring features and patterns in graph-structured data to meet the needs of graph learning tasks such as clustering, classification, prediction, segmentation, and generation.
  • the graph neural network can aggregate the neighborhood information of each node to obtain the embedded representation of each node.
  • the attention mechanism allows a neural network to focus only on the information required for task learning.
  • the attention mechanism is introduced into GNN to form a graph attention network (GAT).
  • GAT focuses on nodes and edges that are more relevant to the task, which can improve the processing effect.
  • RGAT is a graph neural network that can model various relationships in the graph structure through the graph attention mechanism to obtain the vector expression of the nodes in the graph structure and the relationships between nodes in a low-dimensional space.
  • RGAT can be used to process the input knowledge graph to obtain the low-dimensional space vector of each entity and entity relationship on the knowledge graph.
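  • A minimal sketch of a relational graph attention layer is shown below. This is an assumed, simplified formulation for illustration (relation-specific projections plus per-node softmax attention), not the exact RGAT model used in this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGATLayer(nn.Module):
    """Simplified relation-aware graph attention: each relation type r has its own
    projection W_r and attention vector a_r; messages into a node are combined
    with softmax-normalized attention over its incoming edges."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.out_dim = out_dim
        self.W = nn.ModuleList(nn.Linear(in_dim, out_dim, bias=False)
                               for _ in range(num_relations))
        self.a = nn.ParameterList(nn.Parameter(torch.randn(2 * out_dim))
                                  for _ in range(num_relations))

    def forward(self, h, edges):
        """h: (N, in_dim) node features; edges: list of (src, dst, relation_id)."""
        out = torch.zeros(h.size(0), self.out_dim)
        incoming = {}
        for s, d, r in edges:  # group edges by destination node
            incoming.setdefault(d, []).append((s, r))
        for d, neighbors in incoming.items():
            scores, messages = [], []
            for s, r in neighbors:
                hs, hd = self.W[r](h[s]), self.W[r](h[d])
                scores.append(F.leaky_relu(torch.dot(self.a[r], torch.cat([hs, hd]))))
                messages.append(hs)
            alpha = torch.softmax(torch.stack(scores), dim=0)  # attention weights
            out[d] = (alpha.unsqueeze(1) * torch.stack(messages)).sum(dim=0)
        return F.elu(out)
```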
  • Figure 1(a) shows an application scenario of the natural language processing system.
  • the natural language processing system includes user equipment and data processing equipment.
  • User equipment includes smart terminals used by users, such as mobile phones, personal computers, or information processing centers.
  • User equipment is the initiator of natural language data processing. As the initiator of language question-and-answer or query requests, users usually initiate such requests through the user equipment.
  • Data processing equipment can be cloud servers, network servers, application servers, management servers and other devices or servers with data processing functions.
  • The data processing equipment receives questions such as query statements/voice/text from the smart terminal through an interactive interface, and then performs machine learning, deep learning, search, reasoning, decision-making, etc. through a memory that stores data and a processor that processes data.
  • Storage can be a general term that includes local storage and databases that store historical data.
  • the database can be located on the data processing device or on other network servers.
  • (b) of Figure 1 shows another application scenario of the natural language processing system.
  • the user device directly serves as a data processing device, directly receiving input from the user and processing it directly by the hardware of the user device itself.
  • The specific process is similar to that in (a) of Figure 1; please refer to the above description, which will not be repeated here.
  • (c) of Figure 1 shows a schematic diagram of related equipment of the natural language processing system provided by the embodiment of the present application.
  • the natural language processing system may include a local device 101, a local device 102, an execution device 110 and a data storage system 150, where the local device 101 and the local device 102 are connected to the execution device 110 through a communication network.
  • The execution device 110 is implemented by one or more servers, and optionally cooperates with other computing devices, such as data storage, routers, load balancers and other devices; the execution device 110 can be arranged on one physical site or distributed across multiple physical sites.
  • the execution device 110 can use the data in the data storage system 150 or call the program code in the data storage system 150 to implement the training method of the text processing model in the embodiment of the present application.
  • execution device 110 can also be called a cloud device, and in this case, the execution device 110 can be deployed in the cloud.
  • the execution device 110 may also be a terminal device. In this case, the execution device 110 may be deployed on the user terminal side, which is not limited in the embodiments of the present application.
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, game console, etc.
  • Each user's local device can interact with the execution device 110 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the data storage system 150 can be integrated on the execution device 110, the local device 101 or the local device 102, or can be set up on the cloud or other network servers.
  • the local device 101 or the local device 102 can obtain the relevant parameters of the text processing model from the execution device 110, and use the text processing model on the local device 101 or the local device 102 to obtain the execution result of the text processing task.
  • the text processing model can be deployed directly on the execution device 110.
  • the execution device 110 obtains the text to be processed from the local device 101 and the local device 102, and obtains the execution result of the text processing task through the text processing model.
  • The user equipment in (a) and (b) of Figure 1 may be the local device 101 or 102 in (c) of Figure 1.
  • The data processing device in (a) and (b) of Figure 1 may be the execution device 110 in (c) of Figure 1.
  • Figure 2 shows a system architecture 200 provided by an embodiment of the present application.
  • the data collection device 260 is used to collect training data and store it in the database 230.
  • the training device 220 generates a target model/rule 201 based on the training data maintained in the database 230, for example, the text processing model in the embodiment of the present application.
  • the model in the embodiment of this application may be a neural network model, or it may also be other models.
  • the training data may include training text and target processing results of the training text, such as labels of the training text.
  • the training data maintained in the database 230 may not necessarily be collected by the data collection device 260, but may also be received from other devices.
  • It should be noted that the training device 220 does not necessarily perform the training of the target model/rules 201 based entirely on the training data maintained by the database 230; it may also obtain training data from the cloud or elsewhere for model training.
  • The above description should not be regarded as a limitation on the embodiments of this application.
  • Figure 2 shows the functional module diagram in the data processing process.
  • client device 240 in FIG. 2 may be the user device of FIG. 1 .
  • the execution device 210 and the data storage system 250 in Figure 2 can be integrated into the user equipment in Figure 1 .
  • the execution device 210 and the data storage system 250 in Figure 2 can also be integrated on the data processing device in Figure 1 .
  • the database 230, training device 220 and data collection device 260 in Figure 2 can be integrated correspondingly on the data processing device in Figure 1, and can be set up on the cloud or other servers on the network.
  • The data collection device 260 may be a terminal device, or an input and output interface of a server or the cloud, that is, an interaction layer (interface) used to obtain user input and return processing results.
  • the target model/rule obtained by the training device 220 can be applied in different systems or devices.
  • the execution device 210 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, AR/VR, a vehicle-mounted terminal, etc., or it can also be a server or cloud.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with external devices. The "user" can input data to the I/O interface 212 through the client device 240.
  • When the execution device 210 preprocesses the input data, or when the calculation module 211 of the execution device 210 performs calculations and other related processing, the execution device 210 can call the data, code, etc. in the data storage system 250, and can also store the resulting data, instructions, etc. in the data storage system 250.
  • the I/O interface 212 returns the processing results to the client device 240 and provides them to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different data for different goals to provide users with better results.
  • the user can manually specify the data to be input into the execution device 210 , for example, by operating in the interface provided by the I/O interface 212 .
  • the client device 240 can automatically input data to the I/O interface 212 and obtain the results. If the client device 240 automatically inputs data and requires the user's authorization, the user can set corresponding permissions in the client device 240 .
  • the user can view the results output by the execution device 210 on the client device 240, and the specific presentation form may be display, sound, action, etc.
  • the client device 240 can also serve as a data collection terminal to store the collected data in the database 230.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in Figure 2, the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 can also be placed in the execution device 210.
  • Figure 3 shows a schematic block diagram of a text processing system provided by an embodiment of the present application.
  • The execution device 210 in Figure 2, the data processing devices in (a) and (b) of Figure 1, or the execution device 110 in (c) of Figure 1 can be deployed on the server in Figure 3.
  • the text processing model in the embodiment of the present application can be implemented through program code deployed on the hardware of the server.
  • the text processing model of the embodiment of the present application can be modified and implemented on the basis of the existing software platform.
  • The program code runs in the server's host storage (the host memory or disk shown in Figure 3) and in the memory of acceleration hardware (such as a GPU, FPGA or dedicated chip).
  • the dedicated chip may be a neural network operation processor, which can be used to perform operations on the neural network model.
  • Figure 4 shows a schematic structural diagram of a text processing model provided by an embodiment of the present application.
  • the text processing model 400 includes a knowledge extraction module 410 , a text encoding module 420 , a knowledge encoding module 430 and a task processing module 440 .
  • the knowledge extraction module 410 is used to extract knowledge from input data.
  • the input data can be text.
  • the input data can be training text, and during the inference process, the input data can be text to be processed.
  • the text encoding module 420 is used to encode the text to obtain text-level encoding of the text.
  • the knowledge encoding module 430 is used to process the knowledge extracted by the knowledge extraction module 410 through RGAT to generate knowledge-level encoding of the text.
  • the text-level encoding of the text can be used as an input to the knowledge encoding module 430 and participate in the process of the knowledge encoding module 430 generating the knowledge-level encoding.
  • the output of the knowledge encoding module 430 can be understood as predictive encoding of the knowledge level of the text.
  • the output of the knowledge encoding module 430 is the encoding of the knowledge level of the text.
  • the task processing module 440 is configured to output text processing results based on text-level coding and knowledge-level coding.
  • the text processing model shown in Figure 4 may be a text classification model for text classification.
  • the task processing module 440 may be configured to output the result of text classification, that is, predict the category of the text based on text-level coding and knowledge-level coding.
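  • The interaction of the four modules can be sketched as the following skeleton (class and method names are illustrative, not taken from this application):

```python
class TextProcessingModel:
    """Skeleton of the model in Figure 4; each module is a callable component."""

    def __init__(self, knowledge_extractor, text_encoder, knowledge_encoder, task_head):
        self.knowledge_extractor = knowledge_extractor  # module 410
        self.text_encoder = text_encoder                # module 420
        self.knowledge_encoder = knowledge_encoder      # module 430 (RGAT-based)
        self.task_head = task_head                      # module 440

    def process(self, text, knowledge_graph):
        concept_map = self.knowledge_extractor(text, knowledge_graph)
        text_encoding = self.text_encoder(text)
        # the text-level encoding also feeds the knowledge encoding step (see above)
        knowledge_encoding = self.knowledge_encoder(concept_map, text_encoding)
        return self.task_head(text_encoding, knowledge_encoding)  # e.g., a class label
```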
  • Figure 5 shows a schematic flow chart of a text processing model training method provided by an embodiment of the present application.
  • The method 500 shown in Figure 5 can be executed by an apparatus or device capable of performing the training of a text processing model.
  • the device can be a cloud service device or a terminal device, such as a computer, server, vehicle, mobile phone, etc. with computing capabilities.
  • The device that executes the training method of the text processing model may also be a system composed of a cloud service device and a terminal device.
  • the method 500 may be executed by any one of the execution device 110 in FIG. 1 , a local device, or the training device 220 in FIG. 2 .
  • the method 500 may be specifically executed by the training device 220 as shown in FIG. 2 , and the training data in the method 500 may be training data maintained in the database 230 as shown in FIG. 2 .
  • the text processing model in method 500 may be the text processing model shown in Figure 4.
  • the knowledge encoding module in the text processing model can be implemented through the RGAT model, and the training method of the text processing model can also be understood as the training method of RGAT.
  • Method 500 includes steps 510 to 540. Steps 510 to 540 are described below.
  • the nodes in the initial concept map include topic nodes, where the topic nodes include candidate entities in the knowledge map corresponding to the target noun phrases in the training text.
  • the edges between the nodes are used to represent the entity relationships between the nodes in the initial concept graph.
  • In the (i+1)-th iteration, the first concept map is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and on the weights of the edges in the second concept map, where i is a positive integer.
  • the first concept map is a subgraph of the initial concept map
  • the second concept map is a subgraph of the initial concept map.
  • In this way, during training, the concept map can be optimized according to the correlation between the nodes in the concept map and the training text and the importance of the edges in the concept map, and the optimized concept map is used as the concept map in the next iteration.
  • Knowledge with low textual relevance can also be understood as redundant knowledge or ambiguous knowledge.
  • Knowledge with greater text relevance can also be understood as key knowledge.
  • the knowledge graph can be represented as graph-structured data, used to represent entities and their relationships that exist in the real world.
  • Entities can be represented as nodes in the knowledge graph
  • entity relationships can be represented as edges in the knowledge graph.
  • nodes in the knowledge graph may also be called entity nodes in the knowledge graph or entities in the knowledge graph.
  • the edges in the knowledge graph can also be called relationship edges in the knowledge graph or entity relationships in the knowledge graph.
  • the knowledge graph may be an existing knowledge graph, or it may be a pre-constructed knowledge graph, which is not limited in the embodiments of the present application.
  • the knowledge graph in step 520 may be a knowledge graph in the professional field to which the training text belongs.
  • the text is data in the medical field
  • the knowledge graph can be a knowledge graph in the medical field.
  • the text is data in the financial field
  • the knowledge graph can be a knowledge graph in the financial field.
  • Knowledge graphs can be constructed based on corpus in professional fields.
  • the corpus can include website articles or books, etc.
  • Knowledge graphs in different professional fields can be constructed based on corpora in different professional fields.
  • the fact in the knowledge graph that represents the existence of an entity relationship between two entities can be represented as a triple data structure.
  • the construction process of the initial concept map may be performed by the knowledge encoding module 430 shown in FIG. 4 .
  • the initial concept map can be represented as graph-structured data that is used to indicate knowledge in the text.
  • the initial concept graph can be understood as a subgraph of the knowledge graph.
  • the nodes in the concept map can also be called entities in the concept map or entity nodes in the concept map.
  • The topic nodes in the initial concept map include candidate entities in the knowledge graph that correspond to the target noun phrases in the training text. It can also be understood that the topic nodes in the initial concept map correspond to the candidate entities in the knowledge graph that correspond to the target noun phrases in the training text, and there can be a one-to-one correspondence between topic nodes and candidate entities.
  • The edges in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map. It can also be understood that the edges in the initial concept map are used to represent the entity relationships, in the knowledge graph, between the entities corresponding to the nodes in the initial concept map. The edges between nodes in the initial concept map are determined based on the entity relationships in the knowledge graph: by connecting the nodes in the initial concept map according to the entity relationships in the knowledge graph, the edges between the nodes in the initial concept map can be obtained.
  • an edge between node A and node B in the initial concept map is used to represent the entity relationship between node A and node B.
  • Node A in the initial concept map corresponds to entity A in the knowledge map
  • node B in the initial concept map corresponds to entity B in the knowledge map.
  • the entity relationship between node A and node B is the entity relationship between entity A and entity B in the knowledge graph.
  • Entity A in the knowledge graph can also be called node A in the knowledge graph.
  • Entity B in the knowledge graph can also be called node B in the knowledge graph.
  • In other words, the entity relationship between node A and node B is the entity relationship between node A and node B in the knowledge graph.
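  • As an illustration (the function signature is an assumption), the initial concept map can be sketched as the knowledge-graph subgraph induced by the candidate entities and their k-hop neighbors:

```python
import networkx as nx

def build_initial_concept_map(knowledge_graph: nx.Graph, candidate_entities, k=1):
    """Topic nodes are the candidate entities; neighbor nodes are their k-hop
    neighbors in the knowledge graph; edges are inherited from the knowledge graph."""
    nodes = set(candidate_entities)
    frontier = set(candidate_entities)
    for _ in range(k):  # expand k hops outward from the topic nodes
        frontier = {n for u in frontier for n in knowledge_graph.neighbors(u)} - nodes
        nodes |= frontier
    return knowledge_graph.subgraph(nodes).copy()
```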
  • the target noun phrase refers to the noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph.
  • In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, the noun phrase can be used as a target noun phrase in the training text.
  • the target noun phrase in the text can be obtained based on the knowledge graph.
  • knowledge triples are extracted from the text based on the knowledge graph, that is, knowledge triples of the text.
  • the form of the knowledge triplet of the text can be (noun phrase, relational phrase, noun phrase).
  • the noun phrases in the knowledge triplet are phrases in the text data.
  • the relational phrases in the knowledge triplet are phrases in the text data.
  • the noun phrase in the knowledge triplet corresponds to at least one candidate entity in the knowledge graph.
  • the relational phrases in the knowledge triplet may correspond to at least one entity relationship in the knowledge graph.
  • a noun phrase can include one word or multiple words.
  • a relational phrase can include one word or multiple words.
  • the noun phrase in the knowledge triplet is the target noun phrase.
  • the noun phrases that have corresponding candidate entities in the knowledge graph can be used as the target noun phrases.
  • the relational phrases in the knowledge triplet can be used as the target relational phrases.
  • That is, the relational phrases that have corresponding entity relationships in the knowledge graph can be used as the target relational phrases. Extracting knowledge triples from text data can also be understood as identifying the noun phrases and relational phrases that constitute knowledge triples in the text data.
  • Extracting knowledge triples from text based on knowledge graphs can also be called extracting knowledge from texts based on knowledge graphs.
  • the process of extracting knowledge may be performed by the knowledge extraction module 410 in FIG. 4 .
  • Figure 6 shows a schematic diagram of knowledge extraction.
  • knowledge triples in the text data are identified based on the knowledge graph in the medical field.
  • the text is "Vitamin B 12 creates risk for anemia", that is, "Vitamin B12 increases the risk of anemia”.
  • the knowledge triplet extracted from this text is (Vitamin B12, creates risk for (+), anemia), where the noun phrase For "Vitamin B12" and "anemia", the related phrase is "creates risk for (+)”.
  • Figure 6 only takes the extraction of one knowledge triplet from the text as an example. In practical applications, there may be multiple knowledge triplets identified from the text. This is not the case in the embodiment of the present application. Make limitations.
  • The entity relationship corresponding to the relational phrase of a knowledge triple in the knowledge graph is not necessarily the entity relationship, in the knowledge graph, between the candidate entities corresponding to the noun phrases of that knowledge triple.
  • For example, the knowledge triple extracted from the text is (Vitamin B12, creates risk for (+), anemia).
  • The noun phrases "Vitamin B12" and "anemia" have corresponding candidate entities in the knowledge graph, and the relational phrase "creates risk for (+)" has a corresponding entity relationship in the knowledge graph.
  • However, the entity relationship between the candidate entities corresponding to "Vitamin B12" and "anemia" in the knowledge graph is not necessarily "creates risk for (+)".
  • Topic nodes may include candidate entities in the knowledge graph corresponding to all target noun phrases in the text data.
  • the topic node may include candidate entities corresponding to the noun phrases in all knowledge triples extracted from the text data.
  • the topic node includes all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • a target noun phrase may correspond to one or more candidate entities in the knowledge graph.
  • the target noun phrase “anemia” can correspond to multiple candidate entities such as “anemia(disease)", “anemia(symptom)” or “anemia(plant)”.
  • the topic node can include all candidate entities corresponding to each target noun phrase in the knowledge graph, and the resulting concept map covers all candidate entities related to the text data and the corresponding entity relationships. This can provide comprehensive and complete knowledge related to the text for subsequent processing and avoid incorrect reasoning paths due to missing partial knowledge.
  • the nodes in the initial concept graph also include neighbor nodes, and the neighbor nodes include neighbor entities in the knowledge graph of the candidate entities corresponding to the target noun phrase.
  • In other words, the neighbor nodes in the initial concept map correspond to the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases, and there can be a one-to-one correspondence between neighbor nodes and neighbor entities.
  • the neighbor nodes may include k-hop neighbors of the candidate entities corresponding to the target noun phrase in the knowledge graph.
  • k is a positive integer.
  • k may be a positive integer less than or equal to 3.
  • the neighbor nodes may include k-hop neighbors of all candidate entities corresponding to the target noun phrase in the knowledge graph.
  • Neighbor nodes play an important role in knowledge reasoning.
  • That is, the initial concept map in the embodiments of the present application also includes the neighbor entities connected to the candidate entities (that is, the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of knowledge-level encoding.
  • the initial concept map may also include entity relationships corresponding to the target relationship phrases in the knowledge map.
  • Figure 7 shows a schematic diagram of an initial concept map.
  • a concept map corresponding to the text data is constructed based on the knowledge map in the medical field and the knowledge triples extracted from the text data.
  • the initial concept map shown in Figure 7 is an initial concept map corresponding to the text shown in Figure 6.
  • the topic nodes "anemia(disease)", “anemia(symptom)” and “anemia(plant)” in Figure 7 are all candidate entities in the knowledge graph corresponding to the target noun phrase "anemia” in the text shown in Figure 6
  • the topic node “Vitamin B 12 (chemical)” in Figure 7 is all candidate entities in the knowledge graph corresponding to the target noun phrase "Vitamin B 12" in the text shown in Figure 6.
  • the neighbor nodes in Figure 7 are one-hop neighbor entities of the above candidate entities in the knowledge graph.
  • "Vitamin B12(chemical)”, “anemia(disease)”, “anemia(symptom)” and “anemia(plant)” are the theme nodes in the concept map.
  • "hemoglobin(biological substance)”, “GI bleeding(biologic function)” and “plant(type)” are used as neighbor nodes in the concept map.
  • the edges between nodes in Figure 7 are used to represent the entity relationships between corresponding entities in the knowledge graph.
  • the correlation between the nodes in the initial concept map and the training text may be determined based on the importance of the nodes.
  • the relevance of a node in the initial concept map to the training text can be the importance of the node.
  • the correlation between the nodes in the initial concept map and the training text can be positively correlated with the importance of the node, that is, the greater the importance of the node, the higher the correlation between the node and the training text.
  • the importance of a node can be determined by a centrality measurement method, for example, by the PageRank algorithm or by degree centrality.
  • the importance of a node may be determined by the eigenvector centrality of the node.
  • the importance of a node can be the eigenvector centrality of the node.
  • the importance of a node can be positively correlated with the eigenvector centrality of the node.
  • the correlation between each topic node in the initial concept map and the training text can be the eigenvector centrality of the topic node on the topic correlation graph.
  • the nodes in the topic correlation graph are all topic nodes.
  • the eigenvector centrality of each topic node is determined based on the initial importance of each topic node and the weights of the edges in the topic correlation graph.
  • the initial importance of a node in the topic correlation graph is set based on the probability of the node appearing in the facts recorded in the knowledge graph.
  • the edges in the topic correlation graph are connected based on the entity relationships between candidate entities in the knowledge graph.
  • the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the candidate entities corresponding to the two topic nodes connected by the edge, that is, the number of edges between the candidate entities.
  • for example, if node C and node D in the topic correlation graph correspond to candidate entity C and candidate entity D in the knowledge graph, and there are n edges between candidate entity C and candidate entity D in the knowledge graph, then an edge with weight n is constructed between node C and node D in the topic correlation graph.
  • n is a positive integer.
  • FIG. 8 shows a topic correlation graph provided by an embodiment of the present application.
  • the nodes in the topic correlation graph are the topic nodes in FIG. 7 .
  • when eigenvector centrality is used to calculate the correlation between a topic node and the training text, the importance of a node is determined based on the importance of its neighbor nodes, and the initial importance of a node is set based on the probability of the node appearing in the facts recorded in the knowledge graph, which can more accurately reflect the importance of the node; a minimal computation sketch follows.
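  • A minimal sketch of this computation, assuming a `networkx` representation of the topic correlation graph (all node names, weights and initial-importance values are illustrative):

```python
# Sketch: edge weights count the entity relationships between the two
# candidate entities in the knowledge graph; the starting vector encodes
# each topic node's probability of appearing in the recorded facts.
import networkx as nx

topic_graph = nx.Graph()
topic_graph.add_edge("anemia(disease)", "Vitamin B12(chemical)", weight=3)
topic_graph.add_edge("anemia(symptom)", "Vitamin B12(chemical)", weight=1)

initial_importance = {
    "anemia(disease)": 0.6,
    "anemia(symptom)": 0.3,
    "Vitamin B12(chemical)": 0.1,
}

centrality = nx.eigenvector_centrality(
    topic_graph, nstart=initial_importance, weight="weight"
)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))  # correlation of each topic node with the text
```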
  • the correlation between each neighbor node in the initial concept map and the training text can be the score of the strongly connected branch in which the neighbor node is located on the information propagation graph corresponding to the initial concept map.
  • the nodes in the information propagation graph corresponding to the initial concept map are the nodes in the initial concept map.
  • Figure 9 shows an information propagation graph provided by an embodiment of the present application.
  • the information propagation graph shown in Figure 9 is the information propagation graph corresponding to the initial concept map shown in Figure 7, or in other words, the information propagation graph corresponding to the training text shown in Figure 6.
  • each concept map can correspond to an information propagation graph.
  • the information propagation graph is a directed graph.
  • the edges in a directed graph are directed edges, that is, the edges in a directed graph have directionality.
  • the nodes in the information propagation graph corresponding to a concept map are the nodes in the concept map. If and only if node u in the concept map is a 1-hop neighbor of node v, a directed edge from node v to node u is constructed between node v and node u in the information propagation graph. Node u may be called the first node, and node v may be called the second node.
  • the score of each strongly connected branch on the information propagation graph may be obtained by propagating the initial score of the strongly connected branch where the topic node is located to the downstream strongly connected branch according to topological sorting.
  • the initial score of each strongly connected branch on the information propagation graph may be determined based on the maximum value of the importance of the nodes in each strongly connected branch.
  • the initial score of each strongly connected branch on the information propagation graph may be the maximum value of the importance of the nodes in each strongly connected branch.
  • the topological sorting can be obtained through a depth-first search of the strongly connected branches.
  • the topological sorting results can be understood as the depth-first search results of strongly connected branches.
  • "Propagation” can be understood as passing the score of the previous strongly connected branch to the subsequent strongly connected score according to topological sorting, or in other words, updating the score of the downstream strongly connected branch to the score of the upstream strongly connected branch.
  • for example, the topological sorting can be {C1, C2, C3}, where C1, C2 and C3 respectively represent three strongly connected branches.
  • "propagation" can then be understood as updating the score of C2 to the score of C1, and updating the score of C3 to the score of C2.
  • in step 540, the relational graph attention network (RGAT) is trained to learn the knowledge encoding expression corresponding to the training text.
  • the training process of RGAT can be carried out according to the following process.
  • the RGAT in the next iteration process is the adjusted RGAT in the current iteration process.
  • Step 1) to step 3) can be regarded as an iterative process.
  • step 1) may be performed by the knowledge encoding module 430 in FIG. 4 .
  • the knowledge encoding of the training text can also be called the encoding of the knowledge level of the training text.
  • Knowledge encoding can be expressed as knowledge embedding vector or knowledge feature vector, etc.
  • the knowledge encoding of the training text may include encoding of nodes in the concept graph and encoding of edges in the concept graph.
  • the knowledge encoding of the training text may include embedding vectors of nodes and embedding vectors of edges in the concept graph.
  • the text encoding of the training text can also be called the text-level encoding of the training text.
  • Text-level encoding refers to low-dimensional space vectors used to express text content and text arrangement sequences in text data.
  • Text encoding can be represented as text embedding vectors or text feature vectors.
  • the text encoding of the training text may include text encoding of the training text sequence and text encoding of the target phrases in the training text.
  • the target phrase may include a target noun phrase and a target relation phrase.
  • the text encoding of the training text sequence can also be called the text encoding of the training text itself.
  • the training text can be processed through a pre-trained language model to obtain the text encoding of the training text.
  • for example, the pre-trained language model can be a bidirectional encoder representations from transformers (BERT) model, a bidirectional gated recurrent unit (BiGRU) or a bidirectional long short-term memory (BiLSTM) network; a hedged encoding sketch follows.
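  • A hedged sketch of obtaining such text encodings with a BERT model, assuming the Hugging Face `transformers` library and an illustrative token span for the phrase encoding:

```python
# Sketch: encode the text with a pre-trained BERT model; the [CLS] vector
# serves as the sequence encoding, and a phrase encoding is obtained by
# mean-pooling the phrase's token vectors (the span 1:3 is illustrative).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Anemia may be caused by a lack of Vitamin B12."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # shape (1, seq_len, 768)

sequence_encoding = hidden[:, 0]                 # text encoding of the sequence
phrase_encoding = hidden[0, 1:3].mean(dim=0)     # text encoding of one phrase
```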
  • the process of obtaining the text encoding of the training text may be performed by the text encoding module 420 in FIG. 4 .
  • step 2) may be performed by the task processing module 440 in FIG. 4 .
  • the type of prediction result of the training text is related to the type of text processing task, that is, related to the type of downstream task.
  • the method 500 is used for text classification tasks, in which case the prediction result of the training text may be the predicted category of the training text.
  • for example, the text encoding and the predicted knowledge encoding of the training text are input into a classifier to obtain the predicted category of the training text.
  • another example is to fuse the text encoding of the training text with the predicted knowledge encoding, and input the fusion result into the classifier to obtain the predicted category of the training text.
  • the fusion method can be vector splicing (concatenation) of the text encoding of the training text and the predicted knowledge encoding to obtain the text fusion encoding.
  • the classifier can be a softmax function, as sketched below.
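  • A minimal sketch of the splice-and-classify step, with placeholder dimensions and a linear layer followed by softmax standing in for the classifier:

```python
# Sketch: concatenate (vector-splice) the text encoding with the predicted
# knowledge encoding and classify the fused vector.
import torch
import torch.nn as nn

text_enc = torch.randn(1, 768)        # text encoding of the training text
knowledge_enc = torch.randn(1, 128)   # predicted knowledge encoding from RGAT

fusion = torch.cat([text_enc, knowledge_enc], dim=-1)  # text fusion encoding
classifier = nn.Linear(768 + 128, 2)                   # e.g. two categories
probabilities = torch.softmax(classifier(fusion), dim=-1)
predicted_category = probabilities.argmax(dim=-1)
```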
  • in step 3), the parameters of RGAT can be adjusted with the goal of reducing the gap between the target processing result and the prediction result of the training text.
  • the parameters of RGAT can be adjusted with the goal of reducing the gap between the label of the training text and the predicted category of the training text.
  • the label of the training text is the target processing result of the training text.
  • the label of the training text is used to indicate the true value of the category corresponding to the training text, that is, the true category of the training text.
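  • A hedged sketch of step 3), using a cross-entropy loss and gradient descent to reduce the gap between the label and the predicted category (the linear layer is a stand-in for the RGAT plus classifier):

```python
# Sketch: one parameter-adjustment step; the adjusted model is then used in
# the next iteration, as described above.
import torch
import torch.nn as nn

model = nn.Linear(896, 2)             # placeholder for RGAT + classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

fusion = torch.randn(1, 896)          # fused text + predicted knowledge encoding
label = torch.tensor([1])             # true category of the training text

loss = criterion(model(fusion), label)  # gap between label and prediction
optimizer.zero_grad()
loss.backward()
optimizer.step()
```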
  • the method 500 can also be used for other natural language processing tasks, such as multi-hop reasoning question and answer tasks.
  • the embodiments of the present application do not limit this.
  • the concept map in step 1) can be the initial concept map.
  • the first concept map in the i+1 iteration can be understood as the concept map input to RGAT during the i+1 iteration.
  • RGAT performs forward propagation and back propagation based on the first concept map.
  • the second concept map in the i-th iteration can be understood as the concept map input to RGAT during the i-th iteration.
  • RGAT performs forward propagation and back propagation based on the second concept map.
  • the i-th iteration can be any iteration in the RGAT training process.
  • the concept map can be continuously optimized during the iterative training of RGAT, and the concept map in each iteration can be different; taking the above training process as an example, in different iterations, the concept map in step 1) can be different.
  • the concept map in each iteration can be determined based on the correlation between the nodes in the concept map in the previous iteration and the training text, and the weights of the edges in the concept map in the previous iteration.
  • the concept maps in the iterative process are all subgraphs of the initial concept map.
  • the correlation between the nodes in the concept map and the training text during the iterative process is the correlation between the same node in the initial concept map and the training text.
  • the concept map is optimized.
  • the direction of optimization can be understood as pruning the nodes that are less relevant to the training text and/or the edges with smaller weights, and retaining the nodes that are more relevant to the training text and/or the edges with larger weights.
  • the weight of the edge in the concept map can also be called the weight of the facts in the concept map, that is, the attention weight.
  • the first concept map belongs to the set of first subgraphs of the initial concept map; the first cost of each first subgraph is less than or equal to a threshold, and the benefit of the first concept map is greater than or equal to the benefit of the other first subgraphs in the set of first subgraphs. The first cost of a first subgraph is determined based on the first costs of the edges within the first subgraph, and the benefit of a first subgraph is determined based on the benefits of the edges within it. The benefit of an edge is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • any subgraph whose first cost is less than or equal to the threshold can be called a first subgraph.
  • the set of first subgraphs is the set of subgraphs whose first cost is less than or equal to the threshold.
  • the first concept map is an element of this set; in other words, the first concept map is a first subgraph.
  • the first subgraph with the greatest benefit is used as the first concept map.
  • in this way, the first concept map can be obtained.
  • the higher the average correlation between the two nodes connected by an edge and the training text, the smaller the first cost of the edge.
  • the first cost of a subgraph may be determined based on the first costs of all edges within the subgraph.
  • the first cost of a subgraph can be the sum of the first costs of all edges within the subgraph.
  • alternatively, the first cost of a subgraph may be the average of the first costs of all edges in the subgraph.
  • the benefit of a subgraph may be determined based on the benefits of all edges within the subgraph.
  • the benefit of a subgraph can be the sum of the benefits of all edges within the subgraph.
  • alternatively, the benefit of a subgraph can be the average of the benefits of all edges in the subgraph.
  • the first concept map in the i+1 iteration can be obtained by optimizing the second concept map in the i-th iteration, and the optimized concept map may be a subgraph of the concept map in the i-th iteration.
  • the first concept map may be a subgraph of the second concept map.
  • the second concept map is optimized according to the correlation between the nodes in the second concept map and the training text and the weights of the edges in the second concept map in the i-th iteration, to obtain the first concept map in the i+1 iteration.
  • the ratio between the benefit of the first edge and the first cost of the first edge is less than or equal to the ratio between the benefit of the second edge and the first cost of the second edge.
  • the first edge belongs to the second concept map but does not belong to the first concept map, and the second edge belongs to the first concept map.
  • the benefit of an edge is positively correlated with the weight of the edge in the i-th iteration.
  • the first cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • the first cost of the first concept map is less than or equal to the threshold.
  • the sum of the first cost of the first concept map and the first cost of the first edge is greater than the threshold.
  • the second edge belongs to the second concept map and also belongs to the first concept map.
  • the second edge is any edge in the first concept map.
  • the first edge is any edge in the second concept map that does not belong to the first concept map.
  • the ratio between the benefit and the first cost of any edge that does not belong to the first concept map is less than or equal to the ratio between the benefit and the first cost of any edge of the first concept map.
  • the first cost of the first concept map being less than or equal to the threshold may mean that the sum of the first costs of all edges in the first concept map is less than or equal to the threshold.
  • alternatively, the first cost of the first concept map being less than or equal to the threshold may mean that the average of the first costs of all edges in the first concept map is less than or equal to the threshold.
  • the first concept map in the i+1 iteration can be determined through the following steps.
  • S12: select the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than the threshold.
  • the first cost of the first concept map is less than or equal to the threshold.
  • the ratio between the benefit and the first cost of any edge of the first concept map is greater than or equal to the ratio between the benefit and the first cost of any edge in the second concept map that does not belong to the first concept map.
  • the threshold may be a tolerance value for the number of uncertain edges.
  • for example, the threshold W may be determined based on a tolerance ratio and the number N of uncertain edges in the initial concept map.
  • uncertain edges refer to edges with weights less than 1 in the initial concept map; N is less than or equal to the number of edges in the initial concept map, and N is a positive integer.
  • for example, step 2) can be understood as selecting the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges reaches the threshold W (for example, 30).
  • the first cost of an edge may be determined based on the average correlation between the two nodes connected by the edge and the training text.
  • the concept map can also be optimized in other ways to retain the edges with larger weights and the nodes with greater correlation to the training text in the concept map, so as to reduce the model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance to the text; a greedy sketch of the selection above follows.
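  • A minimal sketch of the greedy selection described above (the stopping rule, stopping before the budget is exceeded, and all numbers are assumptions):

```python
# Sketch: keep edges in descending order of benefit / first-cost ratio
# until the budget (threshold W) would be exceeded.
def prune_edges(edges, budget):
    """edges: iterable of (edge, benefit, first_cost); returns kept edges."""
    kept, spent = [], 0.0
    for edge, benefit, cost in sorted(edges, key=lambda e: e[1] / e[2], reverse=True):
        if spent + cost > budget:     # adding this edge would exceed W
            break
        kept.append(edge)
        spent += cost
    return kept

edges = [
    (("anemia(disease)", "hemoglobin"), 0.9, 0.2),   # high benefit, low cost
    (("anemia(plant)", "plant(type)"), 0.1, 0.8),    # low benefit, high cost
    (("anemia(disease)", "GI bleeding"), 0.6, 0.3),
]
print(prune_edges(edges, budget=0.6))
# -> [('anemia(disease)', 'hemoglobin'), ('anemia(disease)', 'GI bleeding')]
```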
  • the first concept map is determined based on a first subset of the set of connected subgraphs of the initial concept map, and the second cost of the first subset is less than or equal to the second cost of the second subset.
  • the second cost of the first subset is determined based on the second costs of the edges within the connected subgraphs of the first subset.
  • the second cost of the second subset is determined based on the second costs of the edges within the connected subgraphs of the second subset.
  • the first subset includes at least one candidate entity corresponding to the target noun phrase.
  • the second subset includes at least one candidate entity corresponding to the target noun phrase.
  • that a subset of the set of connected subgraphs includes at least one candidate entity corresponding to the target noun phrase can be understood as follows: at least one candidate entity corresponding to the target noun phrase exists on at least one connected subgraph in the subset. Candidate entities corresponding to different target noun phrases may exist on different connected subgraphs in the subset, or on the same connected subgraph in the subset.
  • the subset that contains at least one candidate entity corresponding to each target noun phrase and has the smallest second cost is regarded as the first subset.
  • in this way, the subset whose second cost is the smallest among the subsets containing at least one candidate entity corresponding to each target noun phrase is obtained; that is, the first subset is obtained, or in other words, the first concept map is obtained.
  • the first subset includes at least one entity relationship corresponding to the target relation phrase.
  • the second subset includes at least one entity relationship corresponding to the target relation phrase.
  • that a subset of the set of connected subgraphs includes at least one entity relationship corresponding to the target relation phrase can be understood as follows: at least one entity relationship corresponding to the target relation phrase exists on at least one connected subgraph in the subset. Entity relationships corresponding to different target relation phrases may exist on different connected subgraphs within the subset, or on the same connected subgraph within the subset.
  • the subset that contains at least one entity relationship corresponding to each target relation phrase and has the smallest second cost is regarded as the first subset.
  • in this way, the subset with the smallest second cost is obtained; that is, the first subset is obtained, or in other words, the first concept map is obtained.
  • the first subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
  • the second subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
  • the subset that contains at least one candidate entity corresponding to each target noun phrase and at least one entity relationship corresponding to each target relation phrase, and has the smallest second cost, is taken as the first subset.
  • the second cost of this subset is the smallest among the subsets containing at least one entity relationship corresponding to each target relation phrase and at least one candidate entity corresponding to each target noun phrase; that is, the first subset is obtained.
  • the higher the average correlation between the two nodes connected by an edge and the training text, the smaller the second cost of the edge.
  • the lower the average correlation between the two nodes connected by an edge and the training text, the greater the second cost of the edge.
  • the second cost of the first subset may be the sum of the second costs of all edges within all connected subgraphs in the first subset.
  • the second cost of the second subset may be the sum of the second costs of all edges within all connected subgraphs in the second subset.
  • the first concept map may be a subgraph of the second concept map.
  • the second cost of the first connected subgraph is greater than or equal to the second cost of the second connected subgraph.
  • the first connected subgraph belongs to the second concept map in the i-th iteration, and the first connected subgraph does not belong to the first concept map in the i+1 iteration.
  • the second connected subgraph is the connected subgraph with the largest second cost in the first concept map.
  • the second cost of the first connected subgraph is determined based on the second costs of the edges within the first connected subgraph, and the second cost of the second connected subgraph is determined based on the second costs of the edges within the second connected subgraph.
  • the second cost of an edge is negatively correlated with the weight of the edge in the i-th iteration, and the second cost of an edge is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • the nodes of the first concept map include at least one candidate entity corresponding to the target noun phrase. For example, the nodes of the second connected subgraph include at least one candidate entity corresponding to a first noun phrase among the target noun phrases, and the nodes of the other connected subgraphs in the first concept map do not include a candidate entity corresponding to the first noun phrase.
  • the second connected subgraph belongs to the first concept map, and accordingly, the second connected subgraph belongs to the second concept map.
  • the first connected subgraph is any connected subgraph in the second concept map that does not belong to the first concept map.
  • the second cost of any connected subgraph in the second concept map that does not belong to the first concept map is greater than or equal to the second cost of any connected subgraph of the first concept map.
  • the second cost of a connected subgraph may be the sum of the second costs of all edges in the connected subgraph.
  • alternatively, the second cost of a connected subgraph may be the average of the second costs of all edges in the connected subgraph.
  • for example, the concept map in the i+1 iteration can be determined through the following steps.
  • the initial subset is set as an empty set, and the connected subgraphs in the second concept map are selected and added to the subset in ascending order of their second cost, until the current subset includes at least one candidate entity corresponding to each target noun phrase in the training text.
  • the first concept map can be determined based on the current subset.
  • the second connected subgraph is the last connected subgraph added to the subset.
  • the first connected subgraph may be any connected subgraph in the second concept map that was not added to the subset; a sketch of this selection follows.
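  • A hedged sketch of this selection (coverage of noun phrases only; all costs and entity sets are made up):

```python
# Sketch: add connected subgraphs in ascending order of second cost until
# every target noun phrase is covered by at least one candidate entity.
def select_subgraphs(subgraphs, phrase_to_candidates):
    """subgraphs: list of (node_set, second_cost); returns the chosen subset."""
    chosen, covered = [], set()
    for nodes, cost in sorted(subgraphs, key=lambda s: s[1]):
        chosen.append((nodes, cost))
        covered |= {p for p, cands in phrase_to_candidates.items() if cands & nodes}
        if covered == set(phrase_to_candidates):
            break                      # every target noun phrase is covered
    return chosen

subgraphs = [
    ({"anemia(disease)", "hemoglobin"}, 0.2),
    ({"anemia(plant)", "plant(type)"}, 0.9),
    ({"Vitamin B12(chemical)"}, 0.4),
]
phrase_to_candidates = {
    "anemia": {"anemia(disease)", "anemia(symptom)", "anemia(plant)"},
    "Vitamin B12": {"Vitamin B12(chemical)"},
}
print(select_subgraphs(subgraphs, phrase_to_candidates))
```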
  • the target RGAT is the trained RGAT.
  • the target RGAT can be used to obtain the feature vector of the graph structure data input to the target RGAT.
  • Input the concept map of the text to be processed into the target RGAT, and the output data can be used as an embedded expression of the concept map of the text to be processed, that is, the encoding of the knowledge level of the text to be processed.
  • in the solution of the embodiment of the present application, the concept map is optimized according to the correlation between the entity nodes and the text and the weights of the edges. Pruning the nodes with smaller correlation to the training text and/or the edges with smaller weights helps reduce the model's attention to knowledge that is less relevant to the text and increase its focus on knowledge that is more relevant to the text, thus improving the expressive ability of RGAT and helping improve the accuracy of downstream tasks.
  • the nodes in the initial concept map can include all candidate entities corresponding to the target noun phrase, which is beneficial to ensuring that no text-related knowledge is omitted in the concept map and ensures the completeness of the knowledge; using the solution of the embodiment of the present application to learn the knowledge-level expression of the text further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths due to missing part of the knowledge.
  • the nodes in the initial concept map may also include k-hop neighbor entities of each candidate entity, which can further improve the integrity of the knowledge in the concept map.
  • FIG. 10 shows a training method 800 for a text processing model provided by an embodiment of the present application.
  • the text processing model may be the text processing model in FIG. 4 .
  • Method 800 can be regarded as a specific implementation of method 500. For simplicity of description, part of the description is appropriately omitted when describing the method 800.
  • Method 800 includes steps 810 to 850.
  • step 810 may be performed by the knowledge extraction module 410 in FIG. 4 .
  • the knowledge graph is a knowledge graph of a business domain related to text data.
  • method 800 will be described below by taking the business field as the medical field as an example, which does not limit the solutions of the embodiments of the present application.
  • step 810 may include: identifying the knowledge triplet Td in the training text d based on the knowledge graph to obtain the noun phrases Md and the relational phrases Pd in the knowledge triplet, that is, the target noun phrases and the target relation phrases in the training text, as shown in Figure 6.
  • step 820 may be performed by text encoding module 420 in FIG. 4 .
  • the text-level encoding of the training text may include the text-level encoding of each noun phrase Md in the training text d, the text-level encoding of each relational phrase Pd, and the text-level encoding of the training text sequence.
  • step 830 may be performed by the knowledge encoding module 430 in FIG. 4 .
  • the text-level coding of the training text can be used as an input to RGAT and participate in the process of generating knowledge-level predictive coding.
  • Prediction results are obtained based on text-level coding and knowledge-level predictive coding.
  • step 840 may be performed by the task processing module 440 in FIG. 4 .
  • step 840 may include: vector splicing the text-level encoding and the knowledge-level predictive encoding to obtain the predictive text fusion encoding, and inputting the predictive text fusion encoding into the classifier to obtain the predicted classification result of the training text.
  • an initial concept map of the training text may be constructed, and the correlation between the nodes in the initial concept map and the training text may be calculated.
  • all candidate entities corresponding to each noun phrase Md in the training text are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located; these serve as the nodes of the initial concept map.
  • the nodes in the initial concept map are connected according to the entity relationships between each pair of entities recorded in the knowledge graph to obtain the initial concept map.
  • This initial concept map contains complete knowledge.
  • the following steps can be used to construct an initial concept map and calculate the correlation between the nodes in the initial concept map and the training text.
  • the topic nodes are used as the nodes in the topic correlation graph, and the corresponding topic nodes on the topic correlation graph are connected based on the entity relationships between candidate entities recorded in the knowledge graph to obtain the edges in the topic correlation graph. Specifically, for a pair of nodes in the topic correlation graph, if there are n edges between the candidate entities corresponding to the pair of nodes in the knowledge graph, then the weight of the edge between the pair of topic nodes on the topic correlation graph is n.
  • the initial importance of each topic node is set based on the probability of each topic node appearing in the facts recorded in the knowledge graph.
  • the eigenvector centrality of each topic node is calculated on the topic correlation graph to obtain the importance of each topic node.
  • the importance of each topic node is taken as the correlation between the topic node and the training text.
  • for each topic node, locate the k-hop neighbor entities of the topic node in the knowledge graph as neighbor nodes.
  • V represents the set of nodes on the information propagation graph.
  • E represents the set of edges on the information propagation graph.
  • the initial score of the strongly connected branch where the topic node is located is propagated to the downstream strongly connected branches along the topological sorting result, thereby updating the score of each strongly connected branch.
  • the updated score of the strongly connected branch where each neighbor node is located is used as the correlation between the neighbor node and the training text.
  • the topic nodes and neighbor nodes are used as the nodes in the initial concept map, and the corresponding nodes are connected according to the entity relationships recorded in the knowledge graph to obtain the initial concept map, as shown in Figure 7; a construction sketch follows.
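  • A minimal construction sketch, assuming a `networkx` knowledge graph and a precomputed phrase-to-candidate mapping (entity linking itself is omitted; all names are illustrative):

```python
# Sketch: topic nodes are all candidate entities of each target noun phrase;
# neighbor nodes are their k-hop neighbors; edges are inherited from the
# knowledge graph's entity relationships.
import networkx as nx

def build_initial_concept_map(kg, phrase_to_candidates, k=1):
    nodes = set()
    for candidates in phrase_to_candidates.values():
        for entity in candidates:                      # topic nodes
            reach = nx.single_source_shortest_path_length(kg, entity, cutoff=k)
            nodes |= set(reach)                        # entity + k-hop neighbors
    return kg.subgraph(nodes).copy()                   # keep recorded relationships

kg = nx.Graph()
kg.add_edges_from([
    ("anemia(disease)", "hemoglobin(biological substance)"),
    ("anemia(plant)", "plant(type)"),
    ("Vitamin B12(chemical)", "anemia(disease)"),
])
concept_map = build_initial_concept_map(
    kg, {"anemia": {"anemia(disease)", "anemia(plant)"}}, k=1
)
print(concept_map.nodes)
```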
  • the concept map is optimized.
  • the optimization method of the concept map during the iterative process is illustratively explained below through Example 1 and Example 2; Example 1 is described first.
  • the solution to this optimization problem is the optimized concept map.
  • in Example 1, the edge with the largest ratio between the benefit and the first cost in the current concept map is selected first, and the selection stops when the sum of the first costs of the selected edges reaches the threshold.
  • the solution of the optimization problem can be obtained with linear time complexity, and the solution is the optimized concept map.
  • the optimized concept map is the concept map in the next iteration.
  • the optimization problem to be solved in Example 2 is: select a subset of the set of connected subgraphs of the initial concept map so that, for each noun phrase Md or relational phrase Pd in the training text, there is at least one corresponding node or edge included in this subset, and the total second cost of the connected subgraphs in this subset is the smallest among all subset selection methods.
  • the solution to this optimization problem is the optimized concept map.
  • the second cost of the connected subgraph is the average of the second costs of all edges within the connected subgraph.
  • the initial state of the subset is the empty set; the connected subgraph with the smallest second cost in the current concept map is added to the subset first, until, for every noun phrase Md or relational phrase Pd in the training text, there is at least one corresponding node or edge included in the subset.
  • the solution can likewise be obtained with linear time complexity; the result is the optimized concept map.
  • the optimized concept map is the concept map in the next iteration.
  • Figure 11 shows a schematic flowchart of a text processing method 900 provided by an embodiment of the present application.
  • the method can be executed by a device or device capable of text processing.
  • the device can be a cloud service device or a terminal device, for example, a computer, a server or another device with sufficient computing power to perform the text processing method; it may also be a system composed of a cloud service device and a terminal device.
  • the method 900 may be executed by the execution device 210 in FIG. 2 or the execution device 110 in FIG. 1 or a local device.
  • the method 900 may be specifically executed by the execution device 210 as shown in FIG. 2 , and the text to be processed in the method 900 may be input data provided by the client device 240 as shown in FIG. 2 .
  • the model used in the text processing method 900 in Figure 11 can be constructed by the method in Figure 5 or Figure 10 described above. Relevant descriptions may refer to the aforementioned method 500 or method 800. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing method 900 below.
  • the method 900 includes steps 910 to 960, which are described below.
  • the target RGAT is obtained by inputting the initial concept map of the training text into the RGAT for training.
  • the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text, and the weights of the edges in the second concept map; i is a positive integer.
  • the first concept map is a subgraph of the initial concept map
  • the second concept map is a subgraph of the initial concept map.
  • the nodes in the initial concept map include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text, and the edges between the nodes in the initial concept map are used to represent the entity relationships between the nodes in the initial concept map.
  • correspondingly, the topic nodes in the concept map of the text to be processed may include candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed, and the edges between the nodes in the concept map of the text to be processed are used to represent the entity relationships between the nodes in that concept map.
  • in the solution of the embodiment of the present application, the concept map is optimized according to the correlation between the entity nodes and the text and the weights of the edges. Pruning the nodes with smaller correlation to the training text and/or the edges with smaller weights helps reduce the model's attention to knowledge that is less relevant to the text and increase its focus on knowledge that is more relevant to the text, thus improving the expressive ability of RGAT and helping improve the accuracy of downstream tasks.
  • the initial concept map also includes neighbor nodes, and the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the training text.
  • correspondingly, the neighbor nodes in the concept map of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
  • the topic nodes in the initial concept graph include all candidate entities in the knowledge graph corresponding to the target noun phrase in the training text.
  • the topic nodes in the concept map of the text to be processed may include all candidate entities in the knowledge graph that correspond to the target noun phrases in the text to be processed.
  • that the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map may include: selecting the edges in the second concept map as the edges of the first concept map in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than the threshold.
  • the benefit of an edge in the second concept map is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge in the second concept map is negatively correlated with the correlation between the two nodes connected by the edge and the training text.
  • alternatively, that the first concept map in the i+1 iteration is determined based on the correlation between the nodes in the second concept map in the i-th iteration and the training text and the weights of the edges in the second concept map may include: selecting the connected subgraphs in the second concept map as the connected subgraphs of the first concept map in ascending order of their second cost, until the selected connected subgraphs include at least one candidate entity corresponding to each target noun phrase.
  • the correlation between a topic node and the training text is determined based on the eigenvector centrality of the topic node on the topic correlation graph; the nodes in the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined based on the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by the edge.
  • the correlation between a neighbor node and the training text is determined based on the score of the strongly connected branch in which the neighbor node is located on the information propagation graph; the nodes in the information propagation graph include the nodes in the initial concept map, and when a first node in the initial concept map is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node between the second node and the first node in the information propagation graph.
  • the score of a strongly connected branch on the information propagation graph is obtained by propagating the initial score of the strongly connected branch in which the topic node is located to the downstream strongly connected branches according to topological sorting; the initial score of the strongly connected branch in which the topic node is located is determined based on the maximum importance of the nodes in that strongly connected branch.
  • the method 900 further includes: outputting a knowledge path in the concept map of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
  • a knowledge path refers to the path between two nodes in the concept map.
  • the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, ..., r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triplet, r_q represents the entity relationship between the two nodes, and so on.
  • q is a positive integer.
  • the knowledge path can improve the interpretability of the model, provide users with a basis for judging processing results, and help improve users' trust.
  • the concept map in the solution of the embodiment of the present application has comprehensive and accurate knowledge, which is conducive to ensuring the integrity and accuracy of the knowledge path.
  • the weight of the knowledge path is determined based on the attention weight of the triplet in the knowledge path.
  • the attention weight of a triplet is used to indicate the importance of the triplet in the reasoning process of RGAT.
  • the attention weight of the knowledge path is used to determine the importance of the knowledge path.
  • the weight of the knowledge path may be the average of the attention weights of the triples in the knowledge path.
  • in the attention weight formula for the triplet (j, r, i) (the formula itself is not reproduced here), one symbol represents the set of all triples in the knowledge graph and another represents the activation function; R' represents a set of relations, and l is an integer greater than or equal to 0.
  • the activation function can be a binary step function, a linear activation function, a Sigmoid function, a rectified linear unit (ReLU) or a leaky ReLU (LeakyReLU).
  • for example, LeakyReLU can be used as the activation function.
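  • An illustrative sketch only; the patent's exact formula is not reproduced. A common relational-attention pattern scores each triplet with a learned layer plus LeakyReLU and normalizes with softmax over the triplets pointing at the same node; the knowledge path weight is then the average of its triplets' attention weights:

```python
# Sketch: attention weights over triplets (j, r, i) and a knowledge path
# weight computed as the mean of its triplets' weights. Dimensions and the
# scoring layer are assumptions, not the patent's formula.
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 16
score_layer = nn.Linear(3 * d, 1)         # learned scoring over [h_j, h_r, h_i]

def triple_score(h_j, h_r, h_i):
    return F.leaky_relu(score_layer(torch.cat([h_j, h_r, h_i])))

# three triplets that all point at the same node i:
triples = [tuple(torch.randn(d) for _ in range(3)) for _ in range(3)]
scores = torch.cat([triple_score(*t) for t in triples])
alphas = torch.softmax(scores, dim=0)     # attention weight per triplet

path_weight = alphas[:2].mean()           # e.g. weight of a 2-hop path
print(alphas, path_weight)
```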
  • Figure 12 shows a schematic flow chart of a text classification method according to an embodiment of the present application.
  • the method shown in Figure 12 can be regarded as a specific implementation of the method shown in Figure 11. For simplicity of description, part of the description is appropriately omitted when describing the method 1000.
  • Method 1000 includes steps 1010 to 1040.
  • step 1010 may be performed by the knowledge extraction module 410 in FIG. 4 .
  • the knowledge graph is a knowledge graph of a business domain related to text data.
  • method 1000 will be described below by taking the medical field as the business field as an example, which does not limit the solutions of the embodiments of the present application.
  • step 1010 may include: identifying the knowledge triplet Td in the text d to be processed based on the knowledge graph to obtain the noun phrases Md and the relational phrases Pd in the knowledge triplet, that is, the target noun phrases and the target relation phrases in the text to be processed.
  • step 1020 may be performed by text encoding module 420 in FIG. 4 .
  • the text-level encoding of the text to be processed may include the text-level encoding of each noun phrase Md in the text d to be processed, the text-level encoding of each relational phrase Pd, and the text-level encoding of the sequence of the text to be processed.
  • the text-level encoding of the text to be processed can participate in the process of the target RGAT generating the knowledge-level encoding of the text to be processed.
  • the target RGAT used in method 1000 can be trained by the method 800 shown in Figure 10; for the specific training method, please refer to the description of method 800, which will not be repeated here.
  • step 1030 may be performed by the knowledge encoding module 430 in FIG. 4 .
  • all candidate entities corresponding to each noun phrase Md in the text to be processed are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, as the nodes of the concept map of the text to be processed.
  • the nodes in the concept map of the text to be processed are connected according to the entity relationship between each pair of entities recorded in the knowledge map to obtain the concept map of the text to be processed.
  • the concept map of the text to be processed contains complete knowledge.
  • the specific construction method of the concept map of the text to be processed can refer to the construction method of the initial concept map of the training text described above; the training text in the relevant description only needs to be replaced with the text to be processed, and it will not be described again here.
  • step 1040 may be performed by the task processing module 440 in FIG. 4 .
  • Step 1040 may include vector splicing text-level coding and knowledge-level coding to obtain text fusion coding. Input the text fusion code into the classifier to obtain the classification result of the text to be processed.
  • Figure 13 shows a schematic diagram of a text classification result according to an embodiment of the present application.
  • the main text of the text is "...diabetes has become an epidemic, and the number of patients with type 2 diabetes is increasing at an alarming rate. We know that controlling diet and Western lifestyle can lead to type 2 diabetes and cardiovascular disease." Through the solutions of the embodiments of this application, it is determined that the text contains false information.
  • the primary basis for judgment is the knowledge path obtained from the concept map ('diet', 'reducesRiskFor', 'atherosclerosis', 'causes', 'cardiovascular diseases'), namely (controlling diet, reduces risk (-), arteriosclerosis, causes (+), cardiovascular disease), with a weight of 0.99998.
  • the secondary judgment basis is the knowledge path ('diet', 'alleviates', 'diabetes') obtained from the concept map, that is (controlling diet, relieving (-), diabetes), with a weight of 0.57651.
  • These two knowledge paths are contrary to the text semantics of "controlling diet can lead to type 2 diabetes and cardiovascular disease", and the text is judged to contain wrong information.
  • the solution of the embodiment of the present application can improve the accuracy of the classification task and at the same time generate a weighted knowledge path as an interpretable classification basis.
  • Table 1 shows the comparative results of performance indicators for text classification on the two data sets of diabetes and cancer, obtained with the scheme of the embodiment of the present application and the existing knowledge-guided graph attention network for detecting healthcare misinformation (DETERRENT). Table 1 shows four performance indicators, namely accuracy, precision, recall and F1 score.
  • the optimization scheme of the concept map used in the training process of the RGAT model used in the scheme of the embodiment of the present application in Table 1 is the scheme in Example 1.
  • the solution of the embodiment of the present application improves the above-mentioned indicators by 1-5 percentage points compared with the existing solution.
  • the solutions of the embodiments of the present application can effectively improve the accuracy of classification results.
  • Table 2 shows the comparison results of the performance indicators of text classification on the two data sets of diabetes and cancer through the scheme of the embodiment of the present application and the existing DETERRENT.
  • the optimization scheme of the concept map used in the training process of the RGAT model used in the scheme of the embodiment of the present application in Table 2 is the scheme in Example 2.
  • the solution of the embodiment of the present application improves the above-mentioned indicators by 1-58 percentage points compared with the existing solution.
  • the solutions of the embodiments of the present application can effectively improve the accuracy of classification results.
  • FIG. 14 is a schematic block diagram of a training device according to an embodiment of the present application.
  • the training device 3000 shown in FIG. 14 includes an acquisition unit 3010 and a processing unit 3020.
  • the training device can be used to perform the method 500 or the method 800 in the embodiment of the present application.
  • the acquisition unit 3010 can perform the above steps 510 and 520.
  • the processing unit 3020 may perform the above steps 530 to 540. It should be noted that the obtaining unit used to perform step 510 and the obtaining unit used to perform step 520 may be the same or different.
  • FIG. 15 is a schematic block diagram of a text processing device according to an embodiment of the present application.
  • the device 4000 shown in FIG. 15 includes an acquisition unit 4010 and a processing unit 4020.
  • the device 4000 may be used to perform the method 900 in the embodiment of the present application.
  • the acquisition unit 4010 can perform the above steps 910 and 920.
  • the processing unit 4020 may perform the above steps 930 to 960. It should be noted that the obtaining unit used to perform step 910 and the obtaining unit used to perform step 920 may be the same or different.
  • training device 3000 and device 4000 are embodied in the form of functional units.
  • the "unit" here can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of both that implements the above functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor or a group processor) and memory, merged logic circuitry, and/or other suitable components to support the described functionality.
  • the units of each example described in the embodiments of the present application can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • FIG 16 is a schematic diagram of the hardware structure of a training device provided by an embodiment of the present application.
  • the training device 5000 shown in Figure 16 includes a memory 5001, a processor 5002, a communication interface 5003 and a bus 5004.
  • the memory 5001, the processor 5002, and the communication interface 5003 implement communication connections between each other through the bus 5004.
  • the memory 5001 may be a read only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM).
  • the memory 5001 can store programs. When the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is used to execute various steps of the training method according to the embodiment of the present application. For example, the processor 5002 may execute the method 500 or the method 800 above.
  • the processor 5002 may be a general central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute relevant programs to implement the training method of the method embodiments of the present application.
  • the processor 5002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the training method of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 5002.
  • the above-mentioned processor 5002 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the device shown in Figure 14, or executes the method 500 or method 800 of the method embodiments of the present application.
  • the communication interface 5003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 5000 and other devices or communication networks. For example, the training text and the knowledge graph can be obtained through the communication interface 5003.
  • Bus 5004 may include a path that carries information between various components of device 5000 (eg, memory 5001, processor 5002, communication interface 5003).
  • FIG. 17 is a schematic diagram of the hardware structure of a text processing device provided by an embodiment of the present application.
  • the device 6000 shown in Figure 17 includes a memory 6001, a processor 6002, a communication interface 6003 and a bus 6004.
  • the memory 6001, the processor 6002, and the communication interface 6003 implement communication connections between each other through the bus 6004.
  • Memory 6001 may be ROM, static storage device, dynamic storage device or RAM.
  • the memory 6001 can store programs. When the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 is used to execute various steps of the text processing method according to the embodiment of the present application. For example, the processor 6002 can execute the method 900 above.
  • the processor 6002 can use a general-purpose CPU, microprocessor, ASIC, GPU or one or more integrated circuits to execute relevant programs to implement the text processing method of the method embodiment of the present application.
  • the processor 6002 may also be an integrated circuit chip with signal processing capabilities. During the implementation process, each step of the text processing method of the present application can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 6002.
  • the above-mentioned processor 6002 can also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 6001.
  • the processor 6002 reads the information in the memory 6001, and combines its hardware to complete the functions required to be performed by the units included in the device shown in Figure 15, or to execute the method 900 of the method embodiment of the present application.
The communication interface 6003 uses a transceiver device, such as but not limited to a transceiver, to implement communication between the device 6000 and other devices or communication networks. For example, the text to be processed and the knowledge graph can be obtained through the communication interface 6003.
Bus 6004 may include a path that carries information between the various components of device 6000 (e.g., the memory 6001, the processor 6002, and the communication interface 6003).
Embodiments of the present application also provide a computer-readable medium that stores program code for execution by a device. The program code is used to execute any one of the text processing model training methods or text processing methods in the embodiments of the present application.
Embodiments of the present application also provide a computer program product containing instructions. When the computer program product is run on a computer, it causes the computer to execute any one of the text processing model training methods or text processing methods in the embodiments of the present application.
An embodiment of the present application also provides a chip. The chip includes a processor and a data interface. The processor reads instructions stored in a memory through the data interface and executes any one of the text processing model training methods or text processing methods in the embodiments of the present application.
Optionally, the chip may also include a memory in which instructions are stored. The processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor executes any one of the text processing model training methods or text processing methods in the embodiments of the present application.
The memory may include a read-only memory and a random access memory, and provides instructions and data to the processor. A part of the processor may also include a non-volatile random access memory. For example, the processor may also store information about the device type.
It should be understood that the size of the sequence numbers of the above processes does not imply their order of execution. The execution order of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should also be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative.
The division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. The coupling, direct coupling, or communication connection between components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit. If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
The aforementioned storage medium includes various media that can store program code, such as a Universal Serial Bus flash disk (UFD), a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

The present application provides a training method for a text processing model, and a text processing method and device. The training method for a text processing model comprises: determining an initial conceptual graph of training text on the basis of a knowledge graph; and inputting the initial conceptual graph into a relation-aware graph attention network (RGAT) model for training to obtain a target RGAT model, wherein during training, a first conceptual graph in an (i+1)-th iteration is determined according to the correlation between nodes in a second conceptual graph in an i-th iteration and the training text, as well as the weights of edges in the second conceptual graph, the first conceptual graph is a sub-graph of the initial conceptual graph, and the second conceptual graph is a sub-graph of the initial conceptual graph. According to the solution of the present application, the training effect of the RGAT model can be improved, and more accurate coding at a knowledge level can be learned, so that the accuracy of a downstream text processing task is improved.

Description

Text processing model training method, text processing method and device

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a training method for a text processing model, a text processing method, and a device.

Background

Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.

Natural language processing (NLP) is an important research direction in the field of artificial intelligence. Natural language processing tasks are usually performed based on the natural language text itself; the text itself contains relatively limited features, and the text processing effect may not meet expectations. Some solutions use knowledge graphs as auxiliary information for text processing, but these solutions may introduce knowledge graph information that is completely irrelevant to the text content during text processing, thus affecting the effect of text processing.

Summary of the Invention

This application provides a training method for a text processing model, a text processing method and a device, which can improve the effect of text processing.

In a first aspect, a training method for a text processing model is provided, including: obtaining training text; obtaining a knowledge graph; determining an initial concept graph of the training text based on the knowledge graph, where the nodes of the initial concept graph include topic nodes, the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes of the initial concept graph are used to represent entity relationships between those nodes; and inputting the initial concept graph into a relation-aware graph attention network (RGAT) model for training to obtain a target RGAT model, where during training the first concept graph in the (i+1)-th iteration is determined according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph, i is a positive integer, the first concept graph is a subgraph of the initial concept graph, and the second concept graph is a subgraph of the initial concept graph.

According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized according to the relevance of its nodes to the training text and the importance of its edges, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge of low relevance to the text and strengthen its attention to knowledge of high relevance, thereby improving the training effect of the RGAT model and learning a more accurate knowledge-level encoding, which in turn improves the accuracy of downstream text processing tasks.
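Purely as an illustration of this "train, then shrink the graph" loop, the sketch below replaces the RGAT training step with a stub that perturbs edge weights and keeps the edges whose weight, scaled by the relevance of their endpoints, is largest. All names, the selection rule, and the toy data are assumptions for exposition, not the implementation of this application.

```python
import random

def rgat_step(weights):
    # Stand-in for one RGAT training step: returns refreshed edge weights.
    return {e: max(0.0, w + random.uniform(-0.1, 0.1)) for e, w in weights.items()}

def select_subgraph(edges, weights, relevance, keep_ratio=0.8):
    # Keep the edges whose weight, scaled by the relevance of their two
    # endpoints, is largest: high weight and high relevance are retained.
    scored = sorted(edges,
                    key=lambda e: weights[e] * relevance[e[0]] * relevance[e[1]],
                    reverse=True)
    return scored[: max(1, int(len(scored) * keep_ratio))]

edges = [("anemia", "vitamin B12"), ("vitamin B12", "hemoglobin"), ("apple", "fruit")]
relevance = {"anemia": 0.9, "vitamin B12": 0.8, "hemoglobin": 0.7,
             "apple": 0.1, "fruit": 0.1}

graph = edges
weights = {e: 1.0 for e in edges}
for i in range(3):  # three iterations of "train, then shrink the graph"
    weights = rgat_step({e: weights[e] for e in graph})
    graph = select_subgraph(graph, weights, relevance)
print(graph)
```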
For example, the knowledge graph may be a knowledge graph of the professional field to which the training text belongs.

A target noun phrase refers to a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph. In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, that noun phrase can serve as a target noun phrase of the training text.

The concept graphs used during iteration are all subgraphs of the initial concept graph. The relevance of a node in a concept graph used during iteration to the training text is the relevance of the same node in the initial concept graph to the training text.

For example, the relevance of a node in the initial concept graph to the training text may be determined according to the importance of the node, and the importance of a node may be determined by its eigenvector centrality.

With reference to the first aspect, in some implementations of the first aspect, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.

According to the solution of the embodiments of this application, the topic nodes may include all candidate entities in the knowledge graph corresponding to each target noun phrase, so that the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing; learning the knowledge-level representation of the text with the solution of the embodiments of this application further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths caused by omitting part of the knowledge.

With reference to the first aspect, in some implementations of the first aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of that edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
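A minimal sketch of this greedy selection follows, assuming that the benefit is the edge weight itself and that the first cost is the inverse of the endpoints' summed relevance; the text only fixes the directions of the correlations, so these concrete choices are assumptions.

```python
def select_edges(edges, weight, relevance, threshold):
    def cost(e):
        # First cost of an edge: decreases as the endpoints' relevance grows.
        u, v = e
        return 1.0 / (relevance[u] + relevance[v] + 1e-9)

    # Rank edges by benefit/cost ratio, in descending order.
    ranked = sorted(edges, key=lambda e: weight[e] / cost(e), reverse=True)
    chosen, spent = [], 0.0
    for e in ranked:
        chosen.append(e)
        spent += cost(e)
        if spent > threshold:  # stop once the summed first cost exceeds the threshold
            break
    return chosen

edges = [("a", "b"), ("b", "c"), ("c", "d")]
weight = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "d"): 0.2}
relevance = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
print(select_edges(edges, weight, relevance, threshold=1.0))  # [('a', 'b'), ('b', 'c')]
```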
With reference to the first aspect, in some implementations of the first aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the connected subgraphs' second cost, until the selected connected subgraphs together include at least one candidate entity corresponding to the target noun phrases.

With reference to the first aspect, in some implementations of the first aspect, the relevance of a topic node to the training text is determined according to the eigenvector centrality of the topic node on a topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by that edge.

The eigenvector centrality of a topic node is determined according to the initial importance of the topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph.
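As a generic illustration of this computation, the sketch below runs power iteration on a small weighted topic correlation graph; the edge weights stand for the number of entity relationships between the endpoints' entities, and the initial scores stand for the nodes' prior importance. The data and the convergence settings are assumptions, not this application's exact procedure.

```python
def eigenvector_centrality(nodes, weighted_edges, init, iters=100, tol=1e-8):
    score = dict(init)
    for _ in range(iters):
        # Identity shift (A + I) keeps the same eigenvectors but stabilises
        # power iteration, e.g. on bipartite graphs.
        new = {n: score[n] for n in nodes}
        for (u, v), w in weighted_edges.items():  # undirected weighted edges
            new[u] += w * score[v]
            new[v] += w * score[u]
        norm = sum(s * s for s in new.values()) ** 0.5 or 1.0
        new = {n: s / norm for n, s in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

nodes = ["anemia (disease)", "anemia (symptom)", "vitamin B12"]
edges = {("anemia (disease)", "vitamin B12"): 3.0,
         ("anemia (symptom)", "vitamin B12"): 1.0}
init = {n: 1.0 for n in nodes}
print(eigenvector_centrality(nodes, edges, init))
```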
With reference to the first aspect, in some implementations of the first aspect, the initial concept graph further includes neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.

The initial concept graph in the embodiments of this application also includes the neighbor entities connected to the candidate entities (i.e., the topic nodes) in the knowledge graph, namely the neighbor nodes, which can further provide more comprehensive and complete knowledge and help improve the accuracy of the knowledge-level encoding.

With reference to the first aspect, in some implementations of the first aspect, the relevance of a neighbor node to the training text is determined according to the score, on an information propagation graph, of the strongly connected component in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.

With reference to the first aspect, in some implementations of the first aspect, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of a strongly connected component containing topic nodes is determined according to the maximum importance of the nodes in that component.
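The following is a hedged sketch of this propagation, assuming the strongly connected components and the condensation DAG have already been computed (for example with Tarjan's algorithm, see term (7) in the terminology section below), and assuming a downstream component inherits the maximum score among its upstream components; the inheritance rule and the toy data are assumptions.

```python
from collections import deque

def propagate_scores(components, dag_edges, node_importance, topic_components):
    # components: {comp_id: [member nodes]}; dag_edges: {comp_id: [successors]}
    score = {c: 0.0 for c in components}
    for c in topic_components:  # components that contain topic nodes
        score[c] = max(node_importance[n] for n in components[c])
    indeg = {c: 0 for c in components}
    for c, succs in dag_edges.items():
        for s in succs:
            indeg[s] += 1
    queue = deque(c for c in components if indeg[c] == 0)
    while queue:  # Kahn's algorithm yields a topological order
        c = queue.popleft()
        for s in dag_edges.get(c, []):
            score[s] = max(score[s], score[c])  # push the score downstream
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return score

components = {0: ["t1", "t2"], 1: ["n1"], 2: ["n2"]}
dag_edges = {0: [1], 1: [2]}
importance = {"t1": 0.6, "t2": 0.9, "n1": 0.0, "n2": 0.0}
print(propagate_scores(components, dag_edges, importance, topic_components=[0]))
# {0: 0.9, 1: 0.9, 2: 0.9}
```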
In a second aspect, a text processing method is provided, including: obtaining text to be processed; obtaining a knowledge graph; determining a text encoding of the text to be processed; determining a concept graph of the text to be processed based on the knowledge graph; processing the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, where the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, during training the first concept graph in the (i+1)-th iteration is determined according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph, i is a positive integer, the first concept graph is a subgraph of the initial concept graph, the second concept graph is a subgraph of the initial concept graph, the nodes of the initial concept graph include topic nodes, the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes of the initial concept graph are used to represent entity relationships between those nodes; and determining a processing result of the text to be processed based on the text encoding and the knowledge encoding of the text to be processed.

According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized according to the relevance of its nodes to the training text and the importance of its edges, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge of low relevance to the text and strengthen its attention to knowledge of high relevance, thereby improving the training effect of the RGAT model and learning a more accurate knowledge-level encoding, which in turn improves the accuracy of the processing results.

With reference to the second aspect, in some implementations of the second aspect, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.

The topic nodes in the concept graph of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.

According to the solution of the embodiments of this application, the topic nodes may include all candidate entities in the knowledge graph corresponding to each target noun phrase, so that the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing and avoids incorrect reasoning paths caused by omitting part of the knowledge; moreover, the target RGAT focuses during training on knowledge of high relevance to the text, which further ensures the accuracy of downstream tasks.

With reference to the second aspect, in some implementations of the second aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of that edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.

With reference to the second aspect, in some implementations of the second aspect, determining the first concept graph in the (i+1)-th iteration according to the relevance of the nodes in the second concept graph in the i-th iteration to the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the connected subgraphs' second cost, until the selected connected subgraphs together include at least one candidate entity corresponding to the target noun phrases.

With reference to the second aspect, in some implementations of the second aspect, the relevance of a topic node to the training text is determined according to the eigenvector centrality of the topic node on a topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships, in the knowledge graph, between the entities corresponding to the two nodes connected by that edge.

The eigenvector centrality of a topic node is determined according to the initial importance of the topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph.

With reference to the second aspect, in some implementations of the second aspect, the nodes of the initial concept graph further include neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.

The neighbor nodes in the concept graph of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.

With reference to the second aspect, in some implementations of the second aspect, the relevance of a neighbor node to the training text is determined according to the score, on an information propagation graph, of the strongly connected component in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.

With reference to the second aspect, in some implementations of the second aspect, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of a strongly connected component containing topic nodes is determined according to the maximum importance of the nodes in that component.

With reference to the second aspect, in some implementations of the second aspect, the method further includes: outputting a knowledge path in the concept graph of the text to be processed based on the text encoding and the knowledge encoding of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.

A knowledge path refers to a path between two nodes in the concept graph. For example, a k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, …, r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes, and so on; q is a positive integer.
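For instance, a 2-hop knowledge path in the (e_q, r_q, e_{q+1}, …) notation above can be decomposed into its triples as follows; the medical facts are illustrative only.

```python
# A 2-hop knowledge path as an alternating entity/relation tuple.
path = ("anemia", "treated_by", "vitamin B12", "increases", "hemoglobin")
# Each consecutive (entity, relation, entity) window is one triple.
triples = [(path[i], path[i + 1], path[i + 2]) for i in range(0, len(path) - 2, 2)]
print(triples)
# [('anemia', 'treated_by', 'vitamin B12'), ('vitamin B12', 'increases', 'hemoglobin')]
```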
The knowledge path can improve the interpretability of the model and provide the user with the basis for judging the processing result, that is, the complete logic by which the processing result was obtained, which helps improve the user's trust.

In a third aspect, a training device for a text processing model is provided. The device includes units for executing the method of any implementation of the first aspect.

In a fourth aspect, a text processing device is provided. The device includes units for executing the method of any implementation of the second aspect.

In a fifth aspect, a training device for a text processing model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method in any implementation of the first aspect.

The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and so on. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.

In a sixth aspect, a text processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method of any implementation of the second aspect.

The processor in the sixth aspect may be a CPU, or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a GPU, an NPU, a TPU, and so on.

In a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of any one of the first to second aspects.

In an eighth aspect, a computer program product containing instructions is provided. When the computer program product is run on a computer, it causes the computer to execute the method in any implementation of any one of the first to second aspects.

In a ninth aspect, a chip is provided. The chip includes a processor and a data interface. The processor reads, through the data interface, instructions stored in a memory and executes the method in any implementation of any one of the first to second aspects.

Optionally, as an implementation, the chip may further include a memory in which instructions are stored. The processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is used to execute the method in any implementation of any one of the first to second aspects.
Brief Description of the Drawings
Figure 1 is a schematic block diagram of a natural language processing system provided by an embodiment of the present application;

Figure 2 is a schematic block diagram of a system architecture provided by an embodiment of the present application;

Figure 3 is a schematic block diagram of a text processing system provided by an embodiment of the present application;

Figure 4 is a schematic block diagram of a text processing model provided by an embodiment of the present application;

Figure 5 is a schematic flow chart of a text processing model training method provided by an embodiment of the present application;

Figure 6 is a schematic diagram of knowledge extraction provided by an embodiment of the present application;

Figure 7 is a schematic diagram of the construction process of a concept graph provided by an embodiment of the present application;

Figure 8 is a schematic diagram of a topic correlation graph provided by an embodiment of the present application;

Figure 9 is a schematic diagram of an information propagation graph provided by an embodiment of the present application;

Figure 10 is a schematic diagram of another text processing model training method provided by an embodiment of the present application;

Figure 11 is a schematic flow chart of a text processing method provided by an embodiment of the present application;

Figure 12 is a schematic flow chart of another text processing method provided by an embodiment of the present application;

Figure 13 is a schematic diagram of a text classification result provided by an embodiment of the present application;

Figure 14 is a schematic block diagram of a training device provided by an embodiment of the present application;

Figure 15 is a schematic block diagram of a text processing device provided by an embodiment of the present application;

Figure 16 is a schematic block diagram of another training device provided by an embodiment of the present application;

Figure 17 is a schematic block diagram of another text processing device provided by an embodiment of the present application.
Detailed Description of Embodiments

The technical solutions of this application will be described below with reference to the accompanying drawings.

Natural language processing is an important research direction in the field of artificial intelligence, enabling humans and machines to interact through natural language. Natural language processing tasks are usually performed based on the natural language text itself; the text itself contains relatively limited features, and the text processing effect may not meet expectations. To improve the effect of text processing, some solutions introduce knowledge graphs as auxiliary information. However, introducing a knowledge graph may bring other problems: for example, the ambiguity of entities in the knowledge graph and the noise of the knowledge graph may lead to the introduction of knowledge graph information completely irrelevant to the text content during text processing, so that the accuracy of the text processing results cannot be guaranteed.

The embodiments of this application provide a text processing method that helps improve the effect of text processing.

To facilitate understanding of the embodiments of this application, the concepts of the relevant terms involved are first introduced below.

(1) Natural language processing

Natural language is human language, and natural language processing is the processing of human language. Natural language processing is the process of systematically analyzing, understanding and extracting information from text data in an intelligent and efficient way. NLP and its components can manage very large blocks of text data or perform a large number of automated tasks, and solve a variety of problems, such as automatic summarization, machine translation (MT), named entity recognition (NER), relation extraction (RE), information extraction (IE), sentiment analysis, speech recognition, question answering and topic segmentation.

(2) Knowledge graph (KG)

A knowledge graph is a knowledge base that integrates real-world facts through a graph-structured data model. Knowledge graphs are often used to store entities that are related to each other. For example, a fact representing the existence of some entity relationship between two entities can be expressed as a triple data structure in the form (entity, entity relationship, entity).

Entities, represented as nodes in the knowledge graph, denote conceptual entities in the real world, for example, "Peking University (organization)", "vitamin B12 (medical element)" and "hemoglobin (medical element)". Entity relationships, represented as edges between the nodes corresponding to two entities, denote relationships between two real-world entities. For example, the entity relationship between "vitamin B12" and "hemoglobin" is "increases"; the fact indicated by (vitamin B12, increases, hemoglobin) is that vitamin B12 increases hemoglobin.
A knowledge graph of a professional field refers to a knowledge graph containing the entities, relationships and facts of that field. For example, a knowledge graph of the financial field is used to indicate entities, relationships and facts in the financial field, and a knowledge graph of the medical field is used to indicate entities, relationships and facts in the medical field.

A triple data structure extracted from natural language text that can express a fact may be called a knowledge triple of the text, in the form (noun phrase, relation phrase, noun phrase), where a noun phrase may include one or more words and a relation phrase may include one or more words.

A noun phrase may correspond to one or more candidate entities in the knowledge graph. For example, the Chinese noun phrase "苹果" (apple) may correspond to candidate entities such as "apple (fruit)" or "apple (company)". As another example, the English noun phrase "anemia" may correspond to candidate entities such as "anemia (disease)", "anemia (symptom)" or "anemia (plant)".
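A minimal sketch of facts stored as triples, together with a prefix-based lookup of a noun phrase's candidate entities, is shown below; the facts and the matching rule are illustrative assumptions, not how a knowledge graph is actually indexed.

```python
facts = [
    ("vitamin B12", "increases", "hemoglobin"),
    ("apple (fruit)", "is_a", "fruit"),
    ("apple (company)", "is_a", "organization"),
]

def candidates(noun_phrase, facts):
    # Every entity whose surface form starts with the noun phrase.
    entities = {e for head, _, tail in facts for e in (head, tail)}
    return sorted(e for e in entities if e.startswith(noun_phrase))

print(candidates("apple", facts))  # ['apple (company)', 'apple (fruit)']
```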
(3) Knowledge graph embedding

Knowledge graph embedding refers to mapping the entities and entity relationships of a knowledge graph into a low-dimensional vector space to obtain an embedded representation of the knowledge graph, realizing a semantic representation of the entities and entity relationships. The embedded representation of a knowledge graph can be used for various tasks related to knowledge graphs. For example, the embedded representation of a knowledge graph may include at least one of the following: embedded representations of entities, embedded representations of relationships, and so on.

The embedded representation of a knowledge graph can be obtained through a knowledge graph embedding model, which can be implemented based on a graph neural network (GNN).

(4) k-hop neighbours of a node

In graph theory, the k-hop neighbours of a node in a graph are the set of all nodes whose shortest path from that node is k hops, where k is a positive integer.
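The exact k-hop neighbours can be computed with a breadth-first search that records shortest-path distances; this is generic graph code, not specific to this application.

```python
from collections import deque

def k_hop_neighbours(adj, start, k):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # no need to expand beyond distance k
        for v in adj.get(u, []):
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return {n for n, d in dist.items() if d == k}

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(k_hop_neighbours(adj, "a", 2))  # {'d'}
```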
(5) Directed acyclic graph (DAG)

In graph theory, if a directed graph cannot return to any node by starting from that node and traversing several edges, the graph is a directed acyclic graph.

(6) Topological sorting

Topological sorting of a directed acyclic graph G arranges all the nodes of G into a linear sequence such that, for any pair of nodes u and v in the graph with an edge <u, v> (representing a path from node u to node v) in the edge set E(G) (the set of edges between the nodes of G), u appears before v in the linear sequence. Usually, such a linear sequence is called a sequence satisfying the topological order, or a topological sequence for short.

In other words, a topological sequence must satisfy two conditions:

1) Each node of the directed acyclic graph G appears in the topological sequence exactly once.

2) If there is a path from node u to node v in the directed acyclic graph G, then node u appears before node v in the topological sequence.

A directed acyclic graph may have one or more topological orderings.
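One standard way to obtain a topological sequence is Kahn's algorithm, sketched below for a small DAG; it also detects the case where no topological order exists because the graph contains a cycle.

```python
from collections import deque

def topological_sort(nodes, edges):
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle, no topological order exists")
    return order

print(topological_sort(["u", "v", "w"], [("u", "v"), ("v", "w"), ("u", "w")]))
# ['u', 'v', 'w']
```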
(7) Strongly connected component

If, for any two nodes v and u of a directed graph G, there exist directed paths from v to u and from u to v, then G is called a strongly connected graph. In a directed graph G, if directed paths exist in both directions between two nodes u and v, then u and v are said to be strongly connected.

In other words, if two nodes v and u of the directed graph G can reach each other, the two nodes are strongly connected. If every two nodes of the directed graph G are strongly connected, G is a strongly connected graph.

For a strongly connected subgraph S of a directed graph G that is not itself strongly connected, if adding any node to S would cause S to lose the property of strong connectivity, then S is a maximal strongly connected subgraph of G, also called a strongly connected component (or strongly connected branch) of G.

For example, Tarjan's algorithm can be used to find the strongly connected components of a directed graph. Specifically, the algorithm can be used to compute the size of each strongly connected component, the nodes of each strongly connected component, the total number of strongly connected components, and so on.
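Where the text names Tarjan's algorithm, the sketch below uses Kosaraju's two-pass algorithm instead, which is easier to follow and yields the same strongly connected components; this is generic graph code.

```python
def strongly_connected_components(nodes, adj):
    order, seen = [], set()

    def dfs(u):  # first pass: record nodes by increasing finish time
        seen.add(u)
        for v in adj.get(u, []):
            if v not in seen:
                dfs(v)
        order.append(u)

    for n in nodes:
        if n not in seen:
            dfs(n)

    radj = {n: [] for n in nodes}  # reversed graph for the second pass
    for u in nodes:
        for v in adj.get(u, []):
            radj[v].append(u)

    comps, assigned = [], set()
    for u in reversed(order):  # decreasing finish time
        if u in assigned:
            continue
        stack, members = [u], []
        assigned.add(u)
        while stack:
            x = stack.pop()
            members.append(x)
            for y in radj[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(members)
    return comps

adj = {"u": ["v"], "v": ["u", "w"], "w": []}
print(strongly_connected_components(["u", "v", "w"], adj))  # [['u', 'v'], ['w']]
```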
(8) Connected subgraph

In graph theory, a connected subgraph of an undirected graph is a subgraph in which any two nodes are connected to each other by a path and which is connected to no additional nodes in the supergraph.

(9) Eigenvector centrality

In graph theory, eigenvector centrality is a way of measuring a node's influence on a network.

The importance of a node usually depends on the number of its neighbor nodes (i.e., the node's degree) and the importance of those neighbors: the more important the neighbors connected to it, the more important the node.

For example, the importance of a node can be expressed as a score: the higher the score, the more important the node. Among nodes with the same number of connections, a node whose neighbors have high scores will score higher than a node whose neighbors have low scores; following this principle, all nodes can be assigned corresponding scores. A high eigenvector score means that the node is connected to many nodes that themselves have high scores.

(10) Neural network

A neural network can be composed of neural units. A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit can be as follows:
h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s x_s + b )
where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.

f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal of the neural unit into an output signal. The output signal of the activation function can be used as the input of the next layer. For example, the activation function may be a ReLU, tanh or sigmoid function.
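As a concrete instance of the formula above, the sketch below evaluates one neural unit with a sigmoid activation on toy inputs.

```python
import math

def neuron(x, w, b):
    # h = f(sum_s W_s * x_s + b), with f chosen here as the sigmoid.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

print(neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.1))  # ≈ 0.525
```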
A neural network is a network formed by connecting many such single neural units together, that is, the output of one neural unit can be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field; a local receptive field can be an area composed of several neural units.

(11) Graph neural network

Graph neural network is a general term for algorithms that use neural networks to learn graph-structured data, extract and discover features and patterns in graph-structured data, and meet the needs of graph learning tasks such as clustering, classification, prediction, segmentation and generation.

For graph-structured data, since each node is closely related to its neighbor nodes, a graph neural network can aggregate the neighborhood information of each node to obtain an embedded representation of each node.
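A generic illustration of one round of such neighborhood aggregation follows, using a simple mean over a node's own and its neighbours' embeddings; a real GNN learns its update, so this is only a schematic.

```python
def aggregate(embeddings, adj):
    out = {}
    for n, vec in embeddings.items():
        # Average the node's own embedding with its neighbours' embeddings.
        neigh = [embeddings[m] for m in adj.get(n, [])] + [vec]
        out[n] = [sum(dim) / len(neigh) for dim in zip(*neigh)]
    return out

emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(aggregate(emb, adj))  # e.g. 'b' becomes [0.666..., 0.666...]
```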
The attention mechanism allows a neural network to focus only on the information required for task learning. Introducing the attention mechanism into a GNN yields a graph attention network (GAT); a GAT focuses on the nodes and edges that are more relevant to the task, which can improve the processing effect.

(12) Relation-aware graph attention network (RGAT)

An RGAT is a graph neural network that can model multiple kinds of relationships in a graph structure through the graph attention mechanism, so as to obtain vector representations, in a low-dimensional space, of the nodes and of the relationships between nodes. For example, an RGAT can be used to process an input knowledge graph to obtain a low-dimensional space vector for each entity and entity relationship in the knowledge graph.
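A deliberately simplified stand-in for relation-aware attention at a single node is sketched below: the attention score of each neighbour depends on both the neighbour's embedding and the relation's embedding. The scoring function and the data are assumptions; this schematic does not reproduce the published RGAT formulation.

```python
import math

def attend(h_node, neighbours):
    # neighbours: list of (relation_embedding, neighbour_embedding) pairs.
    # Score each pair by the dot product of the centre node's embedding with
    # the element-wise sum of the relation and neighbour embeddings.
    scores = [sum(h * (r + n) for h, r, n in zip(h_node, rel, nb))
              for rel, nb in neighbours]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # New embedding: attention-weighted sum of the neighbour embeddings.
    return [sum(w * nb[d] for w, (_, nb) in zip(weights, neighbours))
            for d in range(len(h_node))]

h = [0.2, 0.4]
neighbours = [([0.1, 0.0], [1.0, 0.0]),   # (relation, neighbour) pair 1
              ([0.0, 0.3], [0.0, 1.0])]   # (relation, neighbour) pair 2
print(attend(h, neighbours))
```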
As mentioned above, the solutions of the embodiments of this application can be applied to natural language processing tasks. Figure 1(a) shows an application scenario of a natural language processing system. In this scenario, the natural language processing system includes a user device and a data processing device. The user device includes the user and intelligent terminals such as a mobile phone, a personal computer or an information processing center. The user device is the initiator of natural language data processing; as the initiator of requests such as language question answering or queries, the user usually initiates the request through the user device.

The data processing device may be a cloud server, a network server, an application server, a management server or another device or server with a data processing function. The data processing device receives question sentences such as query statements/speech/text from the intelligent terminal through an interactive interface, and then performs language data processing by means of machine learning, deep learning, search, reasoning, decision-making and so on, using a memory that stores data and a processor that processes data. The memory can be a general term that includes local storage and a database storing historical data; the database can be located on the data processing device or on another network server.

Figure 1(b) shows another application scenario of the natural language processing system. In this scenario, the user device directly serves as the data processing device, directly receiving input from the user, which is processed directly by the hardware of the user device itself. The specific process is similar to that of Figure 1(a); reference can be made to the above description, which is not repeated here.

Figure 1(c) is a schematic diagram of the related devices of the natural language processing system provided by an embodiment of this application. The natural language processing system may include a local device 101, a local device 102, an execution device 110 and a data storage system 150, where the local device 101 and the local device 102 are connected to the execution device 110 through a communication network.

The execution device 110 is implemented by one or more servers and optionally cooperates with other computing devices, such as data storage devices, routers and load balancers. The execution device 110 can be arranged at one physical site or distributed across multiple physical sites. The execution device 110 can use the data in the data storage system 150, or call the program code in the data storage system 150, to implement the text processing model training method of the embodiments of this application.

It should be noted that the above execution device 110 can also be called a cloud device, in which case the execution device 110 can be deployed in the cloud. Alternatively, the execution device 110 can also be a terminal device, in which case the execution device 110 can be deployed on the user terminal side; the embodiments of this application do not limit this.

Users can operate their respective user devices (for example, the local device 101 and the local device 102) to interact with the execution device 110. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box or a game console.

Each user's local device can interact with the execution device 110 through a communication network of any communication mechanism/standard. The communication network can be a wide area network, a local area network, a point-to-point connection or any combination thereof.

The data storage system 150 can be integrated on the execution device 110, the local device 101 or the local device 102, or it can be set up on the cloud or another network server.

In one implementation, the local device 101 or the local device 102 can obtain the relevant parameters of the text processing model from the execution device 110, and the text processing model is used on the local device 101 or the local device 102 to obtain the execution result of the text processing task.

In another implementation, the text processing model can be deployed directly on the execution device 110; the execution device 110 obtains the text to be processed from the local device 101 and the local device 102, and obtains the execution result of the text processing task through the text processing model.

The user device in Figure 1(a) and Figure 1(b) can be the local device 101 or 102 in Figure 1(c), and the data processing device in Figure 1(a) and Figure 1(b) can be the execution device 110 in Figure 1(c).
Figure 2 shows a system architecture 200 provided by an embodiment of this application. The data collection device 260 is configured to collect training data and store it in the database 230. The training device 220 generates a target model/rule 201 based on the training data maintained in the database 230, for example, the text processing model in the embodiments of this application. The model in the embodiments of this application may be a neural network model, or may be another type of model. The training data may include training text and the target processing result of the training text, for example, a label of the training text.
It should be noted that, in practical applications, the training data maintained in the database 230 is not necessarily all collected by the data collection device 260; it may also be received from other devices. It should further be noted that the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The foregoing description shall not be construed as a limitation on the embodiments of this application.
Figure 2 is a functional module diagram of the data processing process. For example, the client device 240 in Figure 2 may be the user equipment in Figure 1. When the user equipment in Figure 1 has relatively strong data processing capabilities, the execution device 210 and the data storage system 250 in Figure 2 may be integrated into the user equipment in Figure 1. In some embodiments, the execution device 210 and the data storage system 250 in Figure 2 may alternatively be integrated on the data processing device in Figure 1. The database 230, the training device 220, and the data collection device 260 in Figure 2 may be correspondingly integrated on the data processing device in Figure 1, or may be deployed on the cloud or on other servers on the network.
For example, the data collection device 260 may be a terminal device, or may be an input/output interface of a server or of the cloud, that is, an interaction layer (interface) used to obtain user input and return processing results.
The target model/rule obtained by the training device 220 can be applied in different systems or devices, for example, to the execution device 210 shown in Figure 2. The execution device 210 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal, or may be a server, the cloud, or the like. In Figure 2, the execution device 210 is configured with an I/O interface 212 for data interaction with external devices, and a "user" can input data to the I/O interface 212 through the client device 240.
When the execution device 210 preprocesses the input data, or when the computing module 211 of the execution device 210 performs computation or other related processing, the execution device 210 can call the data, code, and the like in the data storage system 250, and can also store data, instructions, and the like in the data storage system 250.
Finally, the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
It is worth mentioning that the training device 220 can generate, for different goals, corresponding target models/rules 201 based on different data, so as to provide better results for users.
In the situation shown in Figure 2, the user can manually specify the data to be input to the execution device 210, for example, by operating in an interface provided by the I/O interface 212. Alternatively, the client device 240 can automatically input data to the I/O interface 212 and obtain the result; if the client device 240 needs the user's authorization to input data automatically, the user can set the corresponding permission on the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the result can be presented in a specific form such as a display, a sound, or an action. The client device 240 can also serve as a data collection terminal and store the collected data in the database 230.
It is worth noting that Figure 2 is merely a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 2 the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 may instead be placed inside the execution device 210.
Figure 3 shows a schematic block diagram of a text processing system provided by an embodiment of this application. The server in Figure 3 may be deployed with the execution device 210 in Figure 2, the data processing device in Figure 1(a) and Figure 1(b), or the execution device 110 in Figure 1(c).
The text processing model in the embodiments of this application can be implemented through program code deployed on the hardware of the server. In other words, the text processing model of the embodiments of this application can be implemented by modifying an existing software platform. Specifically, the program code runs in the host storage of the server (the host memory or disk shown in Figure 3) and in the memory of acceleration hardware (such as a GPU, an FPGA, or a dedicated chip). For example, the dedicated chip may be a neural network operation processor capable of executing the operations of a neural network model.
Figure 4 shows a schematic structural diagram of a text processing model provided by an embodiment of this application. As shown in Figure 4, the text processing model 400 includes a knowledge extraction module 410, a text encoding module 420, a knowledge encoding module 430, and a task processing module 440.
The knowledge extraction module 410 is configured to extract knowledge from input data. The input data may be text: during training, the input data may be the training text, and during inference, the input data may be the text to be processed.
The text encoding module 420 is configured to encode the text to obtain a text-level encoding of the text.
The knowledge encoding module 430 is configured to process the knowledge extracted by the knowledge extraction module 410 through an RGAT to generate a knowledge-level encoding of the text.
In one possible implementation, the text-level encoding of the text can be used as an input to the knowledge encoding module 430 and participate in the process in which the knowledge encoding module 430 generates the knowledge-level encoding.
During training, the output of the knowledge encoding module 430 can be understood as a predicted knowledge-level encoding of the text.
During inference, the output of the knowledge encoding module 430 is the knowledge-level encoding of the text.
The task processing module 440 is configured to output the processing result of the text based on the text-level encoding and the knowledge-level encoding.
For example, the text processing model shown in Figure 4 may be a text classification model used for text classification. In this case, the task processing module 440 may be configured to output the text classification result, that is, to predict the category of the text based on the text-level encoding and the knowledge-level encoding.
For the specific process by which the text processing model 400 processes text, refer to the methods described later in this document.
Figure 5 shows a schematic flowchart of a text processing model training method provided by an embodiment of this application. The method 500 shown in Figure 5 can be executed by an apparatus or device capable of performing the text processing model training process. For example, the apparatus may be a cloud service device, or a terminal device, such as a computer, a server, a vehicle, or a mobile phone, whose computing power is sufficient to execute the training method of the text processing model; it may also be a system composed of a cloud service device and a terminal device. For example, the method 500 may be executed by the execution device 110 in Figure 1, a local device, or the training device 220 in Figure 2.
For example, the method 500 may be specifically executed by the training device 220 shown in Figure 2, and the training data in the method 500 may be the training data maintained in the database 230 shown in Figure 2.
For example, the text processing model in the method 500 may be the text processing model shown in Figure 4. The knowledge encoding module in the text processing model can be implemented by an RGAT model, and the training method of the text processing model can thus also be understood as a training method of the RGAT.
The method 500 includes steps 510 to 540, which are described below.
510: Obtain training text.
520: Obtain a knowledge graph.
530: Determine an initial concept graph of the training text based on the knowledge graph. The nodes in the initial concept graph include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to target noun phrases in the training text, and the edges between nodes in the initial concept graph are used to represent the entity relationships between the nodes in the initial concept graph.
540: Input the initial concept graph into the RGAT model for training to obtain a target RGAT model. During training, the first concept graph in the (i+1)-th iteration is determined based on the relevance between the nodes in the second concept graph in the i-th iteration and the training text, and on the weights of the edges in the second concept graph, where i is a positive integer. Both the first concept graph and the second concept graph are subgraphs of the initial concept graph.
According to the solution of the embodiments of this application, during the iterative training of the RGAT model, the concept graph can be optimized based on the relevance between the nodes of the concept graph and the training text, as well as the importance of the edges of the concept graph, and the optimized concept graph is used as the concept graph for the next iteration. This helps reduce the RGAT model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance, thereby improving the training effect of the RGAT model and enabling it to learn more accurate knowledge-level encodings, which in turn improves the accuracy of downstream text processing tasks. Knowledge with low relevance to the text can also be understood as redundant knowledge or ambiguous knowledge, and knowledge with high relevance to the text can be understood as key knowledge.
For example, the knowledge graph can be represented as graph-structured data used to represent entities that exist in the real world and the relationships between them. Entities can be represented as nodes in the knowledge graph, and entity relationships can be represented as edges in the knowledge graph. In the embodiments of this application, a node in the knowledge graph may also be called an entity node or an entity in the knowledge graph, and an edge in the knowledge graph may also be called a relationship edge or an entity relationship in the knowledge graph.
The knowledge graph may be an existing knowledge graph, or may be a pre-constructed knowledge graph; this is not limited in the embodiments of this application.
The knowledge graph in step 520 may be a knowledge graph of the professional domain to which the training text belongs.
For example, if the text is data in the medical domain, the knowledge graph may be a knowledge graph of the medical domain.
As another example, if the text is data in the financial domain, the knowledge graph may be a knowledge graph of the financial domain.
The knowledge graph may be constructed from a corpus of the professional domain. For example, the corpus may include website articles, books, and the like. Knowledge graphs of different professional domains can be constructed based on corpora of those domains.
For example, a fact in the knowledge graph representing the existence of an entity relationship between two entities can be represented as a triple data structure.
It should be understood that this is merely an example; facts in the knowledge graph may also be represented in forms other than triples, which is not limited in the embodiments of this application.
For example, the construction of the initial concept graph may be performed by the knowledge encoding module 430 shown in Figure 4.
The initial concept graph can be represented as graph-structured data used to indicate the knowledge in the text, and can be understood as a subgraph of the knowledge graph. The nodes in a concept graph may also be called entities or entity nodes in the concept graph.
The topic nodes in the initial concept graph include the candidate entities in the knowledge graph corresponding to the target noun phrases in the training text. This can also be understood as: the topic nodes in the initial concept graph correspond to the candidate entities in the knowledge graph that correspond to the target noun phrases in the training text. There may be a one-to-one correspondence between topic nodes and candidate entities.
The edges in the initial concept graph are used to represent the entity relationships between the nodes in the initial concept graph. This can also be understood as: the edges in the initial concept graph represent the entity relationships between the entities in the knowledge graph corresponding to the nodes in the initial concept graph. The edges between the nodes in the initial concept graph are determined based on the entity relationships in the knowledge graph: connecting the nodes of the initial concept graph according to the entity relationships in the knowledge graph yields the edges between the nodes of the initial concept graph.
For example, an edge between node A and node B in the initial concept graph represents the entity relationship between node A and node B. Node A in the initial concept graph corresponds to entity A in the knowledge graph, and node B in the initial concept graph corresponds to entity B in the knowledge graph. The entity relationship between node A and node B is the entity relationship between entity A and entity B in the knowledge graph. Entity A in the knowledge graph may also be called node A in the knowledge graph, and entity B may also be called node B in the knowledge graph; the entity relationship between node A and node B is then the entity relationship between node A and node B in the knowledge graph.
A target noun phrase is a noun phrase in the text that corresponds to at least one candidate entity in the knowledge graph. In other words, if a noun phrase in the training text corresponds to at least one candidate entity in the knowledge graph, that noun phrase can serve as a target noun phrase of the training text.
The target noun phrases in the text can be obtained based on the knowledge graph.
For example, knowledge triples, that is, the knowledge triples of the text, are extracted from the text based on the knowledge graph. A knowledge triple of the text may take the form (noun phrase, relational phrase, noun phrase), where both the noun phrases and the relational phrase are phrases in the text data. A noun phrase in a knowledge triple corresponds to at least one candidate entity in the knowledge graph, and a relational phrase in a knowledge triple may correspond to at least one entity relationship in the knowledge graph. A noun phrase may include one word or multiple words, and so may a relational phrase. The noun phrases in a knowledge triple are the target noun phrases; in other words, among the noun phrases in the text data, any noun phrase for which a corresponding candidate entity exists in the knowledge graph can serve as a target noun phrase. The relational phrase in a knowledge triple can serve as a target relational phrase; in other words, among the relational phrases in the text data, any relational phrase for which a corresponding entity relationship exists in the knowledge graph can serve as a target relational phrase. Extracting the knowledge triples in the text data can also be understood as identifying the noun phrases and relational phrases in the text data that constitute knowledge triples.
Extracting knowledge triples from text based on a knowledge graph may also be called extracting knowledge from text based on a knowledge graph.
For example, the knowledge extraction process may be performed by the knowledge extraction module 410 in Figure 4.
Figure 6 shows a schematic diagram of knowledge extraction. For example, as shown in Figure 6, the knowledge triple in the text data is identified based on the medical knowledge graph. The text is "Vitamin B12 creates risk for anemia", and the knowledge triple extracted from this text is (Vitamin B12, creates risk for (+), anemia), where the noun phrases are "Vitamin B12" and "anemia" and the relational phrase is "creates risk for (+)".
It should be understood that Figure 6 illustrates only the extraction of a single knowledge triple from the text as an example; in practical applications, multiple knowledge triples may be identified from the text, which is not limited in the embodiments of this application.
It should be noted that the entity relationship in the knowledge graph corresponding to the relational phrase of a knowledge triple is not necessarily the entity relationship between the candidate entities in the knowledge graph corresponding to the noun phrases of that knowledge triple. Taking Figure 6 as an example, the knowledge triple extracted from the text is (Vitamin B12, creates risk for (+), anemia), where the noun phrases "Vitamin B12" and "anemia" both have corresponding candidate entities in the knowledge graph, and the relational phrase "creates risk for (+)" has a corresponding entity relationship in the knowledge graph. However, in the knowledge graph, the entity relationship between the candidate entities corresponding to "Vitamin B12" and "anemia" is not necessarily "creates risk for (+)".
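The following Python sketch illustrates one way such triple extraction could work, assuming the knowledge graph exposes simple phrase-to-entity and phrase-to-relation lookup tables. The names entity_index and relation_index, and the n-gram matching strategy, are illustrative assumptions rather than the embodiment's algorithm:

```python
from itertools import combinations

def extract_knowledge_triples(tokens, entity_index, relation_index, max_len=4):
    """Identify (noun phrase, relational phrase, noun phrase) triples in a token list."""
    noun_spans, relation_spans = [], []
    # Scan all n-grams up to max_len and match them against the knowledge graph.
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in entity_index:        # phrase has >= 1 candidate entity
                noun_spans.append((i, i + n, phrase))
            elif phrase in relation_index:    # phrase has >= 1 entity relationship
                relation_spans.append((i, i + n, phrase))
    triples = []
    # Pair target noun phrases that flank a relational phrase, in textual order.
    for (s1, e1, np1), (s2, e2, np2) in combinations(sorted(noun_spans), 2):
        for rs, re, rp in relation_spans:
            if e1 <= rs and re <= s2:         # noun - relation - noun layout
                triples.append((np1, rp, np2))
    return triples

tokens = "Vitamin B12 creates risk for anemia".split()
entity_index = {"Vitamin B12": ["Vitamin B12 (chemical)"],
                "anemia": ["anemia (disease)", "anemia (symptom)", "anemia (plant)"]}
relation_index = {"creates risk for": ["creates risk for (+)"]}
print(extract_knowledge_triples(tokens, entity_index, relation_index))
# [('Vitamin B12', 'creates risk for', 'anemia')]
```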
The topic nodes may include the candidate entities in the knowledge graph corresponding to all target noun phrases in the text data. In other words, the topic nodes may include the candidate entities corresponding to the noun phrases of all knowledge triples extracted from the text data.
Optionally, the topic nodes include all candidate entities in the knowledge graph corresponding to the target noun phrases.
A target noun phrase may correspond to one or more candidate entities in the knowledge graph.
For example, the target noun phrase "anemia" may correspond to multiple candidate entities such as "anemia (disease)", "anemia (symptom)", and "anemia (plant)".
In the above solution, the topic nodes can include all candidate entities in the knowledge graph corresponding to each target noun phrase, so the resulting concept graph covers all candidate entities related to the text data and the corresponding entity relationships. This provides comprehensive and complete text-related knowledge for subsequent processing and avoids incorrect reasoning paths caused by omitting part of the knowledge.
Optionally, the nodes in the initial concept graph further include neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases.
In other words, the neighbor nodes in the initial concept graph correspond to the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases. There may be a one-to-one correspondence between neighbor nodes and neighbor entities.
The neighbor nodes may include the k-hop neighbors, in the knowledge graph, of the candidate entities corresponding to the target noun phrases, where k is a positive integer. For example, k may be a positive integer less than or equal to 3.
For example, the neighbor nodes may include the k-hop neighbors, in the knowledge graph, of all candidate entities corresponding to the target noun phrases.
Neighbor nodes play an important role in knowledge reasoning. The initial concept graph in the embodiments of this application further includes the neighbor entities connected, in the knowledge graph, to the candidate entities (that is, the topic nodes), namely the neighbor nodes, which can provide still more comprehensive and complete knowledge and help improve the accuracy of the knowledge-level encoding.
Further, the initial concept graph may also include the entity relationships in the knowledge graph corresponding to the target relational phrases.
Figure 7 shows a schematic diagram of an initial concept graph. As shown in Figure 7, the concept graph corresponding to the text data is constructed based on the medical knowledge graph and the knowledge triples extracted from the text data. The initial concept graph shown in Figure 7 is an initial concept graph corresponding to the text shown in Figure 6. The topic nodes "anemia (disease)", "anemia (symptom)", and "anemia (plant)" in Figure 7 are all the candidate entities in the knowledge graph corresponding to the target noun phrase "anemia" in the text shown in Figure 6, and the topic node "Vitamin B12 (chemical)" in Figure 7 is all the candidate entities in the knowledge graph corresponding to the target noun phrase "Vitamin B12" in that text. The neighbor nodes in Figure 7 are the one-hop neighbor entities of the above candidate entities in the knowledge graph. "Vitamin B12 (chemical)", "anemia (disease)", "anemia (symptom)", and "anemia (plant)" serve as the topic nodes of the concept graph, while "hemoglobin (biological substance)", "GI bleeding (biologic function)", and "plant (type)" serve as the neighbor nodes. The edges between the nodes in Figure 7 represent the entity relationships between the corresponding entities in the knowledge graph.
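As a sketch of the construction just described, the following Python function builds an initial concept graph from the candidate entities and their k-hop neighbors. Representing the knowledge graph as a networkx graph and the entity_index lookup table are assumptions made for illustration:

```python
import networkx as nx

def build_initial_concept_graph(kg, entity_index, target_noun_phrases, k=1):
    """Build the initial concept graph as a subgraph of the knowledge graph kg."""
    # Topic nodes: all candidate entities of every target noun phrase.
    topic_nodes = set()
    for phrase in target_noun_phrases:
        topic_nodes.update(entity_index.get(phrase, []))

    # Neighbor nodes: the k-hop neighbors of the candidate entities in kg.
    nodes, frontier = set(topic_nodes), set(topic_nodes)
    for _ in range(k):
        frontier = {m for n in frontier for m in nx.all_neighbors(kg, n)} - nodes
        nodes |= frontier

    # Edges of the concept graph are the entity relationships of kg between kept nodes.
    concept_graph = kg.subgraph(nodes).copy()
    for n in concept_graph.nodes:
        concept_graph.nodes[n]["is_topic"] = n in topic_nodes
    return concept_graph
```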
For example, the relevance between a node in the initial concept graph and the training text may be determined based on the importance of the node. For example, the relevance between a node in the initial concept graph and the training text may be the importance of the node; alternatively, the relevance may be positively correlated with the importance of the node, that is, the greater the importance of a node, the higher the relevance between the node and the training text.
For example, the importance of a node can be determined by a centrality measurement, for example, by the PageRank algorithm or by degree centrality. However, these methods have certain limitations and cannot represent the importance of a node very accurately.
For example, the importance of a node may be determined by the eigenvector centrality of the node. For example, the importance of a node may be its eigenvector centrality, or the importance of a node may be positively correlated with its eigenvector centrality.
Optionally, the relevance between each topic node in the initial concept graph and the training text may be the eigenvector centrality of that topic node.
Further, the relevance between each topic node in the initial concept graph and the training text may be the eigenvector centrality of that topic node on a topic correlation graph. The nodes of the topic correlation graph are all topic nodes. The eigenvector centrality of each topic node is determined based on the initial importance of each topic node and the weights of the edges in the topic correlation graph. The initial importance of a node in the topic correlation graph is set according to the probability of the node appearing in the facts recorded in the knowledge graph. The edges of the topic correlation graph are connected according to the entity relationships between the candidate entities in the knowledge graph. The weight of an edge in the topic correlation graph is determined by the number of entity relationships, in the knowledge graph, between the candidate entities corresponding to the two topic nodes connected by that edge, that is, by the number of edges between the candidate entities.
For example, if node C and node D in the topic correlation graph correspond to candidate entity C and candidate entity D in the knowledge graph respectively, and there are n edges between candidate entity C and candidate entity D in the knowledge graph, then an edge with weight n is constructed between node C and node D in the topic correlation graph, where n is a positive integer.
Figure 8 shows a topic correlation graph provided by an embodiment of this application; the nodes in this topic correlation graph are the topic nodes in Figure 7.
In the embodiments of this application, eigenvector centrality is used to compute the relevance between a topic node and the training text: the importance of a node is determined by the importance of its neighbor nodes, and the initial importance of a node is set according to the probability of the node appearing in the facts recorded in the knowledge graph, which can reflect the importance of a node relatively accurately.
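A minimal power-iteration sketch of such weighted eigenvector centrality is shown below. The iteration count, the normalization, and the toy adjacency values are illustrative assumptions; per the description above, the edge weight between two topic nodes is the number of relationships between their candidate entities, and the starting vector is the initial importance:

```python
import numpy as np

def topic_relevance(adj_weights, initial_importance, iters=50, eps=1e-9):
    """Weighted eigenvector centrality of topic nodes by power iteration.

    adj_weights[i][j]: weight of the edge between topic nodes i and j, i.e. the
    number of knowledge-graph relationships between their candidate entities.
    initial_importance[i]: probability of node i appearing in knowledge-graph facts.
    """
    A = np.asarray(adj_weights, dtype=float)
    x = np.asarray(initial_importance, dtype=float)
    x = x / (x.sum() + eps)
    for _ in range(iters):
        x = A @ x                              # accumulate neighbors' scores
        x = x / (np.linalg.norm(x) + eps)      # normalize for numeric stability
    return x

# Toy topic correlation graph: nodes 0 and 1 share two KG relationships, 0 and 2 one.
A = [[0, 2, 1],
     [2, 0, 0],
     [1, 0, 0]]
print(topic_relevance(A, initial_importance=[0.5, 0.3, 0.2]))
```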
Further, the relevance between each neighbor node in the initial concept graph and the training text may be the score of the strongly connected component in which that neighbor node is located on the information propagation graph corresponding to the initial concept graph. The nodes of the information propagation graph corresponding to the initial concept graph are the nodes of the initial concept graph. When a first node in the initial concept graph is a 1-hop neighbor of a second node, there is a directed edge from the second node to the first node in the information propagation graph.
Figure 9 shows an information propagation graph provided by an embodiment of this application. The information propagation graph shown in Figure 9 corresponds to the initial concept graph shown in Figure 7, or in other words, to the training text shown in Figure 6.
Each concept graph can correspond to one information propagation graph. The information propagation graph is a directed graph: its edges are directed edges, that is, the edges have directionality. The nodes of the information propagation graph corresponding to a concept graph are the nodes of that concept graph. If and only if node u in the concept graph is a 1-hop neighbor of node v, a directed edge pointing from node v to node u is constructed between node v and node u in the information propagation graph. Node u may be the first node, and node v may be the second node.
For example, the score of each strongly connected component on the information propagation graph may be obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological order. The initial score of each strongly connected component on the information propagation graph may be determined based on the maximum importance of the nodes within that component. For example, the initial score of each strongly connected component may be the maximum importance among the nodes within it.
The topological order can be obtained by performing a depth-first search over the strongly connected components, and the topological ordering result can be understood as the depth-first search result of the strongly connected components. "Propagation" can be understood as passing the score of a preceding strongly connected component to the subsequent strongly connected components according to the topological order, or in other words, updating the score of a downstream strongly connected component with the score of its upstream strongly connected component.
For example, the topological order may be {C1, C2, C3}, where C1, C2, and C3 represent three strongly connected components. "Propagation" can then be understood as updating the score of C2 with the score of C1, and updating the score of C3 with the score of C2.
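The following sketch computes such scores with networkx by condensing the information propagation graph into its strongly connected components (which yields a DAG) and propagating scores in topological order. Taking the max() during propagation is an illustrative assumption where the text simply says the downstream score is updated with the upstream one:

```python
import networkx as nx

def neighbor_relevance(info_graph, node_importance):
    """Score nodes via strongly connected components of the propagation graph."""
    # Condensing the graph turns each strongly connected component into a single
    # node of a DAG, so a topological order over components is well defined.
    cond = nx.condensation(info_graph)
    score = {c: max(node_importance.get(n, 0.0) for n in cond.nodes[c]["members"])
             for c in cond.nodes}

    # Propagate scores from upstream components to their downstream successors.
    for c in nx.topological_sort(cond):
        for succ in cond.successors(c):
            score[succ] = max(score[succ], score[c])

    # Each node inherits the score of the component that contains it.
    return {n: score[c] for c in cond.nodes for n in cond.nodes[c]["members"]}
```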
In step 540, the RGAT is trained to learn the knowledge-encoded representation corresponding to the training text.
For example, the RGAT training process may proceed as follows.
1) Input the concept graph into the RGAT for processing, to obtain the predicted knowledge encoding of the training text output by the RGAT.
2) Determine the prediction result of the training text based on the text encoding and the predicted knowledge encoding of the training text.
3) Adjust the parameters of the RGAT based on the prediction result of the training text.
4) Use the adjusted RGAT as the RGAT in step 1), and repeat steps 1) to 3) for iterative training. In other words, the RGAT in the next iteration is the adjusted RGAT of the current iteration.
Steps 1) to 3) can be regarded as one iteration.
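One possible shape of this loop, for the text classification case, is sketched below in PyTorch. The module interfaces (rgat, text_encoder, classifier, prune_fn) are hypothetical names introduced for illustration, not the embodiment's API:

```python
import torch
import torch.nn.functional as F

def train_rgat(rgat, text_encoder, classifier, concept_graph, text, label,
               optimizer, num_iters, prune_fn):
    """Sketch of the iterative flow in steps 1) to 4) for a classification task."""
    text_encoding = text_encoder(text)                  # text-level encoding
    for i in range(num_iters):
        # Step 1): forward pass of the RGAT over the current concept graph.
        knowledge_encoding = rgat(concept_graph, text_encoding)

        # Step 2): fuse the two encodings by concatenation and classify.
        fused = torch.cat([text_encoding, knowledge_encoding], dim=-1)
        logits = classifier(fused)

        # Step 3): reduce the gap between the prediction and the training label.
        loss = F.cross_entropy(logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # The concept graph of iteration i+1 is a subgraph chosen from the
        # node-to-text relevance and the edge (attention) weights of iteration i.
        concept_graph = prune_fn(concept_graph, rgat)
    return rgat
```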
For example, step 1) may be performed by the knowledge encoding module 430 in Figure 4.
The knowledge encoding of the training text may also be called the knowledge-level encoding of the training text. The knowledge encoding can be represented as a knowledge embedding vector, a knowledge feature vector, or the like.
For example, the knowledge encoding of the training text may include the encodings of the nodes and the encodings of the edges in the concept graph.
For example, the knowledge encoding of the training text may include the embedding vectors of the nodes and the embedding vectors of the edges in the concept graph.
In step 2), the text encoding of the training text may also be called the text-level encoding of the training text. The text-level encoding refers to a low-dimensional space vector used to express the textual content, the arrangement sequence of the text, and the like in the text data.
The text encoding can be represented as a text embedding vector or a text feature vector.
For example, the text encoding of the training text may include the text encoding of the training text sequence and the text encodings of the target phrases in the training text. The target phrases may include the target noun phrases and the target relational phrases. The text encoding of the training text sequence may also be called the text encoding of the training text itself.
Specifically, the training text can be processed by a pre-trained language model to obtain the text encoding of the training text.
For example, the training text can be processed by a bidirectional encoder representation from transformers (BERT) model, a bidirectional gated recurrent unit (BiGRU), or a bidirectional long short-term memory (BiLSTM) model to obtain the text encoding of the training text.
For example, the process of obtaining the text encoding of the training text may be performed by the text encoding module 420 in Figure 4.
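For instance, a text-level encoding could be obtained with a pre-trained BERT via the Hugging Face transformers library, as in the sketch below. The checkpoint name and the mean pooling over token states are illustrative choices, not mandated by the embodiments:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_text(text):
    """Return a sequence-level text encoding by mean-pooling BERT token states."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1)                          # (1, hidden_dim)

print(encode_text("Vitamin B12 creates risk for anemia").shape)  # torch.Size([1, 768])
```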
For example, step 2) may be performed by the task processing module 440 in Figure 4.
The type of the prediction result of the training text is related to the type of the text processing task, that is, to the type of the downstream task.
For example, the method 500 is used for a text classification task. In this case, the prediction result of the training text may be the predicted category of the training text. For example, the text encoding and the predicted knowledge encoding of the training text are input into a classifier to obtain the predicted category of the training text. As another example, the text encoding and the predicted knowledge encoding of the training text are fused, and the fused result is input into a classifier to obtain the predicted category of the training text. The fusion may be performed by vector concatenation of the text encoding and the predicted knowledge encoding to obtain a fused text encoding. The classifier may be a softmax function.
In step 3), the parameters of the RGAT may be adjusted with the goal of reducing the gap between the target processing result and the prediction result of the training text.
Taking the text classification task as an example, in step 3) the parameters of the RGAT may be adjusted with the goal of reducing the gap between the label of the training text and the predicted category of the training text. The label of the training text is the target processing result of the training text, and is used to indicate the ground truth of the category of the training text, that is, the true category of the training text.
It should be understood that the above is merely an example; the method 500 can also be used for other natural language processing tasks, for example, multi-hop reasoning question answering tasks. This is not limited in the embodiments of this application.
In the first iteration, the concept graph in step 1) may be the initial concept graph.
The first concept graph in the (i+1)-th iteration can be understood as the concept graph input to the RGAT during the (i+1)-th iteration. In other words, during the (i+1)-th iteration, the RGAT performs forward propagation and backpropagation based on the first concept graph.
The second concept graph in the i-th iteration can be understood as the concept graph input to the RGAT during the i-th iteration. During the i-th iteration, the RGAT performs forward propagation and backpropagation based on the second concept graph.
The i-th iteration may be any iteration in the RGAT training process.
It should be noted that the "first" in "first concept graph" and the "second" in "second concept graph" are only used to distinguish the RGAT input data of two iterations and have no other limiting effect.
In the embodiments of this application, the concept graph can be continuously optimized during the iterative training of the RGAT, and the concept graph may differ from one iteration to the next. Taking the above training flow as an example, the concept graph in step 1) may be different in different iterations.
For example, in the process in which the RGAT learns the representation of the concept graph, that is, during the RGAT training process, the concept graph in each iteration may be determined based on the relevance between the nodes of the concept graph in the previous iteration and the training text, and on the weights of the edges of the concept graph in the previous iteration.
The concept graphs used during the iterations are all subgraphs of the initial concept graph. The relevance between a node of the concept graph in an iteration and the training text is the relevance between the same node in the initial concept graph and the training text.
The concept graph is optimized during the iterative training of the RGAT. The direction of optimization can be understood as pruning the nodes with low relevance to the training text and/or the edges with small weights, while retaining the nodes with high relevance to the training text and/or the edges with large weights.
The weights of the edges in the concept graph may also be called the weights of the facts in the concept graph, that is, the attention weights.
In one possible implementation, the first concept graph belongs to a set of first subgraphs of the initial concept graph, where the first cost of each first subgraph is less than or equal to a threshold and the benefit of the first concept graph is greater than or equal to the benefits of the other first subgraphs in the set. The first cost of a first subgraph is determined based on the first costs of the edges within it, the benefit of the first concept graph is determined based on the benefits of the edges within it, and the benefit of a first subgraph is determined based on the benefits of the edges within it. The benefit of an edge is positively correlated with the weight of that edge during the i-th iteration, and the first cost of an edge is negatively correlated with the relevance between the two nodes connected by the edge and the training text.
Among the subgraphs of the initial concept graph, any subgraph whose first cost is less than or equal to the threshold can be called a first subgraph. The set of first subgraphs is the set of subgraphs whose first cost is less than or equal to the threshold, and the first concept graph is an element of this set. In other words, the first concept graph is itself a first subgraph.
Put differently, among the set of subgraphs of the initial concept graph whose first cost is less than or equal to the threshold, the first subgraph with the greatest benefit is taken as the first concept graph. That is, selecting a subgraph of the initial concept graph that maximizes the benefit subject to its first cost being less than or equal to the threshold yields the first concept graph.
The higher the relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge; the lower the relevance, the greater the first cost. For example, the higher the average relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge, and the lower the average relevance, the greater the first cost.
For example, the first cost of a subgraph may be determined based on the first costs of all edges within the subgraph: it may be the sum of the first costs of all edges within the subgraph, or the average of the first costs of all edges within the subgraph.
The greater the weight of an edge during the i-th iteration, the greater the benefit of that edge; the smaller the weight, the smaller the benefit.
For example, the benefit of a subgraph may be determined based on the benefits of all edges within the subgraph: it may be the sum of the benefits of all edges within the subgraph, or the average of the benefits of all edges within the subgraph.
For example, the first concept graph in the (i+1)-th iteration may be obtained by optimizing the second concept graph in the i-th iteration, and the optimized concept graph may be a subgraph of the concept graph in the i-th iteration. In other words, the first concept graph may be a subgraph of the second concept graph.
Specifically, the second concept graph is optimized based on the relevance between the nodes of the second concept graph and the training text and on the weights of the edges of the second concept graph in the i-th iteration, to obtain the first concept graph in the (i+1)-th iteration.
Optionally, the ratio between the benefit of a first edge and the first cost of the first edge is less than or equal to the ratio between the benefit of a second edge and the first cost of the second edge, where the first edge belongs to the second concept graph but does not belong to the first concept graph, and the second edge belongs to the first concept graph. The benefit of an edge is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge is negatively correlated with the relevance between the two nodes connected by the edge and the training text. The first cost of the first concept graph is less than or equal to the threshold, and the sum of the first cost of the first concept graph and the first cost of the first edge is greater than the threshold.
The second edge belongs to the second concept graph and also belongs to the first concept graph.
The second edge is any edge of the first concept graph, and the first edge is any edge of the second concept graph that does not belong to the first concept graph. In other words, in the second concept graph, the benefit-to-first-cost ratio of any edge that does not belong to the first concept graph is less than or equal to the benefit-to-first-cost ratio of any edge of the first concept graph.
The greater the weight of an edge in the i-th iteration, the greater the benefit of that edge. The higher the relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge. For example, the higher the average relevance between the two nodes connected by an edge and the training text, the smaller the first cost of that edge.
For example, "the first cost of the first concept graph is less than or equal to the threshold" may mean that the sum of the first costs of all edges within the first concept graph is less than or equal to the threshold.
Alternatively, it may mean that the average of the first costs of all edges within the first concept graph is less than or equal to the threshold.
For example, before the (i+1)-th iteration starts, the first concept graph in the (i+1)-th iteration may be determined through the following steps.
S11: Obtain the first costs and the benefits of the edges of the second concept graph in the i-th iteration.
S12: Select edges of the second concept graph as edges of the first concept graph in descending order of the ratio between the benefit of an edge and its first cost, until the sum of the first costs of the selected edges is greater than the threshold. The first cost of the first concept graph is less than or equal to the threshold.
In other words, the benefit-to-first-cost ratio of any edge of the first concept graph is greater than or equal to the benefit-to-first-cost ratio of any edge of the second concept graph that does not belong to the first concept graph.
For example, the threshold may be a tolerance value for the number of uncertain edges. The threshold W may be determined based on a tolerance ratio θ and the number N of uncertain edges in the initial concept graph. For example, the tolerance threshold may be W = θN, where 0 < θ < 1. An uncertain edge is an edge of the initial concept graph whose weight is less than 1. N is less than or equal to the number of edges in the initial concept graph, and N is a positive integer.
For example, if the number N of uncertain edges in the initial concept graph is 60 and the tolerance ratio θ is 0.5, the threshold W is 30. In this case, step S12 can be understood as selecting edges of the second concept graph as edges of the first concept graph in descending order of the benefit-to-first-cost ratio until the sum of the first costs of the selected edges reaches 30. The first cost of an edge may be determined based on the average relevance between the two nodes connected by the edge and the training text.
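Steps S11 and S12 amount to a greedy budgeted selection, sketched below. Using the reciprocal of the mean node relevance as an edge's first cost and the attention weight as its benefit are illustrative assumptions consistent with the stated correlations (cost falls as relevance rises, benefit rises with weight):

```python
def prune_concept_graph(edges, relevance, attention, theta, n_uncertain):
    """Greedy benefit/cost selection of concept-graph edges for the next iteration.

    edges: (u, v) pairs of the second concept graph; relevance[n]: node-to-text
    relevance; attention[(u, v)]: edge weight from the i-th iteration.
    """
    eps = 1e-9
    threshold = theta * n_uncertain                  # W = theta * N

    def first_cost(u, v):
        # Falls as the mean relevance of the edge's endpoints rises.
        return 1.0 / ((relevance[u] + relevance[v]) / 2 + eps)

    def benefit(u, v):
        return attention[(u, v)]                     # rises with the edge weight

    # Descending order of benefit-to-first-cost ratio (step S12).
    ranked = sorted(edges, key=lambda e: benefit(*e) / first_cost(*e), reverse=True)

    kept, spent = [], 0.0
    for u, v in ranked:
        c = first_cost(u, v)
        if spent + c > threshold:                    # budget would be exceeded
            break
        kept.append((u, v))
        spent += c
    return kept                                      # edges of the first concept graph
```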
It should be understood that the above is merely an example; the concept graph may also be optimized in other ways, retaining the edges with larger weights and the nodes with higher relevance to the training text, so as to reduce the model's attention to knowledge with low relevance to the text and strengthen its attention to knowledge with high relevance to the text.
在一种可能的实现方式中,第一概念图谱是根据初始概念图谱的连通子图的集合的第一子集确定的,第一子集的第二消耗小于或等于第二子集的第二消耗。第一子集的第二消耗是根据第一子集的连通子图内的边的第二消耗确定的。第二子集的第二消耗是根据第二子集的连通子图内的边的第二消耗确定的。边的第二消耗和该边所连接的两个节点与训练文本的相关度呈负相关关系,边的第二消耗和该边在第i次迭代过程中的权重呈负相关关系。In a possible implementation, the first concept map is determined based on a first subset of a set of connected subgraphs of the initial concept map, and the second consumption of the first subset is less than or equal to the second consumption of the second subset. consumption. The second cost of the first subset is determined based on the second cost of the edges within the connected subgraph of the first subset. The second cost of the second subset is determined based on the second cost of the edges within the connected subgraph of the second subset. There is a negative correlation between the second consumption of an edge and the correlation between the two nodes connected by the edge and the training text, and there is a negative correlation between the second consumption of the edge and the weight of the edge in the i-th iteration process.
示例性地,第一子集包括目标名词词组对应的至少一个候选实体,第二子集包括目标名词词组对应的至少一个候选实体。Exemplarily, the first subset includes at least one candidate entity corresponding to the target noun phrase, and the second subset includes at least one candidate entity corresponding to the target noun phrase.
连通子图的集合的一个子集包括目标名词词组对应的至少一个候选实体,可以理解为,目标名词词组对应的至少一个候选实体存在于该子集内的至少一个连通子图上。不同的目标名词词组对应的候选实体可能存在于该子集内的不同连通子图上,也可能存在于该子集内的同一个连通子图上。A subset of the set of connected subgraphs includes at least one candidate entity corresponding to the target noun phrase. It can be understood that at least one candidate entity corresponding to the target noun phrase exists on at least one connected subgraph in the subset. Candidate entities corresponding to different target noun phrases may exist on different connected subgraphs in the subset, or may exist on the same connected subgraph in the subset.
换言之,在初始概念图谱的连通子图的集合的子集中,将包含了所有目标名词词组对应的至少一个候选实体,且第二消耗最小的子集作为第一子集。或者说,选取初始概念图谱的连通子图的集合的一个子集,使得所有目标名词词组都有至少一个对应的候选实体存在于该子集中,且该子集的第二消耗是包含所有目标名词词组对应的至少一个候选实体的子集中第二消耗最小的,即得到第一子集,或者说,得到第一概念图谱。In other words, among the subsets of the set of connected subgraphs of the initial concept map, the subset that contains at least one candidate entity corresponding to all target noun phrases and has the smallest second consumption will be regarded as the first subset. In other words, select a subset of the set of connected subgraphs of the initial concept map so that all target noun phrases have at least one corresponding candidate entity in the subset, and the second consumption of the subset is to include all target nouns The second smallest consumption among the subsets of at least one candidate entity corresponding to the phrase is obtained, that is, the first subset is obtained, or in other words, the first concept map is obtained.
Exemplarily, the first subset includes at least one entity relationship corresponding to the target relation phrase, and the second subset includes at least one entity relationship corresponding to the target relation phrase.
That a subset of the set of connected subgraphs includes at least one entity relationship corresponding to a target relation phrase can be understood as follows: at least one entity relationship corresponding to the target relation phrase exists on at least one connected subgraph within the subset. Entity relationships corresponding to different target relation phrases may exist on different connected subgraphs within the subset, or on the same connected subgraph within the subset.
In other words, among the subsets of the set of connected subgraphs of the initial concept graph, the subset that contains at least one entity relationship corresponding to every target relation phrase and has the smallest second cost is taken as the first subset. Put differently, a subset of the set of connected subgraphs of the initial concept graph is selected such that every target relation phrase has at least one corresponding entity relationship in the subset, and the second cost of the subset is the smallest among the subsets containing at least one entity relationship corresponding to every target relation phrase; this subset is the first subset, which determines the first concept graph.
Exemplarily, the first subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase, and the second subset includes at least one candidate entity corresponding to the target noun phrase and at least one entity relationship corresponding to the target relation phrase.
In other words, among the subsets of the set of connected subgraphs of the initial concept graph, the subset that contains at least one candidate entity corresponding to every target noun phrase and at least one entity relationship corresponding to every target relation phrase, and that has the smallest second cost, is taken as the first subset. Put differently, a subset of the set of connected subgraphs of the initial concept graph is selected such that every target relation phrase has at least one corresponding entity relationship in the subset and every target noun phrase has at least one corresponding candidate entity in the subset, and the second cost of the subset is the smallest among such subsets; this subset is the first subset, which determines the first concept graph.
The higher the relevance to the training text of the two nodes connected by an edge, and the larger the weight of the edge in the i-th iteration, the smaller the second cost of the edge. Conversely, the lower the relevance to the training text of the two nodes connected by an edge, and the smaller the weight of the edge in the i-th iteration, the larger the second cost of the edge. For example, the higher the average relevance to the training text of the two nodes connected by an edge, the smaller the second cost of the edge; the lower that average relevance, the larger the second cost of the edge.
Exemplarily, the second cost of the first subset may be the sum of the second costs of all edges within all connected subgraphs in the first subset, and the second cost of the second subset may be the sum of the second costs of all edges within all connected subgraphs in the second subset.
As described above, the first concept graph may be a subgraph of the second concept graph.
Optionally, the second cost of a first connected subgraph is greater than or equal to the second cost of a second connected subgraph, where the first connected subgraph belongs to the second concept graph in the i-th iteration but does not belong to the first concept graph in the (i+1)-th iteration, and the second connected subgraph is the connected subgraph with the largest second cost within the first concept graph. The second cost of the first connected subgraph is determined according to the second costs of the edges within the first connected subgraph, and the second cost of the second connected subgraph is determined according to the second costs of the edges within the second connected subgraph. The second cost of an edge is negatively correlated with the weight of the edge in the i-th iteration, and negatively correlated with the relevance, to the training text, of the two nodes connected by the edge. The nodes of the first concept graph include at least one candidate entity corresponding to the target noun phrases; the nodes of the second connected subgraph include at least one candidate entity corresponding to a first noun phrase among the target noun phrases, and the nodes of the other connected subgraphs in the first concept graph do not include any candidate entity corresponding to the first noun phrase.
The second connected subgraph belongs to the first concept graph and, accordingly, to the second concept graph.
The first connected subgraph may be any connected subgraph of the second concept graph that does not belong to the first concept graph. In other words, in the second concept graph, the second cost of any connected subgraph that does not belong to the first concept graph is greater than or equal to the second cost of any connected subgraph of the first concept graph.
Exemplarily, the second cost of a connected subgraph may be the sum of the second costs of all edges in the connected subgraph.
Alternatively, the second cost of a connected subgraph is the average of the second costs of all edges in the connected subgraph.
Exemplarily, before the (i+1)-th iteration starts, the concept graph used in the (i+1)-th iteration may be determined through the following steps.
S21: Obtain the second costs of the connected subgraphs in the second concept graph of the i-th iteration.
S22: Select connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of their second costs, until all the selected connected subgraphs together include at least one candidate entity corresponding to every target noun phrase of the training text.
For example, an initial subset is set to the empty set, and connected subgraphs of the second concept graph are added to the subset in ascending order of their second costs, until the current subset includes at least one candidate entity corresponding to every target noun phrase of the training text. The first concept graph can then be determined according to the current subset. The second connected subgraph is the last connected subgraph added to the subset, and the first connected subgraph may be any connected subgraph of the second concept graph that was not added to the subset.
It should be understood that the above is merely an example. For instance, "at least one candidate entity corresponding to the target noun phrase" in step S22 may also be replaced with "at least one entity relationship corresponding to the target relation phrase".
The target RGAT is the trained RGAT. It can be used to obtain feature vectors of graph-structured data input to it. When the concept graph of a text to be processed is input to the target RGAT, the output data can serve as the embedded representation of the concept graph of the text to be processed, that is, the knowledge-level encoding of the text to be processed.
According to the solution of the embodiments of this application, while the RGAT learns representations through iterative message passing between entity nodes, that is, during training, the concept graph is optimized according to the relevance of the entity nodes to the text and the weights of the edges; for example, nodes with lower relevance to the training text and/or edges with smaller weights are pruned. This helps reduce the model's attention to knowledge weakly related to the text and strengthen its attention to knowledge strongly related to the text, thereby improving the expressive ability of the RGAT and, in turn, the accuracy of downstream tasks.
In addition, in the solution of the embodiments of this application, the nodes of the initial concept graph may include all candidate entities corresponding to the target noun phrases. This helps ensure that no text-related knowledge is omitted from the concept graph and guarantees the completeness of the knowledge in the concept graph. Learning the knowledge-level representation of the text with the solution of the embodiments of this application further ensures the accuracy of downstream tasks and avoids incorrect reasoning paths caused by omitted knowledge.
In addition, in the solution of the embodiments of this application, the nodes of the initial concept graph may also include the k-hop neighbor entities of each candidate entity, which can further improve the completeness of the knowledge in the concept graph.
Figure 10 shows a training method 800 for a text processing model provided by an embodiment of this application; the text processing model may be the text processing model in Figure 4. Method 800 can be regarded as a specific implementation of method 500. For brevity, some descriptions are omitted as appropriate when describing method 800.
Method 800 includes steps 810 to 850.
810: Perform knowledge extraction on the training text.
Exemplarily, step 810 may be performed by the knowledge extraction module 410 in Figure 4.
Exemplarily, the knowledge graph is a knowledge graph of the business domain related to the text data. For ease of understanding and description, method 800 is described below taking the medical domain as the business domain, which does not limit the solutions of the embodiments of this application.
As shown in Figure 8, step 810 may include: identifying the knowledge triples T_d in the training text d based on the knowledge graph, to obtain the noun phrases M_d and the relation phrases P_d in the knowledge triples, that is, the target noun phrases and target relation phrases in the training text, as shown in Figure 6.
820: Generate the text-level encoding of the training text through a natural language pre-training model.
Exemplarily, step 820 may be performed by the text encoding module 420 in Figure 4.
The text-level encoding of the training text may include the text-level encoding of each noun phrase M_d in the training text d, the text-level encoding of each relation phrase P_d, and the text-level encoding of the training text sequence.
830: Process the extracted knowledge through the RGAT to generate the knowledge-level predictive encoding of the training text.
Exemplarily, step 830 may be performed by the knowledge encoding module 430 in Figure 4.
In a possible implementation, the text-level encoding of the training text may serve as an input to the RGAT and participate in the process of generating the knowledge-level predictive encoding.
840: Obtain a prediction result based on the text-level encoding and the knowledge-level predictive encoding.
Exemplarily, step 840 may be performed by the task processing module 440 in Figure 4.
For example, the downstream task is a text classification task. Step 840 may include concatenating the text-level encoding and the knowledge-level predictive encoding into a single vector to obtain a predictive text fusion encoding, and inputting the predictive text fusion encoding into a classifier to obtain the predicted classification result of the training text.
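As an illustration only, the vector concatenation and classification described above might be sketched as follows in Python with PyTorch; the tensor dimensions, variable names, and the use of a single linear layer as the classifier are assumptions for illustration rather than elements of the embodiment.

```python
import torch
import torch.nn as nn

# Assumed dimensions for the two encodings of one training text.
text_enc = torch.randn(1, 768)       # text-level encoding
knowledge_enc = torch.randn(1, 256)  # knowledge-level predictive encoding

# Vector concatenation yields the predictive text fusion encoding.
fusion_enc = torch.cat([text_enc, knowledge_enc], dim=-1)  # shape (1, 1024)

# An assumed linear classifier over the fusion encoding (e.g. two classes).
classifier = nn.Linear(fusion_enc.shape[-1], 2)
logits = classifier(fusion_enc)
predicted_class = logits.argmax(dim=-1)  # predicted classification result
```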
850: Iteratively train the RGAT based on the prediction result.
Before the iterative training, that is, before step 830, the initial concept graph of the training text can be constructed, and the relevance between the nodes of the initial concept graph and the training text can be computed.
Exemplarily, all candidate entities corresponding to each noun phrase M_d in the training text are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, to serve as the nodes of the initial concept graph. The nodes of the initial concept graph are connected according to the entity relationship between each pair of entities recorded in the knowledge graph, to obtain the initial concept graph. This initial concept graph contains complete knowledge.
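A minimal sketch of this construction is given below, assuming the knowledge graph is held as a networkx graph; the function name, the form of the candidate-entity list, and the parameter k are illustrative assumptions.

```python
import networkx as nx

def build_initial_concept_graph(kg, candidate_entities, k):
    """kg: networkx graph whose nodes are the entities of the knowledge graph
    and whose edges record entity relationships; candidate_entities: all
    candidate entities located for the noun phrases M_d of the training text."""
    nodes = set(candidate_entities)
    for entity in candidate_entities:
        # Locate the k-hop neighbor entities of every candidate entity.
        reachable = nx.single_source_shortest_path_length(kg, entity, cutoff=k)
        nodes.update(reachable)
    # Connect the located nodes according to the entity relationships
    # recorded in the knowledge graph (induced subgraph).
    return kg.subgraph(nodes).copy()
```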
Specifically, the initial concept graph can be constructed, and the relevance between the nodes of the initial concept graph and the training text computed, through the following steps.
(1) Locate the topic nodes
For each noun phrase M_d obtained in step 810, all candidate entities corresponding to the noun phrase in the knowledge graph are located as topic nodes.
The topic nodes serve as the nodes of a topic correlation graph, and the corresponding topic nodes of the topic correlation graph are connected based on the entity relationships between the candidate entities recorded in the knowledge graph, to obtain the edges of the topic correlation graph. Specifically, for a pair of nodes in the topic correlation graph, if there are n edges between the corresponding candidate entities in the knowledge graph, the weight of the edge between this pair of topic nodes in the topic correlation graph is n.
(2) Compute the relevance between the topic nodes and the training text
The initial relevance of each topic node is set based on the probability of the topic node appearing in the facts recorded in the knowledge graph.
The eigenvector centrality of each topic node is computed on the topic correlation graph to obtain the importance of each topic node, and the importance of each topic node is taken as the relevance between the topic node and the training text.
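For example, steps (1) and (2) might be sketched as follows with networkx; relation_count, an assumed helper returning the number of entity relationships between two candidate entities recorded in the knowledge graph, and the convergence parameters are illustrative assumptions.

```python
import networkx as nx

def topic_relevance(topic_nodes, relation_count):
    # Build the topic correlation graph: the weight of an edge is the number
    # n of edges between the corresponding candidate entities in the
    # knowledge graph.
    g = nx.Graph()
    g.add_nodes_from(topic_nodes)
    for idx, u in enumerate(topic_nodes):
        for v in topic_nodes[idx + 1:]:
            n = relation_count(u, v)
            if n > 0:
                g.add_edge(u, v, weight=n)
    # Eigenvector centrality as the importance of each topic node, taken as
    # its relevance to the training text.
    return nx.eigenvector_centrality(g, weight="weight", max_iter=1000)
```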
(3) Locate the neighbor nodes
For each topic node, the k-hop neighbor entities of the topic node in the knowledge graph are located as neighbor nodes.
An information propagation graph is constructed according to the progressive relationship between hop layers.
(4) Locate the strongly connected components
All strongly connected components on the information propagation graph are located, and the initial score of each strongly connected component is computed according to the maximum importance of the nodes in the strongly connected component, with O(|V|+|E|) time complexity, where V denotes the set of nodes of the information propagation graph and E denotes the set of edges of the information propagation graph.
(5) Compute the relevance between the neighbor nodes and the training text
According to the topological ordering, the initial scores of the strongly connected components containing the topic nodes are propagated along the topological ordering to the downstream strongly connected components, thereby updating the score of each strongly connected component.
The updated score of the strongly connected component containing each neighbor node is taken as the relevance between the neighbor node and the training text.
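A sketch of steps (4) and (5) is given below, using the condensation of the information propagation graph provided by networkx; the rule for combining scores arriving from several upstream components (the maximum is taken here) is an assumption for illustration.

```python
import networkx as nx

def neighbor_relevance(info_graph, topic_importance):
    """info_graph: directed information propagation graph; topic_importance:
    dict mapping topic nodes to their importance from step (2)."""
    # Condense the graph: each node of 'dag' is a strongly connected
    # component, with its member nodes stored in the 'members' attribute.
    dag = nx.condensation(info_graph)
    score = {}
    for c in dag.nodes:
        members = dag.nodes[c]["members"]
        # Initial score: maximum importance of the topic nodes in the
        # component (0 if the component contains no topic node).
        score[c] = max((topic_importance.get(v, 0.0) for v in members),
                       default=0.0)
    # Propagate scores downstream along the topological ordering.
    for c in nx.topological_sort(dag):
        for succ in dag.successors(c):
            score[succ] = max(score[succ], score[c])  # assumed combination
    # Relevance of each node = score of its strongly connected component.
    return {v: score[c] for c in dag.nodes for v in dag.nodes[c]["members"]}
```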
(6) Construct the concept graph
The topic nodes and the neighbor nodes serve as the nodes of the initial concept graph, and the corresponding nodes are connected according to the entity relationships recorded in the knowledge graph, to obtain the initial concept graph, as shown in Figure 7.
During the iterative training, that is, in step 850, the concept graph is optimized.
Exemplarily, in the process of learning node representations through iterative message passing between nodes based on the RGAT, before each round of iteration, the RGAT's attention to knowledge weakly related to the text can be reduced, and its attention to knowledge strongly related to the text strengthened, according to the relevance between the nodes and the training text and the edge weights in the forward propagation of the RGAT.
The way the concept graph is optimized during the iterative process is illustrated below through Example 1 and Example 2.
Example 1
Define the optimization problem to be solved: select a subgraph of the initial concept graph such that the sum of the first costs of all edges in the subgraph is less than or equal to a threshold, and the sum of the benefits of all edges in the subgraph is maximized. The solution to this optimization problem is the optimized concept graph.
(1) Before each round of iteration, obtain the first cost and the benefit of each edge in the current concept graph.
Exemplarily, the lower the average relevance, to the training text, of the two nodes connected by an edge, the larger the first cost of the edge.
Exemplarily, the higher the weight of an edge in the previous round of iteration, the larger the benefit of the edge.
(2) Obtain the optimized concept graph using an approximation algorithm for the above optimization problem.
Specifically, the edge with the largest ratio between benefit and first cost in the current concept graph is preferentially selected, and selection stops once the sum of the first costs of the selected edges reaches the threshold. In this way, a solution of the optimization problem, that is, an optimized concept graph, can be obtained in O(|E|*log|E|) time. The optimized concept graph serves as the concept graph in the next round of iteration.
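A minimal sketch of this greedy selection is given below; the representation of the input as (edge, benefit, first_cost) triples and the exact treatment of the edge that would cross the threshold are illustrative assumptions.

```python
def select_edges(edges, threshold):
    """Greedy approximation for Example 1: pick edges in descending order of
    benefit / first_cost until the accumulated first cost reaches the
    threshold W. The sort dominates, giving O(|E| log |E|) time."""
    ranked = sorted(edges, key=lambda e: e[1] / e[2], reverse=True)
    selected, total_cost = [], 0.0
    for edge, benefit, first_cost in ranked:
        if total_cost + first_cost > threshold:
            break  # assumed: stop before the budget W is exceeded
        selected.append(edge)
        total_cost += first_cost
    return selected  # edges of the optimized concept graph
```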
Example 2
Define the optimization problem to be solved: select a subset of the set of connected subgraphs of the initial concept graph such that, for every noun phrase M_d or relation phrase P_d in the training text, at least one corresponding node or edge is included in the subset, and the second cost of the connected subgraphs in the subset is the smallest among all ways of selecting a subset. The solution to this optimization problem is the optimized concept graph.
(1) Before each round of iteration, obtain the second cost of each edge in the current concept graph.
Compute the second cost of each connected subgraph in the concept graph. Exemplarily, the second cost of a connected subgraph is the average of the second costs of all edges in the connected subgraph.
Exemplarily, the lower the average relevance, to the training text, of the two nodes connected by an edge, and the lower the weight of the edge in the previous round of iteration, the larger the second cost of the edge.
(2) Obtain the optimized concept graph using an approximation algorithm for the above optimization problem.
Specifically, the subset starts in the state of an empty set, and the connected subgraph with the smallest second cost in the current concept graph is preferentially added to the subset, until for every noun phrase M_d or relation phrase P_d in the training text at least one corresponding node or edge is included in the subset. In this way, an approximate solution of the optimization problem, that is, an optimized concept graph, can be obtained in O(|V|+|E|) linear time. The optimized concept graph serves as the concept graph in the next round of iteration.
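The following sketch illustrates this greedy covering step; the representation of the connected subgraphs as (subgraph, second_cost) pairs and the predicate covers, which tests whether a subgraph contains a node or edge corresponding to a phrase, are illustrative assumptions.

```python
def select_components(components, phrases, covers):
    """Greedy approximation for Example 2: starting from the empty subset,
    add connected subgraphs in ascending order of their second cost until
    every noun phrase M_d or relation phrase P_d in the training text has at
    least one corresponding node or edge in the subset."""
    subset = []
    uncovered = set(phrases)
    for subgraph, cost in sorted(components, key=lambda c: c[1]):
        if not uncovered:
            break
        subset.append(subgraph)
        # Remove the phrases now covered by the newly added subgraph.
        uncovered = {p for p in uncovered if not covers(subgraph, p)}
    return subset  # the optimized concept graph is the union of the subgraphs
```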
Figure 11 shows a schematic flowchart of a text processing method 900 provided by an embodiment of this application. The method can be executed by an apparatus or device capable of text processing. For example, the apparatus may be a cloud service device, or a terminal device such as a computer or server whose computing power is sufficient to execute the text processing method, or a system composed of a cloud service device and a terminal device. Exemplarily, method 900 may be executed by the execution device 210 in Figure 2, the execution device 110 in Figure 1, or a local device.
For example, method 900 may specifically be executed by the execution device 210 shown in Figure 2, and the text to be processed in method 900 may be input data given by the client device 240 shown in Figure 2.
The model used in the text processing method 900 in Figure 11 may be constructed by the method in Figure 5 or Figure 10 described above. For related descriptions, refer to the aforementioned method 500 or method 800; to avoid unnecessary repetition, repeated descriptions are omitted as appropriate when introducing method 900 below.
Method 900 includes steps 910 to 960, which are described below.
910: Obtain the text to be processed.
920: Obtain a knowledge graph.
930: Determine the text encoding of the text to be processed.
940: Determine the concept graph of the text to be processed based on the knowledge graph.
950: Input the concept graph into the target RGAT for processing, to obtain the knowledge encoding of the text to be processed.
960: Determine the processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
The target RGAT is obtained by inputting the initial concept graph of the training text into the RGAT for training. During the training process, the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text, and the weights of the edges in the second concept graph, where i is a positive integer. The first concept graph is a subgraph of the initial concept graph, and the second concept graph is a subgraph of the initial concept graph. The nodes of the initial concept graph include topic nodes, where the topic nodes include candidate entities in the knowledge graph corresponding to the target noun phrases in the training text, and the edges between the nodes of the initial concept graph are used to represent the entity relationships between the nodes of the initial concept graph.
It should be understood that the step numbers in the embodiments of this application are used only for convenience of description and do not limit the order in which the steps are executed.
In step 940, the topic nodes in the concept graph of the text to be processed may include candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed, and the edges between the nodes in the concept graph of the text to be processed are used to represent the entity relationships between the nodes in the concept graph.
According to the solution of the embodiments of this application, while the RGAT learns representations through iterative message passing between entity nodes, that is, during training, the concept graph is optimized according to the relevance of the entity nodes to the text and the weights of the edges; for example, nodes with lower relevance to the training text and/or edges with smaller weights are pruned. This helps reduce the model's attention to knowledge weakly related to the text and strengthen its attention to knowledge strongly related to the text, thereby improving the expressive ability of the RGAT and, in turn, the accuracy of downstream tasks.
Optionally, the initial concept graph further includes neighbor nodes, where the neighbor nodes include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the training text.
Correspondingly, the neighbor nodes in the concept graph of the text to be processed may include the neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrases in the text to be processed.
Optionally, the topic nodes in the initial concept graph include all candidate entities in the knowledge graph corresponding to the target noun phrases in the training text.
The topic nodes in the concept graph of the text to be processed may include all candidate entities in the knowledge graph corresponding to the target noun phrases in the text to be processed.
Optionally, that the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text and the weights of the edges in the second concept graph includes: selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio between an edge's benefit and its first cost, until the sum of the first costs of the selected edges is greater than a threshold, where the benefit of an edge in the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge in the second concept graph is negatively correlated with the relevance, to the training text, of the two nodes connected by the edge.
Optionally, that the first concept graph in the (i+1)-th iteration is determined according to the relevance between the nodes of the second concept graph in the i-th iteration and the training text and the weights of the edges in the second concept graph includes: selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of their second costs, until all the selected connected subgraphs include at least one candidate entity corresponding to the target noun phrase.
Optionally, the relevance between a topic node and the training text is determined according to the eigenvector centrality of the topic node on the topic correlation graph, where the nodes of the topic correlation graph include the topic nodes, and the weight of an edge in the topic correlation graph is determined according to the number of entity relationships between the entities in the knowledge graph corresponding to the two nodes connected by the edge.
Optionally, the relevance between a neighbor node and the training text is determined according to the score of the strongly connected component, on the information propagation graph, in which the neighbor node is located, where the nodes of the information propagation graph include the nodes of the initial concept graph, and when a first node in the initial concept graph is a one-hop neighbor of a second node, a directed edge pointing from the second node to the first node exists between the second node and the first node in the information propagation graph.
Optionally, the scores of the strongly connected components on the information propagation graph are obtained by propagating the initial scores of the strongly connected components in which the topic nodes are located to the downstream strongly connected components according to a topological ordering, where the initial score of the strongly connected component in which a topic node is located is determined according to the maximum importance of the nodes in that strongly connected component.
Optionally, method 900 further includes: outputting, based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed, a knowledge path in the concept graph of the text to be processed, where the knowledge path is used to indicate the basis for judging the processing result.
A knowledge path refers to a path between two nodes in the concept graph. For example, the k-hop knowledge path between node e_q and node e_{q+k} can be expressed as (e_q, r_q, e_{q+1}, r_{q+1}, …, r_{q+k-1}, e_{q+k}), where (e_q, r_q, e_{q+1}) is a triple, r_q represents the entity relationship between the two nodes, and so on; q is a positive integer.
Knowledge paths can improve the interpretability of the model and provide users with a basis for judging the processing result, which helps improve user trust. Moreover, the concept graph in the solution of the embodiments of this application contains comprehensive and accurate knowledge, which helps guarantee the completeness and accuracy of the knowledge paths.
Optionally, the weight of the knowledge path is determined according to the attention weights of the triples in the knowledge path.
The attention weight of a triple is used to indicate the importance of the triple in the reasoning process of the RGAT, and the attention weight of the knowledge path is used to indicate the importance of the knowledge path.
Exemplarily, the weight of the knowledge path may be the average of the attention weights of the triples in the knowledge path.
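As an illustration, given the attention weights of the triples, the weight of a knowledge path might be computed as follows; the dictionary attention mapping each triple (j, r, i) to its attention weight is an assumption.

```python
def knowledge_path_weight(path_triples, attention):
    """path_triples: the triples (e_q, r_q, e_{q+1}), ... along the path;
    attention: assumed dict mapping each triple to its attention weight.
    The path weight is the average of the triple attention weights."""
    weights = [attention[t] for t in path_triples]
    return sum(weights) / len(weights)
```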
Exemplarily, during the inference of the RGAT, the representation h_i^{(l+1)} of node i at layer l+1 can satisfy the following formula:

h_i^{(l+1)} = σ( Σ_{(j,r,i)∈ψ, r∈R'} α_{ij}^{r} W_r h_j^{(l)} )

where ψ denotes the set of all triples in the knowledge graph, and W_r denotes the transformation corresponding to relation r. σ denotes an activation function; for example, the activation function may be a binary step function, a linear activation function, a Sigmoid function, a rectified linear unit (ReLU), or a leaky ReLU (LeakyReLU). LeakyReLU may be used as the activation function in the embodiments of this application. h_j^{(l)} denotes the representation of node i's neighbor node j at layer l. α_{ij}^{r} denotes the importance of node j to node i under relation r; the attention weight of the triple (j, r, i) is α_{ij}^{r}. R' denotes the set of relations, and l is an integer greater than or equal to 0.
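A minimal numpy sketch of one such propagation layer is given below; the relation-specific square matrices W_r, the LeakyReLU slope, and the dictionary-based representation of node features and attention weights are assumptions for illustration rather than the exact implementation of the embodiment.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # LeakyReLU, used here as the activation function σ.
    return np.where(x > 0, x, slope * x)

def rgat_layer(h, triples, alpha, W):
    """One propagation layer of the formula above.
    h: dict node -> representation at layer l (vectors of equal dimension);
    triples: the triples (j, r, i) in ψ restricted to the concept graph;
    alpha: dict mapping a triple (j, r, i) to its attention weight α_ij^r;
    W: dict relation r -> assumed square transformation matrix W_r."""
    dim = next(iter(h.values())).shape[0]
    agg = {i: np.zeros(dim) for i in h}
    for (j, r, i) in triples:
        # Accumulate α_ij^r * W_r * h_j^(l) over the incoming triples of i.
        agg[i] += alpha[(j, r, i)] * (W[r] @ h[j])
    # Representation of every node at layer l + 1.
    return {i: leaky_relu(v) for i, v in agg.items()}
```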
Figure 12 shows a schematic flowchart of a text classification method according to an embodiment of this application. The method shown in Figure 12 can be regarded as a specific implementation of the method shown in Figure 11. For brevity, some descriptions are omitted as appropriate when describing method 1000.
Method 1000 includes steps 1010 to 1040.
1010: Perform knowledge extraction on the text to be processed.
Exemplarily, step 1010 may be performed by the knowledge extraction module 410 in Figure 4.
Exemplarily, the knowledge graph is a knowledge graph of the business domain related to the text data. For ease of understanding and description, method 1000 is described below taking the medical domain as the business domain, which does not limit the solutions of the embodiments of this application.
As shown in Figure 10, step 1010 may include: identifying the knowledge triples T_d in the text d to be processed based on the knowledge graph, to obtain the noun phrases M_d and the relation phrases P_d in the knowledge triples, that is, the target noun phrases and target relation phrases in the text to be processed.
1020: Generate the text-level encoding of the text to be processed.
Exemplarily, step 1020 may be performed by the text encoding module 420 in Figure 4.
The text-level encoding of the text to be processed may include the text-level encoding of each noun phrase M_d in the text d to be processed, the text-level encoding of each relation phrase P_d, and the text-level encoding of the text sequence to be processed.
1030: Generate the knowledge-level encoding of the text to be processed through the target RGAT.
In a possible implementation, the text-level encoding of the text to be processed may participate in the process in which the target RGAT generates the knowledge-level encoding of the text to be processed.
The target RGAT used in Figure 12 may be obtained through training by the method 800 shown in Figure 10; for the specific training method, refer to the description of method 800, which is not repeated here.
Exemplarily, step 1030 may be performed by the knowledge encoding module 430 in Figure 4.
Exemplarily, all candidate entities corresponding to each noun phrase M_d in the text to be processed are located in the knowledge graph, and the k-hop neighbor entities of each candidate entity are located, to serve as the nodes of the concept graph of the text to be processed. The nodes of the concept graph of the text to be processed are connected according to the entity relationship between each pair of entities recorded in the knowledge graph, to obtain the concept graph of the text to be processed. The concept graph of the text to be processed contains complete knowledge.
For the specific construction of the concept graph of the text to be processed, refer to the construction of the initial concept graph of the training text described above, replacing the training text in the relevant description with the text to be processed; details are not repeated here.
1040: Obtain the predicted classification result based on the text-level encoding and the knowledge-level encoding.
Exemplarily, step 1040 may be performed by the task processing module 440 in Figure 4.
Step 1040 may include concatenating the text-level encoding and the knowledge-level encoding into a single vector to obtain a text fusion encoding, and inputting the text fusion encoding into a classifier to obtain the classification result of the text to be processed.
Figure 13 shows a schematic diagram of a text classification result according to an embodiment of this application. As shown in Figure 13, the gist of the text is: "…diabetes has become an epidemic, and the number of patients with type 2 diabetes is increasing at an alarming rate. We know that controlling diet and a Western lifestyle lead to type 2 diabetes and cardiovascular disease…". The solution of the embodiments of this application determines that this text contains false information. The primary basis for the judgment is the knowledge path obtained from the concept graph ('diet', 'reducesRiskFor', 'atherosclerosis', 'causes', 'cardiovascular diseases'), that is, (diet, reduces risk for (-), atherosclerosis, causes (+), cardiovascular diseases), with a weight of 0.99998. The secondary basis for the judgment is the knowledge path obtained from the concept graph ('diet', 'alleviates', 'diabetes'), that is, (diet, alleviates (-), diabetes), with a weight of 0.57651. These two knowledge paths contradict the statement in the text that "controlling diet leads to type 2 diabetes and cardiovascular disease", so the text is judged to contain false information. The solution of the embodiments of this application can improve the accuracy of the classification task while generating weighted knowledge paths as an interpretable basis for the classification.
Table 1 shows a comparison of performance metrics for text classification on the diabetes and cancer datasets between the solution of the embodiments of this application and the existing knowledge guided graph attention network for detecting healthcare misinformation (DETERRENT). Four performance metrics are shown in Table 1: accuracy, precision, recall, and F1 score. In Table 1, the concept graph optimization scheme used during the training of the RGAT model in the solution of the embodiments of this application is the scheme of Example 1.
Table 1 (the table content is provided as image PCTCN2022103682-appb-000007 in the original publication)
As can be seen from Table 1, the solution of the embodiments of this application improves on the existing solution by 1 to 5 percentage points in each of the above metrics. The solution of the embodiments of this application can effectively improve the accuracy of classification results.
Table 2 shows a comparison of performance metrics for text classification on the diabetes and cancer datasets between the solution of the embodiments of this application and the existing DETERRENT. In Table 2, the concept graph optimization scheme used during the training of the RGAT model in the solution of the embodiments of this application is the scheme of Example 2.
Table 2 (the table content is provided as image PCTCN2022103682-appb-000008 in the original publication)
As can be seen from Table 2, the solution of the embodiments of this application improves on the existing solution by 1 to 58 percentage points in each of the above metrics. The solution of the embodiments of this application can effectively improve the accuracy of classification results.
The apparatus of the embodiments of this application is described in detail below with reference to the accompanying drawings. It should be understood that the apparatus described below can execute the foregoing methods of the embodiments of this application. To avoid unnecessary repetition, repeated descriptions are omitted as appropriate when introducing the apparatus of the embodiments of this application.
Figure 14 is a schematic block diagram of a training apparatus according to an embodiment of this application. The training apparatus 3000 shown in Figure 14 includes an acquisition unit 3010 and a processing unit 3020.
The training apparatus can be used to execute method 500 or method 800 of the embodiments of this application.
In a possible implementation, the acquisition unit 3010 may perform steps 510 and 520 described above, and the processing unit 3020 may perform steps 530 to 540 described above. It should be noted that the acquisition unit used to perform step 510 and the acquisition unit used to perform step 520 may be the same or different.
Figure 15 is a schematic block diagram of a text processing apparatus according to an embodiment of this application. The apparatus 4000 shown in Figure 15 includes an acquisition unit 4010 and a processing unit 4020.
The apparatus 4000 can be used to execute method 900 of the embodiments of this application.
In a possible implementation, the acquisition unit 4010 may perform steps 910 and 920 described above, and the processing unit 4020 may perform steps 930 to 960 described above. It should be noted that the acquisition unit used to perform step 910 and the acquisition unit used to perform step 920 may be the same or different.
It should be noted that the above training apparatus 3000 and apparatus 4000 are embodied in the form of functional units. The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.
For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and a memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components supporting the described functions.
Therefore, the units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
Figure 16 is a schematic diagram of the hardware structure of a training apparatus provided by an embodiment of this application. The training apparatus 5000 shown in Figure 16 (which may specifically be a computer device) includes a memory 5001, a processor 5002, a communication interface 5003, and a bus 5004, where the memory 5001, the processor 5002, and the communication interface 5003 are communicatively connected to one another through the bus 5004.
The memory 5001 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 5001 may store a program; when the program stored in the memory 5001 is executed by the processor 5002, the processor 5002 is configured to perform the steps of the training method of the embodiments of this application, for example, to execute method 500 or method 800 above.
The processor 5002 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs to implement the training method of the method embodiments of this application.
The processor 5002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the training method of this application can be completed by integrated logic circuits of hardware in the processor 5002 or by instructions in the form of software.
The above processor 5002 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 5001; the processor 5002 reads the information in the memory 5001 and, in combination with its hardware, completes the functions required to be performed by the units included in the apparatus shown in Figure 14, or executes method 500 or method 800 of the method embodiments of this application.
The communication interface 5003 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the apparatus 5000 and other devices or communication networks. For example, the training text and the knowledge graph can be obtained through the communication interface 5003.
The bus 5004 may include a path for transferring information between the components of the apparatus 5000 (for example, the memory 5001, the processor 5002, and the communication interface 5003).
Figure 17 is a schematic diagram of the hardware structure of a text processing apparatus provided by an embodiment of this application. The apparatus 6000 shown in Figure 17 (which may specifically be a computer device) includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004, where the memory 6001, the processor 6002, and the communication interface 6003 are communicatively connected to one another through the bus 6004.
The memory 6001 may be a ROM, a static storage device, a dynamic storage device, or a RAM. The memory 6001 may store a program; when the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 is configured to perform the steps of the text processing method of the embodiments of this application, for example, to execute method 900 above.
The processor 6002 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs to implement the text processing method of the method embodiments of this application.
The processor 6002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the text processing method of this application can be completed by integrated logic circuits of hardware in the processor 6002 or by instructions in the form of software.
The above processor 6002 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 6001; the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions required to be performed by the units included in the apparatus shown in Figure 15, or executes method 900 of the method embodiments of this application.
The communication interface 6003 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the apparatus 6000 and other devices or communication networks. For example, the text to be processed and the knowledge graph can be obtained through the communication interface 6003.
The bus 6004 may include a path for transferring information between the components of the apparatus 6000 (for example, the memory 6001, the processor 6002, and the communication interface 6003).
An embodiment of this application further provides a computer-readable medium storing program code for execution by a device, where the program code includes instructions for performing any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
An embodiment of this application further provides a computer program product containing instructions; when the computer program product runs on a computer, the computer is caused to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
An embodiment of this application further provides a chip, where the chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
Optionally, as an implementation, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor is configured to perform any one of the training methods for a text processing model or the text processing methods in the embodiments of this application.
It should also be understood that, in the embodiments of this application, the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor. A part of the processor may further include a non-volatile random access memory. For example, the processor may further store information about the device type.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art may appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation shall not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a universal serial bus flash disk (USB flash disk, UFD, also referred to as a USB drive or flash drive), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (29)

  1. A training method for a text processing model, comprising:
    obtaining training text;
    obtaining a knowledge graph;
    determining an initial concept graph of the training text based on the knowledge graph, wherein nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    inputting the initial concept graph into a relation-aware attention network (RGAT) model for training to obtain a target RGAT model, wherein during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, and the first concept graph and the second concept graph are both subgraphs of the initial concept graph.
  2. The training method according to claim 1, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  3. The training method according to claim 1 or 2, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
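By way of illustration only, the following Python sketch implements the greedy budgeted edge selection described in claim 3. The concrete benefit and cost functions, the data layout, and the example values are assumptions introduced here for illustration; the claim fixes only the direction of the correlations and the stopping condition.

```python
# Illustrative sketch of the edge selection in claim 3. Benefit is taken to be
# the edge weight itself, and cost the inverse of the summed relevance of the
# edge's endpoints; these are assumptions, since the claim only fixes the sign
# of the correlations.

def select_edges(edges, weights, relevance, budget):
    """edges: list of (u, v); weights[(u, v)]: edge weight in iteration i;
    relevance[n]: relevance of node n to the training text; budget: threshold."""
    def benefit(e):
        return weights[e]                           # positively correlated with weight
    def cost(e):
        u, v = e
        return 1.0 / (relevance[u] + relevance[v])  # negatively correlated with relevance
    ranked = sorted(edges, key=lambda e: benefit(e) / cost(e), reverse=True)
    selected, spent = [], 0.0
    for e in ranked:
        selected.append(e)
        spent += cost(e)
        if spent > budget:                          # stop once total first cost exceeds threshold
            break
    return selected

# Example: two topic nodes and one weakly relevant neighbor
edges = [("a", "b"), ("b", "c")]
weights = {("a", "b"): 0.9, ("b", "c"): 0.2}
relevance = {"a": 0.8, "b": 0.7, "c": 0.1}
print(select_edges(edges, weights, relevance, budget=1.0))
```

The sketch retains the edge that first pushes the accumulated cost past the threshold, matching the reading of "until the sum of the first costs of the selected edges is greater than a threshold."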
  4. The training method according to claim 1 or 2, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
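The subgraph selection of claim 4 admits a similarly compact sketch. The following Python fragment is illustrative only; the second-cost values, the subgraph representation, and the coverage test over target noun phrases are assumptions.

```python
# Illustrative sketch of the subgraph selection in claim 4, assuming each
# connected subgraph carries a precomputed "second cost" and a set of noun
# phrases for which it contains a candidate entity.

def select_subgraphs(subgraphs, costs, candidates_of, targets):
    """subgraphs: list of subgraph ids; costs[g]: second cost of subgraph g;
    candidates_of[g]: noun phrases with a candidate entity in g;
    targets: target noun phrases that must be covered."""
    selected, covered = [], set()
    for g in sorted(subgraphs, key=lambda g: costs[g]):  # ascending second cost
        selected.append(g)
        covered |= candidates_of[g]
        if covered >= targets:  # every target phrase has at least one candidate
            break
    return selected

print(select_subgraphs(
    subgraphs=["g1", "g2", "g3"],
    costs={"g1": 0.2, "g2": 0.5, "g3": 0.9},
    candidates_of={"g1": {"graph"}, "g2": {"text"}, "g3": {"graph", "text"}},
    targets={"graph", "text"},
))
```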
  5. The training method according to any one of claims 1 to 4, wherein the relevance of a topic node to the training text is determined based on the eigenvector centrality of the topic node on a topic-relevance graph, the nodes of the topic-relevance graph include the topic nodes, and the weight of an edge of the topic-relevance graph is determined based on the number of entity relationships in the knowledge graph between the entities corresponding to the two nodes connected by the edge.
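As an illustrative aid to claim 5, the sketch below computes eigenvector centrality by power iteration on a small topic-relevance graph. The iteration count, the dictionary-based graph representation, and the toy weights are assumptions, not the claimed implementation.

```python
# Power-iteration sketch of eigenvector centrality on the topic-relevance graph
# of claim 5. Edge weights are assumed to already count the entity relationships
# between the corresponding knowledge-graph entities.

def eigenvector_centrality(nodes, weight, iters=100):
    """weight[(u, v)]: symmetric edge weight; absent pairs default to 0."""
    x = {n: 1.0 for n in nodes}
    for _ in range(iters):
        y = {u: sum(weight.get((u, v), 0.0) * x[v] for v in nodes) for u in nodes}
        norm = sum(v * v for v in y.values()) ** 0.5 or 1.0
        x = {n: y[n] / norm for n in y}
    return x  # centrality per topic node, used as node-to-text relevance

w = {("a", "b"): 2.0, ("b", "a"): 2.0, ("b", "c"): 1.0, ("c", "b"): 1.0}
print(eigenvector_centrality(["a", "b", "c"], w))
```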
  6. The training method according to any one of claims 1 to 5, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  7. The training method according to claim 6, wherein the relevance of a neighbor node to the training text is determined based on the score of the strongly connected component of an information propagation graph in which the neighbor node is located, the nodes of the information propagation graph include the nodes of the initial concept graph, and, where a first node in the initial concept graph is a one-hop neighbor of a second node, the information propagation graph contains a directed edge pointing from the second node to the first node.
  8. The training method according to claim 7, wherein the scores of the strongly connected components of the information propagation graph are obtained by propagating, in topological order, the initial scores of the strongly connected components in which the topic nodes are located to downstream strongly connected components, and the initial score of a strongly connected component in which a topic node is located is determined based on the maximum importance of the nodes in that strongly connected component.
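For claims 7 and 8, the following sketch propagates scores over the condensation of the information propagation graph. It assumes the strongly connected components and their topological order are already available (for example, from Tarjan's algorithm), and the max-based propagation rule is an assumption; the claims fix only the initialization and the topological direction of the propagation.

```python
# Illustrative sketch of the score propagation in claims 7 and 8 over the
# condensation DAG of the information propagation graph. A component inherits
# the maximum of its own initial score and its upstream scores; this rule is
# an assumption for illustration.

def propagate_scores(topo_order, dag_edges, initial):
    """topo_order: SCC ids in topological order; dag_edges[c]: downstream SCCs;
    initial[c]: max node importance for SCCs containing topic nodes, else 0."""
    score = dict(initial)
    for c in topo_order:                                 # upstream components first
        for d in dag_edges.get(c, []):
            score[d] = max(score.get(d, 0.0), score[c])  # push score downstream
    return score

print(propagate_scores(
    topo_order=["C1", "C2", "C3"],
    dag_edges={"C1": ["C2"], "C2": ["C3"]},
    initial={"C1": 0.9, "C2": 0.0, "C3": 0.0},
))
```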
  9. A text processing method, comprising:
    obtaining text to be processed;
    obtaining a knowledge graph;
    determining a text encoding of the text to be processed;
    determining a concept graph of the text to be processed based on the knowledge graph;
    processing the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, wherein the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, and during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, the first concept graph and the second concept graph are both subgraphs of the initial concept graph, the nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    determining a processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
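As a toy illustration of the final step of claim 9, the sketch below fuses a text encoding and a knowledge encoding into a single processing result. The concatenation-plus-linear-head fusion rule and all values are assumptions; the claim does not fix how the two encodings are combined.

```python
# Illustrative fusion of the two encodings in claim 9: concatenate the text
# encoding and the knowledge encoding, then apply a linear head (dot product
# with a weight vector). The fusion rule and toy vectors are assumptions.

def process(text_enc, knowledge_enc, head_weights):
    fused = text_enc + knowledge_enc  # concatenate the two encodings
    return sum(w * x for w, x in zip(head_weights, fused))

print(process([0.2, 0.5], [0.1, 0.7], [1.0, -0.5, 0.3, 0.8]))
```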
  10. The method according to claim 9, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  11. The method according to claim 9 or 10, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  12. The method according to claim 9 or 10, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  13. The method according to any one of claims 9 to 12, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  14. A training apparatus for a text processing model, comprising:
    an obtaining unit, configured to:
    obtain training text; and
    obtain a knowledge graph; and
    a processing unit, configured to:
    determine an initial concept graph of the training text based on the knowledge graph, wherein nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    input the initial concept graph into a relation-aware attention network (RGAT) model for training to obtain a target RGAT model, wherein during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, and the first concept graph and the second concept graph are both subgraphs of the initial concept graph.
  15. The training apparatus according to claim 14, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  16. The training apparatus according to claim 14 or 15, wherein the processing unit is specifically configured to:
    select edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  17. The training apparatus according to claim 14 or 15, wherein the processing unit is specifically configured to:
    select connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  18. The training apparatus according to any one of claims 14 to 17, wherein the relevance of a topic node to the training text is determined based on the eigenvector centrality of the topic node on a topic-relevance graph, the nodes of the topic-relevance graph include the topic nodes, and the weight of an edge of the topic-relevance graph is determined based on the number of entity relationships in the knowledge graph between the entities corresponding to the two nodes connected by the edge.
  19. The training apparatus according to any one of claims 14 to 18, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  20. The training apparatus according to claim 19, wherein the relevance of a neighbor node to the training text is determined based on the score of the strongly connected component of an information propagation graph in which the neighbor node is located, the nodes of the information propagation graph include the nodes of the initial concept graph, and, where a first node in the initial concept graph is a one-hop neighbor of a second node, the information propagation graph contains a directed edge pointing from the second node to the first node.
  21. The training apparatus according to claim 20, wherein the scores of the strongly connected components of the information propagation graph are obtained by propagating, in topological order, the initial scores of the strongly connected components in which the topic nodes are located to downstream strongly connected components, and the initial score of a strongly connected component in which a topic node is located is determined based on the maximum importance of the nodes in that strongly connected component.
  22. A text processing apparatus, comprising:
    an obtaining unit, configured to:
    obtain text to be processed; and
    obtain a knowledge graph; and
    a processing unit, configured to:
    determine a text encoding of the text to be processed;
    determine a concept graph of the text to be processed based on the knowledge graph;
    process the concept graph of the text to be processed through a target RGAT to obtain a knowledge encoding of the text to be processed, wherein the target RGAT is obtained by inputting an initial concept graph of training text into an RGAT for training, and during the training, a first concept graph in an (i+1)-th iteration is determined based on the relevance of nodes in a second concept graph in an i-th iteration to the training text and on the weights of edges in the second concept graph, i is a positive integer, the first concept graph and the second concept graph are both subgraphs of the initial concept graph, the nodes in the initial concept graph comprise topic nodes, the topic nodes comprise candidate entities in the knowledge graph corresponding to a target noun phrase in the training text, and edges between the nodes in the initial concept graph represent entity relationships between the nodes in the initial concept graph; and
    determine a processing result of the text to be processed based on the text encoding of the text to be processed and the knowledge encoding of the text to be processed.
  23. The apparatus according to claim 22, wherein the topic nodes comprise all candidate entities in the knowledge graph corresponding to the target noun phrase.
  24. The apparatus according to claim 22 or 23, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting edges of the second concept graph as edges of the first concept graph in descending order of the ratio of an edge's benefit to its first cost, until the sum of the first costs of the selected edges is greater than a threshold, wherein the benefit of an edge of the second concept graph is positively correlated with the weight of the edge in the i-th iteration, and the first cost of an edge of the second concept graph is negatively correlated with the relevance of the two nodes connected by the edge to the training text.
  25. The apparatus according to claim 22 or 23, wherein the determination of the first concept graph in the (i+1)-th iteration based on the relevance of the nodes in the second concept graph in the i-th iteration to the training text and on the weights of the edges in the second concept graph comprises:
    selecting connected subgraphs of the second concept graph as connected subgraphs of the first concept graph in ascending order of the second costs of the connected subgraphs, until the selected connected subgraphs collectively include at least one candidate entity corresponding to the target noun phrase.
  26. The apparatus according to any one of claims 22 to 25, wherein the initial concept graph further includes neighbor nodes, and the neighbor nodes include neighbor entities, in the knowledge graph, of the candidate entities corresponding to the target noun phrase.
  27. A text processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 8 or claims 9 to 13.
  28. A computer-readable storage medium, wherein the computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method according to any one of claims 1 to 8 or claims 9 to 13.
  29. A computer program product containing instructions, wherein, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 8 or claims 9 to 13.
PCT/CN2022/103682 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device WO2024007119A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Publications (1)

Publication Number Publication Date
WO2024007119A1 true WO2024007119A1 (en) 2024-01-11

Family

ID=89454678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103682 WO2024007119A1 (en) 2022-07-04 2022-07-04 Training method for text processing model, and text processing method and device

Country Status (1)

Country Link
WO (1) WO2024007119A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017150820A1 (en) * 2016-02-29 2017-09-08 경기대학교 산학협력단 Knowledge base-based conceptual-graph expansion system
CN110727806A (en) * 2019-12-17 2020-01-24 北京百度网讯科技有限公司 Text processing method and device based on natural language and knowledge graph
CN113255918A (en) * 2021-04-13 2021-08-13 国家计算机网络与信息安全管理中心 General knowledge generation reasoning method for strengthening aggregation knowledge guidance
CN114138985A (en) * 2022-02-08 2022-03-04 深圳希施玛数据科技有限公司 Text data processing method and device, computer equipment and storage medium
CN114281956A (en) * 2021-09-30 2022-04-05 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949710

Country of ref document: EP

Kind code of ref document: A1