CN117932082A - Text content reference resolution method, device, equipment and storage medium - Google Patents

Text content reference resolution method, device, equipment and storage medium

Info

Publication number
CN117932082A
Authority
CN
China
Prior art keywords: entity, entity data, data, relation, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410036611.5A
Other languages
Chinese (zh)
Inventor
孔令格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202410036611.5A priority Critical patent/CN117932082A/en
Publication of CN117932082A publication Critical patent/CN117932082A/en
Pending legal-status Critical Current

Abstract

The embodiments of the application belong to the field of financial technology, apply to text content reference resolution scenarios, and relate to a text content reference resolution method, device, equipment and storage medium. The method comprises: obtaining a text to be subjected to reference resolution; extracting all entity data contained in the text; performing synonym classification and arrangement on all entity data contained in the text; inputting the classification result into a representation learning model trained on a target knowledge graph to obtain corresponding entity relation vectors containing semantic information; and performing reference resolution on the entity data contained in the text according to those entity relation vectors containing semantic information. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that a financial processing system is better assisted in performing text content reference resolution.

Description

Text content reference resolution method, device, equipment and storage medium
Technical Field
The application relates to the field of financial technology, applies to text content reference resolution scenarios, and in particular relates to a text content reference resolution method, device, equipment and storage medium.
Background
With the rapid development of the internet, industries of all kinds rely on it to seek breakthroughs, and in recent years the financial industry has been expanding its online business around the internet. The financial industry involves large volumes of traffic and data, so processing financial business text requires a better understanding of what the text means.
Reference resolution is an important task in natural language processing. Its objective is to resolve the reference relationships of noun phrases in a piece of text, that is, to determine all the places where the same entity (such as a person, place or article) appears in the text, so that the meaning of the text can be better understood.
Current reference resolution methods generally focus on the input text itself and concentrate on mining the information it contains. For example, some researchers use an end-to-end neural network to compute vector representations of phrases in the text together with a head-attention mechanism over those phrases to achieve reference resolution. Other researchers use a question-answering formulation: they first extract candidate mentions from the text, then take the sentence in which each mention occurs as the question and the whole text as the context, splice the two together, and finally extract all co-referring words for the mention through a question-answering model.
Disclosure of Invention
The embodiments of the application aim to provide a text content reference resolution method, device, equipment and storage medium, in order to solve the problem that existing reference resolution methods process only the input text, so the information they obtain is very limited and a financial processing system cannot be effectively assisted in performing text content reference resolution.
In order to solve the above technical problems, the embodiment of the present application provides a text content reference resolution method, which adopts the following technical scheme:
A text content reference resolution method comprising the steps of:
Acquiring a target knowledge graph;
analyzing the target knowledge graph to obtain all entity data and relation characterization data among the entity data;
inputting all the entity data and the relation characterization data among all the entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information according to the output result of the representation learning model, wherein the entity relation vectors containing semantic information are formed by fitting head entity data, relation characterization data and tail entity data;
Obtaining a text to be subjected to reference resolution;
inputting the text to be subjected to reference resolution into a preset entity extraction model, and extracting all entity data contained in the text according to the entity extraction model;
performing synonym classification and arrangement on all entity data contained in the text to be subjected to reference resolution according to a preset synonym dictionary, to obtain classified and arranged entity data to be trained;
taking the entity relation vector containing semantic information as a supervision signal, inputting the entity data to be trained into the representation learning model, and obtaining the entity relation vector containing semantic information corresponding to the entity data to be trained;
And carrying out reference resolution on the entity data contained in the text to be subjected to reference resolution according to the entity relation vector containing semantic information corresponding to the entity data to be trained.
Further, the step of analyzing the target knowledge graph to obtain all entity data and the relationship characterization data among all entity data specifically includes:
Performing preliminary analysis on the target knowledge graph to obtain all graph nodes contained in the target knowledge graph and connecting lines among all graph nodes;
determining the entity data corresponding to each node according to the graph node identifiers and a preset first reference table, wherein the first reference table contains the mapping relations between the graph node identifiers and the entity data;
and determining the relation characterization data corresponding to the connecting lines among all graph nodes according to the connecting line identifiers and a preset second reference table, wherein the second reference table contains the mapping relations between the connecting line identifiers and the relation characterization data.
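The two-table lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the node and edge identifier symbols, the table contents, and the dictionary-based graph encoding are all assumptions made purely for demonstration.

```python
# Hedged sketch of the two-reference-table parse: node identifiers map to
# entity data via the first table, connecting-line identifiers map to
# relation characterization data via the second table.

def parse_knowledge_graph(nodes, edges, node_table, edge_table):
    """nodes: {node_id: identifier}; edges: {(head_id, tail_id): identifier}."""
    entities = {nid: node_table[sym] for nid, sym in nodes.items()}
    relations = {pair: edge_table[sym] for pair, sym in edges.items()}
    return entities, relations

# Toy two-node insurance graph (illustrative identifiers and tables).
nodes = {"n1": "SYM_INSURER", "n2": "SYM_POLICY"}
edges = {("n1", "n2"): "SYM_UNDERWRITES"}
node_table = {"SYM_INSURER": "insurer", "SYM_POLICY": "policy"}
edge_table = {"SYM_UNDERWRITES": "underwrites"}

entities, relations = parse_knowledge_graph(nodes, edges, node_table, edge_table)
```

With this encoding, every graph node and connecting line resolves to its entity data or relation characterization data in a single dictionary lookup.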
Further, before the step of inputting the all entity data and the relationship characterization data among all entity data into a preset representation learning model, and obtaining an entity relationship vector containing semantic information according to an output result of the representation learning model, the method further includes:
deploying the target knowledge graph as a learning reference graph into the representation learning model;
initializing the target knowledge graph to obtain the entity position vectors and relation characterization vectors contained in the target knowledge graph, wherein the entity position vectors are obtained from the position information of all entity data in the target knowledge graph, each relation characterization vector is composed of modular length information and angle information, and the modular length information and angle information are calculated from the position information of two entity data with a connection relation in the target knowledge graph.
Further, the step of inputting the all entity data and the relationship characterization data among the all entity data into a preset representation learning model, and obtaining the entity relationship vector containing semantic information according to the output result of the representation learning model specifically includes:
Identifying entity position vectors corresponding to all the entity data respectively through comparison;
selecting one entity data from all the entity data as head entity data;
Selecting one entity data from all the entity data as tail entity data;
calculating modular length information and angle information between the head entity data and the tail entity data based on entity position vectors respectively corresponding to the head entity data and the tail entity data;
Judging whether direct relation representation data exists between the head entity data and the tail entity data according to the target knowledge graph;
If direct relation representation data exists between the head entity data and the tail entity data, constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the relation representation data;
If no direct relation representation data exists between the head entity data and the tail entity data, indirect relation representation data under an optimal path contained between the head entity data and the tail entity data is obtained according to a preset screening strategy, wherein the optimal path is a path when the quantity of the contained indirect relation representation data identified by the screening strategy is the minimum value;
and constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the indirect relation characterization data under the optimal path contained between the head entity data and the tail entity data.
Further, the entity extraction model is a BERT entity extraction model based on a CRF algorithm, and the step of inputting the text to be subjected to reference resolution into a preset entity extraction model and extracting all entity data contained in the text to be subjected to reference resolution according to the entity extraction model specifically includes:
performing word segmentation processing on the text to be subjected to reference resolution according to a word segmentation component in the entity extraction model to obtain a word segmentation processing result;
Performing part-of-speech analysis on the word segmentation processing result through a part-of-speech analysis component in the entity extraction model to obtain noun fields in the word segmentation processing result;
And outputting the noun field according to an output component in the entity extraction model to finish extraction of all entity data contained in the text to be subjected to reference resolution.
Further, the step of performing reference resolution on the entity data included in the text to be subjected to reference resolution according to the entity relation vector including semantic information corresponding to the entity data to be trained specifically includes:
selecting one entity data to be trained from the entity data to be trained as comparison entity data;
Obtaining entity relation vectors containing semantic information corresponding to the comparison entity data as first vectors;
sequentially obtaining entity relation vectors which are corresponding to the entity data to be trained and contain semantic information in the entity data to be trained except the comparison entity data, and taking the entity relation vectors as second vectors;
calculating the similarity of the first vector and the second vector according to a preset similarity algorithm;
if the similarity meets a preset similarity threshold, the comparison entity data and the entity data to be trained corresponding to the second vector are determined to refer to the same entity, that is, they are treated as synonyms of each other.
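The pairwise comparison loop above can be sketched as follows. The vectors, the threshold value, and the use of cosine similarity as the comparison function are illustrative assumptions; the patent leaves the concrete similarity algorithm and threshold to a preset configuration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def resolve_coreference(entity_vectors, threshold=0.9):
    """Compare each entity's relation vector against every other entity's;
    pairs meeting the threshold are treated as co-referring (synonyms)."""
    names = list(entity_vectors)
    coreferent = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(entity_vectors[a], entity_vectors[b]) >= threshold:
                coreferent.append((a, b))
    return coreferent

# Toy vectors: "insurer" and "underwriter" point in nearly the same direction.
vectors = {"insurer": [1.0, 0.0], "underwriter": [0.98, 0.05], "policy": [0.0, 1.0]}
pairs = resolve_coreference(vectors)
```

Here only the "insurer"/"underwriter" pair clears the threshold, so those two mentions would be resolved to the same entity while "policy" remains distinct.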
Further, the step of calculating the similarity between the first vector and the second vector according to a preset similarity algorithm specifically includes:
acquiring a first vector which corresponds to the comparison entity data and is formed by fitting head entity data, relationship representation data and tail entity data;
Acquiring vector data corresponding to the second vector, wherein the vector data is formed by fitting head entity data, relationship characterization data and tail entity data;
Calculating the similarity of head entity data, relationship characterization data and tail entity data in the first vector and the second vector by adopting a cosine similarity algorithm;
And taking the similarity of the head entity data, the relationship characterization data and the tail entity data in the first vector and the second vector as the similarity of the first vector and the second vector.
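The component-wise comparison in the steps above can be sketched as follows: cosine similarity is computed separately for the head entity, relation characterization, and tail entity parts of each fitted vector. Averaging the three part-similarities into one score is an assumption for illustration; the patent does not fix the combination rule.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triple_similarity(t1, t2):
    """t1, t2: dicts with 'head', 'relation', 'tail' component vectors,
    mirroring a vector fitted from head entity, relation, and tail entity data."""
    parts = [cosine(t1[k], t2[k]) for k in ("head", "relation", "tail")]
    return sum(parts) / len(parts)  # assumed combination: simple mean

t1 = {"head": [1.0, 0.0], "relation": [0.0, 1.0], "tail": [1.0, 1.0]}
t2 = {"head": [1.0, 0.0], "relation": [0.0, 1.0], "tail": [1.0, 1.0]}
sim = triple_similarity(t1, t2)  # identical triples give similarity 1.0
```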
In order to solve the above technical problems, an embodiment of the application also provides a text content reference resolution device, which adopts the following technical scheme:
A textual content reference resolution apparatus, comprising:
the knowledge graph acquisition module is used for acquiring a target knowledge graph;
the knowledge graph analysis module is used for analyzing the target knowledge graph to obtain all entity data and relation characterization data among the entity data;
The entity relation vector acquisition module is used for inputting all the entity data and relation characterization data among all the entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information according to an output result of the representation learning model, wherein the entity relation vectors containing the semantic information are formed by fitting head entity data, relation characterization data and tail entity data;
the text acquisition module is used for acquiring a text to be subjected to reference resolution;
The entity data extraction module is used for inputting the text to be subjected to the reference resolution into a preset entity extraction model, and extracting all entity data contained in the text to be subjected to the reference resolution according to the entity extraction model;
the synonym classification and arrangement module is used for performing synonym classification and arrangement on all entity data contained in the text to be subjected to reference resolution according to a preset synonym dictionary, so as to obtain classified and arranged entity data to be trained;
The supervised learning module is used for taking the entity relation vector containing the semantic information as a supervising signal, inputting the entity data to be trained into the representation learning model, and obtaining the entity relation vector containing the semantic information corresponding to the entity data to be trained;
And the reference resolution processing module is used for performing reference resolution on the entity data contained in the text to be subjected to reference resolution according to the entity relation vector containing the semantic information corresponding to the entity data to be trained.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
a computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the text content reference resolution method described above.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of a text content reference resolution method as described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
According to the text content reference resolution method, all entity data and the relation characterization data among all entity data are obtained by parsing the target knowledge graph; all entity data and the relation characterization data among them are input into a preset representation learning model to obtain entity relation vectors containing semantic information; a text to be subjected to reference resolution is obtained; the text is input into a preset entity extraction model, and all entity data contained in the text are extracted; synonym classification and arrangement are performed on all entity data contained in the text to obtain entity data to be trained; the entity data to be trained are input into the representation learning model, with the entity relation vectors containing semantic information serving as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and reference resolution is performed on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that a financial processing system is better assisted in performing text content reference resolution.
Drawings
In order to more clearly illustrate the solution of the present application, a brief description will be given below of the drawings required for the description of the embodiments of the present application, it being apparent that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without the exercise of inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a textual content reference resolution method in accordance with the present application;
FIG. 3 is a flow chart of one embodiment of step 202 of FIG. 2;
FIG. 4 is a flow chart of one embodiment of step 203 shown in FIG. 2;
FIG. 5 is a flow chart of one embodiment of step 205 of FIG. 2;
FIG. 6 is a flow chart of one embodiment of step 208 of FIG. 2;
FIG. 7 is a flow chart of one embodiment of step 604 shown in FIG. 6;
FIG. 8 is a schematic diagram illustrating the construction of one embodiment of a text content reference resolution device in accordance with the present application;
FIG. 9 is a schematic structural view of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the text content reference resolution method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the text content reference resolution device is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a text content reference resolution method in accordance with the present application is shown. The text content reference resolution method comprises the following steps:
Step 201, obtaining a target knowledge graph, wherein the target knowledge graph is a knowledge graph constructed based on a preset text knowledge base.
And 202, analyzing the target knowledge graph to obtain all entity data and relation characterization data among the entity data.
In this embodiment, the target knowledge graph includes a financial business knowledge graph, for example an insurance business knowledge graph containing insurance subjects, protected subjects, underwriting subjects, and the association relationships among them. In general, the target knowledge graph contains all subjects and the relation characterization data among them, where the relation characterization data is formed by the pointing lines and description fields in the target knowledge graph.
With continued reference to FIG. 3, FIG. 3 is a flow chart of one embodiment of step 202 shown in FIG. 2, comprising:
Step 301, performing preliminary analysis on the target knowledge graph to obtain all graph nodes contained in the target knowledge graph and connecting lines among all graph nodes;
Step 302, determining the entity data corresponding to each node according to the graph node identifiers and a preset first reference table, wherein the first reference table contains the mapping relations between the graph node identifiers and the entity data;
step 303, determining the relation characterization data corresponding to the connecting lines among all graph nodes according to the connecting line identifiers and a preset second reference table, wherein the second reference table contains the mapping relations between the connecting line identifiers and the relation characterization data.
And obtaining all entity data and relation characterization data among all entity data contained in the target knowledge graph through analysis, so that the representation learning model training can be conveniently carried out according to all entity data and relation characterization data among all entity data contained in the target knowledge graph.
And 203, inputting the all entity data and the relation characterization data among the all entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information according to the output result of the representation learning model, wherein the entity relation vectors containing semantic information are formed by fitting head entity data, relation characterization data and tail entity data.
In this embodiment, before the step of inputting all the entity data and the relation characterization data among all entity data into a preset representation learning model and obtaining entity relation vectors containing semantic information according to the output result of the representation learning model, the method further includes: deploying the target knowledge graph as a learning reference graph into the representation learning model; and initializing the target knowledge graph to obtain the entity position vectors and relation characterization vectors contained in it, wherein the entity position vectors are obtained from the position information of all entity data in the target knowledge graph, each relation characterization vector is composed of modular length information and angle information, and the modular length information and angle information are calculated from the position information of two entity data with a connection relation in the target knowledge graph.
In this embodiment, the step of initializing the target knowledge graph to obtain the entity position vector and the relationship characterization vector included in the target knowledge graph specifically includes: randomly selecting one point in the target knowledge graph as a three-dimensional coordinate origin; acquiring three-dimensional coordinate information of all map nodes in the target knowledge map; determining entity position information of all entity data according to the three-dimensional coordinate information of all map nodes, and obtaining entity position vectors corresponding to all entity data respectively based on the entity position information; and calculating modular length information and angle information between the two entity data with the connection relation according to the entity position vectors respectively corresponding to all the entity data.
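The initialization step above can be sketched as follows: each graph node gets a 3-D position vector, and each connected pair of entities yields modular length and angle information. Interpreting the modular length as the Euclidean distance between positions and the angle as the angle between the two position vectors is an assumption; the patent states only that both quantities are calculated from the position information of the connected entities.

```python
import math

def modular_length(p, q):
    # Euclidean distance between two 3-D entity position vectors (assumed reading).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def angle(p, q):
    # Angle between the two position vectors relative to the chosen origin.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return math.acos(dot / (norm_p * norm_q))

# Two entity positions in the graph's 3-D coordinate system.
p, q = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
length = modular_length(p, q)  # sqrt(2)
theta = angle(p, q)            # pi / 2
```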
With continued reference to fig. 4, fig. 4 is a flow chart of one embodiment of step 203 shown in fig. 2, comprising:
step 401, identifying entity position vectors corresponding to all the entity data respectively through comparison;
Step 402, arbitrarily selecting one entity data from all the entity data as head entity data;
Step 403, selecting any one entity data from the all entity data as tail entity data;
step 404, calculating module length information and angle information between the head entity data and the tail entity data based on the entity position vectors respectively corresponding to the head entity data and the tail entity data;
Step 405, judging whether direct relation characterization data exists between the head entity data and the tail entity data according to the target knowledge graph;
Step 406, if there is direct relation representation data between the head entity data and the tail entity data, constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the relation representation data;
Step 407, if no direct relationship representation data exists between the head entity data and the tail entity data, obtaining indirect relationship representation data under an optimal path included between the head entity data and the tail entity data according to a preset screening policy, where the optimal path is a path when the number of the indirect relationship representation data included is identified to be the minimum value by the screening policy;
Step 408, constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the indirect relation characterization data under the optimal path contained between the head entity data and the tail entity data.
By arbitrarily selecting head entity data and tail entity data from all entity data and constructing entity relation vectors containing semantic information from them, the entity relation vectors among all entity data are determined. This makes it convenient to later use those vectors as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that a financial processing system is better assisted in performing reference resolution.
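Steps 405 to 407 above can be sketched as follows. When no direct relation links the head and tail entities, the path containing the fewest relation characterization data is chosen; using breadth-first search to find it is one natural realization of the "minimum number of relations" screening strategy, assumed here for illustration.

```python
from collections import deque

def relation_path(edges, head, tail):
    """edges: {(head_id, tail_id): relation label}. Returns the relation
    labels along a fewest-edges path from head to tail, or None."""
    adj = {}
    for (h, t), rel in edges.items():
        adj.setdefault(h, []).append((t, rel))
    queue = deque([(head, [])])
    seen = {head}
    while queue:                       # BFS guarantees minimal relation count
        node, path = queue.popleft()
        if node == tail:
            return path
        for nxt, rel in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# Toy graph: A links to C both directly and through B.
edges = {("A", "B"): "r1", ("B", "C"): "r2", ("A", "C"): "direct"}
direct = relation_path(edges, "A", "C")    # the one-edge direct relation wins
indirect = relation_path(edges, "B", "C")  # B reaches C only via r2
```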
Step 204, obtaining a text to be subjected to reference resolution.
In this embodiment, the text to be subjected to reference resolution is insurance business text to be subjected to reference resolution. For example, newly entered insurance contract text.
Step 205, inputting the text to be subjected to reference resolution into a preset entity extraction model, and extracting all entity data contained in the text to be subjected to reference resolution according to the entity extraction model.
In this embodiment, the entity extraction model is a BERT entity extraction model based on a CRF algorithm. Specifically, the CRF algorithm is a conditional random field algorithm; applied in the BERT entity extraction model, it enables sequence labeling of the text to be subjected to reference resolution so that word segments of different parts of speech can be rapidly identified, thereby improving the entity extraction speed.
With continued reference to fig. 5, fig. 5 is a flow chart of one embodiment of step 205 shown in fig. 2, comprising:
step 501, performing word segmentation processing on the text to be subjected to reference resolution according to a word segmentation component in the entity extraction model to obtain a word segmentation processing result;
Step 502, performing part-of-speech analysis on the word segmentation processing result through a part-of-speech analysis component in the entity extraction model to obtain noun fields in the word segmentation processing result;
and step 503, outputting the noun field according to an output component in the entity extraction model, and completing extraction of all entity data contained in the text to be subjected to reference resolution.
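The three components in steps 501-503 can be sketched with a toy pipeline. The lexicon-based segmenter and noun filter below merely stand in for the trained BERT+CRF model; the regular expression, lexicon, and sample sentence are invented for illustration.

```python
import re

# Hypothetical noun lexicon standing in for the part-of-speech analysis
# component; a real system would use the trained BERT+CRF tagger instead.
NOUN_LEXICON = {"policyholder", "vehicle", "premium", "contract"}

def segment(text):
    """Step 501: naive word segmentation (letters-only tokens, lowercased)."""
    return re.findall(r"[a-z]+", text.lower())

def extract_entities(text):
    """Steps 502-503: keep tokens judged to be nouns and output them."""
    return [token for token in segment(text) if token in NOUN_LEXICON]

print(extract_entities("The policyholder signed the contract for the vehicle."))
```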
Step 206, carrying out synonym classification and arrangement on all entity data contained in the text to be subjected to reference resolution according to a preset synonym dictionary, to obtain the classified and arranged entity data to be trained.
In this embodiment, the nouns included in the preset synonym dictionary are the entity data in the target knowledge graph together with entity data whose semantics are the same as or similar to theirs. Because some entity data contained in the text to be subjected to reference resolution may not appear in the target knowledge graph, the synonym dictionary is used to uniformly classify and map such out-of-graph entity data onto entity data in the target knowledge graph, so that entity relation vector representation can conveniently be performed on all entity data contained in the text in combination with the representation learning model.
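The synonym classification and arrangement of step 206 amounts to mapping each extracted surface form onto its canonical entity in the target knowledge graph. A minimal sketch, with an invented synonym dictionary:

```python
# Hypothetical synonym dictionary: surface forms found in the text mapped to
# canonical entity data from the target knowledge graph.
SYNONYM_DICT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "insured": "policyholder",
}

def canonicalize(entities, synonym_dict):
    """Entities already present in the knowledge graph pass through unchanged;
    out-of-graph surface forms are replaced by their canonical entity."""
    return [synonym_dict.get(entity, entity) for entity in entities]

print(canonicalize(["car", "policyholder", "automobile"], SYNONYM_DICT))
```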
And step 207, inputting the entity data to be trained into the representation learning model by taking the entity relation vector containing the semantic information as a supervision signal, and obtaining the entity relation vector containing the semantic information corresponding to the entity data to be trained.
The entity relation vector containing semantic information trained according to the target knowledge graph is used as a supervision signal, so that learning supervision is conveniently performed when the entity relation vector containing semantic information corresponding to the entity data to be trained is represented, and accuracy of entity relation vector representation is guaranteed.
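One way to read this supervision is that the text-side embedding of an entity pair is pulled toward the graph-derived entity relation vector during training. The sketch below shows one such update under a squared-distance loss; the vectors, learning rate, and function name are illustrative assumptions, not the actual training procedure of this embodiment.

```python
def supervised_update(text_vec, graph_vec, lr=0.1):
    """One gradient-descent step that moves the text-derived vector toward the
    knowledge-graph supervision vector (gradient of squared distance)."""
    return [t - lr * 2.0 * (t - g) for t, g in zip(text_vec, graph_vec)]

# The graph-derived entity relation vector acts as the supervision signal.
vec, target = [0.0, 0.0], [1.0, 1.0]
for _ in range(50):
    vec = supervised_update(vec, target)
print([round(v, 3) for v in vec])  # converges toward the supervision vector
```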
And step 208, performing reference resolution on the entity data contained in the text to be subjected to reference resolution according to the entity relation vector containing the semantic information corresponding to the entity data to be trained.
With continued reference to FIG. 6, FIG. 6 is a flow chart of one embodiment of step 208 of FIG. 2, including:
step 601, selecting one entity data to be trained from the entity data to be trained as comparison entity data;
step 602, obtaining the entity relation vector containing semantic information corresponding to the comparison entity data as a first vector;
Step 603, sequentially obtaining entity relation vectors containing semantic information corresponding to the entity data to be trained except the comparison entity data in the entity data to be trained, as second vectors;
Step 604, calculating the similarity between the first vector and the second vector according to a preset similarity algorithm;
With continued reference to fig. 7, fig. 7 is a flow chart of one embodiment of step 604 shown in fig. 6, comprising:
Step 701, obtaining a first vector which corresponds to the comparison entity data and is formed by fitting head entity data, relationship characterization data and tail entity data;
Step 702, obtaining vector data corresponding to the second vector, which is formed by fitting head entity data, relationship characterization data and tail entity data;
step 703, calculating the similarity of the head entity data, the relationship characterization data and the tail entity data in the first vector and the second vector by adopting a cosine similarity algorithm;
And step 704, taking the similarity of the head entity data, the relationship characterization data and the tail entity data in the first vector and the second vector as the similarity of the first vector and the second vector.
Step 605, if the similarity meets a preset similarity threshold, the comparison entity data and the entity data to be trained corresponding to the second vector are resolved as referring to the same entity, and are recorded as synonyms of each other.
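Steps 601-605 and 701-704 can be sketched as follows: each entity relation vector is treated as its (head, relation, tail) parts, the cosine similarity of corresponding parts is computed, and the combined score is checked against a threshold. The example vectors, the 0.9 threshold, and the averaging of the three part-wise similarities are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def is_coreferent(first, second, threshold=0.9):
    """first/second are (head, relation, tail) sub-vectors; their part-wise
    cosine similarities are averaged and compared with the threshold."""
    sims = [cosine(a, b) for a, b in zip(first, second)]
    return sum(sims) / len(sims) >= threshold

v1 = ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0])   # comparison entity data (first vector)
v2 = ([1.0, 0.1], [0.5, 0.4], [0.0, 0.9])   # candidate entity (second vector)
print(is_coreferent(v1, v2))  # True: the two entities are resolved as coreferent
```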
Analyzing the target knowledge graph to obtain all entity data and the relation characterization data among the entity data; inputting all entity data and the relation characterization data among them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtaining the text to be subjected to reference resolution; inputting the text into a preset entity extraction model and extracting all entity data contained in it; carrying out synonym classification and arrangement on all entity data contained in the text to obtain the entity data to be trained; inputting the entity data to be trained into the representation learning model, with the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and carrying out reference resolution on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that the financial processing system can better perform text content reference resolution.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
In the embodiment of the application, the target knowledge graph is analyzed to obtain all entity data and the relation characterization data among the entity data; all entity data and the relation characterization data among them are input into a preset representation learning model to obtain entity relation vectors containing semantic information; the text to be subjected to reference resolution is obtained; the text is input into a preset entity extraction model, and all entity data contained in it are extracted; synonym classification and arrangement are carried out on all entity data contained in the text to obtain the entity data to be trained; the entity data to be trained are input into the representation learning model, with the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and reference resolution is carried out on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that the financial processing system can better perform text content reference resolution.
With further reference to FIG. 8, as an implementation of the method of FIG. 2 described above, the present application provides an embodiment of a textual content reference digestion apparatus, corresponding to the method embodiment of FIG. 2, which is particularly applicable to a variety of electronic devices.
As shown in fig. 8, the text content reference resolution apparatus 800 according to the present embodiment includes: a knowledge graph acquisition module 801, a knowledge graph analysis module 802, an entity relationship vector acquisition module 803, a text acquisition module 804, an entity data extraction module 805, a synonym categorization and arrangement module 806, a supervised learning module 807, and an reference resolution processing module 808. Wherein:
A knowledge graph acquisition module 801, configured to acquire a target knowledge graph, where the target knowledge graph is a knowledge graph constructed based on a preset text knowledge base;
The knowledge graph analysis module 802 is configured to analyze the target knowledge graph to obtain all entity data and relationship characterization data among the entity data;
The entity relation vector obtaining module 803 is configured to input all the entity data and the relation characterization data among all entity data into a preset representation learning model, and obtain an entity relation vector containing semantic information according to the output result of the representation learning model, where the entity relation vector containing semantic information is formed by fitting the head entity data, the relation characterization data and the tail entity data;
a text obtaining module 804, configured to obtain a text to be subjected to reference resolution;
the entity data extraction module 805 is configured to input the text to be subjected to reference resolution into a preset entity extraction model, and extract all entity data included in the text to be subjected to reference resolution according to the entity extraction model;
The synonym classifying and sorting module 806 is configured to perform synonym classifying and sorting on all entity data included in the text to be subjected to reference resolution according to a preset synonym dictionary, so as to obtain entity data to be trained after classifying and sorting;
a supervised learning module 807, configured to input the entity data to be trained into the representation learning model by using the entity relationship vector containing semantic information as a supervision signal, to obtain an entity relationship vector containing semantic information corresponding to the entity data to be trained;
And the reference resolution processing module 808 is configured to perform reference resolution on entity data included in the text to be subjected to reference resolution according to an entity relationship vector including semantic information corresponding to the entity data to be trained.
Analyzing the target knowledge graph to obtain all entity data and the relation characterization data among the entity data; inputting all entity data and the relation characterization data among them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtaining the text to be subjected to reference resolution; inputting the text into a preset entity extraction model and extracting all entity data contained in it; carrying out synonym classification and arrangement on all entity data contained in the text to obtain the entity data to be trained; inputting the entity data to be trained into the representation learning model, with the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and carrying out reference resolution on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that the financial processing system can better perform text content reference resolution.
Those skilled in the art will appreciate that all or part of the processes of the above embodiment methods may be implemented by computer-readable instructions stored in a computer-readable storage medium; when executed, the program may comprise the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 9, fig. 9 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 9 comprises a memory 9a, a processor 9b, and a network interface 9c communicatively connected to each other via a system bus. It should be noted that only a computer device 9 having components 9a-9c is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. As will be appreciated by those skilled in the art, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 9a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 9a may be an internal storage unit of the computer device 9, such as a hard disk or memory of the computer device 9. In other embodiments, the memory 9a may also be an external storage device of the computer device 9, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 9. Of course, the memory 9a may also comprise both an internal storage unit of the computer device 9 and an external storage device. In this embodiment, the memory 9a is typically used to store the operating system and various application software installed on the computer device 9, such as the computer-readable instructions of the text content reference resolution method. Further, the memory 9a may be used to temporarily store various types of data that have been output or are to be output.
The processor 9b may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 9b is typically used to control the overall operation of the computer device 9. In this embodiment, the processor 9b is configured to execute the computer-readable instructions stored in the memory 9a or to process data, for example, to execute the computer-readable instructions of the text content reference resolution method.
The network interface 9c may comprise a wireless network interface or a wired network interface, which network interface 9c is typically used for establishing a communication connection between the computer device 9 and other electronic devices.
The computer equipment provided by this embodiment belongs to the technical field of financial technology and is applied to text content reference resolution scenarios. Analyzing the target knowledge graph to obtain all entity data and the relation characterization data among the entity data; inputting all entity data and the relation characterization data among them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtaining the text to be subjected to reference resolution; inputting the text into a preset entity extraction model and extracting all entity data contained in it; carrying out synonym classification and arrangement on all entity data contained in the text to obtain the entity data to be trained; inputting the entity data to be trained into the representation learning model, with the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and carrying out reference resolution on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that the financial processing system can better perform text content reference resolution.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by a processor to cause the processor to perform the steps of the text content reference resolution method as described above.
The computer-readable storage medium provided by this embodiment belongs to the technical field of financial technology and is applied to text content reference resolution scenarios. Analyzing the target knowledge graph to obtain all entity data and the relation characterization data among the entity data; inputting all entity data and the relation characterization data among them into a preset representation learning model to obtain entity relation vectors containing semantic information; obtaining the text to be subjected to reference resolution; inputting the text into a preset entity extraction model and extracting all entity data contained in it; carrying out synonym classification and arrangement on all entity data contained in the text to obtain the entity data to be trained; inputting the entity data to be trained into the representation learning model, with the entity relation vectors containing semantic information as supervision signals, to obtain the entity relation vectors containing semantic information corresponding to the entity data to be trained; and carrying out reference resolution on the entity data contained in the text according to those vectors. The entity relation vectors contained in the target knowledge graph, obtained through the representation learning model, serve as supervision signals for determining the entity relation vectors among all entity data in the text to be subjected to reference resolution, so that the financial processing system can better perform text content reference resolution.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment methods may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some, but not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, which do not limit the scope of the claims. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their features. All equivalent structures made using the content of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the present application.

Claims (10)

1. A text content reference resolution method, comprising the steps of:
Acquiring a target knowledge graph;
analyzing the target knowledge graph to obtain all entity data and relation characterization data among the entity data;
inputting all the entity data and the relation characterization data among all the entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information according to the output result of the representation learning model, wherein the entity relation vectors containing semantic information are formed by fitting head entity data, relation characterization data and tail entity data;
Obtaining a text to be subjected to reference resolution;
Inputting the text to be subjected to reference resolution into a preset entity extraction model, and extracting all entity data contained in the text to be subjected to reference resolution according to the entity extraction model;
Carrying out synonym classification and arrangement on all entity data contained in the text to be subjected to reference resolution according to a preset synonym dictionary to obtain the classified and arranged entity data to be trained;
taking the entity relation vector containing semantic information as a supervision signal, inputting the entity data to be trained into the representation learning model, and obtaining the entity relation vector containing semantic information corresponding to the entity data to be trained;
And carrying out reference resolution on the entity data contained in the text to be subjected to reference resolution according to the entity relation vector containing semantic information corresponding to the entity data to be trained.
2. The text content reference resolution method according to claim 1, wherein the step of resolving the target knowledge graph to obtain all entity data and relationship characterization data between all entity data specifically includes:
Performing preliminary analysis on the target knowledge graph to obtain all graph nodes contained in the target knowledge graph and connecting lines among all graph nodes;
Determining entity data corresponding to all nodes respectively according to the map node identifiers and a preset first reference table, wherein the first reference table comprises mapping relations between the map node identifiers and the entity data;
And determining the relation characterization data corresponding to the connection lines among all map nodes according to the connection line identification symbol and a preset second reference table, wherein the second reference table contains the mapping relation between the connection line identification symbol and the relation characterization data.
3. The text content reference resolution method according to claim 1, wherein before the step of inputting the all entity data and the relationship characterization data among all entity data into a preset representation learning model, obtaining an entity relationship vector containing semantic information according to an output result of the representation learning model is performed, the method further comprises:
deploying the target knowledge graph as a learning reference graph into the representation learning model;
Initializing the target knowledge graph to obtain an entity position vector and a relation representation vector contained in the target knowledge graph, wherein the entity position vector is obtained according to the position information of all entity data in the target knowledge graph, the relation representation vector is composed of model length information and angle information, and the model length information and the angle information are obtained according to the position information calculation of two entity data with a connection relation in the target knowledge graph.
4. A text content reference resolution method according to claim 1 or 3, wherein the step of inputting all the entity data and the relationship characterization data between all entity data into a preset representation learning model, and obtaining an entity relationship vector containing semantic information according to an output result of the representation learning model specifically comprises:
Identifying entity position vectors corresponding to all the entity data respectively through comparison;
selecting one entity data from all the entity data as head entity data;
Selecting one entity data from all the entity data as tail entity data;
calculating modular length information and angle information between the head entity data and the tail entity data based on entity position vectors respectively corresponding to the head entity data and the tail entity data;
Judging whether direct relation representation data exists between the head entity data and the tail entity data according to the target knowledge graph;
If direct relation representation data exists between the head entity data and the tail entity data, constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the relation representation data;
If no direct relation representation data exists between the head entity data and the tail entity data, indirect relation representation data under an optimal path contained between the head entity data and the tail entity data is obtained according to a preset screening strategy, wherein the optimal path is a path when the quantity of the contained indirect relation representation data identified by the screening strategy is the minimum value;
and constructing the entity relation vector containing semantic information according to the entity position vector of the head entity data, the entity position vector of the tail entity data and the indirect relation characterization data under the optimal path contained between the head entity data and the tail entity data.
5. The text content reference resolution method according to claim 1, wherein the entity extraction model is a BERT entity extraction model based on a CRF algorithm, and the step of inputting the text to be subjected to reference resolution into a preset entity extraction model and extracting all entity data contained in the text to be subjected to reference resolution according to the entity extraction model specifically includes:
Performing word segmentation processing on the text to be subjected to reference resolution according to a word segmentation component in the entity extraction model to obtain a word segmentation processing result;
Performing part-of-speech analysis on the word segmentation processing result through a part-of-speech analysis component in the entity extraction model to obtain noun fields in the word segmentation processing result;
And outputting the noun field according to an output component in the entity extraction model to finish extraction of all entity data contained in the text to be subjected to reference resolution.
6. The text content reference resolution method according to claim 1, wherein the step of performing reference resolution on the entity data contained in the text to be reference-resolved according to the entity relation vectors containing semantic information that correspond to the entity data to be trained specifically comprises:
selecting one item of entity data to be trained from the entity data to be trained as comparison entity data;
obtaining the entity relation vector containing semantic information that corresponds to the comparison entity data as a first vector;
sequentially obtaining, as second vectors, the entity relation vectors containing semantic information that correspond to the items of entity data to be trained other than the comparison entity data;
calculating the similarity between the first vector and each second vector according to a preset similarity algorithm; and
if the similarity meets a preset similarity threshold, determining that the comparison entity data and the entity data to be trained corresponding to the second vector refer to the same entity, the two being synonyms of each other.
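The comparison loop of claim 6 reduces to thresholded vector similarity. A minimal sketch, assuming cosine similarity as the "preset similarity algorithm" (claim 7 names it) and a hypothetical threshold of 0.9; the entity names and 2-dimensional vectors are invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_coreferent(comparison_vec, candidates, threshold=0.9):
    """Return names of candidate entities whose vector similarity to the
    comparison entity meets the threshold, i.e. assumed co-referents."""
    return [name for name, vec in candidates.items()
            if cosine(comparison_vec, vec) >= threshold]

candidates = {"insurer": [0.9, 0.1], "vehicle": [0.1, 0.95]}
print(find_coreferent([1.0, 0.0], candidates))
# → ['insurer']
```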
7. The text content reference resolution method according to claim 6, wherein the step of calculating the similarity between the first vector and the second vector according to a preset similarity algorithm specifically comprises:
acquiring the first vector corresponding to the comparison entity data, the first vector being formed by fitting head entity data, relation characterization data and tail entity data;
acquiring the vector data corresponding to the second vector, the vector data likewise being formed by fitting head entity data, relation characterization data and tail entity data;
calculating, with a cosine similarity algorithm, the similarity between the head entity data, the relation characterization data and the tail entity data of the first vector and those of the second vector; and
taking that similarity as the similarity between the first vector and the second vector.
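Claim 7 computes similarity over the head, relation, and tail components of each triple but leaves open how the three component similarities are combined into one score. The sketch below assumes a simple mean of the three cosine values; the function names and toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triple_similarity(v1, v2):
    """v1, v2: (head, relation, tail) component vectors of two entity relation
    vectors.  Aggregation is not fixed by the claim; a mean is assumed here."""
    parts = [cosine(a, b) for a, b in zip(v1, v2)]
    return sum(parts) / len(parts)

v1 = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
v2 = ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
print(triple_similarity(v1, v2))  # ≈ 1.0 for identical triples
```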
8. A text content reference resolution apparatus, comprising:
the knowledge graph acquisition module is used for acquiring a target knowledge graph;
the knowledge graph analysis module is used for analyzing the target knowledge graph to obtain all entity data and relation characterization data among the entity data;
The entity relation vector acquisition module is used for inputting all the entity data and relation characterization data among all the entity data into a preset representation learning model, and obtaining entity relation vectors containing semantic information according to an output result of the representation learning model, wherein the entity relation vectors containing the semantic information are formed by fitting head entity data, relation characterization data and tail entity data;
the text acquisition module is used for acquiring the text to be subjected to reference resolution;
The entity data extraction module is used for inputting the text to be subjected to the reference resolution into a preset entity extraction model, and extracting all entity data contained in the text to be subjected to the reference resolution according to the entity extraction model;
The synonym classifying and sorting module is used for performing synonym classification and sorting on all entity data contained in the text to be subjected to reference resolution according to a preset synonym dictionary, so as to obtain classified and sorted entity data to be trained;
The supervised learning module is used for taking the entity relation vector containing the semantic information as a supervising signal, inputting the entity data to be trained into the representation learning model, and obtaining the entity relation vector containing the semantic information corresponding to the entity data to be trained;
And the reference resolution processing module is used for performing reference resolution on the entity data contained in the text to be subjected to reference resolution according to the entity relation vector containing the semantic information corresponding to the entity data to be trained.
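The entity relation vectors in claim 8 are "formed by fitting head entity data, relation characterization data and tail entity data", which matches translation-style knowledge-graph embeddings where head + relation ≈ tail. A minimal TransE-style scoring sketch under that assumption (the claims do not name TransE, so this is an illustrative stand-in):

```python
# Minimal TransE-style check of the assumed "head + relation ≈ tail" fitting.
def transe_score(head, relation, tail):
    """L1 distance of (head + relation) from tail; smaller means a better fit."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

head, relation, tail = [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]
print(transe_score(head, relation, tail))  # → 0.0 for a perfectly fitted triple
```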
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the text content reference resolution method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the text content reference resolution method of any of claims 1 to 7.
CN202410036611.5A 2024-01-08 2024-01-08 Text content reference resolution method, device, equipment and storage medium thereof Pending CN117932082A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410036611.5A CN117932082A (en) 2024-01-08 2024-01-08 Text content reference resolution method, device, equipment and storage medium thereof

Publications (1)

Publication Number Publication Date
CN117932082A true CN117932082A (en) 2024-04-26

Family

ID=90762383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410036611.5A Pending CN117932082A (en) Text content reference resolution method, device, equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN117932082A (en)

Similar Documents

Publication Publication Date Title
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN109858010B (en) Method and device for recognizing new words in field, computer equipment and storage medium
CN113722438B (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN112863683A (en) Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium
CN112100377A (en) Text classification method and device, computer equipment and storage medium
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN115757731A (en) Dialogue question rewriting method, device, computer equipment and storage medium
CN116796857A (en) LLM model training method, device, equipment and storage medium thereof
CN112084779A (en) Entity acquisition method, device, equipment and storage medium for semantic recognition
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN114090792A (en) Document relation extraction method based on comparison learning and related equipment thereof
CN113569118A (en) Self-media pushing method and device, computer equipment and storage medium
CN115730237B (en) Junk mail detection method, device, computer equipment and storage medium
CN115238077A (en) Text analysis method, device and equipment based on artificial intelligence and storage medium
CN112364649B (en) Named entity identification method and device, computer equipment and storage medium
CN114091451A (en) Text classification method, device, equipment and storage medium
CN113569741A (en) Answer generation method and device for image test questions, electronic equipment and readable medium
CN117932082A (en) 2024-04-26 Text content reference resolution method, device, equipment and storage medium thereof
CN112417858A (en) Entity weight scoring method, system, electronic equipment and storage medium
CN115495541B (en) Corpus database, corpus database maintenance method, apparatus, device and medium
CN114647733B (en) Question and answer corpus evaluation method and device, computer equipment and storage medium
CN117056836B (en) Program classification model training and program category identification method and device
CN117493563A (en) Session intention analysis method, device, equipment and storage medium thereof
CN116502624A (en) Corpus expansion method and device, computer equipment and storage medium
CN117932049A (en) Medical record abstract generation method, device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination