CN112507126B - Entity linking device and method based on recurrent neural network - Google Patents


Info

Publication number
CN112507126B
CN112507126B (application CN202011416594.6A)
Authority
CN
China
Prior art keywords
entity
link
candidate
result
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011416594.6A
Other languages
Chinese (zh)
Other versions
CN112507126A (en)
Inventor
洪万福
钱智毅
赵青欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yuanting Information Technology Co ltd
Original Assignee
Xiamen Yuanting Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yuanting Information Technology Co ltd filed Critical Xiamen Yuanting Information Technology Co ltd
Priority to CN202011416594.6A priority Critical patent/CN112507126B/en
Publication of CN112507126A publication Critical patent/CN112507126A/en
Application granted granted Critical
Publication of CN112507126B publication Critical patent/CN112507126B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an entity linking device and method based on a recurrent neural network. The device comprises: a text input unit; an entity recognition unit, which runs the inference process of an entity recognition model on the target text supplied by the text input unit and outputs candidate entities; a knowledge base matching unit, which performs database matching on the candidate entities and outputs the preselected link results corresponding to each candidate entity; a text vectorization unit, which vectorizes the target text, the candidate entities and their preselected link results and combines them into an embedded vector for output; a link model inference unit, which performs entity link inference on the embedded vector and outputs an inference result; and a link result output unit, which determines the entity link result of each candidate entity in the knowledge base from the inference result. This implementation can make full use of external knowledge and thereby improve the accuracy of entity linking.

Description

Entity linking device and method based on recurrent neural network
Technical Field
The invention relates to the field of artificial intelligence, in particular to an entity linking device and method based on a recurrent neural network.
Background
With the recent wave of artificial intelligence, deep learning techniques have been applied across many industries and fields. The knowledge graph is a very important research direction in deep learning. At present, the techniques for building knowledge graphs through entity-relation extraction are largely mature, but problems remain when knowledge graphs are put to large-scale use, mainly because natural language is complex, polysemous and ambiguous.
Entity linking is the task of linking entity mentions in a text to the corresponding entities in a knowledge base, thereby resolving the ambiguity between entities. Its potential applications include information extraction, information retrieval and knowledge base population, but the task is challenging because of name variation and entity ambiguity.
Entity ambiguity shows up in two ways. First, an entity may have multiple synonyms (which need linking): one entity can be referred to by several entity mentions. For example, "Massachusetts Institute of Technology" and "MIT" refer to the same entity in Massachusetts, USA. Second, a mention can be polysemous (and needs disambiguation): the same entity name can denote several entities. For example, "apple" may refer to the fruit or to Apple Inc. An entity linking algorithm must use the entity's mention and the textual information of its context to link the mention to the correct mapped entity in the target knowledge graph.
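The two phenomena above can be illustrated with a toy alias table, sketched below in Python. All aliases and entity names here are illustrative assumptions, not data from the patent's knowledge base.

```python
# Synonymy: several mentions map to one knowledge-base entity.
# Polysemy: one mention maps to several knowledge-base entities.
ALIAS_TABLE = {
    "MIT": ["Massachusetts Institute of Technology"],
    "Massachusetts Institute of Technology": ["Massachusetts Institute of Technology"],
    "apple": ["apple (fruit)", "Apple Inc."],
}

def candidate_entities(mention: str) -> list[str]:
    """Return the knowledge-base entities a mention could link to."""
    return ALIAS_TABLE.get(mention, [])

# "MIT" and the full name are synonyms (both need linking to the same entity);
# "apple" is polysemous (it needs disambiguation among several entities).
```

A real system would back this table with a knowledge base; the disambiguation among the returned candidates is what the linking model in the sections below performs.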
Disclosure of Invention
In view of the above defects of the prior art, the present invention aims to provide an entity linking apparatus and method that make full use of external knowledge, optimize the link-model inference process and improve the accuracy of entity linking.
In order to achieve the above object, the present invention provides an entity linking apparatus based on a recurrent neural network, including:
the text input unit is used for inputting text data, performing data processing on the text data and outputting a target text;
the entity recognition unit is used for executing a reasoning process of an entity recognition model on the input target text and outputting candidate entities;
the knowledge base matching unit is used for inputting the candidate entities of the entity identification unit, performing database matching according to the candidate entities and outputting a preselected link result corresponding to each candidate entity;
the text vectorization unit is used for vectorizing the input target text, the candidate entities and the preselected link results corresponding to the candidate entities, and combining them into an embedded vector for output;
the link model reasoning unit is used for inputting the embedded vector, carrying out entity link reasoning according to the embedded vector and outputting a reasoning result;
and the link result output unit is used for receiving the inference result and determining the entity link result of each candidate entity in the knowledge base, namely outputting the id, entity name, entity type and text information of each candidate entity in the knowledge base.
Further, the text input unit includes:
the file reading module is used for receiving input text data;
and the data processing module is used for converting the input text data into a specified structured text to form a target text.
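A minimal sketch of these two modules in Python, assuming a JSON-style dict serves as the "specified structured text" (the patent does not fix a concrete schema, so the field name is an assumption):

```python
import json

def read_file(raw: str) -> str:
    # File reading module: receive the input text data.
    return raw.strip()

def to_target_text(raw: str) -> dict:
    # Data processing module: convert the input into a structured target text.
    return {"text": read_file(raw)}

doc = to_target_text("  Apple released a new phone.  ")
serialized = json.dumps(doc, ensure_ascii=False)
```

In practice the data processing module would also branch on the input format (txt, excel, csv, json), as described in the detailed embodiment below.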
Further, the entity identification unit includes:
the data preprocessing module is configured to perform a data preprocessing process on input text data, wherein the data preprocessing process comprises data cleaning, screening and word segmentation;
a vectorization processing module configured to perform a vector encoding operation after data preprocessing, and output an embedded vector;
the entity recognition model storage module is used for storing the trained entity recognition model;
the entity recognition model loading module is used for loading an entity recognition model and determining all candidate entities in the target text;
and a candidate entity result output module for performing normalization processing and outputting the candidate entities.
Further, the knowledge base matching unit includes:
the knowledge base storage module is used for storing a pre-prepared knowledge base file;
and the knowledge base matching module is used for matching the input candidate entity with the knowledge base file and acquiring a preselected link result of the candidate entity in the knowledge base.
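A minimal sketch of the knowledge base matching unit, assuming the knowledge base file has already been loaded into a dict keyed by entity name; the entries shown are illustrative, not the patent's data:

```python
# Pre-prepared knowledge base (knowledge base storage module), keyed by name.
KNOWLEDGE_BASE = {
    "Tengwangge": [
        {"id": "kb_001", "name": "Tengwangge", "type": "writing"},
        {"id": "kb_002", "name": "Tengwangge", "type": "music"},
        {"id": "kb_003", "name": "Tengwangge", "type": "poetry"},
    ],
}

def match_knowledge_base(candidates):
    """Knowledge base matching module: map each candidate entity to its
    preselected link results in the knowledge base (empty list if no match)."""
    return {c: KNOWLEDGE_BASE.get(c, []) for c in candidates}
```

Each candidate thus carries forward a list of preselected link results, among which the link model later chooses.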
Further, the link model inference unit includes:
the entity link model storage module is used for storing the entity link model which is trained;
and the entity link model loading module is used for loading the entity link model and the embedded vector and executing model reasoning.
Further, the link result output unit comprises an entity link result output module, which, after model inference is finished, standardizes the obtained entity link results of all candidate entities and outputs them in a set output mode and output format.
The invention also provides an entity linking method based on the recurrent neural network, which comprises the following steps:
step S1: inputting text data, performing data processing on the text data, and outputting a target text;
step S2: executing a reasoning process of an entity recognition model on the target text, and outputting candidate entities;
and step S3: obtaining a preselected link result corresponding to each candidate entity through knowledge base matching;
and step S4: vectorizing the target text, the candidate entities and the preselected link results corresponding to the candidate entities, and combining the vectorized vectors into an embedded vector;
step S5: executing the reasoning process of the entity link model according to the embedded vector, and outputting a reasoning result;
step S6: and determining an entity link result of each candidate entity in the knowledge base according to the reasoning result.
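The six steps can be sketched as one pipeline skeleton. Each stage below is a stub standing in for the corresponding unit; the stub behaviour and the scoring interface are illustrative assumptions, not the patent's implementation:

```python
def link_entities(raw_text, recognize, match_kb, vectorize, infer):
    target = {"text": raw_text.strip()}                       # S1: data processing
    candidates = recognize(target["text"])                    # S2: entity recognition
    preselected = {c: match_kb(c) for c in candidates}        # S3: KB matching
    emb = vectorize(target["text"], candidates, preselected)  # S4: embedded vector
    scores = infer(emb)                                       # S5: link-model inference
    # S6: for each candidate, keep the preselected result the model scores highest.
    return {c: max(preselected[c], key=lambda e: scores.get(e["id"], 0.0))
            for c in candidates if preselected[c]}
```

A usage example with trivial stubs: `link_entities("apple pie", lambda t: ["apple"], ...)` returns, per candidate, the knowledge-base entry with the best inference score.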
Further, the vectorization processing in step S4 specifically comprises: processing the target text, the candidate entities and their preselected link results by splicing multiple semantic encodings, the multiple semantic encodings comprising: character encoding, word segmentation, and n-gram models.
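A minimal sketch of this spliced ("concatenated") multi-encoding: a character-level code, a word-segmentation code, and a character n-gram code are each hashed into a small fixed-size bag-of-tokens vector and concatenated into one embedded vector. The hashing trick and the dimensions are illustrative assumptions, not the patent's actual encoders:

```python
def hashed_bow(tokens, dim=8):
    # Hash each token into one of `dim` buckets and count occurrences.
    v = [0.0] * dim
    for t in tokens:
        v[hash(t) % dim] += 1.0
    return v

def char_ngrams(text, n=2):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def embed(text, words, dim=8):
    # Splice three semantic encodings: characters, segmented words, n-grams.
    return (hashed_bow(list(text), dim)
            + hashed_bow(words, dim)
            + hashed_bow(char_ngrams(text), dim))

vec = embed("apple pie", ["apple", "pie"])
```

Note that Python's built-in `hash` on strings is salted per process, so only the bucket counts, not the exact bucket positions, are stable across runs; a real encoder would use trained embedding tables instead.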
Further, step S5 specifically comprises: inputting the context semantics of the candidate entities and the preselected link result corresponding to each candidate into a trained entity link model, and outputting an inference result; the entity link model adopts a recurrent neural network whose framework is based on BiLSTM + CNN + CRF, wherein the BiLSTM acquires information over the whole sequence of the preselected link result; the CNN extracts local features of the current word; and the CRF performs sequence labeling to separate correlations at the output level.
Further, the entity link result in step S6 at least includes the specific id, entity name, entity type and text information of the entity in the knowledge base.
The invention realizes the following technical effects:
according to the entity linking method, the neural network models are arranged in the entity linking process and the link reasoning process to carry out model reasoning to obtain the candidate entities, and the entity linking result is obtained by carrying out model reasoning according to the context semantics of the candidate entities and the preselected link result corresponding to each candidate, so that the external knowledge can be fully utilized, the link model reasoning process is optimized, and the accuracy of entity linking is improved.
Drawings
FIG. 1 is a system framework and flow diagram of the entity linking device of the present invention;
FIG. 2 is a schematic diagram of the entity linking method of the present invention;
FIG. 3 is a flow chart of entity recognition model training of the present invention;
FIG. 4 is a flow chart of the training of the entity-link model of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With these references, one of ordinary skill in the art will appreciate other possible embodiments and advantages of the present invention. Elements in the figures are not drawn to scale and like reference numerals are generally used to indicate like elements.
The invention will now be further described with reference to the accompanying drawings and detailed description.
Referring to fig. 1 to 4, the present invention discloses an entity linking apparatus based on a recurrent neural network, which is a set of application programs or a set of control components applied to a server side, and includes: the system comprises a text input unit, an entity recognition unit, a knowledge base matching unit, a text vectorization unit, a link model reasoning unit and a link result output unit. The following description will be made for each functional unit:
1. and the text input unit is used for inputting text data, performing data processing on the text data and outputting a target text. The system specifically comprises a file reading module and a data processing module, wherein the file reading module is used for receiving input text data. The file reading module is configured to receive a text format of a text, wherein the text format can be an unstructured text (such as txt), a semi-structured text, a structured text (such as excel, csv, json) and other text uploading modes; and the data processing module is configured to convert the input text data into a structured text according to different text formats to form a target text output.
2. The entity recognition unit runs the inference process of an entity recognition model on the input target text and outputs candidate entities. It specifically comprises: a data preprocessing module, a vectorization processing module, an entity recognition model storage module, an entity recognition model loading module and a candidate entity result output module. The data preprocessing module performs data preprocessing on the input text data, including data cleaning, filtering and word segmentation. The vectorization processing module performs vector encoding after preprocessing to provide an embedded vector for entity recognition. The entity recognition model storage module stores the trained entity recognition model. The entity recognition model loading module loads the entity recognition model and determines all candidate entities in the target text. The candidate entity result output module standardizes and outputs the candidate entities.
In the entity recognition unit, the entity recognition model is obtained by training. As shown in fig. 3, the training process comprises: inputting text data as training data and preprocessing it (data cleaning, filtering, word segmentation and similar operations); vectorizing the training data; feeding the resulting embedded vector into the entity recognition model framework for training; and monitoring the training effect and saving the trained entity recognition model.
3. And the knowledge base matching unit is used for inputting the candidate entities of the entity identification unit, performing database matching according to the candidate entities and outputting a preselected link result corresponding to each candidate entity. The method specifically comprises the following steps: the system comprises a knowledge base storage module and a knowledge base matching module. The knowledge base storage module is used for storing a pre-prepared knowledge base file; and the knowledge base matching module is used for matching the input candidate entity with the knowledge base file and acquiring a preselected link result of the candidate entity in the knowledge base.
4. The text vectorization unit vectorizes the input target text, the candidate entities and their preselected link results, and combines them into one embedded vector for output. Vectorization here means processing the target text and candidate entities by splicing multiple semantic encodings, comprising character encoding, word segmentation, and the multiple segmentation encodings of n-gram models (also referred to as N-grams).
5. The link model inference unit receives the embedded vector, performs entity link inference on it and outputs an inference result. It specifically comprises: an entity link model storage module for storing the trained entity link model, and an entity link model loading module for loading the entity link model and the embedded vector and executing model inference. The entity link model is a recurrent neural network whose framework is mainly BiLSTM + CNN + CRF. The BiLSTM can capture information over the whole sequence; in the entity linking task this lets the context of the input sequence be fully used to match an entity in the knowledge base unit more accurately. When processing sequence data, BiLSTM adds a backward pass to the unidirectional LSTM, so the information following each position can also be used; the forward and backward values are then output to the output layer together, giving the full information of the sequence from both directions. However, on longer sentences a BiLSTM may discard important information because of limited model capacity, so a CNN layer is added to the model to extract local features of the current word.
The CRF (conditional random field) serves as the sequence labeling module and separates the correlations of the output layer, so the correlation of context information can be fully considered when predicting an entity in the knowledge base. More importantly, the Viterbi algorithm that decodes the CRF uses dynamic programming to compute the maximum-probability path, which matches the goal of the entity linking task well, and it avoids illegal sequences such as a "B-LOC" tag followed by an "I-ORG" tag appearing in the result.
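The Viterbi decoding described above can be sketched as plain dynamic programming over per-token tag scores, with a transition constraint that forbids illegal sequences such as "B-LOC" followed by "I-ORG". The tag set and scores below are illustrative assumptions, not the patent's trained CRF:

```python
NEG_INF = float("-inf")
TAGS = ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

def allowed(prev, cur):
    # An "I-X" tag may only follow "B-X" or "I-X" of the same entity type.
    if cur.startswith("I-"):
        return prev != "O" and prev.endswith(cur[2:])
    return True

def viterbi(emissions):
    """emissions: one {tag: score} dict per token; return the best legal path."""
    # A sentence cannot start with an inside ("I-") tag.
    paths = {t: ([t], NEG_INF if t.startswith("I-") else emissions[0].get(t, NEG_INF))
             for t in TAGS}
    for em in emissions[1:]:
        new = {}
        for cur in TAGS:
            new[cur] = max(
                ((path + [cur], score + em.get(cur, NEG_INF))
                 for path, score in paths.values() if allowed(path[-1], cur)),
                key=lambda x: x[1],
                default=([], NEG_INF))
        paths = new
    return max(paths.values(), key=lambda x: x[1])[0]
```

With emissions that greedily favour "I-ORG" after "B-LOC", the decoder rejects the illegal transition and picks the best legal path instead, which is exactly the behaviour the text attributes to the CRF layer.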
In the link model inference unit, the entity link model is obtained through training. Referring to fig. 4, the training process of the entity link model is similar to that of the entity recognition model and comprises: inputting text data as training data and preprocessing it; associating the training data with the knowledge base and verifying its correctness; vectorizing the training data; feeding the resulting embedded vector into the entity link model framework for training; and monitoring the training effect and saving the trained entity link model.
6. The link result output unit receives the inference result and determines the entity link result of each candidate entity in the knowledge base, i.e. it outputs the id, entity name, entity type and text information of each candidate entity in the knowledge base. It specifically comprises an entity link result output module, which, after model inference is finished, standardizes the obtained entity link results of all candidate entities and outputs them. Standardization means that the names and formats of the output fields follow a pre-agreed output mode, output format and per-field meaning, so that the system can correctly receive and process the output result.
The invention also discloses an entity linking method, which comprises the following steps:
step S1: inputting original text data, performing structural conversion processing on the text data, and outputting a structured target text.
Step S2: and executing the inference process of the entity recognition model on the target text, and outputting a candidate entity.
And step S3: and obtaining a preselected link result corresponding to each candidate entity through knowledge base matching.
And step S4: and vectorizing the target text, the candidate entities and the preselected link results corresponding to the candidate entities, and combining the vectorized vectors into an embedded vector.
More specifically, the target text and candidate entities are processed by splicing multiple semantic encodings, comprising character encoding, word segmentation, the multiple segmentation encodings of the n-gram mode, and the like.
Step S5: and performing entity link reasoning according to the embedded vector, and outputting a reasoning result.
The embedded vector is passed into the entity link model to execute the model inference process: the context semantics of the candidate entities and the preselected link result corresponding to each candidate are input into a trained entity link model built on a recurrent neural network, and an inference result is output.
Step S6: and determining an entity link result of each candidate entity in the knowledge base according to the reasoning result. I.e., the entity's specific id in the knowledge base, entity name, entity type, textual information, etc.
Example 2
To facilitate understanding of those skilled in the art, a specific implementation example of the entity linking method of the present invention is as follows:
Step S1: in this embodiment, data interaction is expressed in JSON format (JSON is a lightweight, language-independent data storage format and a standard data-format specification). An example of the format of the request data sent is as follows:
(The JSON request example is reproduced only as an image in the original publication.)
examples of return data are as follows:
(The JSON return-data example is reproduced only as images in the original publication.)
Step S2: identify candidate entities. After the target text is obtained, candidate entities in it are identified with entity recognition technology, namely: the target text is vectorized to generate an embedded vector, the embedded vector is passed into the trained entity recognition model, and the entity recognition result is obtained. Operationally, the client sends an entity recognition request to the server's service portal and the server returns the result to the client. The format of the request data sent by the client to the server is as follows:
(The JSON request example is reproduced only as an image in the original publication.)
wherein "component" with the value "entity_identification" indicates an entity recognition request, and "text" carries the target text, e.g. "Bank of Montreal precious-metal derivatives trader Tai Wong said that, given the high uncertainty of this year's stimulus program, gold still benefits against the US dollar."
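Since the original request JSON survives only as an image, the following is a hypothetical reconstruction of the payload from the prose above; the field names follow the description, but the exact schema is an assumption:

```python
import json

request = {
    "component": "entity_identification",  # assumed field/value pairing per the prose
    "text": ("Bank of Montreal precious-metal derivatives trader Tai Wong said "
             "that, given the high uncertainty of this year's stimulus program, "
             "gold still benefits against the US dollar."),
}
payload = json.dumps(request, ensure_ascii=False)
```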
An example of the returned result from the server to the client is as follows:
(The JSON return-result example is reproduced only as images in the original publication.)
The above is an example of entity linking in the financial field: through receiving the data, running model inference and finally outputting the inference result, the server outputs the names of the relevant entities mentioned in the text.
Briefly: the target entities in the financial field can be divided into products (subdivided into metal futures, agricultural futures, foreign exchange futures, etc.) and organization names (subdivided into listed companies, futures companies, other companies, etc.). The relevant entities in the text are therefore produced as one output in the example above.
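A sketch of that financial-domain taxonomy as a nested dict; the two top-level classes and their subdivisions follow the text, while the concrete data structure is an assumption:

```python
FINANCE_TAXONOMY = {
    "product": ["metal futures", "agricultural futures", "foreign exchange futures"],
    "organization": ["listed company", "futures company", "other company"],
}

def entity_types():
    """Flatten the taxonomy into the list of assignable entity types."""
    return [t for subtypes in FINANCE_TAXONOMY.values() for t in subtypes]
```

An entity recognized in the text would be tagged with one of these leaf types before knowledge base matching.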
Step 3: match the preselected link results of the candidate entities against the knowledge base. The client issues a knowledge base matching instruction; on receiving it, the server quickly queries the knowledge base for the preselected link results of the candidate entities. An example of the result sent by the server to the client is as follows:
(The JSON matching-result example is reproduced only as an image in the original publication.)
the result shows that in the knowledge base, the entity "tengwangge's preselected link result includes" thining "," music "," poery ", i.e. article, musical composition, poetry composition.
Step 4: the target text input by the user, together with the obtained candidate entities and their preselected link results, is fed into the text vectorization unit. This unit processes the target text and candidate entities by splicing multiple semantic encodings (character encoding, word segmentation, the multiple segmentation encodings of the n-gram mode, and the like), outputs an embedded vector to the server for the next vector application, and outputs a vector result to the client.
Step 5: the embedded vector obtained in step 4 is passed into the recurrent neural network model and the model inference process is executed, i.e. the context semantics of the candidate entities and the preselected link result corresponding to each candidate are input into the trained model built on the recurrent neural network. The inference process runs on the server side, and the client receives the inference completion progress.
Step 6: from the inference result obtained in step 5, the server obtains the entity link result corresponding to each candidate entity, i.e. the entity's specific id in the knowledge base, entity name, entity type, text information, etc. The inference result is also output to the client, which displays it on its interface. An example of the information sent by the client to the server is as follows:
(The JSON request example is reproduced only as an image in the original publication.)
examples of information sent by the server to the client are as follows:
(The JSON response example is reproduced only as images in the original publication.)
according to the reasoning result, the 'Tengwangge' is a musical composition.
According to the entity linking method, neural network models are deployed in both the entity recognition process and the link inference process: model inference yields the candidate entities, and the entity link result is obtained by running model inference on the context semantics of the candidate entities and the preselected link result corresponding to each candidate. External knowledge can therefore be fully used, the link-model inference process is optimized, and the accuracy of entity linking is improved.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (2)

1. An entity linking device based on a recurrent neural network, comprising:
a text input unit for receiving text data, performing structured conversion on the text data, and outputting a target text; the structured conversion produces JSON format;
an entity recognition unit for running the inference process of an entity recognition model on the input target text and outputting candidate entities;
a knowledge base matching unit for receiving the candidate entities from the entity recognition unit, performing database matching on each candidate entity, and outputting a preselected link result for each candidate entity;
a text vectorization unit for vectorizing the target text, the candidate entities, and the preselected link result of each candidate entity, combining the vectorized representations into a single embedded vector, and outputting the embedded vector;
a link model inference unit for receiving the embedded vector, running the inference process of the entity link model on the embedded vector, and outputting an inference result; the entity link model is based on a recurrent neural network;
a link result output unit for receiving the inference result and determining the entity link result of each candidate entity in the knowledge base, i.e. outputting the id, entity name, entity type, and text information of each candidate entity in the knowledge base;
the text input unit comprises: a file reading module for receiving the input text data; and a data processing module for converting the input text data into the specified structured text that forms the target text;
the entity recognition unit comprises: a data preprocessing module configured to preprocess the input text data; a vectorization processing module configured to perform vector encoding after preprocessing and output an embedded vector; an entity recognition model storage module for storing the trained entity recognition model; an entity recognition model loading module for loading the entity recognition model and determining all candidate entities in the target text; and a candidate entity result output module for normalizing and outputting the candidate entities; the vectorization processing module processes the target text, the candidate entities, and the preselected link result of each candidate entity by splicing several semantic encodings, the semantic encodings comprising: word encoding, word segmentation, and n-gram models;
the knowledge base matching unit comprises: a knowledge base storage module for storing a prepared knowledge base file; and a knowledge base matching module for matching the input candidate entities against the knowledge base file and obtaining the preselected link result of each candidate entity in the knowledge base;
the link model inference unit comprises: an entity link model storage module for storing the trained entity link model; and an entity link model loading module for loading the entity link model and the embedded vector and executing model inference; the entity link model uses a recurrent neural network whose architecture is based on BiLSTM + CNN + CRF, wherein the BiLSTM captures information over the whole sequence of the preselected link result, the CNN extracts local features of the current word, and the CRF performs sequence labeling to model label dependencies at the output layer;
and the link result output unit comprises an entity link result output module for, after model inference finishes, normalizing the obtained entity link results of all candidate entities and outputting them in the configured output mode and format.
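The vectorization described in claim 1 — splicing word encoding, word segmentation, and n-gram features of the target text, candidate entity, and preselected link result into one embedded vector — can be sketched as follows. The concrete feature extractors here (code-point encoding, whitespace segmentation, hashed bigrams) are illustrative assumptions, not the patented implementation:

```python
def char_codes(text):
    # Word/character encoding: map each character to its code point.
    return [ord(c) for c in text]

def segment_features(text):
    # Placeholder word segmentation: whitespace split (a real system
    # would use a Chinese segmenter); encode each token by its length.
    return [len(tok) for tok in text.split()]

def ngram_features(text, n=2):
    # Character n-gram features, hashed into small integer buckets.
    return [hash(text[i:i + n]) % 1000 for i in range(len(text) - n + 1)]

def embed(target_text, candidate, preselected_link):
    # Splice the three semantic encodings of all three inputs into one
    # flat embedded vector, as the claim describes.
    vec = []
    for piece in (target_text, candidate, preselected_link):
        vec.extend(char_codes(piece))
        vec.extend(segment_features(piece))
        vec.extend(ngram_features(piece))
    return vec
```

In practice the spliced vector would feed the BiLSTM + CNN + CRF link model; here it is only a flat integer list so the splicing step itself is visible.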
2. An entity linking method based on a recurrent neural network, characterized by comprising the following steps:
step S1: inputting text data, performing structured conversion on the text data, and outputting a target text; the structured conversion produces JSON format;
step S2: running the inference process of an entity recognition model on the target text and outputting candidate entities;
step S3: obtaining a preselected link result for each candidate entity through knowledge base matching;
step S4: vectorizing the target text, the candidate entities, and the preselected link result of each candidate entity, and combining the vectorized representations into an embedded vector;
step S5: running the inference process of the entity link model on the embedded vector and outputting an inference result;
step S6: determining the entity link result of each candidate entity in the knowledge base from the inference result;
the vectorization in step S4 specifically comprises: processing the target text, the candidate entities, and the preselected link result of each candidate entity by splicing several semantic encodings, the semantic encodings comprising: word encoding, word segmentation, and n-gram models;
step S5 specifically comprises: inputting the context semantics of the candidate entities and the preselected link result of each candidate entity into the trained entity link model and outputting an inference result; the entity link model uses a recurrent neural network whose architecture is based on BiLSTM + CNN + CRF, wherein the BiLSTM captures information over the whole sequence of the preselected link result, the CNN extracts local features of the current word, and the CRF performs sequence labeling to model label dependencies at the output layer;
the entity link result in step S6 comprises at least the entity's id, name, type, and text information in the knowledge base.
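Steps S1–S6 of the claimed method can be sketched end-to-end as follows. The toy knowledge base, the capitalization-based recognizer, and the first-hit link scorer are illustrative stand-ins for the trained entity recognition and entity link models, not the patented implementation:

```python
import json

def to_target_text(raw):
    # Step S1: structured conversion via JSON, yielding the target text.
    return json.loads(json.dumps({"text": raw}))

def recognize_entities(target):
    # Step S2: stand-in for entity-recognition inference; here we simply
    # treat capitalized tokens as candidate entities.
    return [t for t in target["text"].split() if t[0].isupper()]

KB = {  # Step S3: a toy knowledge base keyed by entity surface form.
    "Apple": [{"id": 1, "name": "Apple Inc.", "type": "ORG",
               "text": "Technology company"}],
}

def match_kb(candidates):
    # Step S3: preselected link results per candidate entity.
    return {c: KB.get(c, []) for c in candidates}

def link(candidates, preselected):
    # Steps S4-S6: in place of vectorization plus link-model inference,
    # a trivial scorer picks the first preselected result and emits the
    # id / name / type / text fields the claim requires.
    results = {}
    for c in candidates:
        hits = preselected.get(c, [])
        if hits:
            results[c] = hits[0]
    return results

def run(raw):
    target = to_target_text(raw)
    candidates = recognize_entities(target)
    return link(candidates, match_kb(candidates))
```

For example, `run("Apple released a phone")` links the candidate "Apple" to the toy knowledge-base entry, returning its id, name, type, and text fields.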
CN202011416594.6A 2020-12-07 2020-12-07 Entity linking device and method based on recurrent neural network Active CN112507126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011416594.6A CN112507126B (en) 2020-12-07 2020-12-07 Entity linking device and method based on recurrent neural network

Publications (2)

Publication Number Publication Date
CN112507126A CN112507126A (en) 2021-03-16
CN112507126B CN112507126B (en) 2022-11-15

Family

ID=74970716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011416594.6A Active CN112507126B (en) 2020-12-07 2020-12-07 Entity linking device and method based on recurrent neural network

Country Status (1)

Country Link
CN (1) CN112507126B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674317A (en) * 2019-09-30 2020-01-10 北京邮电大学 Entity linking method and device based on graph neural network
CN111563149A (en) * 2020-04-24 2020-08-21 西北工业大学 Entity linking method for Chinese knowledge map question-answering system
CN111639498A (en) * 2020-04-21 2020-09-08 平安国际智慧城市科技股份有限公司 Knowledge extraction method and device, electronic equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295796B (en) * 2016-07-22 2018-12-25 浙江大学 entity link method based on deep learning
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
CN110110324B (en) * 2019-04-15 2022-12-02 大连理工大学 Biomedical entity linking method based on knowledge representation
EP3646245A4 (en) * 2019-04-25 2020-07-01 Alibaba Group Holding Limited Identifying entities in electronic medical records
CN110413756B (en) * 2019-07-29 2022-02-15 北京小米智能科技有限公司 Method, device and equipment for processing natural language
CN110928961B (en) * 2019-11-14 2023-04-28 出门问问(苏州)信息科技有限公司 Multi-mode entity linking method, equipment and computer readable storage medium
CN111428443B (en) * 2020-04-15 2022-09-13 中国电子科技网络信息安全有限公司 Entity linking method based on entity context semantic interaction
CN111783462B (en) * 2020-06-30 2023-07-04 大连民族大学 Chinese named entity recognition model and method based on double neural network fusion

Also Published As

Publication number Publication date
CN112507126A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN111737474B (en) Method and device for training business model and determining text classification category
CN110222188B (en) Company notice processing method for multi-task learning and server
CN109062893B (en) Commodity name identification method based on full-text attention mechanism
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN110852110A (en) Target sentence extraction method, question generation method, and information processing apparatus
US20220044119A1 (en) A deep learning model for learning program embeddings
CN112084435A (en) Search ranking model training method and device and search ranking method and device
CN112328655B (en) Text label mining method, device, equipment and storage medium
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN116150367A (en) Emotion analysis method and system based on aspects
CN112100377A (en) Text classification method and device, computer equipment and storage medium
CN113239702A (en) Intention recognition method and device and electronic equipment
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN110275953B (en) Personality classification method and apparatus
CN111666375A (en) Matching method of text similarity, electronic equipment and computer readable medium
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
CN111611796A (en) Hypernym determination method and device for hyponym, electronic device and storage medium
CN112507126B (en) Entity linking device and method based on recurrent neural network
CN115718889A (en) Industry classification method and device for company profile
CN112487811B (en) Cascading information extraction system and method based on reinforcement learning
CN114595329A (en) Few-sample event extraction system and method for prototype network
CN113536790A (en) Model training method and device based on natural language processing
CN116502624A (en) Corpus expansion method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant