CN111126039B - Relation extraction-oriented sentence structure information acquisition method - Google Patents

Info

Publication number
CN111126039B
Authority
CN
China
Prior art keywords
sentence
entity
relation
entities
vector
Prior art date
Legal status
Active
Application number
CN201911355241.7A
Other languages
Chinese (zh)
Other versions
CN111126039A (en)
Inventor
秦永彬
杨卫哲
程华龄
陈艳平
黄瑞章
王凯
Current Assignee
Guizhou University
Original Assignee
Guizhou University
Priority date
Filing date
Publication date
Application filed by Guizhou University
Priority to CN201911355241.7A
Publication of CN111126039A
Application granted
Publication of CN111126039B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a relation extraction-oriented sentence structure information acquisition method comprising the following steps: step one, extracting from a data set relation mention sentences that contain two entities and have a known entity semantic relation category; step two, marking and separating the entities in the relation mention sentences extracted in step one by using entity markers and separators; step three, mapping the text to vectors using a pre-trained word vector lookup table or a randomly initialized word vector lookup table; step four, performing a convolution operation on the vector matrix representing the text through a neural network to extract sentence structure features; step five, performing a max pooling operation on the convolution result to obtain further abstract features; and step six, predicting the classification result through a fully connected layer and a Softmax layer. Because the sentence entities are marked and separated before the text enters the convolutional neural network, the semantic information of each part of the sentence is captured better and entity-centered sentence structure features are obtained; using these features for relation extraction achieves better performance.

Description

Relation extraction-oriented sentence structure information acquisition method
Technical Field
The invention relates to a method for processing input data for a neural network, in particular to a relation extraction-oriented sentence structure information acquisition method, and belongs to the technical field of natural language processing.
Background
With the rapid worldwide spread of computers and the fast development of internet technology, data of all kinds, such as video, audio, images and text, is growing rapidly, and a large amount of information reaches users in electronic digital form. To meet the serious challenge posed by this information explosion, specialized automated tools are urgently needed to extract truly valuable information from massive data; this task is information extraction. Information extraction technology is widely applied in natural language processing, and relation extraction is an important component of text information extraction. Named entities are proper nouns in text that denote the names of people, places and organizations; relation extraction means identifying, from text in which entity pairs are marked, the semantic relation that holds between each entity pair. For example, for the sentence "but according to estimates by the Organization for Security and Co-operation in Europe, at least 1000 people are held in jail" from the ACE RDC 2005 data set, a relation extraction system can recognize that the two named entities "1000 people" and "jail" stand in a "PHYS" (geographical location) relation.
Information extraction aims to extract structured information from large-scale unstructured or semi-structured natural language text; its main tasks include entity extraction, relation extraction and event extraction. Relation extraction research focuses on extracting semantic relations between entities from text content. Because semantic relations are important carriers of semantic knowledge in text, relation extraction plays an important role in information extraction; since it was proposed as one of the subtasks of information extraction, it has received great attention from academia and has been studied extensively.
Named entities appear in text as sequences of consecutive characters. After the entities in a text are recognized and marked, a relation extraction method identifies the semantic relation of each entity pair; different word representation methods are used mainly to resolve the ambiguity that arises when the same word expresses different meanings in different contexts. Marking the entities therefore divides the originally unified sentence into segments, and the semantic features extracted from each segment after entity segmentation can be used to extract the entity semantic relation. The same characters often carry different semantic information in different contexts, so to preserve the integrity of the original text's semantics, each part of the text after entity segmentation needs to be pooled separately to extract features.
From a linguistic perspective, Chinese and English cultures differ, Chinese and Western ways of thinking differ, and because of the different backgrounds in which Chinese characters and English developed, the two languages differ greatly in information structure and in how information is expressed. Languages can be divided into analytic and synthetic types: the main feature of analytic languages is a fixed word order, while synthetic languages have a flexible word order. English belongs to the Indo-European family and is a synthetic language; its structure relies more on formal analysis and logical reasoning, its grammar is strict and it uses many clause forms, so its sentences are generally longer. Chinese belongs to the Sino-Tibetan family and is an analytic language; its word order is generally fixed, it has no inflection, words are combined into sentences through word order and function words, and short sentences are common. As an analytic language, Chinese places stricter structural requirements on how words combine. Consider the text "Nanjing Yangtze River Bridge is located in Gulou District, Nanjing City": entity extraction must be performed before relation extraction, but Chinese character combinations easily produce ambiguity, and different combinations yield different meanings, which in turn affects the results of natural language processing. For the sentence above, possible character combinations include "Nanjing City", "Nanjing mayor", "Yangtze River Bridge", "Jiang Daqiao" (readable as a person's name) and so on, and the entity pairs obtained from different combinations clearly have a significant effect on the relations extracted between them.
The entities "Nanjing City" and "Yangtze River Bridge" stand in a place-containment relation; if "Jiang Daqiao" is instead recognized as a person-name entity, a person-place relation arises between it and other entities in the sentence. In English, by contrast, the same sentence is written with "Nanjing Yangtze River Bridge", and entity marking yields only "Nanjing", "Yangtze River Bridge" or "Gulou District", without ambiguous marks such as "Nanjing mayor" that can arise in the Chinese text. Hence, in relation extraction, entities obtained from different combinations give the text different combination structures, and the extraction results differ accordingly. Highlighting the structure of the entities in the text and how they combine with the rest of it, and obtaining more semantic information through these structural features, therefore benefits many natural language processing tasks such as relation extraction.
At the theoretical level, research on relation extraction can provide theoretical support for other natural language processing technologies and is a worthwhile natural language processing topic. Relation extraction is of significant research value for semantic role labeling, discourse understanding and machine translation. In 2013, structural information was extracted with a pattern matching method that used a dynamic pattern library to improve extraction accuracy, but its recognition performance was affected by the word segmentation structure and by the presence of domain-specific vocabulary. Existing machine learning methods for relation extraction fall into supervised, semi-supervised and unsupervised methods, among others. Supervised methods generally treat relation extraction as a classification problem, that is, classifying the relation between the given entities in each sentence, which generally requires the relation categories to be defined in advance. Starting in 2012, Socher et al. addressed relation extraction with a recursive neural network: the sentence is first parsed, and then a vector representation is learned for each node of the syntax tree. Following the syntactic structure of the sentence, the network iteratively combines vectors upward from the word vectors at the leaves of the syntax tree and finally obtains a vector representation of the sentence, which is used for relation classification. This method effectively takes the syntactic structure of the sentence into account but cannot take into account the positions and semantic information of the two entities in the sentence. Semi-supervised methods such as bootstrapping reduce the dependence on labeled corpora during training and lower the cost of manual annotation, but suffer from semantic drift.
Unsupervised methods mainly use clustering algorithms and can be applied to large-scale open-domain information, but they have difficulty naming relations accurately. Unsupervised entity relation extraction does not depend on corpora annotated with entity relations and consists of two processes: clustering relation instances and selecting relation-type words. Entity pairs with high similarity are first grouped into one class according to the contexts in which they appear, and representative words are then selected to label the relation.
Disclosure of Invention
The technical problem addressed by the invention is as follows: while making full use of the complete information in sentence text, the invention adopts an entity marking strategy and introduces neural network technology, exploiting the network's ability to extract high-dimensional abstract features automatically, layer by layer, in order to extract the structural features obtained by convolution and pooling over each part of the entity-marked text. Relation extraction is performed using the entity marks in the sentence, so that the neural network obtains the relative position information and semantic relation information between the entity pair and the words outside the entities. The network thereby obtains structural information of the sentence centered on the two entities, which avoids, to a certain extent, the feature sparsity problem of traditional machine learning methods, improves relation extraction performance, and addresses the problem that sentence structure information cannot be well utilized.
The technical solution of the invention is as follows: a relation extraction-oriented sentence structure information acquisition method comprising the following steps: step one, extracting from a data set (the ACE or SemEval data set) relation mention sentences that contain two entities and have a known entity semantic relation category; step two, marking and separating the entities in the relation mention sentences extracted in step one by using entity markers and separators; step three, mapping the text to vectors using a pre-trained word vector lookup table or a randomly initialized word vector lookup table; step four, performing a convolution operation on the vector matrix representing the text through a neural network to extract sentence structure features; step five, performing a max pooling operation on the convolution result to obtain further abstract features; and step six, predicting the classification result through a fully connected layer and a Softmax layer.
In step one, sentences containing entity pairs are extracted from a large unstructured text data set. The method is mainly applied to Chinese data, and the data set used is ACE RDC 2005. The data is stored in XML files; XML is an inherently hierarchical data format whose most natural representation is a tree, so a tree-structure parsing method is used to obtain the relation mention sentences from the data set. The data extraction method in step one uses the xml.etree.ElementTree module, which implements a simple and efficient API for parsing and creating XML data, to extract sentences and entities from the XML files in the data set. ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree; in the data set, the tree nodes include entities, relations, entity heads and so on. Interaction with the whole document (reading and writing files) is usually done at the ElementTree level, while interaction with a single XML element and its sub-elements is done at the Element level. In step one, the articles in the ACE RDC 2005 data set are parsed as trees, and information such as relation mention sentences, entity contents and relation types is extracted and stored in the format "entity1, entity2, relation mention sentence, semantic relation". This takes full advantage of the neural network's ability to extract features automatically, layer by layer, obtains the sentence structure information formed by the entities and the words outside them, and effectively prevents loss of semantic information.
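Step one can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not the patent's code: it assumes ACE-style annotation markup in which a `relation` element carries a `TYPE` attribute and each `relation_mention` holds its sentence extent plus two `relation_mention_argument` children with roles `Arg-1` and `Arg-2`. The sample document and the helper name `extract_relation_mentions` are hypothetical, and offset attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment in the style of an ACE 2005 annotation file.
# Element and attribute names are assumptions; START/END offsets are omitted.
SAMPLE_APF = """<source_file>
  <document DOCID="demo">
    <relation ID="demo-R1" TYPE="PHYS">
      <relation_mention ID="demo-R1-1">
        <extent><charseq>at least 1000 people are held in jail</charseq></extent>
        <relation_mention_argument REFID="demo-E1-1" ROLE="Arg-1">
          <extent><charseq>1000 people</charseq></extent>
        </relation_mention_argument>
        <relation_mention_argument REFID="demo-E2-1" ROLE="Arg-2">
          <extent><charseq>jail</charseq></extent>
        </relation_mention_argument>
      </relation_mention>
    </relation>
  </document>
</source_file>"""


def extract_relation_mentions(xml_text):
    """Return (entity1, entity2, sentence, relation) tuples, mirroring the
    'entity1, entity2, relation mention sentence, semantic relation' layout."""
    root = ET.fromstring(xml_text)  # Element for the document root
    rows = []
    for relation in root.iter("relation"):  # every relation node in the tree
        rel_type = relation.get("TYPE")
        for mention in relation.iter("relation_mention"):
            # The mention's own (direct child) extent holds the sentence.
            sentence = mention.find("extent/charseq").text
            args = {}
            for arg in mention.iter("relation_mention_argument"):
                args[arg.get("ROLE")] = arg.find("extent/charseq").text
            # Keep only mentions that have both entity arguments.
            if "Arg-1" in args and "Arg-2" in args:
                rows.append((args["Arg-1"], args["Arg-2"], sentence, rel_type))
    return rows


print(extract_relation_mentions(SAMPLE_APF))
# [('1000 people', 'jail', 'at least 1000 people are held in jail', 'PHYS')]
```

For files on disk, `ET.parse(path).getroot()` yields the same kind of root element, so the traversal is unchanged; the whole-document work happens at the ElementTree level and the per-node lookups at the Element level, as described above.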
In step two, the two entities in each relation mention sentence extracted in step one are brought to the head of the sentence: marker symbols indicate the start and end positions of each of the two entities, the marked entity pair is then copied to the beginning of the sentence, and the character "0" separates the two entities from each other and the entities from the sentence. This lets the neural network perceive the presence and positions of the entities in the sentence, and in this way the network can acquire the structural information of the sentence.
In step three, the constructed CNN model comprises an input layer, a hidden layer and an output layer; word vector mapping is applied to the text, converting each sentence into a vector matrix that serves as the network input.
According to the character vector features and format required by the natural language processing task, the characters in the text are mapped to vectors with either a randomly initialized character vector lookup table or a loaded pre-trained character vector lookup table, giving the vector representation matrix X of the text.
In step four, a convolution operation is performed on the vector matrix X obtained from the lookup table mapping; the convolution result is C, where C = conv(X).
In the data processing stage, this scheme marks and separates the entities in sentences extracted from the ACE RDC 2005 Chinese data set with special markers and separators, and places the marked entities at the head of the relation mention sentence. The abstract structural features obtained through convolution are then max-pooled to obtain high-level abstract features that capture how the context expresses the entity semantic relation, from which the classification result of the Chinese relation extraction task is obtained, achieving better performance.
Beneficial effects of the invention: compared with the prior art, the technical solution adopts entity marking and separation strategies while making full use of the complete information in sentence text, and introduces neural network technology to exploit the network's ability to extract high-dimensional abstract features automatically, layer by layer. It extracts pooled features from each part of the text marked and separated by the entities, which avoids, to a certain extent, the feature sparsity problem of traditional machine learning methods and thereby improves relation extraction performance. By combining this automatic, layer-wise feature extraction with entity-marked and separated sentences, which strengthen the influence of the non-entity words in the text on the overall semantic relation, the method achieves excellent relation extraction performance.
Drawings
FIG. 1 is a schematic drawing of the extraction technique of the present invention;
FIG. 2 is a diagram of the model of the present invention;
FIG. 3 is a schematic diagram of the entity tagging and separation method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
Example 1: as shown in FIGS. 1 to 3, a relation extraction-oriented sentence structure information acquisition method comprises the following steps: step one, extracting from a data set (the ACE or SemEval data set) relation mention sentences that contain two entities and have a known entity semantic relation category; step two, marking and separating the entities in the relation mention sentences extracted in step one by using entity markers and separators; step three, mapping the text to vectors using a pre-trained word vector lookup table or a randomly initialized word vector lookup table; step four, performing a convolution operation on the vector matrix representing the text through a neural network to extract sentence structure features; step five, performing a max pooling operation on the convolution result to obtain further abstract features; and step six, predicting the classification result through a fully connected layer and a Softmax layer.
In step one, sentences containing entity pairs are extracted from a large unstructured text data set. The method is mainly applied to Chinese data, and the data set used is ACE RDC 2005. The data is stored in XML files; XML is an inherently hierarchical data format whose most natural representation is a tree, so a tree-structure parsing method is used to obtain the relation mention sentences from the data set. The data extraction method in step one uses the xml.etree.ElementTree module, which implements a simple and efficient API for parsing and creating XML data, to extract sentences and entities from the XML files in the data set. ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree; in the data set, the tree nodes include entities, relations, entity heads and so on. Interaction with the whole document (reading and writing files) is usually done at the ElementTree level, while interaction with a single XML element and its sub-elements is done at the Element level. In step one, the articles in the ACE RDC 2005 data set are parsed as trees, and information such as relation mention sentences, entity contents and relation types is extracted and stored in the format "entity1, entity2, relation mention sentence, semantic relation". This takes full advantage of the neural network's ability to extract features automatically, layer by layer, obtains the sentence structure information formed by the entities and the words outside them, and effectively prevents loss of semantic information.
In step two, the two entities in each relation mention sentence extracted in step one are brought to the head of the sentence. The start and end positions of entity1 are marked with the entity1 start and end tags, and the start and end positions of entity2 with the entity2 start and end tags; the marked entity pair is then copied to the beginning of the sentence, and the character "0" separates the two entities from each other and the entities from the sentence. This lets the neural network perceive the presence and positions of the entity pair in the sentence, and in this way the network can acquire the structural information of the sentence.
Let the original sentence be S. When the sentence has the general format, that is, both entities consist of Chinese characters, the part to the left of entity1 (Left) exists, the part between entity1 and entity2 (Middle) is not empty, and the part to the right of entity2 (Right) also exists, the original sentence S can be written as:

$$S = (s_1, s_2, \ldots, s_i, s_{i+1}, \ldots, s_{i+k}, s_{i+k+1}, \ldots, s_j, s_{j+1}, \ldots, s_{j+t}, s_{j+t+1}, \ldots, s_n)$$

where $s_{i+1}, \ldots, s_{i+k}$ and $s_{j+1}, \ldots, s_{j+t}$ are the two entities in the original sentence. The entity marking method of step two turns S into $S'$:

$$S' = (\mathrm{Mark}(1\mathrm{Begin}), s_{i+1}, \ldots, s_{i+k}, \mathrm{Mark}(1\mathrm{End}), 0, \mathrm{Mark}(2\mathrm{Begin}), s_{j+1}, \ldots, s_{j+t}, \mathrm{Mark}(2\mathrm{End}), 0, s_1, \ldots, s_i, \mathrm{Mark}(1\mathrm{Begin}), s_{i+1}, \ldots, s_{i+k}, \mathrm{Mark}(1\mathrm{End}), s_{i+k+1}, \ldots, s_j, \mathrm{Mark}(2\mathrm{Begin}), s_{j+1}, \ldots, s_{j+t}, \mathrm{Mark}(2\mathrm{End}), s_{j+t+1}, \ldots, s_n)$$

where Mark(1Begin), Mark(1End), Mark(2Begin) and Mark(2End) are the start and end tags of entity1 and entity2. They represent the entity boundaries and can be replaced with various symbols. The tags used in this embodiment are: entity1 start <*>, entity1 end </*>, entity2 start <#>, entity2 end </#>.
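The marking of step two can be sketched as a small Python function. This is a hedged illustration, not the authors' code: it works on a list of characters, takes the two entity spans as hypothetical 0-based half-open `(start, end)` pairs (so entity1, written s_{i+1}, ..., s_{i+k} above, is `tokens[i:i+k]`), and uses the embodiment's tags `<*>`, `</*>`, `<#>`, `</#>` with `"0"` as the separator. The name `mark_and_separate` is an assumption.

```python
ENTITY1_TAGS = ("<*>", "</*>")  # entity1 start / end tags from the embodiment
ENTITY2_TAGS = ("<#>", "</#>")  # entity2 start / end tags
SEPARATOR = "0"                 # separator character described in step two


def mark_and_separate(tokens, e1, e2):
    """Tag both entities inside the sentence and prepend the tagged pair.

    tokens: list of characters; e1, e2: half-open (start, end) spans with
    entity1 before entity2. Returns a new token list.
    """
    (b1, f1), (b2, f2) = e1, e2
    if not (0 <= b1 < f1 <= b2 < f2 <= len(tokens)):
        raise ValueError("spans must be non-empty, ordered and non-overlapping")
    ent1 = [ENTITY1_TAGS[0], *tokens[b1:f1], ENTITY1_TAGS[1]]
    ent2 = [ENTITY2_TAGS[0], *tokens[b2:f2], ENTITY2_TAGS[1]]
    # Head: the tagged entity pair, separated from each other and from the sentence.
    head = ent1 + [SEPARATOR] + ent2 + [SEPARATOR]
    # Body: the original sentence with Left, Middle and Right kept in place.
    body = tokens[:b1] + ent1 + tokens[f1:b2] + ent2 + tokens[f2:]
    return head + body


print("".join(mark_and_separate(list("ABCDEFG"), (1, 3), (4, 5))))
# <*>BC</*>0<#>E</#>0A<*>BC</*>D<#>E</#>FG
```

One design caveat: a literal "0" in the text (for example in "1000") looks identical to the separator at the character level. The source does not say how this is handled; a practical pipeline might reserve a dedicated separator token in the vocabulary instead.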
In step three, the structure of the constructed CNN model is shown in FIG. 2; it comprises an input layer, a hidden layer and an output layer. Word vector mapping is applied to the text, converting each sentence into a vector matrix that serves as the network input. The network constructed in our experiments is a CNN whose hidden layer comprises a convolution layer, a pooling layer and a fully connected layer. For the lookup table part of the model, according to the character vector features and format required by the Chinese relation extraction task, the lookup table is either randomly initialized or loaded from a pre-trained character vector lookup table, and the characters in the entity-marked text are mapped to vectors to obtain the vector representation matrix X of the text.
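The lookup table of step three can be sketched in NumPy as follows. Several details are assumptions rather than statements from the source: random rows are drawn uniformly from [-0.25, 0.25] (the source only says "within a certain range"), characters missing from the vocabulary map to a hypothetical `<unk>` row, and pre-trained vectors are supplied as a plain dict. `build_vocab`, `build_lookup_table` and `embed` are illustrative names.

```python
import numpy as np

UNK = "<unk>"  # hypothetical token for characters not in the vocabulary


def build_vocab(tokens):
    """Map each distinct token (plus UNK) to a row index."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens) | {UNK}))}


def build_lookup_table(vocab, dim, pretrained=None, scale=0.25, seed=0):
    """Randomly initialise a |V| x dim table, then overwrite the rows that
    have a pre-trained vector (dict: token -> array of length dim)."""
    rng = np.random.default_rng(seed)
    table = rng.uniform(-scale, scale, size=(len(vocab), dim))
    for tok, idx in vocab.items():
        if pretrained is not None and tok in pretrained:
            table[idx] = pretrained[tok]
    return table


def embed(tokens, vocab, table):
    """Look up every token and stack the rows into the n x dim matrix X."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    return table[ids]


tokens = ["<*>", "a", "</*>", "0", "b"]
vocab = build_vocab(tokens)
table = build_lookup_table(vocab, dim=4, pretrained={"a": np.ones(4)})
X = embed(tokens, vocab, table)
print(X.shape)  # (5, 4)
```

Passing `pretrained=None` gives the purely random table; passing a dict gives the pre-trained variant, with random rows kept only for characters the pre-trained table lacks.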
In step four, the vector matrix X obtained from the lookup table mapping is convolved, giving the result C, where C = conv(X). Multi-layer convolution performs layer-by-layer mappings that together form a complex function; training learns the weights each local mapping needs, so the process can be viewed as function fitting, and the convolution operation extracts features. Because step two marks the start and end of each entity, the entity boundaries are explicit in the input; after vector mapping in step three, the convolution in step four produces abstract features of the sentence, and these features are called sentence structure features.
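The convolution C = conv(X) of step four can be illustrated with a single 1-D convolution layer over the character axis, written in NumPy. This is a sketch under assumptions the source does not specify: all filters share one width h and span the full embedding dimension, there is no padding, and a ReLU activation follows. `conv1d` is a hypothetical helper.

```python
import numpy as np


def conv1d(X, W, b):
    """Valid 1-D convolution over the sequence axis.

    X: (n, d) text matrix; W: (m, h, d) bank of m filters of width h;
    b: (m,) biases. Returns C with shape (n - h + 1, m).
    """
    n, d = X.shape
    m, h, d_w = W.shape
    if d != d_w or n < h:
        raise ValueError("filter depth must match X and fit inside the sentence")
    # Every window of h consecutive character vectors: (n - h + 1, h, d).
    windows = np.stack([X[t:t + h] for t in range(n - h + 1)])
    # Dot product of each window with each filter, plus the filter bias.
    C = np.einsum("thd,mhd->tm", windows, W) + b
    return np.maximum(C, 0.0)  # ReLU (assumed activation)


X = np.arange(12, dtype=float).reshape(4, 3)  # 4 characters, 3-dim vectors
W = np.ones((2, 2, 3))
W[1] *= -1.0                                   # second filter is negated
C = conv1d(X, W, np.zeros(2))
print(C[:, 0])  # [15. 33. 51.]
```

Each row of C is one window position and each column one filter, so a filter of width h responds to h consecutive characters, including the entity tags and separators inserted in step two.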
In step five, a max pooling operation is applied to the convolution output to extract further abstract features. This part extracts abstract features while retaining as much of the original sentence information as possible: it reduces the size of the sentence representation, increases the receptive field of the convolution kernels, extracts high-level features, reduces the number of network parameters and helps prevent overfitting. The marking and separation of the entity parts in the text are the main means by which the vectorized result becomes more perceptible to the neural network, so that sentence structure information is obtained.
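Step five can be sketched as max pooling over the time axis of C. The whole-sentence variant keeps the strongest response of each filter; the optional segment variant pools each part of the sentence separately, which is one possible reading of the background's remark that each part of the entity-segmented text is pooled separately. The helper `max_pool`, its `boundaries` parameter and the way boundaries would be chosen are all assumptions.

```python
import numpy as np


def max_pool(C, boundaries=None):
    """Max-pool a convolution output C of shape (T, m) over the time axis.

    boundaries=None: one max per filter -> shape (m,).
    boundaries=[t1, t2, ...]: split rows at those indices, pool each segment
    and concatenate the results -> shape (m * number_of_segments,).
    """
    if boundaries is None:
        return C.max(axis=0)
    segments = np.split(C, boundaries, axis=0)
    if any(seg.shape[0] == 0 for seg in segments):
        raise ValueError("every segment must contain at least one row")
    return np.concatenate([seg.max(axis=0) for seg in segments])


C = np.array([[1.0, 5.0], [3.0, 2.0], [0.0, 4.0], [7.0, 1.0]])
print(max_pool(C))       # [7. 5.]
print(max_pool(C, [2]))  # [3. 5. 7. 4.]
```

Whole-sentence pooling gives a fixed-length vector regardless of sentence length, which is what lets the fully connected layer in step six have a fixed input size.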
The structural features of a sentence are obtained as follows: during data preprocessing, the two entities appearing in each relation extraction sentence are marked with special symbols; the marked Entity1 and Entity2 are then placed before the sentence, and entity separators are placed between Entity1 and Entity2 and between the entities and the rest of the sentence, which consists of the Left, Middle and Right parts. Because the Chinese characters forming a Chinese entity, or the English words forming a compound English entity, belong to a single entity, each entity and the non-entity words of the sentence can each be treated as a whole; after separation, the influence of each part on the entity semantic relation is obtained, and the neural network is used to obtain the structural features of the sentence. The terms "Left", "Middle", "Right", "Entity1" and "Entity2" used above are defined as follows:
Left: the content to the left of entity1 in the sentence;
Entity1: entity 1;
Middle: the content between entity1 and entity2 in the sentence;
Entity2: entity 2;
Right: the content to the right of entity2 in the sentence.
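The five parts defined above can be illustrated with a small helper that slices a character list by the two entity spans, together with a check that all five parts are non-empty, as in the general format described for sentence S. The names `split_parts` and `is_general_format` and the 0-based half-open spans are assumptions for illustration.

```python
def split_parts(tokens, e1, e2):
    """Split a sentence into Left, Entity1, Middle, Entity2 and Right.

    e1, e2: half-open (start, end) spans, entity1 before entity2.
    """
    (b1, f1), (b2, f2) = e1, e2
    return {
        "Left": tokens[:b1],
        "Entity1": tokens[b1:f1],
        "Middle": tokens[f1:b2],
        "Entity2": tokens[b2:f2],
        "Right": tokens[f2:],
    }


def is_general_format(parts):
    """True when every one of the five parts is non-empty."""
    return all(len(p) > 0 for p in parts.values())


parts = split_parts(list("ABCDEFG"), (1, 3), (4, 5))
print(parts["Middle"], is_general_format(parts))  # ['D'] True
```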
In step six, the result of the preceding vectorization, convolution and pooling operations passes through the fully connected layer and the Softmax operation to produce the output of the neural network.
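Step six can be sketched as a fully connected layer followed by a numerically stable Softmax over the relation classes. The weights, the three-class setup and the helper name `predict_relation` are hypothetical; in practice this layer would be trained jointly with the convolution filters.

```python
import numpy as np


def softmax(z):
    """Softmax with the maximum subtracted first for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()


def predict_relation(p, W, b):
    """p: pooled feature vector (k,); W: (num_classes, k); b: (num_classes,).
    Returns the class probabilities and the index of the predicted class."""
    probs = softmax(W @ p + b)
    return probs, int(np.argmax(probs))


p = np.array([1.0, 0.0])
W = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
probs, label = predict_relation(p, W, np.zeros(3))
print(label)  # 2
```

Subtracting the maximum logit does not change the Softmax result but avoids overflow in `np.exp` for large scores.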
The present invention will be further described with reference to the following examples:
First, step one is executed: the sentences and entities of the ACE RDC 2005 Chinese data set are obtained by tree parsing. Step two then marks, separates and prepends the entities in the obtained relation mention sentences. Step three maps the characters of the entity-marked text to vectors with a vector lookup table, giving the vector representation matrix X of the text. Step four performs a convolution on the vectorized matrix X. Step five applies max pooling to the convolution output to extract abstract features. Finally, step six passes the pooled features through the fully connected layer and Softmax and outputs the result.
For example, step one extracts from the data set the sentence "but according to estimates by the Organization for Security and Co-operation in Europe, at least 1000 people are held in jail"; the two entities in the sentence are entity1 "1000 people" and entity2 "jail", and the semantic relation between them is "PHYS" (geographical location relation). Step two then marks and separates the entities, turning the sentence into:

<*>1000 people</*>0<#>jail</#>0but according to estimates by the Organization for Security and Co-operation in Europe, at least <*>1000 people</*> are held in <#>jail</#>

This lets the neural network obtain the position of each entity and its start and end, then a representation of each entity as a whole, and thus the entity-centered structure information of the sentence. Step three vectorizes all characters in the sentence using a Google News pre-trained character vector lookup table and a character vector lookup table randomly generated within a certain range. Step four performs the convolution on the vectorized matrix, and step five applies max pooling to the convolution output to extract abstract features. Finally, the fully connected layer fuses the features and the Softmax layer predicts the result; in this way the convolutional neural network obtains the sentence structure features and uses them for classification.
In conclusion, the inventors' experiments show that the proposed sentence structure information acquisition method for relation extraction performs well.
By marking and separating the sentence entities before the convolutional neural network, the scheme of the invention better captures the semantic information of each part of the sentence and how each part influences the semantic relation between the two entities. It thereby obtains entity-centered sentence structure features and improves relation extraction performance.
Matters not described in detail in the present invention are known to those skilled in the art. Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solutions of the present invention may be modified or equivalently substituted without departing from their spirit and scope, and all such modifications and substitutions shall fall within the scope of the claims of the present invention.

Claims (4)

1. A sentence structure information acquisition method for relation extraction, characterized in that the method comprises the following steps: step one, extracting from a data set a relation mention sentence that contains two entities and whose entity semantic relation category is known; step two, separating and marking the entities in the relation mention sentence extracted in step one by using entity markers and separators; step three, mapping the text to vectors based on a pre-trained word vector lookup table or a randomly generated word vector lookup table; step four, performing a convolution operation on the vector matrix representing the text through a neural network to extract sentence structure features; step five, performing a max pooling operation on the convolution result to further obtain abstract features; step six, predicting the classification result with a fully connected Softmax layer;
in step two, the only two entities in the relation mention sentence extracted in step one are brought to the front of the sentence: marker symbols mark the start and end positions of the two entities, the marked entity pair in the sentence is then copied to the beginning of the sentence, and the character 0 separates the two entities from each other and the entity pair from the sentence, so that the neural network can perceive the entities and the sentence and thereby acquire the structural information of the sentence.
2. The sentence structure information acquisition method for relation extraction according to claim 1, wherein in step three, the constructed CNN model comprises an input layer, a hidden layer, and an output layer; word vector mapping converts each sentence of the text into a vector matrix that serves as the network input; and, according to the character vector characteristics and the input format required in natural language processing, the characters in the text are mapped to vectors using a randomly generated word vector lookup table and a loaded pre-trained word vector lookup table, giving the vector representation matrix X of the text.
3. The sentence structure information acquisition method for relation extraction according to claim 1, wherein in step four, a convolution operation is performed on the vector matrix X obtained through the word vector lookup table, and the convolution result is C, where C = conv(X).
4. The sentence structure information acquisition method for relation extraction according to claim 1, wherein the data set is an ACE or SemEval data set.
CN201911355241.7A 2019-12-25 2019-12-25 Relation extraction-oriented sentence structure information acquisition method Active CN111126039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911355241.7A CN111126039B (en) 2019-12-25 2019-12-25 Relation extraction-oriented sentence structure information acquisition method

Publications (2)

Publication Number Publication Date
CN111126039A (en) 2020-05-08
CN111126039B (en) 2022-04-01

Family

ID=70503580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911355241.7A Active CN111126039B (en) 2019-12-25 2019-12-25 Relation extraction-oriented sentence structure information acquisition method

Country Status (1)

Country Link
CN (1) CN111126039B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084790B (en) * 2020-09-24 2022-07-05 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network
CN112685549B (en) * 2021-01-08 2022-07-29 昆明理工大学 Document-related news element entity identification method and system integrating discourse semantics
CN112784605A (en) * 2021-02-09 2021-05-11 柳州智视科技有限公司 Entity name recognition method based on sentences

Citations (4)

Publication number Priority date Publication date Assignee Title
DE102013204245A1 (en) * 2013-03-12 2014-09-18 Bayerische Motoren Werke Aktiengesellschaft Method and apparatus for providing extracted data
KR20180094664A (en) * 2017-02-16 2018-08-24 포항공과대학교 산학협력단 Method for information extraction from text data and apparatus therefor
CN110188193A (en) * 2019-04-19 2019-08-30 四川大学 A kind of electronic health record entity relation extraction method based on most short interdependent subtree
CN110516239A (en) * 2019-08-26 2019-11-29 贵州大学 A kind of segmentation pond Relation extraction method based on convolutional neural networks

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8280719B2 (en) * 2005-05-05 2012-10-02 Ramp, Inc. Methods and systems relating to information extraction
US10140344B2 (en) * 2016-01-13 2018-11-27 Microsoft Technology Licensing, Llc Extract metadata from datasets to mine data for insights
US11269929B2 (en) * 2018-05-04 2022-03-08 International Business Machines Corporation Combining semantic relationship information with entities and non-entities for predictive analytics in a cognitive system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant