CN115391569B - Method for automatically constructing industry chain map from research report and related equipment - Google Patents
- Publication number
- CN115391569B (application CN202211325252.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- attribute
- triple
- entity
- relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a method and related equipment for automatically constructing an industry chain map from research reports. The method comprises the following steps: loading a research-report-oriented industry chain map schema; acquiring a set of original research reports and preprocessing each original research report in the set to obtain a target text; simultaneously extracting target triples and target independent entities from a sentence sequence by using an entity relationship synchronous extraction model, wherein each target triple is a target first triple or an initial second triple; extracting target attribute pairs from sentence sequences containing index descriptions by using an index attribute extraction model; matching and aligning the one or more obtained target attribute pairs with the initial second triple to obtain a target second triple; and adding the target first triple and the target second triple to the target industry chain map. The method can effectively meet the need to automatically construct large-scale industry chain maps from research reports in complex situations, and reduces labor and time costs.
Description
Technical Field
The invention relates to the technical field of text processing, and in particular to a method and related equipment for automatically constructing an industry chain map from research reports.
Background
With the development of financial technology and the continuous expansion of the global capital market, the financial field generates a large amount of industry information every day, which contains abundant valuable information. A knowledge graph describes and stores the knowledge contained in data in a structured form, can express information on the Internet in a form closer to human cognition, and has a strong capacity for organizing, managing and understanding massive information; using such graphs for association mining and reasoning analysis has wide application in academia and industry. An industry chain map is built on industry chain data for the subdivided products of an industry, and can better describe upstream and downstream relationships, product hierarchy relationships, the main-business relationships between companies and products, and the relationships between relevant economic indexes and companies, products and industries. An industry chain map can provide accurate and timely solutions for clients, helps relevant personnel capture the internal dynamics of an industry, and brings economic benefits to enterprises. Reasoning along the industry chain map can reveal potential risks and investment opportunities and thereby assist intelligent investment decisions, empowering practical financial business scenarios such as investment, risk control, and investment and marketing services.
However, the financial field still lacks a large-scale, open-source panoramic knowledge graph of industry chains. Research that systematically describes how to construct industry chain maps is relatively scarce, and most existing work fails to pay effective attention to the complex index attributes of relationships. Traditional manual extraction of key information cannot meet the need to process massive information quickly, and is costly in both labor and time.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
In view of the above defects in the prior art, a method and related equipment for automatically constructing an industry chain map from research reports are provided, aiming to solve the problems that the prior art offers few systematic methods for automatically constructing industry chain maps and cannot pay effective attention to the complex index attributes of relationships.
In a first aspect of the present invention, there is provided a method for automatically constructing an industry chain map from a research report, comprising:
loading a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the schema predefines the entity type information to be extracted and the triple type information to be extracted; each triple is a first triple or a second triple, both having the structure head entity type-relationship type-tail entity type; in a second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; each attribute pair is a simple attribute pair or a complex attribute pair; a simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; a complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes of a complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring a set of original research reports, and preprocessing each original research report in the set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
extracting a target attribute pair in a sentence sequence containing index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain graph.
The method for automatically constructing an industry chain map from research reports, wherein the preprocessing of each original research report in the original research report set comprises:
performing text recognition on the original research report by optical character recognition to obtain a first text that is convenient to read and write;
performing text cleaning on the first text to remove noise characters and obtain a second text, wherein the noise characters are characters that contribute nothing to the description of the actual text;
and performing sentence segmentation on the second text, dividing the second text into a non-empty sentence sequence to obtain the target text.
The method for automatically constructing an industry chain map from research reports, wherein the entity relationship synchronous extraction model comprises a sentence sequence encoding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence encoding module encodes the sentence sequence with a general pre-trained model, based on a training set and a validation set labeled with entity and relationship information, to obtain a target vector;
the subtask feature selection module is used for acquiring the feature information corresponding to an entity extraction subtask and to a relationship prediction subtask respectively, wherein the entity extraction subtask extracts target entity fragments from the sentence sequence according to the target vector;
the subtask target information prediction module judges, based on the feature information of the entity extraction subtask, whether the type of a target entity fragment belongs to the target entity types; if so, the target entity fragment is retained, and if not, it is discarded;
the subtask target information prediction module is also used for judging the relationship between entity pairs based on the feature information of the relationship prediction subtask to obtain a feature representation of the target relationship, and for judging from that feature representation whether the type of the target relationship belongs to the target relationship types; if so, the target relationship is retained, and if not, it is discarded;
and obtaining a target triple from the target entity fragments and the target relationship corresponding to them, wherein a target entity fragment with no corresponding relationship is a target independent entity.
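The filtering and assembly logic described above can be illustrated with a minimal Python sketch. It does not reproduce the neural model: it assumes the model has already predicted typed entity fragments and typed relations, and only shows how fragments and relations outside the schema are discarded and how the remaining fragments become triples or independent entities. The function name `assemble` and the tuple layouts are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[str, str]            # (entity text, predicted entity type)
Relation = Tuple[str, str, str]   # (head text, predicted relation type, tail text)


def assemble(
    spans: List[Span],
    relations: List[Relation],
    target_entity_types: Set[str],
    target_relation_types: Set[str],
) -> Tuple[List[Relation], List[str]]:
    """Filter predictions against the schema and build triples (illustrative only)."""
    # Keep only entity fragments whose predicted type is a target entity type.
    kept = {text for text, etype in spans if etype in target_entity_types}
    # Keep only relations of a target type whose head and tail were both kept.
    triples = [
        (h, r, t)
        for h, r, t in relations
        if r in target_relation_types and h in kept and t in kept
    ]
    linked = {h for h, _, _ in triples} | {t for _, _, t in triples}
    # Kept fragments that take part in no kept relation are independent entities.
    independent = [
        text for text, _ in spans if text in kept and text not in linked
    ]
    return triples, independent
```

Under these assumptions, a fragment of a non-target type is dropped, a relation of a non-target type is dropped, and any retained fragment left without a relation is reported as an independent entity.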
The method for automatically constructing an industry chain map from research reports, wherein extracting the target attribute pairs from a sentence sequence containing an index description comprises:
judging whether the sentence sequence contains an index and, if so, extracting the target attribute pairs from the sentence sequence by using the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
The method for automatically constructing an industry chain map from research reports, wherein matching and aligning the one or more obtained target attribute pairs with the initial second triple comprises:
matching and aligning the target second attributes in the obtained target attribute pairs with the corresponding initial second triple, wherein some of the target second attributes are aligned with the relationship of the corresponding initial second triple, and the values corresponding to the remaining target second attributes are matched and aligned with the head entity or tail entity of that triple, thereby obtaining a target second triple that comprises the initial second triple and its corresponding attribute information.
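As a rough illustration of the matching-and-alignment step, the Python sketch below attaches each attribute pair to the initial second triple whose relationship, head entity or tail entity is mentioned by the pair's second attributes. The string-containment scoring is an assumption made for this example; the source does not specify how matching is computed. The dictionary keys (`head`, `relation`, `tail`, `first`, `second`) and the function name `align` are likewise hypothetical.

```python
from typing import Dict, List


def align(attribute_pairs: List[Dict], initial_triples: List[Dict]) -> List[Dict]:
    """Attach each attribute pair to the best-matching triple (illustrative heuristic)."""
    # Copy each initial second triple and give it its own attribute list.
    results = [dict(t, attributes=[]) for t in initial_triples]
    for pair in attribute_pairs:
        best, best_score = None, 0
        for cand in results:
            # Count second attributes that mention the relation, head, or tail.
            score = sum(
                any(key in second for key in (cand["relation"], cand["head"], cand["tail"]))
                for second in pair["second"]
            )
            if score > best_score:
                best, best_score = cand, score
        if best is None and len(results) == 1:
            best = results[0]  # only one candidate: attach by default
        if best is not None:
            best["attributes"].append(pair)
    return results
```

An attribute pair that matches no candidate is left unattached when several triples are present, which keeps the sketch from guessing; a real system would need a more robust matcher.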
The method for automatically constructing an industry chain map from research reports, wherein the list of target entity types is dynamically adjusted according to the target text and the requirements of the target task scenario;
the list of the target relation types is dynamically adjusted according to the target entity types and the target texts;
and the list of the target attribute types is dynamically adjusted according to the target attribute types and the target text.
In a second aspect of the present invention, there is provided an apparatus for automatically constructing an industry chain map from a research report, comprising:
an industry chain map schema loading module, configured to load a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the schema predefines the entity type information to be extracted and the triple type information to be extracted; each triple is a first triple or a second triple, both having the structure head entity type-relationship type-tail entity type; in a second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; each attribute pair is a simple attribute pair or a complex attribute pair; a simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; a complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes of a complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair;
a target text acquisition module, configured to acquire a set of original research reports and preprocess each original research report in the set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
an entity relationship synchronous extraction module, configured to simultaneously extract target triples and target independent entities from the sentence sequence by using an entity relationship synchronous extraction model, wherein each target triple is a target first triple or an initial second triple;
an index attribute extraction module, configured to extract, by using an index attribute extraction model, target attribute pairs from sentence sequences containing index descriptions, wherein each target attribute pair comprises a target first attribute and a target second attribute;
an attribute-relationship alignment module, configured to, for a sentence sequence containing attribute pairs, match and align the one or more obtained target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple comprises the initial second triple and the one or more target attribute pairs corresponding to the initial second triple;
a target industry chain map obtaining module, configured to add the target first triple and the target second triple to a target industry chain map.
In a third aspect of the present invention, a terminal is provided, comprising a processor and a storage medium communicatively connected to the processor, wherein the storage medium is adapted to store a plurality of instructions, and the processor is adapted to call the instructions in the storage medium to execute the steps of the above method for automatically constructing an industry chain map from a research report.
In a fourth aspect of the present invention, there is provided a storage medium storing one or more programs executable by one or more processors to implement the steps of any one of the above methods for automatically constructing an industry chain graph from a research report.
Beneficial effects: the method loads a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, and then preprocesses each original research report in the original research report set to obtain a target text. An entity relationship synchronous extraction model simultaneously extracts target triples and target independent entities from the sentence sequence, each target triple being a target first triple or an initial second triple. An index attribute extraction model extracts target attribute pairs from sentence sequences containing index descriptions, each target attribute pair comprising a target first attribute and a target second attribute. For a sentence sequence containing attribute pairs, the one or more obtained target attribute pairs are matched and aligned with the initial second triple to obtain a target second triple, which comprises the initial second triple and the one or more target attribute pairs corresponding to it. The target first triple and the target second triple are then added to the target industry chain map. The method can effectively meet the need to automatically construct large-scale industry chain maps from research reports in complex situations, pays effective attention to the complex index attributes of relationships, meets the extraction requirements for entity relationships and related attributes with a more accurate and efficient model, and reduces labor and time costs.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for automatically constructing an industry chain graph from a research report provided by the present invention;
FIG. 2 is a flowchart illustrating a preprocessing process of an original research report according to an embodiment of the method for automatically constructing an industry chain map from the research report of the present invention;
FIG. 3 is a schematic structural diagram of an entity relationship synchronous extraction model in an embodiment of a method for automatically constructing an industry chain graph from a research report according to the present invention;
FIG. 4 is a flowchart illustrating the extraction of index attributes according to an embodiment of the method for automatically constructing an industry chain graph from a research report;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for automatically constructing an industry chain map from a research report according to the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a terminal provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The method for automatically constructing the industry chain map from the research report can be applied to a terminal with computing power, and the terminal can execute the task of extracting the target first triple and the target second triple in the original research report file set and constructing the target industry chain map by the method for automatically constructing the industry chain map from the research report provided by the invention.
Example one
In this embodiment, a method for automatically constructing an industry chain map from a research report is provided. As shown in FIG. 1, the method comprises the following steps:
S100, loading a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the schema predefines the entity type information to be extracted and the triple type information to be extracted; each triple is a first triple or a second triple, both having the structure head entity type-relationship type-tail entity type; in a second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; each attribute pair is a simple attribute pair or a complex attribute pair; a simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; a complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes of a complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair.
Specifically, before the research reports are processed, a research-report-oriented industry chain map schema containing entity, relationship and attribute types and their definitions must first be loaded. Beyond simple definitions of relationships between entities, the schema also covers overlapping triples, and it defines the necessary relationship attributes to describe the large amount of index data in research report text.
Loading the research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types comprises the following steps:
Loading the predefined target entity types according to the requirements of the target task scenario. Specifically, based on an analysis of research report content, the predefined target entity types include, but are not limited to, companies, persons, brands, products, industries, regions, services and risk events. The list of target entity types is dynamically adjusted according to the target text and the requirements of the target task scenario.
Loading the predefined target relationship types among the target entity types according to the requirements of the target task scenario. Specifically, the target relationship types include, but are not limited to, upstream and downstream relationships among industries and businesses, and production and sales relationships between companies and products. The list of target relationship types is dynamically adjusted according to the target entity types and the target text.
Loading the predefined target attribute types according to the requirements of the target task scenario, wherein a target attribute type comprises a first attribute and a second attribute; the first attribute is the specific name of the index corresponding to the second triple, and the second attribute is the value of that index together with any other descriptions that constrain it. The list of target attribute types is dynamically adjusted according to the target attribute types and the target text. Specifically, the target attributes refer to the index data in research report text, which describe attributes shared by entities and by the relationships between them. Because the first attribute contains only the specific name of the index, dividing a target relationship attribute into a first attribute and second attributes can effectively describe the case in which one sentence contains several indexes.
The industry chain map schema further predefines the entity type information to be extracted and the type information of the triples to be extracted. Each triple is a first triple or a second triple, and both are relation triples with the structure "head entity type-relationship type-tail entity type". The first triple is a simple triple and contains no attribute information. The second triple is a complex triple in which the relationship type further comprises at least one attribute pair corresponding to the relationship type. Each attribute pair is a simple attribute pair or a complex attribute pair: a simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being its value; a complex attribute pair comprises a first attribute and a plurality of second attributes, which comprise the value of the complex attribute pair and at least one constraint on it.
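The distinction between first and second triples, and between simple and complex attribute pairs, can be made concrete with a small Python data-structure sketch. The class and field names are hypothetical and chosen only for illustration; the example company, product and figures are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AttributePair:
    """An attribute pair attached to a relationship.

    name        -> the first attribute (name of the index)
    value       -> the value carried by the second attribute
    constraints -> further second attributes that constrain the index;
                   empty for a simple attribute pair
    """
    name: str
    value: str
    constraints: List[str] = field(default_factory=list)

    @property
    def is_complex(self) -> bool:
        return len(self.constraints) > 0


@dataclass
class Triple:
    """A head-relation-tail triple; attribute pairs make it a second triple."""
    head: str
    relation: str
    tail: str
    attributes: List[AttributePair] = field(default_factory=list)

    @property
    def kind(self) -> str:
        return "second" if self.attributes else "first"


# Invented examples, for illustration only.
simple_triple = Triple("Company A", "produces", "Product X")
complex_triple = Triple(
    "Company A",
    "produces",
    "Product X",
    attributes=[AttributePair("market share", "30%", ["2021", "domestic market"])],
)
```

Keeping constraints as a list on the attribute pair, rather than as extra fields on the triple, lets one relationship carry several indexes, each with its own qualifiers.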
After the industry chain map schema is loaded, an original research report text set is acquired and preprocessed.
S200, acquiring an original research report text set, and preprocessing each original research report text in the original research report text set to obtain a target text, wherein the target text consists of a non-empty sentence sequence.
Referring to fig. 2, preprocessing each original research report text in the original research report text set includes:
S210, performing text recognition on the original research report text by an optical character recognition technology to obtain a first text that is convenient to read and write.
In this embodiment, the original research report text is an industry research report document, and it is converted into the first text, which is convenient to read and write, by an Optical Character Recognition (OCR) technology.
S220, performing text cleaning on the first text and removing noise characters in the first text to obtain a second text, wherein the noise characters are characters that serve no actual descriptive purpose in the text.
Further, during text cleaning, redundant spaces, special identifiers and runs of more than 6 consecutive solid dots in the first text are removed in a unified manner to obtain the second text.
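As a non-limiting sketch, the cleaning step could be implemented with regular expressions as shown below. The function name `clean_text`, the set of characters treated as solid dots, and the set of characters treated as special identifiers are assumptions made for the example; the specification does not enumerate them.

```python
import re

# Assumption: these characters count as "solid dots" (e.g. table-of-contents leaders).
DOT_RUN = re.compile(r"[.·•]{7,}")            # more than 6 consecutive dots
# Assumption: a small illustrative set of "special identifiers".
SPECIAL = re.compile(r"[■□◆◇▲△★☆]")
# Redundant horizontal whitespace, including the full-width space U+3000.
SPACES = re.compile(r"[ \t\u3000]+")


def clean_text(first_text: str) -> str:
    """Remove noise characters from the OCR output and return the second text."""
    text = DOT_RUN.sub("", first_text)   # drop long dot runs
    text = SPECIAL.sub("", text)         # drop special identifiers
    text = SPACES.sub(" ", text)         # collapse redundant spaces
    return text.strip()
```

Whitespace is collapsed last so that gaps left behind by the removed dots or identifiers are also normalized.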
S230, performing sentence segmentation processing on the second text, and dividing the second text into a non-empty sentence sequence to obtain the target text.
The principle of sentence segmentation is to ensure, as far as possible, that entities contained in sentences are not split. First, common sentence separators, including but not limited to ".", "!", "…", ";" and the like, are used to divide the second text into a sequence of sentences. A long sentence that still exceeds 512 characters after this division is segmented a second time with finer-grained separators, while following the same principle, to obtain the target text.
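The two-pass segmentation described above might be sketched as follows. The separator sets `PRIMARY` and `SECONDARY` and the helper names are assumptions for illustration: the ASCII full stop is deliberately left out of `PRIMARY` here to avoid splitting decimal numbers, and the comma is only a guess at a suitable secondary separator.

```python
import re
from typing import List

PRIMARY = "。!?!?;;…"   # assumed primary sentence separators
SECONDARY = ",,"        # assumed secondary separators for over-long sentences
MAX_LEN = 512            # length threshold stated in the description


def _split_keep(text: str, seps: str) -> List[str]:
    """Split after each run of separator characters, keeping them attached."""
    cls = re.escape(seps)
    # Zero-width split: just after a separator, but not between two separators,
    # so that a double ellipsis "……" stays in one piece.
    parts = re.split(f"(?<=[{cls}])(?![{cls}])", text)
    return [p.strip() for p in parts if p.strip()]


def split_sentences(second_text: str, max_len: int = MAX_LEN) -> List[str]:
    """Return the non-empty sentence sequence that forms the target text."""
    sentences: List[str] = []
    for sentence in _split_keep(second_text, PRIMARY):
        if len(sentence) > max_len:
            # Second pass only for sentences that are still too long.
            sentences.extend(_split_keep(sentence, SECONDARY))
        else:
            sentences.append(sentence)
    return sentences
```

Splitting only at punctuation, never at a fixed character offset, is what keeps entity mentions intact in this sketch.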
The method for automatically constructing the industry chain map from research reports further comprises the following steps:
S300, simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relationship synchronous extraction model, wherein the target triple is a target first triple or an initial second triple.
Referring to fig. 3, the entity relationship synchronous extraction model includes a sentence sequence encoding module, a subtask feature selection module and a subtask target information prediction module;
S310, the sentence sequence encoding module encodes the sentence sequence by adopting a general pre-trained model, based on a training set and a verification set labeled with entity and relationship information, to obtain a target vector.
Specifically, a training set and a verification set are manually annotated and fed to a general pre-trained model, and the general pre-trained model is fine-tuned on the training set and the verification set to obtain a sentence sequence encoding model suited to the method for automatically constructing the industry chain map from research reports. The sentence sequence in the target text is then encoded by the fine-tuned sentence sequence encoding model to obtain a target vector.
S320, the subtask feature selection module is used for acquiring the feature information corresponding to an entity extraction subtask and to a relationship prediction subtask respectively, wherein the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector.
The feature information specific to the entity extraction subtask and to the relationship prediction subtask is captured from the obtained target vector, and the feature information shared between the two subtasks is calculated, thereby partitioning the features by task. The entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector.
According to the characteristic information shared between the entity extraction subtask and the relation prediction subtask and the characteristic information specific to the subtask, the characteristics between the entity extraction subtask and the relation prediction subtask are recombined, so that new characteristic information of each subtask is obtained, bidirectional information interaction between subtasks can be promoted, and the interference of redundant characteristics is avoided.
Through the characteristic selection and recombination mechanism, the bidirectional information interaction between the entity extraction subtasks and the relation prediction subtasks is promoted, the influence on precision and efficiency caused by error transmission and redundant calculation is relieved, and meanwhile, nested entities and complex extraction scenes of single entity overlapping and entity pair overlapping in the triple overlapping problem can be effectively dealt with.
S330, the subtask target information prediction module judges whether the type of the target entity fragment belongs to the target entity type based on the characteristic information of the entity extraction subtask, if so, the target entity fragment is reserved, and if not, the target entity fragment is discarded.
Specifically, based on the feature information of the entity extraction subtask, the subtask target information prediction module extracts target entity fragments from the sentence sequence according to the target vector. The character-level features of the start position and the end position of a fragment are concatenated with the sentence-level features to obtain the target entity fragment and its feature representation, and whether the target entity fragment is an entity of type k is predicted from this feature representation, where k ranges over the full set of target entity types predefined in the industry chain map schema in this embodiment.
S340, the subtask target information prediction module further judges the relationship between the entity pairs based on the feature information of the relationship prediction subtask to obtain the feature representation of the target relationship, judges whether the type of the target relationship belongs to the target relationship type according to the feature representation of the target relationship, if so, retains the target relationship, and if not, discards the target relationship.
Specifically, the subtask target information prediction module judges the relationship between the entity pairs based on the feature information of the relationship prediction subtask, and refines this judgment into a type judgment between the corresponding start positions and between the corresponding end positions of the head entity and the tail entity. Taking the start positions as an example, the character-level features of the head entity and the tail entity are concatenated with the sentence-level features to form the target relationship prediction feature, from which the relationship between the entities is judged and a feature representation of the target relationship is obtained; whether the target relationship is a relationship of type l is then predicted from this feature representation, where l ranges over the full set of target relationship types predefined in the industry chain map schema in this embodiment. The relationship type between the end positions of the entity pair is calculated in the same way.
S350, obtaining a target triple according to the target entity fragment and its corresponding target relationship, wherein a target entity fragment without a corresponding relationship is the target independent entity information.
The target entity fragments are combined with their corresponding target relationships: target entity fragments that have a corresponding relationship are combined into a triple, the triple being a relation triple with the structure head entity-relationship-tail entity, while target entity fragments without a corresponding relationship are the target independent entity information. The target triple is a target first triple or an initial second triple.
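Steps S330 to S350 can be illustrated, without limitation, by the following sketch, which takes already-predicted entity fragments and relationships, filters them against the predefined types, and assembles triples and independent entities. The neural prediction itself is not shown; the function name `assemble`, the tuple layouts, and the sample data are assumptions for the example.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, str]         # (fragment text, predicted entity type)
Relation = Tuple[int, str, int]  # (head index, predicted relation type, tail index)


def assemble(
    entities: List[Entity],
    relations: List[Relation],
    entity_types: Set[str],
    relation_types: Set[str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Return (target triples, target independent entities)."""
    # S330: keep only fragments whose type belongs to the target entity types.
    kept = {i for i, (_, etype) in enumerate(entities) if etype in entity_types}
    triples: List[Tuple[str, str, str]] = []
    linked: Set[int] = set()
    for head, rtype, tail in relations:
        # S340: keep only relationships whose type belongs to the target relation types.
        if rtype in relation_types and head in kept and tail in kept:
            triples.append((entities[head][0], rtype, entities[tail][0]))
            linked.update((head, tail))
    # S350: kept fragments that take part in no relationship are independent entities.
    independent = [entities[i][0] for i in sorted(kept - linked)]
    return triples, independent


# Hypothetical sample data, for illustration only.
sample_entities = [("CompanyA", "company"), ("lithium", "product"),
                   ("CompanyB", "company"), ("Monday", "date")]
sample_relations = [(0, "supplies", 2), (0, "likes", 1)]
sample_triples, sample_independent = assemble(
    sample_entities, sample_relations,
    entity_types={"company", "product"}, relation_types={"supplies"},
)
```

Because one entity index may appear in several kept relationships, the same fragment can take part in more than one triple, which corresponds to the single-entity overlap case mentioned above.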
Referring again to fig. 1, the method for automatically constructing an industry chain map from research reports further comprises the steps of:
S400, extracting a target attribute pair in the sentence sequence containing the index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute.
Referring to fig. 4, the extracting a target attribute pair in a sentence sequence containing an index description includes:
S410, judging whether the sentence sequence contains indexes, and if so, extracting a target attribute pair in the sentence sequence by adopting the index attribute extraction model;
S420, the target attribute pair is a simple attribute pair or a complex attribute pair.
Specifically, a text classification model judges whether the sentence sequence contains indexes, and if so, the index attribute extraction model extracts the target attribute pairs in the sentence sequence. There may be one or more target attribute pairs, and each is a target simple attribute pair or a target complex attribute pair. The target simple attribute pair comprises a target first attribute and a target second attribute, the target first attribute being the name of the index corresponding to the target attribute pair and the target second attribute being the value of that index. The target complex attribute pair comprises a target first attribute and a plurality of target second attributes, and the target second attributes in the target complex attribute pair comprise a value of the target complex attribute pair and at least one constraint on the target complex attribute pair.
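The control flow of steps S410 and S420 might look like the sketch below, in which the text classification model and the index attribute extraction model are passed in as plain callables. The function names, the tuple layout of an attribute pair, and the toy stand-ins in the sample are assumptions for illustration; they are not the trained models of this embodiment.

```python
from typing import Callable, Dict, List, Tuple

# (target first attribute, [target second attributes: value, then constraints])
AttributePair = Tuple[str, List[str]]


def extract_attribute_pairs(
    sentences: List[str],
    has_index: Callable[[str], bool],
    extract: Callable[[str], List[AttributePair]],
) -> Dict[int, List[AttributePair]]:
    """Map sentence position -> attribute pairs, skipping sentences without indexes."""
    results: Dict[int, List[AttributePair]] = {}
    for position, sentence in enumerate(sentences):
        if not has_index(sentence):      # S410: classifier gate
            continue
        pairs = extract(sentence)        # index attribute extraction model
        if pairs:
            results[position] = pairs
    return results


def is_complex(pair: AttributePair) -> bool:
    """S420: a complex pair has a value plus at least one constraint."""
    return len(pair[1]) > 1


# Toy stand-ins for the two models, for illustration only.
sample_sentences = ["A supplies B", "A's market share was 30% in 2021"]
sample_pairs = extract_attribute_pairs(
    sample_sentences,
    has_index=lambda s: "%" in s,
    extract=lambda s: [("market share", ["30%", "2021"])],
)
```

Running the classifier first means the extraction model is only invoked on sentences that are likely to contain index data.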
S500, for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and the one or more target attribute pairs corresponding to the initial second triple.
The matching and aligning the obtained one or more target attribute pairs with the initial second triple includes:
matching and aligning the target second attributes in the obtained target attribute pair with the corresponding initial second triple, wherein some of the target second attributes are aligned with the relationship in the corresponding initial second triple, and the values corresponding to the remaining target second attributes are matched and aligned with the head entity or the tail entity in the triple, so as to obtain a target second triple, the target second triple comprising the initial second triple and the attribute information corresponding to the initial second triple.
For a complex sentence sequence containing an index description, the initial second triple and its corresponding target attribute pair are obtained through the entity relationship synchronous extraction model and the index attribute extraction model, respectively, the target attribute pair comprising the target first attribute and the target second attribute. The target first attribute and the target second attribute are matched and aligned with the triple, with some of the target second attributes matched to the head and tail entities in the triple, so that the alignment between attributes and relationships is completed, the information expressed by the initial second triple is enriched, and the target second triple is obtained.
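One simple way to picture the matching and alignment is given below. The sketch uses exact string matching against the head and tail entity to decide which second attributes align with an entity and which align with the relationship; this matching rule, the function name `align_attributes`, and the dictionary layout are assumptions for illustration and are not stated in this specification.

```python
from typing import Dict, List, Tuple

AttributePair = Tuple[str, List[str]]  # (first attribute, [second attributes])


def align_attributes(
    initial_triple: Tuple[str, str, str],
    pairs: List[AttributePair],
) -> Dict[str, object]:
    """Attach attribute pairs to an initial second triple, giving a target second triple."""
    head, relation, tail = initial_triple
    aligned = []
    for name, second_attributes in pairs:
        # Second attributes that name the head or tail entity align with that entity.
        entity_side = [a for a in second_attributes if a in (head, tail)]
        # The remaining second attributes align with the relationship itself.
        relation_side = [a for a in second_attributes if a not in (head, tail)]
        aligned.append({
            "name": name,
            "relation_attrs": relation_side,
            "entity_attrs": entity_side,
        })
    return {"head": head, "relation": relation, "tail": tail, "attributes": aligned}


# Hypothetical sample data, for illustration only.
sample_target = align_attributes(
    ("CompanyA", "supplies", "CompanyB"),
    [("supply volume", ["100 tons", "2021", "CompanyB"])],
)
```

A production system would likely need fuzzier matching than string equality, for example to handle abbreviated company names.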
S600, adding the target first triple and the target second triple to a target industry chain map.
A target industry chain map is constructed from the obtained target first triples and target second triples by adding them to the industry chain map. The resulting target industry chain map goes beyond simple entity-relationship definitions: it covers the complex situations of triple overlap and defines the relationship attributes needed to describe the large amount of index data in research report texts.
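As a non-limiting sketch, step S600 could be realized with an in-memory structure such as the one below. The class name `IndustryChainMap` and its methods are assumptions for illustration; an actual implementation might instead write to a graph database.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class IndustryChainMap:
    """Minimal in-memory graph holding first and second triples."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        # (head, tail) -> list of edges, so one entity pair may hold several relations.
        self.edges: Dict[Tuple[str, str], List[dict]] = defaultdict(list)

    def add_triple(self, head: str, relation: str, tail: str,
                   attributes: Optional[List[dict]] = None) -> None:
        """Add a first triple (no attributes) or a second triple (with attributes)."""
        self.nodes.update((head, tail))
        self.edges[(head, tail)].append(
            {"relation": relation, "attributes": attributes or []}
        )

    def add_independent_entity(self, entity: str) -> None:
        """Keep an entity that currently takes part in no relationship."""
        self.nodes.add(entity)


# Hypothetical sample data, for illustration only.
chain_map = IndustryChainMap()
chain_map.add_triple("CompanyA", "supplies", "CompanyB")
chain_map.add_triple("CompanyA", "invests in", "CompanyB",
                     [{"name": "stake", "value": "10%"}])
chain_map.add_independent_entity("lithium")
```

Storing a list of edges per entity pair lets the structure represent entity-pair overlap, where the same head and tail are linked by more than one relationship.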
The extracted target independent entities facilitate subsequent reasoning and evolution: when more research reports are added to jointly construct the target industry chain map, more related features can be extracted from the newly added research reports more quickly.
This embodiment provides a method for automatically constructing an industry chain map from research reports, which can automatically convert long natural-language text containing industry chain knowledge into entities with attributes and relationship links in the map. The method uses an entity relationship extraction model that promotes bidirectional information interaction between tasks, alleviates the loss of precision and efficiency caused by error propagation and redundant computation, and can effectively cope with nested entities and with the complex extraction scenarios of single-entity overlap and entity-pair overlap in the triple overlap problem. In addition, attribute extraction mines the useful information contained in the large amount of index data in research report texts. Further, by aligning the index attributes with the corresponding relationships, a target industry chain map is finally obtained that is composed of target triples with more complete information and target independent entity information.
In summary, this embodiment provides a method for automatically constructing an industry chain map from research reports. A research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type is first loaded. An original research report text set is then acquired, and each original research report text in the set is preprocessed to obtain a target text. Next, an entity relationship synchronous extraction model simultaneously extracts the target triples and target independent entities in the sentence sequence, where a target triple is a target first triple or an initial second triple. An index attribute extraction model then extracts the target attribute pairs in sentence sequences containing an index description, where a target attribute pair comprises the target first attribute and the target second attribute. For a sentence sequence containing attribute pairs, the one or more target attribute pairs obtained are matched and aligned with the initial second triple to obtain a target second triple, which contains the initial second triple and the one or more target attribute pairs corresponding to it. Finally, the target first triple and the target second triple are added to the target industry chain map. The method can effectively meet the need to construct large-scale industry chain maps automatically from research reports in complex situations, gives due attention to the complex index attributes of relationships, satisfies the extraction requirements for entity relationships and related attributes with a more accurate and efficient model, and reduces labor and time costs.
It should be understood that, although the steps in the flowcharts shown in the drawings of the present specification are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps of the present invention are not limited to the exact order disclosed and may be performed in other orders. Moreover, at least some of the steps of the present invention may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Example two
Based on the above embodiment, the present invention further provides an apparatus for automatically constructing an industry chain map from a research report, a schematic diagram of functional modules of the apparatus is shown in fig. 5, and the apparatus for automatically constructing an industry chain map from a research report includes:
an industry chain map schema loading module, configured to load a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple are of a structure of head entity type-relation type-tail entity type, in the second triple, the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is the name of the attribute pair, and the second attribute is the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair, as specifically described in embodiment one;
a target text acquisition module, configured to acquire an original research report set, and respectively pre-process each original research report in the original research report set to obtain a target text, where the target text is composed of a non-empty sentence sequence, and is specifically described in embodiment one;
an entity relationship synchronous extraction module, configured to extract a target triple and a target independent entity in the sentence sequence simultaneously by using an entity relationship synchronous extraction model, where the target triple is a target first triple or an initial second triple, and the target triple is specifically as described in embodiment one;
an index attribute extraction module, configured to employ an index attribute extraction model, where the index attribute extraction model is configured to extract a target attribute pair in a sentence sequence containing an index description, where the target attribute pair includes a target first attribute and a target second attribute, and is specifically described in embodiment one;
an attribute-relationship alignment module, configured to, for a sentence sequence including an attribute pair, perform matching alignment on the obtained one or more target attribute pairs and the initial second triple to obtain a target second triple, where the target second triple includes the initial second triple and one or more target attribute pairs corresponding to the initial second triple, and the specific example is as described in embodiment one;
a target industry chain map obtaining module, configured to add the target first triple and the target second triple to a target industry chain map, as specifically described in embodiment one.
Example three
Based on the method for automatically constructing the industry chain map from research reports in the first embodiment, the invention also provides a terminal, a schematic block diagram of which is shown in fig. 6. The terminal comprises a memory 10 and a processor 20, wherein the memory 10 stores a program for automatically constructing an industry chain map from research reports, and the processor 20 executes the program to realize at least the following steps:
loading a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple are of a structure of head entity type-relation type-tail entity type, in the second triple, the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is the name of the attribute pair, and the second attribute is the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring an original research report text set, and preprocessing each original research report text in the original research report text set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
extracting a target attribute pair in a sentence sequence containing index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain graph.
Wherein the preprocessing each original research report text in the original research report text set includes:
performing text recognition on the original research report text by an optical character recognition technology to obtain a first text that is convenient to read and write;
performing text cleaning on the first text, and removing noise characters in the first text to obtain a second text, wherein the noise characters are characters that serve no actual descriptive purpose in the text;
and performing sentence segmentation processing on the second text, and dividing the second text into a non-empty sentence sequence to obtain the target text.
The entity relationship synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module codes the sentence sequence by adopting a general pre-training model based on a training set and a verification set of the labeled entity and the relationship information to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module judges whether the type of the target entity fragment belongs to the target entity type or not based on the characteristic information of the entity extraction subtask, if so, the target entity fragment is reserved, and if not, the target entity fragment is discarded;
the subtask target information prediction module is also used for judging the relationship between the entity pairs based on the characteristic information of the relationship prediction subtask to obtain the characteristic representation of the target relationship, judging whether the type of the target relationship belongs to the target relationship type or not according to the characteristic representation of the target relationship, if so, keeping the target relationship, and if not, discarding the target relationship;
and obtaining a target triple according to the target entity fragment and the target relation corresponding to the target entity fragment, wherein the target entity fragment without the corresponding relation is the target independent entity information.
The extracting of the target attribute pair in the sentence sequence containing the index description comprises the following steps:
judging whether the sentence sequence contains indexes, if so, extracting a target attribute pair in the sentence sequence by adopting the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
Wherein the matching and aligning the obtained one or more target attribute pairs with the initial second triple includes:
and matching and aligning the target second attribute in the obtained target attribute pair with the corresponding initial second triple, wherein a relationship between part of the target second attribute and the corresponding initial second triple is aligned, and values corresponding to the other part of the target second attribute are matched and aligned with a head entity or a tail entity in the triple, so that a target second triple is obtained, and the target second triple comprises the initial second triple and attribute information corresponding to the initial second triple.
The list of the target entity types is dynamically adjusted according to the target text and the target task scene requirements;
the list of the target relation types is dynamically adjusted according to the target entity types and the target texts;
and the list of the target attribute types is dynamically adjusted according to the target attribute types and the target text.
Example four
The present invention also provides a storage medium storing one or more programs executable by one or more processors to implement the steps of the method for automatically constructing an industry chain graph from a research report according to the above-described embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A method for automatically constructing an industry chain map from a research report, comprising:
loading a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple are of a structure of head entity type-relation type-tail entity type, in the second triple, the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is a name of the attribute pair, and the second attribute is a value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring an original research report text set, and preprocessing each original research report text in the original research report text set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
the entity relation synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module codes the sentence sequence by adopting a general pre-training model based on a training set and a verification set of the labeled entity and the relationship information to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module judges, based on the feature information of the entity extraction subtask, whether the type of the target entity fragment belongs to the target entity type; if so, the target entity fragment is retained, and if not, the target entity fragment is discarded;
the subtask target information prediction module is also used for judging the relationship between entity pairs based on the feature information of the relation prediction subtask to obtain a feature representation of the target relationship, and judging, according to that feature representation, whether the type of the target relationship belongs to the target relationship type; if so, the target relationship is retained, and if not, the target relationship is discarded;
obtaining a target triple according to the target entity fragment and the target relationship corresponding to the target entity fragment, wherein a target entity fragment that has no corresponding relationship is taken as target independent entity information;
extracting a target attribute pair in a sentence sequence containing index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain graph.
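The schema and triple structures recited in claim 1 can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the class names `AttributePair`, `Triple` and `Schema`, the example entity and relation types, and the candidate tuples are all hypothetical, and the extraction model itself is replaced by a hand-written candidate list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AttributePair:
    """Simple pair: name + value. Complex pair: name + value + constraints."""
    name: str                # first attribute (the attribute name)
    value: str               # second attribute (the attribute value)
    constraints: tuple = ()  # further second attributes (complex pair only)

    @property
    def is_complex(self) -> bool:
        return bool(self.constraints)


@dataclass
class Triple:
    head: str
    relation: str
    tail: str
    attributes: list = field(default_factory=list)


@dataclass
class Schema:
    entity_types: set
    # (head type, relation type, tail type) -> attribute names allowed on the
    # relation; an empty tuple marks a "first triple" type.
    triple_types: dict

    def allows(self, head_type, relation, tail_type) -> bool:
        return (head_type, relation, tail_type) in self.triple_types

    def is_second_triple_type(self, head_type, relation, tail_type) -> bool:
        return bool(self.triple_types.get((head_type, relation, tail_type)))


# Hypothetical schema and model output, for illustration only.
schema = Schema(
    entity_types={"Company", "Product"},
    triple_types={
        ("Company", "produces", "Product"): ("capacity",),
        ("Product", "upstream_of", "Product"): (),
    },
)
candidates = [
    ("Acme", "Company", "produces", "battery cell", "Product"),
    ("cathode material", "Product", "upstream_of", "battery cell", "Product"),
    ("Acme", "Company", "located_in", "Shenzhen", "City"),  # not in the schema
]

# Keep only triples whose type is predefined in the schema.
graph = []
for head, head_type, relation, tail, tail_type in candidates:
    if schema.allows(head_type, relation, tail_type):
        graph.append(Triple(head, relation, tail))

# Attaching a complex attribute pair turns the initial second triple
# into a target second triple.
graph[0].attributes.append(
    AttributePair("capacity", "10 GWh", constraints=("year: 2022",))
)
```

In this sketch the schema alone decides whether a triple type is a first or a second triple, which mirrors the claim's statement that the triple types are predefined before extraction.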
2. The method for automatically constructing an industry chain graph from research reports of claim 1, wherein the preprocessing of each original research report text in the set of original research report texts comprises:
performing text recognition on the original research report text by an optical character recognition technology to obtain a first text that can be read and written;
performing text cleaning on the first text to remove noise characters and obtain a second text, wherein the noise characters are characters in the first text that carry no actual descriptive content;
and performing sentence segmentation on the second text, dividing the second text into a non-empty sentence sequence to obtain the target text.
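The cleaning and sentence segmentation steps of claim 2 can be sketched as follows. This is a minimal illustration only: the optical character recognition step is assumed to have already produced the first text, and the noise pattern, the set of sentence terminators, and the function names `clean_text` and `split_sentences` are assumptions rather than details given in the claims.

```python
import re

# Assumed noise: control characters and runs of decorative bullet symbols.
_NOISE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]|[■◆●▲★]+")
# Assumed terminators: Chinese and Western sentence-final punctuation.
# A Western full stop only ends a sentence when whitespace follows,
# so decimals such as "3.5" are not split.
_SENT_END = re.compile(r"(?<=[。！？!?])\s*|(?<=[.])\s+")


def clean_text(first_text: str) -> str:
    """Remove noise characters and normalise whitespace (first -> second text)."""
    second = _NOISE.sub("", first_text)
    return re.sub(r"\s+", " ", second).strip()


def split_sentences(second_text: str) -> list:
    """Split the second text into a list of non-empty sentences."""
    parts = _SENT_END.split(second_text)
    return [p.strip() for p in parts if p.strip()]


raw = "■ Acme supplies battery cells.  Capacity reached 10 GWh!\x07 Output grew 3.5% in 2022."
sentences = split_sentences(clean_text(raw))
```

A caller would still need to check that the returned list is non-empty before treating it as the target text, since an empty or all-noise input yields an empty list here.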
3. The method for automatically constructing an industry chain graph from research reports according to claim 1, wherein the extracting target attribute pairs in a sentence sequence containing index descriptions comprises:
judging whether the sentence sequence contains indexes, if so, extracting a target attribute pair in the sentence sequence by adopting the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
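The two-stage flow of claim 3, first judging whether a sentence contains an index and only then running the extraction model, might look like the sketch below. The claims do not specify how the index attribute extraction model works, so `toy_extractor` is a purely hypothetical rule-based stand-in, and the cue pattern, units and attribute names are assumptions chosen for the example.

```python
import re

# Assumed cue for an index description: a number followed by a unit or "%".
_INDEX_CUE = re.compile(r"\d+(?:\.\d+)?\s*(?:%|GWh|MW)")

# Hypothetical pattern used only by the stand-in extractor below.
_PAIR = re.compile(
    r"(?P<name>capacity|output|market share)\D*?"
    r"(?P<value>\d+(?:\.\d+)?\s*(?:%|GWh|MW))"
    r"(?:\s+in\s+(?P<year>\d{4}))?",
    re.IGNORECASE,
)


def contains_index(sentence: str) -> bool:
    """Judge whether the sentence contains an index description."""
    return _INDEX_CUE.search(sentence) is not None


def toy_extractor(sentence: str):
    """Rule-based stand-in for the index attribute extraction model."""
    m = _PAIR.search(sentence)
    if m is None:
        return None
    # A year constraint makes the pair complex; otherwise it is simple.
    constraints = (f"year={m.group('year')}",) if m.group("year") else ()
    return {
        "name": m.group("name").lower(),   # target first attribute
        "value": m.group("value"),         # target second attribute (value)
        "constraints": constraints,        # further second attributes
    }


def extract_attribute_pairs(sentences, extractor=toy_extractor):
    """Run the extractor only on sentences that contain an index."""
    results = []
    for i, sentence in enumerate(sentences):
        if not contains_index(sentence):
            continue
        pair = extractor(sentence)
        if pair is not None:
            results.append((i, pair))
    return results


pairs = extract_attribute_pairs([
    "Acme's capacity reached 10 GWh in 2022.",
    "Acme supplies battery cells.",
    "Its market share is 12%.",
])
```

Passing the extractor as a parameter keeps the gating logic separate from the model, so a trained model could be substituted without changing `extract_attribute_pairs`.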
4. The method for automatically constructing an industry chain graph from research reports of claim 1, wherein the matching and aligning of the obtained one or more target attribute pairs with the initial second triple comprises:
matching and aligning the target second attributes in the obtained target attribute pair with the corresponding initial second triple, wherein some of the target second attributes are aligned with the relationship of the corresponding initial second triple, and the values of the remaining target second attributes are matched and aligned with the head entity or the tail entity of that triple, thereby obtaining a target second triple, the target second triple comprising the initial second triple and the attribute information corresponding to the initial second triple.
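One possible reading of the alignment in claim 4 is sketched below: an attribute pair is first aligned with the relationship of a triple (its name must be permitted on that relation), and then any second attribute that names the head or tail entity is used to pick among the remaining candidates. The scoring rule, the `RELATION_ATTRIBUTES` table and all example values are assumptions made for illustration; the claims do not prescribe a concrete matching procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Triple:
    head: str
    relation: str
    tail: str
    attributes: list = field(default_factory=list)


# Hypothetical schema fragment: relation type -> attribute names it may carry.
RELATION_ATTRIBUTES = {
    "produces": {"capacity", "output"},
    "supplies": {"volume"},
}


def align(pairs, initial_triples):
    """Attach each (name, second_attributes) pair to the best-matching triple.

    Relation alignment: the pair's name must be allowed on the triple's relation.
    Entity alignment: second attributes equal to the head or tail raise the score.
    Returns the triples that received at least one pair (target second triples).
    """
    for name, second_attrs in pairs:
        best, best_score = None, -1
        for triple in initial_triples:
            if name not in RELATION_ATTRIBUTES.get(triple.relation, set()):
                continue
            score = sum(1 for a in second_attrs if a in (triple.head, triple.tail))
            if score > best_score:
                best, best_score = triple, score
        if best is not None:
            best.attributes.append((name, second_attrs))
    return [t for t in initial_triples if t.attributes]


# Two initial second triples from the same sentence; the pair mentions "Beta".
triples = [
    Triple("Acme", "produces", "battery cell"),
    Triple("Beta", "produces", "battery cell"),
]
target_second = align([("capacity", ("10 GWh", "Beta"))], triples)
```

Triples that receive no attribute pair are simply left out of the returned list in this sketch; how such triples should be handled in practice is not something the example settles.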
5. The method for automatically constructing an industry chain graph from a research report of claim 1, wherein the list of target entity types is dynamically adjusted according to the target text and the target task scenario requirements;
the list of the target relation types is dynamically adjusted according to the target entity types and the target texts;
and the list of the target attribute types is dynamically adjusted according to the target attribute types and the target text.
6. An apparatus for automatically constructing an industry chain graph from research reports, the apparatus comprising:
an industry chain graph schema loading module, configured to load a research-report-oriented industry chain graph schema containing a target entity type, a target relationship type and a target attribute type, wherein the industry chain graph schema predefines the entity type information to be extracted and the triple type information to be extracted, the triple is a first triple or a second triple, the first triple and the second triple each have the structure head entity type-relationship type-tail entity type, in the second triple the relationship type further comprises at least one attribute pair corresponding to the relationship type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair;
a target text acquisition module, configured to acquire a set of original research report texts and preprocess each original research report text in the set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
the entity relationship synchronous extraction module is used for simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relationship synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
the entity relation synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module encodes the sentence sequence with a general pre-trained model, based on a training set and a validation set annotated with entity and relationship information, to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module judges, based on the feature information of the entity extraction subtask, whether the type of the target entity fragment belongs to the target entity type; if so, the target entity fragment is retained, and if not, the target entity fragment is discarded;
the subtask target information prediction module is also used for judging the relationship between entity pairs based on the feature information of the relation prediction subtask to obtain a feature representation of the target relationship, and judging, according to that feature representation, whether the type of the target relationship belongs to the target relationship type; if so, the target relationship is retained, and if not, the target relationship is discarded;
obtaining a target triple according to the target entity fragment and the target relationship corresponding to the target entity fragment, wherein a target entity fragment that has no corresponding relationship is taken as target independent entity information;
an index attribute extraction module, configured to adopt an index attribute extraction model to extract a target attribute pair in a sentence sequence containing index description, wherein the target attribute pair comprises a target first attribute and a target second attribute;
the attribute-relationship alignment module is configured to, for a sentence sequence including an attribute pair, perform matching alignment on the obtained one or more target attribute pairs and the initial second triple to obtain a target second triple, where the target second triple includes the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
a target industry chain graph obtaining module, configured to add the target first triple and the target second triple to a target industry chain graph.
7. A terminal, characterized in that the terminal comprises: a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to invoke the instructions in the storage medium to perform the steps of the method for automatically constructing an industry chain graph from research reports of any of claims 1-5.
8. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method for automatically constructing an industry chain graph from research reports as recited in any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211325252.2A CN115391569B (en) | 2022-10-27 | 2022-10-27 | Method for automatically constructing industry chain map from research report and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115391569A (en) | 2022-11-25 |
CN115391569B (en) | 2023-03-24 |
Family
ID=84129424
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211325252.2A Active CN115391569B (en) | 2022-10-27 | 2022-10-27 | Method for automatically constructing industry chain map from research report and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115391569B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108363816A (en) * | 2018-03-21 | 2018-08-03 | 北京理工大学 | Open entity relation extraction method based on sentence justice structural model |
CN109165385A (en) * | 2018-08-29 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-triple extraction method based on entity relationship joint extraction model |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255034A (en) * | 2018-08-08 | 2019-01-22 | 数据地平线(广州)科技有限公司 | A kind of domain knowledge map construction method based on industrial chain |
CN111967761B (en) * | 2020-08-14 | 2024-04-02 | 国网数字科技控股有限公司 | Knowledge graph-based monitoring and early warning method and device and electronic equipment |
CN113051365A (en) * | 2020-12-10 | 2021-06-29 | 深圳证券信息有限公司 | Industrial chain map construction method and related equipment |
CN112883197B (en) * | 2021-02-08 | 2023-02-07 | 广东电网有限责任公司广州供电局 | Knowledge graph construction method and system for closed switch equipment |
CN112860916B (en) * | 2021-03-09 | 2022-09-16 | 齐鲁工业大学 | Movie-television-oriented multi-level knowledge map generation method |
CN113139068B (en) * | 2021-05-10 | 2023-05-09 | 内蒙古工业大学 | Knowledge graph construction method and device, electronic equipment and storage medium |
CN114219089B (en) * | 2021-11-11 | 2022-07-22 | 山东人才发展集团信息技术有限公司 | Construction method and equipment of new-generation information technology industry knowledge graph |
CN115017322A (en) * | 2022-02-17 | 2022-09-06 | 甘肃农业大学 | Ontology-based potato industry chain knowledge graph construction method |
Also Published As
Publication number | Publication date |
---|---|
CN115391569A (en) | 2022-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gruzauskas et al. | Robotic process automation for document processing: A case study of a logistics service provider | |
CN111651552A (en) | Structured information determination method and device and electronic equipment | |
CN115203403A (en) | Text sorting model based on network public sentiment | |
CN114818718A (en) | Contract text recognition method and device | |
CN118194842A (en) | Intelligent document identification method and device, electronic equipment and storage medium | |
CN116150367A (en) | Emotion analysis method and system based on aspects | |
CN113902569A (en) | Method for identifying the proportion of green assets in digital assets and related products | |
US11461616B2 (en) | Method and system for analyzing documents | |
CN115391569B (en) | Method for automatically constructing industry chain map from research report and related equipment | |
GV et al. | Document Classification and Information Extraction framework for Insurance Applications | |
CN114548325B (en) | Zero sample relation extraction method and system based on dual contrast learning | |
CN113779218B (en) | Question-answer pair construction method, question-answer pair construction device, computer equipment and storage medium | |
CN111046934B (en) | SWIFT message soft clause recognition method and device | |
CN114495138A (en) | Intelligent document identification and feature extraction method, device platform and storage medium | |
CN113627189A (en) | Entity identification information extraction, storage and display method for insurance clauses | |
CN113094447A (en) | Structured information extraction method oriented to financial statement image | |
Arafat et al. | Hydrating large-scale coronavirus pandemic tweets: A review of software for transportation research | |
Thiée et al. | Extraction of Information from Invoices–Challenges in the Extraction Pipeline | |
Kumar et al. | AI Enabled Invoice Management Application | |
Tan et al. | Information Extraction System for Cargo Invoices | |
CN112651246B (en) | Service demand conflict detection method integrating deep learning and workflow modes | |
CN114936563B (en) | Event extraction method, device and storage medium | |
Chiu et al. | Use text mining for financial reports analysis: long text to image converter | |
Heidenreich et al. | Large Language Models for Page Stream Segmentation | |
CN116402041A (en) | Contract element extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||