CN115391569B - Method for automatically constructing industry chain map from research report and related equipment - Google Patents

Publication number
CN115391569B
Authority
CN
China
Prior art keywords
target
attribute
triple
entity
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211325252.2A
Other languages
Chinese (zh)
Other versions
CN115391569A (en)
Inventor
陈清财
杨新兰
李东方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute Of Technology shenzhen Shenzhen Institute Of Science And Technology Innovation Harbin Institute Of Technology
Original Assignee
Harbin Institute Of Technology shenzhen Shenzhen Institute Of Science And Technology Innovation Harbin Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute Of Technology shenzhen Shenzhen Institute Of Science And Technology Innovation Harbin Institute Of Technology
Priority to CN202211325252.2A
Publication of CN115391569A
Application granted
Publication of CN115391569B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04: Manufacturing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The invention discloses a method and related equipment for automatically constructing an industry chain map from research reports. The method comprises the following steps: loading a research-report-oriented industry chain map schema; acquiring an original research report set, and preprocessing each original research report in the set to obtain a target text; simultaneously extracting target triples and target independent entities from a sentence sequence by using an entity relationship synchronous extraction model; extracting target attribute pairs from sentence sequences containing index descriptions by using an index attribute extraction model; matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple; and adding the target first triple and the target second triple to the target industry chain map. The method can effectively meet the need to automatically construct a large-scale industry chain map from research reports in complex situations, and reduces labor and time costs.

Description

Method for automatically constructing industry chain map from research report and related equipment
Technical Field
The invention relates to the technical field of text processing, and in particular to a method and related equipment for automatically constructing an industry chain map from research reports.
Background
With the development of financial technology and the continuous expansion of the global capital market, the financial field generates a large amount of industry information data every day, and this data contains abundant valuable information. A knowledge graph describes and stores the knowledge contained in data in a structured form. It can express information from the Internet in a form closer to human cognition, and it has a strong capacity for organizing, managing and understanding massive information. Using a graph for association mining and reasoning analysis is widely applied in academia and industry. An industry chain map is built on industry chain data for the subdivided products of an industry, and can better describe upstream and downstream relationships, product hierarchy relationships, main business relationships between companies and products, and the relationships between relevant economic indexes and companies, products and industries. An industry chain map can provide accurate and timely solutions for clients, helps the relevant personnel capture the internal dynamics of an industry, and brings economic benefit to enterprises. By reasoning along the industry chain map, potential risks and investment opportunities can be discovered, which assists people in making intelligent investment decisions and thereby supports practical financial business scenarios such as investment, risk control and marketing services.
However, the financial field still lacks a large-scale, open-source panoramic knowledge graph of industry chains. Research that systematically describes how to construct industry chain maps in this field is relatively scarce, and most existing work fails to pay effective attention to the complex index attributes of relationships. Traditional manual extraction of key information cannot meet the need to process massive information quickly, and it is costly in labor and time.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
In view of the above defects in the prior art, a method and related equipment for automatically constructing an industry chain map from research reports are provided, aiming to solve the problems that the prior art offers few systematic methods for automatically constructing an industry chain map and cannot pay effective attention to the complex index attributes of relationships.
In a first aspect of the present invention, there is provided a method for automatically constructing an industry chain map from research reports, comprising:
loading a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the entity type information to be extracted and the triple type information to be extracted are predefined in the industry chain map schema; each triple is a first triple or a second triple, and both the first triple and the second triple have the structure head entity type-relationship type-tail entity type; in the second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; the attribute pair is a simple attribute pair or a complex attribute pair; the simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring an original research report set, and preprocessing each original research report in the original research report set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity from the sentence sequence by using an entity relationship synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
extracting a target attribute pair from a sentence sequence containing an index description by using an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and the one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain map.
The method for automatically constructing an industry chain map from research reports, wherein the preprocessing of each original research report in the original research report set comprises:
performing text recognition on the original research report by an optical character recognition technology to obtain a first text that is convenient to read and write;
performing text cleaning on the first text, and removing noise characters from the first text to obtain a second text, wherein the noise characters are characters that have no actual descriptive effect on the real text;
and performing sentence segmentation on the second text, dividing the second text into a non-empty sentence sequence to obtain the target text.
The method for automatically constructing an industry chain map from research reports, wherein the entity relationship synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module encodes the sentence sequence with a general pre-trained model, based on a training set and a validation set annotated with entity and relationship information, to obtain a target vector;
the subtask feature selection module is used for acquiring the feature information corresponding to an entity extraction subtask and to a relationship prediction subtask respectively, the entity extraction subtask being used for extracting target entity fragments from the sentence sequence according to the target vector;
the subtask target information prediction module judges, based on the feature information of the entity extraction subtask, whether the type of a target entity fragment belongs to the target entity types; if so, the target entity fragment is retained, and if not, it is discarded;
the subtask target information prediction module is also used for judging the relationship between entity pairs based on the feature information of the relationship prediction subtask to obtain a feature representation of the target relationship, and judging from that feature representation whether the type of the target relationship belongs to the target relationship types; if so, the target relationship is retained, and if not, it is discarded;
and a target triple is obtained from the target entity fragments and the target relationship corresponding to them, wherein a target entity fragment without a corresponding relationship is target independent entity information.
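The retain-or-discard logic of the prediction module can be illustrated with a minimal sketch. It assumes the model has already produced typed entity fragments and typed relationships between fragment indices; the function name `assemble_triples`, the tuple layouts and the example type names are hypothetical and are not taken from the patent.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, str]          # (fragment text, predicted entity type)
Relation = Tuple[int, str, int]   # (head index, predicted relationship type, tail index)


def assemble_triples(
    entities: List[Entity],
    relations: List[Relation],
    target_entity_types: Set[str],
    target_relation_types: Set[str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Keep only schema-conformant fragments and relationships, then build triples.

    Retained fragments that take part in no retained relationship are
    returned separately as independent entities.
    """
    # Retain a fragment only if its predicted type is a target entity type.
    kept = {i for i, (_, etype) in enumerate(entities) if etype in target_entity_types}

    triples: List[Tuple[str, str, str]] = []
    linked: Set[int] = set()
    for head, rtype, tail in relations:
        # Retain a relationship only if its type is a target relationship type
        # and both of its endpoints were retained.
        if rtype in target_relation_types and head in kept and tail in kept:
            triples.append((entities[head][0], rtype, entities[tail][0]))
            linked.update((head, tail))

    independent = [entities[i][0] for i in sorted(kept - linked)]
    return triples, independent


# Hypothetical model output for one sentence.
entities = [("Company A", "company"), ("lithium battery", "product"),
            ("Shenzhen", "region"), ("last year", "time")]
relations = [(0, "produces", 1), (0, "founded_in", 3)]
triples, independent = assemble_triples(
    entities, relations, {"company", "product", "region"}, {"produces"}
)
```

In this example the fragment typed "time" and the relationship typed "founded_in" are discarded because neither type is in the target lists, and "Shenzhen" is retained without any relationship, so it is reported as an independent entity.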
The method for automatically constructing an industry chain map from research reports, wherein the extracting of the target attribute pair from the sentence sequence containing the index description comprises:
judging whether the sentence sequence contains an index, and if so, extracting the target attribute pair from the sentence sequence by using the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
The method for automatically constructing an industry chain map from research reports, wherein the matching and aligning of the obtained one or more target attribute pairs with the initial second triple comprises:
matching and aligning the target second attributes in the obtained target attribute pair with the corresponding initial second triple, wherein one part of the target second attributes is aligned with the relationship of the corresponding initial second triple, and the values corresponding to the other part of the target second attributes are matched and aligned with the head entity or the tail entity of the triple, so as to obtain the target second triple, which comprises the initial second triple and the attribute information corresponding to the initial second triple.
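One possible reading of the matching and alignment step is sketched below. The text does not specify the matching rule, so the rule used here is an assumption: an attribute pair is attached to a triple when one of its second attributes mentions that triple's head or tail entity, and when the sentence yields exactly one triple the pair falls back to that triple. The function `align_attributes` and the dictionary keys `first` and `second` are hypothetical names.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]   # (head entity, relationship, tail entity)
AttrPair = Dict[str, object]    # {"first": index name, "second": [value, constraints...]}


def align_attributes(
    triples: List[Triple], pairs: List[AttrPair]
) -> Tuple[List[Tuple[Triple, List[AttrPair]]], List[AttrPair]]:
    """Attach attribute pairs to the initial second triples of one sentence.

    Assumed rule (not from the patent): a pair belongs to the single triple
    whose head or tail entity appears in one of its second attributes.
    """
    aligned: Dict[Triple, List[AttrPair]] = {t: [] for t in triples}
    unmatched: List[AttrPair] = []
    for pair in pairs:
        second_attrs = [str(s) for s in pair["second"]]
        hits = [t for t in triples
                if any(t[0] in s or t[2] in s for s in second_attrs)]
        # Fallback: a sentence with one triple leaves no ambiguity.
        if not hits and len(triples) == 1:
            hits = list(triples)
        if len(hits) == 1:
            aligned[hits[0]].append(pair)
        else:
            unmatched.append(pair)  # ambiguous or no candidate triple
    second_triples = [(t, attrs) for t, attrs in aligned.items() if attrs]
    return second_triples, unmatched


# Hypothetical extraction results for one sentence.
triples = [("Company A", "produces", "lithium battery"),
           ("Company B", "produces", "solar panel")]
pairs = [{"first": "market share", "second": ["30%", "lithium battery", "2021"]},
         {"first": "revenue", "second": ["5 billion yuan"]}]
second_triples, unmatched = align_attributes(triples, pairs)
```

Here the "market share" pair is attached to the first triple because one of its constraints names the tail entity, while the "revenue" pair stays unmatched because nothing ties it to either triple. A trained matching model could replace the substring test without changing the surrounding structure.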
The method for automatically constructing an industry chain map from research reports, wherein the list of target entity types is dynamically adjusted according to the target text and the requirements of the target task scenario;
the list of target relationship types is dynamically adjusted according to the target entity types and the target text;
and the list of target attribute types is dynamically adjusted according to the target attribute types and the target text.
In a second aspect of the present invention, there is provided an apparatus for automatically constructing an industry chain map from research reports, comprising:
an industry chain map schema loading module, configured to load a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the entity type information to be extracted and the triple type information to be extracted are predefined in the industry chain map schema; each triple is a first triple or a second triple, and both the first triple and the second triple have the structure head entity type-relationship type-tail entity type; in the second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; the attribute pair is a simple attribute pair or a complex attribute pair; the simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair;
a target text acquisition module, configured to acquire an original research report set and to preprocess each original research report in the original research report set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
an entity relationship synchronous extraction module, configured to simultaneously extract a target triple and a target independent entity from the sentence sequence by using an entity relationship synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
an index attribute extraction module, configured to use an index attribute extraction model to extract a target attribute pair from a sentence sequence containing an index description, wherein the target attribute pair comprises a target first attribute and a target second attribute;
an attribute-relationship alignment module, configured to, for a sentence sequence containing attribute pairs, match and align the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and the one or more target attribute pairs corresponding to the initial second triple;
a target industry chain map obtaining module, configured to add the target first triple and the target second triple to a target industry chain map.
In a third aspect of the present invention, a terminal is provided, comprising a processor and a storage medium communicatively connected to the processor, wherein the storage medium is adapted to store a plurality of instructions, and the processor is adapted to call the instructions in the storage medium to execute the steps of the above method for automatically constructing an industry chain map from research reports.
In a fourth aspect of the present invention, there is provided a storage medium storing one or more programs executable by one or more processors to implement the steps of any one of the above methods for automatically constructing an industry chain map from research reports.
Beneficial effects: the method loads a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, and then preprocesses each original research report in the original research report set to obtain a target text. An entity relationship synchronous extraction model simultaneously extracts target triples and target independent entities from the sentence sequence, each target triple being a target first triple or an initial second triple. An index attribute extraction model extracts target attribute pairs from sentence sequences containing index descriptions, each target attribute pair comprising a target first attribute and a target second attribute. For a sentence sequence containing attribute pairs, the obtained one or more target attribute pairs are matched and aligned with the initial second triple to obtain a target second triple, which contains the initial second triple and the one or more target attribute pairs corresponding to it. The target first triple and the target second triple are then added to the target industry chain map. The method can effectively meet the need to automatically construct a large-scale industry chain map from research reports in complex situations, pays effective attention to the complex index attributes of relationships, meets the extraction requirements for entity relationships and their related attributes with a more accurate and efficient model, and reduces labor and time costs.
Drawings
FIG. 1 is a flow chart of an embodiment of the method for automatically constructing an industry chain map from research reports provided by the present invention;
FIG. 2 is a flow chart of the preprocessing of an original research report in an embodiment of the method for automatically constructing an industry chain map from research reports provided by the present invention;
FIG. 3 is a schematic structural diagram of the entity relationship synchronous extraction model in an embodiment of the method for automatically constructing an industry chain map from research reports provided by the present invention;
FIG. 4 is a flow chart of index attribute extraction in an embodiment of the method for automatically constructing an industry chain map from research reports provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus for automatically constructing an industry chain map from research reports provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of the terminal provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The method for automatically constructing an industry chain map from research reports provided by the invention can be applied to a terminal with computing power. By executing the method, the terminal can extract the target first triples and the target second triples from the original research report set and construct the target industry chain map.
Example one
This embodiment provides a method for automatically constructing an industry chain map from research reports. As shown in FIG. 1, the method comprises the following steps:
S100, loading a research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types, wherein the entity type information to be extracted and the triple type information to be extracted are predefined in the industry chain map schema; each triple is a first triple or a second triple, and both the first triple and the second triple have the structure head entity type-relationship type-tail entity type; in the second triple, the relationship type further comprises at least one attribute pair corresponding to the relationship type; the attribute pair is a simple attribute pair or a complex attribute pair; the simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair.
Specifically, before a research report is processed, a research-report-oriented industry chain map schema containing the entity, relationship and attribute types and their definitions must first be loaded. In addition to simple definitions of relationships between entities, the schema covers overlapping triples and defines the necessary relationship attributes to describe the large amount of index data in research report texts.
Loading the research-report-oriented industry chain map schema containing target entity types, target relationship types and target attribute types comprises the following steps:
Loading the predefined target entity types according to the requirements of the target task scenario. Specifically, the predefined target entity types are determined from the requirements of the target task scenario and an analysis of the content of the research reports, and include but are not limited to companies, persons, brands, products, industries, regions, services, risk events and the like. The list of target entity types is dynamically adjusted according to the target text and the requirements of the target task scenario.
Loading the predefined target relationship types between the target entity types according to the requirements of the target task scenario. The target relationship types include but are not limited to the upstream and downstream relationships among industries and businesses, the production and sales relationships between companies and products, and the like. The list of target relationship types is dynamically adjusted according to the target entity types and the target text.
Loading the predefined target attribute types according to the requirements of the target task scenario. The target attributes refer to the index data in a research report text, which describes attributes shared by entities and by the relationships between entities. A target attribute is divided into a first attribute and a second attribute: the first attribute is the specific name of the index corresponding to the second triple, and the second attribute is the value of that index together with any other descriptions that constrain it. The list of target attribute types is dynamically adjusted according to the target attribute types and the target text. Because the first attribute contains only the specific name of the index corresponding to the second triple, dividing the target relationship attribute into a first attribute and second attributes makes it possible to describe effectively the case where one sentence contains several indexes.
The industry chain map schema further predefines the entity type information to be extracted and the type information of the triples to be extracted. Each triple is a first triple or a second triple, and both are relationship triples with the structure "head entity type-relationship type-tail entity type". The first triple is a simple triple and contains no attribute information. The second triple is a complex triple in which the relationship type further comprises at least one attribute pair corresponding to the relationship type. The attribute pair is a simple attribute pair or a complex attribute pair. The simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair. The complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair.
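The schema elements described above can be modelled with a few small data classes. This is only an illustrative sketch: the class names `AttributePair`, `Triple` and `Schema`, their fields, and the example values are assumptions introduced here, not structures defined by the patent.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class AttributePair:
    """An index attached to a relationship."""
    name: str                           # first attribute: name of the index
    value: str                          # second attribute: value of the index
    constraints: Tuple[str, ...] = ()   # further second attributes constraining the index

    @property
    def is_complex(self) -> bool:
        # A complex pair carries at least one constraint besides its value.
        return len(self.constraints) > 0


@dataclass
class Triple:
    """A relationship triple; it is a 'second triple' once it carries attribute pairs."""
    head: str
    relation: str
    tail: str
    attributes: List[AttributePair] = field(default_factory=list)

    @property
    def is_second_triple(self) -> bool:
        return bool(self.attributes)


@dataclass(frozen=True)
class Schema:
    """Predefined type lists of the industry chain map schema."""
    entity_types: FrozenSet[str]
    triple_types: FrozenSet[Tuple[str, str, str]]  # (head type, relationship type, tail type)
    attribute_types: FrozenSet[str]

    def allows(self, head_type: str, relation_type: str, tail_type: str) -> bool:
        return (head_type, relation_type, tail_type) in self.triple_types


# Hypothetical example values.
schema = Schema(
    entity_types=frozenset({"company", "product"}),
    triple_types=frozenset({("company", "produces", "product")}),
    attribute_types=frozenset({"market share"}),
)
simple = AttributePair("market share", "30%")
complex_pair = AttributePair("market share", "30%", ("2021", "domestic market"))
first_triple = Triple("Company A", "produces", "lithium battery")
second_triple = Triple("Company A", "produces", "lithium battery", [complex_pair])
```

Keeping the constraints separate from the value makes the simple/complex distinction a property of the data rather than a separate type, which is one way to let a single sentence contribute several indexes to the same relationship.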
After the industry chain map schema is loaded, an original research report set is acquired and preprocessed.
S200, acquiring an original research report set, and preprocessing each original research report in the original research report set to obtain a target text, wherein the target text consists of a non-empty sentence sequence.
Referring to FIG. 2, the preprocessing of each original research report in the original research report set comprises:
S210, performing text recognition on the original research report by optical character recognition to obtain a first text that is convenient to read and write.
In this embodiment, the original research report is an industry research report document, and it is converted into the first text by optical character recognition (OCR).
S220, performing text cleaning on the first text to remove the noise characters in it and obtain a second text, wherein the noise characters are characters that contribute nothing to the actual description in the text.
Further, text cleaning uniformly removes redundant spaces, special identifiers, and runs of more than 6 consecutive solid dots from the first text to obtain the second text.
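A minimal sketch of such a cleaning step is given below. The text does not list the dot characters or the special identifiers, so both character sets in the patterns, as well as the function name `clean_text`, are assumptions made for illustration.

```python
import re

# Assumed character sets; the source does not enumerate them.
DOT_RUN = re.compile(r"[.·•●]{7,}")          # more than 6 consecutive dots
SPECIAL_IDENTIFIERS = re.compile(r"[■□◆▲★※]")
REDUNDANT_SPACES = re.compile(r"[ \t\u3000]+")


def clean_text(first_text: str) -> str:
    """Turn OCR output (first text) into cleaned text (second text)."""
    text = DOT_RUN.sub("", first_text)         # e.g. table-of-contents leaders
    text = SPECIAL_IDENTIFIERS.sub("", text)   # bullet-like marker symbols
    text = REDUNDANT_SPACES.sub(" ", text)     # collapse runs of spaces
    return text.strip()
```

Runs of exactly 6 dots or fewer are left alone, which matches the "more than 6" threshold stated above.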
S230, performing sentence segmentation on the second text to divide it into a non-empty sentence sequence and obtain the target text.
The principle of sentence segmentation is to ensure, as far as possible, that the entities contained in a sentence are not split. First, common sentence separators, including but not limited to "。", "！", "……" and "；", are used to divide the second text into a sequence of sentences. For a long sentence that still exceeds 512 characters after this division, a secondary segmentation is performed with finer-grained separators, still following the segmentation principle, to obtain the target text.
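The two-stage segmentation can be sketched as follows. The primary separator set is taken from the description; the secondary separators (commas here) are an assumption, since the original does not name them, and the function names are hypothetical. A clause that is still longer than the limit after the secondary split is kept as-is in this sketch.

```python
import re
from typing import List

PRIMARY_SEPARATORS = "。！？；…"   # "？" is an assumed addition to the listed separators
SECONDARY_SEPARATORS = "，,"       # assumption: not specified in the source
MAX_SENTENCE_LENGTH = 512


def _split_after(text: str, separators: str) -> List[str]:
    """Split after each run of separator characters, keeping the separators."""
    char_class = "[" + re.escape(separators) + "]"
    parts = re.split(f"(?<={char_class})(?!{char_class})", text)
    return [part.strip() for part in parts if part.strip()]


def segment(second_text: str, max_length: int = MAX_SENTENCE_LENGTH) -> List[str]:
    """Divide cleaned text into a non-empty sentence sequence."""
    sentences: List[str] = []
    for sentence in _split_after(second_text, PRIMARY_SEPARATORS):
        if len(sentence) <= max_length:
            sentences.append(sentence)
        else:
            # Secondary segmentation for over-long sentences.
            sentences.extend(_split_after(sentence, SECONDARY_SEPARATORS))
    return sentences
```

Splitting only after a complete run of separators keeps a doubled ellipsis such as "……" attached to the sentence it ends.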
The method for automatically constructing the industry chain map from the research and report further comprises the following steps:
S300, simultaneously extracting target triples and target independent entities from the sentence sequence by using an entity relationship synchronous extraction model, wherein each target triple is a target first triple or an initial second triple.
Referring to fig. 3, the entity relationship synchronous extraction model includes a sentence sequence encoding module, a subtask feature selection module and a subtask target information prediction module;
S310, the sentence sequence encoding module encodes the sentence sequence by using a general pre-trained model, based on a training set and a validation set annotated with entity and relationship information, to obtain a target vector.
Specifically, a training set and a validation set are manually annotated and input into a general pre-trained model, and the general pre-trained model is fine-tuned on the training set and the validation set to obtain a sentence sequence encoding model suited to the method for automatically constructing an industry chain map from a research report. The sentence sequences in the target text are then encoded with the fine-tuned sentence sequence encoding model to obtain the target vector.
S320, the subtask feature selection module acquires the feature information corresponding to the entity extraction subtask and the relation prediction subtask respectively, wherein the entity extraction subtask extracts target entity fragments from the sentence sequence according to the target vector.
From the obtained target vector, the feature information specific to the entity extraction subtask and to the relation prediction subtask is captured, and the feature information shared between the two subtasks is calculated, thereby dividing the features by task.
The shared feature information and the subtask-specific feature information are then recombined to obtain new feature information for each subtask, which promotes bidirectional information interaction between the subtasks and avoids interference from redundant features.
This feature selection and recombination mechanism promotes bidirectional information interaction between the entity extraction subtask and the relation prediction subtask, alleviates the loss of precision and efficiency caused by error propagation and redundant computation, and copes effectively with nested entities and with complex extraction scenarios such as single-entity overlap and entity-pair overlap in the triple overlap problem.
S330, the subtask target information prediction module judges, based on the feature information of the entity extraction subtask, whether the type of the target entity fragment belongs to the target entity types; if so, the target entity fragment is retained, and if not, the target entity fragment is discarded.
Specifically, the subtask target information prediction module extracts target entity fragments from the sentence sequence according to the target vector, based on the feature information of the entity extraction subtask. The character-level features of the start position and the end position of a fragment are concatenated with the sentence-level feature to obtain the feature representation of the target entity fragment, and whether the target entity fragment is an entity of type k is predicted from this feature representation, where k ranges over the full set of target entity types predefined in the industry chain map schema of this embodiment.
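The fragment representation and the per-type decision can be illustrated with a toy sketch. The concatenation of start, end, and sentence-level features follows the description; the linear scoring with a sigmoid, the 0.5 threshold, and all names and numbers are assumptions, since the text does not specify the classifier.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fragment_representation(
    char_features: Sequence[Sequence[float]],
    start: int,
    end: int,
    sentence_feature: Sequence[float],
) -> List[float]:
    """Concatenate start-position, end-position, and sentence-level features."""
    return list(char_features[start]) + list(char_features[end]) + list(sentence_feature)


def predict_entity_types(
    representation: Sequence[float],
    type_parameters: Dict[str, Tuple[Sequence[float], float]],
    threshold: float = 0.5,
) -> List[str]:
    """Return every type k whose (assumed) sigmoid score reaches the threshold."""
    kept = []
    for entity_type, (weights, bias) in type_parameters.items():
        logit = sum(w * x for w, x in zip(weights, representation)) + bias
        if 1.0 / (1.0 + math.exp(-logit)) >= threshold:
            kept.append(entity_type)
    return kept


# Toy features: three characters with 2-dimensional features, 1-dimensional sentence feature.
char_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
representation = fragment_representation(char_features, 0, 2, [0.5])
types = predict_entity_types(
    representation,
    {"company": ([1, 0, 1, 0, 0], -1.0), "product": ([-1, 0, -1, 0, 0], 0.0)},
)
```

A fragment for which no type reaches the threshold yields an empty list and would be discarded, as in step S330.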
S340, the subtask target information prediction module further judges the relationship between the entity pairs based on the feature information of the relationship prediction subtask to obtain the feature representation of the target relationship, judges whether the type of the target relationship belongs to the target relationship type according to the feature representation of the target relationship, if so, retains the target relationship, and if not, discards the target relationship.
Specifically, the subtask target information prediction module judges the relationship between the entity pairs based on the feature information of the relation prediction subtask, and refines this judgment into a type judgment between the corresponding start positions and between the corresponding end positions of the head entity and the tail entity. Taking the start position as an example, the character-level features of the head entity and the tail entity are concatenated with the sentence-level feature, the relationship between the entities is judged from the target relation prediction feature to obtain the feature representation of the target relationship, and whether the target relationship is a relationship of type l is predicted from this feature representation, where l ranges over the full set of target relationship types predefined in the industry chain map schema of this embodiment. The relationship type between the end positions of the entity pair is calculated in the same way.
S350, obtaining target triples according to the target entity fragments and their corresponding target relationships, wherein a target entity fragment without a corresponding relationship is target independent entity information.
The target entity fragments are combined with their corresponding target relationships: target entity fragments that have a corresponding relationship are combined into triples, each of which is a relation triple with the structure "head entity - relation - tail entity", and target entity fragments without a corresponding relationship are the target independent entity information. Each target triple is a target first triple or an initial second triple.
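The combination step can be sketched as below, assuming the retained entity fragments are given as strings and the retained relationships as (head, relation, tail) tuples. The function name and the sample entities are hypothetical.

```python
from typing import List, Tuple

RelationTriple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def assemble(
    entities: List[str],
    relations: List[RelationTriple],
) -> Tuple[List[RelationTriple], List[str]]:
    """Build target triples and collect entities that take part in no relation."""
    retained = set(entities)
    # Keep only relations whose head and tail were both retained as entities.
    triples = [r for r in relations if r[0] in retained and r[2] in retained]
    linked = {name for head, _, tail in triples for name in (head, tail)}
    independent = [e for e in entities if e not in linked]
    return triples, independent


triples, independent = assemble(
    ["Company A", "Company B", "Company C"],
    [("Company A", "supplies", "Company B")],
)
```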
Referring again to fig. 1, the method for automatically constructing an industry chain map from a research report further comprises the steps of:
S400, extracting target attribute pairs from the sentence sequences containing an index description by using an index attribute extraction model, wherein each target attribute pair comprises a target first attribute and a target second attribute.
Referring to fig. 4, the extracting a target attribute pair in a sentence sequence containing an index description includes:
S410, judging whether the sentence sequence contains indexes, and if so, extracting the target attribute pairs in the sentence sequence by using the index attribute extraction model;
S420, the target attribute pair is a simple attribute pair or a complex attribute pair.
Specifically, a text classification model judges whether the sentence sequence contains indexes. If it does, the index attribute extraction model extracts the target attribute pairs in the sentence sequence, of which there may be one or more. Each target attribute pair is a target simple attribute pair or a target complex attribute pair. The target simple attribute pair comprises a target first attribute and a target second attribute, where the target first attribute is the name of the index corresponding to the target attribute pair and the target second attribute is the value of that index. The target complex attribute pair comprises a target first attribute and a plurality of target second attributes, and the target second attributes in the target complex attribute pair comprise the value of the target complex attribute pair and at least one constraint on the target complex attribute pair.
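The control flow of this step (classify first, extract only from sentences that contain indexes) can be sketched with the two models passed in as callables. The callables in the usage lines are trivial stand-ins for the text classification model and the index attribute extraction model, not the models themselves, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (first attribute, [second attributes]): index name, then value and optional constraints
AttributePair = Tuple[str, List[str]]


def pair_kind(pair: AttributePair) -> str:
    """A pair with only a value is simple; value plus constraints is complex."""
    _, second_attributes = pair
    return "simple" if len(second_attributes) == 1 else "complex"


def extract_attribute_pairs(
    sentences: List[str],
    has_index: Callable[[str], bool],
    extract_pairs: Callable[[str], List[AttributePair]],
) -> Dict[int, List[AttributePair]]:
    """Run extraction only on sentences the classifier marks as containing indexes."""
    results: Dict[int, List[AttributePair]] = {}
    for position, sentence in enumerate(sentences):
        if has_index(sentence):
            results[position] = extract_pairs(sentence)
    return results


# Stand-ins for the two models (assumptions for illustration only).
sentences = ["A supplies B.", "Shipments reached 30% in 2021."]
pairs_by_sentence = extract_attribute_pairs(
    sentences,
    has_index=lambda s: any(ch.isdigit() for ch in s),
    extract_pairs=lambda s: [("shipments", ["30%", "2021"])],
)
```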
S500, for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and the one or more target attribute pairs corresponding to the initial second triple.
The matching and aligning the obtained one or more target attribute pairs with the initial second triple includes:
matching and aligning the target second attributes in the obtained target attribute pair with the corresponding initial second triple, wherein part of the target second attributes are aligned with the relationship of the corresponding initial second triple, and the values of the remaining target second attributes are matched and aligned with the head entity or the tail entity of the triple, so that a target second triple is obtained, the target second triple comprising the initial second triple and the attribute information corresponding to the initial second triple.
For a complex sentence sequence containing an index description, the initial second triple and its corresponding target attribute pairs are obtained through the entity relationship synchronous extraction model and the index attribute extraction model respectively, each target attribute pair comprising the target first attribute and the target second attribute. The target first attribute and the target second attribute are matched and aligned with the triple, with part of the target second attributes being matched with the head and tail entities of the triple. This completes the alignment between attributes and relationships, completes the information expressed by the initial second triple, and yields the target second triple.
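One simple way to picture the alignment is shown below. It is a sketch under a strong assumption: an attribute pair is attached to a triple when one of its constraint values exactly equals the head or tail entity of that triple. The text does not state the matching rule, and the dictionary keys and sample data are hypothetical.

```python
from typing import Dict, List


def align(triples: List[Dict], attribute_pairs: List[Dict]) -> List[Dict]:
    """Attach each attribute pair to the triples whose head or tail it mentions."""
    aligned = [dict(triple, attributes=[]) for triple in triples]
    for pair in attribute_pairs:
        for triple in aligned:
            # Assumed rule: exact string match between a constraint and an entity.
            if any(c in (triple["head"], triple["tail"]) for c in pair["constraints"]):
                triple["attributes"].append(pair)
    return aligned


initial_second_triples = [
    {"head": "Company A", "relation": "supplies", "tail": "Company B"},
    {"head": "Company A", "relation": "supplies", "tail": "Company C"},
]
target_attribute_pairs = [
    {"name": "supply share", "value": "30%", "constraints": ["Company B", "2021"]},
]
target_second_triples = align(initial_second_triples, target_attribute_pairs)
```

In this example the pair names "Company B", so only the first triple receives it; the second triple keeps an empty attribute list.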
S600, adding the target first triple and the target second triple to a target industry chain map.
A target industry chain map is constructed from the obtained target first triples and target second triples by adding them to the industry chain map. In addition to simple entity relationship definitions, the resulting target industry chain map covers the complex case of triple overlap and defines the relationship attributes needed to describe the large amount of index data in research report text.
The extracted target independent entities facilitate subsequent reasoning and evolution: when more research reports are added to jointly construct the target industry chain map, related features can be extracted from the newly added research reports more quickly.
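A minimal in-memory sketch of the target industry chain map is given below. The class `IndustryChainMap` and its methods are hypothetical; a production system would more likely use a graph database, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

EdgeKey = Tuple[str, str, str]  # (head entity, relation, tail entity)


class IndustryChainMap:
    """Entities as nodes; triples as edges that may carry attribute pairs."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: Dict[EdgeKey, List[tuple]] = defaultdict(list)

    def add_triple(self, head: str, relation: str, tail: str,
                   attributes: Iterable[tuple] = ()) -> None:
        # A first triple adds an edge with no attributes; a second triple
        # adds (or extends) an edge with its attribute pairs.
        self.nodes.update((head, tail))
        self.edges[(head, relation, tail)].extend(attributes)

    def add_independent_entity(self, name: str) -> None:
        # Independent entities are kept as unconnected nodes for later use.
        self.nodes.add(name)


chain_map = IndustryChainMap()
chain_map.add_triple("Company A", "supplies", "Company B")
chain_map.add_triple("Company A", "supplies", "Company B", [("supply share", "30%")])
chain_map.add_independent_entity("Company C")
```

Keying edges by the full triple means that the same relationship extracted from several reports is stored once, with its attribute pairs accumulated.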
This embodiment provides a method for automatically constructing an industry chain map from a research report, which can automatically convert long natural-language text containing industry chain knowledge into entities with attributes and relationship links in the map. The method uses an entity relationship synchronous extraction model that promotes bidirectional information interaction between the subtasks, alleviates the loss of precision and efficiency caused by error propagation and redundant computation, and copes effectively with nested entities and with complex extraction scenarios such as single-entity overlap and entity-pair overlap in the triple overlap problem. In addition, attribute extraction can mine the useful information contained in the large amount of index data in research report text. Further, by aligning the index attributes with the corresponding relationships, a target industry chain map composed of target triples with more complete information and of target independent entity information is finally obtained.
In summary, this embodiment provides a method for automatically constructing an industry chain map from a research report. After a research-report-oriented industry chain map schema containing target entity types, target relationship types, and target attribute types is loaded, an original research report set is acquired and each original research report in it is preprocessed to obtain a target text. An entity relationship synchronous extraction model then simultaneously extracts target triples and target independent entities from the sentence sequence, each target triple being a target first triple or an initial second triple. Next, an index attribute extraction model extracts target attribute pairs from the sentence sequences containing an index description, each target attribute pair comprising a target first attribute and a target second attribute. For a sentence sequence containing attribute pairs, the obtained one or more target attribute pairs are matched and aligned with the initial second triple to obtain a target second triple, which contains the initial second triple and the one or more target attribute pairs corresponding to it. Finally, the target first triples and the target second triples are added to the target industry chain map. The method can meet the need to construct large-scale industry chain maps automatically from research reports in complex situations, takes the complex index attributes of relationships into account, meets the extraction requirements for entity relationships and related attributes with a more accurate and efficient model, and reduces labor and time costs.
It should be understood that, although the steps in the flowcharts shown in the drawings of the present specification are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps of the present invention are not limited to being performed in the exact order disclosed and may be performed in other orders. Moreover, at least a portion of the steps of the present invention may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Example two
Based on the above embodiment, the present invention further provides an apparatus for automatically constructing an industry chain map from a research report, a schematic diagram of functional modules of the apparatus is shown in fig. 5, and the apparatus for automatically constructing an industry chain map from a research report includes:
an industry chain map schema loading module, configured to load a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple both have the structure head entity type-relation type-tail entity type, in the second triple the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is the name of the attribute pair, and the second attribute is the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair, as described in embodiment one;
a target text acquisition module, configured to acquire an original research report set and preprocess each original research report in the original research report set to obtain a target text, where the target text consists of a non-empty sentence sequence, as described in embodiment one;
an entity relationship synchronous extraction module, configured to extract target triples and target independent entities from the sentence sequence simultaneously by using an entity relationship synchronous extraction model, where each target triple is a target first triple or an initial second triple, as described in embodiment one;
an index attribute extraction module, configured to extract, by using an index attribute extraction model, target attribute pairs from the sentence sequences containing an index description, where each target attribute pair includes a target first attribute and a target second attribute, as described in embodiment one;
an attribute-relationship alignment module, configured to, for a sentence sequence including attribute pairs, match and align the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, where the target second triple includes the initial second triple and the one or more target attribute pairs corresponding to the initial second triple, as described in embodiment one;
a target industry chain map obtaining module, configured to add the target first triple and the target second triple to a target industry chain map, as described in embodiment one.
Example three
Based on the method for automatically constructing the industry chain map from the research report in the first embodiment, the invention also provides a terminal, a schematic block diagram of which is shown in fig. 6. The terminal comprises a memory 10 and a processor 20, wherein the memory 10 stores a program for automatically constructing an industry chain map from a research report, and the processor 20 executes the program to realize at least the following steps:
loading a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple both have the structure head entity type-relation type-tail entity type, in the second triple the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is the name of the attribute pair, and the second attribute is the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring an original research report set, and preprocessing each original research report in the original research report set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
extracting a target attribute pair in a sentence sequence containing index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain graph.
Wherein the preprocessing each original research report in the original research report set includes:
performing text recognition on the original research report by optical character recognition to obtain a first text that is convenient to read and write;
performing text cleaning on the first text to remove the noise characters in the first text and obtain a second text, wherein the noise characters are characters that contribute nothing to the actual description in the text;
and performing sentence segmentation on the second text to divide the second text into a non-empty sentence sequence and obtain the target text.
The entity relationship synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module codes the sentence sequence by adopting a general pre-training model based on a training set and a verification set of the labeled entity and the relationship information to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module judges whether the type of the target entity fragment belongs to the target entity type or not based on the characteristic information of the entity extraction subtask, if so, the target entity fragment is reserved, and if not, the target entity fragment is discarded;
the subtask target information prediction module is also used for judging the relationship between the entity pairs based on the characteristic information of the relationship prediction subtask to obtain the characteristic representation of the target relationship, judging whether the type of the target relationship belongs to the target relationship type or not according to the characteristic representation of the target relationship, if so, keeping the target relationship, and if not, discarding the target relationship;
and obtaining a target triple according to the target entity fragment and the target relation corresponding to the target entity fragment, wherein the target entity fragment without the corresponding relation is the target independent entity information.
The extracting of the target attribute pair in the sentence sequence containing the index description comprises the following steps:
judging whether the sentence sequence contains indexes, if so, extracting a target attribute pair in the sentence sequence by adopting the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
Wherein the matching and aligning the obtained one or more target attribute pairs with the initial second triple includes:
matching and aligning the target second attributes in the obtained target attribute pair with the corresponding initial second triple, wherein part of the target second attributes are aligned with the relationship of the corresponding initial second triple, and the values of the remaining target second attributes are matched and aligned with the head entity or the tail entity of the triple, so that a target second triple is obtained, the target second triple comprising the initial second triple and the attribute information corresponding to the initial second triple.
The list of the target entity types is dynamically adjusted according to the target text and the target task scene requirements;
the list of the target relation types is dynamically adjusted according to the target entity types and the target texts;
and the list of the target attribute types is dynamically adjusted according to the target attribute types and the target text.
Example four
The present invention also provides a storage medium storing one or more programs executable by one or more processors to implement the steps of the method for automatically constructing an industry chain graph from a research report according to the above-described embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for automatically constructing an industry chain map from a research report, comprising:
loading a research-report-oriented industry chain map schema containing a target entity type, a target relationship type and a target attribute type, wherein entity type information to be extracted and triple type information to be extracted are predefined in the industry chain map schema, the triple is a first triple or a second triple, the first triple and the second triple both have the structure head entity type-relation type-tail entity type, in the second triple the relation type further comprises at least one attribute pair corresponding to the relation type, the attribute pair is a simple attribute pair or a complex attribute pair, the simple attribute pair comprises a first attribute and a second attribute, the first attribute is a name of the attribute pair, and the second attribute is a value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise a value of the complex attribute pair and at least one constraint on the complex attribute pair;
acquiring an original research report set, and preprocessing each original research report in the original research report set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
simultaneously extracting a target triple and a target independent entity in the sentence sequence by adopting an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
the entity relation synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module codes the sentence sequence by adopting a general pre-training model based on a training set and a verification set of the labeled entity and the relationship information to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module determines, based on the feature information of the entity extraction subtask, whether the type of the target entity fragment belongs to the target entity types; if so, the target entity fragment is retained, and if not, the target entity fragment is discarded;
the subtask target information prediction module is further used for determining the relationship between entity pairs based on the feature information of the relation prediction subtask to obtain a feature representation of the target relationship, and for determining, according to that feature representation, whether the type of the target relationship belongs to the target relationship types; if so, the target relationship is retained, and if not, the target relationship is discarded;
obtaining a target triple according to the target entity fragment and the target relationship corresponding to the target entity fragment, wherein a target entity fragment without a corresponding relationship constitutes target independent entity information;
extracting a target attribute pair in a sentence sequence containing index description by adopting an index attribute extraction model, wherein the target attribute pair comprises a target first attribute and a target second attribute;
for a sentence sequence containing attribute pairs, matching and aligning the obtained one or more target attribute pairs with the initial second triple to obtain a target second triple, wherein the target second triple contains the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
adding the target first triple and the target second triple to a target industry chain graph.
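The type filtering and triple assembly performed after prediction can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed model: candidate entity fragments and relationships are assumed to have been predicted already, and the names `EntitySpan`, `Relation`, `assemble`, and the example type lists are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    text: str
    type: str


@dataclass(frozen=True)
class Relation:
    head: EntitySpan
    type: str
    tail: EntitySpan


def assemble(spans, relations, entity_types, relation_types):
    """Keep fragments and relationships whose types are in the target lists,
    then split the result into triples and independent entities."""
    kept_spans = [s for s in spans if s.type in entity_types]
    kept = set(kept_spans)
    triples = [
        (r.head, r.type, r.tail)
        for r in relations
        if r.type in relation_types and r.head in kept and r.tail in kept
    ]
    # A retained fragment that appears in no triple is an independent entity.
    linked = {e for head, _, tail in triples for e in (head, tail)}
    independent = [s for s in kept_spans if s not in linked]
    return triples, independent


# Hypothetical example data (not taken from the patent).
a = EntitySpan("lithium carbonate", "Product")
b = EntitySpan("power battery", "Product")
c = EntitySpan("Company X", "Company")
d = EntitySpan("2021", "Date")
triples, independent = assemble(
    spans=[a, b, c, d],
    relations=[Relation(a, "upstream_of", b), Relation(c, "founded_in", d)],
    entity_types={"Product", "Company"},
    relation_types={"upstream_of"},
)
```

Here the `Date` fragment and the `founded_in` relationship are discarded because their types are not in the target lists, and `Company X` is kept as an independent entity because no retained relationship refers to it.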
2. The method for automatically constructing an industry chain graph from research reports of claim 1, wherein the preprocessing of each original research report in the original research report file set comprises:
performing text recognition on the original research report using optical character recognition to obtain a first text that can be read and written;
performing text cleaning on the first text to remove noise characters and obtain a second text, wherein the noise characters in the first text are characters that carry no actual descriptive content;
and performing sentence segmentation on the second text, dividing the second text into a non-empty sentence sequence to obtain the target text.
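The cleaning and sentence segmentation steps can be sketched as follows. This is a minimal illustration, assuming the optical character recognition step has already produced the first text; the noise patterns (standalone page-number lines, control characters) and the function names `clean_text`, `split_sentences`, and `preprocess` are assumptions, not rules stated in the claims.

```python
import re

# Assumed noise: lines holding only a page number, and control characters.
_PAGE_NO = re.compile(r"^\s*\d+\s*$")
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffd]")


def clean_text(first_text):
    """Remove noise characters from the OCR output (the 'second text')."""
    lines = []
    for line in first_text.splitlines():
        if _PAGE_NO.match(line):
            continue  # drop standalone page-number lines
        line = _CONTROL.sub("", line).strip()
        if line:
            lines.append(line)
    return " ".join(lines)


def split_sentences(second_text):
    """Split on Chinese terminators, '!' and '?', and on '.' followed by whitespace."""
    marked = re.sub(r"([。！？；!?])", r"\1\n", second_text)
    marked = re.sub(r"(\.)\s+", r"\1\n", marked)  # '3.5' is left intact
    return [s.strip() for s in marked.split("\n") if s.strip()]


def preprocess(first_text):
    """Return the target text as a non-empty sentence sequence."""
    sentences = split_sentences(clean_text(first_text))
    if not sentences:
        raise ValueError("target text must be a non-empty sentence sequence")
    return sentences


first_text = (
    "Lithium demand grew 30% in 2021.\n"
    "3\n"
    "Upstream suppliers include Company A!  Prices rose 3.5 percent.\n"
)
sentences = preprocess(first_text)
```

Raising an error on an empty result is one way to enforce the requirement that the sentence sequence be non-empty; a pipeline could equally skip such reports.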
3. The method for automatically constructing an industry chain graph from research reports according to claim 1, wherein the extracting target attribute pairs in a sentence sequence containing index descriptions comprises:
judging whether the sentence sequence contains indexes, if so, extracting a target attribute pair in the sentence sequence by adopting the index attribute extraction model;
the target attribute pair is a simple attribute pair or a complex attribute pair.
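The gating step and the two kinds of attribute pair can be sketched as follows. This is only an illustration: the keyword list `INDEX_TERMS` stands in for whatever index detection the method actually uses, `stub_model` is a placeholder for the index attribute extraction model, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical index vocabulary; the claims do not specify how indexes are detected.
INDEX_TERMS = ("market share", "capacity", "revenue", "growth rate")


@dataclass
class AttributePair:
    first: str    # target first attribute: the attribute name
    second: list  # target second attribute(s): the value, then any constraints

    @property
    def kind(self):
        # A complex pair carries a value plus at least one constraint.
        return "complex" if len(self.second) > 1 else "simple"


def contains_index(sentence):
    lowered = sentence.lower()
    return any(term in lowered for term in INDEX_TERMS)


def extract_attribute_pairs(sentences, model):
    """Run the extraction model only on sentences that mention an index."""
    results = {}
    for sentence in sentences:
        if contains_index(sentence):
            results[sentence] = model(sentence)
    return results


def stub_model(sentence):
    # Placeholder output; a trained model would predict this from the sentence.
    return [AttributePair("market share", ["30%", "2021"])]


sentences = [
    "Company A held a 30% market share in 2021.",
    "Company A is based in Shenzhen.",
]
pairs = extract_attribute_pairs(sentences, stub_model)
```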
4. The method for automatically constructing an industry chain graph from a research report of claim 1, wherein the matching and aligning of the obtained one or more target attribute pairs with the initial second triple comprises:
matching and aligning the target second attributes in the obtained target attribute pairs with the corresponding initial second triple, wherein some of the target second attributes are aligned with the relationship of the corresponding initial second triple, and the values of the remaining target second attributes are matched and aligned with the head entity or the tail entity of that triple, thereby obtaining a target second triple, wherein the target second triple comprises the initial second triple and the attribute information corresponding to the initial second triple.
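One possible reading of this alignment step is sketched below. The exact-string comparison against the head and tail entities, the output layout, and the name `align` are all assumptions made for illustration; the claim does not specify how matching is performed.

```python
def align(initial_second_triple, attribute_pairs):
    """Attach attribute pairs to an initial second triple.

    Second attributes whose value equals the head or tail entity are linked to
    that entity; all others are attached to the relationship.
    """
    head, relation, tail = initial_second_triple
    relation_attributes = []
    entity_links = []
    for name, second_attributes in attribute_pairs:
        for value in second_attributes:
            if value == head:
                entity_links.append((name, value, "head"))
            elif value == tail:
                entity_links.append((name, value, "tail"))
            else:
                relation_attributes.append((name, value))
    return {
        "triple": initial_second_triple,
        "relation_attributes": relation_attributes,
        "entity_links": entity_links,
    }


# Hypothetical example data.
target_second_triple = align(
    ("Company A", "supplies", "Company B"),
    [("supply share", ["30%", "Company B", "2021"])],
)
```

The returned dictionary keeps the initial second triple together with its attribute information, which mirrors the structure of the target second triple described in the claim.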
5. The method for automatically constructing an industry chain graph from a research report of claim 1, wherein the list of target entity types is dynamically adjusted according to the target text and the target task scenario requirements;
the list of the target relation types is dynamically adjusted according to the target entity types and the target texts;
and the list of the target attribute types is dynamically adjusted according to the target attribute types and the target text.
6. An apparatus for automatically constructing an industry chain graph from a research report, the apparatus comprising:
an industry chain graph schema loading module, configured to load a research-report-oriented industry chain graph schema containing a target entity type, a target relation type and a target attribute type, wherein the entity type information to be extracted and the triple type information to be extracted are predefined in the industry chain graph schema; the triple is a first triple or a second triple, and both the first triple and the second triple have the structure head entity type-relation type-tail entity type; in the second triple, the relation type further comprises at least one attribute pair corresponding to the relation type; the attribute pair is a simple attribute pair or a complex attribute pair; the simple attribute pair comprises a first attribute and a second attribute, the first attribute being the name of the attribute pair and the second attribute being the value of the attribute pair; the complex attribute pair comprises a first attribute and a plurality of second attributes, and the second attributes in the complex attribute pair comprise the value of the complex attribute pair and at least one constraint on the complex attribute pair;
a target text acquisition module, configured to acquire an original research report file set and to preprocess each original research report in the original research report file set to obtain a target text, wherein the target text consists of a non-empty sentence sequence;
an entity relationship synchronous extraction module, configured to simultaneously extract a target triple and a target independent entity from the sentence sequence using an entity relation synchronous extraction model, wherein the target triple is a target first triple or an initial second triple;
the entity relation synchronous extraction model comprises a sentence sequence coding module, a subtask feature selection module and a subtask target information prediction module;
the sentence sequence coding module encodes the sentence sequence using a general pre-trained model, based on a training set and a validation set annotated with entity and relationship information, to obtain a target vector;
the subtask feature selection module is used for acquiring feature information corresponding to an entity extraction subtask and a relation prediction subtask respectively, and the entity extraction subtask is used for extracting a target entity fragment in the sentence sequence according to the target vector;
the subtask target information prediction module determines, based on the feature information of the entity extraction subtask, whether the type of the target entity fragment belongs to the target entity types; if so, the target entity fragment is retained, and if not, the target entity fragment is discarded;
the subtask target information prediction module is further used for determining the relationship between entity pairs based on the feature information of the relation prediction subtask to obtain a feature representation of the target relationship, and for determining, according to that feature representation, whether the type of the target relationship belongs to the target relationship types; if so, the target relationship is retained, and if not, the target relationship is discarded;
obtaining a target triple according to the target entity fragment and the target relationship corresponding to the target entity fragment, wherein a target entity fragment without a corresponding relationship constitutes target independent entity information;
an index attribute extraction module, configured to use an index attribute extraction model to extract a target attribute pair from a sentence sequence containing an index description, wherein the target attribute pair comprises a target first attribute and a target second attribute;
the attribute-relationship alignment module is configured to, for a sentence sequence including an attribute pair, perform matching alignment on the obtained one or more target attribute pairs and the initial second triple to obtain a target second triple, where the target second triple includes the initial second triple and one or more target attribute pairs corresponding to the initial second triple;
a target industry chain graph obtaining module, configured to add the target first triple and the target second triple to a target industry chain graph.
7. A terminal, characterized in that the terminal comprises: a processor and a storage medium communicatively coupled to the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to invoke the instructions in the storage medium to perform the steps of the method for automatically constructing an industry chain graph from a research report of any of claims 1-5.
8. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method for automatically constructing an industry chain graph from a research report of any of claims 1-5.
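The schema loaded by the first module of the apparatus can be pictured as a small data structure. The sketch below is a hypothetical representation, assuming the schema is held in memory as Python objects; the class names `AttributeSpec`, `TripleSpec`, `Schema` and the example types are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    name: str               # first attribute: the attribute name
    constraints: tuple = ()  # extra second attributes that constrain the value

    @property
    def kind(self):
        return "complex" if self.constraints else "simple"


@dataclass(frozen=True)
class TripleSpec:
    head_type: str
    relation_type: str
    tail_type: str
    attributes: tuple = ()  # attribute pairs attached to the relation type

    @property
    def kind(self):
        # A second triple is one whose relation type carries attribute pairs.
        return "second" if self.attributes else "first"


@dataclass
class Schema:
    entity_types: set
    triple_specs: list

    def find(self, head_type, relation_type, tail_type):
        """Return the matching triple type, or None if it is not predefined."""
        key = (head_type, relation_type, tail_type)
        for spec in self.triple_specs:
            if (spec.head_type, spec.relation_type, spec.tail_type) == key:
                return spec
        return None


# Hypothetical schema contents.
schema = Schema(
    entity_types={"Company", "Product"},
    triple_specs=[
        TripleSpec("Product", "upstream_of", "Product"),
        TripleSpec(
            "Company", "supplies", "Company",
            attributes=(AttributeSpec("supply share", constraints=("year",)),),
        ),
    ],
)
```

A lookup such as `schema.find("Company", "supplies", "Company")` then tells the extraction stage whether a predicted triple is a first triple or an initial second triple that still needs attribute alignment.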
CN202211325252.2A 2022-10-27 2022-10-27 Method for automatically constructing industry chain map from research report and related equipment Active CN115391569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211325252.2A CN115391569B (en) 2022-10-27 2022-10-27 Method for automatically constructing industry chain map from research report and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211325252.2A CN115391569B (en) 2022-10-27 2022-10-27 Method for automatically constructing industry chain map from research report and related equipment

Publications (2)

Publication Number Publication Date
CN115391569A (en) 2022-11-25
CN115391569B (en) 2023-03-24

Family

ID=84129424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211325252.2A Active CN115391569B (en) 2022-10-27 2022-10-27 Method for automatically constructing industry chain map from research report and related equipment

Country Status (1)

Country Link
CN (1) CN115391569B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363816A (en) * 2018-03-21 2018-08-03 北京理工大学 Open entity relation extraction method based on sentence justice structural model
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255034A (en) * 2018-08-08 2019-01-22 数据地平线(广州)科技有限公司 A kind of domain knowledge map construction method based on industrial chain
CN111967761B (en) * 2020-08-14 2024-04-02 国网数字科技控股有限公司 Knowledge graph-based monitoring and early warning method and device and electronic equipment
CN113051365A (en) * 2020-12-10 2021-06-29 深圳证券信息有限公司 Industrial chain map construction method and related equipment
CN112883197B (en) * 2021-02-08 2023-02-07 广东电网有限责任公司广州供电局 Knowledge graph construction method and system for closed switch equipment
CN112860916B (en) * 2021-03-09 2022-09-16 齐鲁工业大学 Movie-television-oriented multi-level knowledge map generation method
CN113139068B (en) * 2021-05-10 2023-05-09 内蒙古工业大学 Knowledge graph construction method and device, electronic equipment and storage medium
CN114219089B (en) * 2021-11-11 2022-07-22 山东人才发展集团信息技术有限公司 Construction method and equipment of new-generation information technology industry knowledge graph
CN115017322A (en) * 2022-02-17 2022-09-06 甘肃农业大学 Ontology-based potato industry chain knowledge graph construction method

Also Published As

Publication number Publication date
CN115391569A (en) 2022-11-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant