CN116501895B - Typhoon time sequence knowledge graph construction method and terminal - Google Patents

Publication number
CN116501895B
Authority
CN
China
Prior art keywords
relation
typhoon
entity
time sequence
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310701261.5A
Other languages
Chinese (zh)
Other versions
CN116501895A (en
Inventor
戴诗琪
林永清
单森华
洪水洁
徐能通
陈新伟
梁礼燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Istrong Technology Co ltd
Original Assignee
Istrong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Istrong Technology Co ltd filed Critical Istrong Technology Co ltd
Priority to CN202310701261.5A priority Critical patent/CN116501895B/en
Publication of CN116501895A publication Critical patent/CN116501895A/en
Application granted granted Critical
Publication of CN116501895B publication Critical patent/CN116501895B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a terminal for constructing a typhoon time sequence knowledge graph. A trained cascade neural network model performs joint entity and relation extraction on the typhoon text data to be constructed to obtain the corresponding time sequence triples, and the typhoon time sequence knowledge graph is constructed from those time sequence triples.

Description

Typhoon time sequence knowledge graph construction method and terminal
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a typhoon time sequence knowledge graph construction method and a typhoon time sequence knowledge graph construction terminal.
Background
A knowledge graph is a graph-based model for representing and organizing knowledge. It represents the entities, attributes, and relationships of the objective world as nodes and edges, forming a relational network that machines can understand and process. Google first proposed the knowledge graph in 2012, mainly for use in its search engine, to help people find target information more accurately and quickly. In recent years, downstream tasks such as knowledge-based question answering, intelligent recommendation, and semantic understanding have emerged, and knowledge graphs are now widely applied in fields such as medicine, e-commerce, and finance. However, as internet information grows exponentially, automatically extracting entities and the relationships between them from unstructured information has become critical for both general-purpose and industry-specific knowledge graphs, because it is a precondition for building one.
Typhoon observation data, forecast data, impact data and the like are of great value for studying the occurrence patterns, trends, and impacts of typhoons. However, as typhoons are observed and recorded in ever finer detail, typhoon-related data grow ever larger, and the records are scattered across different data sources and documents. Meanwhile, because historical observation and recording conditions were limited, existing historical typhoon data are of uneven quality and contain a large amount of unstructured and semi-structured data. In addition, the attributes of a typhoon are strongly tied to time: a typhoon may make landfall at several locations, so each landfall location must be associated with its landfall time; the characteristics and impact of a typhoon keep changing over time; and the timeliness of updates to typhoon knowledge data is also important. Constructing a typhoon knowledge graph therefore has significant value, for example for studying the occurrence patterns, trends, and impacts of typhoons, but it is also considerably difficult and challenging.
Extracting entities and the relationships between them from unstructured text is key to building a large-scale knowledge graph. However, existing relation extraction methods have the following problems:
(1) Some models perform entity recognition first and relation extraction afterwards, which suffers from error propagation: missed or incorrect entity recognition causes relation extraction errors. This approach also wastes processing time and computing resources and is inefficient.
(2) In practice, the entities in a sentence or paragraph may stand in one-to-many or many-to-many relationships, and methods that treat relations as discrete labels have difficulty extracting overlapping triples.
(3) Many real-world relationships carry a time attribute and cannot be described accurately by triples alone. For example, the many-to-many relationships in typhoon knowledge can be distinguished by adding a time qualifier.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a typhoon time sequence knowledge graph construction method and terminal that improve the construction efficiency and reliability of the typhoon time sequence knowledge graph.
In order to solve the technical problems, the invention adopts a technical scheme that:
a typhoon time sequence knowledge graph construction method comprises the following steps:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
and constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples.
In order to solve the technical problems, the invention adopts another technical scheme that:
a terminal for constructing a typhoon time sequence knowledge graph, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the following steps when executing the computer program:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
and constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples.
The beneficial effects of the invention are as follows. A trained cascade neural network model performs joint entity and relation extraction and time extraction on the typhoon text data to be constructed, yielding the corresponding time sequence triples, from which the typhoon time sequence knowledge graph is constructed. Compared with the traditional pipeline method, which performs entity recognition first and relation classification afterwards, joint extraction with the cascade neural network model removes the dependence between the components and increases the information interaction between the two subtasks, which avoids information loss and error propagation and improves entity and relation extraction. The cascade network structure can extract one-to-many and many-to-many relations as well as time sequence triples that carry a time element, which keeps the description given by the final typhoon time sequence knowledge graph accurate. The construction efficiency and reliability of the typhoon time sequence knowledge graph are thereby improved.
Drawings
FIG. 1 is a flow chart of steps of a method for constructing a typhoon time sequence knowledge graph according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a terminal for constructing a typhoon time sequence knowledge graph according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a cascade neural network model after training in the method for constructing a typhoon time sequence knowledge graph according to the embodiment of the invention.
Detailed Description
In order to describe the technical contents, the achieved objects and effects of the present invention in detail, the following description will be made with reference to the embodiments in conjunction with the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing a typhoon time sequence knowledge graph, including the steps of:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
and constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples.
From the above description, the beneficial effects of the invention are as follows. A trained cascade neural network model performs joint entity and relation extraction and time extraction on the typhoon text data to be constructed, yielding the corresponding time sequence triples, from which the typhoon time sequence knowledge graph is constructed. Compared with the traditional pipeline method, which performs entity recognition first and relation classification afterwards, joint extraction with the cascade neural network model removes the dependence between the components and increases the information interaction between the two subtasks, which avoids information loss and error propagation and improves entity and relation extraction. The cascade network structure can extract one-to-many and many-to-many relations as well as time sequence triples that carry a time element, which keeps the description given by the final typhoon time sequence knowledge graph accurate. The construction efficiency and reliability of the typhoon time sequence knowledge graph are thereby improved.
Further, before collecting typhoon text data to be constructed, the method includes:
collecting typhoon text data for training;
performing time sequence triplet labeling on the typhoon text data for training to obtain labeling data;
constructing a cascade neural network model;
and training the cascade neural network model on the typhoon text data for training and the labeling data, with maximizing the likelihood function as the objective, to obtain the trained cascade neural network model.
As described above, the trained cascade neural network model has strong contextual understanding and can better capture the context and meaning within sentences. It supports transfer learning across tasks and domains, does not require a large amount of labeling data, and extracts entities and relations accurately, which improves the efficiency and accuracy of constructing the typhoon time sequence knowledge graph.
Further, the likelihood function is:
wherein $D$ represents the labeling data; $x_j$ represents a sentence in the dataset and $j$ the sentence number; $T_j$ represents the set of labeled time sequence triples in sentence $x_j$; $s$ represents the head entity, $r$ the relation, $o$ the tail entity, and $t$ the time; $(s, r, o, t) \in T_j$ represents a time sequence triple in the set of labeled time sequence triples of sentence $x_j$; $p((s, r, o, t) \mid x_j)$ represents the probability that the time sequence triple $(s, r, o, t)$ is present in sentence $x_j$; $r \in T_j \mid s$ represents that relation $r$ belongs to the subset of the labeled time sequence triples of sentence $x_j$ that have $s$ as head entity; $s \in T_j$ represents a head entity in the time sequence triple set; $p(s \mid x_j)$ represents the probability that $s$ is a head entity in sentence $x_j$; $p_r(o \mid s, x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, tail entity $o$ exists under relation $r$; $r \in R \setminus T_j \mid s$ represents that relation $r$ belongs to the other relations unrelated to head entity $s$; $p_r(o_\varnothing \mid s, x_j)$ represents the probability that the tail entities of all such other relations in sentence $x_j$ are empty; $o_\varnothing$ represents an empty tail entity; $t \in T_j \mid s, r, o$ represents a time in the subset of time sequence triples with $s$ as head entity, $r$ as relation, and $o$ as tail entity; and $p(t \mid s, r, o, x_j)$ represents the probability that time $t$ exists in sentence $x_j$ given the triple formed by head entity $s$, relation $r$, and tail entity $o$.
As described above, the likelihood function measures how well the model parameters explain the labeling data. Training the model to maximize the likelihood function makes the trained cascade neural network model fit the given data better, so that the model extracts triples more reliably.
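To make the training objective concrete, the sketch below computes the log-likelihood of one binary tagger over a sentence. It assumes, as is common for start/end taggers but not stated in the text, that each token label is an independent Bernoulli variable; `tag_log_likelihood` and the sample probabilities are hypothetical.

```python
import math
from typing import Sequence


def tag_log_likelihood(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Log-likelihood of 0/1 token labels under per-token probabilities.

    Assumption: labels are independent Bernoulli variables, so the
    likelihood is a product and its logarithm is a sum.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += math.log(p + eps) if y == 1 else math.log(1.0 - p + eps)
    return total


labels = [1, 0, 0]                      # only the first token starts an entity
confident = tag_log_likelihood([0.9, 0.1, 0.1], labels)
hesitant = tag_log_likelihood([0.6, 0.4, 0.4], labels)
```

Maximizing this quantity is equivalent to minimizing the binary cross-entropy loss, which is why a tagger that assigns sharper correct probabilities scores higher.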
Further, the constructing the cascade neural network model includes:
taking a Chinese BERT pre-trained model as the coding layer;
constructing a head entity identification layer, a relation and tail entity joint identification layer and a time identification layer, and generating a decoding layer according to the head entity identification layer, the relation and tail entity joint identification layer and the time identification layer;
and constructing a cascade neural network model according to the coding layer and the decoding layer.
As can be seen from the above description, the Chinese BERT pre-trained model is a language representation model based on the bidirectional Transformer (a neural network model based on a self-attention mechanism) and represents each word in the light of its context. The decoding layer generated from the head entity identification layer, the relation and tail entity joint identification layer, and the time identification layer extracts the elements of time sequence triples efficiently and accurately, which improves the overall model's entity and relation extraction.
Further, the step of performing entity and relationship joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed includes:
coding the typhoon text data to be constructed by using the coding layer in the trained cascade neural network model to obtain a coding result;
decoding the coding result by using the head entity identification layer in the trained cascade neural network model to obtain all possible head entities, and obtaining a candidate head entity set from all possible head entities by using a nearest matching principle based on a linear layer and a sigmoid activation function;
performing identification with the relation and tail entity joint identification layer in the trained cascade neural network model, based on the coding result and the candidate head entity set, to obtain a relation and a tail entity set corresponding to the relation;
using the time identification layer in the trained cascade neural network model to identify based on the coding result, the candidate head entity set, the relation and the tail entity set corresponding to the relation to obtain a time element;
and generating a time sequence triplet corresponding to the typhoon text data to be constructed according to the candidate head entity set, the time element and the relation and the tail entity set corresponding to the relation.
As described above, a three-level cascade network structure based on the Transformer extracts triples with time relations from unstructured text to construct the typhoon time sequence knowledge graph. Typhoon knowledge scattered across different data sources is integrated into a unified knowledge base, which makes it easier for users to search and use and improves the accessibility of typhoon information.
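One way to picture the final construction step is a small in-memory store that merges time sequence triples from several sources. This is a minimal sketch, not the patent's storage design: the class `TemporalKnowledgeGraph`, its methods, and the sample facts are hypothetical, and a production system would more likely use a graph database.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, Optional[str]]  # (relation, tail, time)


class TemporalKnowledgeGraph:
    """Adjacency-list graph whose edges carry an optional time element."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Edge]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str,
            time: Optional[str] = None) -> None:
        edge = (relation, tail, time)
        if edge not in self._edges[head]:  # merge duplicates across sources
            self._edges[head].append(edge)

    def query(self, head: str, relation: str) -> List[Tuple[str, Optional[str]]]:
        return [(tail, time)
                for rel, tail, time in self._edges.get(head, [])
                if rel == relation]


# Two hypothetical sources that partly overlap.
source_a = [("Typhoon A", "landfall location", "City X", "09-14 20:30")]
source_b = [("Typhoon A", "landfall location", "City X", "09-14 20:30"),
            ("Typhoon A", "landfall location", "City Y", "09-15 00:30")]

kg = TemporalKnowledgeGraph()
for fact in source_a + source_b:
    kg.add(*fact)
landfalls = kg.query("Typhoon A", "landfall location")
```

Storing the time on the edge keeps repeated head-relation pairs distinguishable while still collapsing exact duplicates.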
Referring to fig. 2, a terminal for constructing a typhoon time sequence knowledge graph includes a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the following steps when executing the computer program:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
and constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples.
From the above description, the beneficial effects of the invention are as follows. A trained cascade neural network model performs joint entity and relation extraction and time extraction on the typhoon text data to be constructed, yielding the corresponding time sequence triples, from which the typhoon time sequence knowledge graph is constructed. Compared with the traditional pipeline method, which performs entity recognition first and relation classification afterwards, joint extraction with the cascade neural network model removes the dependence between the components and increases the information interaction between the two subtasks, which avoids information loss and error propagation and improves entity and relation extraction. The cascade network structure can extract one-to-many and many-to-many relations as well as time sequence triples that carry a time element, which keeps the description given by the final typhoon time sequence knowledge graph accurate. The construction efficiency and reliability of the typhoon time sequence knowledge graph are thereby improved.
Further, before collecting typhoon text data to be constructed, the method includes:
collecting typhoon text data for training;
performing time sequence triplet labeling on the typhoon text data for training to obtain labeling data;
constructing a cascade neural network model;
and training the cascade neural network model on the typhoon text data for training and the labeling data, with maximizing the likelihood function as the objective, to obtain the trained cascade neural network model.
As described above, the trained cascade neural network model has strong contextual understanding and can better capture the context and meaning within sentences. It supports transfer learning across tasks and domains, does not require a large amount of labeling data, and extracts entities and relations accurately, which improves the efficiency and accuracy of constructing the typhoon time sequence knowledge graph.
Further, the likelihood function is:
wherein $D$ represents the labeling data; $x_j$ represents a sentence in the dataset and $j$ the sentence number; $T_j$ represents the set of labeled time sequence triples in sentence $x_j$; $s$ represents the head entity, $r$ the relation, $o$ the tail entity, and $t$ the time; $(s, r, o, t) \in T_j$ represents a time sequence triple in the set of labeled time sequence triples of sentence $x_j$; $p((s, r, o, t) \mid x_j)$ represents the probability that the time sequence triple $(s, r, o, t)$ is present in sentence $x_j$; $r \in T_j \mid s$ represents that relation $r$ belongs to the subset of the labeled time sequence triples of sentence $x_j$ that have $s$ as head entity; $s \in T_j$ represents a head entity in the time sequence triple set; $p(s \mid x_j)$ represents the probability that $s$ is a head entity in sentence $x_j$; $p_r(o \mid s, x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, tail entity $o$ exists under relation $r$; $r \in R \setminus T_j \mid s$ represents that relation $r$ belongs to the other relations unrelated to head entity $s$; $p_r(o_\varnothing \mid s, x_j)$ represents the probability that the tail entities of all such other relations in sentence $x_j$ are empty; $o_\varnothing$ represents an empty tail entity; $t \in T_j \mid s, r, o$ represents a time in the subset of time sequence triples with $s$ as head entity, $r$ as relation, and $o$ as tail entity; and $p(t \mid s, r, o, x_j)$ represents the probability that time $t$ exists in sentence $x_j$ given the triple formed by head entity $s$, relation $r$, and tail entity $o$.
As described above, the likelihood function measures how well the model parameters explain the labeling data. Training the model to maximize the likelihood function makes the trained cascade neural network model fit the given data better, so that the model extracts triples more reliably.
Further, the constructing the cascade neural network model includes:
taking a Chinese BERT pre-trained model as the coding layer;
constructing a head entity identification layer, a relation and tail entity joint identification layer and a time identification layer, and generating a decoding layer according to the head entity identification layer, the relation and tail entity joint identification layer and the time identification layer;
and constructing a cascade neural network model according to the coding layer and the decoding layer.
As can be seen from the above description, the Chinese BERT pre-trained model is a language representation model based on the bidirectional Transformer (a neural network model based on a self-attention mechanism) and represents each word in the light of its context. The decoding layer generated from the head entity identification layer, the relation and tail entity joint identification layer, and the time identification layer extracts the elements of time sequence triples efficiently and accurately, which improves the overall model's entity and relation extraction.
Further, the step of performing entity and relationship joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed includes:
coding the typhoon text data to be constructed by using the coding layer in the trained cascade neural network model to obtain a coding result;
decoding the coding result by using the head entity identification layer in the trained cascade neural network model to obtain all possible head entities, and obtaining a candidate head entity set from all possible head entities by using a nearest matching principle based on a linear layer and a sigmoid activation function;
performing identification with the relation and tail entity joint identification layer in the trained cascade neural network model, based on the coding result and the candidate head entity set, to obtain a relation and a tail entity set corresponding to the relation;
using the time identification layer in the trained cascade neural network model to identify based on the coding result, the candidate head entity set, the relation and the tail entity set corresponding to the relation to obtain a time element;
and generating a time sequence triplet corresponding to the typhoon text data to be constructed according to the candidate head entity set, the time element and the relation and the tail entity set corresponding to the relation.
As described above, a three-level cascade network structure based on the Transformer extracts triples with time relations from unstructured text to construct the typhoon time sequence knowledge graph. Typhoon knowledge scattered across different data sources is integrated into a unified knowledge base, which makes it easier for users to search and use and improves the accessibility of typhoon information.
The method and the terminal for constructing the typhoon time sequence knowledge graph can be applied to scenarios that require the study of typhoon-related data, and are described below through specific embodiments:
Example 1
Referring to fig. 1 and 3, a method for constructing a typhoon time sequence knowledge graph according to the present embodiment includes the steps of:
S1, collecting typhoon text data for training, which specifically comprises the following steps:
S11, crawling initial typhoon text data from official data sources such as Baidu Baike, the Central Meteorological Observatory website, and the Typhoon Yearbook.
S12, cleaning the initial typhoon text data, namely removing redundant spaces, redundant line breaks, literature citation marks, and other irrelevant characters, to obtain cleaned initial typhoon text data.
S13, segmenting the cleaned initial typhoon text data by paragraph to obtain the typhoon text data for training.
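Steps S12 and S13 can be sketched with regular expressions. This is an illustration under assumptions rather than the patent's cleaning rules: the citation pattern `[1]` / `[2-3]`, the treatment of every remaining line break as a paragraph boundary, and the function names `clean_text` and `split_paragraphs` are all hypothetical.

```python
import re
from typing import List

# Assumed shape of literature citation marks, e.g. "[1]" or "[2-3]".
CITATION_MARK = re.compile(r"\[\d+(?:[-,]\d+)*\]")


def clean_text(raw: str) -> str:
    """Remove citation marks, redundant spaces, and redundant line breaks."""
    text = CITATION_MARK.sub("", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t\u3000]+", " ", text)   # collapse runs of spaces
    text = re.sub(r" *\n *", "\n", text)        # trim spaces around breaks
    text = re.sub(r"\n{2,}", "\n", text)        # collapse repeated breaks
    return text.strip()


def split_paragraphs(cleaned: str) -> List[str]:
    """Split cleaned text into paragraphs, one per remaining line break."""
    return [p for p in cleaned.split("\n") if p]


raw = "Typhoon  A made landfall.[1]\r\n\r\n\r\n  It weakened   later.[2-3]  \n"
paragraphs = split_paragraphs(clean_text(raw))
```

Each resulting paragraph then serves as one training text for the labeling step.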
S2, performing time sequence triplet labeling on the typhoon text data for training to obtain labeling data;
For example, the typhoon text data for training is:
Typhoon "Muifa" was determined by the Central Meteorological Observatory to have made landfall in Zhoushan, Zhejiang at strong typhoon intensity at about 20:30 on September 14, to have made a second landfall in Fengxian, Shanghai at about 00:30 on September 15, to have made a third landfall in Qingdao, Shandong at tropical storm intensity at 00:00 on September 16, and to have made a fourth landfall in Dalian, Liaoning at tropical storm intensity at about 12:40 on September 16.
The time sequence triples include: (Typhoon "Muifa", landfall location, Zhoushan, Zhejiang, 20:30 on September 14), (Typhoon "Muifa", landfall location, Fengxian, Shanghai, 00:30 on September 15), (Typhoon "Muifa", landfall location, Qingdao, Shandong, 00:00 on September 16), and (Typhoon "Muifa", landfall location, Dalian, Liaoning, 12:40 on September 16). Labeling the typhoon text data for training with these time sequence triples yields the labeling data;
This example contains a one-to-many relationship in which the relation is the same for every pair. Plain triples alone cannot describe it accurately, whereas time sequence triples resolve exactly this problem.
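The four landfalls in the example can be written as records with an explicit time field. The sketch below is illustrative only: the English renderings of the typhoon name, the relation label, and the place names are assumptions, the `TimedTriple` type and `latest_tail` helper are hypothetical, and times are stored as (month, day, hour, minute) because the text gives no year.

```python
from typing import List, NamedTuple, Optional, Tuple

TimeKey = Tuple[int, int, int, int]  # (month, day, hour, minute)


class TimedTriple(NamedTuple):
    head: str
    relation: str
    tail: str
    time: TimeKey


MUIFA = 'Typhoon "Muifa"'
LANDFALL = "landfall location"

facts: List[TimedTriple] = [
    TimedTriple(MUIFA, LANDFALL, "Zhoushan, Zhejiang", (9, 14, 20, 30)),
    TimedTriple(MUIFA, LANDFALL, "Fengxian, Shanghai", (9, 15, 0, 30)),
    TimedTriple(MUIFA, LANDFALL, "Qingdao, Shandong", (9, 16, 0, 0)),
    TimedTriple(MUIFA, LANDFALL, "Dalian, Liaoning", (9, 16, 12, 40)),
]


def latest_tail(items: List[TimedTriple], head: str, relation: str,
                as_of: TimeKey) -> Optional[str]:
    """Tail of the most recent matching fact at or before `as_of`."""
    matching = [f for f in items
                if f.head == head and f.relation == relation and f.time <= as_of]
    return max(matching, key=lambda f: f.time).tail if matching else None


# Without the time field all four facts share one (head, relation) key.
plain_keys = {(f.head, f.relation) for f in facts}
```

With the time element attached, a question such as "where had the typhoon most recently made landfall by noon on September 15" has a single answer, which plain triples cannot provide.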
S3, constructing a cascade neural network model, which specifically comprises the following steps:
S31, taking a Chinese BERT pre-trained model (the bert_base_Chinese model) as the coding layer.
S32, constructing a head entity identification layer, a relation and tail entity joint identification layer and a time identification layer, and generating a decoding layer according to the head entity identification layer, the relation and tail entity joint identification layer and the time identification layer;
The decoding layer is a three-level cascade structure: it first identifies all possible head entities, then identifies whether a corresponding tail entity exists under each given relation category, and finally identifies whether a time element exists for that relation and tail entity.
S33, constructing a cascade neural network model according to the coding layer and the decoding layer.
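The control flow of the three-level cascade can be sketched independently of any neural network. In this hedged illustration the three taggers are passed in as plain callables; `cascade_decode`, the toy taggers, and the English sample sentence are hypothetical stand-ins for the trained layers.

```python
from typing import Callable, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]
HeadTagger = Callable[[str], List[str]]
TailTagger = Callable[[str, str, str], List[str]]
TimeTagger = Callable[[str, str, str, str], Optional[str]]


def cascade_decode(sentence: str, relations: List[str], tag_heads: HeadTagger,
                   tag_tails: TailTagger, tag_time: TimeTagger) -> List[Quad]:
    """Level 1: heads. Level 2: tails per relation. Level 3: time per pair."""
    results: List[Quad] = []
    for head in tag_heads(sentence):
        for relation in relations:          # every relation type is probed
            for tail in tag_tails(sentence, head, relation):
                time = tag_time(sentence, head, relation, tail)
                results.append((head, relation, tail, time))
    return results


# Toy rule-based taggers standing in for the trained identification layers.
def toy_heads(sentence: str) -> List[str]:
    return ["Mangkhut"] if "Mangkhut" in sentence else []


def toy_tails(sentence: str, head: str, relation: str) -> List[str]:
    found = relation == "landfall location" and "northern Philippines" in sentence
    return ["northern Philippines"] if found else []


def toy_time(sentence: str, head: str, relation: str, tail: str) -> Optional[str]:
    return "September 15" if "September 15" in sentence else None


sentence = "Mangkhut made landfall in the northern Philippines on September 15"
quads = cascade_decode(sentence, ["landfall location", "landfall intensity"],
                       toy_heads, toy_tails, toy_time)
```

A relation for which the tail tagger returns nothing simply contributes no output, which is how the cascade expresses "this relation does not hold for this head entity".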
S4, training the cascade neural network model on the typhoon text data for training and the labeling data, with maximizing the likelihood function as the objective, to obtain a trained cascade neural network model;
wherein the likelihood function is:
wherein $D$ represents the labeling data; $x_j$ represents a sentence in the dataset and $j$ the sentence number; $T_j$ represents the set of labeled time sequence triples in sentence $x_j$; $s$ represents the head entity, $r$ the relation, $o$ the tail entity, and $t$ the time; $(s, r, o, t) \in T_j$ represents a time sequence triple in the set of labeled time sequence triples of sentence $x_j$; $p((s, r, o, t) \mid x_j)$ represents the probability that the time sequence triple $(s, r, o, t)$ is present in sentence $x_j$; $r \in T_j \mid s$ represents that relation $r$ belongs to the subset of the labeled time sequence triples of sentence $x_j$ that have $s$ as head entity; $s \in T_j$ represents a head entity in the time sequence triple set; $p(s \mid x_j)$ represents the probability that $s$ is a head entity in sentence $x_j$; $p_r(o \mid s, x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, tail entity $o$ exists under relation $r$; $r \in R \setminus T_j \mid s$ represents that relation $r$ belongs to the other relations unrelated to head entity $s$; $p_r(o_\varnothing \mid s, x_j)$ represents the probability that the tail entities of all such other relations in sentence $x_j$ are empty; $o_\varnothing$ represents an empty tail entity; $t \in T_j \mid s, r, o$ represents a time in the subset of time sequence triples with $s$ as head entity, $r$ as relation, and $o$ as tail entity; $p(t \mid s, r, o, x_j)$ represents the probability that time $t$ exists in sentence $x_j$ given the triple formed by head entity $s$, relation $r$, and tail entity $o$; $(r, o, t) \in T_j \mid s$ represents a relation, tail entity, and time in the subset of time sequence triples with $s$ as head entity; $p((r, o, t) \mid s, x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, relation $r$, tail entity $o$, and time $t$ exist; $(r, o) \in T_j \mid s$ represents a pair of relation and tail entity in the subset of time sequence triples with $s$ as head entity; and $p((r, o) \mid s, x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, relation $r$ and tail entity $o$ exist.
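Read together, the symbol definitions above suggest the following factorization of the likelihood. This is a reconstruction under stated assumptions rather than the patent's own typeset formula: the name $L$ for the likelihood and the symbol $R$, taken here to be the set of all relation types, are introduced for illustration.

```latex
\begin{aligned}
L &= \prod_{j=1}^{|D|} \Bigg[ \prod_{(s,r,o,t) \in T_j} p\big((s,r,o,t) \mid x_j\big) \Bigg] \\
  &= \prod_{j=1}^{|D|} \Bigg[ \prod_{s \in T_j} p(s \mid x_j)
     \prod_{(r,o,t) \in T_j \mid s} p\big((r,o,t) \mid s, x_j\big) \Bigg] \\
  &= \prod_{j=1}^{|D|} \Bigg[ \prod_{s \in T_j} p(s \mid x_j)
     \prod_{(r,o) \in T_j \mid s} \Big[ p\big((r,o) \mid s, x_j\big)
     \prod_{t \in T_j \mid s,r,o} p(t \mid s, r, o, x_j) \Big] \Bigg] \\
  &= \prod_{j=1}^{|D|} \Bigg[ \prod_{s \in T_j} p(s \mid x_j)
     \prod_{r \in T_j \mid s} \Big[ p_r(o \mid s, x_j)
     \prod_{t \in T_j \mid s,r,o} p(t \mid s, r, o, x_j) \Big]
     \prod_{r \in R \setminus T_j \mid s} p_r(o_\varnothing \mid s, x_j) \Bigg]
\end{aligned}
```

Each line applies the chain rule once more: first the head entity is split off, then the time element, and finally the relations that hold for $s$ are separated from those whose tail entity is empty. The three factor groups in the last line correspond to the three levels of the decoding layer.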
S5, collecting typhoon text data to be constructed;
S6, performing joint entity and relation extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain the time sequence triples corresponding to the typhoon text data to be constructed, specifically comprising the following steps:
S61, coding the typhoon text data to be constructed by using the coding layer in the trained cascade neural network model to obtain a coding result;
As shown in fig. 3, the typhoon text data to be constructed, "Mangkhut made landfall in the northern Philippines on September 15", is input into the BERT coding layer of the trained cascade neural network model and coded to obtain a coding result.
S62, decoding the coding result by using the head entity identification layer in the trained cascade neural network model to obtain all possible head entities, and obtaining a candidate head entity set from all possible head entities by using the nearest matching principle, based on a linear layer and a sigmoid activation function;
specifically, a linear layer and a sigmoid activation function are used to judge, for each token (a unit produced by word segmentation), whether it is the start token or the end token of a head entity, and the identified starts and ends are then paired by using the nearest matching principle to obtain the candidate head entity set.
As shown in fig. 3, the coding result is passed to the head entity identification layer, which comprises two classifiers that respectively identify whether each token is the start token or the end token of a head entity; two entities are then obtained according to the nearest matching principle: "Mangkhut" and "northern Philippines".
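The nearest matching step can be sketched as follows. This is a minimal illustration, assuming the two classifiers have already produced per-token start and end probabilities; the 0.5 threshold, the function name and the probability values are assumptions, and English word tokens stand in for the Chinese tokens of the example sentence (the typhoon name is rendered as "Mangkhut").

```python
def pair_spans(start_probs, end_probs, threshold=0.5):
    """Pair each predicted start token with the nearest end token at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        # Candidate ends are those that do not precede the start.
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, min(candidates)))
    return spans

tokens = ["Mangkhut", "made", "landfall", "in", "the", "northern", "Philippines"]
# Hypothetical sigmoid outputs of the start and end classifiers.
start_probs = [0.97, 0.01, 0.02, 0.01, 0.03, 0.91, 0.10]
end_probs = [0.95, 0.02, 0.01, 0.01, 0.02, 0.05, 0.93]

heads = [" ".join(tokens[s:e + 1]) for s, e in pair_spans(start_probs, end_probs)]
```

A start with no end at or after it is simply dropped, so an isolated start prediction does not produce an entity.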
S63, performing identification with the relation and tail entity joint identification layer in the trained cascade neural network model, based on the coding result and the candidate head entity set, to obtain the relations and the tail entity set corresponding to each relation;
specifically, for each given relation type, the coding result is combined with the candidate head entity set, the start and end of the tail entity are identified with a binary classification structure, and the identified starts and ends are paired according to the nearest matching principle to obtain the tail entity under that relation; if no tail entity exists, the relation does not hold; the relations and the tail entity set corresponding to each relation are thereby obtained.
As shown in fig. 3, the coding result and the candidate head entity set are combined and passed to the relation and tail entity joint identification layer; for each given relation, two classifiers (i.e., a binary classification structure) identify whether a start token and an end token of a tail entity exist under that relation, and matching according to the nearest matching principle yields the tail entity "northern Philippines" under the relation "landfall location".
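The per-relation tail identification can be sketched in the same way. The example below is illustrative only: it assumes that, for one fixed head entity, the layer outputs a pair of start and end probability lists per relation type. The relation names `landfall_location` and `affected_area`, the threshold and all numbers are hypothetical.

```python
def pair_spans(start_probs, end_probs, threshold=0.5):
    """Nearest matching: pair each start with the closest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    return [(s, min(e for e in ends if e >= s))
            for s in starts if any(e >= s for e in ends)]

def extract_tails(tokens, per_relation_probs, threshold=0.5):
    """Return {relation: [tail text, ...]}, omitting relations with no tail."""
    result = {}
    for relation, (start_probs, end_probs) in per_relation_probs.items():
        spans = pair_spans(start_probs, end_probs, threshold)
        if spans:  # no tail entity means the relation does not hold
            result[relation] = [" ".join(tokens[s:e + 1]) for s, e in spans]
    return result

tokens = ["Mangkhut", "made", "landfall", "in", "the", "northern", "Philippines"]
# Hypothetical classifier outputs for the head entity "Mangkhut".
per_relation_probs = {
    "landfall_location": ([0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.1],
                          [0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.92]),
    "affected_area": ([0.01] * 7, [0.02] * 7),
}
relation_tails = extract_tails(tokens, per_relation_probs)
```

Because a relation is kept only when at least one tail span is found, the same head entity can yield several relations and several tails per relation, which is how one-to-many cases are covered.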
S64, performing identification with the time identification layer in the trained cascade neural network model, based on the coding result, the candidate head entity set, the relations and the tail entity set corresponding to each relation, to obtain a time element;
specifically, the coding result, the candidate head entity set, the relations and the tail entity set corresponding to each relation are combined to identify all possible time elements: a binary classification structure judges, for each token, whether it is the start token or the end token of a time element, and the identified starts and ends are then paired by using the nearest matching principle to obtain the time element; if no time element exists, the time element of the triple is empty.
As shown in fig. 3, the combined results of the previous three layers are passed to the time identification layer, where two classifiers identify whether a start token and an end token of a time exist for the corresponding triple combination, and the time "September 15" is obtained according to the nearest matching principle.
S65, generating a time sequence triplet corresponding to the typhoon text data to be constructed according to the candidate head entity set, the time element, the relation and the tail entity set corresponding to the relation.
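The assembly in step S65 might look like the sketch below. It assumes the earlier layers have produced a head entity, a mapping from relations to tail entities, and a lookup of time elements keyed by (head, relation, tail); the type name `TimedTriple`, the function `assemble` and the relation name are hypothetical.

```python
from typing import NamedTuple, Optional

class TimedTriple(NamedTuple):
    head: str
    relation: str
    tail: str
    time: Optional[str]  # None when no time element was identified

def assemble(head, relation_tails, times):
    """Combine one head entity with its relations, tails and time elements."""
    triples = []
    for relation, tails in relation_tails.items():
        for tail in tails:
            # A missing key means the time element of this triple is empty.
            time = times.get((head, relation, tail))
            triples.append(TimedTriple(head, relation, tail, time))
    return triples

relation_tails = {"landfall_location": ["northern Philippines"]}
times = {("Mangkhut", "landfall_location", "northern Philippines"): "September 15"}
triples = assemble("Mangkhut", relation_tails, times)
```

With an empty `times` mapping the same call returns the triple with `time` set to `None`, matching the case in which the time element of the triple is empty.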
S7, constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples;
specifically, because RDF (Resource Description Framework) stores data in triple form, the time sequence triples cannot be stored directly in RDF format, so their format is converted: triples without a time element are imported with their original structure, and for triples with a time element the time is converted into an attribute of the corresponding relation; the typhoon time sequence knowledge graph is then constructed and generated.
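The format conversion can be illustrated with a property-graph style edge record. This is only a sketch under assumptions: the dictionary layout and the key names `source`, `type`, `target` and `properties` are hypothetical, and the patent does not specify a storage schema.

```python
def to_edge(head, relation, tail, time=None):
    """Convert a time sequence triple into an edge whose time is a relation attribute."""
    edge = {"source": head, "type": relation, "target": tail, "properties": {}}
    if time is not None:
        # Time is stored on the relation itself, not as a fourth element.
        edge["properties"]["time"] = time
    return edge

timed = to_edge("Mangkhut", "landfall_location", "northern Philippines", "September 15")
untimed = to_edge("Mangkhut", "landfall_location", "northern Philippines")
```

Triples without a time element keep their original three-part structure and simply carry an empty attribute set.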
The typhoon time sequence knowledge graph construction method encodes text with a pre-trained language model, which has strong context understanding, can better capture the context and meaning of sentences, supports transfer learning across tasks and domains, does not require a large amount of labeled data, and can extract entities and relations accurately. Compared with the prior-art pipeline method, which performs entity identification first and relation classification afterwards, joint extraction of entities and relations removes the dependence between separate components, increases information interaction between the two subtasks, avoids information loss and error propagation, helps improve the quality of entity and relation extraction, and can also improve processing efficiency. The three-level cascade network structure can extract not only one-to-many and many-to-many relations but also time sequence triples carrying time elements, which can be applied to event sequence analysis, timeline generation, temporal reasoning and the like, and therefore suits more application scenarios. In summary, the construction efficiency and reliability of the typhoon time sequence knowledge graph are improved.
Example two
Referring to fig. 2, a terminal for constructing a typhoon timing knowledge graph according to the present embodiment includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps in the method for constructing a typhoon timing knowledge graph according to the first embodiment are implemented when the processor executes the computer program.
In summary, the method and the terminal for constructing the typhoon time sequence knowledge graph provided by the invention use the trained cascade neural network model to perform joint entity and relation extraction on the typhoon text data to be constructed, obtain the corresponding time sequence triples, and construct the typhoon time sequence knowledge graph from those triples. A Transformer-based three-level cascade network structure extracts triples with temporal relations from unstructured text, completing the construction of the typhoon time sequence knowledge graph; typhoon knowledge scattered across different data sources is integrated into a unified knowledge base that is convenient for users to search and use, which improves the accessibility of typhoon information.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (2)

1. The construction method of the typhoon time sequence knowledge graph is characterized by comprising the following steps:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples;
before the typhoon text data to be constructed are collected, the method comprises the following steps:
collecting typhoon text data for training;
performing time sequence triplet labeling on the typhoon text data for training to obtain labeling data;
constructing a cascade neural network model;
training the cascade neural network model by using the typhoon text data for training and the labeling data with the maximum likelihood function as a target to obtain a trained cascade neural network model;
the likelihood function is:

$$
\prod_{j=1}^{|D|}\Big[\prod_{(s,r,o,t)\in T_j} p\big((s,r,o,t)\mid x_j\big)\Big]
= \prod_{j=1}^{|D|}\Big[\prod_{s\in T_j} p(s\mid x_j)\prod_{r\in T_j|s}\Big(p_r(o\mid s,x_j)\prod_{t\in T_j|s,r,o} p(t\mid s,r,o,x_j)\Big)\prod_{r\in R\setminus T_j|s} p_r(o_\varnothing\mid s,x_j)\Big]
$$

wherein $D$ represents the labeling data; $x_j$ represents a sentence in the data set; $j$ represents the sentence index; $T_j$ represents the set of labeled time sequence triples in sentence $x_j$; $s$ represents the head entity, $r$ the relation, $o$ the tail entity and $t$ the time; $(s,r,o,t)\in T_j$ represents a time sequence triple belonging to the set of labeled time sequence triples of sentence $x_j$; $p((s,r,o,t)\mid x_j)$ represents the probability that the time sequence triple $(s,r,o,t)$ exists in sentence $x_j$; $r\in T_j|s$ represents that relation $r$ belongs to the time sequence triples that have $s$ as head entity in the labeled set of sentence $x_j$; $s\in T_j$ represents a head entity in the time sequence triple set; $p(s\mid x_j)$ represents the probability that $s$ is a head entity in sentence $x_j$; $p_r(o\mid s,x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, tail entity $o$ exists under relation $r$; $r\in R\setminus T_j|s$ represents that relation $r$ belongs to the other relations, within the full relation set $R$, that are not associated with head entity $s$; $p_r(o_\varnothing\mid s,x_j)$ represents the probability that the tail entities of all those other relations are empty in sentence $x_j$; $o_\varnothing$ represents an empty tail entity; $t\in T_j|s,r,o$ represents a time in the time sequence triple set with $s$ as head entity, $r$ as relation and $o$ as tail entity; and $p(t\mid s,r,o,x_j)$ represents the probability that time $t$ exists in sentence $x_j$ given head entity $s$, relation $r$ and tail entity $o$;
the constructing the cascade neural network model comprises the following steps:
taking a Chinese bert pre-training model as a coding layer;
constructing a head entity identification layer, a relation and tail entity joint identification layer and a time identification layer, and generating a decoding layer according to the head entity identification layer, the relation and tail entity joint identification layer and the time identification layer;
constructing a cascade neural network model according to the coding layer and the decoding layer;
the step of performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed comprises the following steps:
coding the typhoon text data to be constructed by using the coding layer in the trained cascade neural network model to obtain a coding result;
decoding the coding result by using the head entity identification layer in the trained cascade neural network model to obtain all possible head entities, and obtaining a candidate head entity set from all possible head entities by using a nearest matching principle based on a linear layer and a sigmoid activation function;
performing identification with the relation and tail entity joint identification layer in the trained cascade neural network model, based on the coding result and the candidate head entity set, to obtain a relation and a tail entity set corresponding to the relation;
using the time identification layer in the trained cascade neural network model to identify based on the coding result, the candidate head entity set, the relation and the tail entity set corresponding to the relation to obtain a time element;
and generating a time sequence triplet corresponding to the typhoon text data to be constructed according to the candidate head entity set, the time element and the relation and the tail entity set corresponding to the relation.
2. A terminal for constructing a typhoon time sequence knowledge graph, comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the following steps when executing the computer program:
collecting typhoon text data to be constructed;
performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed;
constructing and generating a typhoon time sequence knowledge graph according to the time sequence triples;
before the typhoon text data to be constructed are collected, the method comprises the following steps:
collecting typhoon text data for training;
performing time sequence triplet labeling on the typhoon text data for training to obtain labeling data;
constructing a cascade neural network model;
training the cascade neural network model by using the typhoon text data for training and the labeling data with the maximum likelihood function as a target to obtain a trained cascade neural network model;
the likelihood function is:

$$
\prod_{j=1}^{|D|}\Big[\prod_{(s,r,o,t)\in T_j} p\big((s,r,o,t)\mid x_j\big)\Big]
= \prod_{j=1}^{|D|}\Big[\prod_{s\in T_j} p(s\mid x_j)\prod_{r\in T_j|s}\Big(p_r(o\mid s,x_j)\prod_{t\in T_j|s,r,o} p(t\mid s,r,o,x_j)\Big)\prod_{r\in R\setminus T_j|s} p_r(o_\varnothing\mid s,x_j)\Big]
$$

wherein $D$ represents the labeling data; $x_j$ represents a sentence in the data set; $j$ represents the sentence index; $T_j$ represents the set of labeled time sequence triples in sentence $x_j$; $s$ represents the head entity, $r$ the relation, $o$ the tail entity and $t$ the time; $(s,r,o,t)\in T_j$ represents a time sequence triple belonging to the set of labeled time sequence triples of sentence $x_j$; $p((s,r,o,t)\mid x_j)$ represents the probability that the time sequence triple $(s,r,o,t)$ exists in sentence $x_j$; $r\in T_j|s$ represents that relation $r$ belongs to the time sequence triples that have $s$ as head entity in the labeled set of sentence $x_j$; $s\in T_j$ represents a head entity in the time sequence triple set; $p(s\mid x_j)$ represents the probability that $s$ is a head entity in sentence $x_j$; $p_r(o\mid s,x_j)$ represents the probability that, with $s$ as head entity in sentence $x_j$, tail entity $o$ exists under relation $r$; $r\in R\setminus T_j|s$ represents that relation $r$ belongs to the other relations, within the full relation set $R$, that are not associated with head entity $s$; $p_r(o_\varnothing\mid s,x_j)$ represents the probability that the tail entities of all those other relations are empty in sentence $x_j$; $o_\varnothing$ represents an empty tail entity; $t\in T_j|s,r,o$ represents a time in the time sequence triple set with $s$ as head entity, $r$ as relation and $o$ as tail entity; and $p(t\mid s,r,o,x_j)$ represents the probability that time $t$ exists in sentence $x_j$ given head entity $s$, relation $r$ and tail entity $o$;
the constructing the cascade neural network model comprises the following steps:
taking a Chinese bert pre-training model as a coding layer;
constructing a head entity identification layer, a relation and tail entity joint identification layer and a time identification layer, and generating a decoding layer according to the head entity identification layer, the relation and tail entity joint identification layer and the time identification layer;
constructing a cascade neural network model according to the coding layer and the decoding layer;
the step of performing entity and relation joint extraction and time extraction on the typhoon text data to be constructed by using the trained cascade neural network model to obtain a time sequence triplet corresponding to the typhoon text data to be constructed comprises the following steps:
coding the typhoon text data to be constructed by using the coding layer in the trained cascade neural network model to obtain a coding result;
decoding the coding result by using the head entity identification layer in the trained cascade neural network model to obtain all possible head entities, and obtaining a candidate head entity set from all possible head entities by using a nearest matching principle based on a linear layer and a sigmoid activation function;
performing identification with the relation and tail entity joint identification layer in the trained cascade neural network model, based on the coding result and the candidate head entity set, to obtain a relation and a tail entity set corresponding to the relation;
using the time identification layer in the trained cascade neural network model to identify based on the coding result, the candidate head entity set, the relation and the tail entity set corresponding to the relation to obtain a time element;
and generating a time sequence triplet corresponding to the typhoon text data to be constructed according to the candidate head entity set, the time element and the relation and the tail entity set corresponding to the relation.
CN202310701261.5A 2023-06-14 2023-06-14 Typhoon time sequence knowledge graph construction method and terminal Active CN116501895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310701261.5A CN116501895B (en) 2023-06-14 2023-06-14 Typhoon time sequence knowledge graph construction method and terminal

Publications (2)

Publication Number Publication Date
CN116501895A CN116501895A (en) 2023-07-28
CN116501895B true CN116501895B (en) 2023-09-01

Family

ID=87330475

Country Status (1)

Country Link
CN (1) CN116501895B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914096A (en) * 2020-07-06 2020-11-10 同济大学 Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph
CN113360671A (en) * 2021-06-16 2021-09-07 浙江工业大学 Medical insurance medical document auditing method and system based on knowledge graph
CN114580639A (en) * 2022-02-23 2022-06-03 中南民族大学 Knowledge graph construction method based on automatic extraction and alignment of government affair triples
CN115114442A (en) * 2022-04-07 2022-09-27 腾讯科技(深圳)有限公司 Knowledge graph updating method and device, storage medium and electronic equipment
CN116226408A (en) * 2023-03-27 2023-06-06 中国科学院空天信息创新研究院 Agricultural product growth environment knowledge graph construction method and device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168745B (en) * 2021-11-30 2022-08-09 大连理工大学 Knowledge graph construction method for production process of ethylene oxide derivative

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
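A survey of research on deep-learning-based joint extraction of entities and relations; Zhang Yangsen et al.; Acta Electronica Sinica; Vol. 51 (No. 4); 1093-1116 *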

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant