CN114238524A - Satellite frequency-orbit data information extraction method based on enhanced sample model - Google Patents

Info

Publication number
CN114238524A
Authority
CN
China
Prior art keywords
sentence
entity
data
satellite
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111570758.5A
Other languages
Chinese (zh)
Other versions
CN114238524B (en)
Inventor
何元智
李志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Original Assignee
Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Network Engineering Institute of Systems Engineering Academy of Military Sciences
Priority to CN202111570758.5A
Publication of CN114238524A
Application granted
Publication of CN114238524B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a satellite frequency-orbit data information extraction method based on an enhanced sample model, comprising the following steps: defining the entity types and the relationship set; in the structured frequency-orbit data relation extraction stage, selecting the required data from a database, matching the related entities, and representing the entity pairs and their relationships as triples; in the unstructured frequency-orbit data relation extraction stage, labeling the word-segmented text data, training an entity recognition model, and completing entity recognition; enhancing the sample model by using the structured data to generate text that supplements the training sentence library, which addresses the long-tail problem, and by using reinforcement learning to separate correctly labeled sentences from noise sentences in each sentence bag; and training a piecewise (segmented) convolutional neural network model to complete the classification and extraction of entity relationships. The invention makes full use of both the structured data and the noise sentences, can efficiently complete knowledge extraction from satellite frequency-orbit data, and enriches the satellite frequency-orbit knowledge base. The method offers a flexible scheme and high relation extraction accuracy.

Description

Satellite frequency-orbit data information extraction method based on enhanced sample model
Technical Field
The invention relates to the technical field of satellite data processing, and in particular to a satellite frequency-orbit data information extraction method based on an enhanced sample model.
Background
With the rapid development of aerospace technology, countries around the world have launched a large number of satellites into outer space, generating a large volume of frequency-orbit resource records that contain much useful information. Although traditional database storage records a large amount of structured data, that data alone is not complete enough to build a full frequency-orbit knowledge graph. Building a frequency-orbit knowledge graph model makes the relationships within the data visible and lays a technical foundation for mining and using the data. A great deal of useful unstructured satellite frequency-orbit data is available on the Internet, often in very large volume, and it can supplement the structured data.
Identifying the required entities and their relationships in unstructured frequency-orbit data is a basic problem that must be solved to build a complete frequency-orbit knowledge graph. Building such a graph involves two key tasks: named entity recognition and relation extraction. Depending on whether the two tasks are modeled jointly, the methods are divided into joint extraction methods and pipeline methods.
Joint extraction models the two tasks in a single model. It can exploit latent associations between the two tasks and reduce the propagation of accumulated errors. However, because both tasks share the same feature representation, the model's learning may be confused, and strengthening the interaction between the entity model and the relation model remains difficult. The pipeline method first recognizes named entities and then extracts relations; it is highly flexible, and the entity model and the relation model can each use an independent data set.
For named entity recognition, existing methods fall into rule-based, statistical-model-based, and neural-network-based methods. Rule-based methods require a large number of entity recognition rules, which are matched against input strings to recognize named entities; the rules must be written by experts, which limits their applicability. Statistical-model-based methods treat named entity recognition as a sequence labeling problem but still require manually defined features, and the chosen features strongly affect the final recognition result. Neural-network-based methods avoid manual feature definition and, because neural networks have stronger feature representation capability, can fully learn the features of an entity's context.
For satellite frequency-orbit entity relation extraction, existing methods are divided into template-based methods, supervised-learning-based methods, and methods based on remote (distant) supervision. Template-based methods require a heavy manual template-building workload when the data scale is large. Supervised-learning-based methods need a large amount of manually labeled data, which is their main limiting factor. Remote-supervision-based methods avoid manually labeling large amounts of data but introduce noise. Existing research mainly selects the sentences that carry a correct label, or identifies and removes the noise, and does not consider the value of the noise for model training. Remote supervision also suffers from the long-tail problem. As a result, relation extraction models trained with existing methods are biased and limited in accuracy.
Chinese patent CN108304911 proposes a knowledge extraction method, system, and device based on a memory neural network, which can be used for knowledge extraction tasks with predefined relation types and can automatically extract structured information matching those types from unstructured Internet text. Chinese patent CN109359297 proposes a relation extraction method and system that uses the hierarchical structure of relations to build a hierarchical attention mechanism, improving the stability of the relation extraction model. Both patents can extract knowledge, but both use only unstructured data and cannot make full use of the information in existing structured data. The first scheme requires a large amount of manual labeling; the second adopts remote (distant) supervision but does not fully consider the role of noisy data, so the accuracy of its knowledge extraction is limited.
Disclosure of Invention
Traditional satellite databases hold incomplete records, and their data volume is insufficient to build a frequency-orbit knowledge graph model. To address this, the invention discloses a satellite frequency-orbit data information extraction method based on an enhanced sample model, which extracts useful knowledge from unstructured data to supplement the structured data.
The method comprises the following specific steps:
S1, defining the entity types of the satellite frequency-orbit data according to the requirements of the satellite frequency-orbit data recognition and extraction task, wherein the six defined entity types are: satellite name, satellite network ID, governing department, orbit position, orbit type, and frequency band; an entity is a satellite communication subject in the satellite frequency-orbit data;
S2, defining the set of relationships between entities on the basis of the entity types defined in step S1, wherein each relationship is represented by a triple, specifically: (satellite name, belonging to, satellite network ID), (satellite name, managed, governing department), (satellite name, orbital, orbit position), (orbit type, suborbital, satellite name), (satellite name, usage, frequency band), and (governing department, owning network, satellite network ID); all of these relationships together constitute the set of relationships between entities;
S3, acquiring structured satellite frequency-orbit data and extracting knowledge from it, which comprises data preprocessing, entity recognition, and entity relation extraction;
S31, data preprocessing: acquiring structured satellite frequency-orbit data from the SRS database of the International Telecommunication Union according to the defined entity types, selecting the data corresponding to each entity type, and storing it in an entity-relationship table;
S32, entity recognition on the structured satellite frequency-orbit data: matching the corresponding data in the entity-relationship table according to the defined entity types and their relationships, and selecting the related entities;
S33, entity relation extraction: for each pair of entities selected in step S32, taking the relationship defined in step S2 for the entity types (defined in step S1) to which the two entities belong as the relationship between them;
S34, building a triple from every two entities and their relationship, and collecting these triples into the triple set T;
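Steps S1 to S34 can be illustrated with a minimal Python sketch. The field names, relation labels, and sample rows below are hypothetical and are not taken from the ITU SRS database; the tail of the third relation is assumed to be the orbit position.

```python
# Hypothetical relation schema: (head entity type, relation, tail entity type).
# Field names and relation labels are illustrative assumptions.
RELATION_SCHEMA = [
    ("satellite_name", "belongs_to", "network_id"),
    ("satellite_name", "managed_by", "governing_department"),
    ("satellite_name", "orbital", "orbit_position"),  # tail type is an assumption
    ("orbit_type", "suborbital", "satellite_name"),
    ("satellite_name", "uses", "frequency_band"),
    ("governing_department", "owns_network", "network_id"),
]


def build_triples(rows):
    """Return the set T of (head, relation, tail) triples found in the rows."""
    triples = set()
    for row in rows:
        for head_type, relation, tail_type in RELATION_SCHEMA:
            head, tail = row.get(head_type), row.get(tail_type)
            if head and tail:  # skip relations whose fields are missing
                triples.add((head, relation, tail))
    return triples


# Made-up rows standing in for the entity-relationship table of step S31.
rows = [
    {"satellite_name": "SAT-A", "network_id": "NET-001",
     "governing_department": "ADM-X", "orbit_position": "110.5E",
     "orbit_type": "GSO", "frequency_band": "Ku"},
    {"satellite_name": "SAT-B", "network_id": "NET-002",
     "governing_department": "ADM-X", "orbit_type": "GSO"},
]
T = build_triples(rows)
```

Storing T as a set makes the later membership check against newly extracted triples a constant-time lookup.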
S4, extracting knowledge from unstructured satellite frequency-orbit data: obtaining text data of unstructured satellite frequency-orbit data from the Internet by data crawling; performing word segmentation on the text data to obtain word segmentation sequences; labeling the word segmentation sequences with the BIO labeling method and using the labeled text as the training sentence library; fine-tuning a BERT pre-trained model to form a BERT-based named entity recognition model; training that model with the training sentence library; and using the trained BERT-based named entity recognition model to classify each word in the word segmentation sequence;
S41, crawling the unstructured satellite frequency-orbit data and segmenting it into words; assigning the six labels A1, A2, A3, A4, A5, and A6 to the defined entity types, namely satellite name, satellite network ID, governing department, orbit position, orbit type, and frequency band, respectively; and labeling the segmented sentences with the BIO labeling method to obtain the training sentence library;
S42, fine-tuning the sequence labeling layer of the BERT pre-trained model, namely applying a fully connected layer to the hidden-layer representation of BERT, to form the BERT-based named entity recognition model; training the BERT-based named entity recognition model with the training sentence library; after the input vector v of the input layer passes through multiple encoding layers, the semantic representation h of a sentence in the unstructured satellite frequency-orbit data is obtained;
S43, the sequence labeling layer outputs, for each time step of the word segmentation sequence, the probability distribution $P_t$ over the BIO labels:
$$P_t = \mathrm{softmax}(h_t W_0 + b_0), \quad t = 1, 2, \ldots, N$$
where $h_t$ denotes the component of $h$ at time step $t$, $W_0$ denotes the weight matrix of the fully connected layer, $b_0$ denotes the bias of the fully connected layer, and softmax denotes the activation function;
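As a rough illustration of the labeling-layer formula, the following pure-Python sketch computes the label distribution for a single token and its cross-entropy loss. The dimensions and the values of h_t, W0, and b0 are made up for the example; a real model would take h_t from the BERT encoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_distribution(h_t, W0, b0):
    """P_t = softmax(h_t W0 + b0); h_t has length d, W0 is d x K, b0 has length K."""
    logits = [
        sum(h_t[d] * W0[d][k] for d in range(len(h_t))) + b0[k]
        for k in range(len(b0))
    ]
    return softmax(logits)


def cross_entropy(P_t, gold_index):
    """Cross-entropy loss of one token given the index of its gold label."""
    return -math.log(P_t[gold_index])


# Toy values (assumptions): hidden size 2, three labels.
h_t = [1.0, 0.0]
W0 = [[2.0, 0.0, 0.0],
      [0.0, 1.0, 0.0]]
b0 = [0.0, 0.0, 0.0]
P = label_distribution(h_t, W0, b0)
```

The loss is smallest when the gold label is the one that receives the highest probability, which is what training with cross-entropy pushes the layer toward.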
S44, after the probability distribution at each time step of the word segmentation sequence is obtained, training the parameters of the BERT-based named entity recognition model with a cross-entropy loss function to improve its classification and prediction capability; using the trained model to classify each word in the word segmentation sequence and obtain its BIO label; and recovering the complete entity names and types from the BIO labels, which completes entity recognition for the satellite frequency-orbit data.
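Recovering complete entities from the predicted BIO labels can be sketched as follows. The tag strings (B-A1, I-A6, O) and the space-joined output are assumptions made for this example; for Chinese text the tokens would typically be joined without spaces.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO tags such as B-A1, I-A1, O."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # a new entity starts, so close the open one
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)  # continue the open entity
        else:  # "O" or an inconsistent I- tag closes the open entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


# Toy sentence; A1 = satellite name, A6 = frequency band.
tokens = ["SAT-A", "operates", "in", "the", "Ku", "band"]
tags = ["B-A1", "O", "O", "O", "B-A6", "I-A6"]
entities = decode_bio(tokens, tags)
```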
S5, according to the classification result of step S4, screening out the sentences that contain entities of the types defined in step S1; among the screened sentences, packaging those that contain entities of the same entity types into a sentence bag, and marking the entity relationship between the entities in those sentences as the sentence bag label;
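A minimal sketch of sentence bag construction is given below. It assumes that a bag is keyed by the entity pair and that the bag label is looked up from the pair's entity types; the mapping table covers only two of the six relationships and its labels are illustrative.

```python
# Hypothetical mapping from (head type, tail type) to a relation label.
TYPE_PAIR_TO_RELATION = {
    ("A1", "A2"): "belongs_to",  # satellite name -> satellite network ID
    ("A1", "A6"): "uses",        # satellite name -> frequency band
}


def build_bags(sentences):
    """Group sentences by entity pair; label each bag from the pair's entity types.

    sentences: list of (text, (head, head_type), (tail, tail_type)).
    """
    bags = {}
    for text, (head, head_type), (tail, tail_type) in sentences:
        relation = TYPE_PAIR_TO_RELATION.get((head_type, tail_type), "NA")
        bag = bags.setdefault((head, tail), {"label": relation, "sentences": []})
        bag["sentences"].append(text)
    return bags


sentences = [
    ("SAT-A operates in the Ku band.", ("SAT-A", "A1"), ("Ku", "A6")),
    ("The Ku payload of SAT-A entered service.", ("SAT-A", "A1"), ("Ku", "A6")),
    ("SAT-A is filed under network NET-001.", ("SAT-A", "A1"), ("NET-001", "A2")),
]
bags = build_bags(sentences)
```

Because the label comes from the entity types alone, some sentences in a bag will not actually express that relationship, which is the noise the later reinforcement learning step is meant to handle.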
S6, using the entities and relationships extracted in step S3 to supplement the sentence bag data of step S5, thereby increasing the number of sentence bags and balancing the number of sentence bags across the different entity relationships;
step S6 specifically comprises:
S61, counting the sentence bags under each entity relationship and finding the median of these counts over all entity relationships;
S62, for each entity relationship whose number of sentence bags is below the median, increasing its number of sentence bags: the entities in the sentences of an existing sentence bag of that relationship are deleted, and data of the corresponding entity types extracted in step S3 is filled into the vacated positions to form a new sentence bag under that relationship. In this way the number of sentence bags under each below-median relationship is increased, and the numbers of sentence bags under the different entity relationships are balanced.
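The median-based balancing of steps S61 and S62 might look like the sketch below. The bag layout, the use of plain string replacement to swap entities, and the sample data are all assumptions; string replacement is only safe when entity names do not overlap.

```python
import statistics


def balance_bags(bags_by_relation, structured_pairs):
    """Grow under-represented relations up to the median bag count.

    bags_by_relation: {relation: [{"pair": (head, tail), "sentences": [...]}, ...]}
    structured_pairs: {relation: [(head, tail), ...]} taken from the triple set T.
    """
    median = statistics.median(len(bags) for bags in bags_by_relation.values())
    balanced = {rel: list(bags) for rel, bags in bags_by_relation.items()}
    for rel, bags in bags_by_relation.items():
        if not bags:
            continue  # no template sentences to rewrite
        existing = {bag["pair"] for bag in bags}
        candidates = [p for p in structured_pairs.get(rel, []) if p not in existing]
        i = 0
        while len(balanced[rel]) < median and i < len(candidates):
            template = bags[i % len(bags)]
            old_head, old_tail = template["pair"]
            new_head, new_tail = candidates[i]
            # Replace the old entities with structured-data entities.
            new_sentences = [
                s.replace(old_head, new_head).replace(old_tail, new_tail)
                for s in template["sentences"]
            ]
            balanced[rel].append({"pair": (new_head, new_tail), "sentences": new_sentences})
            i += 1
    return balanced


bags_by_relation = {
    "uses": [
        {"pair": ("SAT-A", "Ku"), "sentences": ["SAT-A operates in the Ku band."]},
        {"pair": ("SAT-B", "Ka"), "sentences": ["SAT-B operates in the Ka band."]},
    ],
    "belongs_to": [
        {"pair": ("SAT-A", "NET-001"), "sentences": ["SAT-A is filed under network NET-001."]},
    ],
    "managed_by": [
        {"pair": ("SAT-A", "ADM-X"), "sentences": ["SAT-A is managed by ADM-X."]},
        {"pair": ("SAT-B", "ADM-X"), "sentences": ["SAT-B is managed by ADM-X."]},
    ],
}
structured_pairs = {"belongs_to": [("SAT-A", "NET-001"), ("SAT-B", "NET-002")]}
balanced = balance_bags(bags_by_relation, structured_pairs)
```

Relations already at or above the median are left unchanged, so only the long-tail relations receive generated bags.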
S7, constructing the entity relation extraction model: first using a reinforcement learning algorithm to separate the noise sentences from the correctly labeled sentences in each sentence bag, and then training the entity relation extraction model with both the correctly labeled sentences and the noise sentences; the entity relation extraction model is implemented as a piecewise (segmented) convolutional neural network;
step S7 comprises the following steps:
S71, if the relationship between the entities contained in a sentence of a sentence bag is the label of that sentence bag, the sentence is defined as a correctly labeled sentence; if the relationship between the entities contained in the sentence is not the label of that sentence bag, the sentence is defined as a noise sentence; the sentences in the sentence bag and the sentence bag label are the input of the reinforcement learning algorithm;
S72, setting the agent of the reinforcement learning algorithm as the filter that separates correctly labeled sentences from noise sentences. The action $A_i$ of the agent on the $i$-th sentence takes one of two values: 1, which marks the sentence as a correctly labeled sentence, and -1, which marks it as a noise sentence. Here $i$ is the index of the sentence in the input sentence bag and $A_i \in \{1, -1\}$. The action selection policy function for $A_i$ is:
[Formula image BDA0003423610930000051]
where $\pi(A_i \mid S_i; \theta)$ denotes the probability of selecting action $A_i$ in state $S_i$, $S_i$ denotes the state of the agent at the $i$-th selection, $\theta$ denotes the parameters of the agent to be learned, $\sigma(\cdot)$ denotes the sigmoid function, and $W$ and $b$ denote the weight matrix and bias to be learned, respectively;
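The policy formula itself is not reproduced in this text, so the following is only a sketch of one common form consistent with the symbols defined above: the probability of the action 1 is taken as the sigmoid of W·S_i + b, and the probability of the action -1 is its complement. This form is an assumption, not the patent's stated formula.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def action_probability(action, state, W, b):
    """Assumed policy: pi(A=1 | S) = sigmoid(W . S + b), pi(A=-1 | S) = 1 - pi(A=1 | S)."""
    p_correct = sigmoid(sum(w * s for w, s in zip(W, state)) + b)
    return p_correct if action == 1 else 1.0 - p_correct


def sample_action(state, W, b, rng):
    """Sample 1 (correctly labeled) or -1 (noise) from the assumed policy."""
    return 1 if rng.random() < action_probability(1, state, W, b) else -1


# Toy state vector and parameters (assumptions).
state = [0.5, -1.0, 2.0]
W = [0.0, 0.0, 0.0]
b = 0.0
p_keep = action_probability(1, state, W, b)
p_noise = action_probability(-1, state, W, b)
action = sample_action(state, W, b, random.Random(0))
```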
S73, defining the state S of the agent as the concatenation of four vectors: the vector representation of the sentences already selected as correctly labeled, the vector representation of the sentences already selected as noise, the vector representation of the current sentence, and the vector representation of the entity pair of the current sentence;
S74, after the agent has taken an action on every sentence in the sentence bag, it receives a reward according to its actions. The reward for every action before the last one is set to 0, and the reward for the agent's last action is set to:
[Formula image BDA0003423610930000061]
where $B$ denotes a sentence bag; $B_{sel+}$ is the current set of correctly labeled sentences and $r_{+}$ is the relationship corresponding to the correctly labeled sentences; $B_{sel-}$ is the current set of noise sentences and $r_{-}$ denotes no relationship, that is, the NA relationship; $|\cdot|$ denotes the total number of sentences in a set; and $x_j$ denotes the $j$-th sentence in a sentence set;
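Since the reward formula is not reproduced in this text, the sketch below assumes a form that matches the listed symbols: the mean log-probability of the bag relation over the sentences kept as correct, plus the mean log-probability of NA over the sentences marked as noise. Both the form and the probability inputs are assumptions for illustration.

```python
import math


def final_reward(correct_probs, noise_probs):
    """Assumed delayed reward for the agent's last action in a bag.

    correct_probs: p(r+ | x_j) from the relation model for each sentence in B_sel+.
    noise_probs:   p(NA | x_j) from the relation model for each sentence in B_sel-.
    """
    reward = 0.0
    if correct_probs:
        reward += sum(math.log(p) for p in correct_probs) / len(correct_probs)
    if noise_probs:
        reward += sum(math.log(p) for p in noise_probs) / len(noise_probs)
    return reward


good_split = final_reward([0.9, 0.8], [0.9])
poor_split = final_reward([0.5, 0.4], [0.3])
```

Under this assumed form the reward is at most 0 and grows as the relation model agrees more strongly with the agent's split of the bag.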
S75, the optimization goal of the reinforcement learning algorithm is to maximize the expected total reward obtained by the agent. According to this goal, the optimization function is constructed as:
[Formula image BDA0003423610930000062]
where
[Formula image BDA0003423610930000063]
denotes the expected reward obtained by the agent over the action set $[A_0, A_1, A_2, \ldots, A_n]$ and the state set $[S_0, S_1, S_2, \ldots, S_n]$, and $n$ is the total number of actions selected;
S76, encoding position information for the sentence text according to the distance between each word in the sentence and the entities, to obtain the position encoding of the sentence text;
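One simple way to realize such a position encoding is to record, for each word, its signed offset from each of the two entities. The clipping range `max_distance` is an assumption added for the example; in practice the offsets are usually mapped to learned embedding vectors.

```python
def relative_positions(num_tokens, head_index, tail_index, max_distance=30):
    """Return (offset to head entity, offset to tail entity) for every token."""
    def clip(d):
        return max(-max_distance, min(max_distance, d))

    return [(clip(i - head_index), clip(i - tail_index)) for i in range(num_tokens)]


# Five tokens, head entity at index 1, tail entity at index 3.
positions = relative_positions(5, head_index=1, tail_index=3)
```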
S77, obtaining the word vectors of the words in the sentence with the word2vec tool, concatenating the position encodings with the word vectors to obtain the input matrix of the entity relation extraction model, and extracting sentence features by a convolution operation given by:
$$c_{ij} = w_i \, q_{j-m+1:j}, \quad 1 \le i \le n$$
where $w_i$ denotes the vector of the $i$-th convolution kernel of the entity relation extraction model, $n$ denotes the number of convolution kernels, $m$ denotes the length of a convolution kernel, $j$ denotes the row index of the input matrix, $q_{i:j}$ denotes the matrix formed by rows $i$ through $j$ of the input matrix, and $c_{ij}$ denotes the result of applying the $i$-th convolution kernel to the matrix formed by rows $j-m+1$ through $j$ of the input matrix. The vector formed by all convolution results is split into several segments according to the row indices of the entities in the input matrix, and max pooling is then applied to each segment to obtain the piecewise pooling result vectors;
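The convolution and piecewise max pooling can be sketched in pure Python for a single kernel. The example uses a valid (unpadded) convolution and splits the output into three segments at the two entity indices; the padding choice, the three-way split, and the toy values are assumptions.

```python
def convolve(kernel, rows):
    """Apply one m x d kernel to every window of m consecutive rows.

    Returns the values c_j for windows ending at rows j = m-1 .. len(rows)-1.
    """
    m = len(kernel)
    out = []
    for j in range(m - 1, len(rows)):
        window = rows[j - m + 1 : j + 1]
        out.append(sum(
            kernel[a][b] * window[a][b]
            for a in range(m)
            for b in range(len(kernel[a]))
        ))
    return out


def piecewise_max_pool(values, head_index, tail_index):
    """Split values into three segments at the entity indices and max-pool each."""
    first, second = sorted((head_index, tail_index))
    segments = [values[: first + 1], values[first + 1 : second + 1], values[second + 1 :]]
    return [max(seg) if seg else 0.0 for seg in segments]


# Toy input matrix: six tokens with one feature each; kernel of length m = 2.
rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
kernel = [[1.0], [1.0]]
features = convolve(kernel, rows)
pooled = piecewise_max_pool(features, head_index=1, tail_index=3)
```

With n kernels, the three pooled values of each kernel would be concatenated into one vector of length 3n before the softmax layer.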
S78, concatenating the result vectors obtained from piecewise pooling and feeding the concatenation to the softmax layer of the entity relation extraction model, which outputs the probabilities of all relation categories. There are seven relation categories: the six defined entity relationships and no relationship (the NA category). The relation category with the highest probability is the final relation classification result for the entities of the satellite frequency-orbit data.
S8, inputting the named entity information obtained in step S4, together with the corresponding sentences, into the entity relation extraction model obtained in step S7 to obtain the relation classification result for the entities in each sentence, which completes relation extraction for the named entities of the satellite frequency-orbit data.
S9, representing each entity pair extracted from the unstructured data and its relationship as a triple and comparing the triple with the triple set T: if the triple already exists in T, it is not added; if it does not exist in T, it is added to T. This expands the structured satellite frequency-orbit data set represented as triples.
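The merge in step S9 amounts to a set insertion that skips existing triples. A minimal sketch, with made-up triples and hypothetical relation labels:

```python
def merge_triples(T, extracted):
    """Add triples missing from T (in place) and return the ones that were added."""
    added = []
    for triple in extracted:
        if triple not in T:
            T.add(triple)
            added.append(triple)
    return added


T = {("SAT-A", "belongs_to", "NET-001")}
extracted = [
    ("SAT-A", "belongs_to", "NET-001"),  # already present, skipped
    ("SAT-A", "uses", "Ku"),             # new, added
]
added = merge_triples(T, extracted)
```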
The beneficial effects of the invention are as follows:
The invention provides a satellite frequency-orbit data information extraction method based on an enhanced sample model, which can conveniently complete relation extraction for satellite frequency-orbit data and enriches the satellite frequency-orbit knowledge base. The invention adopts a pipeline approach, which makes the scheme highly flexible. The invention makes full use of existing structured data, addresses the long-tail problem of the data, and improves the accuracy of relation extraction.
Drawings
FIG. 1 is a flow chart of an implementation of a method for extracting satellite frequency-orbit data information based on an enhanced sample model according to the present invention;
FIG. 2 is an example of text annotated with the BIO labeling method in the present invention;
FIG. 3 is a schematic diagram of the components of the BERT-based named entity recognition model of the present invention.
Detailed Description
For a better understanding of the present disclosure, two examples are given herein.
The present invention will be described in detail below with reference to the accompanying drawings.
Aiming at the noise and the long-tail data problem introduced by traditional remote (distant) supervision, the invention discloses a technical scheme for remote-supervision relation extraction from satellite frequency-orbit data based on reinforcement learning. The scheme has the following characteristics: 1. correctly labeled sentences and noise sentences are identified by reinforcement learning, and the noise is used as part of the training data for the relation extraction model; 2. structured data is introduced to generate corpora of the corresponding classes from the text of the sentence bag classes that need supplementing, which supplements the unstructured training data set and addresses the long-tail imbalance of the corpora. Here, frequency-orbit data refers to satellite frequency and orbit data.
Example 1:
The invention discloses a satellite frequency-orbit data information extraction method based on an enhanced sample model. Its implementation flow is shown in FIG. 1, and its basic steps are:
101. defining the entity types and the set of relationships between entities;
102. extracting entities of the predefined types and their relationships from the SRS database data, and building the triple set T;
103. labeling the unstructured text data with BIO labels, training the sequence labeling prediction model, and completing the recognition of satellite frequency-orbit named entities;
104. forming a sentence bag from sentences that contain the same entity pair, marking the relationship corresponding to the entity pair's types as the sentence bag label, and using the structured data to generate corpora that supplement and balance the unstructured data;
105. selecting the correct class and the noise class within each bag, and training the relation classification model;
106. merging the entities and relationships extracted from the unstructured data, expressed as triples, into the set T.
The specific steps are as follows:
S1, defining the entity types of the satellite frequency-orbit data according to the requirements of the satellite frequency-orbit data recognition and extraction task, wherein the six defined entity types are: satellite name, satellite network ID, governing department, orbit position, orbit type, and frequency band; an entity is a satellite communication subject in the satellite frequency-orbit data;
S2, defining the set of relationships between entities on the basis of the entity types defined in step S1, wherein each relationship is represented by a triple, specifically: (satellite name, belonging to, satellite network ID), (satellite name, managed, governing department), (satellite name, orbital, orbit position), (orbit type, suborbital, satellite name), (satellite name, usage, frequency band), and (governing department, owning network, satellite network ID); all of these relationships together constitute the set of relationships between entities;
S3, acquiring structured satellite frequency-orbit data and extracting knowledge from it, which comprises data preprocessing, entity recognition, and entity relation extraction;
S31, data preprocessing: acquiring structured satellite frequency-orbit data from the SRS database of the International Telecommunication Union according to the defined entity types, selecting the data corresponding to each entity type, and storing it in an entity-relationship table;
S32, entity recognition on the structured satellite frequency-orbit data: matching the corresponding data in the entity-relationship table according to the defined entity types and their relationships, and selecting the related entities;
S33, entity relation extraction: for each pair of entities selected in step S32, taking the relationship defined in step S2 for the entity types (defined in step S1) to which the two entities belong as the relationship between them;
S34, building a triple from every two entities and their relationship, and collecting these triples into the triple set T;
S4, extracting knowledge from unstructured satellite frequency-orbit data: obtaining text data of unstructured satellite frequency-orbit data from the Internet by data crawling; performing word segmentation on the text data to obtain word segmentation sequences; labeling the word segmentation sequences with the BIO labeling method and using the labeled text as the training sentence library; fine-tuning a BERT pre-trained model to form a BERT-based named entity recognition model; and training that model with the training sentence library;
S41, crawling the unstructured satellite frequency-orbit data and segmenting it into words; assigning the six labels A1, A2, A3, A4, A5, and A6 to the defined entity types, namely satellite name, satellite network ID, governing department, orbit position, orbit type, and frequency band, respectively; and labeling the segmented sentences with the BIO labeling method to obtain the training sentence library;
S42, fine-tuning the sequence labeling layer of the BERT pre-trained model, namely applying a fully connected layer to the hidden-layer representation of BERT, to form the BERT-based named entity recognition model; training the BERT-based named entity recognition model with the training sentence library; after the input vector v of the input layer passes through multiple encoding layers, the semantic representation h of a sentence in the unstructured satellite frequency-orbit data is obtained;
S43, the sequence labeling layer outputs the probability distribution $P_t$ at each time step of the word segmentation sequence under the BIO labeling scheme, expressed as:
$$P_t = \mathrm{softmax}(h_t W_0 + b_0), \quad t = 1, 2, \ldots, N$$
where $h_t$ denotes the component of h at time t, $W_0$ denotes the weight matrix of the fully connected layer, $b_0$ denotes the bias of the fully connected layer, and softmax denotes the activation function;
s44, after the probability distribution of each moment of the word segmentation sequence is obtained, the named entity recognition model based on the BERT adopts a cross entropy loss function to train the parameters of the named entity recognition model based on the BERT so as to improve the classification prediction capability of the model; and correctly classifying each word in the word segmentation sequence by using the trained model to obtain a classification result BIO label, obtaining a complete entity name and type according to the classification result BIO label, and finally completing entity identification of the satellite frequency-orbit data.
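The sequence labeling layer of S43 and the cross-entropy loss of S44 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the hidden vectors `h`, the weights `W0` and `b0`, and the three-label tag set are toy values chosen for the example, and the BERT coding layers that would produce h are not included.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def label_distribution(h_t, W0, b0):
    """P_t = softmax(h_t W0 + b0) for one time step; h_t is a row vector."""
    logits = [
        sum(h_t[k] * W0[k][c] for k in range(len(h_t))) + b0[c]
        for c in range(len(b0))
    ]
    return softmax(logits)

def cross_entropy(P, gold):
    """Mean negative log-likelihood of the gold label indices over the sequence."""
    return -sum(math.log(P[t][y]) for t, y in enumerate(gold)) / len(gold)

# Toy values (assumptions): hidden size 2, three labels (O, B-A1, I-A1).
W0 = [[1.0, 0.0, -1.0],
      [0.0, 1.0, 0.0]]
b0 = [0.0, 0.0, 0.0]
h = [[2.0, 0.0], [0.0, 2.0]]  # one hidden vector per time step

P = [label_distribution(h_t, W0, b0) for h_t in h]
loss = cross_entropy(P, gold=[0, 1])
```

During training, the gradient of `loss` with respect to `W0`, `b0` and the encoder parameters would be used to update the model; that part is left to a deep learning framework and is not shown.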
S5, according to the classification result of the step S44, a sentence containing the entity type defined in the step S1 is screened out; in the screened sentences, for the sentences containing the entities with the same entity type, packaging the sentences to be used as a sentence bag, and marking the entity relationship among the entities in the sentences as a sentence bag label;
s6, the entity types and the relation thereof extracted in the step S3 are used for supplementing the sentence bag data in the step S5, the number of sentence bags is increased, the number of sentence bags under different entity relations is balanced, and the bias of an entity relation extraction model caused by the problem of long tail of a data set is solved;
the step S6 specifically includes:
S61, counting the number of sentence bags under each entity relationship, and finding the median of these counts over all entity relationships;
S62, for each entity relationship whose number of sentence bags is below the median, increasing the number of sentence bags under that relationship: the entities contained in the sentences of an existing sentence bag of that relationship are deleted, and the corresponding entity data extracted in step S3 are filled into the vacated positions to form a new sentence bag under that relationship, so that the number of sentence bags under relationships below the median is increased and the data volume is balanced across relationship types.
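The median-based balancing of S61 and S62 might look like the sketch below. The data layout (a dict from relation name to a list of bags, each bag holding its head entity, tail entity and sentences), the function name `balance_bags` and all sample values are assumptions made for illustration. Entity substitution is done by plain string replacement, which is a simplification.

```python
from statistics import median

def balance_bags(bags, structured_pairs):
    """Add bags to relations whose bag count is below the median count.

    bags: dict relation -> list of {"head", "tail", "sentences"} dicts.
    structured_pairs: dict relation -> list of (head, tail) pairs from S3.
    """
    target = median(len(rel_bags) for rel_bags in bags.values())
    for relation, rel_bags in bags.items():
        if not rel_bags:
            continue  # no existing sentence to reuse as a template
        template = rel_bags[0]
        seen = {(b["head"], b["tail"]) for b in rel_bags}
        for head, tail in structured_pairs.get(relation, []):
            if len(rel_bags) >= target:
                break  # this relation has reached the median
            if (head, tail) in seen:
                continue
            # Swap the old entities for the structured ones in each sentence.
            sentences = [
                s.replace(template["head"], head).replace(template["tail"], tail)
                for s in template["sentences"]
            ]
            rel_bags.append({"head": head, "tail": tail, "sentences": sentences})
            seen.add((head, tail))
    return bags

bags = {
    "usage": [
        {"head": f"SAT-{i}", "tail": "Ku", "sentences": [f"SAT-{i} uses the Ku band."]}
        for i in range(3)
    ],
    "managed": [
        {"head": f"SAT-{i}", "tail": "DEPT-X", "sentences": [f"SAT-{i} is managed by DEPT-X."]}
        for i in range(2)
    ],
    "belonging to": [
        {"head": "SAT-0", "tail": "NET-0", "sentences": ["SAT-0 belongs to network NET-0."]}
    ],
}
structured_pairs = {"belonging to": [("SAT-9", "NET-9"), ("SAT-8", "NET-8")]}
balanced = balance_bags(bags, structured_pairs)
```

In this toy input the bag counts are 3, 2 and 1, so the median is 2 and only the "belonging to" relation receives one additional bag.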
S7, constructing an entity relationship extraction model, firstly screening noise sentences and correct label sentences in the sentence bags by using a reinforcement learning algorithm, and then training the entity relationship extraction model by using the correct label sentences and the noise sentences; the entity relation extraction model is realized by a segmented convolution neural network;
the step S7 includes the following steps:
S71, if the relationship between the entities contained in a sentence of a sentence bag matches the label of that sentence bag, the sentence is defined as a correct label sentence; if it does not match, the sentence is defined as a noise sentence; the sentences in the sentence bag and the sentence bag label are used as input to the reinforcement learning algorithm;
S72, the agent of the reinforcement learning algorithm is set as a filter that separates correct label sentences from noise sentences; the action $A_i$ of the agent on the i-th sentence takes one of two values: 1 if the sentence is judged to be a correct label sentence, and -1 if it is judged to be a noise sentence; here i is the index of the sentence in the input sentence bag and $A_i \in \{1, -1\}$; the action selection policy function of $A_i$ is expressed as:
Figure BDA0003423610930000111
where $\pi(A_i \mid S_i; \theta)$ denotes the probability of selecting action $A_i$ in state $S_i$, $S_i$ denotes the state of the agent at the i-th selection, $\theta$ denotes the parameters of the agent to be learned, $\sigma(\cdot)$ denotes the sigmoid function, and W and b denote the weight matrix and bias to be learned, respectively;
s73, defining the state S of the agent as a vector formed by splicing the vector representation of the sentence with correct selected relation label, the vector representation of the selected noise sentence, the vector representation of the current sentence and the vector representation of the entity pair corresponding to the current sentence;
s74, after the agent takes corresponding action to each sentence in the sentence bag, the agent gets corresponding reward according to the action, the reward value of the action before the agent takes the last action is set as 0, and the reward of the last agent action is set as:
Figure BDA0003423610930000112
where B denotes a sentence bag; $B_{\mathrm{sel}+}$ is the current set of correct label sentences and $r_{+}$ is the relationship corresponding to the correct label sentences; $B_{\mathrm{sel}-}$ is the current set of noise sentences and $r_{-}$ denotes no relationship, i.e. the NA relationship; $|\cdot|$ denotes the total number of sentences contained in a set;
s75, the optimization goal of the reinforcement learning algorithm is to maximize the expectation value of the total reward obtained by the intelligent agent, and according to the optimization goal, an optimization function is constructed as follows:
Figure BDA0003423610930000121
where
Figure BDA0003423610930000122
denotes the expected value of the reward obtained by the agent under the action set $[A_0, A_1, A_2, \ldots, A_n]$ and the state set $[S_0, S_1, S_2, \ldots, S_n]$, n being the total number of actions selected;
s76, according to the distance between each word in the sentence and the character of the entity, the position of the sentence text is coded to obtain the position code of the sentence text;
s77, word vectors of the words in the sentence are obtained by using a word2vec tool, then the position codes and the word vectors are spliced to obtain an input matrix of an entity relation extraction model, sentence features are extracted through convolution operation, and the formula of the convolution operation is as follows:
$$c_{ij} = w_i q_{j-m+1:j}, \quad 1 \le i \le n$$
where $w_i$ denotes the vector of the i-th convolution kernel of the entity relationship extraction model, n denotes the number of convolution kernels, m denotes the length of a convolution kernel, j denotes the row index of the input matrix, $q_{i:j}$ denotes the matrix formed by the elements from the i-th row to the j-th row of the input matrix, and $c_{ij}$ denotes the result obtained after the i-th convolution kernel performs the convolution operation on the matrix formed by rows j-m+1 through j of the input matrix. The results of all convolution operations form a vector, which is divided into three parts $[c_{i1}, c_{i2}, c_{i3}]$ according to the row indices of the entities in the input matrix; max pooling is then applied to each segment to obtain the piecewise pooling result vector:
$$p_{ij} = \max(c_{ij}), \quad 1 \le i \le n, \ 1 \le j \le 3$$
where $p_{ij}$ denotes the result after max pooling;
S78, the result vectors obtained after piecewise pooling are concatenated and sent to the softmax layer of the entity relationship extraction model, which outputs the probabilities of all relationship categories; the relationship categories comprise seven categories, namely the six defined entity relationships and no relationship (the NA category), and the category with the maximum probability is the relationship classification result of the finally extracted satellite frequency-orbit data entities.
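The convolution and piecewise max pooling of S77 can be sketched for a single kernel as follows. The sketch assumes zero padding before the first row so that the output has one value per input row, and it assumes that each entity occupies one row whose index marks a segment boundary; neither detail is specified in the text. The function names and toy numbers are illustrative.

```python
def convolve(kernel, rows):
    """Compute c_j = <kernel, q_{j-m+1:j}> for every row index j.

    kernel: m rows of d weights; rows: input matrix with d columns.
    Rows before the start of the sentence are treated as zeros (assumption).
    """
    m, d = len(kernel), len(rows[0])
    padded = [[0.0] * d for _ in range(m - 1)] + rows
    return [
        sum(k * x
            for k_row, x_row in zip(kernel, padded[j:j + m])
            for k, x in zip(k_row, x_row))
        for j in range(len(rows))
    ]

def piecewise_max_pool(c, head_index, tail_index):
    """Split c at the two entity rows and max-pool each of the three segments."""
    segments = [
        c[:head_index + 1],
        c[head_index + 1:tail_index + 1],
        c[tail_index + 1:],
    ]
    # An empty segment (entity at the sentence end) is pooled to 0.0.
    return [max(seg) if seg else 0.0 for seg in segments]

rows = [[1.0], [3.0], [2.0], [5.0], [4.0]]   # toy input matrix, d = 1
kernel = [[1.0], [1.0]]                      # one kernel of length m = 2
c = convolve(kernel, rows)
p = piecewise_max_pool(c, head_index=1, tail_index=3)
```

With n kernels, the three pooled values of every kernel would be concatenated into a vector of length 3n before the softmax layer of S78.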
S8, inputting the named entity information obtained in step S4 and the corresponding sentences into the entity relationship extraction model obtained in step S7, obtaining the relationship classification result of the entities in each sentence, and completing the relationship extraction for the named entities of the satellite frequency-orbit data.
S9, the entity extracted from the unstructured data and the relation thereof are represented by a triple, the triple is compared with the data in the triple set T, and if the triple data already exists in the triple set T, the triple data is not added; and if the data of the triples does not exist in the triple set T, adding the extracted entity and the triple data of the relationship thereof into the set T, and realizing the expansion of the structured satellite frequency-orbit data set represented in the form of the triples.
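The comparison in S9 amounts to a membership test followed by an insertion. A minimal sketch, assuming triples are stored as Python tuples in a set (the function name `expand_triple_set` and the sample triples are illustrative):

```python
def expand_triple_set(T, extracted):
    """Add triples not yet in T; return the list of newly added triples."""
    added = []
    for triple in extracted:
        if triple not in T:  # already known triples are skipped
            T.add(triple)
            added.append(triple)
    return added

T = {("SAT-A", "usage", "Ku")}
extracted = [("SAT-A", "usage", "Ku"), ("SAT-B", "usage", "Ka")]
added = expand_triple_set(T, extracted)
```

Returning the added triples separately makes it easy to log how much the unstructured data contributed to the set.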
Example 2:
as shown in fig. 1, the present invention describes a method for extracting satellite frequency-orbit information, which comprises the following specific steps:
s1, defining entity types, wherein according to task requirements, defining six types of entity types comprises: satellite name, satellite network ID, department of charge, orbit position, orbit type, frequency band;
s2, defining a set of relationships among entities, wherein the defined relationships among entities include, on the basis of the entity types defined in the step S1: (satellite name, belonging to, satellite network ID), (satellite name, managed, governing department), (satellite name, orbital), (orbital type, suborbital, satellite name), (satellite name, usage, frequency band), (governing department, owning network, satellite network ID);
S3, extracting knowledge from the structured satellite frequency-orbit data, which mainly comprises preprocessing the structured data, identifying entities and extracting entity relationships;
s3-1, data preprocessing is to select corresponding entity type data from an SRS database according to predefined entity types and store the entity type data into an Excel document;
s3-2, the frequency-track data entity identification method is that firstly, corresponding row and column data are matched from Excel according to the defined entity type and attribute thereof, and relevant entity nodes are selected;
s3-3, the entity relation extraction method is that the entity node selected from the database matches the relation between the corresponding entities according to the entity type represented by the corresponding column where the entity node is located and the relation set defined in the step S2;
s3-4, establishing a triple set T for each entity pair by using the corresponding relation;
s4, in the unstructured satellite frequency and orbit data recognition and extraction stage, firstly crawling unstructured text data, after word segmentation, marking the crawled and word segmented data by using a BIO marking method, using the marked text as a training sentence library, training a BERT-based named entity recognition model by using the training sentence library, and finally completing the satellite frequency and orbit data named entity recognition:
s4-1, firstly, crawling unstructured data about satellite frequency orbit knowledge and performing word segmentation; the categories of defining named entities, such as satellite names, satellite network IDs, departments in charge, orbit positions, satellite types and frequency bands, are respectively marked as six categories of A1, A2, A3, A4, A5 and A6; labeling the sentences in the training data set with labels by using a BIO labeling method as a training sentence library, as shown in FIG. 2;
S4-2, the BERT pre-trained model can learn the semantic associations of text, and the model is fine-tuned to adapt it to the entity recognition task. The overall structure comprises an input layer, a coding layer and a sequence labeling layer, and is trained on the self-built training sentence library; the input layer representation v is the sum of the input word vectors, segment vectors and position vectors; v passes through multiple Transformer layers, which learn the semantic associations of the sentence, represented as h;
S4-3, the sequence labeling layer outputs the probability distribution $P_t$ at each time step of the input sequence under the BIO labeling scheme:
$$P_t = \mathrm{softmax}(h_t W_0 + b_0), \quad t = 1, 2, \ldots, N$$
where $h_t$ denotes the component of h at time t, $W_0$ denotes the weight of the fully connected layer, and $b_0$ denotes the bias of the fully connected layer;
s4-4, after the classification probability distribution corresponding to each word is obtained, model parameters are learned through a cross entropy loss function, and the classification prediction capability of the model is improved; the trained model can correctly classify each word, and complete entity names and types can be obtained according to the classification result BIO labels; and finally, the goal of the satellite frequency orbit data named entity recognition is achieved.
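The last part of S4-4, recovering complete entity names and types from per-word BIO labels, can be sketched as below. The tag spelling (`B-A1`, `I-A1`, `O`), the `sep` parameter and the sample tokens are assumptions for illustration; A1 and A4 stand for satellite name and orbit position as assigned in S4-1.

```python
def decode_bio(tokens, tags, sep=" "):
    """Collect (entity text, entity type) pairs from BIO tags."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the entity that was open
                entities.append((sep.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)  # continue the open entity
        else:
            # "O" or an inconsistent "I-" tag ends any open entity.
            if current:
                entities.append((sep.join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((sep.join(current), current_type))
    return entities

tokens = ["Fengyun-4", "01", "satellite", "orbits", "at", "99.5E"]
tags = ["B-A1", "I-A1", "I-A1", "O", "O", "B-A4"]
entities = decode_bio(tokens, tags)
```

For Chinese text, where segmented words are joined without spaces, `sep=""` would be the natural choice.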
FIG. 3 is a schematic diagram of the components of the BERT-based named entity recognition model of the present invention.
S5, selecting sentences containing predefined entity types, packing the sentences containing the same named entity pairs as a sentence bag, and marking the relationship of the corresponding entity pair types as sentence bag labels.
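The packing of S5 can be pictured as grouping sentences by entity pair and attaching the relation implied by the entity types. The sketch below is an assumption-laden illustration: the input layout, the `relation_of_types` mapping and the sample sentences are hypothetical, and entity pairs whose types have no defined relation are labeled "NA".

```python
from collections import defaultdict

def pack_bags(sentences, relation_of_types):
    """Group sentences into bags keyed by (head, tail, relation label).

    sentences: iterable of (text, (head, head_type), (tail, tail_type)).
    """
    bags = defaultdict(list)
    for text, (head, head_type), (tail, tail_type) in sentences:
        relation = relation_of_types.get((head_type, tail_type), "NA")
        bags[(head, tail, relation)].append(text)
    return dict(bags)

relation_of_types = {("satellite name", "frequency band"): "usage"}
sentences = [
    ("SAT-A uses the Ku band.", ("SAT-A", "satellite name"), ("Ku", "frequency band")),
    ("SAT-A transmits in Ku.", ("SAT-A", "satellite name"), ("Ku", "frequency band")),
    ("SAT-B was launched by DEPT-X.", ("SAT-B", "satellite name"), ("DEPT-X", "launcher")),
]
bags = pack_bags(sentences, relation_of_types)
```

Because the label comes from the entity types rather than from the sentence itself, some sentences in a bag may not actually express the labeled relation, which is the noise that S7 later filters.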
S6, the entity extracted in S3 and the relation knowledge thereof are used for supplementing sentence bag data in S5, the number of sentence bags is increased, the data is balanced, and the bias of the model caused by the problem of long tail of the data set is solved, and the method specifically comprises the following steps:
S6-1, counting the number of sentence bags under each relation category, and finding the median of these counts;
S6-2, for each relation category whose number of sentence bags is below the median, increasing the number of sentence bags under that category: for a relation that needs more sentence bags, the entities of that relation extracted in S3 are filled into the positions of the text entities in an existing sentence bag, so that the data volume is balanced across relation types.
S7, the method comprises the steps of screening noise sentences and correct label sentences in sentence bags by means of reinforcement learning, training a segmented convolutional neural network simultaneously by means of the correct label sentences and the noise sentences, reducing the influence of noise caused by remote supervision, and increasing the accuracy of an entity relationship extraction model, and comprises the following specific steps:
S7-1, a sentence in a sentence bag whose actual entity relationship differs from the sentence bag label is defined as a noise sentence; otherwise it is defined as correct label data; the sentences in the sentence bag and the sentence bag relation label are input into the reinforcement learning algorithm;
S7-2, the agent is set as a filter that separates correct label sentences from noise sentences; the action $A_i$ of the agent takes one of two values: 1 if the relation label of the sentence is judged correct, and -1 if it is judged incorrect, in which case the sentence is regarded as a noise sentence; here i is the index of the sentence in the input sentence bag, $A_i \in \{1, -1\}$, and the action selection policy function $\pi_\theta$ of $A_i$ is:
Figure BDA0003423610930000151
where $\sigma(\cdot)$ denotes the sigmoid function with parameters (W, b);
S7-3, the state S of the agent is defined as the vector formed by concatenating the vector average of the selected correct label sentences, the vector average of the selected noise sentences, the vector representation of the current sentence, and the vector representation of the corresponding entities;
S7-4, the agent obtains a reward after acting on the sentences in each sentence bag; the reward value of every action before the last is 0, and the reward of the last action is set to:
Figure BDA0003423610930000152
where B denotes a sentence bag; $B_{\mathrm{sel}+}$ is the current set of correct label sentences and $r_{+}$ is the relationship of the correct label sentences; $B_{\mathrm{sel}-}$ is the current set of noise sentences and $r_{-}$ is the NA relationship; $|\cdot|$ denotes the total number of sentences in a set; the reward jointly accounts for the influence of correct label sentences and noise sentences and can therefore guide model training more effectively;
s7-5, the optimization objective of the reinforcement learning algorithm is to maximize the expectation of the total reward obtained by the agent, according to which the optimization function is defined as:
Figure BDA0003423610930000161
S7-6, position coding is performed on the text data according to the distance between each word in the sentence and each entity. For example, in the sentence "the orbital position of the Fengyun-4 01 satellite is 99.5 degrees east longitude", the entities are "Fengyun-4 01 satellite" and "99.5 degrees east longitude", and the position codes of the sentence text relative to the two entities are [0, 1, 2, 3, 4] and [-4, -3, -2, -1, 0];
s7-7, obtaining word vectors by using word2vec for words in sentences, splicing position codes and the word vectors, and extracting features through convolution; the convolution operation formula is:
$$c_{ij} = w_i q_{j-m+1:j}, \quad 1 \le i \le n$$
where $w_i$ denotes the i-th convolution kernel, n denotes the number of convolution kernels, m denotes the length of a convolution kernel, j denotes the row index of the input, $q_{i:j}$ denotes the elements of the sequence from $q_i$ to $q_j$, and $c_{ij}$ denotes the result after convolution. The convolved result is divided into three parts $[c_{i1}, c_{i2}, c_{i3}]$, and piecewise pooling is then applied:
$$p_{ij} = \max(c_{ij}), \quad 1 \le i \le n, \ 1 \le j \le 3$$
S7-8, the pooled vectors are concatenated and sent to a softmax layer, which outputs the probabilities of all relation categories, comprising seven categories: the six predefined relations and no relation (the NA category); the category with the maximum probability is the relation classification of the finally extracted satellite frequency-orbit data entity.
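The position coding of S7-6 and the concatenation of S7-7 can be reproduced with a few lines of Python. The sketch assumes the sentence has been segmented into five tokens with the first entity at index 0 and the second at index 4, which is what the position codes in S7-6 imply. The word vectors are placeholder values rather than word2vec outputs, and appending the raw position numbers (instead of learned position embeddings) is a simplification.

```python
def relative_positions(num_tokens, entity_index):
    """Signed distance of every token to the token at entity_index."""
    return [i - entity_index for i in range(num_tokens)]

def build_input_rows(word_vectors, head_index, tail_index):
    """Append the two relative positions to each word vector (one row per token)."""
    n = len(word_vectors)
    pos_head = relative_positions(n, head_index)
    pos_tail = relative_positions(n, tail_index)
    return [
        vec + [float(pos_head[i]), float(pos_tail[i])]
        for i, vec in enumerate(word_vectors)
    ]

pos_head = relative_positions(5, 0)
pos_tail = relative_positions(5, 4)

word_vectors = [[0.1, 0.2] for _ in range(5)]  # placeholder 2-d word vectors
rows = build_input_rows(word_vectors, head_index=0, tail_index=4)
```

Each row of `rows` corresponds to one row of the input matrix on which the convolution of S7-7 operates.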
S8, inputting the named entity information obtained in S4 and the corresponding sentences into the relation extraction model trained in S7 to obtain the relation classification, and completing the relation extraction for the named entities of the satellite frequency-orbit data.
S9, comparing the newly extracted entity and the relation triple with the data in the triple set T, if the original set has the triple, not adding; and if the original set does not have the triple, adding the triple into the set T, and realizing the expansion of the structured data set represented in the form of the triple.
The foregoing is illustrative of the present application and is not to be construed as limiting thereof. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (7)

1. A satellite frequency orbit data information extraction method based on an enhanced sample model is characterized by comprising the following specific steps:
s1, defining entity types of the satellite frequency and orbit data according to the task requirements of the satellite frequency and orbit data information extraction, wherein the six defined entity types comprise: satellite name, satellite network ID, department of charge, orbit position, orbit type, frequency band; the entity is a satellite communication subject in satellite frequency orbit data;
s2, defining a relation set among the entities, defining the relation among the entities on the basis of the entity type defined in the step S1, wherein the relation among the entities is represented by a triple;
s3, acquiring the frequency-orbit data of the structured satellite, and extracting the knowledge of the frequency-orbit data of the structured satellite, wherein the extraction comprises data preprocessing, entity identification and entity relation extraction;
s4, extracting knowledge of the unstructured satellite frequency-orbit data, obtaining text data of the unstructured satellite frequency-orbit data from the Internet by a data crawling method, segmenting the text data to obtain a segmentation sequence, labeling the segmentation sequence by a BIO labeling method, and taking the labeled text as a training sentence library; fine-tuning a pre-training model based on the BERT to form a named entity recognition model based on the BERT; training a named entity recognition model based on BERT by utilizing a training sentence library; correctly classifying each word in the word sequence by using a trained and BERT-based named entity recognition model;
s5, according to the classification result of the step S4, a sentence containing the entity type defined in the step S1 is screened out; in the screened sentences, for the sentences containing the entities with the same entity type, packaging the sentences to be used as a sentence bag, and marking the entity relationship among the entities in the sentences as a sentence bag label;
s6, the entity types and the relations thereof extracted in the step S3 are used for supplementing the sentence bag data in the step S5, the number of sentence bags is increased, and the number of sentence bags under different entity relations is balanced;
s7, constructing an entity relationship extraction model, firstly screening noise sentences and correct label sentences in the sentence bags by using a reinforcement learning algorithm, and then training the entity relationship extraction model by using the correct label sentences and the noise sentences;
s8, inputting the named entity information obtained in the step S4 and the corresponding sentences into the entity relationship extraction model obtained in the step S7, obtaining a correct relationship classification result of the entities in the sentences, and completing the relationship extraction of the satellite frequency orbit data named entities;
s9, the entity extracted from the unstructured data and the relation thereof are represented by a triple, the triple is compared with the data in the triple set T, and if the triple data already exists in the triple set T, the triple data is not added; and if the data of the triples does not exist in the triple set T, adding the extracted entity and the triple data of the relationship thereof into the set T, and realizing the expansion of the structured satellite frequency-orbit data set represented in the form of the triples.
2. The method for extracting satellite frequency-orbit data information based on the enhanced sample model as claimed in claim 1, wherein the relationship between the entities specifically includes: (satellite name, belonging to, satellite network ID), (satellite name, managed, governing department), (satellite name, orbital), (orbital type, orbital, satellite name), (satellite name, usage, frequency band) and (governing department, owning network, satellite network ID), all the relationships between entities constitute a set of relationships between entities.
3. The method as claimed in claim 1, wherein the entity relationship extraction model is implemented by a segmented convolutional neural network.
4. The satellite frequency-orbit data information extraction method based on an enhanced sample model according to claim 1, wherein step S3 specifically includes:
s31, preprocessing data, namely acquiring structured satellite frequency-orbit data from an SRS database of the International telecommunication Union according to a defined entity type, selecting corresponding data of the entity type from the structured satellite frequency-orbit data, and storing the corresponding data into an entity-relation table;
s32, carrying out entity identification on the structured satellite frequency-orbit data, firstly matching corresponding data from the entity-relationship table according to the defined entity type and the relationship thereof, and selecting related entities;
s33, extracting entity relationship, namely setting the relationship between the entities defined in S2 corresponding to the entity type defined in S1 to which the entity selected in the step S32 belongs as the relationship between the entities;
S34, establishing a triple set T for each pair of entities by using the corresponding relationship.
5. The satellite frequency-orbit data information extraction method based on an enhanced sample model according to claim 1, wherein step S4 specifically includes:
s41, crawling and word segmentation are carried out on the unstructured satellite frequency-orbit data; respectively marking the defined entity types, namely satellite names, satellite network IDs, departments of charge, orbit positions, satellite types and frequency bands, as six types of labels A1, A2, A3, A4, A5 and A6; marking the label for the segmented sentence by using a BIO marking method to obtain a training sentence library;
s42, fine-tuning a sequence labeling layer of the pre-training model based on the BERT, namely replacing hidden layer representation of the BERT by using a full-connection layer to form a named entity recognition model based on the BERT; training a named entity recognition model based on BERT by using a training sentence library; after an input vector v of an input layer passes through a plurality of coding layers, semantic association expression of sentences in the unstructured satellite frequency-orbit data is obtained as h;
S43, the sequence labeling layer outputs the probability distribution $P_t$ at each time step of the word segmentation sequence under the BIO labeling scheme, expressed as:
$$P_t = \mathrm{softmax}(h_t W_0 + b_0), \quad t = 1, 2, \ldots, N$$
where $h_t$ denotes the component of h at time t, $W_0$ denotes the weight matrix of the fully connected layer, $b_0$ denotes the bias of the fully connected layer, and softmax denotes the activation function;
s44, after the probability distribution of each moment of the word segmentation sequence is obtained, the named entity recognition model based on the BERT adopts a cross entropy loss function to train the parameters of the named entity recognition model based on the BERT so as to improve the classification prediction capability of the model; and correctly classifying each word in the word segmentation sequence by using the trained model to obtain a classification result BIO label, obtaining a complete entity name and type according to the classification result BIO label, and finally completing entity identification of the satellite frequency-orbit data.
6. The satellite frequency-orbit data information extraction method based on an enhanced sample model according to claim 1, wherein step S6 specifically includes:
S61, counting the number of sentence bags under each entity relationship, and finding the median of these counts over all entity relationships;
S62, for each entity relationship whose number of sentence bags is below the median, increasing the number of sentence bags under that relationship: the entities contained in the sentences of an existing sentence bag of that relationship are deleted, and the corresponding entity data extracted in step S3 are filled into the vacated positions to form a new sentence bag under that relationship, so that the number of sentence bags under relationships below the median is increased and the number of sentence bags is balanced across entity relationships.
7. The satellite frequency-orbit data information extraction method based on an enhanced sample model according to claim 1, wherein step S7 includes the following steps:
S71, if the relationship between the entities contained in a sentence of a sentence bag matches the label of that sentence bag, the sentence is defined as a correct label sentence; if it does not match, the sentence is defined as a noise sentence; the sentences in the sentence bag and the sentence bag label are used as input to the reinforcement learning algorithm;
S72, the agent of the reinforcement learning algorithm is set as a filter that separates correct label sentences from noise sentences; the action $A_i$ of the agent on the i-th sentence takes one of two values: 1 if the sentence is judged to be a correct label sentence, and -1 if it is judged to be a noise sentence; here i is the index of the sentence in the input sentence bag and $A_i \in \{1, -1\}$; the action selection policy function of $A_i$ is expressed as:
Figure FDA0003423610920000041
where $\pi(A_i \mid S_i; \theta)$ denotes the probability of selecting action $A_i$ in state $S_i$, $S_i$ denotes the state of the agent at the i-th selection, $\theta$ denotes the parameters of the agent to be learned, $\sigma(\cdot)$ denotes the sigmoid function, and W and b denote the weight matrix and bias to be learned, respectively;
s73, defining the state S of the agent as a vector formed by splicing the vector representation of the sentence with correct selected relation label, the vector representation of the selected noise sentence, the vector representation of the current sentence and the vector representation of the entity pair corresponding to the current sentence;
s74, after the agent takes corresponding action to each sentence in the sentence bag, the agent gets corresponding reward according to the action, the reward value of the action before the agent takes the last action is set as 0, and the reward of the last agent action is set as:
Figure FDA0003423610920000051
where B denotes a sentence bag; $B_{\mathrm{sel}+}$ is the current set of correct label sentences and $r_{+}$ is the relationship corresponding to the correct label sentences; $B_{\mathrm{sel}-}$ is the current set of noise sentences and $r_{-}$ denotes no relationship, i.e. the NA relationship; $|\cdot|$ denotes the total number of sentences contained in a set; $x_j$ denotes the j-th sentence in the sentence set;
s75, the optimization goal of the reinforcement learning algorithm is to maximize the expectation value of the total reward obtained by the intelligent agent, and according to the optimization goal, an optimization function is constructed as follows:
Figure FDA0003423610920000052
where
Figure FDA0003423610920000053
denotes the expected value of the reward obtained by the agent under the action set $[A_0, A_1, A_2, \ldots, A_n]$ and the state set $[S_0, S_1, S_2, \ldots, S_n]$, n being the total number of actions selected;
s76, according to the distance between each word in the sentence and the character of the entity, the position of the sentence text is coded to obtain the position code of the sentence text;
s77, word vectors of the words in the sentence are obtained by using a word2vec tool, then the position codes and the word vectors are spliced to obtain an input matrix of an entity relation extraction model, sentence features are extracted through convolution operation, and the formula of the convolution operation is as follows:
c_{ij} = w_i q_{j-m+1:j}, 1 ≤ i ≤ n
wherein w_i denotes the vector of the i-th convolution kernel of the entity relationship extraction model, n denotes the number of convolution kernels, m denotes the length of a convolution kernel, j denotes the row index of the input matrix, q_{i:j} denotes the matrix formed by the elements from the i-th row to the j-th row of the input matrix, and c_{ij} denotes the result of applying the i-th convolution kernel to the matrix formed by the elements from the (j-m+1)-th row to the j-th row of the input matrix; the vector formed by the results of all convolution operations is divided into several segments according to the row indices of the vectors corresponding to the entities in the input matrix, and maximum pooling is then applied segment by segment to obtain the piecewise-pooling result vectors;
S78, concatenating the result vectors obtained after piecewise pooling and feeding the concatenated result to the softmax layer of the entity relationship extraction model, whose output is the probability of each relationship category, wherein the relationship categories comprise seven classes, namely the six defined entity relationships and no relationship, and the relationship category with the highest probability is the final relationship classification result for the entities extracted from the satellite frequency-orbit data.
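Steps S73 and S74 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say how the selected sentences are aggregated into one representation, so averaging (the hypothetical helper `mean_or_zero`) is an assumption. The final-reward formula is available only as a figure reference, so the sketch takes `final_reward` as a caller-supplied number rather than computing it. All function names and dimensions are illustrative.

```python
import numpy as np

def mean_or_zero(vectors, dim):
    """Assumed aggregation: average the selected sentence vectors (zeros if none selected)."""
    return np.mean(vectors, axis=0) if len(vectors) else np.zeros(dim)

def build_state(clean_repr, noisy_repr, sentence_vec, entity_pair_vec):
    """Step S73: concatenate the four representations into one state vector S."""
    return np.concatenate([clean_repr, noisy_repr, sentence_vec, entity_pair_vec])

def episode_rewards(num_actions, final_reward):
    """Step S74: reward 0 for every action before the last; only the last action is rewarded."""
    rewards = np.zeros(num_actions)
    if num_actions > 0:
        rewards[-1] = final_reward  # value of the final-reward formula, supplied by the caller
    return rewards

# Toy example with made-up 4-dimensional sentence vectors.
dim = 4
clean_sentences = [np.ones(dim), 3 * np.ones(dim)]  # sentences kept as correctly labeled
noisy_sentences = []                                # no sentence marked as noise yet
current_sentence = np.full(dim, 0.5)
entity_pair = np.zeros(2 * dim)                     # e.g. two entity vectors side by side

state = build_state(
    mean_or_zero(clean_sentences, dim),
    mean_or_zero(noisy_sentences, dim),
    current_sentence,
    entity_pair,
)
rewards = episode_rewards(num_actions=3, final_reward=-0.7)
```

Because the reward arrives only at the end of a bag, the per-step reward vector is zero everywhere except its last entry, which is what makes the expected total reward in step S75 equal to the expected final reward.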
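Steps S76 to S78 can be sketched in NumPy as follows, under stated assumptions rather than as the patented implementation. Positions are measured in tokens instead of characters and are appended as raw signed offsets, the word vectors are random stand-ins for word2vec output, the feature map is cut into three segments at the two entity positions, and a linear layer (`M`, `b`) is assumed in front of the softmax. All names, sizes, and weights are hypothetical.

```python
import numpy as np

def relative_positions(num_tokens, entity_index):
    """Step S76 (simplified): signed token distance of every token to one entity."""
    return np.arange(num_tokens) - entity_index

def convolve(Q, W, m):
    """Step S77: C[i, j] = w_i applied to rows j-m+1..j of Q (zero rows before row 0)."""
    T, d = Q.shape
    padded = np.vstack([np.zeros((m - 1, d)), Q])
    C = np.empty((W.shape[0], T))
    for j in range(T):
        window = padded[j:j + m].ravel()  # rows j-m+1..j of Q, flattened
        C[:, j] = W @ window              # one value per convolution kernel
    return C

def piecewise_max_pool(C, e1, e2):
    """Max-pool each kernel's outputs separately over the segments cut at e1 and e2."""
    segments = [C[:, :e1 + 1], C[:, e1 + 1:e2 + 1], C[:, e2 + 1:]]
    pooled = [s.max(axis=1) if s.shape[1] else np.zeros(C.shape[0]) for s in segments]
    return np.concatenate(pooled)

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, d, m, n = 6, 4, 3, 5        # tokens, word-vector size, kernel length, kernel count
e1, e2 = 1, 4                  # token indices of the two entities
words = rng.normal(size=(T, d))  # stand-in for word2vec vectors
Q = np.hstack([
    words,
    relative_positions(T, e1)[:, None],
    relative_positions(T, e2)[:, None],
])                             # input matrix: word vectors plus two position columns
W = rng.normal(size=(n, m * Q.shape[1]))
C = convolve(Q, W, m)
pooled = piecewise_max_pool(C, e1, e2)

# Step S78: seven classes = six defined relations plus "no relation".
M, b = rng.normal(size=(7, pooled.size)), np.zeros(7)
probs = softmax(M @ pooled + b)
predicted = int(np.argmax(probs))
```

Pooling per segment rather than over the whole sentence keeps one maximum for the text before, between, and after the two entities, so the pooled vector has three entries per kernel.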
CN202111570758.5A 2021-12-21 2021-12-21 Satellite frequency-orbit data information extraction method based on enhanced sample model Active CN114238524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111570758.5A CN114238524B (en) 2021-12-21 2021-12-21 Satellite frequency-orbit data information extraction method based on enhanced sample model

Publications (2)

Publication Number Publication Date
CN114238524A true CN114238524A (en) 2022-03-25
CN114238524B CN114238524B (en) 2022-05-31

Family

ID=80760213

Country Status (1)

Country Link
CN (1) CN114238524B (en)

Also Published As

Publication number Publication date
CN114238524B (en) 2022-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant