CN115577713A - Text processing method based on knowledge graph - Google Patents
- Publication number
- CN115577713A; CN202211565438.5A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The application relates to the technical field of electric digital data processing, in particular to a text processing method based on a knowledge graph. The method comprises the following steps: S100, acquiring an entity set $\{A, B\}$ in a target text; S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text; S300, if $TAA \ne \varnothing$, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; S400, if $TBB \ne \varnothing$, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text. The method improves the accuracy of obtaining the semantic relation of the target text.
Description
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a text processing method based on a knowledge graph.
Background
Existing entity recognition methods can automatically identify the entities in a sentence, and existing relation extraction methods can automatically identify semantic relations between those entities. For example, for the sentence "the black mobile phone 14 is a classic model", an entity recognition method can identify three entities: black, mobile phone 14, and classic model. A relation extraction method can identify the following semantic relations in the sentence: black is the color of mobile phone 14, and mobile phone 14 is a classic model. However, the sentence actually means that the black mobile phone 14 is a classic model; whether mobile phones 14 of other colors are classic models is unknown. Therefore, when the structure of a sentence is complex, the semantic relation obtained with existing relation extraction methods may be inaccurate. How to improve the accuracy of obtaining the semantic relation of a sentence is a problem to be solved.
Disclosure of Invention
The invention aims to provide a text processing method based on a knowledge graph so as to improve the accuracy of obtaining the semantic relation of a target text.
According to the invention, the text processing method based on the knowledge graph is provided, and comprises the following steps:
S100, acquiring an entity set $\{A, B\}$ in the target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \ge 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \ge 1$; the target text is a sentence comprising a subject, a predicate and an object.
S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text, wherein:
$TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \ne \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$.
$TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \ne \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$.
$TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$.
S300, if $TAA \ne \varnothing$, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; $IDAA = (IDAA_1, IDAA_2, \ldots, IDAA_i, \ldots, IDAA_{QA})$, where $IDAA_i$ is the number of $TAA_i$ and is used to uniquely identify $TAA_i$.
S400, if $TBB \ne \varnothing$, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; $IDBB = (IDBB_1, IDBB_2, \ldots, IDBB_j, \ldots, IDBB_{QB})$, where $IDBB_j$ is the number of $TBB_j$ and is used to uniquely identify $TBB_j$.
S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text: when $TAA \ne \varnothing$, $X_1 = ZA$; when $TAA = \varnothing$ and $N = 1$, $X_1 = A_1$; when $TBB \ne \varnothing$, $X_2 = ZB$; when $TBB = \varnothing$ and $M = 1$, $X_2 = B_1$.
Compared with the prior art, the text processing method based on a knowledge graph provided by the invention achieves technical progress and practicability, has industrial utility, and has at least the following beneficial effects:
The method acquires all entities in the target text and the triple set of the target text. The triple set may contain triples whose two entities are both in the subject and triples whose two entities are both in the object. For such a triple, its number is used as an entity for constructing further triples, and the number stands for the whole triple. A complex syntactic structure in the target text can therefore be represented accurately with triples. This addresses the problem that, in the prior art, the semantic relation obtained by a relation extraction method for a sentence with a complex syntactic structure may be inaccurate, and it improves the accuracy of obtaining the semantic relation of the target text.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for processing a knowledge-graph-based text according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
According to the invention, a text processing method based on a knowledge graph is provided, which comprises the following steps:
S100, acquiring an entity set $\{A, B\}$ in a target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \ge 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \ge 1$; the target text is a sentence comprising a subject, a predicate and an object.
Optionally, the entities in the target text are identified using a machine-learning-based recognition algorithm, for example an LSTM + CRF model. Those skilled in the art will appreciate that any entity recognition method in the prior art falls within the scope of the present invention.
It should be noted that the position of an entity in the entity set is determined by its order of appearance in the target text: the earlier an entity appears in the target text, the earlier its position in the entity set.
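The ordering rule of S100 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Mention` type, its `start` offsets and its `role` field are hypothetical, and the role is assumed to come from a separate syntactic parser rather than from anything specified in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str    # surface form of the entity
    start: int   # character offset in the target text (hypothetical values below)
    role: str    # "subject" or "object", assumed to come from a syntactic parser

def entity_set(mentions):
    """Return (A, B): subject and object entities in order of appearance."""
    ordered = sorted(mentions, key=lambda m: m.start)
    A = [m.text for m in ordered if m.role == "subject"]
    B = [m.text for m in ordered if m.role == "object"]
    return A, B

# Mentions arrive in arbitrary order; the offsets are made up for illustration.
mentions = [
    Mention("limited edition", 40, "object"),
    Mention("mobile phone", 11, "subject"),
    Mention("Joe", 27, "object"),
    Mention("purple", 4, "subject"),
]
A, B = entity_set(mentions)
print(A, B)
```

Sorting by offset before splitting by role keeps the "earlier in the text, earlier in the set" property for both A and B.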
S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text, wherein: $TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \ne \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$.
According to the invention, $TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \ne \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$.
According to the invention, $TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$.
It will be appreciated that existing relationship extraction methods may automatically identify certain semantic relationships between entities. Optionally, the relationship between the two entities in the target text is obtained by using the existing relationship extraction method based on the neural network. Those skilled in the art will appreciate that any method of extracting relationships in the prior art falls within the scope of the present invention.
It is understood that existing syntactic analysis methods can identify the syntactic structure of a sentence, such as its subject and object. Optionally, the present invention uses an existing syntactic analysis tool to identify whether an entity in the target text is in the subject or in the object.
According to the invention, the relation extraction method gives the relations between entities in the target text, and the syntactic analysis method gives whether each entity is located in the subject or the object of the target text. From these, the triple set TAA corresponding to the subject of the target text, the triple set TBB corresponding to the object of the target text, and the triple TAB, which links an entity in the subject with an entity in the object, can each be obtained. It should be understood that there may be no triple between entities in the subject, or there may be 1 or more such triples; likewise, there may be no triple between entities in the object, or there may be 1 or more such triples.
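A minimal Python sketch of the S200 partition follows. It assumes the relation extractor has already returned a flat list of triples and that A and B are the ordered entity lists from S100; the `Triple` type and the `partition_triples` helper are hypothetical names introduced for illustration, and entities that occur in both subject and object are not handled.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

def partition_triples(triples, subject_entities, object_entities):
    """Split extracted triples into TAA, TBB and TAB."""
    subj, obj = set(subject_entities), set(object_entities)
    # TAA: both entities lie in the subject; TBB: both lie in the object.
    taa = [t for t in triples if t.head in subj and t.tail in subj]
    tbb = [t for t in triples if t.head in obj and t.tail in obj]
    # TAB links A_N (last subject entity) with B_M (last object entity).
    a_n, b_m = subject_entities[-1], object_entities[-1]
    tab = next((t for t in triples if t.head == a_n and t.tail == b_m), None)
    return taa, tbb, tab

A = ["purple", "mobile phone"]
B = ["Joe", "limited edition"]
extracted = [
    Triple("mobile phone", "color", "purple"),
    Triple("Joe", "specified", "limited edition"),
    Triple("mobile phone", "is", "limited edition"),
]
TAA, TBB, TAB = partition_triples(extracted, A, B)
print(TAA, TBB, TAB)
```

Either list may come back empty, which corresponds to the cases $TAA = \varnothing$ and $TBB = \varnothing$.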
S300, if $TAA \ne \varnothing$, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; $IDAA = (IDAA_1, IDAA_2, \ldots, IDAA_i, \ldots, IDAA_{QA})$, where $IDAA_i$ is the number of $TAA_i$ and is used to uniquely identify $TAA_i$.
According to the invention, if the entities in A are all in TAA, $ZA = ZA_1$, where $ZA_1$ is the first composite entity corresponding to the subject of the target text; when $QA = 1$, $ZA_1$ is $IDAA_1$; when $QA \ge 2$, $ZA_1$ is $IDSA_{1,QA}$; $IDSA_{1,QA}$ is obtained by the following steps:
S310, obtaining the number $IDSA_{1,2}$ of the triple $(IDAA_1, E_0, IDAA_2)$, the number being used to uniquely identify $(IDAA_1, E_0, IDAA_2)$; $E_0$ is a first preset relationship, $IDAA_1$ is the number of $TAA_1$, $TAA_1$ is the 1st triple corresponding to the subject of the target text, $IDAA_2$ is the number of $TAA_2$, and $TAA_2$ is the 2nd triple corresponding to the subject of the target text; if $QA = 2$, $IDSA_{1,QA}$ is $IDSA_{1,2}$; if $QA > 2$, proceed to S320.
According to the present invention, if the two entities of a triple are both in the subject of the target text, the triple is given a number, and the number can be used to refer to the triple. An entity is constructed from the number, and this entity can participate in subsequent triple construction, so that the finally constructed knowledge graph accurately represents the semantics of the target text.
According to the invention, the first preset relationship is used for indicating that the two corresponding entities are in a common subject relationship, and the two entities are both subjects.
S320, obtaining the number $IDSA_{1,QA}$ of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, the number being used to uniquely identify $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$; $IDAA_{QA}$ is the number of $TAA_{QA}$, and $TAA_{QA}$ is the QA-th triple corresponding to the subject of the target text.
It should be understood that after $IDSA_{1,2}$ is obtained, $IDSA_{1,3}, \ldots, IDSA_{1,QA}$ can be obtained in sequence, where $IDSA_{1,3}$ is the number of the triple $(IDSA_{1,2}, E_0, IDAA_3)$, $IDSA_{1,QA}$ is the number of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, $IDAA_3$ is the number of $TAA_3$, $TAA_3$ is the 3rd triple corresponding to the subject of the target text, $IDSA_{1,QA-1}$ is the number of the triple $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$ obtained immediately before $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, $IDSA_{1,QA-2}$ is the number of the triple $(IDSA_{1,QA-3}, E_0, IDAA_{QA-2})$, and so on.
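Steps S310 and S320 amount to a left fold over the triple numbers. The sketch below illustrates this under assumptions: the `Numbering` helper, the `"ID…"` string format and the `"co-subject"` label standing in for $E_0$ are all hypothetical, as are the three example triples; the text only requires that each number uniquely identify its triple.

```python
from itertools import count

class Numbering:
    """Assigns a unique number to each distinct triple (hypothetical helper)."""
    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def number_of(self, triple):
        if triple not in self._ids:
            self._ids[triple] = f"ID{next(self._next)}"
        return self._ids[triple]

E0 = "co-subject"  # stand-in label for the first preset relationship E_0

def fold_numbers(ids, relation, numbering):
    """S310 pairs the first two numbers; S320 pairs the running result with the next."""
    acc = ids[0]
    for nxt in ids[1:]:
        acc = numbering.number_of((acc, relation, nxt))
    return acc

numbering = Numbering()
# Three illustrative subject triples (QA = 3); the contents are made up.
taa = [("phone", "color", "purple"), ("phone", "year", "2014"), ("phone", "brand", "Acme")]
idaa = [numbering.number_of(t) for t in taa]   # IDAA_1 .. IDAA_3
idsa = fold_numbers(idaa, E0, numbering)       # IDSA_{1,3}
print(idaa, idsa)
```

With a single number the loop body never runs, which matches the case $QA = 1$ where $ZA_1$ is simply $IDAA_1$.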
According to the invention, if there are entities in A that are not in TAA and the number PA of entities in A that are not in TAA satisfies $PA \ge 2$, then ZA is the number of the triple $(ZA_1, E_0, IDN_{PA})$; $IDN_{PA}$ is obtained by the following steps:
S311, obtaining the number $IDN_2$ of the triple $(NA_1, E_0, NA_2)$, the number being used to uniquely identify $(NA_1, E_0, NA_2)$; $NA_1$ is the 1st entity in A that is not in TAA, and $NA_2$ is the 2nd entity in A that is not in TAA; if $PA = 2$, $IDN_{PA}$ is $IDN_2$; if $PA > 2$, proceed to S312.
S312, obtaining the number $IDN_{PA}$ of the triple $(IDN_{PA-1}, E_0, NA_{PA})$, the number being used to uniquely identify $(IDN_{PA-1}, E_0, NA_{PA})$; $NA_{PA}$ is the PA-th entity in A that is not in TAA; $IDN_{PA-1}$ is the number of the triple obtained immediately before $(IDN_{PA-1}, E_0, NA_{PA})$.
It should be understood that after $IDN_2$ is obtained, $IDN_3, \ldots, IDN_{PA}$ can be obtained in sequence, where $IDN_3$ is the number of the triple $(IDN_2, E_0, NA_3)$, $IDN_{PA}$ is the number of the triple $(IDN_{PA-1}, E_0, NA_{PA})$, $NA_3$ is the 3rd entity in A that is not in TAA, $IDN_{PA-1}$ is the number of the triple $(IDN_{PA-2}, E_0, NA_{PA-1})$ obtained immediately before $(IDN_{PA-1}, E_0, NA_{PA})$, $IDN_{PA-2}$ is the number of the triple $(IDN_{PA-3}, E_0, NA_{PA-2})$, and so on.
According to the invention, if there is an entity in A that is not in TAA and $PA = 1$, ZA is the number of the triple $(ZA_1, E_0, NA_1)$.
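Putting the three cases of S300 together (all entities covered by TAA, exactly one leftover entity, two or more leftover entities), ZA can be sketched as below. The helper names, the `"ID…"` number format and the `"co-subject"` label for $E_0$ are assumptions made for illustration, and the function expects a non-empty triple list.

```python
def make_numbering():
    """Return a function that maps each distinct triple to a unique number."""
    ids = {}
    def number_of(triple):
        return ids.setdefault(triple, f"ID{len(ids) + 1}")
    return number_of

E0 = "co-subject"  # stand-in label for the first preset relationship E_0

def fold(items, relation, number_of):
    """Chain items pairwise from left to right; a single item is returned as is."""
    acc = items[0]
    for nxt in items[1:]:
        acc = number_of((acc, relation, nxt))
    return acc

def composite_entity(entities, triples, relation, number_of):
    """Compute ZA for a non-empty TAA (the same shape gives ZB with E_1)."""
    za1 = fold([number_of(t) for t in triples], relation, number_of)   # ZA_1
    covered = {e for head, _, tail in triples for e in (head, tail)}
    leftover = [e for e in entities if e not in covered]               # NA_1 .. NA_PA
    if not leftover:
        return za1
    # PA = 1 pairs ZA_1 with NA_1; PA >= 2 pairs ZA_1 with IDN_PA.
    return number_of((za1, relation, fold(leftover, relation, number_of)))

number_of = make_numbering()
A = ["2014", "purple", "mobile phone"]
TAA = [("mobile phone", "color", "purple")]
ZA = composite_entity(A, TAA, E0, number_of)
print(ZA)
```

Here "2014" is the only entity not covered by TAA, so ZA is the number of $(ZA_1, E_0, \text{2014})$. The number differs from the one in the second embodiment only because this sketch does not number an object triple first.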
S400, if $TBB \ne \varnothing$, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; $IDBB = (IDBB_1, IDBB_2, \ldots, IDBB_j, \ldots, IDBB_{QB})$, where $IDBB_j$ is the number of $TBB_j$ and is used to uniquely identify $TBB_j$.
According to the invention, if the entities in B are all in TBB, $ZB = ZB_1$, where $ZB_1$ is the first composite entity corresponding to the object of the target text; when $QB = 1$, $ZB_1$ is $IDBB_1$; when $QB \ge 2$, $ZB_1$ is $IDSB_{1,QB}$; $IDSB_{1,QB}$ is obtained by the following steps:
S410, obtaining the number $IDSB_{1,2}$ of the triple $(IDBB_1, E_1, IDBB_2)$, the number being used to uniquely identify $(IDBB_1, E_1, IDBB_2)$; $E_1$ is a second preset relationship, $IDBB_1$ is the number of $TBB_1$, $TBB_1$ is the 1st triple corresponding to the object of the target text, $IDBB_2$ is the number of $TBB_2$, and $TBB_2$ is the 2nd triple corresponding to the object of the target text; if $QB = 2$, $IDSB_{1,QB}$ is $IDSB_{1,2}$; if $QB > 2$, proceed to S420.
According to the present invention, if the two entities of a triple are both in the object of the target text, the triple is given a number, and the number can be used to refer to the triple. An entity is constructed from the number, and this entity can participate in subsequent triple construction, so that the finally constructed knowledge graph accurately represents the semantics of the target text.
According to the invention, the second preset relationship is used for indicating that the two corresponding entities are in a common object relationship, and indicating that the two entities are both objects.
S420, obtaining the number $IDSB_{1,QB}$ of the triple $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, the number being used to uniquely identify $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$; $IDSB_{1,QB-1}$ is the number of the triple obtained immediately before $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, $IDBB_{QB}$ is the number of $TBB_{QB}$, and $TBB_{QB}$ is the QB-th triple corresponding to the object of the target text.
It should be understood that after $IDSB_{1,2}$ is obtained, $IDSB_{1,3}, \ldots, IDSB_{1,QB}$ can be obtained in sequence, where $IDSB_{1,3}$ is the number of the triple $(IDSB_{1,2}, E_1, IDBB_3)$, $IDSB_{1,QB}$ is the number of the triple $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, $IDBB_3$ is the number of $TBB_3$, $TBB_3$ is the 3rd triple corresponding to the object of the target text, $IDSB_{1,QB-1}$ is the number of the triple $(IDSB_{1,QB-2}, E_1, IDBB_{QB-1})$ obtained immediately before $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, $IDSB_{1,QB-2}$ is the number of the triple $(IDSB_{1,QB-3}, E_1, IDBB_{QB-2})$, and so on.
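The chain described in S410 and S420 can be written compactly as a recurrence. The operator $\operatorname{num}(\cdot)$ is notation introduced here only for illustration; it denotes the number that uniquely identifies the triple given as its argument.

```latex
\begin{aligned}
IDSB_{1,2} &= \operatorname{num}(IDBB_1,\ E_1,\ IDBB_2),\\
IDSB_{1,k} &= \operatorname{num}(IDSB_{1,k-1},\ E_1,\ IDBB_k), \qquad 3 \le k \le QB,\\
ZB_1 &= \begin{cases} IDBB_1, & QB = 1,\\ IDSB_{1,QB}, & QB \ge 2. \end{cases}
\end{aligned}
```

The subject-side quantities $IDSA_{1,k}$ follow the same recurrence with $E_0$, $IDAA_k$ and QA in place of $E_1$, $IDBB_k$ and QB.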
According to the invention, if there are entities in B that are not in TBB and the number PB of entities in B that are not in TBB satisfies $PB \ge 2$, then ZB is the number of the triple $(ZB_1, E_1, IDNB_{PB})$; $IDNB_{PB}$ is obtained by the following steps:
S411, obtaining the number $IDNB_2$ of the triple $(NB_1, E_1, NB_2)$, the number being used to uniquely identify $(NB_1, E_1, NB_2)$; $NB_1$ is the 1st entity in B that is not in TBB, and $NB_2$ is the 2nd entity in B that is not in TBB; if $PB = 2$, $IDNB_{PB}$ is $IDNB_2$; if $PB > 2$, proceed to S412.
S412, obtaining the number $IDNB_{PB}$ of the triple $(IDNB_{PB-1}, E_1, NB_{PB})$, the number being used to uniquely identify $(IDNB_{PB-1}, E_1, NB_{PB})$; $NB_{PB}$ is the PB-th entity in B that is not in TBB; $IDNB_{PB-1}$ is the number of the triple obtained immediately before $(IDNB_{PB-1}, E_1, NB_{PB})$.
It should be understood that after $IDNB_2$ is obtained, $IDNB_3, \ldots, IDNB_{PB}$ can be obtained in sequence, where $IDNB_3$ is the number of the triple $(IDNB_2, E_1, NB_3)$, $IDNB_{PB}$ is the number of the triple $(IDNB_{PB-1}, E_1, NB_{PB})$, $NB_3$ is the 3rd entity in B that is not in TBB, $IDNB_{PB-1}$ is the number of the triple $(IDNB_{PB-2}, E_1, NB_{PB-1})$ obtained immediately before $(IDNB_{PB-1}, E_1, NB_{PB})$, $IDNB_{PB-2}$ is the number of the triple $(IDNB_{PB-3}, E_1, NB_{PB-2})$, and so on.
According to the invention, if there is an entity in B that is not in TBB and $PB = 1$, ZB is the number of the triple $(ZB_1, E_1, NB_1)$.
S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text: when $TAA \ne \varnothing$, $X_1 = ZA$; when $TAA = \varnothing$ and $N = 1$, $X_1 = A_1$; when $TBB \ne \varnothing$, $X_2 = ZB$; when $TBB = \varnothing$ and $M = 1$, $X_2 = B_1$.
According to the invention, when $TAA = \varnothing$ and $N \ge 2$, $X_1 = ID_{A,N}$; $ID_{A,N}$ is obtained by the following steps:
S510, obtaining the number $ID_{A,2}$ of the triple $(A_1, E_0, A_2)$, the number being used to uniquely identify $(A_1, E_0, A_2)$; $A_1$ is the 1st entity in A and $A_2$ is the 2nd entity in A; if $N = 2$, $ID_{A,N}$ is $ID_{A,2}$; if $N > 2$, proceed to S520.
S520, obtaining the number $ID_{A,N}$ of the triple $(ID_{A,N-1}, E_0, A_N)$, the number being used to uniquely identify $(ID_{A,N-1}, E_0, A_N)$; $A_N$ is the N-th entity in A; $ID_{A,N-1}$ is the number of the triple obtained immediately before $(ID_{A,N-1}, E_0, A_N)$.
It should be understood that after $ID_{A,2}$ is obtained, $ID_{A,3}, \ldots, ID_{A,N}$ can be obtained in sequence, where $ID_{A,3}$ is the number of the triple $(ID_{A,2}, E_0, A_3)$, $ID_{A,N}$ is the number of the triple $(ID_{A,N-1}, E_0, A_N)$, $A_3$ is the 3rd entity in A, $ID_{A,N-1}$ is the number of the triple $(ID_{A,N-2}, E_0, A_{N-1})$ obtained immediately before $(ID_{A,N-1}, E_0, A_N)$, $ID_{A,N-2}$ is the number of the triple $(ID_{A,N-3}, E_0, A_{N-2})$, and so on.
According to the invention, when $TBB = \varnothing$ and $M \ge 2$, $X_2 = ID_{B,M}$; $ID_{B,M}$ is obtained by the following steps:
S511, obtaining the number $ID_{B,2}$ of the triple $(B_1, E_1, B_2)$, the number being used to uniquely identify $(B_1, E_1, B_2)$; $B_1$ is the 1st entity in B and $B_2$ is the 2nd entity in B; if $M = 2$, $ID_{B,M}$ is $ID_{B,2}$; if $M > 2$, proceed to S521.
S521, obtaining the number $ID_{B,M}$ of the triple $(ID_{B,M-1}, E_1, B_M)$, the number being used to uniquely identify $(ID_{B,M-1}, E_1, B_M)$; $B_M$ is the M-th entity in B; $ID_{B,M-1}$ is the number of the triple obtained immediately before $(ID_{B,M-1}, E_1, B_M)$.
It should be understood that after $ID_{B,2}$ is obtained, $ID_{B,3}, \ldots, ID_{B,M}$ can be obtained in sequence, where $ID_{B,3}$ is the number of the triple $(ID_{B,2}, E_1, B_3)$, $ID_{B,M}$ is the number of the triple $(ID_{B,M-1}, E_1, B_M)$, $B_3$ is the 3rd entity in B, $ID_{B,M-1}$ is the number of the triple $(ID_{B,M-2}, E_1, B_{M-1})$ obtained immediately before $(ID_{B,M-1}, E_1, B_M)$, $ID_{B,M-2}$ is the number of the triple $(ID_{B,M-3}, E_1, B_{M-2})$, and so on.
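The case analysis of S500 can be sketched as follows. The helper names, the `"ID…"` number format and the labels `"co-subject"` and `"co-object"` standing in for $E_0$ and $E_1$ are assumptions; the example sentence with the entities "Tom", "Jerry" and "cartoon characters" is invented for illustration and does not come from the embodiments.

```python
def make_numbering():
    """Return a function that maps each distinct triple to a unique number."""
    ids = {}
    def number_of(triple):
        return ids.setdefault(triple, f"ID{len(ids) + 1}")
    return number_of

E0, E1 = "co-subject", "co-object"  # stand-ins for the preset relationships

def fold(items, relation, number_of):
    """S510/S520 (or S511/S521): chain entities pairwise; one entity is returned as is."""
    acc = items[0]
    for nxt in items[1:]:
        acc = number_of((acc, relation, nxt))
    return acc

def target_triple(A, B, TAA, TBB, EAB, number_of, ZA=None, ZB=None):
    """Pick X_1 and X_2 according to whether TAA and TBB are empty."""
    x1 = ZA if TAA else fold(A, E0, number_of)   # A_1 when N = 1, ID_{A,N} otherwise
    x2 = ZB if TBB else fold(B, E1, number_of)   # B_1 when M = 1, ID_{B,M} otherwise
    return (x1, EAB, x2)

number_of = make_numbering()
# Two subject entities with no triple between them (TAA empty, N = 2).
T = target_triple(["Tom", "Jerry"], ["cartoon characters"], [], [], "are", number_of)
print(T)
```

Because `fold` returns a lone entity unchanged, the cases $N = 1$ and $N \ge 2$ share one code path.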
As a first specific embodiment, the target text is: "the purple mobile phone is the limited edition specified by Joe", where "the purple mobile phone" is the subject of the target text and "the limited edition specified by Joe" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model give: TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, specified, limited edition)}, QB = 1, TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1 and ZA is also number 1. In S400, if the number of the triple (Joe, specified, limited edition) is 2, then ZB is number 2. In S500, T = (number 1, is, number 2) is obtained; it should be understood that number 1 refers to the purple mobile phone and number 2 refers to the limited edition specified by Joe.
As a second specific embodiment, the target text is: "the purple mobile phone of 2014 is the limited edition specified by Joe", where "the purple mobile phone of 2014" is the subject of the target text and "the limited edition specified by Joe" is the object of the target text. In S100, A = (2014, purple, mobile phone), N = 3, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model give: TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, specified, limited edition)}, QB = 1, TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1, and ZA is the number of the triple $(ZA_1, E_0, \text{2014})$; if the number of the triple $(ZA_1, E_0, \text{2014})$ is 3, then ZA is number 3. In S400, if the number of the triple (Joe, specified, limited edition) is 2, then ZB is number 2. In S500, T = (number 3, is, number 2) is obtained; it should be understood that number 3 refers to the purple mobile phone of 2014 and number 2 refers to the limited edition specified by Joe.
As a third specific embodiment, the target text is: "the purple mobile phone is a limited edition", where "the purple mobile phone" is the subject of the target text and "a limited edition" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (limited edition), M = 1 are obtained. In S200, a syntactic analysis model and a relation extraction model give: TAA = {(mobile phone, color, purple)}, QA = 1, $TBB = \varnothing$. In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1 and ZA is also number 1. In S400, since $TBB = \varnothing$, no composite entity is constructed for the object, and the object is represented by the entity "limited edition". In S500, T = (number 1, is, limited edition) is obtained; it should be understood that number 1 refers to the purple mobile phone.
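The second embodiment can be traced in a few lines of Python. This is a sketch that hard-codes the embodiment's inputs rather than a general implementation: integer numbers are assigned in order of first use, and `"co-subject"` is a stand-in label for $E_0$; both choices are assumptions.

```python
numbers = {}

def number_of(triple):
    """Assign integers 1, 2, 3, ... to distinct triples in order of first use."""
    return numbers.setdefault(triple, len(numbers) + 1)

E0 = "co-subject"  # stand-in label for the first preset relationship E_0

A = ["2014", "purple", "mobile phone"]
B = ["Joe", "limited edition"]
TAA = [("mobile phone", "color", "purple")]
TBB = [("Joe", "specified", "limited edition")]
TAB = ("mobile phone", "is", "limited edition")

za1 = number_of(TAA[0])    # number 1
zb = number_of(TBB[0])     # number 2; every entity of B is in TBB, so ZB = ZB_1
covered = {TAA[0][0], TAA[0][2]}
leftover = [e for e in A if e not in covered]   # entities of A not in TAA
za = number_of((za1, E0, leftover[0]))          # number 3, for (ZA_1, E_0, 2014)
T = (za, TAB[1], zb)
print(T)
```

The resulting target triple corresponds to T = (number 3, is, number 2) in the embodiment.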
Although some specific embodiments of the present invention have been described in detail by way of illustration, it should be understood by those skilled in the art that the above illustration is only for the purpose of illustration and is not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (9)
1. A text processing method based on knowledge graph is characterized by comprising the following steps:
S100, acquiring an entity set $\{A, B\}$ in the target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \ge 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \ge 1$; the target text is a sentence comprising a subject, a predicate and an object;
S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text, wherein:
$TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \ne \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$;
$TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \ne \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$;
$TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$;
S300, if $TAA \ne \varnothing$, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; $IDAA = (IDAA_1, IDAA_2, \ldots, IDAA_i, \ldots, IDAA_{QA})$, where $IDAA_i$ is the number of $TAA_i$ and is used to uniquely identify $TAA_i$;
S400, if $TBB \ne \varnothing$, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; $IDBB = (IDBB_1, IDBB_2, \ldots, IDBB_j, \ldots, IDBB_{QB})$, where $IDBB_j$ is the number of $TBB_j$ and is used to uniquely identify $TBB_j$;
S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text: when $TAA \ne \varnothing$, $X_1 = ZA$; when $TAA = \varnothing$ and $N = 1$, $X_1 = A_1$; when $TBB \ne \varnothing$, $X_2 = ZB$; when $TBB = \varnothing$ and $M = 1$, $X_2 = B_1$.
2. The method according to claim 1, wherein in S300, acquiring the composite entity ZA corresponding to the subject of the target text according to IDAA comprises: if the entities in A are all in TAA, $ZA = ZA_1$, where $ZA_1$ is the first composite entity corresponding to the subject of the target text; when $QA = 1$, $ZA_1$ is $IDAA_1$; when $QA \ge 2$, $ZA_1$ is $IDSA_{1,QA}$; $IDSA_{1,QA}$ is obtained by the following steps:
S310, obtaining the number $IDSA_{1,2}$ of the triple $(IDAA_1, E_0, IDAA_2)$, the number being used to uniquely identify $(IDAA_1, E_0, IDAA_2)$, where $E_0$ is a first preset relationship; if $QA = 2$, $IDSA_{1,QA}$ is $IDSA_{1,2}$; if $QA > 2$, proceeding to S320;
S320, obtaining the number $IDSA_{1,QA}$ of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, the number being used to uniquely identify $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$; $IDSA_{1,QA-1}$ is the number of the triple $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$ and is used to uniquely identify $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$; $IDSA_{1,QA-2}$ is the number of a triple obtained by the same method as $IDSA_{1,QA-1}$.
3. The method according to claim 2, wherein in S300, the obtaining a composition entity ZA corresponding to a subject of the target text according to the IDAA includes: if the entity in A is not in TAA and the number PA of the entity in A which is not in TAA is greater than or equal to 2, then ZA is a triple (ZA) 1 ,E 0 ,IDN PA ) Is used for unique identification (ZA) 1 ,E 0 ,IDN PA ),IDN PA The obtaining method comprises the following steps:
s311, obtaining the triple (NA) 1 ,E 0 ,NA 2 ) Number IDN of 2 The number being used for a unique identification (NA) 1 ,E 0 ,NA 2 ),NA 1 Is the 1 st entity in A not in TAA, NA 2 Is the 2 nd entity in a not in TAA, if PA =2,idn PA Is IDN 2 (ii) a If PA>2, entering S312;
s312, obtaining the triple (IDN) PA-1 ,E 0 ,NA PA ) Number IDN of PA The number is used for unique Identification (IDN) PA-1 ,E 0 ,NA PA ),NA PA Is the PA-th entity in A that is not in TAA; IDN PA-1 Is (IDN) PA-2 ,E 0 ,NA PA-1 ) The number of (2), the number for a unique Identification (IDN) PA-2 ,E 0 ,NA PA-1 ),IDN PA-2 To adopt and acquire IDN PA-1 The same method obtains the number of the triplet.
4. The method according to claim 3, wherein in S300, obtaining the component entity ZA corresponding to the subject of the target text according to IDAA comprises: if there is an entity in A that is not in TAA and PA = 1, then ZA is the number of the triple (ZA_1, E_0, NA_1), the number being used to uniquely identify (ZA_1, E_0, NA_1).
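Claims 2 to 4 together cover three cases for ZA: every entity of A is in TAA, exactly one entity is missing from TAA, or two or more are missing. A short Python sketch of how the cases could be combined follows. It is an illustration under assumptions, not the claimed implementation: `make_numbering`, `fold`, `compose_za`, the `T1`, `T2`, ... number format, and the `"E0"` placeholder are hypothetical names introduced here.

```python
from functools import reduce

E0 = "E0"  # placeholder for the first predetermined relationship


def make_numbering():
    """Hypothetical numbering: one stable number per distinct triple."""
    table = {}

    def number_of(head, relation, tail):
        # setdefault keeps the first number ever assigned to this triple.
        return table.setdefault((head, relation, tail), f"T{len(table) + 1}")

    return number_of


def fold(number_of, items):
    # n_1 = items[0]; n_k = number_of(n_{k-1}, E0, items[k-1])
    return reduce(lambda acc, nxt: number_of(acc, E0, nxt), items)


def compose_za(number_of, idaa, not_in_taa):
    """idaa: IDAA_1..IDAA_QA; not_in_taa: NA_1..NA_PA (entities of A absent from TAA)."""
    if not idaa:
        raise ValueError("QA = 0 is outside claims 2 to 4")
    za1 = fold(number_of, idaa)         # claim 2: IDAA_1 or IDSA_{1,QA}
    if not not_in_taa:                  # every entity of A is in TAA
        return za1
    tail = fold(number_of, not_in_taa)  # NA_1 when PA = 1, IDN_PA when PA >= 2
    return number_of(za1, E0, tail)     # claims 3 and 4


number_of = make_numbering()
za = compose_za(number_of, ["IDAA1", "IDAA2"], ["NA1", "NA2", "NA3"])
```

In this example the subject side first collapses to one number for the two known entities, the three unknown entities collapse to a second number, and ZA is the number of the triple joining the two.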
5. The method of claim 2, wherein the first predetermined relationship is used to indicate that the two corresponding entities are in a common subject relationship.
6. The method of claim 1, wherein S500 further comprises: when TAA = ∅ and N ≥ 2, X_1 = ID_{A,N}, where ID_{A,N} is obtained as follows:
S510, obtaining the number ID_{A,2} of the triple (A_1, E_0, A_2), the number being used to uniquely identify (A_1, E_0, A_2); if N = 2, ID_{A,N} is ID_{A,2}; if N > 2, proceeding to S520;
S520, obtaining the number ID_{A,N} of the triple (ID_{A,N-1}, E_0, A_N), the number being used to uniquely identify (ID_{A,N-1}, E_0, A_N); ID_{A,N-1} is the number of the triple (ID_{A,N-2}, E_0, A_{N-1}) and uniquely identifies that triple, and ID_{A,N-2} is the number of a triple obtained in the same way as ID_{A,N-1}.
7. The method of claim 1, wherein in S400, obtaining the component entity ZB corresponding to the object of the target text according to IDBB comprises: if all entities in B are in TBB, then ZB = ZB_1, where ZB_1 is the first component entity corresponding to the object of the target text; when QB = 1, ZB_1 is IDBB_1; when QB ≥ 2, ZB_1 is IDSB_{1,QB}; IDSB_{1,QB} is obtained as follows:
S410, obtaining the number IDSB_{1,2} of the triple (IDBB_1, E_1, IDBB_2), the number being used to uniquely identify (IDBB_1, E_1, IDBB_2), where E_1 is a second predetermined relationship; if QB = 2, IDSB_{1,QB} is IDSB_{1,2}; if QB > 2, proceeding to S420;
S420, obtaining the number IDSB_{1,QB} of the triple (IDSB_{1,QB-1}, E_1, IDBB_{QB}), the number being used to uniquely identify (IDSB_{1,QB-1}, E_1, IDBB_{QB}); IDSB_{1,QB-1} is the number of the triple (IDSB_{1,QB-2}, E_1, IDBB_{QB-1}) and uniquely identifies that triple, and IDSB_{1,QB-2} is the number of a triple obtained in the same way as IDSB_{1,QB-1}.
8. The method of claim 7, wherein the second predetermined relationship is used to indicate that the two corresponding entities are in a common object relationship.
9. The method of claim 1, wherein S500 further comprises: when TBB = ∅ and M ≥ 2, X_2 = ID_{B,M}, where ID_{B,M} is obtained as follows:
S511, obtaining the number ID_{B,2} of the triple (B_1, E_1, B_2), the number being used to uniquely identify (B_1, E_1, B_2); if M = 2, ID_{B,M} is ID_{B,2}; if M > 2, proceeding to S521;
S521, obtaining the number ID_{B,M} of the triple (ID_{B,M-1}, E_1, B_M), the number being used to uniquely identify (ID_{B,M-1}, E_1, B_M); ID_{B,M-1} is the number of the triple (ID_{B,M-2}, E_1, B_{M-1}) and uniquely identifies that triple, and ID_{B,M-2} is the number of a triple obtained in the same way as ID_{B,M-1}.
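Claims 2, 3, 6, 7 and 9 all appear to apply one recurrence to different inputs. As a compact summary, writing num(h, r, t) for the unique number of the triple (h, r, t) (a notation introduced here for illustration, not taken from the claims), and x_1, ..., x_K for the input sequence with relation r:

```latex
\begin{aligned}
n_1 &= x_1,\\
n_k &= \operatorname{num}\!\left(n_{k-1},\, r,\, x_k\right), \qquad 2 \le k \le K.
\end{aligned}
```

Under this reading the final value n_K is IDSA_{1,QA} for x = IDAA, r = E_0, K = QA (claim 2); IDN_{PA} for x = NA, r = E_0, K = PA (claim 3); ID_{A,N} for x = A, r = E_0, K = N (claim 6); IDSB_{1,QB} for x = IDBB, r = E_1, K = QB (claim 7); and ID_{B,M} for x = B, r = E_1, K = M (claim 9). Claims 6 and 9 state only the K ≥ 2 case.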
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211565438.5A CN115577713B (en) | 2022-12-07 | 2022-12-07 | Text processing method based on knowledge graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211565438.5A CN115577713B (en) | 2022-12-07 | 2022-12-07 | Text processing method based on knowledge graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115577713A (en) | 2023-01-06 |
CN115577713B (en) | 2023-03-17 |
Family
ID=84590059
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211565438.5A (Active, CN115577713B (en)) | 2022-12-07 | 2022-12-07 | Text processing method based on knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115577713B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699663A (en) * | 2013-12-27 | 2014-04-02 | 中国科学院自动化研究所 | Hot event mining method based on large-scale knowledge base |
CN111639171A (en) * | 2020-06-08 | 2020-09-08 | 吉林大学 | Knowledge graph question-answering method and device |
WO2020233261A1 (en) * | 2019-07-12 | 2020-11-26 | 之江实验室 | Natural language generation-based knowledge graph understanding assistance system |
CN113407678A (en) * | 2021-06-30 | 2021-09-17 | 竹间智能科技(上海)有限公司 | Knowledge graph construction method, device and equipment |
US20220027766A1 (en) * | 2021-02-19 | 2022-01-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method for industry text increment and electronic device |
Non-Patent Citations (1)
Title |
---|
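Zheng Limin et al.: "Research on entity relation extraction for news texts of food safety events", 《农业机械学报》 (Transactions of the Chinese Society for Agricultural Machinery) * |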
Also Published As
Publication number | Publication date |
---|---|
CN115577713B (en) | 2023-03-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |