CN115577713A - Text processing method based on knowledge graph - Google Patents


Publication number
CN115577713A
Authority
CN
China
Prior art keywords
entity
taa
target text
tbb
idaa
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211565438.5A
Other languages
Chinese (zh)
Other versions
CN115577713B (en)
Inventor
张正义
刘羽
刘宸
傅晓航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Yuchen Technology Co Ltd
Original Assignee
Zhongke Yuchen Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Yuchen Technology Co Ltd
Priority to CN202211565438.5A (patent CN115577713B)
Publication of CN115577713A
Application granted
Publication of CN115577713B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of electric digital data processing, and in particular to a text processing method based on a knowledge graph. The method comprises the following steps: S100, acquiring an entity set {A, B} in a target text; S200, acquiring a triple set {TAA, TBB, TAB} of the target text; S300, if TAA ≠ ∅, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; S400, if TBB ≠ ∅, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; S500, obtaining a target triple T = (X_1, EAB, X_2) of the target text. The method improves the accuracy of obtaining the semantic relation of the target text.

Description

Text processing method based on knowledge graph
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a text processing method based on a knowledge graph.
Background
Existing entity recognition methods can automatically identify the entities in a sentence, and existing relation extraction methods can automatically identify semantic relations among those entities. For example, for the sentence "the black mobile phone 14 is classic", an entity recognition method can identify three entities (black, mobile phone 14, and classic), and a relation extraction method identifies the following semantic relations: black is the color of mobile phone 14, and mobile phone 14 is classic. However, the sentence only states that the black mobile phone 14 is classic; whether other mobile phones 14 are classic is unknown. Therefore, when the structure of a sentence is complex, the semantic relations obtained by existing relation extraction methods may be inaccurate. How to improve the accuracy of obtaining the semantic relation of a sentence is a problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a text processing method based on a knowledge graph so as to improve the accuracy of obtaining the semantic relation of a target text.
According to the invention, the text processing method based on the knowledge graph is provided, and comprises the following steps:
S100, acquiring an entity set {A, B} in the target text; A = (A_1, A_2, …, A_n, …, A_N), where A_n is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and N ≥ 1; B = (B_1, B_2, …, B_m, …, B_M), where B_m is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and M ≥ 1; the target text is a sentence comprising a subject, a predicate and an object.
S200, acquiring a triple set {TAA, TBB, TAB} of the target text, wherein:
TAA = ∅ or TAA = {TAA_1, TAA_2, …, TAA_i, …, TAA_QA} ≠ ∅, where TAA_i is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; TAA_i = (A_{i,1}, EA_i, A_{i,2}), where A_{i,1} is the first entity included in TAA_i, A_{i,2} is the second entity included in TAA_i, and EA_i is the relationship between entity A_{i,1} and entity A_{i,2}.
TBB = ∅ or TBB = {TBB_1, TBB_2, …, TBB_j, …, TBB_QB} ≠ ∅, where TBB_j is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; TBB_j = (B_{j,1}, EB_j, B_{j,2}), where B_{j,1} is the first entity included in TBB_j, B_{j,2} is the second entity included in TBB_j, and EB_j is the relationship between entity B_{j,1} and entity B_{j,2}.
TAB = (A_N, EAB, B_M), where A_N is the N-th entity in the subject of the target text, B_M is the M-th entity in the object of the target text, and EAB is the relationship between entity A_N and entity B_M.
S300, if TAA ≠ ∅, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; IDAA = (IDAA_1, IDAA_2, …, IDAA_i, …, IDAA_QA), where IDAA_i is the number of TAA_i and uniquely identifies TAA_i.
S400, if TBB ≠ ∅, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; IDBB = (IDBB_1, IDBB_2, …, IDBB_j, …, IDBB_QB), where IDBB_j is the number of TBB_j and uniquely identifies TBB_j.
S500, obtaining a target triple T = (X_1, EAB, X_2) of the target text; when TAA ≠ ∅, X_1 = ZA; when TAA = ∅ and N = 1, X_1 = A_1; when TBB ≠ ∅, X_2 = ZB; when TBB = ∅ and M = 1, X_2 = B_1.
Compared with the prior art, the method has obvious beneficial effects, and by means of the technical scheme, the method for processing the text based on the knowledge graph can achieve considerable technical progress and practicability, has wide industrial utilization value, and at least has the following beneficial effects:
The method acquires all entities in the target text and the full triple set of the target text. The triple set may contain triples whose two entities are both in the subject and triples whose two entities are both in the object. For each such triple, its number is used as an entity when constructing further triples, and the number stands for the whole triple. The complex syntactic structure of the target text can therefore be represented accurately with triples, which solves the prior-art problem that relation extraction may yield inaccurate semantic relations for sentences with a complex syntactic structure, and improves the accuracy of obtaining the semantic relation of the target text.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for processing a knowledge-graph-based text according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a text processing method based on a knowledge graph, which comprises the following steps:
S100, acquiring an entity set {A, B} in a target text; A = (A_1, A_2, …, A_n, …, A_N), where A_n is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and N ≥ 1; B = (B_1, B_2, …, B_m, …, B_M), where B_m is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and M ≥ 1; the target text is a sentence comprising a subject, a predicate and an object.
Optionally, the entity in the target text is identified by using a machine learning-based identification algorithm, for example, an entity in the target text is identified by using an LSTM + CRF-based model. Those skilled in the art will appreciate that any method of entity identification in the prior art falls within the scope of the present invention.
It should be noted that, the position of the entity in the entity set is determined according to the appearance sequence of the entity in the target text, and the earlier the entity appears in the target text, the earlier the position of the entity in the entity set is.
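The ordering rule can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes an upstream recognizer has already produced entity mentions with character offsets, and the `Mention` class, its `start` field, and the `ordered_entities` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str   # surface form of the entity
    start: int  # character offset in the target text (hypothetical field)


def ordered_entities(mentions, span):
    """Return the entities whose start offset lies in [lo, hi), earliest first."""
    lo, hi = span
    inside = [m for m in mentions if lo <= m.start < hi]
    return tuple(m.text for m in sorted(inside, key=lambda m: m.start))


# Toy target text; offsets and subject/object spans are assumed to come from
# entity recognition and syntactic analysis.
text = "purple phone is limited edition"
mentions = [Mention("phone", 7), Mention("limited edition", 16), Mention("purple", 0)]
A = ordered_entities(mentions, (0, 12))   # subject span "purple phone"
B = ordered_entities(mentions, (16, 31))  # object span "limited edition"
```

Here `A` plays the role of (A_1, …, A_N) and `B` the role of (B_1, …, B_M); tuples are used because the position of each entity matters in the later steps.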
S200, acquiring a triple set {TAA, TBB, TAB} of the target text, wherein: TAA = ∅ or TAA = {TAA_1, TAA_2, …, TAA_i, …, TAA_QA} ≠ ∅, where TAA_i is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; TAA_i = (A_{i,1}, EA_i, A_{i,2}), where A_{i,1} is the first entity included in TAA_i, A_{i,2} is the second entity included in TAA_i, and EA_i is the relationship between entity A_{i,1} and entity A_{i,2}.
According to the invention, TBB = ∅ or TBB = {TBB_1, TBB_2, …, TBB_j, …, TBB_QB} ≠ ∅, where TBB_j is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; TBB_j = (B_{j,1}, EB_j, B_{j,2}), where B_{j,1} is the first entity included in TBB_j, B_{j,2} is the second entity included in TBB_j, and EB_j is the relationship between entity B_{j,1} and entity B_{j,2}.
According to the invention, TAB = (A_N, EAB, B_M), where A_N is the N-th entity in the subject of the target text, B_M is the M-th entity in the object of the target text, and EAB is the relationship between entity A_N and entity B_M.
It will be appreciated that existing relationship extraction methods may automatically identify certain semantic relationships between entities. Optionally, the relationship between the two entities in the target text is obtained by using the existing relationship extraction method based on the neural network. Those skilled in the art will appreciate that any method of extracting relationships in the prior art falls within the scope of the present invention.
It is understood that existing syntactic analysis methods can identify the syntactic structure of a sentence, such as its subject and object. Optionally, the present invention uses an existing syntactic analysis tool to identify whether an entity in the target text is in the subject or in the object.
According to the method, the relation between the entities in the target text and whether the entities are located in the subject or the object of the target text can be obtained based on the relation extraction method and the syntactic analysis method, so that the triple set TAA corresponding to the subject of the target text, the triple set TBB corresponding to the object of the target text and the triple TAB corresponding to the target text and including the entities in the subject and the entities in the object can be respectively obtained. It should be understood that there may be no triples of entities in the subject, or there may be 1 or more triples of entities in the subject; there may be no triples of entities in the object, or there may be 1 or more triples of entities in the object.
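A minimal Python sketch of this partitioning step follows. It assumes relation extraction has already returned (head, relation, tail) tuples and that entity strings do not occur in both the subject and the object; `partition_triples` is a hypothetical helper, and an empty list stands in for ∅.

```python
def partition_triples(triples, A, B):
    """Split (head, relation, tail) triples into TAA, TBB and TAB.

    A and B are the ordered subject and object entity tuples from S100.
    An empty list plays the role of the empty set.
    """
    in_subject, in_object = set(A), set(B)
    # Triples whose two entities are both in the subject (or both in the object).
    TAA = [t for t in triples if t[0] in in_subject and t[2] in in_subject]
    TBB = [t for t in triples if t[0] in in_object and t[2] in in_object]
    # TAB relates the last subject entity A_N to the last object entity B_M.
    TAB = next((t for t in triples if t[0] == A[-1] and t[2] == B[-1]), None)
    return TAA, TBB, TAB


# Entities and relations of the first specific embodiment.
A = ("purple", "mobile phone")
B = ("Joe", "limited edition")
triples = [
    ("mobile phone", "color", "purple"),
    ("Joe", "designated", "limited edition"),
    ("mobile phone", "is", "limited edition"),
]
TAA, TBB, TAB = partition_triples(triples, A, B)
```

With these inputs QA = 1 and QB = 1, and `TAB` is the only triple that crosses from the subject to the object.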
S300, if TAA ≠ ∅, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; IDAA = (IDAA_1, IDAA_2, …, IDAA_i, …, IDAA_QA), where IDAA_i is the number of TAA_i and uniquely identifies TAA_i.
According to the invention, if all entities in A are in TAA, then ZA = ZA_1, where ZA_1 is the first composite entity corresponding to the subject of the target text; when QA = 1, ZA_1 is IDAA_1; when QA ≥ 2, ZA_1 is IDSA_{1,QA}. IDSA_{1,QA} is obtained as follows:
S310, obtaining the number IDSA_{1,2} of the triple (IDAA_1, E_0, IDAA_2), the number uniquely identifying (IDAA_1, E_0, IDAA_2); E_0 is a first preset relationship, IDAA_1 is the number of TAA_1, TAA_1 is the 1st triple corresponding to the subject of the target text, IDAA_2 is the number of TAA_2, and TAA_2 is the 2nd triple corresponding to the subject of the target text; if QA = 2, IDSA_{1,QA} is IDSA_{1,2}; if QA > 2, proceed to S320.
According to the present invention, if two entities corresponding to a certain triple are both in the subject of the target text, the triple is given a corresponding number, and the number can be used to refer to the triple. And an entity is constructed by the number, and the entity can participate in the subsequent triple construction, so that the finally constructed knowledge graph can accurately represent the semantics of the target text.
According to the invention, the first preset relationship is used for indicating that the two corresponding entities are in a common subject relationship, and the two entities are both subjects.
S320, obtaining the number IDSA_{1,QA} of the triple (IDSA_{1,QA-1}, E_0, IDAA_QA), the number uniquely identifying (IDSA_{1,QA-1}, E_0, IDAA_QA); IDAA_QA is the number of TAA_QA, and TAA_QA is the QA-th triple corresponding to the subject of the target text.
It should be understood that after IDSA_{1,2} is obtained, IDSA_{1,3}, …, IDSA_{1,QA} can be obtained in sequence, where IDSA_{1,3} is the number of the triple (IDSA_{1,2}, E_0, IDAA_3), IDSA_{1,QA} is the number of the triple (IDSA_{1,QA-1}, E_0, IDAA_QA), IDAA_3 is the number of TAA_3, TAA_3 is the 3rd triple corresponding to the subject of the target text, IDSA_{1,QA-1} is the number of the triple (IDSA_{1,QA-2}, E_0, IDAA_{QA-1}) obtained immediately before (IDSA_{1,QA-1}, E_0, IDAA_QA), IDSA_{1,QA-2} is the number of the triple (IDSA_{1,QA-3}, E_0, IDAA_{QA-2}), and so on.
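The recursion in S310 and S320 is a left fold over the triple numbers, which the Python sketch below illustrates. The `Numbering` class, the sequential "#k" numbers, and the "co-subject" label standing in for E_0 are assumptions made for illustration; the text only requires that each number uniquely identify its triple.

```python
from functools import reduce

E0 = "co-subject"  # placeholder label for the first preset relationship E_0


class Numbering:
    """Hands out one number per distinct triple (sequential numbering is an assumption)."""

    def __init__(self):
        self._ids = {}

    def number(self, triple):
        # Reuse the existing number if the triple was seen before.
        return self._ids.setdefault(triple, f"#{len(self._ids) + 1}")


def first_composite(TAA, numbering):
    """Return ZA_1: IDAA_1 when QA = 1, otherwise IDSA_{1,QA}. Requires TAA to be non-empty."""
    idaa = [numbering.number(t) for t in TAA]
    # Fold left: (IDAA_1, E0, IDAA_2) -> IDSA_{1,2}, then (IDSA_{1,2}, E0, IDAA_3) -> IDSA_{1,3}, ...
    return reduce(lambda acc, nxt: numbering.number((acc, E0, nxt)), idaa)


numbering = Numbering()
TAA = [("a", "r1", "b"), ("c", "r2", "d"), ("e", "r3", "f")]  # QA = 3, toy triples
za1 = first_composite(TAA, numbering)
```

With three subject triples numbered #1 to #3, the fold registers (#1, E0, #2) as #4 and (#4, E0, #3) as #5, so `za1` is "#5". With a single triple, `reduce` returns IDAA_1 unchanged, matching the QA = 1 case.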
According to the invention, if there are entities in A that are not in TAA and the number PA of such entities satisfies PA ≥ 2, then ZA is the number of the triple (ZA_1, E_0, IDN_PA). IDN_PA is obtained as follows:
S311, obtaining the number IDN_2 of the triple (NA_1, E_0, NA_2), the number uniquely identifying (NA_1, E_0, NA_2); NA_1 is the 1st entity in A that is not in TAA, and NA_2 is the 2nd entity in A that is not in TAA; if PA = 2, IDN_PA is IDN_2; if PA > 2, proceed to S312.
S312, obtaining the number IDN_PA of the triple (IDN_{PA-1}, E_0, NA_PA), the number uniquely identifying (IDN_{PA-1}, E_0, NA_PA); NA_PA is the PA-th entity in A that is not in TAA; IDN_{PA-1} is the number of the triple obtained immediately before (IDN_{PA-1}, E_0, NA_PA).
It should be understood that after IDN_2 is obtained, IDN_3, …, IDN_PA can be obtained in sequence, where IDN_3 is the number of the triple (IDN_2, E_0, NA_3), IDN_PA is the number of the triple (IDN_{PA-1}, E_0, NA_PA), NA_3 is the 3rd entity in A that is not in TAA, IDN_{PA-1} is the number of the triple (IDN_{PA-2}, E_0, NA_{PA-1}) obtained immediately before (IDN_{PA-1}, E_0, NA_PA), IDN_{PA-2} is the number of the triple (IDN_{PA-3}, E_0, NA_{PA-2}), and so on.
According to the invention, if there is an entity in A that is not in TAA and PA = 1, ZA is the number of the triple (ZA_1, E_0, NA_1).
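The three cases for ZA (no leftover entities, PA = 1, PA ≥ 2) can be sketched together in Python. This is an illustrative reading of the steps above rather than the patented implementation; `make_numbering`, `chain`, `composite_subject`, the "#k" numbers, and the "co-subject" label for E_0 are hypothetical.

```python
from functools import reduce

E0 = "co-subject"  # placeholder label for the first preset relationship E_0


def make_numbering():
    """Return a function mapping each distinct triple to a number (assumed sequential)."""
    ids = {}

    def number(triple):
        return ids.setdefault(triple, f"#{len(ids) + 1}")

    return number


def chain(items, number):
    """Left-fold items with E0; a single item is returned unchanged."""
    return reduce(lambda acc, nxt: number((acc, E0, nxt)), items)


def composite_subject(A, TAA, number):
    """Return ZA for a non-empty TAA."""
    za1 = chain([number(t) for t in TAA], number)    # ZA_1
    covered = {e for head, _, tail in TAA for e in (head, tail)}
    leftovers = [e for e in A if e not in covered]   # NA_1, ..., NA_PA
    if not leftovers:
        return za1                                   # all entities of A are in TAA
    # PA = 1: chain() yields NA_1 itself; PA >= 2: chain() yields IDN_PA.
    return number((za1, E0, chain(leftovers, number)))


# Subject of the second specific embodiment: "2014" is not covered by TAA.
number = make_numbering()
za = composite_subject(
    ("2014", "purple", "mobile phone"),
    [("mobile phone", "color", "purple")],
    number,
)
```

In this run the subject triple receives "#1" and (#1, E0, 2014) receives "#2", so `za` is "#2". The embodiment quotes number 3 for the same triple because it also numbers an object triple; the concrete values are illustrative in both cases.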
S400, if TBB ≠ ∅, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; IDBB = (IDBB_1, IDBB_2, …, IDBB_j, …, IDBB_QB), where IDBB_j is the number of TBB_j and uniquely identifies TBB_j.
According to the invention, if all entities in B are in TBB, then ZB = ZB_1, where ZB_1 is the first composite entity corresponding to the object of the target text; when QB = 1, ZB_1 is IDBB_1; when QB ≥ 2, ZB_1 is IDSB_{1,QB}. IDSB_{1,QB} is obtained as follows:
S410, obtaining the number IDSB_{1,2} of the triple (IDBB_1, E_1, IDBB_2), the number uniquely identifying (IDBB_1, E_1, IDBB_2); E_1 is a second preset relationship, IDBB_1 is the number of TBB_1, TBB_1 is the 1st triple corresponding to the object of the target text, IDBB_2 is the number of TBB_2, and TBB_2 is the 2nd triple corresponding to the object of the target text; if QB = 2, IDSB_{1,QB} is IDSB_{1,2}; if QB > 2, proceed to S420.
According to the present invention, if two entities corresponding to a triple are both in the object of the target text, the triple is assigned a corresponding number, which can be used to refer to the triple. And an entity is constructed by the number, and the entity can participate in the subsequent triple construction, so that the finally constructed knowledge graph can accurately represent the semantics of the target text.
According to the invention, the second preset relationship is used for indicating that the two corresponding entities are in a common object relationship, and indicating that the two entities are both objects.
S420, obtaining the number IDSB_{1,QB} of the triple (IDSB_{1,QB-1}, E_1, IDBB_QB), the number uniquely identifying (IDSB_{1,QB-1}, E_1, IDBB_QB); IDSB_{1,QB-1} is the number of the triple obtained immediately before (IDSB_{1,QB-1}, E_1, IDBB_QB), IDBB_QB is the number of TBB_QB, and TBB_QB is the QB-th triple corresponding to the object of the target text.
It should be understood that after IDSB_{1,2} is obtained, IDSB_{1,3}, …, IDSB_{1,QB} can be obtained in sequence, where IDSB_{1,3} is the number of the triple (IDSB_{1,2}, E_1, IDBB_3), IDSB_{1,QB} is the number of the triple (IDSB_{1,QB-1}, E_1, IDBB_QB), IDBB_3 is the number of TBB_3, TBB_3 is the 3rd triple corresponding to the object of the target text, IDSB_{1,QB-1} is the number of the triple (IDSB_{1,QB-2}, E_1, IDBB_{QB-1}) obtained immediately before (IDSB_{1,QB-1}, E_1, IDBB_QB), IDSB_{1,QB-2} is the number of the triple (IDSB_{1,QB-3}, E_1, IDBB_{QB-2}), and so on.
According to the invention, if there are entities in B that are not in TBB and the number PB of such entities satisfies PB ≥ 2, then ZB is the number of the triple (ZB_1, E_1, IDNB_PB). IDNB_PB is obtained as follows:
S411, obtaining the number IDNB_2 of the triple (NB_1, E_1, NB_2), the number uniquely identifying (NB_1, E_1, NB_2); NB_1 is the 1st entity in B that is not in TBB, and NB_2 is the 2nd entity in B that is not in TBB; if PB = 2, IDNB_PB is IDNB_2; if PB > 2, proceed to S412.
S412, obtaining the number IDNB_PB of the triple (IDNB_{PB-1}, E_1, NB_PB), the number uniquely identifying (IDNB_{PB-1}, E_1, NB_PB); NB_PB is the PB-th entity in B that is not in TBB; IDNB_{PB-1} is the number of the triple obtained immediately before (IDNB_{PB-1}, E_1, NB_PB).
It should be understood that after IDNB_2 is obtained, IDNB_3, …, IDNB_PB can be obtained in sequence, where IDNB_3 is the number of the triple (IDNB_2, E_1, NB_3), IDNB_PB is the number of the triple (IDNB_{PB-1}, E_1, NB_PB), NB_3 is the 3rd entity in B that is not in TBB, IDNB_{PB-1} is the number of the triple (IDNB_{PB-2}, E_1, NB_{PB-1}) obtained immediately before (IDNB_{PB-1}, E_1, NB_PB), IDNB_{PB-2} is the number of the triple (IDNB_{PB-3}, E_1, NB_{PB-2}), and so on.
According to the invention, if there is an entity in B that is not in TBB and PB = 1, ZB is the number of the triple (ZB_1, E_1, NB_1).
S500, obtaining a target triple T = (X_1, EAB, X_2) of the target text; when TAA ≠ ∅, X_1 = ZA; when TAA = ∅ and N = 1, X_1 = A_1; when TBB ≠ ∅, X_2 = ZB; when TBB = ∅ and M = 1, X_2 = B_1.
According to the invention, when TAA = ∅ and N ≥ 2, X_1 = ID_{A,N}. ID_{A,N} is obtained as follows:
S510, obtaining the number ID_{A,2} of the triple (A_1, E_0, A_2), the number uniquely identifying (A_1, E_0, A_2); A_1 is the 1st entity in A, and A_2 is the 2nd entity in A; if N = 2, ID_{A,N} is ID_{A,2}; if N > 2, proceed to S520.
S520, obtaining the number ID_{A,N} of the triple (ID_{A,N-1}, E_0, A_N), the number uniquely identifying (ID_{A,N-1}, E_0, A_N); A_N is the N-th entity in A; ID_{A,N-1} is the number of the triple obtained immediately before (ID_{A,N-1}, E_0, A_N).
It should be understood that after ID_{A,2} is obtained, ID_{A,3}, …, ID_{A,N} can be obtained in sequence, where ID_{A,3} is the number of the triple (ID_{A,2}, E_0, A_3), ID_{A,N} is the number of the triple (ID_{A,N-1}, E_0, A_N), A_3 is the 3rd entity in A, ID_{A,N-1} is the number of the triple (ID_{A,N-2}, E_0, A_{N-1}) obtained immediately before (ID_{A,N-1}, E_0, A_N), ID_{A,N-2} is the number of the triple (ID_{A,N-3}, E_0, A_{N-2}), and so on.
According to the invention, when TBB = ∅ and M ≥ 2, X_2 = ID_{B,M}. ID_{B,M} is obtained as follows:
S511, obtaining the number ID_{B,2} of the triple (B_1, E_1, B_2), the number uniquely identifying (B_1, E_1, B_2); B_1 is the 1st entity in B, and B_2 is the 2nd entity in B; if M = 2, ID_{B,M} is ID_{B,2}; if M > 2, proceed to S521.
S521, obtaining the number ID_{B,M} of the triple (ID_{B,M-1}, E_1, B_M), the number uniquely identifying (ID_{B,M-1}, E_1, B_M); B_M is the M-th entity in B; ID_{B,M-1} is the number of the triple obtained immediately before (ID_{B,M-1}, E_1, B_M).
It should be understood that after ID_{B,2} is obtained, ID_{B,3}, …, ID_{B,M} can be obtained in sequence, where ID_{B,3} is the number of the triple (ID_{B,2}, E_1, B_3), ID_{B,M} is the number of the triple (ID_{B,M-1}, E_1, B_M), B_3 is the 3rd entity in B, ID_{B,M-1} is the number of the triple (ID_{B,M-2}, E_1, B_{M-1}) obtained immediately before (ID_{B,M-1}, E_1, B_M), ID_{B,M-2} is the number of the triple (ID_{B,M-3}, E_1, B_{M-2}), and so on.
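The choice of X_1 and X_2 in S500, including the case where no subject or object triple exists, can be sketched in Python as one helper applied to either side. The `select_side` and `number` names, the "#k" numbers, and the "E0"/"E1" string labels are hypothetical and serve only to illustrate the case analysis.

```python
from functools import reduce


def select_side(entities, composite, relation, number):
    """Pick X_1 (or X_2) for the target triple.

    composite is ZA (or ZB) when TAA (or TBB) is non-empty, otherwise None.
    """
    if composite is not None:
        return composite
    # A single entity is returned unchanged (X_1 = A_1); with two or more
    # entities the fold yields ID_{A,N} (or ID_{B,M} for the object side).
    return reduce(lambda acc, nxt: number((acc, relation, nxt)), entities)


ids = {}


def number(triple):
    # Sequential numbers are an assumption; any unique identifier would do.
    return ids.setdefault(triple, f"#{len(ids) + 1}")


# TAA = ∅ and N = 3: the subject entities are chained with E0.
x1 = select_side(("2014", "purple", "mobile phone"), None, "E0", number)
# TBB = ∅ and M = 1: the single object entity is used directly.
x2 = select_side(("limited edition",), None, "E1", number)
```

In this run (2014, E0, purple) is numbered "#1" and (#1, E0, mobile phone) is numbered "#2", so `x1` is "#2", while `x2` is the entity "limited edition" itself.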
As a first specific embodiment, the target text is: "the purple mobile phone is a limited edition designated by Joe", where "the purple mobile phone" is the subject of the target text and "a limited edition designated by Joe" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model yield: TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, designated, limited edition)}, QB = 1, TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then ZA_1 is number 1 and ZA is also number 1. In S400, if the number of the triple (Joe, designated, limited edition) is 2, then ZB is number 2. In S500, T = (number 1, is, number 2) is obtained; it should be understood that number 1 refers to the purple mobile phone and number 2 refers to the limited edition designated by Joe.
As a second specific embodiment, the target text is: "the 2014 purple mobile phone is a limited edition designated by Joe", where "the 2014 purple mobile phone" is the subject of the target text and "a limited edition designated by Joe" is the object of the target text. In S100, A = (2014, purple, mobile phone), N = 3, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model yield: TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, designated, limited edition)}, QB = 1, TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then ZA_1 is number 1 and ZA is the number of the triple (ZA_1, E_0, 2014); if the number of the triple (ZA_1, E_0, 2014) is 3, then ZA is number 3. In S400, if the number of the triple (Joe, designated, limited edition) is 2, then ZB is number 2. In S500, T = (number 3, is, number 2) is obtained; it should be understood that number 3 refers to the 2014 purple mobile phone and number 2 refers to the limited edition designated by Joe.
As a third specific embodiment, the target text is: "the purple mobile phone is a limited edition", where "the purple mobile phone" is the subject of the target text and "a limited edition" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (limited edition), M = 1 are obtained. In S200, a syntactic analysis model and a relation extraction model yield: TAA = {(mobile phone, color, purple)}, QA = 1, TBB = ∅. In S300, if the number of the triple (mobile phone, color, purple) is 1, then ZA_1 is number 1 and ZA is also number 1. In S400, since TBB = ∅, the object is represented by its single entity, limited edition. In S500, T = (number 1, is, limited edition) is obtained; it should be understood that number 1 refers to the purple mobile phone.
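The second and third embodiments can be reproduced end to end with a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: `build_target_triple` is a hypothetical helper, the "co-subject" and "co-object" labels stand in for E_0 and E_1, and the extracted triples are numbered sequentially (subject triples first, then object triples, then composite triples) so that the output lines up with the illustrative numbers used in the embodiments.

```python
from functools import reduce

E0, E1 = "co-subject", "co-object"  # placeholder labels for the two preset relationships


def build_target_triple(A, B, TAA, TBB, TAB):
    """Return the target triple T = (X_1, EAB, X_2) and the number table."""
    ids = {}

    def number(triple):
        return ids.setdefault(triple, f"number {len(ids) + 1}")

    def chain(items, rel):
        # Left fold; a single item is returned unchanged.
        return reduce(lambda acc, nxt: number((acc, rel, nxt)), items)

    # Number the extracted triples first: IDAA, then IDBB.
    idaa = [number(t) for t in TAA]
    idbb = [number(t) for t in TBB]

    def side(entities, triples, id_list, rel):
        if not triples:                          # TAA = ∅: A_1 or ID_{A,N}
            return chain(entities, rel)
        first = chain(id_list, rel)              # ZA_1 (or ZB_1)
        covered = {e for h, _, t in triples for e in (h, t)}
        rest = [e for e in entities if e not in covered]
        return number((first, rel, chain(rest, rel))) if rest else first

    x1 = side(A, TAA, idaa, E0)
    x2 = side(B, TBB, idbb, E1)
    return (x1, TAB[1], x2), ids


# Second specific embodiment.
T2, ids2 = build_target_triple(
    A=("2014", "purple", "mobile phone"),
    B=("Joe", "limited edition"),
    TAA=[("mobile phone", "color", "purple")],
    TBB=[("Joe", "designated", "limited edition")],
    TAB=("mobile phone", "is", "limited edition"),
)

# Third specific embodiment (TBB is empty, M = 1).
T3, ids3 = build_target_triple(
    A=("purple", "mobile phone"),
    B=("limited edition",),
    TAA=[("mobile phone", "color", "purple")],
    TBB=[],
    TAB=("mobile phone", "is", "limited edition"),
)
```

`T2` comes out as (number 3, is, number 2) and `T3` as (number 1, is, limited edition). The returned table maps each numbered triple back to its contents, which is what lets a number stand for a whole phrase such as "2014 purple mobile phone".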
Although some specific embodiments of the present invention have been described in detail by way of illustration, it should be understood by those skilled in the art that the above illustration is only for the purpose of illustration and is not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (9)

1. A text processing method based on knowledge graph is characterized by comprising the following steps:
S100, acquiring an entity set {A, B} in the target text; A = (A_1, A_2, …, A_n, …, A_N), where A_n is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and N ≥ 1; B = (B_1, B_2, …, B_m, …, B_M), where B_m is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and M ≥ 1; the target text is a sentence comprising a subject, a predicate and an object;
S200, acquiring a triple set {TAA, TBB, TAB} of the target text, wherein:
TAA = ∅ or TAA = {TAA_1, TAA_2, …, TAA_i, …, TAA_QA} ≠ ∅, where TAA_i is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; TAA_i = (A_{i,1}, EA_i, A_{i,2}), where A_{i,1} is the first entity included in TAA_i, A_{i,2} is the second entity included in TAA_i, and EA_i is the relationship between entity A_{i,1} and entity A_{i,2};
TBB = ∅ or TBB = {TBB_1, TBB_2, …, TBB_j, …, TBB_QB} ≠ ∅, where TBB_j is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; TBB_j = (B_{j,1}, EB_j, B_{j,2}), where B_{j,1} is the first entity included in TBB_j, B_{j,2} is the second entity included in TBB_j, and EB_j is the relationship between entity B_{j,1} and entity B_{j,2};
TAB = (A_N, EAB, B_M), where A_N is the N-th entity in the subject of the target text, B_M is the M-th entity in the object of the target text, and EAB is the relationship between entity A_N and entity B_M;
S300, if TAA ≠ ∅, acquiring a composite entity ZA corresponding to the subject of the target text according to IDAA; IDAA = (IDAA_1, IDAA_2, …, IDAA_i, …, IDAA_QA), where IDAA_i is the number of TAA_i and uniquely identifies TAA_i;
S400, if TBB ≠ ∅, acquiring a composite entity ZB corresponding to the object of the target text according to IDBB; IDBB = (IDBB_1, IDBB_2, …, IDBB_j, …, IDBB_QB), where IDBB_j is the number of TBB_j and uniquely identifies TBB_j;
S500, obtaining a target triple T = (X_1, EAB, X_2) of the target text; when TAA ≠ ∅, X_1 = ZA; when TAA = ∅ and N = 1, X_1 = A_1; when TBB ≠ ∅, X_2 = ZB; when TBB = ∅ and M = 1, X_2 = B_1.
2. The method according to claim 1, wherein in S300, acquiring the composite entity ZA corresponding to the subject of the target text according to IDAA comprises: if all entities in A are in TAA, ZA = ZA_1, where ZA_1 is the first composite entity corresponding to the subject of the target text; when QA = 1, ZA_1 is IDAA_1; when QA ≥ 2, ZA_1 is IDSA_{1,QA}; IDSA_{1,QA} is obtained as follows:
S310, obtaining the number IDSA_{1,2} of the triple (IDAA_1, E_0, IDAA_2), the number uniquely identifying (IDAA_1, E_0, IDAA_2), E_0 being a first preset relationship; if QA = 2, IDSA_{1,QA} is IDSA_{1,2}; if QA > 2, proceeding to S320;
S320, obtaining the number IDSA_{1,QA} of the triple (IDSA_{1,QA-1}, E_0, IDAA_QA), the number uniquely identifying (IDSA_{1,QA-1}, E_0, IDAA_QA); IDSA_{1,QA-1} is the number of the triple (IDSA_{1,QA-2}, E_0, IDAA_{QA-1}) and uniquely identifies (IDSA_{1,QA-2}, E_0, IDAA_{QA-1}), and IDSA_{1,QA-2} is the number of a triple obtained by the same method as IDSA_{1,QA-1}.
3. The method according to claim 2, wherein in S300, acquiring the component entity ZA corresponding to the subject of the target text according to IDAA comprises: if A contains entities that are not in TAA and the number PA of entities in A that are not in TAA is greater than or equal to 2, ZA is the number of the triple (ZA_1, E_0, IDN_{PA}), the number being used to uniquely identify (ZA_1, E_0, IDN_{PA}); IDN_{PA} is obtained as follows:
S311, obtaining the number IDN_2 of the triple (NA_1, E_0, NA_2), the number being used to uniquely identify (NA_1, E_0, NA_2), where NA_1 is the 1st entity in A that is not in TAA and NA_2 is the 2nd entity in A that is not in TAA; if PA = 2, IDN_{PA} is IDN_2; if PA > 2, proceeding to S312;
S312, obtaining the number IDN_{PA} of the triple (IDN_{PA-1}, E_0, NA_{PA}), the number being used to uniquely identify (IDN_{PA-1}, E_0, NA_{PA}), where NA_{PA} is the PA-th entity in A that is not in TAA; IDN_{PA-1} is the number of the triple (IDN_{PA-2}, E_0, NA_{PA-1}) and is used to uniquely identify (IDN_{PA-2}, E_0, NA_{PA-1}); IDN_{PA-2} is the number of a triple obtained by the same method used to obtain IDN_{PA-1}.
4. The method according to claim 3, wherein in S300, acquiring the component entity ZA corresponding to the subject of the target text according to IDAA comprises: if A contains an entity that is not in TAA and PA = 1, ZA is the number of the triple (ZA_1, E_0, NA_1), the number being used to uniquely identify (ZA_1, E_0, NA_1).
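Claims 2 to 4 together describe three cases for ZA, depending on how many entities of A are not covered by TAA. A minimal sketch of that case split is given below, assuming ZA_1 has already been computed; `make_numbering`, `chain`, `compose_subject`, and the `"common_subject"` label for E_0 are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
E0 = "common_subject"  # placeholder label for the first predetermined relationship


def make_numbering() -> Callable[[Triple], str]:
    """Return a function that assigns one unique number per distinct triple."""
    ids: Dict[Triple, str] = {}

    def number_of(triple: Triple) -> str:
        return ids.setdefault(triple, f"N{len(ids) + 1}")

    return number_of


def chain(items: List[str], number_of: Callable[[Triple], str]) -> str:
    """Fold items left to right: ((i1, E0, i2), E0, i3), ... as in S311/S312."""
    acc = items[0]
    for item in items[1:]:
        acc = number_of((acc, E0, item))
    return acc


def compose_subject(za1: str, uncovered: List[str],
                    number_of: Callable[[Triple], str]) -> str:
    """Return ZA given ZA_1 and the entities NA_1..NA_PA of A not in TAA."""
    if not uncovered:
        return za1  # all entities of A are in TAA, so ZA = ZA_1
    # PA = 1: the tail is NA_1 itself; PA >= 2: the tail is IDN_PA.
    tail = chain(uncovered, number_of)
    return number_of((za1, E0, tail))
```

Passing one uncovered entity pairs ZA_1 directly with NA_1, while passing several first chains them into IDN_{PA} and then pairs ZA_1 with that number.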
5. The method of claim 2, wherein the first predetermined relationship is used to indicate that the two corresponding entities are in a common subject relationship.
6. The method of claim 1, wherein S500 further comprises: when TAA = ∅ and N ≥ 2, X_1 = ID_{A,N}; ID_{A,N} is obtained as follows:
S510, obtaining the number ID_{A,2} of the triple (A_1, E_0, A_2), the number being used to uniquely identify (A_1, E_0, A_2); if N = 2, ID_{A,N} is ID_{A,2}; if N > 2, proceeding to S520;
S520, obtaining the number ID_{A,N} of the triple (ID_{A,N-1}, E_0, A_N), the number being used to uniquely identify (ID_{A,N-1}, E_0, A_N); ID_{A,N-1} is the number of the triple (ID_{A,N-2}, E_0, A_{N-1}) and is used to uniquely identify (ID_{A,N-2}, E_0, A_{N-1}); ID_{A,N-2} is the number of a triple obtained by the same method used to obtain ID_{A,N-1}.
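Read together, S510 and S520 can be written as a recurrence. The notation num(·), meaning "the number that uniquely identifies the given triple", is introduced here only for illustration and does not appear in the claims.

```latex
\begin{aligned}
ID_{A,2} &= \operatorname{num}(A_1,\, E_0,\, A_2), \\
ID_{A,k} &= \operatorname{num}(ID_{A,k-1},\, E_0,\, A_k), \qquad 3 \le k \le N, \\
X_1 &= ID_{A,N}.
\end{aligned}
% Worked case N = 3:
%   X_1 = num( num(A_1, E_0, A_2), E_0, A_3 )
```

For N = 3, the subject is therefore represented by the number of a triple whose first element is itself the number of the triple (A_1, E_0, A_2).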
7. The method of claim 1, wherein in S400, acquiring the component entity ZB corresponding to the object of the target text according to IDBB comprises: if all entities in B are in TBB, ZB = ZB_1, where ZB_1 is the first component entity corresponding to the object of the target text; when QB = 1, ZB_1 is IDBB_1; when QB ≥ 2, ZB_1 is IDSB_{1,QB}; IDSB_{1,QB} is obtained as follows:
S410, obtaining the number IDSB_{1,2} of the triple (IDBB_1, E_1, IDBB_2), the number being used to uniquely identify (IDBB_1, E_1, IDBB_2), where E_1 is a second predetermined relationship; if QB = 2, IDSB_{1,QB} is IDSB_{1,2}; if QB > 2, proceeding to S420;
S420, obtaining the number IDSB_{1,QB} of the triple (IDSB_{1,QB-1}, E_1, IDBB_{QB}), the number being used to uniquely identify (IDSB_{1,QB-1}, E_1, IDBB_{QB}); IDSB_{1,QB-1} is the number of the triple (IDSB_{1,QB-2}, E_1, IDBB_{QB-1}) and is used to uniquely identify (IDSB_{1,QB-2}, E_1, IDBB_{QB-1}); IDSB_{1,QB-2} is the number of a triple obtained by the same method used to obtain IDSB_{1,QB-1}.
8. The method of claim 7, wherein the second predetermined relationship is used to indicate that the two corresponding entities are in a common object relationship.
9. The method of claim 1, wherein S500 further comprises: when TBB = ∅ and M ≥ 2, X_2 = ID_{B,M}; ID_{B,M} is obtained as follows:
S511, obtaining the number ID_{B,2} of the triple (B_1, E_1, B_2), the number being used to uniquely identify (B_1, E_1, B_2); if M = 2, ID_{B,M} is ID_{B,2}; if M > 2, proceeding to S521;
S521, obtaining the number ID_{B,M} of the triple (ID_{B,M-1}, E_1, B_M), the number being used to uniquely identify (ID_{B,M-1}, E_1, B_M); ID_{B,M-1} is the number of the triple (ID_{B,M-2}, E_1, B_{M-1}) and is used to uniquely identify (ID_{B,M-2}, E_1, B_{M-1}); ID_{B,M-2} is the number of a triple obtained by the same method used to obtain ID_{B,M-1}.
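As an end-to-end illustration of the case in which neither the subject nor the object has existing triples (TAA = ∅ and TBB = ∅), the sketch below builds a target triple from plain entity lists, chaining subject entities with E_0 and object entities with E_1. The function name, the `N…` number format, and the relationship labels are assumptions for the example, and both entity lists are assumed to be non-empty.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Placeholder labels for the first and second predetermined relationships.
E0, E1 = "common_subject", "common_object"


def build_target_triple(
    subjects: List[str], eab: str, objects: List[str]
) -> Tuple[Triple, Dict[Triple, str]]:
    """Return T = (X_1, EAB, X_2) and the auxiliary triples numbered on the way."""
    ids: Dict[Triple, str] = {}

    def number_of(triple: Triple) -> str:
        # One unique number per distinct triple.
        return ids.setdefault(triple, f"N{len(ids) + 1}")

    def chain(entities: List[str], relation: str) -> str:
        # A single entity is used as is; otherwise fold left to right.
        acc = entities[0]
        for entity in entities[1:]:
            acc = number_of((acc, relation, entity))
        return acc

    x1 = chain(subjects, E0)  # subject side, chained with E_0
    x2 = chain(objects, E1)   # object side, chained with E_1
    return (x1, eab, x2), ids


triple, aux = build_target_triple(
    ["Alice", "Bob"], "likes", ["apples", "pears", "plums"]
)
```

Here the two subjects collapse into one numbered triple and the three objects into a two-step chain, so the target triple relates two numbers rather than raw entities, and `aux` holds the three auxiliary triples created along the way.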
CN202211565438.5A 2022-12-07 2022-12-07 Text processing method based on knowledge graph Active CN115577713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211565438.5A CN115577713B (en) 2022-12-07 2022-12-07 Text processing method based on knowledge graph

Publications (2)

Publication Number Publication Date
CN115577713A (en) 2023-01-06
CN115577713B (en) 2023-03-17

Family

ID=84590059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211565438.5A Active CN115577713B (en) 2022-12-07 2022-12-07 Text processing method based on knowledge graph

Country Status (1)

Country Link
CN (1) CN115577713B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699663A (en) * 2013-12-27 2014-04-02 中国科学院自动化研究所 Hot event mining method based on large-scale knowledge base
CN111639171A (en) * 2020-06-08 2020-09-08 吉林大学 Knowledge graph question-answering method and device
WO2020233261A1 (en) * 2019-07-12 2020-11-26 之江实验室 Natural language generation-based knowledge graph understanding assistance system
CN113407678A (en) * 2021-06-30 2021-09-17 竹间智能科技(上海)有限公司 Knowledge graph construction method, device and equipment
US20220027766A1 (en) * 2021-02-19 2022-01-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for industry text increment and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Limin (郑丽敏) et al.: "Research on entity relation extraction for food safety incident news texts" (面向食品安全事件新闻文本的实体关系抽取研究), 《农业机械学报》 *

Also Published As

Publication number Publication date
CN115577713B (en) 2023-03-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant