CN115577713A

CN115577713A - Text processing method based on knowledge graph

Info

Publication number: CN115577713A
Application number: CN202211565438.5A
Authority: CN
Inventors: 张正义; 刘羽; 刘宸; 傅晓航
Original assignee: Zhongke Yuchen Technology Co Ltd
Current assignee: Zhongke Yuchen Technology Co Ltd
Priority date: 2022-12-07
Filing date: 2022-12-07
Publication date: 2023-01-06
Anticipated expiration: 2042-12-07
Also published as: CN115577713B

Abstract

The application relates to the technical field of electric digital data processing, in particular to a text processing method based on a knowledge graph. The method comprises the following steps: S100, acquiring an entity set $\{A, B\}$ in a target text; S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text; S300, if $TAA \neq \varnothing$, acquiring a composition entity ZA corresponding to the subject of the target text according to IDAA; S400, if $TBB \neq \varnothing$, acquiring a composition entity ZB corresponding to the object of the target text according to IDBB; S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text. The method improves the accuracy of obtaining the semantic relation of the target text.

Description

Text processing method based on knowledge graph

Technical Field

The invention relates to the technical field of electric digital data processing, in particular to a text processing method based on a knowledge graph.

Background

Existing entity recognition methods can automatically identify the entities in a sentence, and existing relation extraction methods can automatically identify semantic relationships among those entities. For example, for the sentence "the black mobile phone 14 is a classic model", an entity recognition method can identify the three entities "black", "mobile phone 14" and "classic model", and a relation extraction method can identify the following semantic relationships: black is the color of mobile phone 14, and mobile phone 14 is a classic model. However, the sentence actually means that the black mobile phone 14 is a classic model; whether mobile phones 14 of other colors are classic models is unknown. Therefore, when the structure of a sentence is complex, the semantic relationship obtained by an existing relation extraction method may be inaccurate. How to improve the accuracy of obtaining the semantic relationship of a sentence is a problem to be solved urgently.

Disclosure of Invention

The invention aims to provide a text processing method based on a knowledge graph so as to improve the accuracy of obtaining the semantic relation of a target text.

According to the invention, a text processing method based on a knowledge graph is provided, comprising the following steps:

S100, acquiring an entity set $\{A, B\}$ in the target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \geq 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \geq 1$; the target text is a sentence comprising a subject, a predicate and an object.

S200, acquiring a triple set { TAA, TBB, TAB } of the target text, wherein:

$TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \neq \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$.

$TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \neq \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$.

$TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$.

S300, if $TAA \neq \varnothing$, acquiring a composition entity ZA corresponding to the subject of the target text according to IDAA, where $IDAA = (IDAA_1, IDAA_2, \ldots, IDAA_i, \ldots, IDAA_{QA})$ and $IDAA_i$ is the number of $TAA_i$, used to uniquely identify $TAA_i$.

S400, if $TBB \neq \varnothing$, acquiring a composition entity ZB corresponding to the object of the target text according to IDBB, where $IDBB = (IDBB_1, IDBB_2, \ldots, IDBB_j, \ldots, IDBB_{QB})$ and $IDBB_j$ is the number of $TBB_j$, used to uniquely identify $TBB_j$.

S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text: when $TAA \neq \varnothing$, $X_1 = ZA$; when $TAA = \varnothing$ and $N = 1$, $X_1 = A_1$; when $TBB \neq \varnothing$, $X_2 = ZB$; when $TBB = \varnothing$ and $M = 1$, $X_2 = B_1$.

Compared with the prior art, the text processing method based on a knowledge graph provided by the invention has at least the following beneficial effects:

The method acquires all entities in the target text and the triple set of the target text. The triple set may contain triples whose two entities both lie in the subject and triples whose two entities both lie in the object. For each such triple, its number is used as an entity when constructing further triples, and the number stands for the whole triple. The complex syntactic structure of the target text can therefore be represented accurately by triples. This addresses the problem in the prior art that relation extraction methods may yield inaccurate semantic relations for sentences with a complex syntactic structure, and improves the accuracy of obtaining the semantic relation of the target text.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a text processing method based on a knowledge graph according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

According to the invention, a text processing method based on a knowledge graph is provided, comprising the following steps:

S100, acquiring an entity set $\{A, B\}$ in a target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \geq 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \geq 1$; the target text is a sentence comprising a subject, a predicate and an object.

Optionally, the entities in the target text are identified using a machine-learning-based recognition algorithm, for example an LSTM + CRF model. Those skilled in the art will appreciate that any entity recognition method in the prior art falls within the scope of the present invention.

It should be noted that the position of an entity in the entity set is determined by its order of appearance in the target text: the earlier an entity appears in the target text, the earlier its position in the entity set.
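The ordering rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Mention` type, the `order_entities` helper, and the idea of splitting subject from object at the predicate offset are hypothetical simplifications, and the sample sentence is an English rendering chosen so that the entity order matches the embodiments. A real system would rely on an entity recognizer and a syntactic parser.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    """An entity mention with its start offset in the target text (assumed format)."""
    text: str
    start: int


def order_entities(mentions: List[Mention], predicate_start: int) -> Tuple[List[str], List[str]]:
    """Split mentions into the subject list A and the object list B.

    Assumption: mentions before the predicate offset belong to the subject and
    mentions after it belong to the object. Each list keeps order of appearance.
    """
    ordered = sorted(mentions, key=lambda m: m.start)
    a = [m.text for m in ordered if m.start < predicate_start]
    b = [m.text for m in ordered if m.start >= predicate_start]
    return a, b


text = "The purple mobile phone is Joe's designated limited edition"
# Mentions are deliberately given out of order to show the sorting step.
mentions = [
    Mention("limited edition", text.index("limited edition")),
    Mention("mobile phone", text.index("mobile phone")),
    Mention("Joe", text.index("Joe")),
    Mention("purple", text.index("purple")),
]
A, B = order_entities(mentions, predicate_start=text.index(" is "))
print(A, B)
```

Here `A` and `B` play the roles of the entity lists A and B of S100, with N and M given by their lengths.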

S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text, wherein: $TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \neq \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$.

According to the invention, $TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \neq \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$.

According to the invention, $TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$.

It will be appreciated that existing relation extraction methods can automatically identify semantic relationships between entities. Optionally, the relationship between two entities in the target text is obtained using an existing neural-network-based relation extraction method. Those skilled in the art will appreciate that any relation extraction method in the prior art falls within the scope of the present invention.

It will be appreciated that existing syntactic analysis methods can identify the syntactic structure of a sentence, such as its subject and object. Optionally, the present invention uses an existing syntactic analysis tool to identify whether an entity in the target text is located in the subject or in the object.

According to the invention, the relation extraction method and the syntactic analysis method together yield the relationships between the entities in the target text and whether each entity is located in the subject or the object of the target text. From these, the triple set TAA corresponding to the subject, the triple set TBB corresponding to the object, and the triple TAB, which links an entity in the subject with an entity in the object, can be obtained. It should be understood that the subject may contain no triple between its entities, or one or more such triples; likewise, the object may contain no triple between its entities, or one or more such triples.
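As a rough illustration of how the output of S200 might be organized, the Python sketch below partitions already-extracted relations into TAA and TBB and builds TAB from the last subject entity and the last object entity. The `Triple` type and `build_triple_set` function are hypothetical names, and the relations and EAB are assumed to come from an external relation extraction model that is not shown.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def build_triple_set(
    a: Sequence[str], b: Sequence[str], relations: Sequence[Triple], eab: str
) -> Tuple[List[Triple], List[Triple], Triple]:
    """Return (TAA, TBB, TAB) for subject entities a and object entities b."""
    subject, obj = set(a), set(b)
    # TAA: triples whose two entities both lie in the subject.
    taa = [t for t in relations if t.head in subject and t.tail in subject]
    # TBB: triples whose two entities both lie in the object.
    tbb = [t for t in relations if t.head in obj and t.tail in obj]
    # TAB links the N-th subject entity with the M-th object entity.
    tab = Triple(a[-1], eab, b[-1])
    return taa, tbb, tab


relations = [
    Triple("mobile phone", "color", "purple"),
    Triple("Joe", "designates", "limited edition"),
]
taa, tbb, tab = build_triple_set(
    ["purple", "mobile phone"], ["Joe", "limited edition"], relations, eab="is"
)
print(taa, tbb, tab)
```

Either list may come back empty, which corresponds to the cases $TAA = \varnothing$ or $TBB = \varnothing$.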

According to the invention, if all entities in A are in TAA, then $ZA = ZA_1$, where $ZA_1$ is the first composition entity corresponding to the subject of the target text; when $QA = 1$, $ZA_1$ is $IDAA_1$; when $QA \geq 2$, $ZA_1$ is $IDSA_{1,QA}$, which is obtained as follows:

S310, obtaining the number $IDSA_{1,2}$ of the triple $(IDAA_1, E_0, IDAA_2)$, the number being used to uniquely identify $(IDAA_1, E_0, IDAA_2)$; $E_0$ is a first preset relationship, $IDAA_1$ is the number of $TAA_1$, $TAA_1$ is the 1st triple corresponding to the subject of the target text, $IDAA_2$ is the number of $TAA_2$, and $TAA_2$ is the 2nd triple corresponding to the subject of the target text; if $QA = 2$, $IDSA_{1,QA}$ is $IDSA_{1,2}$; if $QA > 2$, proceed to S320.

According to the invention, if both entities of a triple are in the subject of the target text, the triple is assigned a number that can be used to refer to it. The number is treated as an entity that can take part in subsequent triple construction, so that the finally constructed knowledge graph accurately represents the semantics of the target text.

According to the invention, the first preset relationship indicates that the two corresponding entities are in a common-subject relationship, that is, both entities are subjects.

S320, obtaining the number $IDSA_{1,QA}$ of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, the number being used to uniquely identify $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$; $IDAA_{QA}$ is the number of $TAA_{QA}$, and $TAA_{QA}$ is the QA-th triple corresponding to the subject of the target text.

It should be understood that after $IDSA_{1,2}$ is obtained, $IDSA_{1,3}, \ldots, IDSA_{1,QA}$ can be obtained in sequence, where $IDSA_{1,3}$ is the number of the triple $(IDSA_{1,2}, E_0, IDAA_3)$ and $IDSA_{1,QA}$ is the number of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$; $IDAA_3$ is the number of $TAA_3$, $TAA_3$ is the 3rd triple corresponding to the subject of the target text, $IDSA_{1,QA-1}$ is the number of the preceding triple $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$, $IDSA_{1,QA-2}$ is the number of the triple $(IDSA_{1,QA-3}, E_0, IDAA_{QA-2})$, and so on.
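The sequential numbering in S310 and S320 amounts to a left fold over the triple numbers. The Python sketch below illustrates this under stated assumptions: `TripleId`, `TripleRegistry`, `chain`, the label `"co-subject"` for $E_0$, and the three sample triples (including the brand "Acme") are hypothetical and do not appear in the patent, which does not prescribe how numbers are allocated.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]
E0 = "co-subject"  # placeholder label for the first preset relationship E_0


@dataclass(frozen=True)
class TripleId:
    """A number that stands for a whole triple and can itself act as an entity."""
    value: int


class TripleRegistry:
    """Hands out one unique number per distinct triple (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids: Dict[Triple, TripleId] = {}
        self.by_id: Dict[TripleId, Triple] = {}

    def number_of(self, triple: Triple) -> TripleId:
        if triple not in self._ids:
            tid = TripleId(len(self._ids) + 1)
            self._ids[triple] = tid
            self.by_id[tid] = triple
        return self._ids[triple]


def chain(items: Sequence[Hashable], relation: str, registry: TripleRegistry) -> Hashable:
    """Fold items left to right into nested triples and return the last number.

    With a single item no triple is built and the item itself is returned,
    which matches ZA_1 = IDAA_1 when QA = 1.
    """
    acc = items[0]
    for item in items[1:]:
        acc = registry.number_of((acc, relation, item))
    return acc


registry = TripleRegistry()
# Hypothetical subject with three triples TAA_1..TAA_3, so QA = 3.
taa = [("phone", "color", "purple"), ("phone", "year", "2014"), ("phone", "brand", "Acme")]
idaa = [registry.number_of(t) for t in taa]  # IDAA_1..IDAA_3
idsa_1_qa = chain(idaa, E0, registry)        # IDSA_{1,3}
print(idsa_1_qa, registry.by_id[idsa_1_qa])
```

Using a dedicated `TripleId` type keeps triple numbers distinguishable from ordinary entities, which is a design choice of this sketch rather than something the source specifies.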

According to the invention, if there are entities in A that are not in TAA and the number PA of such entities satisfies $PA \geq 2$, then ZA is the number of the triple $(ZA_1, E_0, IDN_{PA})$, where $IDN_{PA}$ is obtained as follows:

S311, obtaining the number $IDN_2$ of the triple $(NA_1, E_0, NA_2)$, the number being used to uniquely identify $(NA_1, E_0, NA_2)$; $NA_1$ is the 1st entity in A that is not in TAA, and $NA_2$ is the 2nd entity in A that is not in TAA; if $PA = 2$, $IDN_{PA}$ is $IDN_2$; if $PA > 2$, proceed to S312.

S312, obtaining the number $IDN_{PA}$ of the triple $(IDN_{PA-1}, E_0, NA_{PA})$, the number being used to uniquely identify $(IDN_{PA-1}, E_0, NA_{PA})$; $NA_{PA}$ is the PA-th entity in A that is not in TAA, and $IDN_{PA-1}$ is the number of the triple preceding $(IDN_{PA-1}, E_0, NA_{PA})$.

It should be understood that after $IDN_2$ is obtained, $IDN_3, \ldots, IDN_{PA}$ can be obtained in sequence, where $IDN_3$ is the number of the triple $(IDN_2, E_0, NA_3)$ and $IDN_{PA}$ is the number of the triple $(IDN_{PA-1}, E_0, NA_{PA})$; $NA_3$ is the 3rd entity in A that is not in TAA, $IDN_{PA-1}$ is the number of the preceding triple $(IDN_{PA-2}, E_0, NA_{PA-1})$, $IDN_{PA-2}$ is the number of the triple $(IDN_{PA-3}, E_0, NA_{PA-2})$, and so on.

According to the invention, if there is an entity in A that is not in TAA and $PA = 1$, then ZA is the number of the triple $(ZA_1, E_0, NA_1)$.
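The three cases for ZA (all entities covered by TAA, $PA = 1$, and $PA \geq 2$) can be combined into one routine. The following Python sketch is an illustration only: `make_numberer`, `fold`, `composition_entity`, the `"#k"` labels used as triple numbers, the `"co-subject"` label for $E_0$, and the extra entity "imported" are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]
E0 = "co-subject"  # placeholder label for the first preset relationship E_0


def make_numberer() -> Callable[[Triple], str]:
    """Return a function that maps each distinct triple to a unique label '#k'."""
    table: Dict[Triple, str] = {}

    def number_of(triple: Triple) -> str:
        return table.setdefault(triple, f"#{len(table) + 1}")

    return number_of


def fold(items: Sequence[Hashable], relation: str, number_of) -> Hashable:
    """Left fold: returns items[0] for one item, else the number of the last nested triple."""
    acc = items[0]
    for item in items[1:]:
        acc = number_of((acc, relation, item))
    return acc


def composition_entity(entities, triples, relation, number_of):
    """Compute ZA (or ZB) for a non-empty triple set."""
    if not triples:
        raise ValueError("S300/S400 apply only when the triple set is non-empty")
    z1 = fold([number_of(t) for t in triples], relation, number_of)  # ZA_1
    covered = {e for head, _, tail in triples for e in (head, tail)}
    leftovers = [e for e in entities if e not in covered]            # NA_1..NA_PA
    if not leftovers:
        return z1                                 # all entities of A are in TAA
    tail = fold(leftovers, relation, number_of)   # NA_1 if PA = 1, else IDN_PA
    return number_of((z1, relation, tail))


taa = [("mobile phone", "color", "purple")]
# PA = 1: only "2014" lies outside TAA.
za_one = composition_entity(["2014", "purple", "mobile phone"], taa, E0, make_numberer())
# PA = 2: "2014" and the made-up entity "imported" lie outside TAA.
number_of = make_numberer()
za_two = composition_entity(["2014", "imported", "purple", "mobile phone"], taa, E0, number_of)
print(za_one, za_two)
```

The same routine could serve the object side by passing B, TBB and a label for $E_1$, since the subject and object procedures mirror each other.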

According to the invention, if all entities in B are in TBB, then $ZB = ZB_1$, where $ZB_1$ is the first composition entity corresponding to the object of the target text; when $QB = 1$, $ZB_1$ is $IDBB_1$; when $QB \geq 2$, $ZB_1$ is $IDSB_{1,QB}$, which is obtained as follows:

S410, obtaining the number $IDSB_{1,2}$ of the triple $(IDBB_1, E_1, IDBB_2)$, the number being used to uniquely identify $(IDBB_1, E_1, IDBB_2)$; $E_1$ is a second preset relationship, $IDBB_1$ is the number of $TBB_1$, $TBB_1$ is the 1st triple corresponding to the object of the target text, $IDBB_2$ is the number of $TBB_2$, and $TBB_2$ is the 2nd triple corresponding to the object of the target text; if $QB = 2$, $IDSB_{1,QB}$ is $IDSB_{1,2}$; if $QB > 2$, proceed to S420.

According to the invention, if both entities of a triple are in the object of the target text, the triple is assigned a number that can be used to refer to it. The number is treated as an entity that can take part in subsequent triple construction, so that the finally constructed knowledge graph accurately represents the semantics of the target text.

According to the invention, the second preset relationship indicates that the two corresponding entities are in a common-object relationship, that is, both entities are objects.

S420, obtaining the number $IDSB_{1,QB}$ of the triple $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, the number being used to uniquely identify $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$; $IDSB_{1,QB-1}$ is the number of the triple preceding $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, $IDBB_{QB}$ is the number of $TBB_{QB}$, and $TBB_{QB}$ is the QB-th triple corresponding to the object of the target text.

It should be understood that after $IDSB_{1,2}$ is obtained, $IDSB_{1,3}, \ldots, IDSB_{1,QB}$ can be obtained in sequence, where $IDSB_{1,3}$ is the number of the triple $(IDSB_{1,2}, E_1, IDBB_3)$ and $IDSB_{1,QB}$ is the number of the triple $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$; $IDBB_3$ is the number of $TBB_3$, $TBB_3$ is the 3rd triple corresponding to the object of the target text, $IDSB_{1,QB-1}$ is the number of the preceding triple $(IDSB_{1,QB-2}, E_1, IDBB_{QB-1})$, $IDSB_{1,QB-2}$ is the number of the triple $(IDSB_{1,QB-3}, E_1, IDBB_{QB-2})$, and so on.

According to the invention, if there are entities in B that are not in TBB and the number PB of such entities satisfies $PB \geq 2$, then ZB is the number of the triple $(ZB_1, E_1, IDNB_{PB})$, where $IDNB_{PB}$ is obtained as follows:

S411, obtaining the number $IDNB_2$ of the triple $(NB_1, E_1, NB_2)$, the number being used to uniquely identify $(NB_1, E_1, NB_2)$; $NB_1$ is the 1st entity in B that is not in TBB, and $NB_2$ is the 2nd entity in B that is not in TBB; if $PB = 2$, $IDNB_{PB}$ is $IDNB_2$; if $PB > 2$, proceed to S412.

S412, obtaining the number $IDNB_{PB}$ of the triple $(IDNB_{PB-1}, E_1, NB_{PB})$, the number being used to uniquely identify $(IDNB_{PB-1}, E_1, NB_{PB})$; $NB_{PB}$ is the PB-th entity in B that is not in TBB, and $IDNB_{PB-1}$ is the number of the triple preceding $(IDNB_{PB-1}, E_1, NB_{PB})$.

It should be understood that after $IDNB_2$ is obtained, $IDNB_3, \ldots, IDNB_{PB}$ can be obtained in sequence, where $IDNB_3$ is the number of the triple $(IDNB_2, E_1, NB_3)$ and $IDNB_{PB}$ is the number of the triple $(IDNB_{PB-1}, E_1, NB_{PB})$; $NB_3$ is the 3rd entity in B that is not in TBB, $IDNB_{PB-1}$ is the number of the preceding triple $(IDNB_{PB-2}, E_1, NB_{PB-1})$, $IDNB_{PB-2}$ is the number of the triple $(IDNB_{PB-3}, E_1, NB_{PB-2})$, and so on.

According to the invention, if there is an entity in B that is not in TBB and $PB = 1$, then ZB is the number of the triple $(ZB_1, E_1, NB_1)$.

S500, obtaining a target triple $T = (X_1, EAB, X_2)$ of the target text: when $TAA \neq \varnothing$, $X_1 = ZA$; when $TAA = \varnothing$ and $N = 1$, $X_1 = A_1$; when $TBB \neq \varnothing$, $X_2 = ZB$; when $TBB = \varnothing$ and $M = 1$, $X_2 = B_1$.

According to the invention, when $TAA = \varnothing$ and $N \geq 2$, $X_1 = ID_{A,N}$, where $ID_{A,N}$ is obtained as follows:

S510, obtaining the number $ID_{A,2}$ of the triple $(A_1, E_0, A_2)$, the number being used to uniquely identify $(A_1, E_0, A_2)$; $A_1$ is the 1st entity in A and $A_2$ is the 2nd entity in A; if $N = 2$, $ID_{A,N}$ is $ID_{A,2}$; if $N > 2$, proceed to S520.

S520, obtaining the number $ID_{A,N}$ of the triple $(ID_{A,N-1}, E_0, A_N)$, the number being used to uniquely identify $(ID_{A,N-1}, E_0, A_N)$; $A_N$ is the N-th entity in A, and $ID_{A,N-1}$ is the number of the triple preceding $(ID_{A,N-1}, E_0, A_N)$.

It should be understood that after $ID_{A,2}$ is obtained, $ID_{A,3}, \ldots, ID_{A,N}$ can be obtained in sequence, where $ID_{A,3}$ is the number of the triple $(ID_{A,2}, E_0, A_3)$ and $ID_{A,N}$ is the number of the triple $(ID_{A,N-1}, E_0, A_N)$; $A_3$ is the 3rd entity in A, $ID_{A,N-1}$ is the number of the preceding triple $(ID_{A,N-2}, E_0, A_{N-1})$, $ID_{A,N-2}$ is the number of the triple $(ID_{A,N-3}, E_0, A_{N-2})$, and so on.

According to the invention, when $TBB = \varnothing$ and $M \geq 2$, $X_2 = ID_{B,M}$, where $ID_{B,M}$ is obtained as follows:

S511, obtaining the number $ID_{B,2}$ of the triple $(B_1, E_1, B_2)$, the number being used to uniquely identify $(B_1, E_1, B_2)$; $B_1$ is the 1st entity in B and $B_2$ is the 2nd entity in B; if $M = 2$, $ID_{B,M}$ is $ID_{B,2}$; if $M > 2$, proceed to S521.

S521, obtaining the number $ID_{B,M}$ of the triple $(ID_{B,M-1}, E_1, B_M)$, the number being used to uniquely identify $(ID_{B,M-1}, E_1, B_M)$; $B_M$ is the M-th entity in B, and $ID_{B,M-1}$ is the number of the triple preceding $(ID_{B,M-1}, E_1, B_M)$.

It should be understood that after $ID_{B,2}$ is obtained, $ID_{B,3}, \ldots, ID_{B,M}$ can be obtained in sequence, where $ID_{B,3}$ is the number of the triple $(ID_{B,2}, E_1, B_3)$ and $ID_{B,M}$ is the number of the triple $(ID_{B,M-1}, E_1, B_M)$; $B_3$ is the 3rd entity in B, $ID_{B,M-1}$ is the number of the preceding triple $(ID_{B,M-2}, E_1, B_{M-1})$, $ID_{B,M-2}$ is the number of the triple $(ID_{B,M-3}, E_1, B_{M-2})$, and so on.
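The case analysis of S500 can be sketched as a small selection function. In the Python sketch below, `make_numberer`, `fold_entities`, `target_triple`, the `"#k"` labels, and the labels `"co-subject"` and `"co-object"` for $E_0$ and $E_1$ are hypothetical, and ZA and ZB are assumed to have been computed beforehand (passed as `None` when the corresponding triple set is empty). The sample sentence "Alice and Bob play chess" is invented to show the $TAA = \varnothing$, $N = 2$ case.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]
E0, E1 = "co-subject", "co-object"  # placeholder labels for E_0 and E_1


def make_numberer() -> Callable[[Triple], str]:
    """Return a function that maps each distinct triple to a unique label '#k'."""
    table: Dict[Triple, str] = {}

    def number_of(triple: Triple) -> str:
        return table.setdefault(triple, f"#{len(table) + 1}")

    return number_of


def fold_entities(entities: Sequence[Hashable], relation: str, number_of) -> Hashable:
    """Return the single entity when there is one, else the number built by S510/S520."""
    acc = entities[0]
    for entity in entities[1:]:
        acc = number_of((acc, relation, entity))
    return acc


def target_triple(
    a: Sequence[Hashable],
    za: Optional[Hashable],
    eab: str,
    b: Sequence[Hashable],
    zb: Optional[Hashable],
    number_of,
) -> Triple:
    """S500: T = (X_1, EAB, X_2); za or zb is None when TAA or TBB is empty."""
    x1 = za if za is not None else fold_entities(a, E0, number_of)
    x2 = zb if zb is not None else fold_entities(b, E1, number_of)
    return (x1, eab, x2)


number_of = make_numberer()
# TAA and TBB are both empty here; N = 2 and M = 1.
t = target_triple(["Alice", "Bob"], None, "play", ["chess"], None, number_of)
print(t)
```

In this run $X_1$ is the number of the triple (Alice, $E_0$, Bob), so the target triple states that the two subjects jointly stand in the relation EAB to the object.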

As a first specific embodiment, the target text is "the purple mobile phone is Joe's designated limited edition", where "the purple mobile phone" is the subject of the target text and "Joe's designated limited edition" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model yield TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, designates, limited edition)}, QB = 1, and TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1 and ZA is also number 1. In S400, if the number of the triple (Joe, designates, limited edition) is 2, then ZB is number 2. In S500, T = (number 1, is, number 2) is obtained; it should be understood that number 1 refers to the purple mobile phone and number 2 refers to the limited edition designated by Joe.

As a second specific embodiment, the target text is "the 2014 purple mobile phone is Joe's designated limited edition", where "the 2014 purple mobile phone" is the subject of the target text and "Joe's designated limited edition" is the object of the target text. In S100, A = (2014, purple, mobile phone), N = 3, B = (Joe, limited edition), M = 2 are obtained. In S200, a syntactic analysis model and a relation extraction model yield TAA = {(mobile phone, color, purple)}, QA = 1, TBB = {(Joe, designates, limited edition)}, QB = 1, and TAB = (mobile phone, is, limited edition). In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1 and ZA is the number of the triple $(ZA_1, E_0, 2014)$; if the number of the triple $(ZA_1, E_0, 2014)$ is 3, then ZA is number 3. In S400, if the number of the triple (Joe, designates, limited edition) is 2, then ZB is number 2. In S500, T = (number 3, is, number 2) is obtained; it should be understood that number 3 refers to the 2014 purple mobile phone and number 2 refers to the limited edition designated by Joe.

As a third specific embodiment, the target text is "the purple mobile phone is a limited edition", where "the purple mobile phone" is the subject of the target text and "a limited edition" is the object of the target text. In S100, A = (purple, mobile phone), N = 2, B = (limited edition), M = 1 are obtained. In S200, a syntactic analysis model and a relation extraction model yield TAA = {(mobile phone, color, purple)}, QA = 1, and $TBB = \varnothing$. In S300, if the number of the triple (mobile phone, color, purple) is 1, then $ZA_1$ is number 1 and ZA is also number 1. In S400, since $TBB = \varnothing$ and M = 1, the object is represented directly by the entity "limited edition". In S500, T = (number 1, is, limited edition) is obtained; it should be understood that number 1 refers to the purple mobile phone.
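The three embodiments can be replayed with a compact end-to-end sketch in Python. Everything named here is an assumption made for illustration: the `Numbering` class, the `fold`, `side` and `process` functions, the string labels `"number k"`, and the placeholders `"E0"` and `"E1"` for the two preset relationships. The sketch numbers the extracted triples before building any further triples, which is one way to reproduce the numbers used in the embodiments; the source only requires that each number uniquely identify its triple.

```python
from typing import Dict, Hashable, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]
E0, E1 = "E0", "E1"  # placeholder labels for the two preset relationships


class Numbering:
    """Maps each distinct triple to 'number k' in order of first use (hypothetical)."""

    def __init__(self) -> None:
        self.table: Dict[Triple, str] = {}

    def __call__(self, triple: Triple) -> str:
        return self.table.setdefault(triple, f"number {len(self.table) + 1}")


def fold(items: Sequence[Hashable], relation: str, number: Numbering) -> Hashable:
    """Left fold: one item is returned as is, several items are nested into triples."""
    acc = items[0]
    for item in items[1:]:
        acc = number((acc, relation, item))
    return acc


def side(entities, triples, relation, number):
    """X_1 or X_2 for one side of the sentence (S300/S400 plus the S500 cases)."""
    if not triples:  # TAA (or TBB) is empty: fold the raw entities
        return fold(entities, relation, number)
    z1 = fold([number(t) for t in triples], relation, number)
    covered = {e for head, _, tail in triples for e in (head, tail)}
    rest = [e for e in entities if e not in covered]
    if not rest:
        return z1
    return number((z1, relation, fold(rest, relation, number)))


def process(a, taa, eab, b, tbb):
    """Return the target triple T = (X_1, EAB, X_2)."""
    number = Numbering()
    for t in list(taa) + list(tbb):  # number the extracted triples first
        number(t)
    return (side(a, taa, E0, number), eab, side(b, tbb, E1, number))


phone = ("mobile phone", "color", "purple")
joe = ("Joe", "designates", "limited edition")
obj = ["Joe", "limited edition"]

t1 = process(["purple", "mobile phone"], [phone], "is", obj, [joe])
t2 = process(["2014", "purple", "mobile phone"], [phone], "is", obj, [joe])
t3 = process(["purple", "mobile phone"], [phone], "is", ["limited edition"], [])
print(t1, t2, t3, sep="\n")
```

Under these assumptions the three calls return the target triples stated in the first, second and third embodiments respectively.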

Although some specific embodiments of the present invention have been described in detail by way of illustration, it should be understood by those skilled in the art that the above illustration is only for the purpose of illustration and is not intended to limit the scope of the invention. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims

1. A text processing method based on a knowledge graph, characterized by comprising the following steps:

S100, acquiring an entity set $\{A, B\}$ in the target text, where $A = (A_1, A_2, \ldots, A_n, \ldots, A_N)$, $A_n$ is the n-th entity identified from front to back in the subject of the target text, n ranges from 1 to N, N is the number of entities included in the subject of the target text, and $N \geq 1$; $B = (B_1, B_2, \ldots, B_m, \ldots, B_M)$, $B_m$ is the m-th entity identified from front to back in the object of the target text, m ranges from 1 to M, M is the number of entities included in the object of the target text, and $M \geq 1$; the target text is a sentence comprising a subject, a predicate and an object;

S200, acquiring a triple set $\{TAA, TBB, TAB\}$ of the target text, wherein:

$TAA = \varnothing$ or $TAA = \{TAA_1, TAA_2, \ldots, TAA_i, \ldots, TAA_{QA}\} \neq \varnothing$, where $TAA_i$ is the i-th triple corresponding to the subject of the target text, i ranges from 1 to QA, and QA is the number of triples corresponding to the subject of the target text; $TAA_i = (A_{i,1}, EA_i, A_{i,2})$, where $A_{i,1}$ is the first entity included in $TAA_i$, $A_{i,2}$ is the second entity included in $TAA_i$, and $EA_i$ is the relationship between entity $A_{i,1}$ and entity $A_{i,2}$;

$TBB = \varnothing$ or $TBB = \{TBB_1, TBB_2, \ldots, TBB_j, \ldots, TBB_{QB}\} \neq \varnothing$, where $TBB_j$ is the j-th triple corresponding to the object of the target text, j ranges from 1 to QB, and QB is the number of triples corresponding to the object of the target text; $TBB_j = (B_{j,1}, EB_j, B_{j,2})$, where $B_{j,1}$ is the first entity included in $TBB_j$, $B_{j,2}$ is the second entity included in $TBB_j$, and $EB_j$ is the relationship between entity $B_{j,1}$ and entity $B_{j,2}$;

$TAB = (A_N, EAB, B_M)$, where $A_N$ is the N-th entity in the subject of the target text, $B_M$ is the M-th entity in the object of the target text, and EAB is the relationship between entity $A_N$ and entity $B_M$;

S300, if $TAA \neq \varnothing$, acquiring a composition entity ZA corresponding to the subject of the target text according to IDAA, where $IDAA = (IDAA_1, IDAA_2, \ldots, IDAA_i, \ldots, IDAA_{QA})$ and $IDAA_i$ is the number of $TAA_i$, used to uniquely identify $TAA_i$;

S400, if $TBB \neq \varnothing$, acquiring a composition entity ZB corresponding to the object of the target text according to IDBB, where $IDBB = (IDBB_1, IDBB_2, \ldots, IDBB_j, \ldots, IDBB_{QB})$ and $IDBB_j$ is the number of $TBB_j$, used to uniquely identify $TBB_j$;

2. The method according to claim 1, wherein in S300, acquiring the composition entity ZA corresponding to the subject of the target text according to IDAA comprises: if all entities in A are in TAA, then $ZA = ZA_1$, where $ZA_1$ is the first composition entity corresponding to the subject of the target text; when $QA = 1$, $ZA_1$ is $IDAA_1$; when $QA \geq 2$, $ZA_1$ is $IDSA_{1,QA}$, which is obtained as follows:

S310, obtaining the number $IDSA_{1,2}$ of the triple $(IDAA_1, E_0, IDAA_2)$, the number being used to uniquely identify $(IDAA_1, E_0, IDAA_2)$, where $E_0$ is a first preset relationship; if $QA = 2$, $IDSA_{1,QA}$ is $IDSA_{1,2}$; if $QA > 2$, proceeding to S320;

S320, obtaining the number $IDSA_{1,QA}$ of the triple $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$, the number being used to uniquely identify $(IDSA_{1,QA-1}, E_0, IDAA_{QA})$; $IDSA_{1,QA-1}$ is the number of the triple $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$ and is used to uniquely identify $(IDSA_{1,QA-2}, E_0, IDAA_{QA-1})$; $IDSA_{1,QA-2}$ is the number of a triple obtained by the same method as $IDSA_{1,QA-1}$.

3. The method according to claim 2, wherein in S300, acquiring the composition entity ZA corresponding to the subject of the target text according to IDAA comprises: if there are entities in A that are not in TAA and the number PA of such entities satisfies $PA \geq 2$, then ZA is the number of the triple $(ZA_1, E_0, IDN_{PA})$, the number being used to uniquely identify $(ZA_1, E_0, IDN_{PA})$, where $IDN_{PA}$ is obtained as follows:

S311, obtaining the number $IDN_2$ of the triple $(NA_1, E_0, NA_2)$, the number being used to uniquely identify $(NA_1, E_0, NA_2)$; $NA_1$ is the 1st entity in A that is not in TAA, and $NA_2$ is the 2nd entity in A that is not in TAA; if $PA = 2$, $IDN_{PA}$ is $IDN_2$; if $PA > 2$, proceeding to S312;

S312, obtaining the number $IDN_{PA}$ of the triple $(IDN_{PA-1}, E_0, NA_{PA})$, the number being used to uniquely identify $(IDN_{PA-1}, E_0, NA_{PA})$; $NA_{PA}$ is the PA-th entity in A that is not in TAA; $IDN_{PA-1}$ is the number of the triple $(IDN_{PA-2}, E_0, NA_{PA-1})$ and is used to uniquely identify $(IDN_{PA-2}, E_0, NA_{PA-1})$; $IDN_{PA-2}$ is the number of a triple obtained by the same method as $IDN_{PA-1}$.

4. The method according to claim 3, wherein in S300, acquiring the composition entity ZA corresponding to the subject of the target text according to IDAA comprises: if there is an entity in A that is not in TAA and $PA = 1$, then ZA is the number of the triple $(ZA_1, E_0, NA_1)$, the number being used to uniquely identify $(ZA_1, E_0, NA_1)$.

5. The method of claim 2, wherein the first predetermined relationship is used to indicate that the two corresponding entities are in a common subject relationship.

6. The method of claim 1, wherein S500 further comprises: when $TAA = \varnothing$ and $N \geq 2$, $X_1 = ID_{A,N}$, where $ID_{A,N}$ is obtained as follows:

S510, obtaining the number $ID_{A,2}$ of the triple $(A_1, E_0, A_2)$, the number being used to uniquely identify $(A_1, E_0, A_2)$; if $N = 2$, $ID_{A,N}$ is $ID_{A,2}$; if $N > 2$, proceeding to S520;

S520, obtaining the number $ID_{A,N}$ of the triple $(ID_{A,N-1}, E_0, A_N)$, the number being used to uniquely identify $(ID_{A,N-1}, E_0, A_N)$; $ID_{A,N-1}$ is the number of the triple $(ID_{A,N-2}, E_0, A_{N-1})$ and is used to uniquely identify $(ID_{A,N-2}, E_0, A_{N-1})$; $ID_{A,N-2}$ is the number of a triple obtained by the same method as $ID_{A,N-1}$.

7. The method of claim 1, wherein in S400, acquiring the composition entity ZB corresponding to the object of the target text according to IDBB comprises: if all entities in B are in TBB, then $ZB = ZB_1$, where $ZB_1$ is the first composition entity corresponding to the object of the target text; when $QB = 1$, $ZB_1$ is $IDBB_1$; when $QB \geq 2$, $ZB_1$ is $IDSB_{1,QB}$, which is obtained as follows:

S410, obtaining the number $IDSB_{1,2}$ of the triple $(IDBB_1, E_1, IDBB_2)$, the number being used to uniquely identify $(IDBB_1, E_1, IDBB_2)$, where $E_1$ is a second preset relationship; if $QB = 2$, $IDSB_{1,QB}$ is $IDSB_{1,2}$; if $QB > 2$, proceeding to S420;

S420, obtaining the number $IDSB_{1,QB}$ of the triple $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$, the number being used to uniquely identify $(IDSB_{1,QB-1}, E_1, IDBB_{QB})$; $IDSB_{1,QB-1}$ is the number of the triple $(IDSB_{1,QB-2}, E_1, IDBB_{QB-1})$ and is used to uniquely identify $(IDSB_{1,QB-2}, E_1, IDBB_{QB-1})$; $IDSB_{1,QB-2}$ is the number of a triple obtained by the same method as $IDSB_{1,QB-1}$.

8. The method of claim 7, wherein the second predetermined relationship is used to indicate that the two corresponding entities are in a common object relationship.

9. The method of claim 1, wherein S500 further comprises: when $TBB = \varnothing$ and $M \geq 2$, $X_2 = ID_{B,M}$, where $ID_{B,M}$ is obtained as follows:

S511, obtaining the number $ID_{B,2}$ of the triple $(B_1, E_1, B_2)$, the number being used to uniquely identify $(B_1, E_1, B_2)$; if $M = 2$, $ID_{B,M}$ is $ID_{B,2}$; if $M > 2$, proceeding to S521;

S521, obtaining the number $ID_{B,M}$ of the triple $(ID_{B,M-1}, E_1, B_M)$, the number being used to uniquely identify $(ID_{B,M-1}, E_1, B_M)$; $ID_{B,M-1}$ is the number of the triple $(ID_{B,M-2}, E_1, B_{M-1})$ and is used to uniquely identify $(ID_{B,M-2}, E_1, B_{M-1})$; $ID_{B,M-2}$ is the number of a triple obtained by the same method as $ID_{B,M-1}$.