Detailed Description of the Embodiments
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
Structured information is the information managed by the databases we usually work with, including records of production, business, transactions, customer information, and so on. Unstructured information, technically termed content, covers a broader range and can be divided into: operational content, such as contracts, invoices, letters, and purchase records; departmental content, such as word-processing documents, spreadsheets, presentation files, and e-mail; Web content, such as information in HTML and XML formats; and multimedia content, such as audio, film, and graphics.
The massive amount of information on the Internet can be roughly divided into three kinds: structured, semi-structured, and unstructured. The same is true of Chinese text information. In structured information, such as e-commerce information, the positions at which the nature and quantity of the information appear are fixed. In semi-structured information, such as the sub-channels of professional websites, the grammar of titles and body text is fairly standardized and the range of keywords is fairly limited. In unstructured information, such as blogs and BBS posts, all of the content is unpredictable.
Most current enterprise association recognition techniques rely on normalized, structured Chinese text information and lack a reasonably accurate recognition method for unstructured Chinese text information. The embodiments of the present application therefore provide a method for extracting Chinese entity association relationships. Referring to Fig. 1, the method includes:
Step 100: extract the relation words in the text. Every segment of Chinese text that is worth relationship extraction necessarily contains relation words, so before the entity association relationships are extracted, the relation words in the text are extracted first in order to determine which relationships are present. In general, a relation word may be a noun or a verb.
Chinese text information is a sequence of words combined to carry a certain meaning. For semantically complex Chinese text, extracting the entities that have an association relationship first requires determining which relationship or relationships exist in the text, so that the extraction can be accurate.
Optionally, after the relation words in the text are extracted, it is also judged whether each relation word exists in a predefined relation library. The predefined relation library holds a very large number of relation words, all obtained from a large volume of previously processed text information. It may also hold the cooperating words associated with each relation word, the attributes of each relation word, the relationship corresponding to each relation word, and so on. Each relation word has specific attributes, a specific relationship, and a specific positional relationship to its entities. The predefined relation library provides a reference for extracting relation words from Chinese text. If an extracted relation word is in the library, its attributes and other parameters can be retrieved directly from the library. This avoids re-establishing the attributes and parameters of the relation word, so the early stage of the extraction process is faster. In addition, because earlier relation word attributes and parameters are available for comparison, later relation word extraction and attribute acquisition can be more accurate.
Further, the attributes of a relation word include its meaning, its part of speech, its relation property, and so on. From these attributes, the specific positions of the entities related to the relation word can be derived. All of this is stored in the predefined relation library in advance so that it can be used quickly when relation words are extracted.
If the relation word is present in the predefined relation library, the relation property of the relation word is determined. In general, the semantic relationships in Chinese text depend mainly on the relation properties of the relation words, so after the relation words are extracted, their relation properties are also judged, and the text is processed further according to those properties.
If the relation word is not found in the predefined relation library, the relation word has not been stored in the library before, and no other information related to it can be found there either. In this case, one option is to abandon further processing of the relation word, that is, to judge it invalid. The other option is to establish the information related to the relation word in the predefined relation library, including the cooperating words associated with it, its attributes, and its corresponding relationship, and then to proceed to the next step for that relation word. Because the information about the relation word has now been established, the next time the relation word is encountered during extraction, its information can be retrieved quickly from the library. The process of establishing relation word information thus enriches the predefined relation library and makes its content more comprehensive.
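The lookup-or-register behaviour described above can be sketched as follows. This is only an illustrative sketch: the `RelationInfo` fields, the `lookup_or_register` helper, and the library contents (English glosses of the relation words) are assumptions made for the example, not structures specified by the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RelationInfo:
    """Attributes stored for one relation word (field names are illustrative)."""
    part_of_speech: str                 # "verb" or "noun"
    relation_property: str              # e.g. "verb_active", "noun_forward"
    cooperating_words: List[str] = field(default_factory=list)


def lookup_or_register(
    word: str,
    library: Dict[str, RelationInfo],
    new_info: Optional[RelationInfo] = None,
) -> Optional[RelationInfo]:
    """Return the stored info for `word`.

    If the word is missing, either discard it (return None) or, when
    `new_info` is supplied, register it so that later lookups succeed.
    """
    info = library.get(word)
    if info is not None:
        return info
    if new_info is None:
        return None           # treat the relation word as invalid
    library[word] = new_info  # enrich the library for future extractions
    return new_info


# Hypothetical library contents, keyed by English glosses of relation words.
library = {
    "acquire": RelationInfo("verb", "verb_active"),
    "under": RelationInfo("noun", "noun_forward"),
}
```

Calling `lookup_or_register("merge", library)` returns `None` (the discard branch), whereas passing a `RelationInfo` as the third argument stores the new entry and returns it.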
Step 101: if the number of extracted relation words is greater than 1, determine the relation property of each relation word. More than one extracted relation word means that the text contains more than one entity association relationship. In this case, the relation property of each relation word must be made clear so that the entity association relationships can subsequently be extracted from the text according to those properties.
In addition, after the relation words are extracted, all relation words in the text may be stored together to generate a relation word set. One relation word set corresponds to one segment of text, and the order of the relation words in the set is consistent with the order in which they appear in the text. The relation word set also records the cooperating words associated with each relation word, the relation property of each relation word, and the relation property of each cooperating word.
After the relation word set is generated, the target agent entity and target patient entity corresponding to each relation word can be extracted from the text in turn, according to the order of the relation words in the set and their relation properties. This approach extracts entities both accurately and in an orderly manner and, for Chinese text with complex relationships, also saves considerable entity extraction time and improves efficiency.
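A relation word set of this kind might be built as in the sketch below. The token list is a hypothetical pre-segmented sentence written as English glosses in Chinese word order, and the record fields and the `build_relation_word_set` helper are illustrative assumptions rather than the application's own data structures.

```python
from typing import Dict, List, NamedTuple, Optional


class RelationWordRecord(NamedTuple):
    word: str                        # relation word (or relation word body)
    position: int                    # index of the word in the token list
    relation_property: str           # e.g. "verb_active"
    cooperating_word: Optional[str]  # e.g. "by" for a passive relation, else None


def build_relation_word_set(
    tokens: List[str],
    properties: Dict[str, str],
    cooperating: Optional[Dict[str, str]] = None,
) -> List[RelationWordRecord]:
    """Collect relation words in the order in which they appear in the text."""
    cooperating = cooperating or {}
    records = []
    for position, token in enumerate(tokens):
        if token in properties:
            records.append(
                RelationWordRecord(
                    token, position, properties[token], cooperating.get(token)
                )
            )
    return records


# Hypothetical segmented text (English glosses, Chinese word order).
tokens = ["last year", "Wanda Group", "under", "Wanda Sports", "acquire",
          "World Ironman Company", "under", "IRONMAN series events"]
properties = {"under": "noun_forward", "acquire": "verb_active"}

relation_word_set = build_relation_word_set(tokens, properties)
```

Because the records are appended while scanning the tokens from left to right, the set preserves the order of appearance, which is the property the later extraction steps rely on.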
Step 102: according to the relation property of each relation word, extract from the text, in turn, the target agent entity and target patient entity corresponding to each relation word. Relation properties are usually divided into the verb active relationship, the noun forward relationship, the verb passive relationship, the noun inverse relationship, and so on. The relation word in a verb active relationship is usually an active verb, for example "acquire", "merge", and "increase investment in". The relation word in a noun forward relationship is usually a forward noun, for example "controlling shareholder" and "investor". The relation word in a verb passive relationship usually consists of two parts, a cooperating word and a relation word body. The cooperating word marks the passive relationship, and the relation word body is still a verb, for example "by ... acquire" and "by ... merge"; here the two Chinese passive markers (both rendered as "by") are the cooperating words, and "acquire" and "merge" are the relation word bodies. The relation word in a noun inverse relationship is likewise divided into a cooperating word and a relation word body. The cooperating word marks the inverse relationship, and the relation word body is a noun, for example "as ... controlling shareholder" and "become ... shareholder", where "as" and "become" are the cooperating words that mark the noun inverse relationship, and "controlling shareholder" and "shareholder" are the relation word bodies.
Each relation word generally has one agent entity and one patient entity. The agent entity is the active party in the entity association relationship, and the patient entity is the passive party; that is, the agent entity is the subject of the relation word and the patient entity is its object. In text with complex relationships there are multiple relation words, and the agent and patient entities of each relation word are related to the agent and patient entities of the other relation words near it. The target agent entity and target patient entity of the target relation word therefore need to be determined with reference to the agent and patient entities of the other relation words in the text.
It is worth noting that the positions of the agent entity and patient entity corresponding to each relation word are usually fixed, and the specific positions vary with the relation property. In a verb active relationship, the agent entity lies before the relation word and the patient entity lies after it; for example, in "A acquires B", A is the agent entity and B is the patient entity. In a verb passive relationship, the patient entity lies before the cooperating word, and the agent entity lies between the cooperating word and the relation word body; for example, in "B is acquired by A" (Chinese word order: B, "by", A, "acquire"), B is the patient entity and A is the agent entity. In a noun forward relationship, as in a verb active relationship, the agent entity lies before the relation word and the patient entity lies after it; for example, in "the acquirer of A is B" (Chinese word order: A, "acquirer", B), A is the agent entity and B is the patient entity. In a noun inverse relationship, as in a verb passive relationship, the patient entity lies before the cooperating word, and the agent entity lies between the cooperating word and the relation word body; for example, in "B as the acquirer of A" (Chinese word order: B, "as", A, "acquirer"), B is the patient entity and A is the agent entity.
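The positional rules for the four relation properties can be summarized as a small lookup table. The sketch below is illustrative only; the property names and slot labels are hypothetical identifiers chosen for this example.

```python
# Where the agent and patient entities lie for each relation property.
# Keys and slot labels are illustrative names, not terms fixed by the application.
ROLE_POSITIONS = {
    "verb_active": {
        "agent": "before_relation_word",
        "patient": "after_relation_word",
    },
    "noun_forward": {
        "agent": "before_relation_word",
        "patient": "after_relation_word",
    },
    "verb_passive": {
        "agent": "between_cooperating_word_and_body",
        "patient": "before_cooperating_word",
    },
    "noun_inverse": {
        "agent": "between_cooperating_word_and_body",
        "patient": "before_cooperating_word",
    },
}


def role_positions(relation_property: str) -> dict:
    """Return the agent/patient slots for a relation property."""
    return ROLE_POSITIONS[relation_property]
```

Keeping the rules in one table makes it visible that the verb active and noun forward properties share one layout, while the verb passive and noun inverse properties share the other.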
Step 103: generate the Chinese entity association relationship from the relation word and the target agent entity and target patient entity corresponding to the relation word.
It is worth noting that, in the technical solution of the present application, when more than one relation word is extracted, the relation property of each relation word is determined, the target agent entity and target patient entity corresponding to each relation word are then extracted from the text in turn according to those properties, and the corresponding Chinese entity association relationships are generated. The technical solution still applies when a segment of text contains only one relation word. Compared with text with complex relationships, processing text with a single relation word is simpler: the relation properties of other relation words and the positions of their entities need not be considered, and only the relation word itself is judged and its entities extracted. For example, to extract the entity association relationship from the text "Wanda Sports acquires the IRONMAN series events", the relation word "acquire" is extracted first and judged to be a verb active relationship. Then, according to the positions of the target agent entity and target patient entity in a verb active relationship, the target agent entity "Wanda Sports" and the target patient entity "IRONMAN series events" are extracted. Finally, the Chinese entity association relationship "Wanda Sports->acquire->IRONMAN series events" is generated.
The method for extracting Chinese entity association relationships provided by the embodiments of the present application extracts, according to the relation property of a relation word in Chinese text, the target agent entity and target patient entity related to that relation word, and then generates the corresponding Chinese entity association relationship from the relation word and its target agent and patient entities. The technical solution divides unstructured Chinese text into different phrases according to the different relation properties, which narrows the range of positions in which the target agent entity and target patient entity of each relation word can lie. This improves search precision and search speed and reduces the amount of computation. In addition, the technical solution applies division rules at the level of Chinese grammar, which filters out many redundant erroneous relation words and erroneous entities and improves the accuracy of relation word and entity extraction.
In a preferred embodiment of the present application, step 102 is explained further, taking the verb active relationship as an example. As shown in Fig. 2, step 102 may specifically include:
Step 201: if the relation property of the relation word is a verb active relationship, find in the text a first target relation word, which lies before the relation word and is nearest to it, and a second target relation word, which lies after the relation word and is farthest from it.
Take the text "Last year, Wanda Sports, under Wanda Group, acquired the IRONMAN series events under World Ironman Company" as an example (Chinese word order: last year, Wanda Group, "under", Wanda Sports, "acquire", World Ironman Company, "under", IRONMAN series events). The text contains three relation words: "under", "acquire", and "under". This preferred embodiment concerns the verb active relationship, so after the relation properties of the three relation words are judged, "under" is determined to be a noun forward relationship and "acquire" a verb active relationship. As described in step 201, the relation word that lies before "acquire" and is nearest to it is the first "under", so the first target relation word is "under". The relation word that lies after "acquire" and is farthest from it is the second "under", so the second target relation word is also "under".
Step 202: extract from the text the first patient entity of the first target relation word and the second patient entity of the second target relation word.
Because the first target relation word "under" is a noun forward relationship, its first patient entity lies after "under" and before "acquire", and its first agent entity lies before "under". Once these positions are determined, the first agent entity lies in the text segment "last year, Wanda Group"; after entity recognition, "Wanda Group" is determined to be the first agent entity of "under". The first patient entity lies in the text segment "Wanda Sports"; after recognition, "Wanda Sports" is determined to be the first patient entity of "under".
The second target relation word is also "under". Its second agent entity lies in the text segment "World Ironman Company" between "acquire" and "under"; after entity recognition, the second agent entity is determined to be "World Ironman Company". Its second patient entity lies in the text segment "IRONMAN series events" after "under"; after recognition, the second patient entity is determined to be "IRONMAN series events".
Step 203: take the first patient entity as the target agent entity of the relation word, and the second patient entity as the target patient entity of the relation word.
After steps 201 and 202, the target agent entity of "acquire" is therefore "Wanda Sports", and the target patient entity of "acquire" is "IRONMAN series events".
Then, according to step 103, the Chinese entity association relationship "Wanda Sports->acquire->IRONMAN series events" is generated from the relation word "acquire", its target agent entity "Wanda Sports", and its target patient entity "IRONMAN series events".
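Steps 201 to 203 can be sketched on a pre-tagged token list as shown below. The sketch makes several simplifying assumptions that are not part of the application: tokens are already segmented and labelled as entities, relation words, or other text; tokens are English glosses in Chinese word order; both neighbouring relation words are noun forward relationships, so each one's patient entity is simply the first entity after it; and the helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    kind: str  # "entity", "relation", or "other"


def first_entity_after(tokens: List[Token], index: int,
                       stop: Optional[int] = None) -> Optional[str]:
    """Return the first entity after `index` (and before `stop`, if given)."""
    end = len(tokens) if stop is None else stop
    for i in range(index + 1, end):
        if tokens[i].kind == "entity":
            return tokens[i].text
    return None


def extract_verb_active(tokens: List[Token],
                        rel_index: int) -> Optional[Tuple[str, str, str]]:
    """Steps 201-203 for a verb active relation word at `rel_index`."""
    before = [i for i in range(rel_index) if tokens[i].kind == "relation"]
    after = [i for i in range(rel_index + 1, len(tokens))
             if tokens[i].kind == "relation"]
    if not before or not after:
        return None  # no neighbouring relation words: not covered by this sketch
    first_target = before[-1]   # step 201: nearest relation word before
    second_target = after[-1]   # step 201: farthest relation word after
    # Step 202: patient entities of the two (noun forward) target relation words.
    first_patient = first_entity_after(tokens, first_target, stop=rel_index)
    second_patient = first_entity_after(tokens, second_target)
    if first_patient is None or second_patient is None:
        return None
    # Step 203: first patient becomes the agent, second patient the patient.
    return (first_patient, tokens[rel_index].text, second_patient)


tokens = [
    Token("last year", "other"), Token("Wanda Group", "entity"),
    Token("under", "relation"), Token("Wanda Sports", "entity"),
    Token("acquire", "relation"), Token("World Ironman Company", "entity"),
    Token("under", "relation"), Token("IRONMAN series events", "entity"),
]
triple = extract_verb_active(tokens, 4)
print("->".join(triple))  # Wanda Sports->acquire->IRONMAN series events
```

In a real system the entity spans would come from an entity recognition step rather than from hand-written tags, and the patient of each neighbouring relation word would depend on that word's own relation property.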
Optionally, as can be seen from the above, the specific process of taking the first patient entity as the target agent entity of the relation word and the second patient entity as the target patient entity of the relation word includes: performing entity recognition on the first patient entity and the second patient entity respectively; and taking the recognized first patient entity as the target agent entity of the relation word and the recognized second patient entity as the target patient entity of the relation word. In fact, entity recognition may be performed either together with step 202 or in step 203; both satisfy the requirements of the embodiments of the present application and achieve the purpose of identifying the entities in a small segment of Chinese text. Extracting the first and second patient entities is in itself a process of determining entity positions, and can only determine the range in which an entity lies. The entity and its exact position can be truly determined only after entity recognition, so entity recognition increases the accuracy of the whole entity association relationship extraction process.
In addition, in step 202 above, if no relation word is found before "acquire" or after "acquire", the first target relation word or the second target relation word does not exist. In this case, the entity that lies before "acquire" and is nearest to it is taken as the target agent entity, or the entity that lies after "acquire" and is farthest from it is taken as the target patient entity. For example, in the text "Wanda Sports of Wanda Group acquires the IRONMAN series events of World Ironman Company" (Chinese word order: Wanda Group, Wanda Sports, "acquire", World Ironman Company, IRONMAN series events), there are no other relation words before or after "acquire". The entity before "acquire" that is nearest to it, "Wanda Sports", is therefore taken as the target agent entity, and the entity after "acquire" that is farthest from it, "IRONMAN series events", is taken as the target patient entity.
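The fallback rule for a verb active relation word with no neighbouring relation words reduces to picking the nearest preceding entity and the farthest following entity. A minimal sketch follows, assuming hypothetical pre-tagged tokens (English glosses in Chinese word order) and a hypothetical helper name.

```python
from typing import List, Optional, Tuple

# Each token is a (text, kind) pair; kind is "entity", "relation", or "other".
TaggedToken = Tuple[str, str]


def extract_verb_active_fallback(
    tokens: List[TaggedToken], rel_index: int
) -> Optional[Tuple[str, str, str]]:
    """Nearest entity before the relation word is the agent;
    farthest entity after it is the patient."""
    before = [text for text, kind in tokens[:rel_index] if kind == "entity"]
    after = [text for text, kind in tokens[rel_index + 1:] if kind == "entity"]
    if not before or not after:
        return None
    agent = before[-1]    # nearest preceding entity
    patient = after[-1]   # farthest following entity
    return (agent, tokens[rel_index][0], patient)


tokens = [
    ("Wanda Group", "entity"), ("of", "other"), ("Wanda Sports", "entity"),
    ("acquire", "relation"),
    ("World Ironman Company", "entity"), ("of", "other"),
    ("IRONMAN series events", "entity"),
]
triple = extract_verb_active_fallback(tokens, 3)
print("->".join(triple))  # Wanda Sports->acquire->IRONMAN series events
```

Choosing the farthest following entity reflects the Chinese word order, in which modifiers such as "of World Ironman Company" precede the head noun "IRONMAN series events".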
In a second preferred embodiment of the present application, step 102 is explained further, taking the noun forward relationship as an example. As shown in Fig. 3, step 102 may specifically include:
Step 301: if the relation property of the relation word is a noun forward relationship, find in the text a first target relation word, which lies before the relation word and is nearest to it, and a second target relation word, which lies after the relation word and is farthest from it.
Take the text "C, the controlling shareholder of B, a subsidiary of A, acquires D" as an example (Chinese word order: A, "subsidiary", B, "controlling shareholder", C, "acquire", D), with the noun forward relation word "controlling shareholder" as the relation word under consideration. The first target relation word, which lies before "controlling shareholder" and is nearest to it, is "subsidiary". The second target relation word, which lies after "controlling shareholder" and is farthest from it, is "acquire".
Step 302: extract from the text the first patient entity of the first target relation word and the second patient entity of the second target relation word.
In the text, the first agent entity of the first target relation word "subsidiary" is "A" and its first patient entity is "B"; the second agent entity of the second target relation word "acquire" is "C" and its second patient entity is "D".
Step 303: take the first patient entity as the target agent entity of the relation word, and the second patient entity as the target patient entity of the relation word. The target agent entity of "controlling shareholder" is therefore the first patient entity "B", and its target patient entity is the second patient entity "D".
Then, according to step 103, the Chinese entity association relationship "B->controlling shareholder->D" is generated from the relation word "controlling shareholder", its target agent entity "B", and its target patient entity "D".
Optionally, as can be seen from the above, the specific process of taking the first patient entity as the target agent entity of the relation word and the second patient entity as the target patient entity of the relation word includes: performing entity recognition on the first patient entity and the second patient entity respectively; and taking the recognized first patient entity as the target agent entity of the relation word and the recognized second patient entity as the target patient entity of the relation word. In fact, entity recognition may be performed either together with step 302 or in step 303; both satisfy the requirements of the embodiments of the present application and achieve the purpose of identifying the entities in a small segment of Chinese text. Extracting the first and second patient entities is in itself a process of determining entity positions, and can only determine the range in which an entity lies. The entity and its exact position can be truly determined only after entity recognition, so entity recognition increases the accuracy of the whole entity association relationship extraction process.
In addition, if no other relation word is found in the text before or after the relation word "controlling shareholder", the entity that lies before "controlling shareholder" and is nearest to it is taken as the target agent entity, or the entity that lies after "controlling shareholder" and is farthest from it is taken as the target patient entity.
In a third preferred embodiment of the present application, step 102 is explained further, taking the verb passive relationship as an example. As shown in Fig. 4, step 102 may specifically include:
Step 401: if the relation property of the relation word is a verb passive relationship, decompose the relation word into a cooperating word and a relation word body.
Take the text "The US television production company Dick, as the controlling shareholder of Company A, was acquired by B, a subsidiary of Wanda Group, for US$1 billion (about HK$7.8 billion)" as an example (Chinese word order: Dick, "as", Company A, "controlling shareholder", "by", Wanda Group, "subsidiary", B, spends US$1 billion, "acquire"). The text contains the verb passive relation word "by ... acquire", in which "by" is the cooperating word and "acquire" is the relation word body.
Step 402: find in the text a first target relation word, which lies before the cooperating word and is nearest to it, and a second target relation word, which lies before the relation word body and is nearest to it.
In the text, the first target relation word, which lies before the cooperating word "by" and is nearest to it, is "as ... controlling shareholder". The second target relation word, which lies between the cooperating word "by" and the relation word body "acquire" and is nearest to "acquire", is "subsidiary".
Step 403: extract from the text the first patient entity of the first target relation word and the second patient entity of the second target relation word.
The first target relation word is the noun inverse relationship "as ... controlling shareholder". The first patient entity of this relation word lies in the text segment "US television production company Dick" before "as"; after entity recognition, "US television production company Dick" is determined to be the first patient entity of the first target relation word. The second target relation word "subsidiary" is a noun forward relationship. The second patient entity of this relation word lies in the text segment "B spends US$1 billion (about HK$7.8 billion)" after "subsidiary"; after entity recognition, the second patient entity is determined to be "B".
Step 404: take the first patient entity as the target patient entity of the relation word, and the second patient entity as the target agent entity of the relation word.
After step 403, the first patient entity obtained is "US television production company Dick" and the second patient entity is "B". The target patient entity of the relation word "by ... acquire" is therefore "US television production company Dick", and its target agent entity is "B".
Then, according to step 103, the entity association relationship "B->acquire->US television production company Dick" can be generated.
Optionally, as can be seen from the above, the specific process of taking the first patient entity as the target patient entity of the relation word and the second patient entity as the target agent entity of the relation word includes: performing entity recognition on the first patient entity and the second patient entity respectively; and taking the recognized first patient entity as the target patient entity of the relation word and the recognized second patient entity as the target agent entity of the relation word. In fact, entity recognition may be performed either together with step 403 or in step 404; both satisfy the requirements of the embodiments of the present application and achieve the purpose of identifying the entities in a small segment of Chinese text. Extracting the first and second patient entities is in itself a process of determining entity positions, and can only determine the range in which an entity lies. The entity and its exact position can be truly determined only after entity recognition, so entity recognition increases the accuracy of the whole entity association relationship extraction process.
In addition, suppose the text containing the relation word "by ... acquire" is "The US television production company Dick was acquired by Wanda Group for US$1 billion (about HK$7.8 billion)" (Chinese word order: Dick, "by", Wanda Group, spends US$1 billion, "acquire"). No other relation word exists before the cooperating word "by", so the entity that lies before "by" and is nearest to it, "US television production company Dick", is identified as the target patient entity. Likewise, no other relation word exists between the cooperating word "by" and the relation word body "acquire", so the entity between "by" and "acquire" that is nearest to "acquire", "Wanda Group", is identified as the target agent entity. The entity association relationship finally generated is therefore "Wanda Group->acquire->US television production company Dick".
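The fallback for a verb passive relation word with no neighbouring relation words can be sketched in the same style. The tagging scheme, the English glosses in Chinese word order, and the `extract_passive_fallback` helper are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Each token is a (text, kind) pair; kind is "entity", "cooperating",
# "relation_body", or "other".
TaggedToken = Tuple[str, str]


def extract_passive_fallback(
    tokens: List[TaggedToken], coop_index: int, body_index: int
) -> Optional[Tuple[str, str, str]]:
    """Patient: nearest entity before the cooperating word.
    Agent: entity between the cooperating word and the relation word body
    that is nearest to the body."""
    before = [text for text, kind in tokens[:coop_index] if kind == "entity"]
    between = [text for text, kind in tokens[coop_index + 1:body_index]
               if kind == "entity"]
    if not before or not between:
        return None
    patient = before[-1]   # nearest to the cooperating word
    agent = between[-1]    # nearest to the relation word body
    return (agent, tokens[body_index][0], patient)


tokens = [
    ("US television production company Dick", "entity"),
    ("by", "cooperating"),
    ("Wanda Group", "entity"),
    ("spends US$1 billion", "other"),
    ("acquire", "relation_body"),
]
triple = extract_passive_fallback(tokens, coop_index=1, body_index=4)
print("->".join(triple))
# Wanda Group->acquire->US television production company Dick
```

The returned triple is ordered agent, relation word body, patient, so the passive sentence is normalized into the same "agent->relation->patient" form used for active relationships.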
In a fourth preferred embodiment of the present application, step 102 is explained further, taking the noun inverse relationship as an example. As shown in Fig. 5, step 102 may specifically include:
Step 501: if the relation property of the relation word is a noun inverse relationship, decompose the relation word into a cooperating word and a relation word body.
Take the text "First Company, the actual holding company of Gansu Power Company, as the wholly-owned subsidiary of Second Company, a subsidiary of State Grid Corporation of China" as an example (Chinese word order: Gansu Power Company, "actual holding company", First Company, "as", State Grid Corporation of China, "subsidiary", Second Company, "wholly-owned subsidiary"). Here "as ... wholly-owned subsidiary" is the relation word of a noun inverse relationship, "as" is the cooperating word, and "wholly-owned subsidiary" is the relation word body.
Step 502: find in the text a first target relation word, which lies before the cooperating word and is nearest to it, and a second target relation word, which lies before the relation word body and is nearest to it.
In the text, the first target relation word, which lies before "as" and is nearest to it, is "actual holding company". Between "as" and "wholly-owned subsidiary", the second target relation word, which is nearest to "wholly-owned subsidiary", is "subsidiary".
Step 503: extract from the text the first patient entity of the first target relation word and the second patient entity of the second target relation word.
The first target relation word "actual holding company" is a noun forward relationship; its first patient entity is "First Company" and its first agent entity is "Gansu Power Company". The second target relation word "subsidiary" is also a noun forward relationship. Its second patient entity lies in the text segment "Second Company"; after entity recognition, "Second Company" is determined to be the second patient entity of "subsidiary". Its second agent entity lies in the text segment "State Grid Corporation of China"; after entity recognition, "State Grid Corporation of China" is determined to be the second agent entity.
Step 504: take the first patient entity as the target patient entity of the relation word, and the second patient entity as the target agent entity of the relation word.
After step 503, the first patient entity "First Company" is taken as the target patient entity of the relation word "as ... wholly-owned subsidiary", and the second patient entity "Second Company" is taken as its target agent entity. The entity association relationship generated according to step 103 is therefore "Second Company->wholly-owned subsidiary->First Company".
Optionally, as can be seen from the above, the specific process of taking the first patient entity as the target patient entity of the relation word and the second patient entity as the target agent entity of the relation word includes: performing entity recognition on the first patient entity and the second patient entity respectively; and taking the recognized first patient entity as the target patient entity of the relation word and the recognized second patient entity as the target agent entity of the relation word. In fact, entity recognition may be performed either together with step 503 or in step 504; both satisfy the requirements of the embodiments of the present application and achieve the purpose of identifying the entities in a small segment of Chinese text. Extracting the first and second patient entities is in itself a process of determining entity positions, and can only determine the range in which an entity lies. The entity and its exact position can be truly determined only after entity recognition, so entity recognition increases the accuracy of the whole entity association relationship extraction process.
In addition, suppose the text is "Gansu Power Company as the wholly-owned subsidiary of State Grid Corporation of China" (Chinese word order: Gansu Power Company, "as", State Grid Corporation of China, "wholly-owned subsidiary"). No other relation word exists before the cooperating word "as" of the noun inverse relation word "as ... wholly-owned subsidiary", so the entity that lies before "as" and is nearest to it, "Gansu Power Company", is identified as the target patient entity of the relation word "as ... wholly-owned subsidiary". No other relation word exists between the cooperating word "as" and the relation word body "wholly-owned subsidiary" either, so the entity between the cooperating word and the relation word body that is nearest to the relation word body, "State Grid Corporation of China", is identified as the target agent entity of the relation word. The entity association relationship finally generated is "State Grid Corporation of China->wholly-owned subsidiary->Gansu Power Company".
The preferred embodiments above describe how entity relationships are extracted for relation words of each relation property. For the complex Chinese texts above, which contain multiple relation words, the entity association relationship must be extracted for each relation word separately; the entity association relationships corresponding to all the relation words then together constitute all the entity association relationships in that segment of Chinese text.
For example, the text "Last year, Wanda Sports, under Wanda Group, acquired the IRONMAN series events under World Ironman Company" contains three relation words, "under", "acquire", and "under", whose relation properties are the noun forward relationship, the verb active relationship, and the noun forward relationship respectively. Extracting and generating the entity association relationship for each of these relation words according to its relation property yields three entity association relationships: "Wanda Group->under->Wanda Sports", "Wanda Sports->acquire->IRONMAN series events", and "World Ironman Company->under->IRONMAN series events".
The text "The US television production company Dick, as the controlling shareholder of Company A, was acquired by B, a subsidiary of Wanda Group, for US$1 billion (about HK$7.8 billion)" contains three relation words, "as ... controlling shareholder", "by ... acquire", and "subsidiary", whose relation properties are the noun inverse relationship, the verb passive relationship, and the noun forward relationship respectively. Extracting and generating the entity association relationship for each of these relation words according to its relation property yields three entity association relationships: "Company A->controlling shareholder->US television production company Dick", "B->acquire->US television production company Dick", and "Wanda Group->subsidiary->B".
The text "Company A, the actual controlling company of Gansu Power Company, as the wholly-owned subsidiary
of Company B, a subsidiary of State Grid Corporation of China" contains three relation words, "actual
controlling company", "as ... wholly-owned subsidiary" and "subsidiary", whose relation properties are noun
forward relation, noun inverse relation and noun forward relation respectively. Extracting and generating
the entity association relation for each of the three relation words according to its relation property
yields three entity association relations: "Gansu Power Company->Actual controlling company->Company A",
"Company B->Wholly-owned subsidiary->Company A" and "State Grid Corporation of China->Subsidiary->Company B".
As can be seen from the above technical solution, the method for extracting Chinese entity association
relations provided by the embodiments of the present application extracts, according to the relation
property of a relation word in a Chinese text, the target agent entity and the target patient entity related
to that relation word, and then generates the Chinese entity association relation corresponding to that
relation word from the relation word and its target agent entity and target patient entity. The technical
solution divides unstructured Chinese text into different phrases according to relation property, which
narrows the position range in which the target agent entity and target patient entity of each relation word
lie, reduces the amount of computation, and thereby improves search precision and search speed. In addition,
the technical solution also uses division rules at the Chinese syntactic level, which filter out, to a large
extent, redundant erroneous relation words and erroneous entities, improving the accuracy of relation word
extraction and entity extraction.
Referring to Fig. 6, an embodiment of the present application further provides a device for extracting
Chinese entity association relations, including:
a relation word extraction module 601, configured to extract the relation words in a text;
a property determination module 602, configured to determine the relation property of each relation word if
the number of extracted relation words is greater than 1;
a target entity extraction module 603, configured to extract from the text, in turn and according to the
relation property of each relation word, the target agent entity and the target patient entity
corresponding to each relation word;
an association relation generation module 604, configured to generate the Chinese entity association
relation according to the relation word and its corresponding target agent entity and target patient entity.
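The cooperation of modules 601 to 604 can be pictured with the following sketch. It is only an
illustration under stated assumptions: the class `ExtractionDevice`, its field names and the toy stand-in
functions are hypothetical, and the check on the number of extracted relation words is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (agent entity, relation word, patient entity)


@dataclass
class ExtractionDevice:
    """Hypothetical wiring of modules 601-604; the callables are placeholders."""
    extract_relation_words: Callable[[str], List[str]]            # module 601
    determine_property: Callable[[str], str]                      # module 602
    extract_targets: Callable[[str, str, str], Tuple[str, str]]   # module 603

    def run(self, text: str) -> List[Triple]:
        triples: List[Triple] = []
        for word in self.extract_relation_words(text):
            prop = self.determine_property(word)
            agent, patient = self.extract_targets(text, word, prop)
            triples.append((agent, word, patient))                # module 604
        return triples


# Toy stand-ins so the sketch runs; a real system would replace each one.
device = ExtractionDevice(
    extract_relation_words=lambda text: [w for w in ("subsidiary",) if w in text],
    determine_property=lambda word: "noun_forward",
    extract_targets=lambda text, word, prop: tuple(
        part.strip() for part in text.split(word, 1)
    ),
)
result = device.run("Wanda Group subsidiary B")
```

Passing the three extraction steps in as callables keeps the sketch independent of any particular word
segmentation or entity recognition tool, none of which is specified here.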
Optionally, the target entity extraction module 603 further includes a verb active relation entity
extraction module, configured to:
if the relation property of the relation word is a verb active relation, find in the text a first target
relation word that is located before the relation word and closest to the relation word, and a second
target relation word that is located after the relation word and farthest from the relation word;
extract from the text a first patient entity of the first target relation word and a second patient entity
of the second target relation word;
use the first patient entity as the target agent entity of the relation word and the second patient entity
as the target patient entity of the relation word.
Optionally, the target entity extraction module 603 further includes a noun forward relation entity
extraction module, configured to:
if the relation property of the relation word is a noun forward relation, find in the text a first target
relation word that is located before the relation word and closest to the relation word, and a second
target relation word that is located after the relation word and farthest from the relation word;
extract from the text a first patient entity of the first target relation word and a second patient entity
of the second target relation word;
use the first patient entity as the target agent entity of the relation word and the second patient entity
as the target patient entity of the relation word.
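The search for the two target relation words in the verb active and noun forward cases can be sketched as
follows. This is a minimal illustration that assumes the positions of all relation words in the text are
already known; the function name and the use of integer offsets are hypothetical.

```python
from typing import List, Optional, Tuple


def find_target_relation_words(
    positions: List[int], current: int
) -> Tuple[Optional[int], Optional[int]]:
    """Pick target relation words for a verb active or noun forward relation word.

    positions: text offsets of all relation words (hypothetical input).
    current: offset of the relation word being processed.
    Returns (first_target, second_target); either may be None.
    """
    before = [p for p in positions if p < current]
    after = [p for p in positions if p > current]
    first_target = max(before) if before else None   # closest one before
    second_target = max(after) if after else None    # farthest one after
    return first_target, second_target


# Four relation words at offsets 3, 10, 20 and 31; the one at 10 is processed.
targets = find_target_relation_words([3, 10, 20, 31], 10)
```

A `None` result corresponds to the case in which no other relation word exists on that side of the
relation word being processed.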
Optionally, the target entity extraction module 603 further includes a verb passive relation entity
extraction module, configured to:
if the relation property of the relation word is a verb passive relation, decompose the relation word into
a collaboration word and a relation word body;
find in the text a first target relation word that is located before the collaboration word and closest to
the collaboration word, and a second target relation word that is located before the relation word body
and closest to the relation word body;
extract from the text a first patient entity of the first target relation word and a second patient entity
of the second target relation word;
use the first patient entity as the target patient entity of the relation word and the second patient
entity as the target agent entity of the relation word.
Optionally, the target entity extraction module 603 further includes a noun inverse relation entity
extraction module, configured to:
if the relation property of the relation word is a noun inverse relation, decompose the relation word into
a collaboration word and a relation word body;
find in the text a first target relation word that is located before the collaboration word and closest to
the collaboration word, and a second target relation word that is located before the relation word body
and closest to the relation word body;
extract from the text a first patient entity of the first target relation word and a second patient entity
of the second target relation word;
use the first patient entity as the target patient entity of the relation word and the second patient
entity as the target agent entity of the relation word.
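The four submodules differ in how the two extracted entities are mapped to roles, which can be summarized
in the following sketch. The property labels and the function `assign_roles` are hypothetical names chosen
for illustration only.

```python
from typing import Dict

# Hypothetical labels for the four relation properties.
FORWARD = {"verb_active", "noun_forward"}
REVERSED = {"verb_passive", "noun_inverse"}


def assign_roles(prop: str, first_entity: str, second_entity: str) -> Dict[str, str]:
    """Map the first and second extracted entities to agent and patient roles."""
    if prop in FORWARD:
        # Verb active / noun forward: first entity acts, second is acted upon.
        return {"agent": first_entity, "patient": second_entity}
    if prop in REVERSED:
        # Verb passive / noun inverse: the roles are swapped.
        return {"agent": second_entity, "patient": first_entity}
    raise ValueError(f"unknown relation property: {prop}")


roles = assign_roles(
    "noun_inverse", "Gansu Power Company", "State Grid Corporation of China"
)
```

With the noun inverse property, `roles` assigns "State Grid Corporation of China" as the agent and "Gansu
Power Company" as the patient, in line with the example given earlier in this description.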
Optionally, the device further includes:
a relation word judgment module, configured to judge whether the relation word exists in a predefined
relation library;
and, if the relation word exists in the predefined relation library, to determine the relation property of
the relation word.
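A lookup against the predefined relation library might look like the following sketch. The dictionary
layout, the property labels and the function name are assumptions for illustration; the entries are taken
from the examples in this description and do not represent the actual content of any relation library.

```python
from typing import Optional

# Hypothetical predefined relation library: relation word -> relation property.
RELATION_LIBRARY = {
    "under": "noun_forward",
    "subsidiary": "noun_forward",
    "acquire": "verb_active",
    "by ... acquired": "verb_passive",
    "as ... wholly-owned subsidiary": "noun_inverse",
    "as ... controlling shareholder": "noun_inverse",
}


def relation_property(word: str) -> Optional[str]:
    """Return the relation property if the word is in the library, else None."""
    return RELATION_LIBRARY.get(word)


known = relation_property("acquire")
unknown = relation_property("visit")
```

A candidate that is absent from the library yields `None`, so no relation property is determined for it.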
Optionally, the verb active relation entity extraction module or the noun forward relation entity
extraction module includes:
a first entity recognition module, configured to perform entity recognition on the first patient entity and
the second patient entity respectively;
and to use the first patient entity after entity recognition as the target agent entity of the relation
word, and the second patient entity after entity recognition as the target patient entity of the relation word.
Optionally, the verb passive relation entity extraction module or the noun inverse relation entity
extraction module includes:
a second entity recognition module, configured to perform entity recognition on the first patient entity
and the second patient entity respectively;
and to use the first patient entity after entity recognition as the target patient entity of the relation
word, and the second patient entity after entity recognition as the target agent entity of the relation word.
Referring to Fig. 7, an embodiment of the present application further provides a system for extracting
Chinese entity association relations. The system includes a memory 701 and a processor 702;
the memory 701 is configured to store a program executable by the processor 702;
the processor 702 is configured to:
extract the relation words in a text;
if the number of extracted relation words is greater than 1, determine the relation property of each
relation word;
according to the relation property of each relation word, extract from the text, in turn, the target agent
entity and the target patient entity corresponding to each relation word;
generate the Chinese entity association relation according to the relation word and its corresponding
target agent entity and target patient entity.
As can be seen from the above technical solution, the method, device and system for extracting Chinese
entity association relations provided by the embodiments of the present application extract, according to
the relation property of a relation word in a Chinese text, the target agent entity and the target patient
entity related to that relation word, and then generate the Chinese entity association relation
corresponding to that relation word from the relation word and its target agent entity and target patient
entity. The technical solution divides unstructured Chinese text into different phrases according to
relation property, which narrows the position range in which the target agent entity and target patient
entity of each relation word lie, reduces the amount of computation, and thereby improves search precision
and search speed. In addition, the technical solution also uses division rules at the Chinese syntactic
level, which filter out, to a large extent, redundant erroneous relation words and erroneous entities,
improving the accuracy of relation word extraction and entity extraction.
The present application can be used in numerous general-purpose or special-purpose computing system
environments or configurations, for example: personal computers, server computers, handheld or portable
devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable
consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing
environments that include any of the above systems or devices.
The present application may be described in the general context of computer-executable instructions
executed by a computer, such as program modules. Generally, program modules include routines, programs,
objects, components, data structures and the like that perform particular tasks or implement particular
abstract data types. The present application may also be practiced in distributed computing environments,
in which tasks are performed by remote processing devices connected through a communication network. In a
distributed computing environment, program modules may be located in both local and remote computer storage
media, including storage devices.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to
distinguish one entity or operation from another entity or operation, and do not necessarily require or
imply any actual relationship or order between these entities or operations. Moreover, the terms "include",
"comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process,
method, article or device that includes a series of elements includes not only those elements but also other
elements that are not explicitly listed, or elements inherent to such a process, method, article or device.
Other embodiments of the present application will readily occur to those skilled in the art after
considering the specification and practicing the application disclosed herein. The present application is
intended to cover any variations, uses or adaptations of the present application that follow its general
principles and include common knowledge or conventional technical means in the art that are not disclosed
in the present application. The specification and the embodiments are to be regarded as exemplary only, and
the true scope and spirit of the present application are indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described
above and shown in the accompanying drawings, and that various modifications and changes may be made without
departing from its scope. The scope of the present application is limited only by the appended claims.