CN108959575B - Method and device for mining enterprise association relationship information - Google Patents

Info

Publication number
CN108959575B
Authority
CN
China
Prior art keywords
association relationship
enterprise
segmented word
word
relationship information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810735344.5A
Other languages
Chinese (zh)
Other versions
CN108959575A (en)
Inventor
霍锦超
刘文博
杨丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co., Ltd
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201810735344.5A
Publication of CN108959575A
Application granted
Publication of CN108959575B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

This application provides a method and device for mining enterprise association relationship information. The method comprises: obtaining a text to be detected; splitting the text to be detected to obtain at least one clause; performing word segmentation and part-of-speech tagging on each clause; identifying the association relation word in each clause; and judging whether the association relation word is an organization association relation word and, if so, determining first enterprise association relationship information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word. The application therefore does not require staff to search the text to be detected for enterprise association relationship information, which improves mining efficiency, and it does not rely on the subjective judgment of staff, which improves mining accuracy.

Description

Method and device for mining enterprise association relationship information
Technical field
This application relates to the field of data mining, and in particular to a method and device for mining enterprise association relationship information.
Background
In recent years, with the rapid development of the internet, the Internet of Things, and cloud computing, the volume of news and public-opinion information about enterprises has grown rapidly. With this overload of enterprise information, it is particularly important for enterprise managers to grasp the overall situation of enterprises in related fields comprehensively and accurately, so that they can spot business opportunities and make more reasonable decisions.
In the prior art, information such as news reports about relevant enterprises is generally searched for manually on the network, and enterprise association relationship information is determined from it, that is, the association relationships between enterprises and between enterprises and persons. However, information on the internet is intricate and follows no uniform standard, so it is difficult to manually extract valuable data from a large amount of information quickly and directly in order to mine enterprise association relationship information. Manual searching therefore takes a great deal of time and is inefficient, and it is easily influenced by the subjectivity of the staff, which makes the mined enterprise association relationship information inaccurate.
Summary of the invention
This application provides a method and device for mining enterprise association relationship information, to solve the following problems: information on the internet is intricate and follows no uniform standard; it is difficult to manually extract valuable data from a large amount of information quickly and directly in order to mine enterprise association relationship information, so manual searching takes a great deal of time and is inefficient; and manual searching is easily influenced by the subjectivity of the staff, which makes the mined enterprise association relationship information inaccurate.
In a first aspect, the application provides a method for mining enterprise association relationship information, the method comprising:
Obtaining a text to be detected;
Splitting the text to be detected to obtain at least one clause;
Performing word segmentation and part-of-speech tagging on each clause;
Identifying the association relation word in each clause;
Judging whether the association relation word is an organization association relation word and, if it is, determining first enterprise association relationship information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word.
In a second aspect, the application provides a device for mining enterprise association relationship information, the device comprising:
An obtaining module, for obtaining a text to be detected;
A splitting module, for splitting the text to be detected to obtain at least one clause;
A part-of-speech tagging module, for performing word segmentation and part-of-speech tagging on each clause;
A first identification module, for identifying the association relation word in each clause;
A first determining module, for judging whether the association relation word is an organization association relation word and, if it is, determining first enterprise association relationship information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word.
As can be seen from the above technical solution, this application provides a method and device for mining enterprise association relationship information: a text to be detected is obtained; the text to be detected is split to obtain at least one clause; word segmentation and part-of-speech tagging are performed on each clause; the association relation word in each clause is identified; and it is judged whether the association relation word is an organization association relation word and, if it is, first enterprise association relationship information is determined using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word. The application therefore does not require staff to search the text to be detected for enterprise association relationship information, which improves mining efficiency, and it does not rely on the subjective judgment of staff, which improves mining accuracy.
Brief description of the drawings
To describe the technical solution of the application more clearly, the drawings needed in the embodiments are briefly introduced below. Evidently, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a method for mining enterprise association relationship information provided by one embodiment of the application;
Fig. 2 is a flowchart of a method for mining enterprise association relationship information provided by another embodiment of the application;
Fig. 3 is a flowchart of the method after step 214 of Fig. 2;
Fig. 4 is a flowchart of the method after step 307 of Fig. 3;
Fig. 5 is a flowchart of the method after step 411 of Fig. 4;
Fig. 6 is a structural schematic diagram of a device for mining enterprise association relationship information provided by one embodiment of the application;
Fig. 7 is a structural schematic diagram of the first determination unit;
Fig. 8 is a structural schematic diagram of the screening unit;
Fig. 9 is a structural schematic diagram of a device for mining enterprise association relationship information provided by another embodiment of the application;
Fig. 10 is a structural schematic diagram of a device for mining enterprise association relationship information provided by yet another embodiment of the application;
Fig. 11 is a histography schematic diagram.
Detailed description of the embodiments
Referring to Fig. 1, in a first aspect, one embodiment of the application provides a method for mining enterprise association relationship information, the method comprising the following steps:
Step 101: Obtain a text to be detected.
The text to be detected may be obtained from networks such as news websites, may be sent to the server by a technician operating a terminal, or may be data collected by staff visiting the industry and commerce bureau; the embodiments of the present invention do not limit this.
Step 102: Split the text to be detected to obtain at least one clause.
The text to be detected may be split as follows: starting from the beginning of the text to be detected, search for the preset punctuation marks it contains, and determine the characters between two preset punctuation marks as one clause, thereby obtaining at least one clause. The preset punctuation marks may be separators between sentences, such as the full stop, comma, exclamation mark, and semicolon.
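The splitting described in step 102 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name split_clauses and the exact contents of PRESET_PUNCTUATION are assumptions introduced here.

```python
import re
from typing import List

# Assumed preset punctuation marks: full stop, comma, exclamation mark and
# semicolon in both Chinese and ASCII forms. The real set is configurable.
PRESET_PUNCTUATION = "。，！；.,!;"


def split_clauses(text: str) -> List[str]:
    """Split text at preset punctuation marks and drop empty pieces."""
    pattern = "[" + re.escape(PRESET_PUNCTUATION) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


clauses = split_clauses(
    "Beijing ## Company invested in Shenyang ## Company, "
    "and the investment fund is one million yuan."
)
print(clauses)
```

Including the ASCII full stop and comma would also split abbreviations such as "Co., Ltd.", so a real system working on English text would likely need a narrower punctuation set or an exception list.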
Step 103: Perform word segmentation and part-of-speech tagging on each clause.
In this embodiment, an NLP (Natural Language Processing) system may be used to segment each clause into words and tag the part of speech of each segmented word at the same time; the words obtained by segmenting each clause can then be arranged from front to back in the original word order of the clause.
Step 104: Identify the association relation word in each clause.
Staff can build a data model according to actual mining requirements. The model includes the types of association relation words, the association relation words belonging to each type, and multiple extension expressions corresponding to each association relation word, where the extension expressions may be regular expressions. The association relation words in the data model and their corresponding extension expressions are matched against each segmented word in turn to identify the association relation word in the clause.
Step 105: Judge whether the association relation word is an organization association relation word; if it is, proceed to step 106.
Step 106: Determine first enterprise association relationship information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word.
There are multiple types of association relation words, such as the membership relation type and the investment relation type; the membership relation type includes words such as "president" and "general manager", and the investment relation type includes words such as "investment" and "financing".
As can be seen from the above technical solution, this application provides a method for mining enterprise association relationship information: a text to be detected is obtained; the text to be detected is split to obtain at least one clause; word segmentation and part-of-speech tagging are performed on each clause; the association relation word in each clause is identified; and it is judged whether the association relation word is an organization association relation word and, if it is, first enterprise association relationship information is determined using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word. The application therefore does not require staff to search the text to be detected for enterprise association relationship information, which improves mining efficiency, and it does not rely on the subjective judgment of staff, which improves mining accuracy.
Referring to Fig. 2, another embodiment of the application provides a method for mining enterprise association relationship information, the method comprising the following steps:
Step 201: Obtain a text to be detected.
The text to be detected may be obtained from networks such as news websites, may be sent to the server by a technician operating a terminal, or may be data collected by staff visiting the industry and commerce bureau; the embodiments of the present invention do not limit this.
After the text to be detected is obtained, it is preprocessed using ETL: garbled characters, advertisements, and illegal characters are removed from the text, and letters, brackets, and the like are normalized to a uniform form, which facilitates subsequent processing and improves mining accuracy.
ETL (extract, transform, load) is the process of extracting data from business systems, cleaning and transforming it, and loading it into a data warehouse. Its purpose is to integrate an enterprise's scattered, messy, and non-uniform data.
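The cleaning part of this preprocessing might look like the sketch below. It is only an assumption about how the normalization could be done: the source does not name a technique, so Unicode NFKC normalization is swapped in here to unify full-width letters and brackets, and the function name preprocess is hypothetical. Advertisement removal is left out because the source gives no rule for it.

```python
import unicodedata


def preprocess(text: str) -> str:
    """Normalize letters/brackets and drop garbled or control characters."""
    # NFKC folds full-width letters, digits, brackets and punctuation such as
    # the full-width comma into their ordinary (half-width) forms.
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        if ch == "\ufffd":
            continue  # replacement character left behind by decoding errors
        if unicodedata.category(ch).startswith("C") and ch != "\n":
            continue  # control, format and unassigned characters
        kept.append(ch)
    return "".join(kept)


print(preprocess("ＡＢＣ（北京）\ufffd\x00公司"))
```

Because NFKC also converts full-width punctuation, any later clause-splitting step should list the half-width forms among its preset punctuation marks.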
Step 202: Split the text to be detected to obtain at least one clause.
The text to be detected may be split as follows: starting from the beginning of the text to be detected, search for the preset punctuation marks it contains, and determine the characters between two preset punctuation marks as one clause, thereby obtaining at least one clause. The preset punctuation marks may be separators between sentences, such as the full stop, comma, exclamation mark, and semicolon.
Step 203: Perform word segmentation and part-of-speech tagging on each clause.
In this embodiment, NLP (Natural Language Processing) technology may be used to segment each clause into words and tag the part of speech of each segmented word at the same time; the words obtained by segmenting each clause can then be arranged from front to back in the original word order of the clause.
Step 204: Identify the association relation word in each clause.
Staff can build a data model according to actual mining requirements. The model includes the types of association relation words, the association relation words belonging to each type, and multiple extension expressions corresponding to each association relation word. There are multiple types of association relation words, such as the membership relation type and the investment relation type; the membership relation type includes words such as "president" and "general manager", and the investment relation type includes words such as "investment" and "financing". The extension expressions may be regular expressions. A regular expression consists of ordinary characters and metacharacters. Ordinary characters include upper- and lower-case letters and digits, while metacharacters have special meanings and comprise the following 11 characters: [ \ ^ $ . | ? * + ( ). Metacharacters serve specific purposes. For example, "." matches any character other than the line-break characters "\n" and "\r". "?" matches the preceding character zero or one time; when "?" immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy, meaning that as little of the searched string as possible is matched, whereas the default greedy mode matches as much of the searched string as possible. "|" performs a logical OR between two matching conditions.
The association relation words in the data model and their corresponding extension expressions are matched against each segmented word in turn to identify the association relation word in the clause. This embodiment does not limit the specific matching method. For example, for the clause "The president of Beijing Shenzhou Taiyue Software Co., Ltd. is Wang Ning", word segmentation by the NLP system yields three segmented words, "Beijing Shenzhou Taiyue Co., Ltd.", "president", and "Wang Ning", which are tagged with parts of speech at the same time: "Beijing Shenzhou Taiyue Co., Ltd." is an organization name, "Wang Ning" is a person name, and "president" is a noun. The data model built in advance by staff is then applied. In this data model the type of association relation word is the membership relation type, the association relation words include "president", "general manager", and so on, and the regular expression for the association relation word "president" is "(is|serves as).{0,20}president". The association relation words in the data model and their regular expressions are each matched against the segmented words, which determines that the segmented word "president" is the association relation word in the clause.
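A data model of this kind, and the matching against it, can be sketched as follows. The structure of DATA_MODEL, the function name find_relation_words, and the choice to test relation words against the segmented words while testing extension expressions against the whole clause are all assumptions made for illustration; the embodiment itself leaves the matching method open.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical data model: relation type -> relation word -> extension
# expressions (regular expressions). Entries mirror the example in the text.
DATA_MODEL: Dict[str, Dict[str, List[str]]] = {
    "membership": {
        "president": [r"(is|serves as).{0,20}president"],
        "general manager": [],
    },
    "investment": {
        "investment": [],
        "financing": [],
    },
}


def find_relation_words(tokens: List[str], clause: str) -> List[Tuple[str, str]]:
    """Return (relation type, relation word) pairs found in one clause."""
    found = []
    for relation_type, words in DATA_MODEL.items():
        for word, expressions in words.items():
            matches_token = word in tokens
            matches_expression = any(re.search(e, clause) for e in expressions)
            if matches_token or matches_expression:
                found.append((relation_type, word))
    return found


tokens = ["Beijing Shenzhou Taiyue Co., Ltd.", "president", "Wang Ning"]
clause = "The president of Beijing Shenzhou Taiyue Software Co., Ltd. is Wang Ning"
print(find_relation_words(tokens, clause))
```

The returned relation type is what a later step would inspect to decide whether the word is an organization association relation word.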
Step 205: Judge whether the association relation word is an organization association relation word; if it is, proceed to step 206.
There are multiple types of association relation words, such as the membership relation type and the investment relation type; the membership relation type includes words such as "president" and "general manager", and the investment relation type includes words such as "investment" and "financing".
Step 206: From the clause containing the association relation word, extract the segmented words whose noun type is organization name and those whose noun type is person name.
Specifically, nouns can be subdivided into organization names, person names, place names, and so on. In the embodiments of the present application, only the segmented words that are organization names or person names need to be extracted from the clause.
Continuing the example above, if the type of the association relation word is the membership relation type, for instance executive information, the organization-name segmented word "Beijing Shenzhou Taiyue Co., Ltd." and the person-name segmented word "Wang Ning" are extracted from the clause containing the association relation word.
Step 207: If there is exactly one organization-name segmented word and exactly one person-name segmented word, generate the enterprise association relationship information between the organization-name segmented word and the person-name segmented word.
In the example above, there is one organization-name segmented word and one person-name segmented word, so the enterprise association relationship information "Beijing Shenzhou Taiyue Co., Ltd.-president-Wang Ning" is generated directly.
Step 208: If the number of organization-name segmented words and/or the number of person-name segmented words is at least two, generate a first set and a second set, each of which is the set of all organization-name segmented words and person-name segmented words.
Consider a clause in which the number of organization-name segmented words and/or person-name segmented words is at least two, for example "the president of Beijing ### Co., Ltd. is Wang xx and Li xx". After word segmentation and part-of-speech tagging, the person-name segmented words are "Wang xx" and "Li xx", and the organization-name segmented word is "Beijing ### Co., Ltd.". These segmented words are then formed into two sets: the first set {Beijing ### Co., Ltd., Wang xx, Li xx} and the second set {Beijing ### Co., Ltd., Wang xx, Li xx}.
Step 209: Take the Cartesian product of the first set and the second set to obtain multiple subsets.
In mathematics, the Cartesian product is the result of multiplying two sets, that is, the set of all ordered pairs whose first element comes from the first set and whose second element comes from the second set. In the example above, taking the Cartesian product of the first set {Beijing ### Co., Ltd., Wang xx, Li xx} and the second set {Beijing ### Co., Ltd., Wang xx, Li xx} gives the subsets <Beijing ### Co., Ltd., Beijing ### Co., Ltd.>, <Beijing ### Co., Ltd., Wang xx>, <Beijing ### Co., Ltd., Li xx>, <Wang xx, Beijing ### Co., Ltd.>, <Wang xx, Wang xx>, <Wang xx, Li xx>, <Li xx, Beijing ### Co., Ltd.>, <Li xx, Wang xx>, and <Li xx, Li xx>.
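As a quick check of step 209, the nine ordered pairs can be reproduced with the standard library. The sketch uses the three segmented words of the example clause (one enterprise name and two person names); the variable names are illustrative only.

```python
from itertools import product

# The segmented words of the example clause, in clause order.
segmented_words = ["Beijing ### Co., Ltd.", "Wang xx", "Li xx"]

first_set = list(segmented_words)
second_set = list(segmented_words)

# Cartesian product: every ordered pair (a, b) with a from the first set
# and b from the second set.
pairs = list(product(first_set, second_set))

for pair in pairs:
    print(pair)
```

Lists are used instead of Python sets so that the pairs come out in clause order, which the later reverse-order filtering relies on.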
Step 210: Judge whether the two segmented words in each subset are identical; if they are, proceed to step 211.
Step 211: Discard the subset.
In the example above, the segmented words in the subsets <Beijing ### Co., Ltd., Beijing ### Co., Ltd.>, <Wang xx, Wang xx>, and <Li xx, Li xx> are identical, so these three subsets are discarded.
Step 212: Among the subsets composed of one organization-name segmented word and one person-name segmented word, judge whether identical subsets exist; if they do, proceed to step 213.
Step 213: Discard the subset in which the organization-name segmented word comes after the person-name segmented word.
Among the subsets composed of one organization-name segmented word and one person-name segmented word, identical subsets are two or more subsets that contain the same segmented words. Continuing the example above, <Beijing ### Co., Ltd., Wang xx> and <Wang xx, Beijing ### Co., Ltd.> are identical subsets, and likewise <Beijing ### Co., Ltd., Li xx> and <Li xx, Beijing ### Co., Ltd.> are identical subsets. Of these, the subsets in which the organization-name segmented word comes after the person-name segmented word are discarded, that is, <Wang xx, Beijing ### Co., Ltd.> and <Li xx, Beijing ### Co., Ltd.>.
Step 214: Among the remaining subsets composed only of organization-name segmented words or only of person-name segmented words, discard the subsets in reverse order according to the positions of the segmented words in the clause, to obtain the target set.
Reverse order is ordering contrary to reading order. For example, in the clause "the president of Beijing ### Co., Ltd. is Wang xx and Li xx", the segmented word "Wang xx" precedes "Li xx" in reading order, so the subset <Li xx, Wang xx> is in reverse order and is discarded. The remaining subsets are <Beijing ### Co., Ltd., Wang xx>, <Beijing ### Co., Ltd., Li xx>, and <Wang xx, Li xx>, that is, the target set is <Beijing ### Co., Ltd., Wang xx>, <Beijing ### Co., Ltd., Li xx>, <Wang xx, Li xx>.
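Steps 209 to 214 can be combined into one small filter. The sketch below is an illustration under stated assumptions, not the patented implementation: the function name target_pairs and the tags "ORG" and "PER" are hypothetical, and every segmented word is assumed to occur only once in the clause.

```python
from itertools import product
from typing import List, Tuple


def target_pairs(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """tagged: (word, kind) in clause order; kind is "ORG" or "PER"."""
    position = {word: index for index, (word, _) in enumerate(tagged)}
    kind = dict(tagged)
    words = [word for word, _ in tagged]
    kept = []
    for a, b in product(words, words):  # step 209
        if a == b:
            continue  # steps 210-211: both segmented words identical
        if kind[a] == "PER" and kind[b] == "ORG":
            # Steps 212-213: the mirrored pair (ORG, PER) always exists in a
            # product of a set with itself, so the PER-first copy is dropped.
            continue
        if kind[a] == kind[b] and position[a] > position[b]:
            continue  # step 214: same kind but in reverse reading order
        kept.append((a, b))
    return kept


tagged = [("Beijing ### Co., Ltd.", "ORG"), ("Wang xx", "PER"), ("Li xx", "PER")]
print(target_pairs(tagged))
```

For the example clause this leaves three pairs, matching the target set described above.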
Step 215: Determine the first enterprise association relationship information according to the target subsets and the association relation word.
The first enterprise association relationship information is generated from the target subsets and the association relation word. For example, if the target set is <Beijing ### Co., Ltd., Wang xx>, <Beijing ### Co., Ltd., Li xx>, <Wang xx, Li xx>, the first enterprise association relationship information obtained is "Beijing ### Co., Ltd.-president-Wang xx", "Beijing ### Co., Ltd.-president-Li xx", and "president-Wang xx, Li xx".
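The formatting in step 215 can be sketched as below. The output format for a pair of two person names is inferred from the single example given in the text, and the source does not say how a pair of two organization names should be rendered, so the fallback branch is purely an assumption; the function name relation_info and the tags "ORG" and "PER" are hypothetical.

```python
from typing import Dict, List, Tuple


def relation_info(
    pairs: List[Tuple[str, str]], kind: Dict[str, str], relation_word: str
) -> List[str]:
    """Render target subsets as relationship strings."""
    info = []
    for a, b in pairs:
        if kind[a] == "ORG" and kind[b] == "PER":
            info.append(f"{a}-{relation_word}-{b}")
        else:
            # Assumed format for same-kind pairs, following the text's example.
            info.append(f"{relation_word}-{a}, {b}")
    return info


kind = {"Beijing ### Co., Ltd.": "ORG", "Wang xx": "PER", "Li xx": "PER"}
pairs = [
    ("Beijing ### Co., Ltd.", "Wang xx"),
    ("Beijing ### Co., Ltd.", "Li xx"),
    ("Wang xx", "Li xx"),
]
for line in relation_info(pairs, kind, "president"):
    print(line)
```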
As can be seen from the above technical solution, this application provides a method for mining enterprise association relationship information: a text to be detected is obtained; the text to be detected is split to obtain at least one clause; word segmentation and part-of-speech tagging are performed on each clause; the association relation word in each clause is identified; and it is judged whether the association relation word is an organization association relation word and, if it is, first enterprise association relationship information is determined using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause containing the association relation word. The application therefore does not require staff to search the text to be detected for enterprise association relationship information, which improves mining efficiency, and it does not rely on the subjective judgment of staff, which improves mining accuracy.
Referring to Fig. 3, in yet another embodiment of the application, the following steps are further included after step 215 of the above embodiment:
Step 301: Judge whether the text to be detected contains ambiguous association relation words, that is, association relation words with identical content but different parts of speech; if it does, proceed to step 302.
Step 302: Add a part-of-speech label before or after the position of each ambiguous association relation word.
Part of speech classifies words according to their grammatical characteristics. Words in Modern Chinese fall into two classes, content words and function words. Content words generally include nouns, measure words, adjectives, verbs, and so on; function words include adverbs, prepositions, conjunctions, and so on. Because Chinese is rich in meaning, the same word may have different parts of speech in different contexts, which easily causes errors when determining association relation words. In this embodiment the text to be detected is therefore disambiguated, to eliminate errors caused by ambiguity and obtain more accurate enterprise association relationship information.
For example, suppose the text to be detected is "Beijing ## Company invested in Shenyang ## Company, and the investment fund is one million yuan". With simple matching against the data model, both occurrences of "investment" (the verb "invested" and the "investment" in "investment fund" are the same word in the original text) would be identified as association relation words, yet semantically the "investment" in "investment fund" is not a desired association relation word. To avoid this, the present embodiment distinguishes the occurrences by part of speech: the first "investment" is a transitive verb (vt), while the second combines with the following "fund" to form a noun phrase and is therefore tagged as a verbal noun (vn). Part-of-speech labels are then added before or after each "investment" in the text, giving "Beijing ## Company [vt]invested in Shenyang ## Company, and the [vn]investment fund is one million yuan".
Step 303: Identify the target association relation word according to the part-of-speech labels.
After part-of-speech labels are added to the text to be detected, the corresponding part-of-speech label is also added to the matching association relation word in the data model. For example, adding the label to the association relation word "investment" under the investment type gives "[vt]investment". "[vt]investment" in the data model is then matched against the text to be detected, which yields the accurate target association relation word "[vt]investment" in the text.
Step 304: Extract the clause containing the target association relation word and remove the part-of-speech label.
Continuing the example above, the extracted clause is "Beijing ## Company [vt]invested in Shenyang ## Company"; the part-of-speech label [vt] is then removed, giving "Beijing ## Company invested in Shenyang ## Company" for subsequent association relationship mining.
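Steps 301 to 304 can be sketched as follows. The sketch uses the single English stand-in word "invest" for the ambiguous word, because in the original language the verb and the verbal noun are spelled identically; the helper names and the tags "nt", "n" and "m" are assumptions for illustration, while "vt" and "vn" come from the text.

```python
import re
from typing import List, Set, Tuple

Clause = List[Tuple[str, str]]  # (word, part of speech) in clause order


def ambiguous_words(clauses: List[Clause]) -> Set[str]:
    """Step 301: words that occur with more than one part of speech."""
    seen = {}
    for clause in clauses:
        for word, pos in clause:
            seen.setdefault(word, set()).add(pos)
    return {word for word, tags in seen.items() if len(tags) > 1}


def add_pos_labels(clause: Clause, ambiguous: Set[str]) -> str:
    """Step 302: put a [pos] label in front of each ambiguous word."""
    return " ".join(
        f"[{pos}]{word}" if word in ambiguous else word for word, pos in clause
    )


def strip_pos_labels(text: str) -> str:
    """Step 304: remove the [pos] labels again."""
    return re.sub(r"\[[a-z]+\]", "", text)


clauses = [
    [("Beijing ## Company", "nt"), ("invest", "vt"), ("Shenyang ## Company", "nt")],
    [("invest", "vn"), ("fund", "n"), ("one million yuan", "m")],
]
ambiguous = ambiguous_words(clauses)
labelled = [add_pos_labels(clause, ambiguous) for clause in clauses]

# Step 303: the data model entry carries the label of the wanted reading.
TARGET = "[vt]invest"
hits = [strip_pos_labels(text) for text in labelled if TARGET in text]
print(hits)
```

Only the first clause survives, because the second occurrence carries the label [vn] and no longer matches the labelled relation word.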
Step 305: relatival according to the target association for comprising the relatival each subordinate sentence of the target association The position of part of speech and the target association relative in subordinate sentence determines second enterprise's incidence relation information.
The relatival part of speech of target association has verb, noun etc., such as verb has investment, spends more money on, purchases, and noun has holding People, subsidiary, parent company, controlling shareholder etc..Enterprise's incidence relation information include be based on target association relative, building it is multiple Incidence relation between enterprise, if target association relative is " purchase ", enterprise's incidence relation information is objective for implementation-receipts Purchase-applied object.
In each clause containing the target association relation word, recognition proceeds forward (toward the start of the clause) from the position of the target association relation word; if a first enterprise name is recognized, it is determined to be the name of the acting party of the association relation word. Recognition then proceeds backward (toward the end of the clause) from the same position, and the second enterprise name recognized is determined to be the name of the acted-upon party of the target association relation word. Based on the target association relation word, enterprise association information between the name of the acting party and the name of the acted-upon party is generated.
Here the association relation word is a unidirectional association relation word whose part of speech is verb, such as "invest", "increase investment in", or "acquire". The first enterprise name and the second enterprise name may each be any enterprise name.
In implementation, after the server determines the clauses that contain a preset association relation word, for a given clause the server can determine the position of the target association relation word in that clause. Combining the context of the words tagged as nouns before the target association relation word, it recognizes forward from the position of the target association relation word through the words tagged as nouns that precede it; if a first enterprise name can be recognized, that name is determined to be the acting party of the association relation word. Combining the context of the words tagged as nouns after the target association relation word, it then recognizes backward from the position of the target association relation word through the words tagged as nouns that follow it, and the second enterprise name so recognized is determined to be the name of the acted-upon party of the target association relation word. Using the target association relation word, the resulting enterprise association information is "first enterprise name - association relation word - second enterprise name". In this way the enterprise association information contained in this clause is determined, and by analogy the enterprise association information contained in every clause containing the target association relation word can be determined.
For example, suppose the clause containing the target association relation word is "** Co., Ltd has invested in ## Co., Ltd". After word segmentation, the words obtained, from front to back, are "** Co., Ltd", "invest", an auxiliary particle, and "## Co., Ltd", where "** Co., Ltd" is a noun, "invest" is a verb, the particle is an auxiliary word, and "## Co., Ltd" is a noun. The server recognizes forward from "invest", finds "** Co., Ltd", and determines it to be the name of the acting party of the target association relation word; it then recognizes backward from "invest" and finds "## Co., Ltd". The enterprise association information determined in this way is "** Co., Ltd - invest - ## Co., Ltd".
It should be noted that if backward recognition does not find any enterprise name, recognition moves on to the next clause.
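The verb case described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `(word, tag)` token layout, the `"ORG"` tag for enterprise names, and the function names `nearest_org` and `extract_verb_relation` are all assumptions introduced here.

```python
from typing import List, Optional, Tuple

# (word, tag) pairs; the tag "ORG" for enterprise names is an assumed convention.
Token = Tuple[str, str]


def nearest_org(tokens: List[Token], start: int, step: int) -> Optional[str]:
    """Return the first enterprise name met when walking from `start` by `step`.

    step=-1 walks "forward" in the text's sense (toward the clause start),
    step=+1 walks "backward" (toward the clause end).
    """
    i = start + step
    while 0 <= i < len(tokens):
        word, tag = tokens[i]
        if tag == "ORG":
            return word
        i += step
    return None


def extract_verb_relation(tokens: List[Token], relation_word: str):
    """Build (acting party, relation word, acted-upon party) for a verb relation word."""
    idx = next((i for i, (w, _) in enumerate(tokens) if w == relation_word), None)
    if idx is None:
        return None
    actor = nearest_org(tokens, idx, -1)
    target = nearest_org(tokens, idx, +1)
    if actor is None or target is None:
        return None  # nothing usable here; the caller moves on to the next clause
    return (actor, relation_word, target)


# Hypothetical tagged clause mirroring the example in the text.
clause = [("** Co., Ltd", "ORG"), ("invest", "v"), ("(particle)", "u"), ("## Co., Ltd", "ORG")]
triple = extract_verb_relation(clause, "invest")
```

Returning `None` when either side is missing corresponds to skipping to the next clause, as noted above.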
When the target association relation word is a unidirectional target association relation word whose part of speech is noun, the corresponding processing may be as follows:
In each clause containing the target association relation word, recognition proceeds backward from the position of the target association relation word in the clause; if a third enterprise name is recognized, it is determined to be the name of the acting party of the target association relation word. Recognition then proceeds forward from the same position, and the fourth enterprise name recognized is determined to be the name of the acted-upon party of the target association relation word. Based on the target association relation word, enterprise association information between the name of the acting party and the name of the acted-upon party is generated.
Here the target association relation word is a unidirectional target association relation word whose part of speech is noun, such as "controlling shareholder", "holder", "parent company", or "subsidiary". The third enterprise name and the fourth enterprise name may each be any enterprise name.
In implementation, after the server determines the clauses that contain a preset target association relation word, for a given clause the server can determine the position of the target association relation word in that clause. Combining the context of the words tagged as nouns after the target association relation word, it recognizes backward from the position of the target association relation word through the words tagged as nouns that follow it; if a third enterprise name can be recognized, that name is determined to be the acting party of the target association relation word. Combining the context of the words tagged as nouns before the target association relation word, it then recognizes forward from the position of the target association relation word through the words tagged as nouns that precede it, and the fourth enterprise name so recognized is determined to be the name of the acted-upon party of the target association relation word. Using the target association relation word, the resulting enterprise association information is "third enterprise name - target association relation word - fourth enterprise name". In this way the enterprise association information contained in this clause is determined, and by analogy the enterprise association information contained in every clause containing the association relation word can be determined.
For example, suppose the clause containing the target association relation word is "the controlling shareholder of ** Co., Ltd is ## Co., Ltd". After the clause is processed, the target association relation word recognized by the server is "controlling shareholder". The server recognizes backward from "controlling shareholder", finds "## Co., Ltd", and determines it to be the name of the acting party of the target association relation word; it then recognizes forward from "controlling shareholder" and finds "** Co., Ltd". The enterprise association information determined in this way is "## Co., Ltd - controlling shareholder - ** Co., Ltd".
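For the noun case the scan directions are swapped: the acting party is sought after the relation word and the acted-upon party before it. A minimal sketch under the same assumptions as before (hypothetical `"ORG"` tag, hypothetical function name `extract_noun_relation`) might look like this:

```python
def extract_noun_relation(tokens, relation_word):
    """Build (acting party, relation word, acted-upon party) for a noun relation word.

    tokens: list of (word, tag) pairs; "ORG" marks enterprise names (assumed tag).
    """
    words = [w for w, _ in tokens]
    if relation_word not in words:
        return None
    idx = words.index(relation_word)
    # Enterprise names after the relation word, in reading order.
    after = [w for w, tag in tokens[idx + 1:] if tag == "ORG"]
    # Enterprise names before the relation word, in reading order.
    before = [w for w, tag in tokens[:idx] if tag == "ORG"]
    if not after or not before:
        return None
    # after[0] is the first name met scanning backward (toward the clause end);
    # before[-1] is the first name met scanning forward (toward the clause start).
    return (after[0], relation_word, before[-1])


# Hypothetical tagged clause mirroring the example in the text.
noun_clause = [
    ("** Co., Ltd", "ORG"),
    ("of", "u"),
    ("controlling shareholder", "n"),
    ("is", "v"),
    ("## Co., Ltd", "ORG"),
]
noun_triple = extract_noun_relation(noun_clause, "controlling shareholder")
```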
When the target association relation word is a bidirectional target association relation word, the corresponding processing may be as follows:
In each clause containing the target association relation word, recognition proceeds forward from the position of the target association relation word in the clause, and the multiple enterprise names recognized are determined to be coordinate acting parties of the target association relation word. Based on the target association relation word, enterprise association information between the multiple enterprise names is generated.
Here the target association relation word is a bidirectional target association relation word whose part of speech may be noun or verb. For example, bidirectional target association relation words that are nouns include "strategic partnership", "partner", and "competitive relationship"; those that are verbs include "jointly initiate", "jointly undertake", and "jointly invest".
It should be noted that if an enterprise name recognized as described above is an abbreviation, the corresponding full name can be looked up in a preset mapping between enterprise full names and abbreviations, and the full name is stored in the enterprise association information.
In this embodiment, after the server determines the clauses that contain a preset target association relation word, for a given clause the server can determine the position of the target association relation word in that clause. Combining the context of the words tagged as nouns before the target association relation word, it recognizes forward from the position of the target association relation word through the words tagged as nouns that precede it. After the first enterprise name is recognized, it continues recognizing forward until no further enterprise name can be found in the clause. Using the association relation word contained in the clause, the resulting enterprise association information is "names of the multiple enterprises - target association relation word". In this way the enterprise association information contained in this clause is determined, and by analogy the enterprise association information contained in every clause containing the target association relation word can be determined.
For example, suppose the clause containing the target association relation word is "** Co., Ltd and ## Co., Ltd are in a strategic partnership". After word segmentation, the words obtained, from front to back, are "** Co., Ltd", "## Co., Ltd", "are", and "strategic partnership". The target association relation word recognized by the server is "strategic partnership"; recognizing forward from it, the server finds "** Co., Ltd" and "## Co., Ltd". The enterprise association information determined in this way is "** Co., Ltd & ## Co., Ltd - strategic partnership".
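The bidirectional case can be sketched as collecting every enterprise name that precedes the relation word. The sketch below is an assumption-laden illustration (hypothetical `"ORG"` tag and function name `extract_bidirectional`); it gathers the names in reading order, which yields the same set as scanning forward until no more names are found.

```python
def extract_bidirectional(tokens, relation_word):
    """Return (coordinate parties, relation word) for a bidirectional relation word.

    tokens: list of (word, tag) pairs; "ORG" marks enterprise names (assumed tag).
    """
    words = [w for w, _ in tokens]
    if relation_word not in words:
        return None
    idx = words.index(relation_word)
    # Every enterprise name before the relation word is a coordinate party.
    parties = tuple(w for w, tag in tokens[:idx] if tag == "ORG")
    if len(parties) < 2:
        return None  # a bidirectional relation needs at least two parties
    return (parties, relation_word)


# Hypothetical tagged clause mirroring the example in the text.
pair_clause = [
    ("** Co., Ltd", "ORG"),
    ("and", "c"),
    ("## Co., Ltd", "ORG"),
    ("are", "v"),
    ("strategic partnership", "n"),
]
partnership = extract_bidirectional(pair_clause, "strategic partnership")
```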
Optionally, when the target association relation word is a bidirectional target association relation word whose part of speech is verb, some clauses will also contain both the name of an acting party and the name of an acted-upon party, and the corresponding processing may be as follows:
In each clause containing the target association relation word, recognition proceeds forward from the position of the target association relation word in the clause, and the enterprise names recognized are determined to be the names of the acting parties of the target association relation word. Recognition then proceeds backward from the same position, and the enterprise name recognized is determined to be the name of the acted-upon party of the target association relation word. Based on the target association relation word, enterprise association information between the names of the acting parties and the name of the acted-upon party is generated.
In implementation, after the server determines the clauses that contain a preset target association relation word, for a given clause the server can determine the position of the target association relation word in that clause. Combining the context of the words tagged as nouns before the target association relation word, it recognizes forward from the position of the target association relation word through the words tagged as nouns that precede it. After the first enterprise name is recognized, it continues recognizing forward until no further enterprise name can be found in the clause, and the enterprise names recognized are determined to be the names of the acting parties. Then, combining the context of the words tagged as nouns after the target association relation word, it recognizes backward from the position of the target association relation word in the clause, and the enterprise name recognized is determined to be the name of the acted-upon party of the target association relation word. Using the target association relation word, the enterprise association information of the clause is determined to be "enterprise names recognized forward - target association relation word - enterprise name recognized backward".
For example, suppose the clause containing the target association relation word is "** Co., Ltd and ## Co., Ltd jointly invested in *# Co., Ltd". After word segmentation, the words obtained, from front to back, are "** Co., Ltd", "## Co., Ltd", "jointly invest", an auxiliary particle, and "*# Co., Ltd". The target association relation word recognized by the server is "jointly invest". Recognizing forward from "jointly invest", the server finds "** Co., Ltd" and "## Co., Ltd" and determines both to be names of the acting parties of the target association relation word; recognizing backward, it finds "*# Co., Ltd" and determines it to be the name of the acted-upon party of the target association relation word. The enterprise association information determined in this way is "** Co., Ltd & ## Co., Ltd - jointly invest - *# Co., Ltd".
It should be noted that if an enterprise name recognized as described above is an abbreviation, the corresponding full name can be looked up in a preset mapping between enterprise full names and abbreviations, and the full name is stored in the enterprise association information.
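A sketch of the joint-action case, including the abbreviation lookup mentioned in the note, is given below. The `"ORG"` tag, the function name `extract_joint_action`, and the sample entry in `FULL_NAMES` are hypothetical and serve only to illustrate the described behavior.

```python
# Hypothetical abbreviation-to-full-name mapping; the entry is illustrative only.
FULL_NAMES = {"## Co.": "## Co., Ltd"}


def extract_joint_action(tokens, relation_word, full_names=FULL_NAMES):
    """Return (acting parties, relation word, acted-upon party) for a bidirectional verb.

    tokens: list of (word, tag) pairs; "ORG" marks enterprise names (assumed tag).
    """
    words = [w for w, _ in tokens]
    if relation_word not in words:
        return None
    idx = words.index(relation_word)
    # Replace abbreviations by their full names when a mapping entry exists.
    expand = lambda name: full_names.get(name, name)
    actors = tuple(expand(w) for w, tag in tokens[:idx] if tag == "ORG")
    targets = [expand(w) for w, tag in tokens[idx + 1:] if tag == "ORG"]
    if not actors or not targets:
        return None
    return (actors, relation_word, targets[0])


# Hypothetical tagged clause; "## Co." stands in for an abbreviated name.
joint_clause = [
    ("** Co., Ltd", "ORG"),
    ("and", "c"),
    ("## Co.", "ORG"),
    ("jointly invest", "v"),
    ("*# Co., Ltd", "ORG"),
]
joint = extract_joint_action(joint_clause, "jointly invest")
```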
Step 306: determine whether the second enterprise association information is identical to the first enterprise association information; if it is, perform step 307.
Step 307: discard the second enterprise association information that is identical to the first enterprise association information.
The second enterprise association information obtained is matched against the first enterprise association information; if the two are identical, the second enterprise association information is discarded to prevent duplicate storage.
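Steps 306 and 307 amount to a duplicate filter. A minimal sketch, assuming each piece of enterprise association information is represented as a hashable tuple (a representation chosen here, not stated in the source):

```python
def drop_duplicates(first_info, second_info):
    """Keep only the second-pass triples that are not already in the first pass."""
    seen = set(first_info)
    return [triple for triple in second_info if triple not in seen]


first_info = [("** Co., Ltd", "shareholder", "Wang xx")]
second_info = [
    ("** Co., Ltd", "shareholder", "Wang xx"),   # duplicate, discarded
    ("** Co., Ltd", "invest", "## Co., Ltd"),    # new, kept
]
kept = drop_duplicates(first_info, second_info)
```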
Referring to Fig. 4, in another embodiment provided by the present application, the following steps are further included after step 307 of the above embodiment:
Step 401: determine whether the text to be detected contains, apart from association relation words, at least two segmented words with identical content and identical part of speech; if so, perform steps 402-406.
Step 402: record the position indexes of the at least two segmented words with identical content and identical part of speech.
For example, suppose the text to be detected contains "On May 28, 2018, ** Co., Ltd made a decision, ** Co., Ltd elected Wang xx as the new chairman of ** Co., Ltd, Wang xx stated that he would be responsible for the enterprise". Here "** Co., Ltd" occurs three times and "Wang xx" occurs twice. The server records the position of each of these segmented words.
Step 403: based on the position indexes of the at least two segmented words with identical content and identical part of speech, determine the target segmented words and their corresponding position indexes according to the principle of the shortest path to the association relation word.
The shortest-path principle works as follows: starting from the position of the association relation word, recognize forward until a first preset punctuation mark is reached, then recognize backward from the position of the association relation word until a second preset punctuation mark is reached; within the text between the first and second preset punctuation marks, the segmented words nearest to the association relation word are selected as the target segmented words. The first and second preset punctuation marks include the full stop, the semicolon, and the comma. Continuing the example above, which contains three occurrences of "** Co., Ltd" and two of "Wang xx": recognizing forward from the association relation word up to the first comma, and backward from the association relation word up to the second comma, yields "** Co., Ltd elected Wang xx as the new chairman of ** Co., Ltd". Within this span, the "Wang xx" nearest to "chairman" and the second "** Co., Ltd" in the span (the third occurrence in the full text) can be taken as the target segmented words.
Step 404: determine the effective short sentence range according to the position indexes of the target segmented words and the position of the association relation word.
For example, in the example above, after the third "** Co., Ltd" and the first "Wang xx" are determined to be the target segmented words, the part spanned by the two target segmented words and the association relation word is determined, from their position indexes and the position of the association relation word, to be the effective short sentence range, namely "Wang xx as the new chairman of ** Co., Ltd".
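Steps 402 to 404 can be sketched as follows. This is an illustrative reading of the shortest-path principle, not the authors' code: the token list, the delimiter set `DELIMS`, and the function name `effective_range` are assumptions, and the tokens are written in the word order of the original-language sentence, where "chairman" comes last in its clause.

```python
DELIMS = {".", ";", ",", "。", "；", "，"}  # preset punctuation marks (assumed set)


def effective_range(tokens, rel_idx, repeated_words):
    """Pick, for each repeated word, the occurrence nearest the relation word
    within the punctuation-delimited span, then return the covering range."""
    left = rel_idx
    while left > 0 and tokens[left - 1] not in DELIMS:
        left -= 1
    right = rel_idx
    while right < len(tokens) - 1 and tokens[right + 1] not in DELIMS:
        right += 1
    targets = {}
    for word in repeated_words:
        positions = [i for i in range(left, right + 1) if tokens[i] == word]
        if positions:
            targets[word] = min(positions, key=lambda i: abs(i - rel_idx))
    bounds = [rel_idx, *targets.values()]
    return targets, (min(bounds), max(bounds))


# Hypothetical token sequence following the example sentence.
tokens = [
    "2018-05-28", ",", "** Co., Ltd", "decide", ",",
    "** Co., Ltd", "elect", "Wang xx", "as", "** Co., Ltd", "new", "chairman", ",",
    "Wang xx", "state", "responsible",
]
targets, (lo, hi) = effective_range(tokens, tokens.index("chairman"), ["** Co., Ltd", "Wang xx"])
short_sentence = tokens[lo:hi + 1]
```

With these tokens, the selected occurrences are the third "** Co., Ltd" and the first "Wang xx", matching the example in the text.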
Step 405: within the effective short sentence range, if the relation clause type of the effective short sentence is the forward relation-centered clause, recognize forward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the first entity; then recognize backward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the second entity.
The relation clause type may be determined as follows. After the association relation words have been fixed in the data model, staff design sentence-pattern templates for the different association relation words. For example, if the association relation word is "invest", the templates may be "... invest ...", "... invested ...", and "to ... invest". The templates are matched against the effective short sentence, and the relation clause type corresponding to the matching template is taken as the relation clause type of the effective short sentence. The relation clause types are the forward relation-centered clause, the forward relation-postposed clause, the reverse relation-centered clause, and the reverse relation-postposed clause.
The forward relation-centered clause is the ordinary subject-predicate-object clause, for example "Beijing ### company acquired Beijing *** company". Recognizing forward from the association relation word "acquire" finds "Beijing ### company", which is determined to be the first entity; recognizing backward from "acquire" finds "Beijing *** company", which is determined to be the second entity.
Step 406: within the effective short sentence range, if the relation clause type of the effective short sentence is the forward relation-postposed clause, recognize forward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the second entity; then continue recognizing forward and determine the second enterprise name or person name recognized to be the first entity.
The forward relation-postposed clause is also an ordinary subject-predicate-object clause, but the association relation word is usually a noun and comes after all enterprise names or person names, as in "Wang xx is the chairman of Beijing ### company" (in the original word order, "chairman" comes last). Recognizing forward from the association relation word "chairman", "Beijing ### company" is determined to be the second entity and "Wang xx" the first entity.
Step 407: within the effective short sentence range, if the relation clause type of the effective short sentence is the reverse relation-centered clause, recognize backward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the first entity; then recognize forward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the second entity.
The reverse relation-centered clause is an explicit passive clause, such as "Beijing ### company was invested in by Beijing *** company" (in the original word order, "invest" stands between the two names). Recognizing backward from the association relation word "invest", "Beijing *** company" is determined to be the first entity; recognizing forward from "invest", "Beijing ### company" is determined to be the second entity.
Step 408: within the effective short sentence range, if the relation clause type of the effective short sentence is the reverse relation-postposed clause, recognize forward from the position of the association relation word within the effective short sentence range and determine the first enterprise name or person name recognized to be the first entity; then continue recognizing forward and determine the second enterprise name or person name recognized to be the second entity.
The reverse relation-postposed clause is an implicit passive clause in which the association relation word is usually a verb and comes after all enterprise names or person names, such as "Beijing ### company to Beijing *** company invest 1,000,000" (given here in the original word order). Recognizing forward from the association relation word "invest", "Beijing *** company" is determined to be the first entity; continuing forward, "Beijing ### company" is determined to be the second entity.
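The four clause types in steps 405 to 408 differ only in which neighbors of the association relation word become the first and second entity. The sketch below makes that mapping explicit; the tags `"ORG"` and `"PER"`, the clause-type labels, and the function name `pick_entities` are assumptions introduced for illustration, and template matching to decide the clause type is not shown.

```python
ENTITY_TAGS = {"ORG", "PER"}  # assumed tags for enterprise names and person names


def pick_entities(tokens, rel_idx, clause_type):
    """Return (first entity, second entity) for the given relation clause type.

    tokens: list of (word, tag) pairs; rel_idx: index of the relation word.
    """
    # Entities before the relation word, nearest first (forward scan order).
    before = [w for w, tag in reversed(tokens[:rel_idx]) if tag in ENTITY_TAGS]
    # Entities after the relation word, nearest first (backward scan order).
    after = [w for w, tag in tokens[rel_idx + 1:] if tag in ENTITY_TAGS]
    try:
        if clause_type == "forward_centered":
            return before[0], after[0]
        if clause_type == "forward_postposed":
            return before[1], before[0]
        if clause_type == "reverse_centered":
            return after[0], before[0]
        if clause_type == "reverse_postposed":
            return before[0], before[1]
    except IndexError:
        return None  # not enough entities on the required side
    raise ValueError(f"unknown clause type: {clause_type}")


# Hypothetical tagged short sentences following the examples in the text.
centered = [("Beijing ### company", "ORG"), ("acquire", "v"), ("Beijing *** company", "ORG")]
postposed = [("Wang xx", "PER"), ("is", "v"), ("Beijing ### company", "ORG"), ("chairman", "n")]
passive = [("Beijing ### company", "ORG"), ("by", "p"), ("invest", "v"), ("Beijing *** company", "ORG")]
```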
Step 409: determine the third enterprise association information according to the first entity, the second entity, and the association relation word.
Based on the association relation word, enterprise association information between the first entity and the second entity is generated, for example "** Co., Ltd - chairman - Wang xx" or "Beijing *** company - invest - Beijing ### company".
Step 410: determine whether the third enterprise association information is identical to the first enterprise association information, or whether the third enterprise association information is identical to the second enterprise association information; if the third enterprise association information is identical to either, perform step 411.
Step 411: discard the third enterprise association information that is identical to the first enterprise association information, and the third enterprise association information that is identical to the second enterprise association information.
The third enterprise association information obtained is matched against the first enterprise association information and the second enterprise association information respectively; if it is identical to the first enterprise association information or to the second enterprise association information, the third enterprise association information is discarded to prevent duplicate storage.
Referring to Fig. 5, in another embodiment provided by the present application, the following steps are further included after step 411 of the above embodiment:
Step 501: according to the association relation words in the first enterprise association relation, the second enterprise association relation, and the third enterprise association relation, establish association paths between the enterprise names or person names in those relations, and store them in the corresponding databases.
Because association relation words come in many types, such as the organizational relation type and the investment relation type, staff may establish multiple corresponding databases in advance according to the type of association relation word. For example, for organizational association relation words, a shareholder information database and a company name database need to be established. After association paths have been established between the enterprise names or person names in the first, second, and third enterprise association relations, the enterprise names or person names are stored in the corresponding databases. For example, for "*** company - shareholder - Wang xx", an association path is established between *** company and Wang xx, *** company is stored in the company name database, and Wang xx is stored in the shareholder information database.
Step 502: obtain the request information entered by the user, the request information including the enterprise name or person name that the user wishes to query.
Step 503: determine whether the request information matches the stored information in the databases, the stored information being the enterprise names or person names in the first, second, and third enterprise association relations; if it matches, perform step 504.
Step 504: according to the association paths of the stored information, extract the association information corresponding to the stored information to form an organizational relation graph.
For example, if the user enters Beijing Shenzhou Taiyue Software Co., Ltd, this information is matched against the stored information in the databases preset by the staff. If matching stored information is found, the association information linked to it by association paths is extracted, yielding the organizational relation graph shown in Figure 11, so that the user can conveniently and directly understand the organizational composition of the company.
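Steps 501 to 504 can be sketched with in-memory dictionaries standing in for the databases. Everything named here (`AssociationStore`, the database labels, the sample triple) is a hypothetical illustration; the source does not specify a storage engine or schema.

```python
from collections import defaultdict


class AssociationStore:
    """Toy stand-in for the per-type databases and the association paths."""

    def __init__(self):
        self.databases = defaultdict(set)  # database label -> stored names
        self.paths = defaultdict(list)     # name -> triples it takes part in

    def add(self, left, relation, right, left_db, right_db):
        """Store both names and link them through the relation (step 501)."""
        triple = (left, relation, right)
        self.databases[left_db].add(left)
        self.databases[right_db].add(right)
        self.paths[left].append(triple)
        self.paths[right].append(triple)

    def query(self, name):
        """Return the triples linked to a stored name (steps 502 to 504)."""
        stored = any(name in names for names in self.databases.values())
        return list(self.paths[name]) if stored else []


store = AssociationStore()
store.add("*** company", "shareholder", "Wang xx", "company_names", "shareholder_info")
graph_edges = store.query("*** company")
```

The returned triples are the edges from which a relation graph for the queried name could be drawn.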
As can be seen from the above technical solution, the present application provides an enterprise association information mining method: obtain the text to be detected; split the text to be detected to obtain at least one clause; perform word segmentation and part-of-speech tagging on each clause; identify the association relation word in each clause; and determine whether the association relation word is an organizational association relation word, and if it is, determine the first enterprise association information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause where the association relation word is located. The present application therefore does not require staff to search the text to be detected for enterprise association information, which improves the efficiency of enterprise association information mining, and it does not rely on the subjective judgment of staff, which improves the accuracy of mining.
In a second aspect, referring to Fig. 6, an embodiment of the present application provides an enterprise association information mining device, the device including:
an obtaining module 601, configured to obtain the text to be detected;
a clause splitting module 602, configured to split the text to be detected to obtain at least one clause;
a part-of-speech tagging module 603, configured to perform word segmentation and part-of-speech tagging on each clause;
a first identification module 604, configured to identify the association relation word in each clause;
a first determining module 605, configured to determine whether the association relation word is an organizational association relation word and, if it is, to determine the first enterprise association information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause where the association relation word is located.
As can be seen from the above technical solution, the present application provides an enterprise association information mining device that obtains the text to be detected; splits the text to be detected to obtain at least one clause; performs word segmentation and part-of-speech tagging on each clause; identifies the association relation word in each clause; and determines whether the association relation word is an organizational association relation word, and if it is, determines the first enterprise association information using a Cartesian product algorithm according to the parts of speech of the segmented words in the clause where the association relation word is located. The present application therefore does not require staff to search the text to be detected for enterprise association information, which improves the efficiency of enterprise association information mining, and it does not rely on the subjective judgment of staff, which improves the accuracy of mining.
Further, referring to Fig. 7, when the type of the association relation word is an organizational relation word, the first determining module 605 includes:
an extraction unit 701, configured to extract, from the clause where the association relation word is located, the segmented words that are nouns denoting entity organization names and the segmented words that are person names;
a first judging unit 702, configured to generate enterprise association information between the entity organization name and the person name if the number of segmented words that are entity organization names and the number of segmented words that are person names are both one;
a second judging unit 703, configured to generate a first set and a second set if the number of segmented words that are entity organization names and/or the number of segmented words that are person names is at least two, the first set and the second set each being the set consisting of all the segmented words that are entity organization names and all the segmented words that are person names;
a Cartesian product unit 704, configured to take the Cartesian product of the first set and the second set to obtain multiple subsets;
a screening unit 705, configured to screen the multiple subsets according to preset screening rules to obtain a target set;
a first determination unit 706, configured to determine the first enterprise association information according to the target set.
Further, referring to Fig. 8, the screening unit 705 includes:
a first judging subunit 801, configured to determine whether the segmented words in each subset are identical and, if the segmented words in a subset are identical, to discard that subset;
a second judging subunit 802, configured to determine, among all subsets composed of one segmented word that is an entity organization name and one segmented word that is a person name, whether identical subsets exist and, if so, to discard the subset in which the entity organization name comes after the person name;
a target set determination subunit 803, configured to discard, among the remaining subsets composed only of entity organization names or only of person names, the subsets whose order is reversed relative to the positions of those segmented words in the clause, thereby obtaining the target set.
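The Cartesian product and the three screening rules handled by units 704 and 705 can be sketched as below. This is one plausible reading of the rules, with assumed tags `"ORG"` and `"PER"`, a hypothetical function name `candidate_pairs`, and the simplifying assumption that each name occurs once in the clause.

```python
from itertools import product


def candidate_pairs(mentions):
    """mentions: (name, kind) pairs in clause order, kind being "ORG" or "PER".

    Returns the ordered pairs that survive the three screening rules.
    """
    order = {name: i for i, (name, _) in enumerate(mentions)}
    kind = dict(mentions)
    names = list(order)  # first set and second set are both this full list
    pairs = []
    for a, b in product(names, names):
        if a == b:
            continue  # rule 1: drop subsets whose two members are identical
        if kind[a] != kind[b]:
            # rule 2: of (ORG, PER) and (PER, ORG), keep only the ORG-first one
            if kind[a] == "ORG":
                pairs.append((a, b))
        elif order[a] < order[b]:
            # rule 3: same-kind pairs are kept only in clause order
            pairs.append((a, b))
    return pairs


# Hypothetical mentions; "Li xx" is an illustrative second person name.
mentions = [("** Co., Ltd", "ORG"), ("Wang xx", "PER"), ("Li xx", "PER")]
pairs = candidate_pairs(mentions)
```

Of the nine raw pairs produced by the product, three remain after screening in this example.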
Further, referring to Fig. 9, the device further includes:
a first judging module 901, configured to determine whether the text to be detected contains ambiguous association relation words that have identical content but different parts of speech and, if it does, to add a part-of-speech label before or after the position of each ambiguous association relation word;
a second identification module 902, configured to identify the target association relation word according to the part-of-speech label;
a clause extraction module 903, configured to extract the clause where the target association relation word is located and to remove the part-of-speech label;
a second determining module 904, configured to determine, for each clause containing the target association relation word, the second enterprise association information according to the part of speech of the target association relation word and the position of the target association relation word in the clause;
a second judging module 905, configured to determine whether the second enterprise association information is identical to the first enterprise association information and, if it is, to discard the second enterprise association information that is identical to the first enterprise association information.
Further, referring to Figure 10, the device further includes:
Third judgment module 1001, configured to judge whether the text to be detected contains, apart from incidence relation words, at least two participles with identical content and identical part of speech and, if so, to record the position indexes of those participles;
Position index determining module 1002, configured to determine a target participle and its corresponding position index from the position indexes of the at least two participles with identical content and part of speech, following the principle that the shortest path to the incidence relation word takes priority;
Effective short sentence range determining module 1003, configured to determine an effective short sentence range according to the position index of the target participle and the position of the incidence relation word;
Entity determining module 1004, configured to, within the effective short sentence range: if the sentence pattern of the effective short sentence is a forward-relation pattern with the relation word in the middle, identify forward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the first entity, and identify backward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the second entity;
If the sentence pattern of the effective short sentence is a forward-relation pattern with the relation word postposed, identify forward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the second entity, then continue identifying forward and determine the second enterprise name or person name recognized as the first entity;
If the sentence pattern of the effective short sentence is a reverse-relation pattern with the relation word in the middle, identify backward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the first entity, and identify forward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the second entity;
If the sentence pattern of the effective short sentence is a reverse-relation pattern with the relation word postposed, identify forward starting from the position of the incidence relation word within the effective short sentence range and determine the first enterprise name or person name recognized as the first entity, then continue identifying forward and determine the second enterprise name or person name recognized as the second entity;
Third determining module 1005, configured to determine third enterprise incidence relation information according to the first entity, the second entity and the incidence relation word;
Third judgment module 1006, configured to judge whether the third enterprise incidence relation information is identical to the first enterprise incidence relation information or to the second enterprise incidence relation information and, if it is identical to either, to discard the third enterprise incidence relation information that is identical to the first enterprise incidence relation information and the third enterprise incidence relation information that is identical to the second enterprise incidence relation information.
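The four sentence-pattern rules applied by the entity determining module can be illustrated with a short Python sketch. It is only an illustration under assumptions: the tags `nt` and `nr`, the pattern labels and the function `identify_entities` are hypothetical names, and "identify forward" is read here as scanning toward the start of the sentence while "identify backward" scans toward its end.

```python
# Hypothetical part-of-speech tags for enterprise names and person names.
NAME_TAGS = {"nt", "nr"}


def identify_entities(tokens, rel_idx, pattern, lo=0, hi=None):
    """Return (first_entity, second_entity), or None if too few names are found.

    tokens:  list of (word, pos) tuples in sentence order
    rel_idx: index of the incidence relation word
    lo, hi:  inclusive bounds of the effective short sentence range
    """
    if hi is None:
        hi = len(tokens) - 1
    # Names before the relation word, nearest first ("identify forward").
    left = [w for w, pos in reversed(tokens[lo:rel_idx]) if pos in NAME_TAGS]
    # Names after the relation word, nearest first ("identify backward").
    right = [w for w, pos in tokens[rel_idx + 1:hi + 1] if pos in NAME_TAGS]
    try:
        if pattern == "forward_middle":
            return left[0], right[0]
        if pattern == "forward_postposed":
            return left[1], left[0]
        if pattern == "reverse_middle":
            return right[0], left[0]
        if pattern == "reverse_postposed":
            return left[0], left[1]
    except IndexError:
        return None  # not enough names inside the range
    raise ValueError(f"unknown sentence pattern: {pattern}")
```

In this reading, the two patterns with the relation word in the middle take one name from each side, while the two postposed patterns take both names from the text before the relation word and differ only in which name becomes the first entity.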
As can be seen from the above technical solutions, the present application provides an enterprise incidence relation information mining method and device: obtain a text to be detected; split the text to be detected to obtain at least one subordinate sentence; perform word segmentation and part-of-speech tagging on each subordinate sentence; identify the incidence relation word in each subordinate sentence; judge whether the incidence relation word is an organizational incidence relation word and, if so, determine first enterprise incidence relation information with a Cartesian product algorithm according to the parts of speech of the participles in the subordinate sentence containing the incidence relation word. The application therefore does not require staff to search the text to be detected for enterprise incidence relation information, which improves the efficiency of mining such information; and because it does not rely on the subjective judgment of staff, it also improves the accuracy of the mining.
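The Cartesian product step can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the tags `nt` (entity organization name) and `nr` (person name) and the function `candidate_pairs` are assumptions made for the example, and the later screening of the resulting pairs is left out.

```python
from itertools import product

# Hypothetical part-of-speech tags; the text does not name a tag set.
ORG, PERSON = "nt", "nr"


def candidate_pairs(tagged_sentence):
    """Return candidate entity pairs for one subordinate sentence.

    tagged_sentence: list of (word, pos) tuples in sentence order.
    """
    orgs = [w for w, pos in tagged_sentence if pos == ORG]
    people = [w for w, pos in tagged_sentence if pos == PERSON]

    # Exactly one organization and one person: pair them directly.
    if len(orgs) == 1 and len(people) == 1:
        return [(orgs[0], people[0])]

    # Otherwise the first and second sets both hold every organization
    # and person participle; their Cartesian product lists all ordered pairs.
    entities = [w for w, pos in tagged_sentence if pos in (ORG, PERSON)]
    return list(product(entities, entities))
```

For a sentence with three names the product yields nine ordered pairs, including self-pairs and mirrored pairs, which is why a screening step is needed before relation information is produced.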
Those skilled in the art can clearly understand that the technology in the embodiments of the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The device embodiments in particular are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the description of the method embodiments.

Claims (6)

1. An enterprise incidence relation information mining method, characterized in that the method comprises:
Obtaining a text to be detected;
Splitting the text to be detected to obtain at least one subordinate sentence;
Performing word segmentation and part-of-speech tagging on each subordinate sentence;
Identifying the incidence relation word in each subordinate sentence;
Judging whether the incidence relation word is an organizational incidence relation word and, if it is, determining first enterprise incidence relation information with a Cartesian product algorithm according to the parts of speech of the participles in the subordinate sentence containing the incidence relation word;
When the type of the incidence relation word is an organizational incidence relation word, determining the first enterprise incidence relation information with the Cartesian product algorithm according to the parts of speech of the participles in the subordinate sentence containing the incidence relation word comprises:
Extracting, from the subordinate sentence containing the incidence relation word, the noun participles that are entity organization names and the participles that are person names;
If the number of entity organization name participles and the number of person name participles are both one, generating enterprise incidence relation information between the entity organization name participle and the person name participle;
If the number of entity organization name participles and/or the number of person name participles is at least two, generating a first set and a second set, each of which is the set composed of all the entity organization name participles and all the person name participles;
Taking the Cartesian product of the first set and the second set to obtain multiple subsets;
Screening the multiple subsets according to a preset screening rule to obtain a target set;
Determining the first enterprise incidence relation information according to the target subsets and the incidence relation word;
After determining the first enterprise incidence relation information, the method further comprises:
Judging whether the text to be detected contains ambiguous incidence relation words that have identical content but different parts of speech and, if so, adding a part-of-speech label before or after the position of each ambiguous incidence relation word;
Identifying the target incidence relation word according to the part-of-speech label;
Extracting the subordinate sentence containing the target incidence relation word and removing the part-of-speech label;
For each subordinate sentence containing the target incidence relation word, determining second enterprise incidence relation information according to the part of speech of the target incidence relation word and the position of the target incidence relation word in the subordinate sentence;
Judging whether the second enterprise incidence relation information is identical to the first enterprise incidence relation information and, if so, discarding the second enterprise incidence relation information that is identical to the first enterprise incidence relation information.
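The ambiguity-labelling steps at the end of claim 1 can be pictured with the Python sketch below. It is a simplified illustration under assumptions: the `word/pos` label format, the tag names and the functions `tag_ambiguous` and `strip_label` are hypothetical, since the claim only states that a part-of-speech label is added before or after the word and later removed.

```python
from collections import defaultdict


def tag_ambiguous(tokens, relation_words):
    """Append a part-of-speech label to relation words that occur with
    the same content but different parts of speech.

    tokens:         list of (word, pos) tuples for the whole text
    relation_words: set of known incidence relation words
    """
    tags_by_word = defaultdict(set)
    for word, pos in tokens:
        if word in relation_words:
            tags_by_word[word].add(pos)
    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    # Label only the ambiguous relation words; leave the rest unchanged.
    return [f"{w}/{pos}" if w in ambiguous else w for w, pos in tokens]


def strip_label(labelled_word):
    """Remove the label again once the target relation word is identified."""
    return labelled_word.split("/", 1)[0]
```

With the labels in place, a target relation word can be selected by its label (for example, only the noun reading), and the label is stripped before the subordinate sentence is processed further.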
2. The method according to claim 1, characterized in that screening the multiple subsets according to the preset screening rule to obtain the target set comprises:
Judging whether the participles in each subset are identical and, if the participles in a subset are identical, discarding that subset;
Among all subsets composed of an entity organization name participle and a person name participle, judging whether identical subsets exist and, if so, discarding the subset in which the entity organization name participle comes after the person name participle;
Among the remaining subsets composed only of entity organization name participles or only of person name participles, discarding the subsets in reverse order according to the positions of the entity organization name participles or person name participles in the subordinate sentence, to obtain the target set.
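The three screening rules of claim 2 can be sketched in Python as below. The sketch assumes each subset is an ordered pair of `(word, pos, index)` tuples produced by a full Cartesian product, so every mixed organization/person pair has a mirrored twin; the tags `nt` and `nr` and the function `screen_pairs` are hypothetical.

```python
from itertools import product

ORG, PERSON = "nt", "nr"  # hypothetical part-of-speech tags


def screen_pairs(pairs):
    """Filter ordered pairs of (word, pos, index) tuples."""
    kept = []
    for a, b in pairs:
        # Rule 1: discard pairs whose two participles are identical.
        if a[0] == b[0]:
            continue
        if {a[1], b[1]} == {ORG, PERSON}:
            # Rule 2: of two mirrored organization/person pairs, discard
            # the one that places the organization after the person.
            if a[1] != ORG:
                continue
        elif a[2] > b[2]:
            # Rule 3: for same-kind pairs, discard the reverse-ordered one
            # (order is taken from the position in the subordinate sentence).
            continue
        kept.append((a, b))
    return kept


entities = [("AlphaCo", ORG, 0), ("BetaCo", ORG, 2), ("Zhang", PERSON, 4)]
target_set = screen_pairs(product(entities, entities))
```

Of the nine pairs generated for three names, only three survive in this example: AlphaCo with BetaCo, AlphaCo with Zhang, and BetaCo with Zhang.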
3. The method according to claim 1, characterized in that, after judging whether the second enterprise incidence relation information is identical to the first enterprise incidence relation information, the method further comprises:
Judging whether the text to be detected contains, apart from incidence relation words, at least two participles with identical content and identical part of speech and, if so, recording the position indexes of those participles;
Determining a target participle and its corresponding position index from the position indexes of the at least two participles with identical content and part of speech, following the principle that the shortest path to the incidence relation word takes priority;
Determining an effective short sentence range according to the position index of the target participle and the position of the incidence relation word;
Within the effective short sentence range, if the sentence pattern of the effective short sentence is a forward-relation pattern with the relation word in the middle, identifying forward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the first entity, and identifying backward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the second entity;
If the sentence pattern of the effective short sentence is a forward-relation pattern with the relation word postposed, identifying forward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the second entity, then continuing to identify forward and determining the second enterprise name or person name recognized as the first entity;
If the sentence pattern of the effective short sentence is a reverse-relation pattern with the relation word in the middle, identifying backward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the first entity, and identifying forward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the second entity;
If the sentence pattern of the effective short sentence is a reverse-relation pattern with the relation word postposed, identifying forward starting from the position of the incidence relation word within the effective short sentence range and determining the first enterprise name or person name recognized as the first entity, then continuing to identify forward and determining the second enterprise name or person name recognized as the second entity;
Determining third enterprise incidence relation information according to the first entity, the second entity and the incidence relation word;
Judging whether the third enterprise incidence relation information is identical to the first enterprise incidence relation information or to the second enterprise incidence relation information and, if it is identical to either, discarding the third enterprise incidence relation information that is identical to the first enterprise incidence relation information and the third enterprise incidence relation information that is identical to the second enterprise incidence relation information.
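The selection of the target participle by shortest distance to the relation word, described in claim 3, might look like the following Python sketch. Two points are assumptions rather than statements of the claim: distance is measured in token positions, and the effective short sentence range is taken as the inclusive span between the target participle and the relation word. The function name `effective_range` is hypothetical.

```python
from collections import defaultdict


def effective_range(tokens, rel_idx):
    """Return (start, end) token indexes of the effective short sentence,
    or None when no participle is repeated.

    tokens:  list of (word, pos) tuples
    rel_idx: index of the incidence relation word
    """
    rel_token = tokens[rel_idx]
    positions = defaultdict(list)
    for i, token in enumerate(tokens):
        if token != rel_token:  # relation words are excluded
            positions[token].append(i)
    # Position indexes of participles repeated with the same content and tag.
    repeated = [i for idxs in positions.values() if len(idxs) >= 2 for i in idxs]
    if not repeated:
        return None
    # Shortest-path priority: take the occurrence nearest the relation word.
    target = min(repeated, key=lambda i: abs(i - rel_idx))
    return (min(target, rel_idx), max(target, rel_idx))
```

For the tokens AlphaCo, said, AlphaCo, holds, BetaCo with "holds" as the relation word, the second occurrence of AlphaCo is nearer, so the sketch returns the span from index 2 to index 3.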
4. The method according to claim 3, characterized in that, after judging whether the third enterprise incidence relation information is identical to the first enterprise incidence relation information or to the second enterprise incidence relation information, the method further comprises:
Establishing, according to the incidence relation words in the first enterprise incidence relation, the second enterprise incidence relation and the third enterprise incidence relation, associated paths between the enterprise names or person names in the first enterprise incidence relation, the second enterprise incidence relation and the third enterprise incidence relation, and storing them in a corresponding database;
Obtaining request information input by a user, the request information including the enterprise name or person name that the user wants to query;
Judging whether the request information matches the stored information in the database and, if it matches, extracting the incidence relation information corresponding to the stored information according to the associated path of the stored information to form a relationship map, the stored information in the database being the enterprise names or person names in the first enterprise incidence relation, the second enterprise incidence relation and the third enterprise incidence relation.
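Storing associated paths and answering a user query, as in claim 4, can be approximated with an in-memory index. The Python sketch below is only an illustration: a dictionary stands in for the database, relation information is modelled as `(first_entity, relation_word, second_entity)` triples, and the names `build_index` and `query` are hypothetical.

```python
from collections import defaultdict


def build_index(relations):
    """Index relation triples by each enterprise or person name they mention."""
    index = defaultdict(set)
    for first, word, second in relations:
        triple = (first, word, second)
        index[first].add(triple)   # sets also drop duplicate triples
        index[second].add(triple)
    return index


def query(index, name):
    """Return all stored relations that involve the requested name."""
    return sorted(index.get(name, ()))


relations = [
    ("AlphaCo", "subsidiary", "BetaCo"),
    ("AlphaCo", "chairman", "Zhang"),
    ("AlphaCo", "subsidiary", "BetaCo"),  # duplicate from a later pass
]
index = build_index(relations)
```

A query for a name that is not stored returns an empty list, which corresponds to the case where the request information matches nothing in the database.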
5. An enterprise incidence relation information mining device, characterized in that the device comprises:
Obtaining module, configured to obtain a text to be detected;
Sentence splitting module, configured to split the text to be detected to obtain at least one subordinate sentence;
Part-of-speech tagging module, configured to perform word segmentation and part-of-speech tagging on each subordinate sentence;
First identification module, configured to identify the incidence relation word in each subordinate sentence;
First determining module, configured to judge whether the incidence relation word is an organizational incidence relation word and, if it is, to determine first enterprise incidence relation information with a Cartesian product algorithm according to the parts of speech of the participles in the subordinate sentence containing the incidence relation word;
When the type of the incidence relation word is an organizational incidence relation word, the first determining module comprises:
Extraction unit, configured to extract, from the subordinate sentence containing the incidence relation word, the noun participles that are entity organization names and the participles that are person names;
First judging unit, configured to generate, if the number of entity organization name participles and the number of person name participles are both one, enterprise incidence relation information between the entity organization name participle and the person name participle;
Second judging unit, configured to generate, if the number of entity organization name participles and/or the number of person name participles is at least two, a first set and a second set, each of which is the set composed of all the entity organization name participles and all the person name participles;
Cartesian product unit, configured to take the Cartesian product of the first set and the second set to obtain multiple subsets;
Screening unit, configured to screen the multiple subsets according to a preset screening rule to obtain a target set;
First determination unit, configured to determine the first enterprise incidence relation information according to the target subsets and the incidence relation word;
The device further comprises:
First judgment module, configured to judge whether the text to be detected contains ambiguous incidence relation words that have identical content but different parts of speech and, if so, to add a part-of-speech label before or after the position of each ambiguous incidence relation word;
Second identification module, configured to identify the target incidence relation word according to the part-of-speech label;
Subordinate sentence extraction module, configured to extract the subordinate sentence containing the target incidence relation word and to remove the part-of-speech label;
Second determining module, configured to determine, for each subordinate sentence containing the target incidence relation word, second enterprise incidence relation information according to the part of speech of the target incidence relation word and the position of the target incidence relation word in the subordinate sentence;
Second judgment module, configured to judge whether the second enterprise incidence relation information is identical to the first enterprise incidence relation information and, if so, to discard the second enterprise incidence relation information that is identical to the first enterprise incidence relation information.
6. The device according to claim 5, characterized in that the screening unit comprises:
First judgment sub-unit, configured to judge whether the participles in each subset are identical and, if the participles in a subset are identical, to discard that subset;
Second judgment sub-unit, configured to judge, among all subsets composed of an entity organization name participle and a person name participle, whether identical subsets exist and, if so, to discard the subset in which the entity organization name participle comes after the person name participle;
Target set determining sub-unit, configured to discard, among the remaining subsets composed only of entity organization name participles or only of person name participles, the subsets in reverse order according to the positions of the entity organization name participles or person name participles in the subordinate sentence, to obtain the target set.
CN201810735344.5A 2018-07-06 2018-07-06 A kind of enterprise's incidence relation information mining method and device Active CN108959575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810735344.5A CN108959575B (en) 2018-07-06 2018-07-06 A kind of enterprise's incidence relation information mining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810735344.5A CN108959575B (en) 2018-07-06 2018-07-06 A kind of enterprise's incidence relation information mining method and device

Publications (2)

Publication Number Publication Date
CN108959575A CN108959575A (en) 2018-12-07
CN108959575B true CN108959575B (en) 2019-09-24

Family

ID=64486052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810735344.5A Active CN108959575B (en) 2018-07-06 2018-07-06 A kind of enterprise's incidence relation information mining method and device

Country Status (1)

Country Link
CN (1) CN108959575B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740157B (en) * 2018-12-29 2023-08-18 贵州小爱机器人科技有限公司 Method and device for determining label of working individual and computer storage medium
CN109902295A (en) * 2019-02-01 2019-06-18 杭州晶一智能科技有限公司 A kind of foreign language word library self-training method based on the network information
CN110597870A (en) * 2019-08-05 2019-12-20 长春市万易科技有限公司 Enterprise relation mining method
CN110825817B (en) * 2019-09-18 2023-11-10 上海合合信息科技股份有限公司 Enterprise suspected association judgment method and system
CN110704578B (en) * 2019-10-09 2022-08-09 北京秒针人工智能科技有限公司 Incidence relation determining method and device, electronic equipment and readable storage medium
CN110851519A (en) * 2019-11-18 2020-02-28 上海新炬网络信息技术股份有限公司 Method for processing data through ETL tool based on NLP natural language
CN112836919A (en) * 2020-11-30 2021-05-25 广东电网有限责任公司 Supplier association analysis method and device based on knowledge graph
CN115579009B (en) * 2022-12-06 2023-04-07 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639826A (en) * 2009-09-01 2010-02-03 西北大学 Text hidden method based on Chinese sentence pattern template transformation
CN103309852A (en) * 2013-06-14 2013-09-18 瑞达信息安全产业股份有限公司 Method for discovering compound words in specific field based on statistics and rules
CN103412855A (en) * 2013-06-27 2013-11-27 华中师范大学 Method and system for automatic identification of relative words in complex sentence of modern Chinese language
CN103593338A (en) * 2013-11-15 2014-02-19 北京锐安科技有限公司 Information processing method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708922A (en) * 2016-10-21 2017-05-24 天津海量信息技术股份有限公司 Character relation atlas analysis method based on mass data
CN107392436A (en) * 2017-06-27 2017-11-24 北京神州泰岳软件股份有限公司 A kind of method and apparatus for extracting enterprise's incidence relation information
CN107392433B (en) * 2017-06-27 2018-09-04 北京神州泰岳软件股份有限公司 A kind of method and apparatus of extraction enterprise incidence relation information
CN107247707B (en) * 2017-06-27 2020-08-04 鼎富智能科技有限公司 Enterprise association relation information extraction method and device based on completion strategy
CN107368470A (en) * 2017-06-27 2017-11-21 北京神州泰岳软件股份有限公司 A kind of method and apparatus for extracting enterprises organizational structure information

Also Published As

Publication number Publication date
CN108959575A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108959575B (en) A kind of enterprise's incidence relation information mining method and device
CN111522994B (en) Method and device for generating information
CN105095195B (en) Nan-machine interrogation's method and system of knowledge based collection of illustrative plates
CN102866990B (en) A kind of theme dialogue method and device
US6618725B1 (en) Method and system for detecting frequent association patterns
CN103631882B (en) Semantization service generation system and method based on graph mining technique
CN103186524B (en) A kind of place name identification method and apparatus
CN109635118A (en) A kind of user's searching and matching method based on big data
CN105550171B (en) A kind of the Query Information error correction method and system of vertical search engine
CN110020433B (en) Industrial and commercial high-management name disambiguation method based on enterprise incidence relation
CN108460014A (en) Recognition methods, device, computer equipment and the storage medium of business entity
CN108268580A (en) The answering method and device of knowledge based collection of illustrative plates
CN106126521A (en) The social account method for digging of destination object and server
CN107992481A (en) A kind of matching regular expressions method, apparatus and system based on multiway tree
CN107247707A (en) Enterprise's incidence relation information extracting method and device based on completion strategy
CN107392433B (en) A kind of method and apparatus of extraction enterprise incidence relation information
CN112989055B (en) Text recognition method and device, computer equipment and storage medium
CN103678318B (en) Multi-word unit extraction method and equipment and artificial neural network training method and equipment
CN107943514A (en) The method for digging and system of core code element in a kind of software document
CN102195899A (en) Method and system for information mining of communication network
CN107515849A (en) It is a kind of into word judgment model generating method, new word discovery method and device
CN110442730A (en) A kind of knowledge mapping construction method based on deepdive
CN107526721A (en) A kind of disambiguation method and device to electric business product review vocabulary
CN102063497B (en) Open type knowledge sharing platform and entry processing method thereof
CN108628821A (en) A kind of vocabulary mining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181207

Assignee: Zhongke Dingfu (Beijing) Science and Technology Development Co., Ltd.

Assignor: Beijing Shenzhou Taiyue Software Co., Ltd.

Contract record no.: X2019990000215

Denomination of invention: A method and a device for mining enterprise association relation information

Granted publication date: 20190924

License type: Exclusive License

Record date: 20191127

TR01 Transfer of patent right

Effective date of registration: 20200629

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Patentee after: Dingfu Intelligent Technology Co., Ltd

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building block A Room 601

Patentee before: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.