CN109033132A

CN109033132A - Method and device for calculating the relevance between a text and an entity using a knowledge graph

Info

Publication number: CN109033132A
Application number: CN201810567101.5A
Authority: CN
Inventors: 孙雨轩; 吴成龙; 周劼人
Original assignee: China Securities Credit Reporting (shenzhen) Co Ltd
Current assignee: China Securities Credit Reporting (shenzhen) Co Ltd
Priority date: 2018-06-05
Filing date: 2018-06-05
Publication date: 2018-12-18
Anticipated expiration: 2038-06-05
Also published as: CN109033132B

Abstract

The invention discloses a method and device for calculating the relevance between a text and an entity using a knowledge graph. The method comprises: obtaining a text; performing word segmentation on the text, extracting the set of keywords that occur in the text, retrieving the enterprise entities associated with the keywords through a pre-established knowledge graph, and taking those enterprise entities as the candidate enterprise set, wherein the knowledge graph includes target node information, associated node information, and the relationship and relevance weight between the target node information and the associated node information, the target node information includes first enterprise entity information, and the associated node information includes second entity information, product information, or natural person information associated with the first enterprise entity information; and calculating the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set.

Description

Method and device for calculating the relevance between a text and an entity using a knowledge graph

Technical field

The present invention relates to a method and device for calculating the relevance between a text and an entity using a knowledge graph.

Background art

In the information age, acquiring, processing, and analyzing massive amounts of data is a major difficulty. In some industries (such as the financial industry), people pay close attention to information on every dimension of an enterprise to support decisions such as management and investment. On the one hand, market participants need broader and more complete data; on the other hand, they need that data to be processed in a timely manner. Enterprise public opinion information is one dimension that market participants focus on. As unstructured text, public opinion data is scattered, large in volume, complex in format, and highly time-sensitive. Using technical means such as natural language processing to process this kind of data efficiently and extract valuable information is therefore a need shared by many financial practitioners. Faced with a large and varied body of public opinion information, associating it with the enterprises of interest, and screening out information that is of little value or unrelated to the entity, is an essential step in data analysis and mining.

A common way to associate text information with an enterprise entity is to build a keyword library for the enterprise entity, including the enterprise's registered business name, its abbreviation, its stock code, and so on, and then, on that basis, to perform keyword matching retrieval in the text information library, treating each matched text as information related to the enterprise entity. This approach has several drawbacks. First, a fairly complete enterprise keyword library must be built in advance as the basis for retrieval. Second, ranking the matched results by degree of association works poorly: a keyword often appears in a text that is not actually about the enterprise, so a good deal of redundant information remains. Third, because association relies on direct keyword matching, important information about the enterprise's key affiliated enterprises can be missed, causing information loss.

Summary of the invention

In view of the above shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a method and device for calculating the relevance between a text and an entity using a knowledge graph, which improves on the traditional approach of relying on keyword matching alone when analyzing large volumes of text. By incorporating a knowledge graph, the degree of association between a target entity and text information can be quantified, the dimensions along which text information is related to the target entity are enriched, and a basis is provided for further analysis.

To solve the above technical problem, one technical solution adopted by the present invention is to provide a method for calculating the relevance between a text and an enterprise entity using a knowledge graph, comprising the following steps:

Obtaining a text;

Performing word segmentation on the text, extracting the set of keywords that occur in the text, retrieving the enterprise entities associated with the keywords through a pre-established knowledge graph, and taking those enterprise entities as the candidate enterprise set, wherein the knowledge graph includes target node information, associated node information, and the relationship and relevance weight between the target node information and the associated node information, the target node information includes first enterprise entity information, and the associated node information includes second entity information, product information, or natural person information associated with the first enterprise entity information;

Calculating the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set.

Further, the step of performing word segmentation on the text, extracting the set of keywords that occur in the text, retrieving the enterprise entities associated with the keywords through the pre-established knowledge graph, and taking those enterprise entities as the candidate enterprise set comprises:

Performing word segmentation on the text and collecting all keywords into a keyword set, denoted K; looking up the keywords of K in the knowledge graph to obtain the enterprise entities associated with K; and taking those enterprise entities as the candidate enterprise set, denoted C.
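The retrieval of K and C described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the text is assumed to be already segmented into tokens (the source does not name a segmentation tool), the knowledge graph is reduced to a hypothetical keyword-to-entity lookup table, and all names and sample data (`KEYWORD_TO_ENTITIES`, `extract_keywords`, `candidate_entities`, the "Acme" entries) are invented for the example.

```python
from collections import Counter

# Hypothetical toy graph index: keyword node -> enterprise entities linked to it.
KEYWORD_TO_ENTITIES = {
    "Acme": {"Acme Holdings Co., Ltd."},
    "Jane Doe": {"Acme Holdings Co., Ltd."},
    "WidgetPhone": {"Acme Holdings Co., Ltd.", "Widget Supply Co., Ltd."},
}


def extract_keywords(tokens, graph_index):
    """Keep only tokens that exist as nodes in the graph; count their word frequency."""
    return Counter(t for t in tokens if t in graph_index)


def candidate_entities(keywords, graph_index):
    """Union of all enterprise entities linked to any keyword (the set C)."""
    candidates = set()
    for keyword in keywords:
        candidates |= graph_index[keyword]
    return candidates


# Assumed output of a word segmentation step.
tokens = ["Acme", "released", "WidgetPhone", "today", "Acme"]
K = extract_keywords(tokens, KEYWORD_TO_ENTITIES)
C = candidate_entities(K, KEYWORD_TO_ENTITIES)
```

Counting frequencies at extraction time keeps the keyword set K and the word frequencies used in the later steps in one structure.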

Further, the step of calculating the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set comprises:

Let F be the word-frequency matrix of the keyword set K, where f_i denotes the word frequency of the i-th keyword;

Let R be the association matrix between the set C and its keyword set K, whose entry is 1 if the corresponding knowledge graph nodes are connected and 0 if they are not;

The aggregated word-frequency vector is formed from the set C and its related keywords, where its i-th component denotes the sum of the word frequencies of all keywords in the text that are related to the i-th candidate enterprise entity;

Define the relevance factor RX, which measures the relative order of association among the candidate enterprise entities within the same text;

Wherein,

Define the relevance factor RY, which measures the order of association of a candidate enterprise entity across different texts, where β > 0 is a scale adjustment parameter and scale > 0 is the number of tokens that remain after the full segmentation result of the text has been cleaned, used to measure text length;

where 0 ≤ ry_i ≤ 1;

The association matrix R^KC between the text and the candidate enterprise entity set C is obtained,

where ⊙ denotes element-wise matrix multiplication, and the i-th entry of R^KC denotes the degree of association between this text and the i-th candidate enterprise entity.
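The aggregation step described above (a 0/1 association matrix R applied to the word-frequency vector F) can be illustrated as follows. This sketch covers only the aggregated word frequency, because the defining formulas for RX and RY are not available in this text; the function name `aggregate_word_freq` and the sample numbers are assumptions made for the example.

```python
def aggregate_word_freq(R, F):
    """Sum the word frequencies of the keywords connected to each candidate.

    R[i][j] is 1 if candidate i is connected to keyword j in the graph, else 0.
    F[j] is the word frequency of keyword j in the text.
    """
    return [sum(r_ij * f_j for r_ij, f_j in zip(row, F)) for row in R]


# Illustrative data: three keywords, two candidate enterprise entities.
F = [3, 1, 2]
R = [
    [1, 1, 0],  # candidate 0 is connected to keywords 0 and 1
    [0, 1, 1],  # candidate 1 is connected to keywords 1 and 2
]
agg = aggregate_word_freq(R, F)  # [4, 3]
```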

Further, the step of calculating the degree of association between the text and each candidate enterprise entity further comprises:

Calculating the degree of association between the text and each candidate enterprise entity according to both the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set and the relationship weight.

Further, the step of calculating the degree of association according to the word frequency of the associated keywords and the relationship weight comprises:

First, the word-frequency vector F of the keyword set K is computed, where f_i denotes the word frequency of the i-th keyword;

Let R be the correlation-coefficient matrix between the candidate enterprise set C and its keyword set K, where r_ij denotes the correlation coefficient between the i-th candidate enterprise entity and the j-th keyword;

The correlation-coefficient-weighted word-frequency matrix is then formed, where its i-th entry denotes the sum of the weighted keyword word frequencies of the i-th candidate enterprise entity;

Wherein,

where 0 ≤ ry_i ≤ 1;

The association matrix R^KC between the text and the candidate enterprise entity set C is obtained.
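When relationship weights are used, each keyword's word frequency is scaled by its correlation coefficient r_ij before being summed per candidate. The sketch below shows only that weighted sum, using a sparse dictionary of graph edges; the function name `weighted_word_freq`, the entity and keyword labels, and the coefficient values are hypothetical.

```python
def weighted_word_freq(coeff, freq):
    """Per-candidate sum of r_ij * f_j.

    coeff maps (entity, keyword) -> correlation coefficient taken from the graph edge.
    freq maps keyword -> word frequency in the text.
    """
    totals = {}
    for (entity, keyword), r in coeff.items():
        totals[entity] = totals.get(entity, 0.0) + r * freq.get(keyword, 0)
    return totals


# Illustrative edges and frequencies (assumed values).
coeff = {
    ("Entity A", "kw1"): 1.0,
    ("Entity A", "kw2"): 0.5,
    ("Entity B", "kw2"): 0.65,
    ("Entity B", "kw3"): 1.0,
}
freq = {"kw1": 3, "kw2": 1, "kw3": 2}
weighted = weighted_word_freq(coeff, freq)
```

Storing only the edges that exist avoids building a dense matrix when most entity and keyword pairs are unconnected.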

Further, before the step of performing word segmentation on the text, the method further comprises:

Performing paragraph division preprocessing on the text, and assigning corresponding weights to the paragraph positions;

The step of calculating the degree of association between the text and each candidate enterprise entity further comprises:

Calculating the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set, the paragraph position, the relationship weight, and the text length.

Further, paragraph division preprocessing is performed on the text by the following formula:

where ⌈x⌉ denotes the smallest integer not less than x; P is the number of paragraphs in the body of the text, P ≥ 1; H is the number of parts into which the body is split, H ≥ 1, the parts being denoted part₁, …, part_H and the title being denoted part₀; the number of paragraphs in each part is denoted L = (l₀, l₁, …, l_H); and two ratio parameters give the maximum proportions of the total paragraph count P that the first part and the H-th part, respectively, may occupy.

Further, the step of calculating the degree of association between the text and each candidate enterprise entity according to the word frequency of the associated keywords, the paragraph position, the relationship weight, and the text length comprises the following sub-steps:

Let W be the weight matrix of keywords by paragraph position,

where w_i denotes the weight a keyword receives for appearing in the i-th part, and w₀ denotes the weight a keyword receives for appearing in the title;

Let R be the correlation-coefficient matrix between the enterprise entity set C and its keyword set K;

Let F be the word-frequency matrix of the keywords K across the different paragraph positions, where f_ij denotes the word frequency of the i-th keyword in part_j;

The correlation-coefficient-weighted word-frequency matrix is then formed, where its (i, j) entry denotes the sum of the weighted word frequencies of the i-th candidate enterprise entity in part_j;

Wherein,

where 0 ≤ ry_i ≤ 1.
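The per-part weighted word frequency described above can be sketched as a small matrix computation. The first function follows the text (sum over keywords of r_ki · f_ij for each part j). The second function, which collapses the parts using the position weights W, is an assumption about how W enters the calculation, since the corresponding formula is not available in this text. All names and numbers are illustrative.

```python
def per_part_weighted_freq(R, F):
    """Entry [k][j] = sum over keywords i of R[k][i] * F[i][j].

    R[k][i]: correlation coefficient between candidate k and keyword i.
    F[i][j]: word frequency of keyword i in part j (part 0 is the title).
    """
    n_parts = len(F[0])
    return [
        [sum(R[k][i] * F[i][j] for i in range(len(F))) for j in range(n_parts)]
        for k in range(len(R))
    ]


def position_weighted_total(RF, W):
    """Assumed combination: weight each part's value by W[j] and sum over parts."""
    return [sum(w * v for w, v in zip(W, row)) for row in RF]


W = [0.35, 0.25, 0.15, 0.25]          # title, first part, middle part, last part
F = [[1, 2, 0, 1],                    # keyword 0 frequencies per part
     [0, 1, 1, 0]]                    # keyword 1 frequencies per part
R = [[1.0, 0.5],                      # candidate 0
     [0.0, 1.0]]                      # candidate 1
RF = per_part_weighted_freq(R, F)
totals = position_weighted_total(RF, W)
```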

To solve the above technical problem, another technical solution adopted by the present invention is to provide a device for calculating the relevance between a text and an enterprise entity using a knowledge graph, comprising:

A text acquisition module, configured to obtain a text;

A word segmentation module, configured to perform word segmentation on the text, extract the set of keywords that occur in the text, retrieve the enterprise entities associated with the keywords through a pre-established knowledge graph, and take those enterprise entities as the candidate enterprise set, wherein the knowledge graph includes a number of pieces of node information and the relationship and relevance weight between each piece of node information and its corresponding node information; among these, some node information is enterprise entity information, and the remaining node information is product information or natural person information corresponding to the respective enterprise entity;

A degree-of-association calculation module, configured to calculate the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set.

Further, the degree-of-association calculation module is also configured to calculate the degree of association between the text and each candidate enterprise entity according to both the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set and the relationship weight.

The present invention constructs a knowledge graph for the financial field and uses it as the relationship network for matching candidate keywords. With the enterprise as the target entity, the graph covers relationships such as its registered full name, abbreviation, products, senior executives, shareholders, and investments. Different weights are assigned according to the paragraph position at which a keyword appears, so that the differing importance of the paragraphs of a text is taken into account. Using the complex relationship network built with knowledge graph technology, the degree of association is calculated for every possible keyword and finally weighted and quantified, which improves the success rate and accuracy of associating texts with target entities.

Brief description of the drawings

Fig. 1 is a flowchart of a first embodiment of the method of the present invention for calculating the relevance between a text and an enterprise entity using a knowledge graph.

Fig. 2 is a schematic structural diagram of the knowledge graph of the present invention.

Fig. 3 is a flowchart of a second embodiment of the method of the present invention for calculating the relevance between a text and an enterprise entity using a knowledge graph.

Fig. 4 is a schematic diagram of the sample article in the specific example.

Fig. 5 is a schematic diagram of the knowledge graph related to the sample article in the specific example.

Fig. 6 is a block diagram of an embodiment of the device of the present invention for calculating the relevance between a text and an enterprise entity using a knowledge graph.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Referring to Fig. 1, the method of the present invention for calculating the relevance between a text and an enterprise entity using a knowledge graph comprises the following steps:

S101: obtain a text;

The text may be a public opinion text (that is, public opinion information).

S102: perform word segmentation on the text, extract the set of keywords that occur in the text, retrieve the enterprise entities associated with the keywords through a pre-established knowledge graph, and take those enterprise entities as the candidate enterprise set, wherein the knowledge graph includes target node information, associated node information, and the relationship and relevance weight between the target node information and the associated node information, the target node information includes first enterprise entity information, and the associated node information includes second entity information, product information, or natural person information associated with the first enterprise entity information;

The knowledge graph is established as follows: target node information and associated node information are extracted from a database (such as a corpus), and a corresponding relevance weight is assigned according to the relationship between the target node information and the associated node information, thereby forming the knowledge graph (see Fig. 2). The target node information is first enterprise entity information (for example, an enterprise name such as XX Co., Ltd.). The node information associated with the target node information may be second enterprise entity information associated with the first enterprise entity information, natural person information associated with it (for example, senior executives or shareholders of the first enterprise entity), or a product associated with it (for example, a product developed or marketed by the first enterprise entity). In the knowledge graph, either the first or the second enterprise entity information can serve as target node information: when the second enterprise entity A in Fig. 2 becomes the target node, the original first enterprise entity becomes an associated node of the second enterprise entity A, and only the relationship between them changes accordingly. The knowledge graph also records the relationship and relevance weight between each target node and its associated nodes. Relationships between the first and second enterprise entities include, but are not limited to, investment, supply and demand, and guarantee relationships; relationships between a natural person and the first enterprise entity include employment relationships (such as shareholder, senior executive, or employee). For example, the second enterprise entity A is a supplier of the first enterprise entity, with a relevance weight of 0.65; product A is a product of the first enterprise entity, with a relevance weight of 0.5; and natural person B is a shareholder of the first enterprise entity, with a relevance weight of 1. In the knowledge graph, relevance weights are assigned according to the attributes of each relationship: for example, the larger the investment ratio, the greater the relevance, and the more important the position held, the greater the relevance. The specific construction method is not described in detail here. The constructed knowledge graph can be stored in a graph database for retrieval and querying.
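As a rough illustration of the node and edge structure described above, the three example relationships can be held in a small in-memory edge list. This is only a sketch: the source states that a graph database may be used but does not specify one, and the `Edge` class and `neighbors` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str      # associated node (enterprise, product, or natural person)
    target: str      # target node (enterprise entity)
    relation: str    # relationship type
    weight: float    # relevance weight


# The three example relationships and weights given in the text.
edges = [
    Edge("Second enterprise entity A", "First enterprise entity", "supplier", 0.65),
    Edge("Product A", "First enterprise entity", "product", 0.5),
    Edge("Natural person B", "First enterprise entity", "shareholder", 1.0),
]


def neighbors(target, edge_list):
    """Return {associated node: (relation, weight)} for one target node."""
    return {e.source: (e.relation, e.weight) for e in edge_list if e.target == target}


linked = neighbors("First enterprise entity", edges)
```

Making either enterprise the target node, as the text describes, amounts to querying the same edge list from the other endpoint.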

In step S102, word segmentation yields all keywords, which form the keyword set, denoted K. The keywords of K are looked up in the knowledge graph to obtain the enterprise entities associated with K, and those enterprise entities are taken as the candidate enterprise set, denoted C.

S103: calculate the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set. The degree of association is calculated from word frequency as follows:

Let F be the word-frequency matrix of the keyword set K, where f_i denotes the word frequency of the i-th keyword;

The aggregated word-frequency vector is formed from the set C and its related keywords;

Wherein,

where 0 ≤ ry_i ≤ 1;

where ⊙ denotes element-wise matrix multiplication, and the i-th entry of the resulting matrix denotes the degree of association between this text and the i-th candidate enterprise entity. Based on this degree of association, a threshold can be set to screen for the enterprise entities most closely related to the text; likewise, different texts related to the i-th entity can be screened and ranked.
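The threshold screening and ranking mentioned above can be sketched as follows. The two scores are the degrees of association reported in the worked example of this document; the entity labels, the function name `screen_entities`, and the threshold values are assumptions chosen for illustration.

```python
def screen_entities(relevance, threshold):
    """Keep entities whose degree of association meets the threshold, highest first."""
    kept = [(entity, score) for entity, score in relevance.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Scores from the worked example; the labels are placeholders.
relevance = {"Entity B": 0.122, "Entity A": 0.526}

strict = screen_entities(relevance, threshold=0.3)
loose = screen_entities(relevance, threshold=0.1)
```

The same function can rank several texts for one entity if the dictionary keys are text identifiers instead of entity names.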

As a preferred or optional alternative, the degree of association between the text and each candidate enterprise entity may also be calculated from the word frequencies together with the correlation coefficients between the keywords and the candidate enterprise entities, as follows:

First, the word-frequency vector F of the keyword set K is computed, where f_i denotes the word frequency of the i-th keyword;

The correlation-coefficient-weighted word-frequency matrix is then formed;

Wherein,

Define the relevance factor RY, which measures the order of association of a candidate enterprise entity across different texts, where β > 0 is a scale adjustment parameter and scale > 0 is the number of tokens that remain after the full segmentation result of the text has been cleaned, used to measure text length;

where 0 ≤ ry_i ≤ 1.

It should be understood that the relationship weight is used in order to calculate the degree of association between the keywords and the candidate enterprise entities better and more accurately; in some embodiments, the relationship-weight feature is not required.

In this embodiment, on the basis of the pre-established knowledge graph, the keywords in the text are extracted and each keyword is looked up in the knowledge graph to obtain the corresponding enterprise entities. These enterprise entities are taken as candidates to form the candidate enterprise entity set. The degree of association between the text and each candidate enterprise entity is then obtained from the word frequency of the keywords in the text and the relationship weights between the keywords and the candidate enterprise entities. This improves the success rate and accuracy of associating texts with enterprise entities (referred to as target enterprise entities), enriches the dimensions along which text information is related to the target enterprise entity, and provides a more accurate basis for further analysis.

Referring to Fig. 3, which is a flowchart of the second embodiment of the method of the present invention for calculating the relevance between a text and an enterprise entity using a knowledge graph, the method of this embodiment comprises the following steps:

S201: obtain a text;

S202: perform paragraph division preprocessing on the text;

In this step, paragraph division preprocessing is performed on the text as follows:

Assume that a public opinion text consists of two main components, a title and a body, and that the body has P ≥ 1 paragraphs. The body is split into H ≥ 1 parts, denoted part₁, …, part_H, with part₀ denoting the title; the number of paragraphs in each part is denoted L = (l₀, l₁, …, l_H). Because different paragraphs of the body differ in importance, the lengths of the head and tail parts are limited when the body is split: two ratio parameters set the maximum proportions of the total paragraph count P that part 1 and part H, respectively, may occupy, and in this embodiment these two ratios may be set to specific values. The number of paragraphs in each part is calculated by the following formula:

where ⌈x⌉ denotes the smallest integer not less than x; P is the number of paragraphs in the body of the text, P ≥ 1; H is the number of parts into which the body is split, H ≥ 1, the parts being denoted part₁, …, part_H and the title being denoted part₀; the number of paragraphs in each part is denoted L = (l₀, l₁, …, l_H); and the two ratio parameters give the maximum proportions of the total paragraph count P that the first part and the H-th part, respectively, may occupy.

In this step, after the paragraph division preprocessing, corresponding weights are also assigned to the paragraph positions. Generally, the title and the leading and trailing parts of the body are given higher weights, and the middle of the body is given a relatively lower weight. For example, the weight w₀ of the title is 0.35, the weight w₁ of the first part is 0.25, the weight w_H of the last part is 0.25, and the weights w₂ to w_{H-1} of the middle parts are 0.15.
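The example weighting above can be turned into a small helper that builds the weight vector W for a given number of parts H. The default values are the ones quoted in the text; the function name `position_weights` and the handling of H = 1 (where the single part is treated as the first part) are assumptions of this sketch.

```python
def position_weights(H, title=0.35, first=0.25, last=0.25, middle=0.15):
    """Build W = (w0, w1, ..., wH) for a body split into H parts.

    w0 is the title weight, w1 the first part, wH the last part,
    and every part in between receives the middle weight.
    """
    if H < 1:
        raise ValueError("H must be at least 1")
    if H == 1:
        # Assumption: with a single body part, treat it as the first part.
        return [title, first]
    return [title, first] + [middle] * (H - 2) + [last]


W = position_weights(3)  # [0.35, 0.25, 0.15, 0.25]
```

With H = 3 this reproduces the weight vector W = (0.35, 0.25, 0.15, 0.25) used in the worked example later in the document.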

S203: perform word segmentation on the text, extract the set of keywords that occur in the text, retrieve the enterprise entities associated with the keywords through a pre-established knowledge graph, and take those enterprise entities as the candidate enterprise set, wherein the knowledge graph includes target node information, associated node information, and the relationship and relevance weight between the target node information and the associated node information, the target node information includes first enterprise entity information, and the associated node information includes second entity information, product information, or natural person information associated with the first enterprise entity information;

In this step, word segmentation is performed on the divided text obtained in step S202. With reference to the knowledge graph, all candidate words in the text that can be found in the knowledge graph are marked as keywords, and all keywords form the keyword set, denoted K. The keywords of K are looked up in the knowledge graph to obtain the enterprise entities associated with K, and those enterprise entities are taken as the candidate enterprise set, denoted C.

S204: calculate the degree of association between the text and each candidate enterprise entity according to the word frequency of the keywords associated with that candidate enterprise entity in the candidate enterprise set, the paragraph position, the relationship weight, and the text length, where the text length is determined by the number of words obtained in the word segmentation step.

In this step, the degree of association between the text and each candidate enterprise entity is calculated as follows:

Let W be the weight matrix of keywords by paragraph position;

Let R be the correlation-coefficient matrix between the set C and its keyword set K;

f_ij denotes the word frequency of the i-th keyword in part_j;

The correlation-coefficient-weighted word-frequency matrix is then formed;

Wherein,

where 0 ≤ ry_i ≤ 1.

In this embodiment, paragraph division preprocessing is performed on the text and corresponding weights are assigned to its paragraphs. After word segmentation, the weight matrix of the keywords is determined by the paragraph positions at which they appear; combined with the correlation-coefficient-weighted word-frequency matrix, this yields the relevance factors and then the association matrix between the text and the candidate enterprise entity set C, so that the degree of association between the whole text and each enterprise entity in C is obtained more accurately.

The method for calculating the relevance between a text and an enterprise entity using a knowledge graph is explained in detail below by way of a specific example:

Referring to Fig. 4 and Fig. 5, Fig. 4 is the sample article of this example, and Fig. 5 is the knowledge graph corresponding to the sample article. Because space is limited, only the part of the knowledge graph centered on "LeTV Information Technology (Beijing) Co., Ltd." is shown.

Step 1: preprocess the sample article. The body of the sample article has four paragraphs in total, so P = 4; take H = 3.

The paragraphs and weights obtained from the formula are as follows:

Table 1: W = (0.35, 0.25, 0.15, 0.25)

Step 2: extract the keywords in the text and retrieve the candidate entity set.

(1) The keyword set found in the title and body:

K = {LeEco, Sun Hongbin, circle of friends, LeTV, New LeEco Zhijia, Tencent, Tencent Video, LeEco TV, Lechuang Entertainment}

(2) Retrieval in the knowledge graph gives the set of enterprises directly associated with K:

C = {LeTV Information Technology (Beijing) Co., Ltd., Shenzhen Tencent Computer System Co., Ltd.}

Step 3: calculate the degree of association between the public opinion text and the candidate target entities.

Combining the correlation coefficients in the knowledge graph (the numbers on the edges), the correlation-coefficient matrix R between the entity set C and its keyword set K is obtained:

Table 2

The word-frequency matrix F is as follows:

The correlation-coefficient-weighted word-frequency matrix is then obtained as follows:

After the full set of segmented tokens of the text is cleaned, the token count is 148, so scale = 148; take β = 100.

The association matrix R^KC between the text and the entity set C is obtained as follows:

Therefore, the degree of association between the sample article and "LeTV Information Technology (Beijing) Co., Ltd." is 0.526, and its degree of association with "Shenzhen Tencent Computer System Co., Ltd." is 0.122. (The coefficients in this specific example are assumed for illustration.)

Fig. 6 is referred to, the invention also discloses a kind of dresses that text and the enterprise dominant degree of correlation are calculated using knowledge mapping It sets, comprising:

Text obtains module, for obtaining text；

Calculation of relationship degree module, for according to the candidate associated key of enterprise dominant in the enterprise of candidate set The degree of association of the enterprise dominant of word frequency, relationship weight calculation text and the candidate that word occurs.

Further include that paragraph divides preprocessing module as optional, divide pretreatment for carrying out paragraph to the text, It is also used to assign corresponding weight to text fragment；

The calculation of relationship degree module is also used to according to the candidate enterprise dominant association in the enterprise of candidate set The word frequency that occurs of keyword, paragraph position, relationship weight, text length calculate text and the candidate enterprise dominant pass Connection degree.

As optional, the paragraph divides preprocessing module and by following formula carries out paragraph and divide to pre-process:

Wherein,Integer of the expression not less than x, paragragh of the P for text, P >=1, the H are split for text The part divided, is denoted as part respectively₁,…,part_H, title is designated as part₀, the paragraph quantity of H >=1, every part is denoted as L =(l₀,l₁,…,l_H),Indicate that first part accounts for the maximum ratio of total number of segment P, Indicate that the part H accounts for The maximum ratio of total number of segment P,

As optional, the word segmentation module is also used to carry out at participle the segmentation text divided by paragraph Reason, obtains all keywords to form keyword set, the keyword set is denoted as K, searches in the knowledge mapping Keyword in the keyword set K obtains enterprise dominant associated with the keyword set K, will described and pass The associated enterprise dominant of keyword is gathered as candidate enterprise, and enterprise's set of the candidate is denoted as C.

In the embodiments of the present invention, for the functions of each module of the device for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph, reference may be made to the description of the method above, which is not repeated here.

The above are only embodiments of the present invention and do not limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims

1. A method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph, comprising the following steps:

obtaining a text;

performing word segmentation on the text, extracting the set of keywords occurring in the text, retrieving, through a pre-established knowledge graph, the enterprise entities associated with the keywords, and taking the enterprise entities associated with the keywords as a candidate enterprise set, wherein the knowledge graph includes target node information, associated node information, and the relationships and relevance weights between the target node information and the associated node information, the target node information includes first enterprise entity information, and the associated node information includes second entity information, product information, or natural person information associated with the first enterprise entity information;

calculating the degree of association between the text and each candidate enterprise entity according to the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set.

2. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 1, characterized in that the step of performing word segmentation on the text, extracting the set of keywords occurring in the text, retrieving, through the pre-established knowledge graph, the enterprise entities associated with the keywords, and taking the enterprise entities associated with the keywords as a candidate enterprise set comprises:

performing word segmentation on the text and obtaining all keywords so as to form a keyword set, denoted K; searching the knowledge graph for the keywords in the keyword set K to obtain the enterprise entities associated with the keyword set K; and taking the enterprise entities associated with the keywords as a candidate enterprise set, denoted C.

3. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 2, characterized in that the step of calculating the degree of association between the text and each candidate enterprise entity according to the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set comprises:

Let F be the word frequency matrix of the keyword set K:

f_i denotes the word frequency of the i-th keyword;

Let R be the correlation matrix between the set C and its keyword set K, an entry being 1 where the corresponding knowledge graph nodes are connected and 0 where they are not:

The aggregated word frequency vector based on the set C and its related keywords is:

where the i-th component denotes the sum of the word frequencies of all keywords in the text that are related to the i-th candidate enterprise entity;

where u = (1, …, 1),

where 0 ≤ rx_i ≤ 1,

A correlation factor RY is defined to measure the association ranking of candidate enterprise entities across different texts, where β > 0 is a scaling adjustment parameter and scale > 0 is the number of tokens that remain after the full segmentation result of the text has been cleaned, which measures the length of the text;

where 0 ≤ ry_i ≤ 1.

4. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 2, characterized in that the step of calculating the degree of association between the text and each candidate enterprise entity further comprises:

calculating the degree of association between the text and each candidate enterprise entity according to the relationship weights and the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set.

5. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 4, characterized in that the step of calculating the degree of association between the text and each candidate enterprise entity according to the relationship weights and the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set comprises:

First, the word frequency vector F of the keyword set K is computed:

f_i denotes the word frequency of the i-th keyword;

r_ij denotes the correlation coefficient between the i-th candidate enterprise entity and the j-th keyword;

The correlation-coefficient-weighted word frequency matrix is:

where u = (1, …, 1),

where 0 ≤ rx_i ≤ 1,

where 0 ≤ ry_i ≤ 1.

6. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 4, characterized in that, before the step of performing word segmentation on the text, the method further comprises:

calculating the degree of association between the text and each candidate enterprise entity according to the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set, the paragraph positions, the relationship weights, and the text length.

7. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 6, characterized in that paragraph division preprocessing is performed on the text by the following formula:

where ⌈x⌉ denotes the smallest integer not less than x; P is the number of paragraphs of the text, P ≥ 1; H is the number of parts into which the text is divided, the parts being denoted part₁, …, part_H and the title being designated part₀, H ≥ 1; the numbers of paragraphs in the parts are denoted L = (l₀, l₁, …, l_H); one parameter gives the maximum proportion of the total paragraph count P that the first part may occupy, and another gives the maximum proportion of the total paragraph count P that part H may occupy.

8. The method for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 7, characterized in that the step of calculating the degree of association between the text and each candidate enterprise entity according to the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set, the paragraph positions, the relationship weights, and the text length comprises the following sub-steps:

Let W be the weight matrix for the paragraph positions of the keywords:

W = (w₀, w₁, …, w_H),

f_ij denotes the word frequency of the i-th keyword in part_j;

The correlation-coefficient-weighted word frequency matrix is:

where u = (1, …, 1),

where 0 ≤ rx_i ≤ 1,

where 0 ≤ ry_i ≤ 1.

9. A device for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph, comprising:

a text acquisition module, configured to obtain a text;

a word segmentation module, configured to perform word segmentation on the text, extract the set of keywords occurring in the text, retrieve, through a pre-established knowledge graph, the enterprise entities associated with the keywords, and take the enterprise entities associated with the keywords as a candidate enterprise set, wherein the knowledge graph includes a plurality of pieces of node information together with the relationships and relevance weights between each piece of node information and its corresponding node information, some of the pieces of node information being enterprise entity information and the remaining pieces being product information or natural person information corresponding to the respective enterprise entities;

an association degree calculation module, configured to calculate the degree of association between the text and each candidate enterprise entity according to the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set.

10. The device for calculating the degree of correlation between a text and an enterprise entity using a knowledge graph as claimed in claim 9, characterized in that the association degree calculation module is further configured to calculate the degree of association between the text and each candidate enterprise entity according to the relationship weights and the word frequencies of the keywords associated with the candidate enterprise entities in the candidate enterprise set.