CN106250412A - Knowledge graph construction method based on multi-source entity fusion - Google Patents
Knowledge graph construction method based on multi-source entity fusion
- Publication number
- CN106250412A
- Authority
- CN
- China
- Prior art keywords
- page
- synonym
- similarity
- edge
- title
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a knowledge graph construction method based on multi-source entity fusion. First, the three major Chinese encyclopedias (Baidu Baike, Hudong Baike, and Wikipedia) are crawled and the data is preprocessed, which includes title synonym extraction, disambiguation page extraction, candidate set extraction, and text word segmentation. Then, for the pages in the same candidate set, features are computed between every pair of pages, a classifier is trained to compute the similarity between pages, and a weighted graph is built from the similarities. Finally, a mixed linear programming model constrains the relations between vertices in the weighted graph; maximizing the objective function yields the connectivity between vertices, and each connected component is taken as one entity, which gives all the pages that describe the same entity. By introducing candidate sets, the invention greatly reduces the scale of the problem; by using the mixed linear programming model, it improves the accuracy of entity fusion.
Description
Technical field
The present invention relates to text similarity computation methods, and in particular to a knowledge graph construction method based on multi-source entity fusion.
Background art
With the rapid development of the Internet, people obtain information and knowledge through increasingly diverse channels. However, massive amounts of data are scattered across every corner of the Internet, which is a major obstacle for users seeking knowledge. Building a unified and complete knowledge base is therefore an urgent need.
Many knowledge bases already exist. DBpedia, for example, is a representative Semantic Web application: it extracts structured data from Wikipedia entries to enhance Wikipedia's search capabilities and links other data sets to Wikipedia. Freebase is a large collaborative knowledge base that integrates many resources on the Web. Like DBpedia, Freebase stores its entries as structured data. Accessing its data shows that almost all content is formatted, and is stored and displayed as triples. The schema is fixed, and entries of the same type all contain the same attributes. For these reasons, data of the same kind can easily be linked together, which makes information queries convenient. Freebase contains a very large number of topics and thousands of types and attributes. However, these knowledge bases are all in English, and there is currently no large, complete knowledge base for Chinese.
Traditional entity matching algorithms for knowledge bases are mainly based on matching pairs of entities, and formalize the problem as a classification problem. However, most such algorithms depend heavily on the quality of data templates. Web data is not presented in a unified triple format, and data from different sources also differs considerably in form of expression, so these methods are poorly suited to the present problem.
Other matching algorithms also include the structural information of pages among the features. For example, in entity matching between Chinese and English Wikipedia, a considerable number of pages have cross-language links, so this information can serve as prior knowledge. However, there are no links between our multi-source data, so structural features of the pages cannot be included among the features.
The Jaccard coefficient can be used to compute features between two sets. It is mainly used to compute the similarity between individuals described by symbolic or Boolean measures. Because the characteristic attributes of an individual are identified by symbolic or Boolean values, the magnitude of a difference cannot be measured; only the result "same or not" can be obtained. The Jaccard coefficient is therefore concerned only with whether the features shared by the individuals are identical. To compare the Jaccard similarity coefficient of $X$ and $Y$, one only compares the number of identical elements in $X_n$ and $Y_n$.
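As a minimal sketch of the Jaccard coefficient described above, the following Python function computes the size of the intersection divided by the size of the union. The function name and the choice to return 0.0 for two empty sets are assumptions made for this illustration, not details taken from the patent.

```python
def jaccard(x, y):
    """Jaccard coefficient |X ∩ Y| / |X ∪ Y| of two collections of tokens."""
    x, y = set(x), set(y)
    if not x and not y:
        # Assumption for this sketch: two empty sets count as dissimilar.
        return 0.0
    return len(x & y) / len(x | y)


# Two of the four distinct elements are shared.
example = jaccard({"a", "b", "c"}, {"b", "c", "d"})
```

Only membership matters here: how often an element occurs, or how "far apart" two differing elements are, has no effect on the result.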
Many algorithms can be applied to compute similarity from features. The simplest approach is to compute the Euclidean distance or cosine distance directly. Alternatively, a classifier can be trained on the features and then used to compute similarity. Random forest is a well-performing classifier that can be used for this purpose. It is a classifier that uses many decision trees to train on and predict samples, and its output class is determined by the mode of the classes output by the individual trees. Random forest has many advantages; for example, it can maintain high accuracy when features are missing, and it is not prone to overfitting.
Summary of the invention
To integrate multi-source encyclopedic knowledge and build a unified knowledge base, the present invention provides a knowledge graph construction method based on multi-source entity fusion. Different encyclopedias usually contain multiple pages that describe the same entity; multi-source entity fusion finds these pages in massive data and maps them to the same entity.
The technical solution adopted by the present invention to solve its technical problem is as follows. A knowledge graph construction method based on multi-source entity fusion comprises the following steps:
1) Preprocess the encyclopedia pages: extract synonyms of encyclopedia titles; extract disambiguation pages; use the transitive relation between synonyms to build synonym groups, all of which together form the set of synonym groups; form a candidate set from the pages corresponding to each synonym group in the set; and segment the text of the encyclopedia pages into words with a word segmentation tool.
2) Using the word segmentation results of step 1), compute the features between every pair of pages in the same candidate set, train a classifier that assigns a different weight to each feature dimension, and use this classifier to compute the similarity between pages.
3) Build the weighted graph of the candidate set from the page similarities computed in step 2). Using a mixed linear programming model, define the objective function of the model and compute its maximum to obtain the connectivity between vertices. Take each connected component of the weighted graph as one entity, which gives all the pages that describe the same entity.
Further, step 1) comprises:
1.1) Extract synonyms of encyclopedia titles. There are two extraction methods:
A) Template matching: specific templates are matched against the beginning of each page and the first sentence of its abstract; if the match succeeds, a synonym pair is obtained. The templates are defined manually and cover most of the patterns in which synonym pairs appear.
B) Link redirection: a hyperlink in a page leads to another page; if the title of that page differs from the text of the hyperlink, the two terms are considered synonyms.
1.2) Extract disambiguation pages: the $k$-th encyclopedia is denoted $\varepsilon_k = \{a_1, \ldots, a_n\}$, where $k$ is at most 3, $a_i$ denotes a page, and $n$ denotes the total number of pages. From all the pages that appear in a disambiguation page, a disambiguation page set $M$ can be extracted; no two pages in $M$ can represent the same entity.
$M = \{a_i \in \varepsilon_k \mid a_i \in M \neq a_j \in M\}$
1.3) Extract candidate sets: by the transitivity of synonymy, if A and B are synonyms and A and C are synonyms, then B and C are also synonyms. In this way, synonym groups $S_t$ are obtained. All synonym groups $S_t$ form the set of synonym groups, and any two elements within a synonym group of this set are synonyms of each other.
Given $S_t$, find the pages from all encyclopedia sources whose titles belong to $S_t$; these pages constitute the candidate set $P_t$.
$P_t = \{a \in \varepsilon_{1,\ldots,K} \mid a.\mathrm{Title} \in S_t\}$
$K$ is the total number of encyclopedias, and $a.\mathrm{Title}$ is the title of page $a$.
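The transitive grouping of synonyms and the construction of candidate sets in step 1.3) can be sketched with a union-find structure. This is only an illustration under stated assumptions: the patent does not say how the transitive closure is computed, and the function names and the `(source, title)` page layout are hypothetical.

```python
from collections import defaultdict


def synonym_groups(pairs):
    """Merge synonym pairs into groups S_t via union-find (transitive closure)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return list(groups.values())


def candidate_sets(groups, pages):
    """Build one candidate set P_t per synonym group S_t.

    pages is an iterable of (source, title) tuples (hypothetical layout).
    """
    return [[p for p in pages if p[1] in group] for group in groups]


groups = synonym_groups([("A", "B"), ("A", "C"), ("D", "E")])
pages = [("baidu", "A"), ("hudong", "B"), ("wiki", "D"), ("baidu", "X")]
candidates = candidate_sets(groups, pages)
```

With the pairs above, A, B, and C end up in one group even though B and C were never paired directly, and the page titled "X" belongs to no candidate set.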
1.4) Segment the text of the encyclopedia pages into words: segment 5 fields of each page, namely the abstract, infobox (keys and values), links, table of contents, and user tags, and remove stop words and words whose length is less than 2.
Further, step 2) comprises:
2.1) Define the 6 fields that a page contains, namely title $T$, abstract $A$, infobox $I$, table of contents $C$, user tags $G$, and links $L$, and represent a page as a 6-tuple:
$a = \{T, A, I, C, G, L\}$
The infobox is represented as key-value pairs, so $I = \{P, V\}$, where $P$ denotes the attributes and $V$ the attribute values.
For 2 pages that belong to the same candidate set, if they describe the same entity, their text overlap will be relatively high. The following 7 features are therefore defined:
1) Abstract feature
2) Infobox attribute feature
3) Infobox attribute value feature
4) Table of contents feature
5) User tag feature
6) Link feature
7) Global feature, where $S$ denotes the string concatenation of the 6-tuple $\{T, A, I, C, G, L\}$
$S_w(X)$ denotes the set of tokens obtained by segmenting string $X$.
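The seven pairwise features can be sketched as follows, assuming each feature is the Jaccard coefficient of the corresponding token sets (the advantages section of this document states that Jaccard coefficients of 7 text features are extracted). The dictionary layout of a page, the field names, and the pooling of field tokens to approximate $S_w(S)$ are assumptions of this sketch.

```python
def jaccard(x, y):
    """Jaccard coefficient of two token collections (0.0 if both are empty)."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if (x or y) else 0.0


# Hypothetical field names: abstract, infobox attributes P, infobox values V,
# table of contents, user tags, links.
FIELDS = ["abstract", "infobox_keys", "infobox_values", "toc", "tags", "links"]


def pair_features(page_a, page_b):
    """Return the 7 features for two pages given as dicts of field -> token list."""
    feats = [jaccard(page_a.get(f, []), page_b.get(f, [])) for f in FIELDS]
    # Global feature: pool the tokens of the title and every other field,
    # used here as a stand-in for segmenting the concatenated string S.
    all_a = [t for f in ["title"] + FIELDS for t in page_a.get(f, [])]
    all_b = [t for f in ["title"] + FIELDS for t in page_b.get(f, [])]
    feats.append(jaccard(all_a, all_b))
    return feats


page_a = {"title": ["apple"], "abstract": ["fruit", "red"], "tags": ["fruit"]}
page_b = {"title": ["apple"], "abstract": ["fruit", "sweet"], "tags": ["fruit"]}
features = pair_features(page_a, page_b)
```

Missing fields are treated as empty token lists, which matches the statement elsewhere in this document that every field other than the title may be empty.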
2.2) Use the 7 features obtained in step 2.1) as the input of a classifier: train a binary classifier with the RandomForest algorithm in the Weka algorithm package, and then use this binary classifier to predict the similarity between two pages.
Further, step 3) specifically comprises the following steps:
3.1) Build the weighted graph of the candidate set from the page similarities computed in step 2); the weight of the edge between two nodes is their similarity. The original problem is thereby converted into an edge selection problem. Let $y_{ij}$ indicate whether there is an edge between two nodes: $y_{ij} = 1$ if there is an edge between $a_i$ and $a_j$, and $y_{ij} = 0$ otherwise.
Additional penalty terms and constraints are introduced to build the mixed linear programming model:
Penalty term 1:
If $a_i$ and $a_j$ have an edge, and $a_i$ and $a_k$ have an edge, then there should also be an edge between $a_j$ and $a_k$; otherwise the penalty term $\phi$ is added, multiplied by the coefficient $u$ as a tuning parameter. $\phi$ is therefore subject to the following constraint:
$\phi_{jk} \ge 0$
Penalty term 2:
The higher the similarity between $a_i$ and $a_j$, the more likely there is an edge between them. For $a_i$ and $a_j$ with very low similarity, the penalty is large if there is an edge between them; if their similarity is high, the penalty is small. The penalty term is denoted $\psi_{ij}$ and the tuning parameter $\lambda$, and the penalty term is constrained as follows:
$\psi_{ij} \ge 0$
$\mathrm{sim}(a_i, a_j)$ is the weight between $a_i$ and $a_j$;
Penalty term 3:
For $a_i$ and $a_j$ that both appear in the disambiguation page set $M$, $y_{ij} = 1$ indicates a matching error, so the penalty term $\zeta_{ij}$ is used to enforce that there is no edge between $a_i$ and $a_j$. This constraint is expressed as follows:
$\zeta_{ij} \ge 0$
$N$ is the number of disambiguation page sets;
In addition, a threshold $\tau$ is set on the similarity: only pages $a_i$ and $a_j$ whose similarity exceeds $\tau$ may have an edge between them.
Combining the penalty terms and the threshold above gives the following objective function:
s.t. $y_{ij} \in \{0,1\}$, $\phi_{ij}, \psi_{ij}, \zeta_{ij} \ge 0$
Solve for the maximum of this objective function to obtain the edge parameters $y_{ij}$ corresponding to the maximum.
3.2) Take each connected component of the weighted graph as one entity, which gives all the pages that describe that entity.
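Step 3.2) can be sketched as a breadth-first search over the selected edges. The sketch assumes the edge variables $y_{ij}$ have already been solved for and are passed in as a list of page pairs with $y_{ij} = 1$; it does not implement the mixed linear programming model itself, and the function name and page identifiers are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group pages into entities: one connected component per entity.

    nodes: iterable of page ids; edges: iterable of (i, j) pairs with y_ij = 1.
    """
    adjacency = defaultdict(set)
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            page = queue.popleft()
            component.append(page)
            for neighbour in adjacency[page]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


entities = connected_components(
    ["a1", "a2", "a3", "a4"],
    [("a1", "a2"), ("a2", "a3")],
)
```

A page with no selected edge, such as "a4" here, forms an entity of its own.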
Compared with the prior art, the method of the present invention has the following beneficial effects:
1. The method uses title synonyms to obtain title candidate sets, derives page candidate sets from them, and computes page similarity only within each page candidate set. This greatly reduces the scale of the problem and makes the subsequent algorithms simpler to implement.
2. Based on page structure, the method extracts the Jaccard coefficients of 7 text features and uses the random forest algorithm to compute the similarity between pages; this similarity accurately reflects how similar the pages are.
3. The method models the similarity between pages on a graph and uses a mixed linear programming model to obtain the relations between vertices of the graph, that is, the relations between pages. From these relations an undirected graph can be built, from which all the pages describing one entity can be obtained accurately.
Brief description of the drawings
Fig. 1 is the overall flowchart of the present invention;
Fig. 2 is the flowchart of step 2);
Fig. 3 is the flowchart of step 3);
Fig. 4 is the flowchart of step 4).
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Figs. 1 to 4, the steps of the knowledge graph construction method based on multi-source entity fusion are as follows:
1) Preprocess the encyclopedia pages: extract synonyms of encyclopedia titles; extract disambiguation pages; use the transitive relation between synonyms to build synonym groups, all of which together form the set of synonym groups; form a candidate set from the pages corresponding to each synonym group in the set; and segment the text of the encyclopedia pages into words with a word segmentation tool.
2) Using the word segmentation results of step 1), compute the features between every pair of pages in the same candidate set, train a classifier that assigns a different weight to each feature dimension, and use this classifier to compute the similarity between pages.
3) Build the weighted graph of the candidate set from the page similarities computed in step 2). Using a mixed linear programming model, define the objective function of the model and compute its maximum to obtain the connectivity between vertices. Take each connected component of the weighted graph as one entity, which gives all the pages that describe the same entity.
Step 1) is:
1.1) Extract synonyms of encyclopedia titles. There are two extraction methods:
A) Template matching: specific templates are matched against the beginning of each page and the first sentence of its abstract; if the match succeeds, a synonym pair is obtained. The templates are defined manually and cover most of the patterns in which synonym pairs appear. For example, for a page with synonyms, strings such as "A, also known as B", "A, also called B", or "A is a synonym of B" usually appear at the beginning of the page or in the first sentence of the abstract; regular expression matching on these yields part of the synonym pairs.
B) Link redirection: a hyperlink in a page leads to another page; if the title of that page differs from the text of the hyperlink, the two terms are considered synonyms.
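The two extraction methods can be sketched with Python's `re` module. The Chinese patterns below are hypothetical stand-ins for the manually defined templates, which this document describes only through translated examples; the function names are likewise assumptions of this sketch.

```python
import re

# Hypothetical templates for "A, also known as B" and "A is a synonym of B".
TEMPLATES = [
    re.compile(r"^(?P<a>[^,。]+?)(?:又称|又名)(?P<b>[^,。]+)"),
    re.compile(r"^(?P<a>[^,。]+?)是(?P<b>[^,。]+?)的同义词"),
]


def template_synonym(first_sentence):
    """Return (A, B) if the sentence matches one of the templates, else None."""
    for pattern in TEMPLATES:
        match = pattern.search(first_sentence)
        if match:
            return match.group("a"), match.group("b")
    return None


def redirect_synonym(anchor_text, target_title):
    """Link redirection: anchor text and target title are synonyms if they differ."""
    if anchor_text != target_title:
        return anchor_text, target_title
    return None


pair_1 = template_synonym("番茄又称西红柿,是一种蔬菜。")
pair_2 = template_synonym("土豆是马铃薯的同义词。")
pair_3 = redirect_synonym("西红柿", "番茄")
```

Real templates would need to handle punctuation variants and nested clauses; the character classes here simply stop at the first full-width comma or full stop.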
1.2) Extract disambiguation pages: the $k$-th encyclopedia is denoted $\varepsilon_k = \{a_1, \ldots, a_n\}$, where $k$ is at most 3, $a_i$ denotes a page, and $n$ denotes the total number of pages. From all the pages that appear in a disambiguation page, a disambiguation page set $M$ can be extracted; no two pages in $M$ can represent the same entity.
$M = \{a_i \in \varepsilon_k \mid a_i \in M \neq a_j \in M\}$
1.3) Extract candidate sets: by the transitivity of synonymy, if A and B are synonyms and A and C are synonyms, then B and C are also synonyms. In this way, synonym groups $S_t$ are obtained. All synonym groups $S_t$ form the set of synonym groups, and any two elements within a synonym group of this set are synonyms of each other.
Given $S_t$, find the pages from all encyclopedia sources whose titles belong to $S_t$; these pages constitute the candidate set $P_t$.
$P_t = \{a \in \varepsilon_{1,\ldots,K} \mid a.\mathrm{Title} \in S_t\}$
$K$ is the total number of encyclopedias, and $a.\mathrm{Title}$ is the title of page $a$.
1.4) Segment the text of the encyclopedia pages into words: segment 5 fields of each page, namely the abstract, infobox (keys and values), links, table of contents, and user tags, and remove stop words and words whose length is less than 2.
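The filtering part of step 1.4) can be sketched as below. The sketch assumes the word segmentation tool has already produced a token list; the stop-word list and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical stop-word list; a real list would be far longer.
STOP_WORDS = {"的", "是", "和"}


def clean_tokens(tokens, stop_words=STOP_WORDS, min_len=2):
    """Drop stop words and tokens whose length is less than min_len."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


cleaned = clean_tokens(["知识", "的", "图谱", "库"])
```

The same filter would be applied to the tokens of each of the 5 segmented fields before any features are computed.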
Step 2) comprises:
2.1) Define the 6 fields that a page contains, namely title $T$, abstract $A$, infobox $I$, table of contents $C$, user tags $G$, and links $L$, and represent a page as a 6-tuple:
$a = \{T, A, I, C, G, L\}$
The infobox is represented as key-value pairs, so $I = \{P, V\}$, where $P$ denotes the attributes and $V$ the attribute values.
For 2 pages that belong to the same candidate set, if they describe the same entity, their text overlap will be relatively high. The following 7 features are therefore defined:
1) Abstract feature
2) Infobox attribute feature
3) Infobox attribute value feature
4) Table of contents feature
5) User tag feature
6) Link feature
7) Global feature, where $S$ denotes the string concatenation of the 6-tuple $\{T, A, I, C, G, L\}$
$S_w(X)$ denotes the set of tokens obtained by segmenting string $X$.
2.2) Use the 7 features obtained in step 2.1) as the input of a classifier: train a binary classifier with the RandomForest algorithm in the Weka algorithm package, and then use this binary classifier to predict the similarity between two pages.
Step 3) specifically comprises the following steps:
3.1) Build the weighted graph of the candidate set from the page similarities computed in step 2); the weight of the edge between two nodes is their similarity. The original problem is thereby converted into an edge selection problem. Let $y_{ij}$ indicate whether there is an edge between two nodes: $y_{ij} = 1$ if there is an edge between $a_i$ and $a_j$, and $y_{ij} = 0$ otherwise.
Additional penalty terms and constraints are introduced to build the mixed linear programming model:
Penalty term 1:
If $a_i$ and $a_j$ have an edge, and $a_i$ and $a_k$ have an edge, then there should also be an edge between $a_j$ and $a_k$; otherwise the penalty term $\phi$ is added, multiplied by the coefficient $u$ as a tuning parameter. $\phi$ is therefore subject to the following constraint:
$\phi_{jk} \ge 0$
Penalty term 2:
The higher the similarity between $a_i$ and $a_j$, the more likely there is an edge between them. For $a_i$ and $a_j$ with very low similarity, the penalty is large if there is an edge between them; if their similarity is high, the penalty is small. The penalty term is denoted $\psi_{ij}$ and the tuning parameter $\lambda$, and the penalty term is constrained as follows:
$\psi_{ij} \ge 0$
$\mathrm{sim}(a_i, a_j)$ is the weight between $a_i$ and $a_j$;
Penalty term 3:
For $a_i$ and $a_j$ that both appear in the disambiguation page set $M$, $y_{ij} = 1$ indicates a matching error, so the penalty term $\zeta_{ij}$ is used to enforce that there is no edge between $a_i$ and $a_j$. This constraint is expressed as follows:
$\zeta_{ij} \ge 0$
$N$ is the number of disambiguation page sets;
In addition, a threshold $\tau$ is set on the similarity: only pages $a_i$ and $a_j$ whose similarity exceeds $\tau$ may have an edge between them.
Combining the penalty terms and the threshold above gives the following objective function:
s.t. $y_{ij} \in \{0,1\}$, $\phi_{ij}, \psi_{ij}, \zeta_{ij} \ge 0$
Solve for the maximum of this objective function to obtain the edge parameters $y_{ij}$ corresponding to the maximum.
3.2) Take each connected component of the weighted graph as one entity, which gives all the pages that describe that entity.
Embodiment
The implementation steps of the present invention are described in detail below with an example:
(1) The data set used in the example comes from Baidu Baike and Hudong Baike; Baidu Baike has 10143321 pages and Hudong Baike has 6618544 pages.
(2) For all the pages in (1), analyze the page structure, extract information such as the title, abstract, table of contents, categories, links, and infobox, and store this information in a Lucene index. All fields other than the title may be empty.
(3) For all the pages in (1), extract title synonyms. The extraction methods are mainly template matching and link redirection. From the extracted synonym pairs, title synonym sets are then obtained. These title synonym sets are matched against the page titles in (1) to obtain the candidate set pages.
(4) Within the candidate set pages obtained in (3), extract the features between every pair of pages, and train a random forest classifier with these features as input. In this step the training set must be labeled manually.
(5) Based on the similarity matrix obtained in step (4), build the mixed linear programming model. The model yields the relation between each pair of vertices: 1 means there is an edge between the two vertices and 0 means there is not. With these vertices and edges as input, an undirected graph can be built. Extract each connected component of the undirected graph; the pages in a connected component represent one entity.
Results of this example:
For similarity computation, 5 methods were compared, and the random forest classifier gave the best results. Using the four evaluation metrics Precision, Recall, F1, and Accuracy, the method used herein (SCM) was compared with other methods, including greedy matching (GA), hierarchical clustering (AC), minimum spanning tree clustering (MSTC), and correlation clustering (CC). The results are shown in the following table:
Method | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|
GA | 78.3% | 76.1% | 77.2% | 91.6% |
AC | 73.0% | 79.0% | 75.9% | 91.5% |
MSTC | 63.4% | 80.5% | 71% | 88.8% |
CC | 62.4% | 65.5% | 63.9% | 87.4% |
SCM | 75.8% | 82.5% | 79.0% | 92.5% |
The comparison in the table above shows that this method outperforms the other methods on F1 and Accuracy. The method therefore has good practical value and application prospects for entity matching.
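The F1 column can be cross-checked against the Precision and Recall columns, since F1 is the harmonic mean of the two. The short sketch below recomputes it from the table values; the dictionary layout and function name are assumptions of this illustration.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall) in percent, copied from the table above.
RESULTS = {
    "GA": (78.3, 76.1),
    "AC": (73.0, 79.0),
    "MSTC": (63.4, 80.5),
    "CC": (62.4, 65.5),
    "SCM": (75.8, 82.5),
}

recomputed = {name: round(f1_score(p, r), 1) for name, (p, r) in RESULTS.items()}
best = max(recomputed, key=recomputed.get)
```

The recomputed values agree with the table to within rounding (MSTC comes out at 70.9, which the table reports as 71%), and SCM has the highest F1 even though GA has the higher precision.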
Claims (4)
1. A knowledge graph construction method based on multi-source entity fusion, characterized by comprising the following steps:
1) Preprocess the encyclopedia pages: extract synonyms of encyclopedia titles; extract disambiguation pages; use the transitive relation between synonyms to build synonym groups, all of which together form the set of synonym groups; form a candidate set from the pages corresponding to each synonym group in the set; and segment the text of the encyclopedia pages into words with a word segmentation tool.
2) Using the word segmentation results of step 1), compute the features between every pair of pages in the same candidate set, train a classifier that assigns a different weight to each feature dimension, and use this classifier to compute the similarity between pages.
3) Build the weighted graph of the candidate set from the page similarities computed in step 2). Using a mixed linear programming model, define the objective function of the model and compute its maximum to obtain the connectivity between vertices. Take each connected component of the weighted graph as one entity, which gives all the pages that describe the same entity.
2. The knowledge graph construction method based on multi-source entity fusion according to claim 1, characterized in that step 1) comprises:
1.1) Extract synonyms of encyclopedia titles. There are two extraction methods:
A) Template matching: specific templates are matched against the beginning of each page and the first sentence of its abstract; if the match succeeds, a synonym pair is obtained. The templates are defined manually and cover most of the patterns in which synonym pairs appear.
B) Link redirection: a hyperlink in a page leads to another page; if the title of that page differs from the text of the hyperlink, the two terms are considered synonyms.
1.2) Extract disambiguation pages: the $k$-th encyclopedia is denoted $\varepsilon_k = \{a_1, \ldots, a_n\}$, where $k$ is at most 3, $a_i$ denotes a page, and $n$ denotes the total number of pages. From all the pages that appear in a disambiguation page, a disambiguation page set $M$ can be extracted; no two pages in $M$ can represent the same entity.
$M = \{a_i \in \varepsilon_k \mid a_i \in M \neq a_j \in M\}$
1.3) Extract candidate sets: by the transitivity of synonymy, if A and B are synonyms and A and C are synonyms, then B and C are also synonyms. In this way, synonym groups $S_t$ are obtained. All synonym groups $S_t$ form the set of synonym groups, and any two elements within a synonym group of this set are synonyms of each other.
Given $S_t$, find the pages from all encyclopedia sources whose titles belong to $S_t$; these pages constitute the candidate set $P_t$.
$P_t = \{a \in \varepsilon_{1,\ldots,K} \mid a.\mathrm{Title} \in S_t\}$
$K$ is the total number of encyclopedias, and $a.\mathrm{Title}$ is the title of page $a$.
1.4) Segment the text of the encyclopedia pages into words: segment 5 fields of each page, namely the abstract, infobox (keys and values), links, table of contents, and user tags, and remove stop words and words whose length is less than 2.
3. The knowledge graph construction method based on multi-source entity fusion according to claim 1, characterized in that step 2) comprises:
2.1) Define the 6 fields that a page contains, namely title $T$, abstract $A$, infobox $I$, table of contents $C$, user tags $G$, and links $L$, and represent a page as a 6-tuple:
$a = \{T, A, I, C, G, L\}$
The infobox is represented as key-value pairs, so $I = \{P, V\}$, where $P$ denotes the attributes and $V$ the attribute values.
For 2 pages that belong to the same candidate set, if they describe the same entity, their text overlap will be relatively high. The following 7 features are therefore defined:
1) Abstract feature
2) Infobox attribute feature
3) Infobox attribute value feature
4) Table of contents feature
5) User tag feature
6) Link feature
7) Global feature, where $S$ denotes the string concatenation of the 6-tuple $\{T, A, I, C, G, L\}$
$S_w(X)$ denotes the set of tokens obtained by segmenting string $X$.
2.2) Use the 7 features obtained in step 2.1) as the input of a classifier: train a binary classifier with the RandomForest algorithm in the Weka algorithm package, and then use this binary classifier to predict the similarity between two pages.
4. The knowledge graph construction method based on multi-source entity fusion according to claim 1, characterized in that step 3) specifically comprises the following steps:
3.1) Build the weighted graph of the candidate set from the page similarities computed in step 2); the weight of the edge between two nodes is their similarity. The original problem is thereby converted into an edge selection problem. Let $y_{ij}$ indicate whether there is an edge between two nodes: $y_{ij} = 1$ if there is an edge between $a_i$ and $a_j$, and $y_{ij} = 0$ otherwise.
Additional penalty terms and constraints are introduced to build the mixed linear programming model:
Penalty term 1:
If $a_i$ and $a_j$ have an edge, and $a_i$ and $a_k$ have an edge, then there should also be an edge between $a_j$ and $a_k$; otherwise the penalty term $\phi$ is added, multiplied by the coefficient $u$ as a tuning parameter. $\phi$ is therefore subject to the following constraint:
$\phi_{jk} \ge 0$
Penalty term 2:
The higher the similarity between $a_i$ and $a_j$, the more likely there is an edge between them. For $a_i$ and $a_j$ with very low similarity, the penalty is large if there is an edge between them; if their similarity is high, the penalty is small. The penalty term is denoted $\psi_{ij}$ and the tuning parameter $\lambda$, and the penalty term is constrained as follows:
$\psi_{ij} \ge 0$
$\mathrm{sim}(a_i, a_j)$ is the weight between $a_i$ and $a_j$;
Penalty term 3:
For $a_i$ and $a_j$ that both appear in the disambiguation page set $M$, $y_{ij} = 1$ indicates a matching error, so the penalty term $\zeta_{ij}$ is used to enforce that there is no edge between $a_i$ and $a_j$. This constraint is expressed as follows:
$\zeta_{ij} \ge 0$
$N$ is the number of disambiguation page sets;
In addition, a threshold $\tau$ is set on the similarity: only pages $a_i$ and $a_j$ whose similarity exceeds $\tau$ may have an edge between them.
Combining the penalty terms and the threshold above gives the following objective function:
s.t. $y_{ij} \in \{0,1\}$, $\phi_{ij}, \psi_{ij}, \zeta_{ij} \ge 0$
Solve for the maximum of this objective function to obtain the edge parameters $y_{ij}$ corresponding to the maximum.
3.2) Take each connected component of the weighted graph as one entity, which gives all the pages that describe that entity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610583823.0A CN106250412B (en) | 2016-07-22 | 2016-07-22 | Knowledge graph construction method based on multi-source entity fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610583823.0A CN106250412B (en) | 2016-07-22 | 2016-07-22 | Knowledge graph construction method based on multi-source entity fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250412A true CN106250412A (en) | 2016-12-21 |
CN106250412B CN106250412B (en) | 2019-04-23 |
Family
ID=57604424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610583823.0A Active CN106250412B (en) | 2016-07-22 | 2016-07-22 | Knowledge graph construction method based on multi-source entity fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250412B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049569A (en) * | 2012-12-31 | 2013-04-17 | 武汉传神信息技术有限公司 | Text similarity matching method on basis of vector space model |
CN103729343A (en) * | 2013-10-10 | 2014-04-16 | 上海交通大学 | Semantic ambiguity eliminating method based on encyclopedia link co-occurrence |
CN105787105A (en) * | 2016-03-21 | 2016-07-20 | 浙江大学 | Iterative-model-based establishment method of Chinese encyclopedic knowledge graph classification system |
Non-Patent Citations (2)
Title |
---|
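楼仁杰: "Research on constructing a knowledge graph classification system based on Chinese encyclopedias" (基于中文百科的知识图谱分类体系构建研究), China Master's Theses Full-text Database, Information Science and Technology * |
王龙甫: "Construction of a concept knowledge base based on Chinese encyclopedias" (基于中文百科的概念知识库构建), China Master's Theses Full-text Database, Information Science and Technology * |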
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423820A (en) * | 2016-05-24 | 2017-12-01 | 清华大学 | The knowledge mapping of binding entity stratigraphic classification represents learning method |
CN106777331A (en) * | 2017-01-11 | 2017-05-31 | 北京航空航天大学 | Knowledge mapping generation method and device |
CN106844658A (en) * | 2017-01-23 | 2017-06-13 | 中山大学 | A kind of Chinese text knowledge mapping method for auto constructing and system |
CN106844658B (en) * | 2017-01-23 | 2019-12-13 | 中山大学 | Automatic construction method and system of Chinese text knowledge graph |
CN108399180A (en) * | 2017-02-08 | 2018-08-14 | 腾讯科技(深圳)有限公司 | A kind of knowledge mapping construction method, device and server |
CN108399180B (en) * | 2017-02-08 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Knowledge graph construction method and device and server |
CN106909643A (en) * | 2017-02-20 | 2017-06-30 | 同济大学 | The social media big data motif discovery method of knowledge based collection of illustrative plates |
CN106909643B (en) * | 2017-02-20 | 2020-08-14 | 同济大学 | Knowledge graph-based social media big data topic discovery method |
CN108694177A (en) * | 2017-04-06 | 2018-10-23 | 北大方正集团有限公司 | Knowledge mapping construction method and system |
CN107038257A (en) * | 2017-05-10 | 2017-08-11 | 浙江大学 | A kind of city Internet of Things data analytical framework of knowledge based collection of illustrative plates |
CN107220386A (en) * | 2017-06-29 | 2017-09-29 | 北京百度网讯科技有限公司 | Information-pushing method and device |
CN107220386B (en) * | 2017-06-29 | 2020-10-02 | 北京百度网讯科技有限公司 | Information pushing method and device |
CN108182295A (en) * | 2018-02-09 | 2018-06-19 | 重庆誉存大数据科技有限公司 | A kind of Company Knowledge collection of illustrative plates attribute extraction method and system |
CN108182295B (en) * | 2018-02-09 | 2021-09-10 | 重庆电信系统集成有限公司 | Enterprise knowledge graph attribute extraction method and system |
CN108777635A (en) * | 2018-05-24 | 2018-11-09 | 梧州井儿铺贸易有限公司 | A kind of Enterprise Equipment Management System |
CN109033129B (en) * | 2018-06-04 | 2021-08-03 | 桂林电子科技大学 | Multi-source information fusion knowledge graph representation learning method based on self-adaptive weight |
CN109033129A (en) * | 2018-06-04 | 2018-12-18 | 桂林电子科技大学 | Multi-source Information Fusion knowledge mapping based on adaptive weighting indicates learning method |
CN109284394A (en) * | 2018-09-12 | 2019-01-29 | 青岛大学 | A method of Company Knowledge map is constructed from multi-source data integration visual angle |
US11971936B2 (en) | 2018-09-27 | 2024-04-30 | Google Llc | Analyzing web pages to facilitate automatic navigation |
US11487832B2 (en) * | 2018-09-27 | 2022-11-01 | Google Llc | Analyzing web pages to facilitate automatic navigation |
CN109522547A (en) * | 2018-10-23 | 2019-03-26 | 浙江大学 | Chinese synonym iteration abstracting method based on pattern learning |
CN109657069A (en) * | 2018-12-11 | 2019-04-19 | 北京百度网讯科技有限公司 | The generation method and its device of knowledge mapping |
CN109902144A (en) * | 2019-01-11 | 2019-06-18 | 杭州电子科技大学 | A kind of entity alignment schemes based on improvement WMD algorithm |
CN109902144B (en) * | 2019-01-11 | 2020-01-31 | 杭州电子科技大学 | entity alignment method based on improved WMD algorithm |
CN109857872A (en) * | 2019-02-18 | 2019-06-07 | 浪潮软件集团有限公司 | The information recommendation method and device of knowledge based map |
CN111708891A (en) * | 2019-03-01 | 2020-09-25 | 九阳股份有限公司 | Food material entity linking method and device among multi-source food material data |
CN111708891B (en) * | 2019-03-01 | 2023-12-08 | 九阳股份有限公司 | Food material entity linking method and device between multi-source food material data |
CN110377747A (en) * | 2019-06-10 | 2019-10-25 | 河海大学 | A kind of knowledge base fusion method towards encyclopaedia website |
CN110377747B (en) * | 2019-06-10 | 2021-12-07 | 河海大学 | Knowledge base fusion method for encyclopedic website |
CN110209839A (en) * | 2019-06-18 | 2019-09-06 | 卓尔智联(武汉)研究院有限公司 | Agricultural knowledge map construction device, method and computer readable storage medium |
CN110245198A (en) * | 2019-06-18 | 2019-09-17 | 北京百度网讯科技有限公司 | Multi-source ticketing data managing method and system, server and computer-readable medium |
CN110427612A (en) * | 2019-07-02 | 2019-11-08 | 平安科技(深圳)有限公司 | Based on multilingual entity disambiguation method, device, equipment and storage medium |
CN113326686B (en) * | 2020-02-28 | 2024-05-10 | 株式会社斯库林集团 | Similarity calculation device, recording medium, and similarity calculation method |
CN113326686A (en) * | 2020-02-28 | 2021-08-31 | 株式会社斯库林集团 | Similarity calculation device, recording medium, and similarity calculation method |
CN111881290A (en) * | 2020-06-17 | 2020-11-03 | 国家电网有限公司 | Distribution network multi-source grid entity fusion method based on weighted semantic similarity |
CN112115328B (en) * | 2020-08-24 | 2022-08-19 | 苏宁金融科技(南京)有限公司 | Page flow map construction method and device and computer readable storage medium |
CN112115328A (en) * | 2020-08-24 | 2020-12-22 | 苏宁金融科技(南京)有限公司 | Page flow map construction method and device and computer readable storage medium |
CN112163094A (en) * | 2020-08-25 | 2021-01-01 | 中国科学院计算机网络信息中心 | Scientific and technological resource convergence and continuous service method and device |
CN111813962A (en) * | 2020-09-07 | 2020-10-23 | 北京富通东方科技有限公司 | Entity similarity calculation method for knowledge graph fusion |
CN111813962B (en) * | 2020-09-07 | 2020-12-18 | 北京富通东方科技有限公司 | Entity similarity calculation method for knowledge graph fusion |
CN113392220A (en) * | 2020-10-23 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Knowledge graph generation method and device, computer equipment and storage medium |
CN113392220B (en) * | 2020-10-23 | 2024-03-26 | 腾讯科技(深圳)有限公司 | Knowledge graph generation method and device, computer equipment and storage medium |
CN112328812A (en) * | 2021-01-05 | 2021-02-05 | 成都数联铭品科技有限公司 | Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment |
CN113157861B (en) * | 2021-04-12 | 2022-05-24 | 山东浪潮科学研究院有限公司 | Entity alignment method fusing Wikipedia |
CN113157861A (en) * | 2021-04-12 | 2021-07-23 | 山东新一代信息产业技术研究院有限公司 | Entity alignment method fusing Wikipedia |
CN113139050A (en) * | 2021-05-10 | 2021-07-20 | 桂林电子科技大学 | Text abstract generation method based on named entity identification additional label and priori knowledge |
CN114153839A (en) * | 2021-10-29 | 2022-03-08 | 杭州未名信科科技有限公司 | Integration method, device, equipment and storage medium of multi-source heterogeneous data |
Also Published As
Publication number | Publication date |
---|---|
CN106250412B (en) | 2019-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106250412A (en) | The knowledge mapping construction method merged based on many source entities | |
CN106776711B (en) | Chinese medical knowledge map construction method based on deep learning | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN106294593B (en) | In conjunction with the Relation extraction method of subordinate clause grade remote supervisory and semi-supervised integrated study | |
CN104391942B (en) | Short essay eigen extended method based on semantic collection of illustrative plates | |
CN106055675B (en) | A kind of Relation extraction method based on convolutional neural networks and apart from supervision | |
CN103473283B (en) | Method for matching textual cases | |
CN110598000A (en) | Relationship extraction and knowledge graph construction method based on deep learning model | |
CN107122413A (en) | A kind of keyword extracting method and device based on graph model | |
CN104991905B (en) | A kind of mathematic(al) representation search method based on level index | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
CN105653706A (en) | Multilayer quotation recommendation method based on literature content mapping knowledge domain | |
CN110674252A (en) | High-precision semantic search system for judicial domain | |
CN107239512B (en) | A kind of microblogging comment spam recognition methods of combination comment relational network figure | |
CN107122349A (en) | A kind of feature word of text extracting method based on word2vec LDA models | |
CN102117281A (en) | Method for constructing domain ontology | |
US9146988B2 (en) | Hierarchal clustering method for large XML data | |
CN110175334A (en) | Text knowledge's extraction system and method based on customized knowledge slot structure | |
CN104317838A (en) | Cross-media Hash index method based on coupling differential dictionary | |
CN112487190A (en) | Method for extracting relationships between entities from text based on self-supervision and clustering technology | |
CN114997288A (en) | Design resource association method | |
CN104794209B (en) | Chinese microblogging mood sorting technique based on Markov logical network and system | |
CN115391553A (en) | Method for automatically searching time sequence knowledge graph complement model | |
CN103064907A (en) | System and method for topic meta search based on unsupervised entity relation extraction | |
CN103699568A (en) | Method for extracting hyponymy relation of field terms from wikipedia |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20161221; Assignee: TONGDUN HOLDINGS Co.,Ltd.; Assignor: ZHEJIANG University; Contract record no.: X2021990000612; Denomination of invention: Construction method of knowledge map based on multi-source entity fusion; Granted publication date: 20190423; License type: Common License; Record date: 20211012 |