CN110069775A - Entity disambiguation method and system - Google Patents

Entity disambiguation method and system

Publication number
CN110069775A
Authority
CN
China
Prior art keywords
entity
senses
dictionary entry
analyzed
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910207612.0A
Other languages
Chinese (zh)
Other versions
CN110069775B (en)
Inventor
宋亚楠
邱楠
严汉明
梁剑华
邹创华
邓婧文
程谦
彭旺友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Ruihuan Laser Technology Co ltd
Original Assignee
Shenzhen Amber Virtual Face Intelligent Technology Co Ltd
Shenzhen Green Bristlegrass Intelligence Science And Technology Ltd
Application filed by Shenzhen Amber Virtual Face Intelligent Technology Co Ltd and Shenzhen Green Bristlegrass Intelligence Science And Technology Ltd
Priority to CN201910207612.0A
Publication of CN110069775A
Application granted
Publication of CN110069775B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity disambiguation method and system. The method includes the following steps: obtaining the entity to be disambiguated in the natural language to be analyzed, together with the multiple senses of that entity; calculating, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed; and defining the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed. The method overcomes the defect of the prior art, which requires either manually defined rules or a large amount of training data.

Description

Entity disambiguation method and system
Technical field
The invention belongs to the technical field of human-computer interaction, and in particular relates to an entity disambiguation method and system.
Background art
In the field of human-computer interaction, an entity can be understood simply as a noun. Entity disambiguation addresses the fact that a noun appearing in a sentence may have multiple meanings (multiple senses), so the specific meaning the noun carries in that sentence (its language environment) must be determined. For example, Li Bai may refer to the poet Li Bai, or to the song "Li Bai".
The specific meaning of an entity in natural language can vary widely, yet in a specific language environment an entity represents a single meaning. Humans can judge intuitively which meaning an entity carries, but for a machine or robot, researchers need to develop dedicated techniques before it can judge the specific meaning of an entity the way a person does.
At present, entity disambiguation is generally rule-based: rules define the meaning an entity takes when it appears together with other entities. The drawbacks of this approach are that it requires many professionals, a large number of rules must be defined, and the rules are hard to maintain. Another approach uses machine learning or deep learning: drawing on the computing power of machines, a large number of sentences and the meanings of the entities in them are supplied as input, and the machine learns a model that can determine the meaning of an entity in a specific context. The drawbacks of this approach are that the number of parameters may be large and hard to tune, a large amount of data is needed, and training requires considerable time and computing power.
Summary of the invention
In view of the defects in the prior art, the present invention provides an entity disambiguation method and system that overcome the prior art's need for manually defined rules or a large amount of training data.
In a first aspect, an entity disambiguation method comprises the following steps:
Obtain the entity to be disambiguated in the natural language to be analyzed, together with the multiple senses of that entity;
Calculate, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed;
Define the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed.
Preferably, the total score of each sense of the entity to be disambiguated appearing in the natural language to be analyzed is calculated as follows:
Calculate the statistical score;
Calculate the semantic score;
Calculate the total score of each sense appearing in the natural language to be analyzed according to the following formula:
Total score = W1 × statistical score + W2 × semantic score;
where W1 and W2 are weights, and W1 + W2 = 1.
Preferably, the statistical score is calculated as follows:
Preprocess the natural language to be analyzed by removing its stop words;
Perform multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Select the subgraph of the entity to be disambiguated from a preset knowledge graph; the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determine that the sense appears in the natural language to be analyzed and define it as a reference sense;
Calculate the statistical score of each sense according to the following formula:
where n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is the index variable.
Preferably, the semantic score is calculated as follows:
Obtain a reference field;
Segment the natural language to be analyzed and the reference field separately to obtain the context words of the entity to be disambiguated and the reference words of the reference field;
Calculate the semantic similarity between each sense of the context words and each reference word;
Calculate the semantic score of each sense according to the following formula:
where max is the maximum function, m is the number of senses in the context words, and j is the index variable.
In a second aspect, an entity disambiguation system comprises:
Acquisition unit: configured to obtain the entity to be disambiguated in the natural language to be analyzed, together with the multiple senses of that entity;
Analysis unit: configured to calculate, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed;
Output unit: configured to define the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed.
Preferably, the analysis unit is specifically configured to:
Calculate the statistical score;
Calculate the semantic score;
Calculate the total score of each sense appearing in the natural language to be analyzed according to the following formula:
Total score = W1 × statistical score + W2 × semantic score;
where W1 and W2 are weights, and W1 + W2 = 1.
Preferably, the analysis unit is specifically configured to:
Preprocess the natural language to be analyzed by removing its stop words;
Perform multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Select the subgraph of the entity to be disambiguated from a preset knowledge graph; the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determine that the sense appears in the natural language to be analyzed and define it as a reference sense;
Calculate the statistical score of each sense according to the following formula:
where n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is the index variable.
Preferably, the analysis unit is specifically configured to:
Obtain a reference field;
Segment the natural language to be analyzed and the reference field separately to obtain the context words of the entity to be disambiguated and the reference words of the reference field;
Calculate the semantic similarity between each sense of the context words and each reference word;
Calculate the semantic score of each sense according to the following formula:
where max is the maximum function, m is the number of senses in the context words, and j is the index variable.
As the above technical solution shows, the entity disambiguation method and system provided by the invention overcome the prior art's need for manually defined rules or a large amount of training data.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals. In the drawings, elements or parts are not necessarily drawn to scale.
Fig. 1 is a flowchart of the entity disambiguation method provided by embodiment one of the invention.
Fig. 2 is a block diagram of the modules of the entity disambiguation system provided by embodiment three of the invention.
Detailed description of the embodiments
The embodiments of the technical solution of the invention are described in detail below with reference to the drawings. The following embodiments are only used to illustrate the technical solution more clearly; they are examples only and cannot be used to limit the scope of protection of the invention. It should be noted that, unless otherwise stated, technical or scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the invention belongs.
It should be understood that, when used in this specification and the appended claims, the terms "includes" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or sets thereof.
It should also be understood that the terminology used in this description is for the purpose of describing specific embodiments only and is not intended to limit the invention. As used in this description and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
Embodiment one:
An entity disambiguation method, referring to Fig. 1, comprises the following steps:
S1: Obtain the entity to be disambiguated in the natural language to be analyzed, together with the multiple senses of that entity;
Specifically, each entity has multiple senses. Li Bai, for example, may refer to the poet Li Bai or to the song "Li Bai", so the poet Li Bai and the song "Li Bai" are two senses of Li Bai. The natural language to be analyzed is the conversation content input by the user.
S2: Calculate, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed;
Specifically, the higher the total score, the greater the probability that the sense appears in the natural language to be analyzed, and the more likely that sense is the intended meaning there. The lower the total score, the smaller that probability, and the less likely the sense is the intended meaning.
S3: Define the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed.
Specifically, the method combines information about the entity itself with the context words of the current conversation to disambiguate the meaning of a specific entity appearing in the natural language to be analyzed. In knowledge graph terms, the method can be seen as selecting the single correct ontology mapping. In essence, the method selects, from the multiple senses of an entity, the sense that corresponds to the current language environment.
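Step S3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_sense`, the sense labels, and the score values are hypothetical and do not come from the patent text.

```python
from typing import Dict


def pick_sense(total_scores: Dict[str, float]) -> str:
    """Return the sense with the highest total score (step S3)."""
    if not total_scores:
        raise ValueError("no candidate senses to choose from")
    # max() over the keys, ranked by their score values
    return max(total_scores, key=total_scores.get)


# Hypothetical total scores for the two senses of "Li Bai"
scores = {"Li Bai (poet)": 0.72, "Li Bai (song)": 0.31}
best = pick_sense(scores)
```

How ties between senses should be broken is not stated in the text; this sketch simply keeps the first maximum it encounters.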
Preferably, the total score of each sense of the entity to be disambiguated appearing in the natural language to be analyzed is calculated as follows:
Calculate the statistical score;
Calculate the semantic score;
Calculate the total score of each sense appearing in the natural language to be analyzed according to the following formula:
Total score = W1 × statistical score + W2 × semantic score;
where W1 and W2 are weights, and W1 + W2 = 1.
Specifically, W1 and W2 represent the reliability of the corresponding scores, and both can initially be set to 0.5. In a specific implementation, they can be adjusted according to the reliability of the statistical score and the semantic score; however they are adjusted, the two must sum to 1. Adjusting the weights changes the relative importance of the two calculated results: if the statistical score is considered more reliable, increase W1; otherwise increase W2. If W1 or W2 is set to zero, the corresponding calculated result no longer contributes. The weights can be adjusted from experience and the circumstances of the project, or by machine learning: for example, given multiple groups of weights, the corresponding calculated results, and the true results as training input, machine learning outputs a model that, on receiving new input, outputs the most suitable values for the two weights.
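The weighted combination above can be written as a short Python sketch. The function name `total_score` and the sample score values are assumptions made for illustration; only the formula, the constraint W1 + W2 = 1, and the initial value 0.5 come from the text.

```python
def total_score(stat_score: float, sem_score: float, w1: float = 0.5) -> float:
    """Total score = W1 * statistical score + W2 * semantic score, with W1 + W2 = 1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w1 + w2 = 1 with w2 >= 0")
    w2 = 1.0 - w1  # enforce the constraint W1 + W2 = 1
    return w1 * stat_score + w2 * sem_score


# Equal weights (the suggested initial setting)
balanced = total_score(0.8, 0.4)
# Trusting only the statistical score: the semantic score no longer contributes
stat_only = total_score(0.8, 0.4, w1=1.0)
```

Deriving W2 from W1 inside the function keeps the two weights from drifting out of sync when one of them is tuned.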
Embodiment two:
Embodiment two adds the following to embodiment one:
1. The statistical score is calculated as follows:
Preprocess the natural language to be analyzed by removing its stop words;
Specifically, removing the stop words from the natural language to be analyzed improves the efficiency of natural language processing and saves space. A stop-word list can be set up for filtering stop words; the list can be drawn up by product and technical staff according to the requirements of the specific product and its functions.
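A minimal sketch of stop-word filtering follows. The stop-word list and the sample tokens are hypothetical; for simplicity the sketch filters a list that has already been split into tokens, whereas the text describes removing stop words as a preprocessing step before segmentation.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; in practice it would be drawn up per product
STOP_WORDS: Set[str] = {"the", "a", "of", "to", "is"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token found in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


filtered = remove_stop_words(["Play", "the", "song", "Li Bai"])
```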
Perform multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Specifically, multi-granularity word segmentation means that segmentation no longer follows a single granularity (for example, it no longer fixes the maximum number of characters in a word). Instead, the frequency of every word in the dictionary is counted, and high-frequency words are treated as single words during segmentation. For example, suppose "Valentine's Day" has a word frequency of 1000, "sweetheart" 500, and "festival" 400. The minimum word frequency of the segmentation "Valentine's Day" is 1000, while that of "sweetheart/festival" is 400, so the segmentation result should be "Valentine's Day": the three characters are treated as one indivisible word.
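The "Valentine's Day" example can be reproduced with a small Python sketch that enumerates every dictionary-based segmentation and keeps the one whose minimum word frequency is largest. This is one reading of the description, not a confirmed implementation. The Chinese strings are presumed to be the original forms behind the translated example (情人节, 情人, 节), and the function names are hypothetical.

```python
from typing import Dict, List


def segmentations(text: str, freq: Dict[str, int]) -> List[List[str]]:
    """Enumerate every way of splitting text into words found in the dictionary."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in freq:
            for tail in segmentations(text[end:], freq):
                results.append([head] + tail)
    return results


def best_segmentation(text: str, freq: Dict[str, int]) -> List[str]:
    """Pick the segmentation whose least frequent word has the highest frequency."""
    candidates = [seg for seg in segmentations(text, freq) if seg]
    if not candidates:
        raise ValueError("text cannot be segmented with this dictionary")
    return max(candidates, key=lambda seg: min(freq[w] for w in seg))


# Frequencies taken from the example in the text
freq = {"情人节": 1000, "情人": 500, "节": 400}
result = best_segmentation("情人节", freq)
```

The exhaustive enumeration is exponential in the worst case; it is used here only because the example is tiny.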
Select the subgraph of the entity to be disambiguated from a preset knowledge graph; the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determine that the sense appears in the natural language to be analyzed and define it as a reference sense;
Calculate the statistical score of each sense according to the following formula:
where n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is the index variable.
Specifically, the statistical score takes into account both the number of context words of the entity to be disambiguated and the distance between those context words and the entity in the natural language to be analyzed. For example, suppose the entity to be disambiguated is A and the natural language to be analyzed is B. The subgraph of A is first selected from the knowledge graph; the subgraph contains several words (the senses). The number of A's senses that appear in B is determined first, which gives the numerator. Suppose word C in the subgraph of A also occurs in B; C is then defined as a reference sense, and the distance between C and A is the denominator. The statistical score is calculated in turn over all reference senses in the subgraph of the entity to be disambiguated.
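The formula itself is not reproduced in this text, so the following Python sketch is only one plausible reading of the description: each reference sense contributes 1 divided by its token distance from the entity, and the contributions are summed over all n reference senses. The function name, the use of token positions as the distance measure, and the sample data are all assumptions.

```python
from typing import List, Set


def statistical_score(tokens: List[str], entity: str, sense_words: Set[str]) -> float:
    """Sum 1/distance over subgraph words that occur in the text (assumed formula)."""
    if entity not in tokens:
        return 0.0
    entity_pos = tokens.index(entity)
    score = 0.0
    for pos, token in enumerate(tokens):
        # A subgraph word found in the text counts as a reference sense
        if token in sense_words and pos != entity_pos:
            score += 1.0 / abs(pos - entity_pos)
    return score


# Hypothetical segmented input and subgraph contents
tokens = ["play", "song", "Li Bai", "singer", "version"]
song_words = {"song", "singer", "lyrics"}
poet_words = {"poem", "Tang", "poet"}
song_score = statistical_score(tokens, "Li Bai", song_words)
poet_score = statistical_score(tokens, "Li Bai", poet_words)
```

Under this reading, more matching context words and shorter distances both raise the score, which agrees with the qualitative description.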
2. The semantic score is calculated as follows:
Obtain a reference field;
Specifically, since the industry generally accepts that the descriptions of an entity given by encyclopedias, such as Baidu Baike, are the most precise semantic descriptions of the entity's senses, the reference field used to calculate the semantic score can be selected from an encyclopedia. The reference field can be a sentence, a few sentences, or a paragraph.
Segment the natural language to be analyzed and the reference field separately to obtain the context words of the entity to be disambiguated and the reference words of the reference field;
Calculate the semantic similarity between each sense of the context words and each reference word;
Specifically, suppose the senses of the context words include A1, A2, and A3, and the segmentation result of the reference field includes B1, B2, and B3. The semantic similarity must then be calculated for A1 and B1, A1 and B2, and A1 and B3; for A2 and B1, A2 and B2, and A2 and B3; and so on. This yields the semantic similarity between each sense and each reference word. Semantic similarity can be calculated with an existing semantic similarity algorithm.
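The pairwise comparison of A1, A2, ... against B1, B2, ... can be sketched as follows. The text leaves the similarity algorithm open, so cosine similarity over word vectors is used here purely as a stand-in; the toy two-dimensional vectors and the function names are assumptions.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(
    context_vecs: Dict[str, List[float]],
    reference_vecs: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Similarity of every (context sense, reference word) pair."""
    return {
        (a, b): cosine(va, vb)
        for a, va in context_vecs.items()
        for b, vb in reference_vecs.items()
    }


# Toy vectors standing in for real word embeddings
context = {"A1": [1.0, 0.0], "A2": [0.0, 1.0]}
reference = {"B1": [1.0, 0.0], "B2": [1.0, 1.0]}
sims = pairwise_similarity(context, reference)
```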
Calculate the semantic score of each sense according to the following formula:
where max is the maximum function, m is the number of senses in the context words, and j is the index variable.
Specifically, the maximum of all semantic similarities under the same sense is determined, and that maximum is defined as the semantic score of the sense.
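Taking the maximum similarity as the semantic score can be sketched as below. The function name and the sample similarity values are hypothetical, and returning 0.0 when no similarities are available is an assumption of this sketch rather than something the text specifies.

```python
from typing import Iterable


def semantic_score(similarities: Iterable[float]) -> float:
    """Semantic score of one sense: the largest of its semantic similarities."""
    values = list(similarities)
    if not values:
        return 0.0  # assumed fallback when there is nothing to compare
    return max(values)


# Hypothetical similarities between one sense and the reference words
sem = semantic_score([0.2, 0.9, 0.5])
```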
For brevity, where this embodiment of the method does not mention a detail, refer to the corresponding content of the foregoing method embodiment.
Embodiment three:
An entity disambiguation system, referring to Fig. 2, comprises:
Acquisition unit: configured to obtain the entity to be disambiguated in the natural language to be analyzed, together with the multiple senses of that entity;
Analysis unit: configured to calculate, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed;
Output unit: configured to define the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed.
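The three units can be pictured as a small pipeline. The sketch below is illustrative only: the class name, the callable signatures, and the toy acquisition and scoring functions are assumptions, and a real analysis unit would compute the weighted statistical and semantic scores instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EntityDisambiguationSystem:
    # Acquisition unit: returns the candidate senses of an entity in a text
    acquire_senses: Callable[[str, str], List[str]]
    # Analysis unit: returns the total score of one sense in the text
    score_sense: Callable[[str, str, str], float]

    def disambiguate(self, entity: str, text: str) -> str:
        """Output unit: return the sense with the highest total score."""
        senses = self.acquire_senses(entity, text)
        if not senses:
            raise ValueError("no senses found for entity")
        scores: Dict[str, float] = {
            s: self.score_sense(entity, s, text) for s in senses
        }
        return max(scores, key=scores.get)


# Toy stand-ins for the acquisition and analysis units
system = EntityDisambiguationSystem(
    acquire_senses=lambda entity, text: ["poet", "song"],
    score_sense=lambda entity, sense, text: 1.0 if sense in text else 0.0,
)
meaning = system.disambiguate("Li Bai", "play the song Li Bai")
```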
Preferably, the analysis unit is specifically configured to:
Calculate the statistical score;
Calculate the semantic score;
Calculate the total score of each sense appearing in the natural language to be analyzed according to the following formula:
Total score = W1 × statistical score + W2 × semantic score;
where W1 and W2 are weights, and W1 + W2 = 1.
Preferably, the analysis unit is specifically configured to:
Preprocess the natural language to be analyzed by removing its stop words;
Perform multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Select the subgraph of the entity to be disambiguated from a preset knowledge graph; the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determine that the sense appears in the natural language to be analyzed and define it as a reference sense;
Calculate the statistical score of each sense according to the following formula:
where n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is the index variable.
Preferably, the analysis unit is specifically configured to:
Obtain a reference field;
Segment the natural language to be analyzed and the reference field separately to obtain the context words of the entity to be disambiguated and the reference words of the reference field;
Calculate the semantic similarity between each sense of the context words and each reference word;
Calculate the semantic score of each sense according to the following formula:
where max is the maximum function, m is the number of senses in the context words, and j is the index variable.
The entity disambiguation system overcomes the prior art's need for manually defined rules or a large amount of training data.
In the several embodiments provided in this application, it should be understood that the disclosed system can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiments of the invention.
In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
For brevity, where this system embodiment does not mention a detail, refer to the corresponding content of the foregoing method embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, or some or all of their technical features replaced by equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the invention, and should all be covered by the claims and description of the invention.

Claims (8)

1. a kind of entity disambiguation method, which comprises the following steps:
It obtains the entity to be disambiguated in natural language to be analyzed and is somebody's turn to do multiple senses of a dictionary entry of entity to be disambiguated;
The each senses of a dictionary entry for calculating separately the entity to be disambiguated appears in total score in natural language to be analyzed;
The definition highest senses of a dictionary entry of total score is meaning of the entity to be disambiguated in natural language to be analyzed.
2. entity disambiguation method according to claim 1, which is characterized in that should entity be disambiguated each senses of a dictionary entry appear in Total score in analysis natural language calculates by the following method:
Counting statistics score;
Calculate semantic score;
Each senses of a dictionary entry is calculated according to the following formula appears in total score in natural language to be analyzed:
Total score=W1× statistics score+W2× semantic score;
Wherein, W1、W2Respectively weight, and W1+W2=1.
3. The entity disambiguation method according to claim 2, characterized in that the statistical score is calculated by the following method:
Preprocessing the natural language to be analyzed to remove the stop words in the natural language to be analyzed;
Performing multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Selecting the subgraph of the entity to be disambiguated from a preset knowledge graph, wherein the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determining that the sense in the subgraph appears in the natural language to be analyzed, and defining that sense as a reference sense;
Calculating the statistical score of each sense according to the following formula:
Wherein n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is an index variable.
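The steps of claim 3 up to the identification of reference senses can be sketched as below. The scoring formula itself is not reproduced in this text, so the sketch stops at producing the reference senses and their count n. Everything concrete here is an assumption for illustration: the stop-word list, whitespace tokenization with contiguous spans as a stand-in for multi-granularity segmentation, the knowledge graph layout (entity, then sense, then a set of related terms), and the rule that a sense "corresponds" to the context when it shares at least one term with it.

```python
from typing import Dict, List, Set

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "in", "is", "to"}


def multi_granularity_segments(tokens: List[str], max_len: int = 2) -> Set[str]:
    """Collect every contiguous span of 1..max_len tokens (a simple stand-in
    for multi-granularity word segmentation)."""
    spans: Set[str] = set()
    for size in range(1, max_len + 1):
        for start in range(len(tokens) - size + 1):
            spans.add(" ".join(tokens[start:start + size]))
    return spans


def reference_senses(
    text: str,
    entity: str,
    knowledge_graph: Dict[str, Dict[str, Set[str]]],
) -> List[str]:
    """Return the senses in the entity's subgraph that match the context words."""
    # Preprocessing: lowercase, split on whitespace, drop stop words.
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    # Context words: all segments except the entity itself.
    context = multi_granularity_segments(tokens) - {entity}
    # Subgraph of the entity: sense -> related terms (assumed layout).
    subgraph = knowledge_graph[entity]
    # Assumed matching rule: a sense is a reference sense if it shares a term.
    return [sense for sense, terms in subgraph.items() if terms & context]


kg = {"apple": {"fruit": {"juice", "orchard"},
                "company": {"iphone", "stock price"}}}
refs = reference_senses(
    "The stock price of Apple rose after the iPhone launch", "apple", kg)
n = len(refs)  # the n referred to in the claim
```

Generating spans of several lengths lets a multi-word term such as "stock price" match even though the tokenizer only splits on whitespace.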
4. The entity disambiguation method according to claim 2, characterized in that the semantic score is calculated by the following method:
Obtaining a reference field;
Segmenting the natural language to be analyzed and the reference field respectively, to obtain the context words of the entity to be disambiguated and the reference word segments of the reference field;
Calculating the semantic similarity between each sense of the context words and each reference word segment;
Calculating the semantic score of each sense according to the following formula:
Wherein max is the maximum function, m is the number of senses among the context words, and j is an index variable.
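The pairwise similarity step of claim 4 can be sketched as follows. The claim does not name a similarity measure, and the aggregation formula involving max and m is not reproduced in this text, so this sketch only builds the table of pairwise similarities. Cosine similarity over toy vectors is an assumption, as are the function names and the sample vectors.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_similarity(
    sense_vecs: Dict[str, Sequence[float]],
    ref_vecs: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Similarity of every (sense, reference word segment) pair."""
    return {
        (sense, ref): cosine(sv, rv)
        for sense, sv in sense_vecs.items()
        for ref, rv in ref_vecs.items()
    }


# Hypothetical two-dimensional embeddings, chosen only to make the example readable.
senses = {"fruit": [1.0, 0.0], "company": [0.0, 1.0]}
reference_segments = {"phone": [0.0, 2.0], "juice": [3.0, 0.0]}
table = pairwise_similarity(senses, reference_segments)
```

Any measure that returns a comparable number for a pair of items could replace `cosine` here without changing the structure of the table.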
5. An entity disambiguation system, characterized by comprising:
An acquisition unit, for obtaining an entity to be disambiguated in the natural language to be analyzed, and multiple senses of the entity to be disambiguated;
An analysis unit, for calculating, for each sense of the entity to be disambiguated, the total score of that sense appearing in the natural language to be analyzed;
An output unit, for defining the sense with the highest total score as the meaning of the entity to be disambiguated in the natural language to be analyzed.
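The three units of claim 5 can be sketched as small cooperating classes. This is one possible arrangement under stated assumptions, not the structure of the patented system: the class and method names, the sense inventory dictionary, the substring test used to spot entities, and the injected `scorer` callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AcquisitionUnit:
    # Hypothetical inventory: entity -> list of its senses.
    sense_inventory: Dict[str, List[str]]

    def acquire(self, text: str) -> Dict[str, List[str]]:
        """Find known entities in the text (naive substring match) with their senses."""
        lowered = text.lower()
        return {e: s for e, s in self.sense_inventory.items() if e in lowered}


@dataclass
class AnalysisUnit:
    # Injected scoring function: (text, entity, sense) -> total score.
    scorer: Callable[[str, str, str], float]

    def analyze(self, text: str, entity: str, senses: List[str]) -> Dict[str, float]:
        """Compute the total score of every sense of one entity."""
        return {sense: self.scorer(text, entity, sense) for sense in senses}


class OutputUnit:
    def output(self, scores: Dict[str, float]) -> str:
        """Return the sense with the highest total score."""
        return max(scores, key=scores.get)


def disambiguate(text: str, acq: AcquisitionUnit, ana: AnalysisUnit,
                 out: OutputUnit) -> Dict[str, str]:
    """Run acquisition, analysis and output for every entity found in the text."""
    return {
        entity: out.output(ana.analyze(text, entity, senses))
        for entity, senses in acq.acquire(text).items()
    }


# Toy scorer standing in for the weighted statistical and semantic scores.
toy_scorer = lambda text, entity, sense: 1.0 if sense == "company" else 0.0
result = disambiguate(
    "Apple released a phone",
    AcquisitionUnit({"apple": ["fruit", "company"]}),
    AnalysisUnit(toy_scorer),
    OutputUnit(),
)
```

Passing the scorer in as a callable keeps the analysis unit independent of how the statistical and semantic scores are actually computed.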
6. The entity disambiguation system according to claim 5, characterized in that the analysis unit is specifically used for:
Calculating a statistical score;
Calculating a semantic score;
Calculating the total score of each sense appearing in the natural language to be analyzed according to the following formula:
Total score = W1 × statistical score + W2 × semantic score;
Wherein W1 and W2 are weights, and W1 + W2 = 1.
7. The entity disambiguation system according to claim 6, characterized in that the analysis unit is specifically used for:
Preprocessing the natural language to be analyzed to remove the stop words in the natural language to be analyzed;
Performing multi-granularity word segmentation on the preprocessed natural language to be analyzed to obtain the context words of the entity to be disambiguated;
Selecting the subgraph of the entity to be disambiguated from a preset knowledge graph, wherein the knowledge graph includes a subgraph for each entity, and the subgraph of each entity includes all senses of that entity;
When a sense in the subgraph corresponds to a context word in the natural language to be analyzed, determining that the sense in the subgraph appears in the natural language to be analyzed, and defining that sense as a reference sense;
Calculating the statistical score of each sense according to the following formula:
Wherein n is the number of reference senses in the subgraph of the entity to be disambiguated, and i is an index variable.
8. The entity disambiguation system according to claim 6, characterized in that the analysis unit is specifically used for:
Obtaining a reference field;
Segmenting the natural language to be analyzed and the reference field respectively, to obtain the context words of the entity to be disambiguated and the reference word segments of the reference field;
Calculating the semantic similarity between each sense of the context words and each reference word segment;
Calculating the semantic score of each sense according to the following formula:
Wherein max is the maximum function, m is the number of senses among the context words, and j is an index variable.
CN201910207612.0A 2019-03-19 2019-03-19 Entity disambiguation method and system Active CN110069775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910207612.0A CN110069775B (en) 2019-03-19 2019-03-19 Entity disambiguation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910207612.0A CN110069775B (en) 2019-03-19 2019-03-19 Entity disambiguation method and system

Publications (2)

Publication Number Publication Date
CN110069775A true CN110069775A (en) 2019-07-30
CN110069775B CN110069775B (en) 2023-04-18

Family

ID=67366378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910207612.0A Active CN110069775B (en) 2019-03-19 2019-03-19 Entity disambiguation method and system

Country Status (1)

Country Link
CN (1) CN110069775B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9558740B1 (en) * 2015-03-30 2017-01-31 Amazon Technologies, Inc. Disambiguation in speech recognition
CN108446269A (en) * 2018-03-05 2018-08-24 昆明理工大学 A kind of Word sense disambiguation method and device based on term vector

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOMOAKI URATA ET AL.: "An Entity Disambiguation Approach Based on Wikipedia for Entity Linking in Microblogs", 《2017 6TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS》 *
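SHI Zhaopeng (史兆鹏) et al.: "Multi-feature word sense disambiguation based on dependency syntactic analysis" (基于依存句法分析的多特征词义消歧), 《Computer Engineering》 (计算机工程) *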

Also Published As

Publication number Publication date
CN110069775B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
US10108907B2 (en) Method and system to provide related data
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN103207913B (en) The acquisition methods of commercial fine granularity semantic relation and system
CN107229610A (en) The analysis method and device of a kind of affection data
KR20180062321A (en) Method for drawing word related keyword based on deep learning and computerprogram
CN111581949B (en) Method and device for disambiguating name of learner, storage medium and terminal
CN103678278A (en) Chinese text emotion recognition method
CN108287848B (en) Method and system for semantic parsing
CN110334209A (en) File classification method, device, medium and electronic equipment
CN110472240A (en) Text feature and device based on TF-IDF
CN109934251A (en) A kind of method, identifying system and storage medium for rare foreign languages text identification
CN110472040A (en) Extracting method and device, storage medium, the computer equipment of evaluation information
CN110364186A (en) A kind of emotion identification method across language voice end to end based on confrontation study
CN113657421A (en) Convolutional neural network compression method and device and image classification method and device
CN114265937A (en) Intelligent classification analysis method and system of scientific and technological information, storage medium and server
CN112417868A (en) Block chain news visualization method based on emotion scores and topic models
WO2023159756A1 (en) Price data processing method and apparatus, electronic device, and storage medium
CN112287656A (en) Text comparison method, device, equipment and storage medium
CN109543002A (en) Write a Chinese character in simplified form restoring method, device, equipment and the storage medium of character
CN117235582A (en) Multi-granularity information processing method and device based on electronic medical record
CN116644148A (en) Keyword recognition method and device, electronic equipment and storage medium
CN110069775A (en) Entity disambiguation method and system
CN113761875B (en) Event extraction method and device, electronic equipment and storage medium
CN115544204A (en) Bad corpus filtering method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230331

Address after: Room 106, Building 1, Tsinghua Science and Technology Park, No. 1666 Zuchongzhi South Road, Yushan Town, Kunshan City, Suzhou City, Jiangsu Province, 215000

Applicant after: Jiangsu Ruihuan Laser Technology Co.,Ltd.

Address before: 518017-09, Dongfang Science and technology building, Keyuan Road, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: SHENZHEN GOWILD ROBOTICS Co.,Ltd.

Applicant before: SHENZHEN GOWILD INTELLIGENT TECHNOLOGY Co.,Ltd.

GR01 Patent grant