CN108491373A - Entity recognition method and system - Google Patents

Entity recognition method and system

Info

Publication number
CN108491373A
Authority
CN
China
Prior art keywords
entity
dictionary
character string
knowledge base
speech rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810101815.7A
Other languages
Chinese (zh)
Other versions
CN108491373B (en
Inventor
任可欣
冯知凡
陆超
张扬
李莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810101815.7A
Publication of CN108491373A
Application granted
Publication of CN108491373B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Abstract

The application provides an entity recognition method. The method includes: segmenting the input text with a natural language processing method and labeling its entities; matching the input text against a knowledge base entity dictionary using forward maximum matching; judging whether each character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule, and taking the character strings that satisfy the rule as entity correction results; and correcting the entity labeling results of the natural language processing segmentation with the entity correction results. The method corrects segmentation boundary errors, reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of entities not yet included in the dictionary.

Description

Entity recognition method and system
【Technical field】
This application relates to the technical field of natural language processing, and in particular to an entity recognition method and system.
【Background】
An entity is an object that exists in the real world and can be distinguished from other objects. An entity mention is a character substring in free text that can denote an entity. Entity recognition means identifying proper names in text, such as person names and place names. For example, given a short text such as a query or a title, the entities in that short text are output; for the input "Zhou Jielun Kun Ling wedding", the output is "Zhou Jielun Kun Ling wedding" with its entities marked, which serves the goal of understanding the text.
Entity recognition is an important basic tool for application fields such as information extraction, question answering, syntactic analysis, entity linking and machine translation, and it plays an important role in making natural language processing technology practical.
Traditional entity recognition methods fall mainly into two classes:
(1) Methods based on rules and dictionaries. These rely on grammar rules hand-written by linguists and recognize entities from lexical, syntactic and similar information.
(2) Methods based on machine learning. These train a sequence labeling model, such as a conditional random field or a hidden Markov model, on manually annotated training corpora and use it to predict unlabeled data.
However, both approaches require a large amount of manual labor and perform poorly on entities that are not already included.
First, the rule- and dictionary-based approach needs domain experts to configure rules. On small data sets its precision is generally high but its recall is low. It cannot recognize entities outside the dictionary, and even for entities in the dictionary it cannot resolve entity ambiguity. It is hard to extend to multiple domains, and having domain experts configure rules is costly.
Second, the machine learning approach, the current mainstream solution, needs high-quality manually annotated training data to train well, so its labor cost is even higher. Because it learns from annotated training data, it recognizes unincluded entities poorly, and it also performs poorly on entities without obvious features, such as song titles and film and television titles.
In addition, short texts such as queries and titles are often phrased irregularly, and newly popular entities keep appearing, so a basic word segmentation tool may cut emerging entities apart, which degrades recognition.
【Summary of the invention】
Various aspects of the application provide an entity recognition method and system that reduce the labor cost of entity recognition, improve overall efficiency, and improve recognition of entities not yet included.
One aspect of the application provides an entity recognition method, including:
segmenting the input text and labeling entities;
matching the input text against a knowledge base entity dictionary using forward maximum matching;
judging whether a character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule;
correcting the entity labeling results of the input text using the character strings that satisfy the preset part-of-speech rule.
In the above aspect and any possible implementation, an implementation is further provided in which the knowledge base entity dictionary includes:
the name field of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
In the above aspect and any possible implementation, an implementation is further provided in which matching the input text against the knowledge base entity dictionary using forward maximum matching further includes:
matching character strings that miss the knowledge base entity dictionary against a new-entity dictionary;
if the new-entity dictionary is hit, skipping the character string and continuing forward maximum matching;
if the new-entity dictionary is missed, judging whether the character string satisfies the preset part-of-speech rule, and taking a character string that satisfies the rule as a segmentation result.
In the above aspect and any possible implementation, an implementation is further provided in which taking the character strings that satisfy the preset part-of-speech rule as entity correction results includes:
taking a character string that satisfies the preset part-of-speech rule as a candidate entity, and judging whether the input text has been fully traversed;
if it has, taking the candidate entities as the segmentation result;
if it has not, continuing forward maximum matching.
In the above aspect and any possible implementation, an implementation is further provided in which the preset part-of-speech rule is: an entity character string is a noun or a noun modified by an adjective.
In the above aspect and any possible implementation, an implementation is further provided in which the new-entity dictionary is obtained through the following steps:
obtaining search terms;
for each search term, setting windows at character granularity and computing the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, left information entropy threshold and right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the new-entity dictionary.
In the above aspect and any possible implementation, an implementation is further provided in which correcting the entity labeling results of the natural language processing segmentation with the entity correction results includes:
replacing multiple entities that were cut apart in the entity labeling results with the corresponding single entity in the entity correction results.
Another aspect of the application provides an entity recognition system, including:
an entity labeling module, configured to segment the input text and label entities;
a knowledge base entity dictionary matching module, configured to match the input text against the knowledge base entity dictionary using forward maximum matching;
a part-of-speech rule judgment module, configured to judge whether a character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule;
a correction module, configured to correct the entity labeling results of the input text using the character strings that satisfy the preset part-of-speech rule.
In the above aspect and any possible implementation, an implementation is further provided in which the knowledge base entity dictionary includes:
the name field of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
In the above aspect and any possible implementation, an implementation is further provided in which the system further includes a new-entity dictionary matching module, configured to:
match character strings that miss the knowledge base entity dictionary against the new-entity dictionary;
if the new-entity dictionary is hit, skip the character string, after which the knowledge base entity dictionary matching module continues forward maximum matching;
if the new-entity dictionary is missed, have the part-of-speech rule judgment module judge whether the character string satisfies the preset part-of-speech rule, and take a character string that satisfies the rule as a segmentation result.
In the above aspect and any possible implementation, an implementation is further provided in which the part-of-speech rule judgment module is further configured to:
take a character string that satisfies the preset part-of-speech rule as a candidate entity, and judge whether the input text has been fully traversed;
if it has, take the candidate entities as the segmentation result;
if it has not, have the knowledge base entity dictionary matching module continue forward maximum matching.
In the above aspect and any possible implementation, an implementation is further provided in which the preset part-of-speech rule is: an entity character string is a noun or a noun modified by an adjective.
In the above aspect and any possible implementation, an implementation is further provided in which the new-entity dictionary is obtained through the following steps:
obtaining user search terms;
for each search term, setting windows at character granularity and computing the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, left information entropy threshold and right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the new-entity dictionary.
In the above aspect and any possible implementation, an implementation is further provided in which the correction module is specifically configured to:
replace multiple entities that were cut apart in the entity labeling results with the corresponding single entity in the entity correction results.
Another aspect of the application provides a device, characterized in that the device includes:
one or more processors; and
a storage apparatus for storing one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods described above.
Another aspect of the application provides a computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements any of the methods described above.
As the above technical solutions show, the technical solution provided by the embodiments corrects segmentation boundary errors, reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of entities not yet included.
【Description of the drawings】
To explain the technical solutions of the embodiments of the application more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the entity recognition method provided by one embodiment of the application;
Fig. 2 is a structural diagram of the entity recognition system provided by another embodiment of the application;
Fig. 3 is a block diagram of an exemplary computer system/server suitable for implementing embodiments of the present invention.
【Detailed description】
To make the purposes, technical solutions and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the application.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a flowchart of the entity recognition method provided by one embodiment of the application. As shown in Fig. 1, the method includes the following steps:
Step S11: segment the input text using a natural language processing method and label entities;
Step S12: match the input text against the knowledge base entity dictionary using forward maximum matching;
Step S13: judge whether a character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule;
Step S14: correct the entity labeling results of the input text using the character strings that satisfy the preset part-of-speech rule.
In a preferred implementation of step S11,
the input text is looked up in the knowledge base entity dictionary.
If it hits, the input text itself is taken as the entity recognition result and is part-of-speech tagged. For example, if the text entered by the user consists of a single entity that is already included in the knowledge base entity dictionary, it can be recognized and output directly, without the subsequent steps.
If it misses, the input text is segmented, and the segmentation result is part-of-speech tagged according to the knowledge base entity dictionary.
Preferably, segmentation uses NLP (natural language processing) techniques: the input short text is segmented with dictionary-based (dictionary matching), statistics-based (word frequency statistics) or rule-based (knowledge understanding) segmentation algorithms.
Preferably, the Baidu NLPC platform is used to segment the input text and label entities.
However, segmentation with NLP techniques is prone to segmentation boundary errors. For example, for the input text "Best of Inuyasha strongest goblin download", "the strongest goblin of Best of Inuyasha" is an entity with complete semantics (already included in the knowledge base entity dictionary), but NLP segmentation cuts it into separate tagged tokens (tagged n, u, a, u and Ng in turn, beginning with "Best of Inuyasha" and ending with "goblin"), which is a segmentation boundary error. In addition, short texts (such as queries and titles) may contain unregistered entities (new entities not included in the knowledge base entity dictionary), such as new words that appear on the Internet, and these unregistered entities are often split apart, which also causes NLP segmentation boundary errors. These segmentation boundary errors therefore need to be corrected.
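The lookup-then-segment flow of step S11 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `segment` is a hypothetical stand-in for whatever segmenter is used (the interface of the NLPC platform is not described here), and the `"ENTITY"` tag and the choice to mark dictionary tokens with it are assumptions.

```python
from typing import Callable, List, Set, Tuple

Token = Tuple[str, str]  # (surface string, tag)


def label_entities(text: str,
                   kb_dict: Set[str],
                   segment: Callable[[str], List[Token]]) -> List[Token]:
    """Step S11 sketch: whole-text lookup first, segmentation as fallback."""
    if text in kb_dict:
        # The whole input is one known entity: output it directly.
        return [(text, "ENTITY")]
    # Otherwise segment; tokens found in the dictionary are marked as entities
    # (assumed convention), all other tokens keep the segmenter's tag.
    return [(word, "ENTITY" if word in kb_dict else tag)
            for word, tag in segment(text)]


# Toy segmenter (hypothetical): split on spaces and tag everything as a noun.
def toy_segment(s: str) -> List[Token]:
    return [(w, "n") for w in s.split()]
```

With the toy segmenter, an input that is itself a dictionary entry short-circuits, while any other input falls through to segmentation.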
In a preferred implementation of step S12,
preferably, the number of Chinese characters i in the longest entity in the knowledge base entity dictionary is used as the match window length, and the first i characters of the input text are taken as the candidate character string and looked up in the knowledge base entity dictionary.
If the candidate character string hits the knowledge base entity dictionary, it is then judged whether the candidate satisfies the part-of-speech rule.
If the candidate character string misses the knowledge base entity dictionary, matching continues against the new-entity dictionary. If the new-entity dictionary is hit, the candidate is skipped: the match window is reduced by removing the last character of the candidate, and forward maximum matching continues on the remaining string. (The mined new entities currently have a precision of 60%, so they cannot serve directly as an entity dictionary; mentions that hit the new-entity dictionary are therefore not recalled, trading recall for higher overall precision.)
Preferably, if the new-entity dictionary is also missed, it is judged whether the candidate satisfies the part-of-speech rule, in order to improve recall. (For example, some rare words are included in neither the knowledge base entity dictionary nor the new-entity dictionary, but as long as they satisfy the noun part-of-speech rule they are taken as candidate entities.)
In a preferred implementation of step S13,
preferably, judging whether the candidate character string satisfies the part-of-speech rule further includes: if it does, taking the candidate character string as a candidate entity; if it does not, skipping the candidate, reducing the match window by removing the last character of the candidate, and continuing forward maximum matching on the remaining string.
The part-of-speech rule is as follows: according to the patterns of the Chinese language, a meaningful entity character string is a noun or a noun modified by an adjective. For example, suppose the input short text is "method of study Korean", where "study Korean" is an entity in the knowledge base and is included in the knowledge base entity dictionary. Because "study Korean" hits the dictionary, forward maximum matching would wrongly splice together tokens that the original NLP segmentation had split correctly, so the forward maximum matching result has to be checked against the part-of-speech rule. Here "study" is a verb, so the string does not fit the pattern of a noun modified by an adjective; "study Korean" is therefore not taken as a candidate entity. The candidate is skipped, the match window is reduced by removing the last character of the candidate, and forward maximum matching continues on the remaining string.
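One way to express the part-of-speech rule as code is sketched below. The tag sets are assumptions modeled on common Chinese tag conventions (the tags n, u, a and Ng appear in the example above; the others are illustrative), and ignoring particles is also an assumption, made so that a noun phrase joined by particles can still pass.

```python
from typing import Sequence

# Assumed tag inventories; adjust to the tagger actually in use.
NOUN_TAGS = {"n", "Ng", "nr", "ns", "nt", "nz"}
ADJ_TAGS = {"a", "Ag", "an"}
PARTICLE_TAGS = {"u"}  # structural particles are ignored (assumption)


def satisfies_pos_rule(tags: Sequence[str]) -> bool:
    """True if the tags describe a noun, or a noun modified by adjectives."""
    content = [t for t in tags if t not in PARTICLE_TAGS]
    # The string must end in a noun ...
    if not content or content[-1] not in NOUN_TAGS:
        return False
    # ... and may contain nothing but nouns and adjectives (no verbs etc.).
    return all(t in NOUN_TAGS or t in ADJ_TAGS for t in content)
```

Under these assumptions a verb-initial string such as "study Korean" (tags `v`, `n`) is rejected, while the tag sequence n, u, a, u, Ng is accepted.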
After a candidate character string that satisfies the part-of-speech rule has been taken as a candidate entity, the method further includes the following step:
judge whether the input text has been fully traversed. If so, output all candidate entities; if not, remove the candidate entity from the short text and continue forward maximum matching on the remaining string.
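Steps S12 and S13 together amount to a forward maximum matching loop, which might be sketched as follows. The function and parameter names are hypothetical; `pos_ok` stands in for the part-of-speech check on a candidate string, and advancing by one character when no window at a position is accepted is an assumption, since that case is not spelled out above.

```python
from typing import Callable, List, Set


def forward_max_match(text: str,
                      kb_dict: Set[str],
                      new_dict: Set[str],
                      pos_ok: Callable[[str], bool]) -> List[str]:
    """Collect candidate entities from `text` by forward maximum matching."""
    # Match window length = length of the longest knowledge base entity.
    max_len = max((len(e) for e in kb_dict), default=1)
    candidates: List[str] = []
    start = 0
    while start < len(text):
        matched = False
        # Try the longest window first, dropping the last character each round.
        for end in range(min(len(text), start + max_len), start, -1):
            cand = text[start:end]
            if cand in kb_dict:
                accept = pos_ok(cand)   # KB hit: verify with the POS rule
            elif cand in new_dict:
                accept = False          # new-entity hit: deliberately not recalled
            else:
                accept = pos_ok(cand)   # missed both: POS rule may still accept
            if accept:
                candidates.append(cand)
                start = end             # remove the entity, continue on the rest
                matched = True
                break
        if not matched:
            start += 1                  # assumption: move on by one character
    return candidates
```

For instance, with `kb_dict = {"study Korean", "Korean"}` and a `pos_ok` that accepts only "Korean" and "method", the input "study Korean method" yields `["Korean", "method"]`: the dictionary hit "study Korean" is rejected by the part-of-speech check, and the window shrinks until a valid candidate is found.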
In a preferred implementation of step S14,
preferably, multiple entities that were cut apart in the entity labeling results are replaced with the corresponding single entity in the entity correction results, which corrects the NLP segmentation boundary errors.
For example, the entities "Best of Inuyasha" and "goblin" produced by NLP segmentation are replaced with "the strongest goblin of Best of Inuyasha".
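The replacement in step S14 can be illustrated with a small merging routine. This is a sketch under stated assumptions: tokens are plain strings in their original order, an entity is recognized when a run of adjacent tokens joined by `sep` spells it exactly, and the function name and `sep` parameter are hypothetical (for Chinese text `sep` would be the empty string).

```python
from typing import List, Set


def merge_split_entities(tokens: List[str],
                         entities: Set[str],
                         sep: str = "") -> List[str]:
    """Replace runs of tokens that spell a corrected entity with that entity."""
    result: List[str] = []
    i = 0
    while i < len(tokens):
        merged = None
        # Prefer the longest run (at least two tokens) starting at position i.
        for j in range(len(tokens), i + 1, -1):
            joined = sep.join(tokens[i:j])
            if joined in entities:
                merged, i = joined, j
                break
        if merged is None:
            merged, i = tokens[i], i + 1  # no entity starts here: keep the token
        result.append(merged)
    return result
```

For example, `merge_split_entities(["Great", "Sage", "Equalling", "Heaven", "download"], {"Great Sage Equalling Heaven"}, sep=" ")` collapses the first four tokens into one entity and leaves "download" untouched.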
Preferably, the knowledge base entity dictionary is obtained through the following steps:
obtain the name field of the encyclopedia entities in the knowledge base;
receive the manually curated aliases pushed by the encyclopedia;
mine aliases from the encyclopedia info-box; for example, an alias of "Zhou Jielun" is "Zhou Dong".
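Assembling the dictionary from these three sources is essentially a set union. The sketch below assumes each source is already available as an iterable of strings (how they are fetched is not described here) and also returns the longest entry length, which is the match window length used by forward maximum matching. All names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def build_kb_entity_dict(names: Iterable[str],
                         curated_aliases: Iterable[str],
                         mined_aliases: Iterable[str]) -> Tuple[Set[str], int]:
    """Merge the three sources; return (dictionary, match window length)."""
    entries = {s.strip()
               for source in (names, curated_aliases, mined_aliases)
               for s in source
               if s and s.strip()}          # drop empty or blank entries
    window = max((len(e) for e in entries), default=0)
    return entries, window
```

Duplicates across sources collapse automatically because the result is a set.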
Preferably, the new-entity dictionary is obtained through the following steps:
1) obtain the query log of the search engine;
2) for each query, set windows at character granularity and compute the mutual information and the left and right information entropy of the character string in each window;
The mutual information is
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
where p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y, respectively.
Mutual information reflects the degree of mutual dependence between two variables X and Y. The higher the mutual information, the more strongly X and Y are correlated, and the more likely it is that X and Y together form a meaningful entity.
Left and right entropy refer to the entropy of the left boundary and of the right boundary of a multi-character expression. Taking left entropy as an example, the information entropy is computed over all words that can appear to the left of a character string, weighted by their frequencies, and then summed.
For a character string w, the left and right entropy are computed from the words adjacent to w, each denoted a. The higher the left and right entropy, the more likely it is that w forms a meaningful entity.
Left and right entropy reflect how freely a term combines with its context. For example, for "the Great Sage Equalling Heaven", computing the right entropy gives E(w' *) << E(the Great Sage Equalling Heaven *), where w' is the same string with its last character removed; this shows that "the Great Sage Equalling Heaven" is the more likely of the two to form a meaningful entity as a character string.
3) according to a preset mutual information threshold, left information entropy threshold and right information entropy threshold, take the character strings that satisfy all three thresholds as entities;
4) remove from the resulting entities those already included in the knowledge base entity dictionary to obtain the new-entity dictionary.
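One common way to write the left and right entropy used in step 2) is given below. It is a standard form consistent with the description, offered as an assumption rather than as the exact expression of the application: A and B denote the sets of words observed immediately to the left and right of w, and P(aw | w) is the relative frequency with which a precedes w.

```latex
E_L(w) = -\sum_{a \in A} P(aw \mid w)\,\log P(aw \mid w),
\qquad
E_R(w) = -\sum_{b \in B} P(wb \mid w)\,\log P(wb \mid w)
```

A string that appears with many different neighbors on both sides has high E_L and E_R, which is the sense in which it behaves as a free-standing unit.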
Processing massive user query logs in this way makes it possible to mine emerging entities in a timely manner.
Currently, the new entities mined by the above method have a precision of 60%, so they cannot be used directly as an entity dictionary and serve only as the new-entity dictionary. Entities that hit the new-entity dictionary are therefore not recalled, trading recall for higher overall precision.
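Steps 1) to 4) can be sketched end to end as follows. Several choices here are assumptions made for illustration: the score used is the pointwise mutual information of the weakest split of each string (a common practical stand-in for the mutual information above), probabilities are estimated as counts divided by the total number of characters, entropies use the natural logarithm, and the threshold defaults and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def mine_new_entities(queries: Iterable[str],
                      kb_dict: Set[str],
                      max_len: int = 4,
                      mi_min: float = 1.0,
                      left_min: float = 0.5,
                      right_min: float = 0.5) -> Set[str]:
    """Mine strings that pass the MI and left/right entropy thresholds."""
    counts: Counter = Counter()
    left: Dict[str, Counter] = defaultdict(Counter)
    right: Dict[str, Counter] = defaultdict(Counter)
    for q in queries:
        for n in range(1, max_len + 1):          # window sizes, in characters
            for i in range(len(q) - n + 1):
                s = q[i:i + n]
                counts[s] += 1
                if i > 0:
                    left[s][q[i - 1]] += 1       # left neighbor of this window
                if i + n < len(q):
                    right[s][q[i + n]] += 1      # right neighbor of this window
    total = sum(c for s, c in counts.items() if len(s) == 1)

    def entropy(neighbors: Counter) -> float:
        n = sum(neighbors.values())
        return -sum(c / n * math.log(c / n) for c in neighbors.values()) if n else 0.0

    def pmi(s: str) -> float:
        p = counts[s] / total
        # The weakest split point decides how tightly the string holds together.
        return min(math.log(p / ((counts[s[:k]] / total) * (counts[s[k:]] / total)))
                   for k in range(1, len(s)))

    found = {s for s in counts
             if len(s) > 1
             and pmi(s) >= mi_min
             and entropy(left[s]) >= left_min
             and entropy(right[s]) >= right_min}
    return found - kb_dict                       # step 4): drop known entities
```

On a toy log such as `["aXYb", "cXYd", "eXYf", "gXYh"]`, only "XY" has varied neighbors on both sides and a high enough score, so it is the only string mined; if "XY" is already in the knowledge base dictionary, nothing is returned.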
In the method of this embodiment, forward maximum matching against the knowledge base entity dictionary corrects segmentation boundary errors by splicing entities that were cut apart back together, and a new-entity dictionary mined from the query log addresses segmentation boundary errors on new entities. This reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of entities not yet included.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the application.
The descriptions of the embodiments each have their own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
Fig. 2 is a structural diagram of the entity recognition system provided by one embodiment of the application. As shown in Fig. 2, the system includes:
an entity labeling module 21, configured to segment the input text and label entities;
a knowledge base entity dictionary matching module 22, configured to match the input text against the knowledge base entity dictionary using forward maximum matching;
a part-of-speech rule judgment module 23, configured to judge whether a character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule;
a correction module 24, configured to correct the entity labeling results of the input text using the character strings that satisfy the preset part-of-speech rule.
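The way the four modules hand results to one another might be wired up as in the sketch below. The class, its field names and the callable signatures are all hypothetical; each field simply stands in for one numbered module, and real modules would carry more state than a single function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EntityRecognitionSystem:
    label: Callable[[str], List[str]]                     # module 21
    match: Callable[[str], List[str]]                     # module 22
    pos_ok: Callable[[str], bool]                         # module 23
    correct: Callable[[List[str], List[str]], List[str]]  # module 24

    def recognize(self, text: str) -> List[str]:
        labeled = self.label(text)                        # segment and label
        hits = self.match(text)                           # dictionary hits
        entities = [h for h in hits if self.pos_ok(h)]    # keep rule-compliant hits
        return self.correct(labeled, entities)            # fix the labeling
```

Keeping the modules as injected callables makes it easy to swap the segmenter or the dictionary matcher without touching the overall flow.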
In a preferred implementation of the entity labeling module 21,
the input text is looked up in the knowledge base entity dictionary.
If it hits, the input text itself is taken as the entity recognition result and is part-of-speech tagged. For example, if the text entered by the user consists of a single entity that is already included in the knowledge base entity dictionary, it can be recognized and output directly, without the subsequent steps.
If it misses, the input text is segmented, and the segmentation result is part-of-speech tagged according to the knowledge base entity dictionary.
Preferably, segmentation uses NLP (natural language processing) techniques: the input short text is segmented with dictionary-based (dictionary matching), statistics-based (word frequency statistics) or rule-based (knowledge understanding) segmentation algorithms.
Preferably, the Baidu NLPC platform is used to segment the input text and label entities.
However, segmentation with NLP techniques is prone to segmentation boundary errors. For example, for the input text "Best of Inuyasha strongest goblin download", "the strongest goblin of Best of Inuyasha" is an entity with complete semantics (already included in the knowledge base entity dictionary), but NLP segmentation cuts it into separate tagged tokens (tagged n, u, a, u and Ng in turn, beginning with "Best of Inuyasha" and ending with "goblin"), which is a segmentation boundary error. In addition, short texts (such as queries and titles) may contain unregistered entities (new entities not included in the knowledge base entity dictionary), such as new words that appear on the Internet, and these unregistered entities are often split apart, which also causes NLP segmentation boundary errors. These segmentation boundary errors therefore need to be corrected.
In a preferred implementation of the knowledge base entity dictionary matching module 22,
preferably, the number of Chinese characters i in the longest entity in the knowledge base entity dictionary is used as the match window length, and the first i characters of the input text are taken as the candidate character string and looked up in the knowledge base entity dictionary.
If the candidate character string hits the knowledge base entity dictionary, it is then judged whether the candidate satisfies the part-of-speech rule.
If the candidate character string misses the knowledge base entity dictionary, matching continues against the new-entity dictionary. If the new-entity dictionary is hit, the candidate is skipped: the match window is reduced by removing the last character of the candidate, and forward maximum matching continues on the remaining string. (The mined new entities currently have a precision of 60%, so they cannot serve directly as an entity dictionary; mentions that hit the new-entity dictionary are therefore not recalled, trading recall for higher overall precision.)
Preferably, if the new-entity dictionary is also missed, it is judged whether the candidate satisfies the part-of-speech rule, in order to improve recall. (For example, some rare words are included in neither the knowledge base entity dictionary nor the new-entity dictionary, but as long as they satisfy the noun part-of-speech rule they are taken as candidate entities.)
In a preferred implementation of the part-of-speech rule judgment module 23,
preferably, judging whether the candidate character string satisfies the part-of-speech rule further includes: if it does, taking the candidate character string as a candidate entity; if it does not, skipping the candidate, reducing the match window by removing the last character of the candidate, and continuing forward maximum matching on the remaining string.
The part-of-speech rule is as follows: according to the patterns of the Chinese language, a meaningful entity character string is a noun or a noun modified by an adjective. For example, suppose the input short text is "method of study Korean", where "study Korean" is an entity in the knowledge base and is included in the knowledge base entity dictionary. Because "study Korean" hits the dictionary, forward maximum matching would wrongly splice together tokens that the original NLP segmentation had split correctly, so the forward maximum matching result has to be checked against the part-of-speech rule. Here "study" is a verb, so the string does not fit the pattern of a noun modified by an adjective; "study Korean" is therefore not taken as a candidate entity. The candidate is skipped, the match window is reduced by removing the last character of the candidate, and forward maximum matching continues on the remaining string.
The system further includes a traversal module 25. After a candidate character string that meets the part-of-speech rule is taken as a candidate entity, the traversal module judges whether the input text has been fully traversed. If so, all candidate entities are output; if not, the candidate entity is removed from the short text, and the knowledge base entity dictionary matching module 22 continues forward maximum matching on the remaining character string.
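The matching, part-of-speech check, and traversal described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the tag set ("n" for noun, "a" for adjective, "v" for verb), the `max_len` window size, and the choice to reject spans that cut through a token are all assumptions made for the example, and handling of the novel entity dictionary is omitted.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface form, part-of-speech tag); the tag set is hypothetical


def tags_for_span(tokens: List[Token], start: int, end: int) -> List[str]:
    """Return the tags of the tokens that exactly cover text[start:end].

    Returns [] when the span cuts through a token (an assumption of this sketch).
    """
    tags, pos = [], 0
    for surface, tag in tokens:
        tok_start, tok_end = pos, pos + len(surface)
        pos = tok_end
        if tok_end <= start or tok_start >= end:
            continue  # token lies entirely outside the span
        if tok_start < start or tok_end > end:
            return []  # span is not aligned with token boundaries
        tags.append(tag)
    return tags


def meets_pos_rule(tags: List[str]) -> bool:
    """A noun, or a noun preceded only by adjectives/nouns (assumed reading of the rule)."""
    return bool(tags) and tags[-1] == "n" and all(t in ("a", "n") for t in tags[:-1])


def forward_max_match(tokens: List[Token], kb_dict: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching against kb_dict, verified by the part-of-speech rule."""
    text = "".join(surface for surface, _ in tokens)
    candidates, i = [], 0
    while i < len(text):  # traverse the whole input text
        matched = False
        # Start with the largest window and drop the last character on each failure.
        for j in range(min(len(text), i + max_len), i, -1):
            cand = text[i:j]
            if cand in kb_dict and meets_pos_rule(tags_for_span(tokens, i, j)):
                candidates.append(cand)
                i = j  # continue matching on the remaining string
                matched = True
                break
        if not matched:
            i += 1  # nothing acceptable starts here; move on one character
    return candidates
```

With a toy input mirroring the "study Korean" example, `forward_max_match([("study", "v"), ("korean", "n"), ("method", "n")], {"studykorean", "korean"}, 12)` rejects the dictionary hit "studykorean" because it begins with a verb and keeps only "korean".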
In a preferred implementation of the correcting module 24,
Preferably, multiple entities that were split apart in the entity annotation result are replaced with the corresponding single entity in the entity correction result, so as to correct NLP segmentation boundary errors.
For example, the entities "Best of Inuyasha" and "goblin" produced by NLP segmentation are replaced with "the strongest goblin of Best of Inuyasha".
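The replacement step can be illustrated with a short sketch. It assumes the corrected entities are given as character spans over the input text and that a span is applied only when both of its ends fall on token boundaries; the function name `merge_scattered` and this span representation are hypothetical, not taken from the patent.

```python
from typing import List, Tuple


def merge_scattered(tokens: List[str], entity_spans: List[Tuple[int, int]]) -> List[str]:
    """Replace each run of tokens covered by a corrected entity span with one token.

    Tokens are assumed to be non-empty; spans not aligned with token
    boundaries are ignored in this sketch.
    """
    # Character offset where each token starts, plus the final end offset.
    bounds = [0]
    for tok in tokens:
        bounds.append(bounds[-1] + len(tok))
    index_of = {offset: i for i, offset in enumerate(bounds)}

    # Keep only spans whose start and end both sit on token boundaries.
    span_at = {s: e for s, e in entity_spans if s in index_of and e in index_of and e > s}

    merged, i = [], 0
    while i < len(tokens):
        end = span_at.get(bounds[i])
        if end is not None:
            j = index_of[end]
            merged.append("".join(tokens[i:j]))  # splice the scattered pieces together
            i = j
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

For instance, `merge_scattered(["watch", "best", "goblin", "online"], [(5, 15)])` yields `["watch", "bestgoblin", "online"]`, while a misaligned span such as `(5, 12)` leaves the segmentation unchanged.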
Preferably, the knowledge base entity dictionary is obtained from the name fields of the encyclopedia entities in the knowledge base, the manually curated aliases pushed by the encyclopedia, and the aliases mined from encyclopedia info-boxes. For example, an alias of "Zhou Jielun" is "Zhou Dong".
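As a rough illustration of how the three sources could be combined into one lookup table, the sketch below maps every surface form (name or alias) to the entities that use it. The record layout and the field names `id`, `name`, `manual_aliases`, and `infobox_aliases` are hypothetical; the patent does not specify a data format.

```python
from typing import Dict, Iterable, Set


def build_kb_dictionary(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each surface form (name or alias) to the ids of entities that use it."""
    dictionary: Dict[str, Set[str]] = {}
    for rec in records:
        forms = [
            rec["name"],                        # name field of the encyclopedia entity
            *rec.get("manual_aliases", []),     # manually curated aliases
            *rec.get("infobox_aliases", []),    # aliases mined from the info-box
        ]
        for form in forms:
            form = form.strip()
            if form:  # skip empty strings
                dictionary.setdefault(form, set()).add(rec["id"])
    return dictionary
```

With a record such as `{"id": "e1", "name": "Zhou Jielun", "manual_aliases": ["Zhou Dong"]}`, both "Zhou Jielun" and "Zhou Dong" resolve to the entity `e1`.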
Preferably, the novel entity dictionary is obtained through the following steps:
1) obtain the query log of a search engine;
2) for each query, taking the character as the granularity, set a window and calculate the mutual information and the left and right information entropy of the character string in each window;
Mutual information:

$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x,y)$ is the joint probability distribution function of X and Y, and $p(x)$ and $p(y)$ are the marginal probability distribution functions of X and Y, respectively.
Mutual information reflects the degree of mutual dependence between two variables X and Y. The higher the mutual information value, the stronger the correlation between X and Y, and the more likely X and Y are to form a meaningful entity.
Left and right entropy refer to the entropy of the left boundary and the entropy of the right boundary of a multi-character expression. Taking left entropy as an example, the information entropy is calculated over all characters that can appear immediately to the left of a character string, weighted by their frequencies, and then summed.
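For a character string w, the left and right entropy are given by:

$$E_L(w)=-\sum_{a\in A}P(aw\mid w)\log P(aw\mid w),\qquad E_R(w)=-\sum_{a\in B}P(wa\mid w)\log P(wa\mid w)$$

where a denotes a character adjacent to the character string w, and A and B are the sets of characters that appear immediately to the left and to the right of w, respectively. The higher the left and right entropy, the more likely the character string w is to form a meaningful entity.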
Left and right entropy reflect the degree of freedom of a term. For example, for "the Great Sage Equalling Heaven", calculating the right entropy shows that an incomplete fragment of the name has a far lower right entropy than the full name, that is, E(fragment *) << E(the Great Sage Equalling Heaven *). It can be seen that "the Great Sage Equalling Heaven", as a character string, is more likely to form a meaningful entity.
3) sum the mutual information and the left and right information entropy of the character string in each window, and obtain the novel entity dictionary by threshold filtering.
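Steps 1) to 3) can be sketched as below. Several choices here are assumptions made for illustration only: the mutual information of a single candidate string is taken in its pointwise form, minimized over all two-way splits of the string; probabilities are estimated as n-gram counts divided by the total character count; and `max_len`, `threshold`, and the function names are hypothetical. Strings already present in the knowledge base entity dictionary are dropped, as the claims describe.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable


def branch_entropy(neighbor_counts: Counter) -> float:
    """Shannon entropy of the distribution of adjacent characters."""
    total = sum(neighbor_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in neighbor_counts.values())


def mine_new_entities(queries: Iterable[str], max_len: int, threshold: float,
                      kb_dict=frozenset()) -> Dict[str, float]:
    """Score every window of up to max_len characters and keep those above threshold."""
    ngram = Counter()
    left = defaultdict(Counter)   # characters seen immediately to the left of each string
    right = defaultdict(Counter)  # characters seen immediately to the right of each string
    for q in queries:
        for i in range(len(q)):
            for j in range(i + 1, min(len(q), i + max_len) + 1):
                w = q[i:j]
                ngram[w] += 1
                if i > 0:
                    left[w][q[i - 1]] += 1
                if j < len(q):
                    right[w][q[j]] += 1

    total_chars = sum(c for w, c in ngram.items() if len(w) == 1)

    def prob(w: str) -> float:
        return ngram[w] / total_chars

    scores = {}
    for w in ngram:
        if len(w) < 2 or w in kb_dict:
            continue  # single characters and known entities are not candidates
        # Pointwise mutual information, taking the weakest split of the string.
        pmi = min(math.log(prob(w) / (prob(w[:k]) * prob(w[k:]))) for k in range(1, len(w)))
        score = pmi + branch_entropy(left[w]) + branch_entropy(right[w])
        if score >= threshold:
            scores[w] = score
    return scores
```

On the toy log `["1ab2", "3ab4", "5ab6"]` with `max_len=2` and `threshold=3.0`, only "ab" survives: it is both internally cohesive and free on both sides, whereas strings such as "1a" have zero left and right entropy.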
By processing massive user query logs, emerging entities can be mined in a timely manner.
At present, because the accuracy of the novel entities mined by the above method is 60%, they cannot be used directly as an entity dictionary and serve only as a novel entity dictionary. A no-recall strategy is therefore applied to character strings that hit the novel entity dictionary, improving overall accuracy at the cost of some recall.
The system of this embodiment uses forward maximum matching based on the knowledge base entity dictionary to correct segmentation boundary errors and splice entities that were split apart back together, and mines a novel entity dictionary from the query log to resolve segmentation boundary errors. This reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of entities not yet included in the knowledge base.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 3, computer system/server 012 takes the form of a general-purpose computing device. The components of computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting different system components (including system memory 028 and processing unit 016).
Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer system/server 012 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.
System memory 028 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. Computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 034 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 3, commonly referred to as a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical medium), may be provided. In these cases, each drive may be connected to bus 018 through one or more data media interfaces. Memory 028 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 042 generally perform the functions and/or methods of the embodiments described in the present invention.
Computer system/server 012 may also communicate with one or more external devices 014 (such as a keyboard, a pointing device, a display 024, etc.); in the present invention, computer system/server 012 communicates with an external radar device. It may also communicate with one or more devices that enable a user to interact with computer system/server 012, and/or with any device (such as a network card, a modem, etc.) that enables computer system/server 012 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interface 022. Computer system/server 012 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 020. As shown in Fig. 3, network adapter 020 communicates with the other modules of computer system/server 012 through bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Processing unit 016 is stored in the program in system storage 028 by operation, described in the invention to execute Embodiment in function and/or method.
The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention.
With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad, and the distribution of computer programs is no longer limited to tangible media; programs may also be downloaded directly from a network, for example. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (16)

1. An entity recognition method, characterized by comprising:
segmenting an input text and performing entity annotation;
matching the input text using a forward maximum matching method according to a knowledge base entity dictionary;
judging whether a character string that hits the knowledge base entity dictionary meets a preset part-of-speech rule;
correcting the entity annotation result of the input text using the character strings that meet the preset part-of-speech rule.
2. The method according to claim 1, characterized in that the knowledge base entity dictionary includes:
name fields of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
3. The method according to claim 1, characterized in that matching the input text using the forward maximum matching method according to the knowledge base entity dictionary further comprises:
matching, according to a novel entity dictionary, the character strings that miss the knowledge base entity dictionary;
if a character string hits the novel entity dictionary, skipping the character string and continuing forward maximum matching;
if a character string misses the novel entity dictionary, judging whether the character string meets the preset part-of-speech rule, and taking a character string that meets the preset part-of-speech rule as a segmentation result.
4. The method according to claim 3, characterized in that judging whether a character string that hits the knowledge base entity dictionary meets the preset part-of-speech rule further comprises:
taking a character string that meets the preset part-of-speech rule as a candidate entity, and judging whether the input text has been fully traversed;
if it has been traversed, taking the candidate entity as a segmentation result;
if it has not been traversed, continuing forward maximum matching.
5. The method according to claim 4, characterized in that the preset part-of-speech rule is: an entity character string is a noun, or a noun modified by an adjective.
6. The method according to claim 3, characterized in that the novel entity dictionary is obtained by the following steps:
obtaining search terms;
for each search term, taking the character as the granularity, setting a window and calculating the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, a left information entropy threshold, and a right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the novel entity dictionary.
7. The method according to claim 1, characterized in that using the entity correction result to correct the entity annotation result obtained by natural language processing segmentation comprises:
replacing multiple entities that were split apart in the entity annotation result with the corresponding single entity in the entity correction result.
8. An entity recognition system, characterized by comprising:
an entity annotation module, configured to segment an input text and perform entity annotation;
a knowledge base entity dictionary matching module, configured to match the input text using a forward maximum matching method according to a knowledge base entity dictionary;
a part-of-speech rule judgment module, configured to judge whether a character string that hits the knowledge base entity dictionary meets a preset part-of-speech rule;
a correcting module, configured to correct the entity annotation result of the input text using the character strings that meet the preset part-of-speech rule.
9. The system according to claim 8, characterized in that the knowledge base entity dictionary includes:
name fields of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
10. The system according to claim 8, characterized in that the system further includes a novel entity dictionary matching module, configured to:
match, according to a novel entity dictionary, the character strings that miss the knowledge base entity dictionary;
if a character string hits the novel entity dictionary, skip the character string, after which the knowledge base entity dictionary matching module continues forward maximum matching;
if a character string misses the novel entity dictionary, have the part-of-speech rule judgment module judge whether the character string meets the preset part-of-speech rule, and take a character string that meets the preset part-of-speech rule as a segmentation result.
11. The system according to claim 10, characterized in that the part-of-speech rule judgment module is further configured to:
take a character string that meets the preset part-of-speech rule as a candidate entity, and judge whether the input text has been fully traversed;
if it has been traversed, take the candidate entity as a segmentation result;
if it has not been traversed, have the knowledge base entity dictionary matching module continue forward maximum matching.
12. The system according to claim 11, characterized in that the preset part-of-speech rule is: an entity character string is a noun, or a noun modified by an adjective.
13. The system according to claim 10, characterized in that the novel entity dictionary is obtained by the following steps:
obtaining user search terms;
for each search term, taking the character as the granularity, setting a window and calculating the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, a left information entropy threshold, and a right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the novel entity dictionary.
14. The system according to claim 8, characterized in that the correcting module is specifically configured to:
replace multiple entities that were split apart in the entity annotation result with the corresponding single entity in the entity correction result.
15. A device, characterized in that the device includes:
one or more processors;
a storage apparatus for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
16. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201810101815.7A 2018-02-01 2018-02-01 Entity identification method and system Active CN108491373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810101815.7A CN108491373B (en) 2018-02-01 2018-02-01 Entity identification method and system

Publications (2)

Publication Number Publication Date
CN108491373A true CN108491373A (en) 2018-09-04
CN108491373B CN108491373B (en) 2022-05-27

Family

ID=63344351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810101815.7A Active CN108491373B (en) 2018-02-01 2018-02-01 Entity identification method and system

Country Status (1)

Country Link
CN (1) CN108491373B (en)

Also Published As

Publication number Publication date
CN108491373B (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN108491373A (en) A kind of entity recognition method and system
CN109657054B (en) Abstract generation method, device, server and storage medium
US10776578B2 (en) Method and apparatus for building synonymy discriminating model and method and apparatus for discriminating synonymous text
AU2017408800B2 (en) Method and system of mining information, electronic device and readable storable medium
US7493251B2 (en) Using source-channel models for word segmentation
JP6901816B2 (en) Entity-related data generation methods, devices, devices, and storage media
US20180373692A1 (en) Method for parsing query based on artificial intelligence and computer device
EP1889180A2 (en) Collocation translation from monolingual and available bilingual corpora
CN109684634B (en) Emotion analysis method, device, equipment and storage medium
CN107992596A (en) Text clustering method, device, server and storage medium
CN108628830B (en) Semantic recognition method and device
CN110569335B (en) Triple verification method and device based on artificial intelligence and storage medium
CN108460011A (en) Entity concept labeling method and system
CN107807915B (en) Error correction model establishing method, device, equipment and medium based on error correction platform
US9311299B1 (en) Weakly supervised part-of-speech tagging with coupled token and type constraints
CN108549656A (en) Sentence parsing method, device, computer equipment and readable medium
CN108363556A (en) Method and system for interaction based on voice and an augmented reality environment
US20210042470A1 (en) Method and device for separating words
CN107203504B (en) Character string replacing method and device
CN107491477A (en) Emoticon searching method and device
CN108121697A (en) Text rewriting method, apparatus, equipment and computer storage medium
CN110334209A (en) Text classification method, device, medium and electronic equipment
CN107861948B (en) Label extraction method, device, equipment and medium
CN109785829A (en) Customer service assistance method and system based on voice control
CN112148958A (en) Method, apparatus, and computer storage medium for information recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant