CN108491373A - A kind of entity recognition method and system - Google Patents
- Publication number
- CN108491373A (application CN201810101815.7A)
- Authority
- CN
- China
- Prior art keywords
- entity
- dictionary
- character string
- knowledge base
- speech rule
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
Abstract
The application provides an entity recognition method, comprising: segmenting the input text using a natural language processing method and performing entity annotation; matching the input text against a knowledge base entity dictionary using the forward maximum matching method; judging whether each character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule, and taking the character strings that satisfy the rule as the entity correction result; and correcting the entity annotation result of the natural language processing segmentation using the entity correction result. The method corrects segmentation boundary errors, reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of entities that are not yet included in the dictionary.
Description
【Technical field】
This application relates to the field of natural language processing, and in particular to an entity recognition method and system.
【Background technology】
An entity is an object that exists in the real world and can be distinguished from other objects. An entity mention is a character substring in free text that can denote an entity. Entity recognition identifies proper names in text, such as person names and place names. For example, given a short text such as a query or a title, the entities in that short text are output; given the input "Zhou Jielun Kun Ling wedding", the output is "Zhou Jielun Kun Ling wedding", which serves the purpose of understanding the text.
Entity recognition is an important basic tool for application fields such as information extraction, question answering, syntactic analysis, entity linking, and machine translation, and it plays an important role in making natural language processing practical.
Traditional entity recognition methods fall into two main categories:
(1) Methods based on rules and dictionaries. These rely on grammar rules hand-written by linguists and identify entities from lexical, syntactic, and related information.
(2) Methods based on machine learning. These train a sequence labeling model, such as a conditional random field or a hidden Markov model, on a manually annotated training corpus and use it to predict unlabeled data.
However, both approaches require a large amount of labor, and both perform poorly on entities that are not yet included.
First, the rule- and dictionary-based method needs domain experts to configure rules. On small data sets its precision is generally high, but its recall is low. It cannot identify entities outside the dictionary, and even for entities inside the dictionary it cannot resolve entity ambiguity. It is hard to extend to multiple domains, and having domain experts configure rules is costly.
Second, the machine learning method, which is the current mainstream solution, needs high-quality manually annotated training data to train well, so its labor cost is even higher. Because it learns from annotated training data, it recognizes unincluded entities poorly, and it also performs poorly on entities without obvious features, such as song titles and film or television titles.
In addition, short texts such as queries and titles are often expressed irregularly, and new popular entities keep appearing, so basic word segmentation tools split some emerging entities apart, which degrades recognition.
【Invention content】
Aspects of the application provide an entity recognition method and system that reduce the labor cost of entity recognition, improve overall efficiency, and improve recognition of unincluded entities.
One aspect of the application provides an entity recognition method, comprising:
Segmenting the input text and performing entity annotation;
Matching the input text against a knowledge base entity dictionary using the forward maximum matching method;
Judging whether a character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule;
Correcting the entity annotation result of the input text using the character strings that satisfy the preset part-of-speech rule.
In the above aspect and any possible implementation, an implementation is further provided in which the knowledge base entity dictionary includes:
The name field of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
In the above aspect and any possible implementation, an implementation is further provided in which matching the input text against the knowledge base entity dictionary using the forward maximum matching method further comprises:
Matching a character string that misses the knowledge base entity dictionary against a new entity dictionary;
If it hits the new entity dictionary, skipping the character string and continuing forward maximum matching;
If it misses the new entity dictionary, judging whether the character string satisfies the preset part-of-speech rule, and taking a character string that satisfies the rule as a word segmentation result.
In the above aspect and any possible implementation, an implementation is further provided in which taking the character strings that satisfy the preset part-of-speech rule as the entity correction result comprises:
Taking a character string that satisfies the preset part-of-speech rule as a candidate entity, and judging whether the input text has been fully traversed;
If it has, taking the candidate entities as the word segmentation result;
If it has not, continuing forward maximum matching.
In the above aspect and any possible implementation, an implementation is further provided in which the preset part-of-speech rule is: an entity character string is a noun, or a noun modified by an adjective.
In the above aspect and any possible implementation, an implementation is further provided in which the new entity dictionary is obtained by the following steps:
Obtaining search terms;
For each search term, at character granularity, setting windows and computing the mutual information and the left and right information entropy of the character string in each window;
Taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, left information entropy threshold, and right information entropy threshold;
Removing the entities already included in the knowledge base entity dictionary to obtain the new entity dictionary.
In the above aspect and any possible implementation, an implementation is further provided in which correcting the entity annotation result of the natural language processing segmentation using the entity correction result comprises:
Replacing multiple entities that were split apart in the entity annotation result with the corresponding single entity in the entity correction result.
Another aspect of the application provides an entity recognition system, comprising:
An entity annotation module, for segmenting the input text and performing entity annotation;
A knowledge base entity dictionary matching module, for matching the input text against a knowledge base entity dictionary using the forward maximum matching method;
A part-of-speech rule judgment module, for judging whether a character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule;
A correction module, for correcting the entity annotation result of the input text using the character strings that satisfy the preset part-of-speech rule.
In the above aspect and any possible implementation, an implementation is further provided in which the knowledge base entity dictionary includes:
The name field of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
In the above aspect and any possible implementation, an implementation is further provided in which the system further comprises a new entity dictionary matching module, for:
Matching a character string that misses the knowledge base entity dictionary against the new entity dictionary;
If it hits the new entity dictionary, skipping the character string, after which the knowledge base entity dictionary matching module continues forward maximum matching;
If it misses the new entity dictionary, having the part-of-speech rule judgment module judge whether the character string satisfies the preset part-of-speech rule, and taking a character string that satisfies the rule as a word segmentation result.
In the above aspect and any possible implementation, an implementation is further provided in which the part-of-speech rule judgment module is further used for:
Taking a character string that satisfies the preset part-of-speech rule as a candidate entity, and judging whether the input text has been fully traversed;
If it has, taking the candidate entities as the word segmentation result;
If it has not, having the knowledge base entity dictionary matching module continue forward maximum matching.
In the above aspect and any possible implementation, an implementation is further provided in which the preset part-of-speech rule is: an entity character string is a noun, or a noun modified by an adjective.
In the above aspect and any possible implementation, an implementation is further provided in which the new entity dictionary is obtained by the following steps:
Obtaining user search terms;
For each search term, at character granularity, setting windows and computing the mutual information and the left and right information entropy of the character string in each window;
Taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, left information entropy threshold, and right information entropy threshold;
Removing the entities already included in the knowledge base entity dictionary to obtain the new entity dictionary.
In the above aspect and any possible implementation, an implementation is further provided in which the correction module is specifically used for:
Replacing multiple entities that were split apart in the entity annotation result with the corresponding single entity in the entity correction result.
Another aspect of the application provides a device, characterized in that the device comprises:
One or more processors;
A storage apparatus, for storing one or more programs,
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any of the methods described above.
Another aspect of the application provides a computer-readable storage medium on which a computer program is stored, characterized in that the program implements any of the methods described above when executed by a processor.
As the above technical solution shows, the technical solution provided by this embodiment corrects segmentation boundary errors, reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of unincluded entities.
【Description of the drawings】
To explain the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the flow diagram for the entity recognition method that one embodiment of the application provides;
Fig. 2 is the structural schematic diagram for the entity recognition system that another embodiment of the application provides;
Fig. 3 is the block diagram suitable for the exemplary computer system/server for realizing the embodiment of the present invention.
【Specific implementation mode】
To make the purpose, technical solutions, and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the application. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the application.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a flow chart of the entity recognition method provided by one embodiment of the application. As shown in Fig. 1, the method includes the following steps:
Step S11: segment the input text using a natural language processing method and perform entity annotation;
Step S12: match the input text against the knowledge base entity dictionary using the forward maximum matching method;
Step S13: judge whether a character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule;
Step S14: correct the entity annotation result of the input text using the character strings that satisfy the preset part-of-speech rule.
In a preferred implementation of step S11,
The input text is looked up in the knowledge base entity dictionary;
If it hits, the input text is taken as the entity recognition result and is part-of-speech tagged. For example, if the text entered by the user consists of a single entity that is already included in the knowledge base entity dictionary, it can be recognized and output directly, without the subsequent steps.
If it misses, the input text is segmented, and the segmentation result is part-of-speech tagged according to the knowledge base entity dictionary.
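The branch described above can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of the application: the `annotate` function, the `"ENTITY"` and `"WORD"` labels, and the whitespace-based `toy_segment` stand-in for a real segmenter are all hypothetical names introduced here.

```python
from typing import Callable, List, Set, Tuple

Token = Tuple[str, str]  # (surface string, label); the labels are hypothetical


def annotate(text: str, kb_dict: Set[str],
             segment: Callable[[str], List[Token]]) -> List[Token]:
    """Look the whole input up first; segment only when the lookup misses."""
    if text in kb_dict:
        # The whole input is a known entity: output it directly.
        return [(text, "ENTITY")]
    # Otherwise fall back to the word segmenter supplied by the caller.
    return segment(text)


def toy_segment(text: str) -> List[Token]:
    """Stand-in segmenter for the example only: splits on whitespace."""
    return [(word, "WORD") for word in text.split()]


print(annotate("Zhou Jielun", {"Zhou Jielun"}, toy_segment))
print(annotate("a b", set(), toy_segment))
```

Passing the segmenter in as a callable keeps the sketch independent of any particular segmentation platform.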
Preferably, segmentation uses natural language processing (NLP) techniques: the input short text is segmented with a dictionary-based algorithm (matching against a dictionary or lexicon), a statistics-based algorithm (based on word frequency statistics), or a rule-based algorithm (based on knowledge understanding).
Preferably, Baidu's NLPC platform is used to segment the input text and annotate entities.
However, segmentation with NLP techniques is prone to segmentation boundary errors. For example, suppose the input text is "Best of Inuyasha strongest goblin download", where "the strongest goblin of Best of Inuyasha" is an entity with complete semantics that is already included in the knowledge base entity dictionary. NLP segmentation splits it into separately tagged pieces (for example, "Best of Inuyasha" tagged n, "most strong" tagged a, and "goblin" tagged Ng, with particles tagged u between them), which is a segmentation boundary error. In addition, short texts such as queries and titles may contain unregistered entities, that is, new entities not included in the knowledge base entity dictionary, such as new words appearing on the Internet. These unregistered entities are often split apart, which also causes NLP segmentation boundary errors. These segmentation boundary errors therefore need to be corrected.
In a preferred implementation of step S12,
Preferably, the number of Chinese characters i in the longest entity of the knowledge base entity dictionary is used as the matching window length, and the first i characters of the input text are taken as the candidate character string and looked up in the knowledge base entity dictionary.
If the candidate character string hits the knowledge base entity dictionary, it is judged whether the candidate character string satisfies the part-of-speech rule restriction.
If the candidate character string misses the knowledge base entity dictionary, matching continues against the new entity dictionary. If it hits the new entity dictionary, the candidate character string is skipped: the matching window is reduced by removing the last character of the candidate character string, and forward maximum matching continues on the remaining character string. (Because the precision of mined new entities is currently 60%, they cannot be used directly as an entity dictionary; mentions that hit the new entity dictionary are therefore handled with a no-recall strategy, which gives up some recall to improve overall precision.)
Preferably, if the candidate character string misses the new entity dictionary, it is judged whether it satisfies the part-of-speech rule, in order to improve recall. For example, some uncommon words are included in neither the knowledge base entity dictionary nor the new entity dictionary; as long as such a word satisfies the noun part-of-speech rule restriction, it is taken as a candidate entity.
In a preferred implementation of step S13,
Preferably, judging whether the candidate character string satisfies the part-of-speech rule restriction further comprises: if it does, taking the candidate character string as a candidate entity; if it does not, skipping the candidate character string, reducing the matching window by removing the last character of the candidate character string, and continuing forward maximum matching on the remaining character string.
The part-of-speech rule is as follows: according to the language patterns of Chinese, a meaningful entity character string is a noun, or a noun modified by an adjective. For example, suppose the input short text is "method of study Korean", where "study Korean" is an entity in the knowledge base and is included in the knowledge base entity dictionary. Because "study Korean" hits the knowledge base entity dictionary, forward maximum matching would wrongly splice together what the original NLP segmentation had split correctly, so the forward maximum matching result must be verified against the part-of-speech rule restriction. In the case of "study Korean", "study" is a verb, which does not fit the assumption of a noun modified by an adjective. "Study Korean" is therefore not taken as a candidate entity: the candidate character string is skipped, the matching window is reduced by removing the last character of the candidate character string, and forward maximum matching continues on the remaining character string.
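One way to read the part-of-speech rule is as a pattern over the tags of the tokens inside a candidate string. The sketch below is only an illustration under assumptions: the single-letter tag set ("a" adjective, "n" noun, "v" verb) and the reading "zero or more adjectives followed by one or more nouns" are choices made here, and the text does not specify the exact pattern.

```python
import re
from typing import Sequence

# Hypothetical single-letter tag set: "a" = adjective, "n" = noun, "v" = verb.
# Assumed reading of the rule: optional adjectives, then at least one noun.
_NOUN_PHRASE = re.compile(r"a*n+")


def satisfies_pos_rule(tags: Sequence[str]) -> bool:
    """Return True when the tag sequence looks like a (modified) noun."""
    return _NOUN_PHRASE.fullmatch("".join(tags)) is not None


print(satisfies_pos_rule(["n"]))       # a bare noun
print(satisfies_pos_rule(["a", "n"]))  # adjective + noun
print(satisfies_pos_rule(["v", "n"]))  # verb + noun, as in "study Korean"
```

Under this reading, a verb-initial candidate such as "study Korean" is rejected even though it is present in the dictionary.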
After a candidate character string that satisfies the part-of-speech rule restriction is taken as a candidate entity, the method further comprises:
Judging whether the input text has been fully traversed; if so, outputting all candidate entities; if not, removing the candidate entity from the short text and continuing forward maximum matching on the remaining character string.
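The matching loop of steps S12 and S13 can be sketched as below. It is a minimal sketch under assumptions: `pos_ok` is a hypothetical callable standing in for the part-of-speech check, and advancing by one character when no window at a position yields a candidate is an assumption made here, since the text does not say what happens in that case.

```python
from typing import Callable, List, Set


def forward_max_match(text: str, kb_dict: Set[str], new_dict: Set[str],
                      pos_ok: Callable[[str], bool]) -> List[str]:
    """Collect candidate entities by scanning the text from left to right."""
    # Window length = character count of the longest knowledge base entity.
    max_len = max((len(entity) for entity in kb_dict), default=1)
    candidates: List[str] = []
    start = 0
    while start < len(text):                  # until the text is traversed
        size = min(max_len, len(text) - start)
        matched = False
        while size > 0:
            cand = text[start:start + size]
            if cand in kb_dict:
                ok = pos_ok(cand)             # dictionary hit: verify the rule
            elif cand in new_dict:
                ok = False                    # no-recall strategy: skip it
            else:
                ok = pos_ok(cand)             # unseen string: rule alone decides
            if ok:
                candidates.append(cand)
                start += size                 # drop it, continue on the rest
                matched = True
                break
            size -= 1                         # remove the last character
        if not matched:
            start += 1                        # assumption: move on one character
    return candidates


print(forward_max_match("ABCDXYZ", {"ABCD"}, {"XY"},
                        lambda s: s in {"ABCD", "Z"}))
```

In the toy call, "XY" is in the new entity dictionary and is therefore never returned, which mirrors the trade of recall for precision described above.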
In a preferred implementation of step S14,
Preferably, multiple entities that were split apart in the entity annotation result are replaced with the corresponding single entity in the entity correction result, thereby correcting the NLP segmentation boundary error.
For example, the entities "Best of Inuyasha" and "goblin" produced by NLP segmentation are replaced with "the strongest goblin of Best of Inuyasha".
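The replacement in step S14 can be illustrated with a small merge routine. This is a sketch under assumptions: tokens are taken to concatenate without separators, as in Chinese text, and `merge_split_tokens` is a hypothetical helper introduced here, not a function named in the application.

```python
from typing import List, Set


def merge_split_tokens(tokens: List[str], entities: Set[str]) -> List[str]:
    """Replace runs of adjacent tokens that spell a corrected entity."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest run of at least two tokens starting at i first.
        for j in range(len(tokens), i + 1, -1):
            joined = "".join(tokens[i:j])
            if joined in entities:
                merged.append(joined)     # one entity replaces the split pieces
                i = j
                break
        else:
            merged.append(tokens[i])      # no run matched: keep the token
            i += 1
    return merged


print(merge_split_tokens(["AB", "C", "DE", "F"], {"ABCDE"}))
```

Trying the longest run first keeps the behavior consistent with the maximum matching idea used in the earlier steps.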
Preferably, the knowledge base entity dictionary is obtained by the following steps:
Obtaining the name field of the encyclopedia entities in the knowledge base;
Receiving the manually curated aliases pushed by the encyclopedia;
Mining aliases from the encyclopedia info-box; for example, the alias of "Zhou Jielun" is "Zhou Dong".
Preferably, the new entity dictionary is obtained by the following steps:
1) Obtain the query log of a search engine;
2) For each query, at character granularity, set windows and compute the mutual information and the left and right information entropy of the character string in each window;
Mutual information: $I(X;Y)=\sum_{x \in X}\sum_{y \in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$, where p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y respectively.
Mutual information measures the degree of mutual dependence between two variables X and Y. The higher the mutual information, the more strongly X and Y are correlated, and the more likely X and Y together form a meaningful entity.
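As a small worked illustration of the mutual information described above, the sketch below computes it from a joint probability table. The `mutual_information` function and the dictionary-of-pairs input format are assumptions made for the example; the natural logarithm is used, which the text does not specify.

```python
import math
from typing import Dict, Tuple


def mutual_information(joint: Dict[Tuple[str, str], float]) -> float:
    """Mutual information of X and Y from their joint probabilities p(x, y)."""
    px: Dict[str, float] = {}
    py: Dict[str, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p        # marginal p(x)
        py[y] = py.get(y, 0.0) + p        # marginal p(y)
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


independent = {("a", "c"): 0.25, ("a", "d"): 0.25,
               ("b", "c"): 0.25, ("b", "d"): 0.25}
dependent = {("a", "c"): 0.5, ("b", "d"): 0.5}
print(mutual_information(independent))    # independent variables
print(mutual_information(dependent))      # fully dependent variables
```

Independent variables give a value of zero, while two variables that always co-occur give log 2 here, which matches the intuition that strongly associated pieces are more likely to belong to one entity.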
Left and right entropy are the entropies of the left and right boundaries of a multi-character expression. Taking left entropy as an example, the information entropy is computed over all characters that can appear immediately to the left of a character string, using their frequencies, and the terms are summed.
For a character string w, the left and right entropy are: $E_L(w) = -\sum_{a} P(aw \mid w)\log P(aw \mid w)$ and $E_R(w) = -\sum_{a} P(wa \mid w)\log P(wa \mid w)$,
where a denotes a character adjacent to the character string w. The higher the left and right entropy, the more likely the character string w forms a meaningful entity.
Left and right entropy reflect how freely a term combines with its context. For example, for "the Great Sage Equalling Heaven" (齐天大圣), computing the right entropy gives E(齐天大*) << E(齐天大圣*), where 齐天大 is the same string without its last character. It follows that "the Great Sage Equalling Heaven", taken as a whole character string, is more likely to form a meaningful entity.
3) According to a preset mutual information threshold, left information entropy threshold, and right information entropy threshold, take the character strings that satisfy all of these thresholds simultaneously as entities.
4) Remove from the resulting entities those already included in the knowledge base entity dictionary to obtain the new entity dictionary.
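Steps 1) to 4) can be sketched end to end as follows. This is a rough sketch under stated assumptions rather than the procedure of the application: the association score is the pointwise mutual information of the weakest split of a string, swapped in here as a practical stand-in for mutual information; the threshold values, `max_len`, and the function names are hypothetical; and probabilities are estimated separately for each string length.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a frequency table; 0.0 if empty."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counts.values())


def mine_new_entities(queries: Iterable[str], kb_dict: Set[str],
                      max_len: int = 4, min_pmi: float = 1.0,
                      min_left: float = 0.5, min_right: float = 0.5) -> Set[str]:
    freq: Counter = Counter()
    left: Dict[str, Counter] = defaultdict(Counter)
    right: Dict[str, Counter] = defaultdict(Counter)
    for query in queries:
        for n in range(1, max_len + 1):           # window sizes in characters
            for i in range(len(query) - n + 1):
                s = query[i:i + n]
                freq[s] += 1
                if i > 0:
                    left[s][query[i - 1]] += 1    # character to the left
                if i + n < len(query):
                    right[s][query[i + n]] += 1   # character to the right
    totals: Counter = Counter()
    for s, c in freq.items():
        totals[len(s)] += c

    def prob(s: str) -> float:
        return freq[s] / totals[len(s)]

    found: Set[str] = set()
    for s in freq:
        if len(s) < 2:
            continue
        # Pointwise mutual information of the weakest split point.
        pmi = min(math.log(prob(s) / (prob(s[:k]) * prob(s[k:])))
                  for k in range(1, len(s)))
        if (pmi >= min_pmi and entropy(left[s]) >= min_left
                and entropy(right[s]) >= min_right):
            found.add(s)
    return found - kb_dict                        # step 4: drop known entities


print(mine_new_entities(["xABy", "zABw"], set(), max_len=2))
```

In the toy log, "AB" is kept because its two characters co-occur strongly and it appears with varied neighbors on both sides, while strings such as "xA" are dropped because nothing varies to their left.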
By processing massive user query logs, emerging entities can be mined in a timely manner.
Because the precision of new entities mined by the above method is currently 60%, they cannot be used directly as an entity dictionary and serve only as the new entity dictionary. Entities that hit the new entity dictionary are therefore handled with a no-recall strategy, which gives up some recall to improve overall precision.
The method of this embodiment applies forward maximum matching against the knowledge base entity dictionary to correct segmentation boundary errors by splicing split entities back together, and mines a new entity dictionary from the query log to address segmentation boundary errors. It reduces the labor cost of entity recognition, improves overall efficiency, and improves recognition of unincluded entities.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application certain steps can be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
Fig. 2 is a structural diagram of the entity recognition system provided by one embodiment of the application. As shown in Fig. 2, the system includes:
An entity annotation module 21, for segmenting the input text and performing entity annotation;
A knowledge base entity dictionary matching module 22, for matching the input text against the knowledge base entity dictionary using the forward maximum matching method;
A part-of-speech rule judgment module 23, for judging whether a character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule;
A correction module 24, for correcting the entity annotation result of the input text using the character strings that satisfy the preset part-of-speech rule.
In a preferred implementation of the entity annotation module 21,
The input text is looked up in the knowledge base entity dictionary;
If it hits, the input text is taken as the entity recognition result and is part-of-speech tagged. For example, if the text entered by the user consists of a single entity that is already included in the knowledge base entity dictionary, it can be recognized and output directly, without the subsequent steps.
If it misses, the input text is segmented, and the segmentation result is part-of-speech tagged according to the knowledge base entity dictionary.
Preferably, segmentation uses natural language processing (NLP) techniques: the input short text is segmented with a dictionary-based algorithm (matching against a dictionary or lexicon), a statistics-based algorithm (based on word frequency statistics), or a rule-based algorithm (based on knowledge understanding).
Preferably, Baidu's NLPC platform is used to segment the input text and annotate entities.
However, segmentation with NLP techniques is prone to segmentation boundary errors. For example, suppose the input text is "Best of Inuyasha strongest goblin download", where "the strongest goblin of Best of Inuyasha" is an entity with complete semantics that is already included in the knowledge base entity dictionary. NLP segmentation splits it into separately tagged pieces (for example, "Best of Inuyasha" tagged n, "most strong" tagged a, and "goblin" tagged Ng, with particles tagged u between them), which is a segmentation boundary error. In addition, short texts such as queries and titles may contain unregistered entities, that is, new entities not included in the knowledge base entity dictionary, such as new words appearing on the Internet. These unregistered entities are often split apart, which also causes NLP segmentation boundary errors. These segmentation boundary errors therefore need to be corrected.
In a preferred implementation of the knowledge base entity dictionary matching module 22,
Preferably, the number of Chinese characters i in the longest entity of the knowledge base entity dictionary is used as the matching window length, and the first i characters of the input text are taken as the candidate character string and looked up in the knowledge base entity dictionary.
If the candidate character string hits the knowledge base entity dictionary, it is judged whether the candidate character string satisfies the part-of-speech rule restriction.
If the candidate character string misses the knowledge base entity dictionary, matching continues against the new entity dictionary. If it hits the new entity dictionary, the candidate character string is skipped: the matching window is reduced by removing the last character of the candidate character string, and forward maximum matching continues on the remaining character string. (Because the precision of mined new entities is currently 60%, they cannot be used directly as an entity dictionary; mentions that hit the new entity dictionary are therefore handled with a no-recall strategy, which gives up some recall to improve overall precision.)
Preferably, if the candidate character string misses the new entity dictionary, it is judged whether it satisfies the part-of-speech rule, in order to improve recall. For example, some uncommon words are included in neither the knowledge base entity dictionary nor the new entity dictionary; as long as such a word satisfies the noun part-of-speech rule restriction, it is taken as a candidate entity.
In a preferred implementation of the part-of-speech rule judgment module 23,
Preferably, judging whether the candidate character string satisfies the part-of-speech rule restriction further comprises: if it does, taking the candidate character string as a candidate entity; if it does not, skipping the candidate character string, reducing the matching window by removing the last character of the candidate character string, and continuing forward maximum matching on the remaining character string.
The part-of-speech rule is as follows: according to the language patterns of Chinese, a meaningful entity character string is a noun, or a noun modified by an adjective. For example, suppose the input short text is "method of study Korean", where "study Korean" is an entity in the knowledge base and is included in the knowledge base entity dictionary. Because "study Korean" hits the knowledge base entity dictionary, forward maximum matching would wrongly splice together what the original NLP segmentation had split correctly, so the forward maximum matching result must be verified against the part-of-speech rule restriction. In the case of "study Korean", "study" is a verb, which does not fit the assumption of a noun modified by an adjective. "Study Korean" is therefore not taken as a candidate entity: the candidate character string is skipped, the matching window is reduced by removing the last character of the candidate character string, and forward maximum matching continues on the remaining character string.
The system further includes a traversal module 25, which, after a candidate character string that satisfies the part-of-speech rule restriction is taken as a candidate entity, judges whether the input text has been fully traversed. If so, all candidate entities are output; if not, the candidate entity is removed from the short text, and the knowledge base entity dictionary matching module 22 continues forward maximum matching on the remaining character string.
In a kind of preferred implementation of correcting module 24,
Preferably, scattered multiple entities will be cut in entity annotation results and replaces with corresponding list in entity correction result
A entity segments boundary error to correct NLP.
For example, entity " Best of Inuyasha " that NLP is segmented, " goblin " are replaced with " the strongest goblin of Best of Inuyasha ".
Preferably, the knowledge base entity dictionary is built from the name fields of the encyclopedia entities in the knowledge base, the manually curated aliases pushed by the encyclopedia, and the aliases mined from the encyclopedia info-box. For example, an alias of "Zhou Jielun" is "Zhou Dong".
Preferably, the new entity dictionary is obtained through the following steps:
1) obtain the query log of the search engine;
2) for each query, taking a single character as the granularity, set a window and calculate the mutual information and the left and right information entropy of the character string in each window;
The mutual information is
$$I(X;Y)=\sum_{x \in X}\sum_{y \in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$
where p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are the marginal probability distribution functions of X and Y respectively.
Mutual information reflects the degree of interdependence between two variables X and Y. The higher the mutual information value, the stronger the correlation between X and Y, and the more likely it is that X and Y form a meaningful entity.
Left and right entropy refers to the entropy of the left boundary and the entropy of the right boundary of a multi-character expression. Taking left entropy as an example, the information entropy is calculated over all words that can appear to the left of a character string, weighted by their frequencies, and then summed.
For a character string w, the left and right entropy are:
$$E_L(w) = -\sum_{a \in A} P(aw \mid w)\log P(aw \mid w), \qquad E_R(w) = -\sum_{a \in B} P(wa \mid w)\log P(wa \mid w)$$
where a denotes a word adjacent to the character string w, and A and B are the sets of words that appear immediately to the left and to the right of w. The higher the left and right entropy, the more likely it is that the character string w forms a meaningful entity.
Left and right entropy reflects the degree of freedom with which a term combines with its neighbors. For example, for "the Great Sage Equalling Heaven" (齐天大圣), calculating the right entropy gives E(齐天大 *) << E(齐天大圣 *), where 齐天大 is the same string with its last character removed. The complete string "the Great Sage Equalling Heaven" is therefore more likely to form a meaningful entity.
3) sum the mutual information and the left and right information entropy of the character string in each window, and obtain the new entity dictionary by threshold filtering.
Processing massive volumes of user query logs in this way allows newly emerging entities to be mined promptly.
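Steps 1) to 3) can be sketched roughly as below. Several choices here are assumptions rather than details given in the text: pointwise mutual information of the window against its individual characters is swapped in for the full mutual information, a single fixed window size is used, natural logarithms are used, and the names `mine_new_entities`, `window` and `threshold` are illustrative only.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a frequency counter; 0.0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def mine_new_entities(queries, window=2, threshold=1.0):
    """Score every character window in the queries and keep the high scorers.

    score = PMI(window) + left entropy + right entropy (assumed combination,
    following the "sum, then filter by threshold" description).
    """
    char_counts = Counter()
    gram_counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately to the left
    right = defaultdict(Counter)  # characters seen immediately to the right
    for q in queries:
        char_counts.update(q)
        for i in range(len(q) - window + 1):
            gram = q[i:i + window]
            gram_counts[gram] += 1
            if i > 0:
                left[gram][q[i - 1]] += 1
            if i + window < len(q):
                right[gram][q[i + window]] += 1

    n_chars = sum(char_counts.values())
    n_grams = sum(gram_counts.values())
    kept = {}
    for gram, count in gram_counts.items():
        p_gram = count / n_grams
        # Probability of the window if its characters were independent.
        p_indep = math.prod(char_counts[ch] / n_chars for ch in gram)
        pmi = math.log(p_gram / p_indep)
        score = pmi + entropy(left[gram]) + entropy(right[gram])
        if score >= threshold:
            kept[gram] = score
    return kept


# Toy query log: "ab" recurs with varied neighbors on both sides.
print(mine_new_entities(["abxy", "cabz", "dabw"], window=2, threshold=3.0))
```

In the toy log only "ab" passes the threshold: a rare pair such as "xy" has high pointwise mutual information but zero boundary entropy, which is why the two signals are combined.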
At present, the new entities mined by the above method have an accuracy of 60%, so they cannot be used directly as an entity dictionary and serve only as a new entity dictionary. A no-recall strategy is therefore applied to character strings that hit the new entity dictionary: such hits are not recalled as entities, sacrificing recall to improve overall accuracy.
The system of this embodiment performs forward maximum matching based on the knowledge base entity dictionary to correct segmentation boundary errors, splicing entities that were cut apart back together. It also mines a new entity dictionary from the query log, which further addresses segmentation boundary errors. This reduces the labor cost of entity recognition, improves overall efficiency, and improves the recognition of entities that have not yet been included.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, apparatus, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional units.
Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 3, the computer system/server 012 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 that connects the different system components (including the system memory 028 and the processing unit 016).
Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 012 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer system/server 012, including volatile and non-volatile media and removable and non-removable media.
System memory 028 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 034 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 3, commonly called a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical medium) may be provided. In these cases, each drive may be connected to the bus 018 through one or more data media interfaces. Memory 028 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 042 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 012 may also communicate with one or more external devices 014 (such as a keyboard, a pointing device, or a display 024). In the present invention, the computer system/server 012 communicates with an external radar device; it may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (such as a network card or a modem) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 022. The computer system/server 012 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 020. As shown in Fig. 3, the network adapter 020 communicates with the other modules of the computer system/server 012 through the bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 016 runs the programs stored in the system memory 028 to perform the functions and/or methods of the embodiments described in the present invention.
The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention.
With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad, and the distribution of computer programs is no longer limited to tangible media; programs may also be downloaded directly from a network, for example. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that the above embodiments only illustrate the technical solutions of this application and do not limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (16)
1. An entity recognition method, characterized by comprising:
segmenting an input text and performing entity annotation;
matching the input text against a knowledge base entity dictionary using a forward maximum matching method;
judging whether a character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule;
correcting the entity annotation result of the input text using the character string that satisfies the preset part-of-speech rule.
2. The method according to claim 1, characterized in that the knowledge base entity dictionary comprises:
name fields of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
3. The method according to claim 1, characterized in that matching the input text against the knowledge base entity dictionary using the forward maximum matching method comprises:
matching, against a new entity dictionary, a character string that misses the knowledge base entity dictionary;
if the character string hits the new entity dictionary, skipping the character string and continuing forward maximum matching;
if the character string misses the new entity dictionary, judging whether the character string satisfies the preset part-of-speech rule, and taking a character string that satisfies the preset part-of-speech rule as a segmentation result.
4. The method according to claim 3, characterized in that judging whether the character string that hits the knowledge base entity dictionary satisfies the preset part-of-speech rule further comprises:
taking the character string that satisfies the preset part-of-speech rule as a candidate entity, and judging whether the input text has been fully traversed;
if it has been traversed, taking the candidate entity as a segmentation result;
if it has not been traversed, continuing forward maximum matching.
5. The method according to claim 4, characterized in that the preset part-of-speech rule is: the entity character string is a noun, or a noun modified by an adjective.
6. The method according to claim 3, characterized in that the new entity dictionary is obtained by the following steps:
obtaining search terms;
for each search term, taking the character as the granularity, setting a window and calculating the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, a left information entropy threshold, and a right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the new entity dictionary.
7. The method according to claim 1, characterized in that correcting, using the entity correction result, the entity annotation result obtained by natural language processing segmentation comprises:
replacing multiple entities that were cut apart in the entity annotation result with the corresponding single entity in the entity correction result.
8. An entity recognition system, characterized by comprising:
an entity annotation module, configured to segment an input text and perform entity annotation;
a knowledge base entity dictionary matching module, configured to match the input text against a knowledge base entity dictionary using a forward maximum matching method;
a part-of-speech rule judgment module, configured to judge whether a character string that hits the knowledge base entity dictionary satisfies a preset part-of-speech rule;
a correcting module, configured to correct the entity annotation result of the input text using the character string that satisfies the preset part-of-speech rule.
9. The system according to claim 8, characterized in that the knowledge base entity dictionary comprises:
name fields of the encyclopedia entities in the knowledge base; manually curated aliases pushed by the encyclopedia; and aliases mined from the encyclopedia.
10. The system according to claim 8, characterized in that the system further comprises a new entity dictionary matching module, configured to:
match, against a new entity dictionary, a character string that misses the knowledge base entity dictionary;
if the character string hits the new entity dictionary, skip the character string, after which the knowledge base entity dictionary matching module continues forward maximum matching;
if the character string misses the new entity dictionary, have the part-of-speech rule judgment module judge whether the character string satisfies the preset part-of-speech rule, and take a character string that satisfies the preset part-of-speech rule as a segmentation result.
11. The system according to claim 10, characterized in that the part-of-speech rule judgment module is further configured to:
take the character string that satisfies the preset part-of-speech rule as a candidate entity, and judge whether the input text has been fully traversed;
if it has been traversed, take the candidate entity as a segmentation result;
if it has not been traversed, have the knowledge base entity dictionary matching module continue forward maximum matching.
12. The system according to claim 11, characterized in that the preset part-of-speech rule is: the entity character string is a noun, or a noun modified by an adjective.
13. The system according to claim 10, characterized in that the new entity dictionary is obtained by the following steps:
obtaining user search terms;
for each search term, taking the character as the granularity, setting a window and calculating the mutual information and the left and right information entropy of the character string in each window;
taking as entities the character strings that simultaneously satisfy a preset mutual information threshold, a left information entropy threshold, and a right information entropy threshold;
removing the entities already included in the knowledge base entity dictionary to obtain the new entity dictionary.
14. The system according to claim 8, characterized in that the correcting module is specifically configured to:
replace multiple entities that were cut apart in the entity annotation result with the corresponding single entity in the entity correction result.
15. A device, characterized in that the device comprises:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810101815.7A CN108491373B (en) | 2018-02-01 | 2018-02-01 | Entity identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108491373A true CN108491373A (en) | 2018-09-04 |
CN108491373B CN108491373B (en) | 2022-05-27 |
Family
ID=63344351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810101815.7A Active CN108491373B (en) | 2018-02-01 | 2018-02-01 | Entity identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108491373B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN108491373B (en) | 2022-05-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108491373A (en) | A kind of entity recognition method and system | |
CN109657054B (en) | Abstract generation method, device, server and storage medium | |
US10776578B2 (en) | Method and apparatus for building synonymy discriminating model and method and apparatus for discriminating synonymous text | |
AU2017408800B2 (en) | Method and system of mining information, electronic device and readable storable medium | |
US7493251B2 (en) | Using source-channel models for word segmentation | |
JP6901816B2 (en) | Entity-related data generation methods, devices, devices, and storage media | |
US20180373692A1 (en) | Method for parsing query based on artificial intelligence and computer device | |
EP1889180A2 (en) | Collocation translation from monolingual and available bilingual corpora | |
CN109684634B (en) | Emotion analysis method, device, equipment and storage medium | |
CN107992596A (en) | A kind of Text Clustering Method, device, server and storage medium | |
CN108628830B (en) | Semantic recognition method and device | |
CN110569335B (en) | Triple verification method and device based on artificial intelligence and storage medium | |
CN108460011A (en) | A kind of entity concept labeling method and system | |
CN107807915B (en) | Error correction model establishing method, device, equipment and medium based on error correction platform | |
US9311299B1 (en) | Weakly supervised part-of-speech tagging with coupled token and type constraints | |
CN108549656A (en) | Sentence parsing method, device, computer equipment and readable medium | |
CN108363556A (en) | A kind of method and system based on voice and augmented reality environmental interaction | |
US20210042470A1 (en) | Method and device for separating words | |
CN107203504B (en) | Character string replacing method and device | |
CN107491477A (en) | A kind of emoticon searching method and device | |
CN108121697A (en) | A kind of text rewriting method, apparatus, equipment and computer storage medium | |
CN110334209A (en) | Text classification method, device, medium and electronic equipment | |
CN107861948B (en) | Label extraction method, device, equipment and medium | |
CN109785829A (en) | A kind of customer service assistance method and system based on voice control | |
CN112148958A (en) | Method, apparatus, and computer storage medium for information recommendation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||