CN103631948A - Identifying method of named entities - Google Patents

Identifying method of named entities

Info

Publication number
CN103631948A
Authority
CN
China
Prior art keywords
entity
word
commodity
item property
pending text
Legal status
Granted
Application number
CN201310674046.7A
Other languages
Chinese (zh)
Other versions
CN103631948B (en)
Inventor
张永成
罗欢
何泉昊
张喜
姜文
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201310674046.7A
Publication of CN103631948A
Application granted
Publication of CN103631948B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for identifying named entities. The method has the following steps. First, identify the special words in a text to be processed. Next, identify the model entities in the text and replace the special words identified as model entities with a preset numeric string. On this basis, identify entities such as commodity entities, commodity category entities, brand entities, commodity attribute name entities and commodity attribute value entities. With this method, entities such as commodities and commodity attributes can be identified accurately, without interference from irrelevant keywords.

Description

Method for recognizing named entities
Technical field
The present invention relates to automatic human-machine question answering based on natural language. In particular, it relates to a method for recognizing named entities in an automatic question-answering system.
Background
Automatic human-machine question answering based on natural language is an important application of natural language understanding. Such a system is first specially processed for the knowledge base of a particular domain. A user can then ask questions in natural language through a browser, and the system automatically answers in multimedia form. The system can also compile relevant statistics on customer needs and offer suitable suggestions as circumstances require.
Named entity recognition is an important part of natural language understanding. It mainly consists of discovering and labeling the named entities in natural-language text. Semantic tagging replaces the named entities in natural language with machine-understandable information, most commonly an information code. For example, in the sentence "Is the Lenovo K900 in stock?", "Lenovo K900" is recognized and labeled as a commodity. At the same time, the commodity number of that commodity is labeled as "XXXXXXXXX".
Natural-language question-answering systems are widely used in e-commerce. In this field, commodities and commodity attributes are usually treated as named entities, and such entities must be identified accurately and efficiently from the natural language the user enters. The most common named entity recognition method at present uses a search engine directly. The user's natural-language input is segmented into words, and each segmentation result is used as a keyword to search the system database. The search results are then processed to identify the named entities in the input.
This method works well when the input contains no irrelevant keywords, and it can quickly determine which named entities to search for. When irrelevant keywords are present, however, it usually cannot recognize their meaning. It mistakenly searches with them as keywords and locates the wrong named entities.
Summary of the invention
In view of this, the main object of the present invention is to provide a named entity recognition method that identifies commodities and commodity attributes accurately and efficiently.
To achieve this object, the present invention proposes the following technical solution:
A method for recognizing named entities, comprising:
a. Take the sentence the user enters in the current session as the text to be processed. Identify the numbers and hyperlinks in the text that match preset rules, and replace the hyperlinks in the text with a preset hyperlink placeholder symbol;
b. Identify the special words in the text to be processed, and mark each run of consecutive special words separated only by spaces as one special word string. Special words consist of English letters, digits, and symbols other than the full stop and the comma;
c. Using each special word as a keyword, search the system's commodity brand and model database to identify the model entities in the text. Replace the special words identified as model entities in the text with a preset numeric string;
d. If the current session is not the first session, identify the commodity attribute name entities and commodity attribute value entities in the text according to the initial named entity determined during named entity recognition in the previous session. The initial named entity is a commodity entity or a commodity category entity;
e. Perform word segmentation on the text obtained in step c. Using each resulting word as an index, look up the system's brand and commodity category dictionaries to identify the brand entities and commodity category entities in the text;
f. According to preset keyword rules, determine the current keywords for commodity search from the entities identified so far. Search a preset commodity database with these keywords and select a preset number W of commodities from the results. Screen the W commodities according to the longest-common-substring principle, mark each commodity that passes screening as a commodity entity, and record its commodity number;
g. According to the commodity entities and commodity category entities identified so far, query the system's association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Identify the corresponding commodity attribute name entities and commodity attribute value entities;
h. Some special words may currently be unrelated to all of the identified entities. If so, use these unrelated special words to search the commodity database and the association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Identify the corresponding commodity attribute name entities and commodity attribute value entities;
i. Screen all entities identified so far to determine all named entities of this session. Determine the initial named entity for recognition in the next session.
In summary, the method proposed by the present invention first identifies the special words in the text to be processed. It then identifies the model entities and replaces the special words identified as model entities with a preset numeric string. On this basis it identifies commodity entities, commodity category entities, brand entities, commodity attribute name entities and commodity attribute value entities. Recognizing named entities in this way avoids interference from irrelevant keywords, so entities such as commodities and commodity attributes are identified accurately.
Brief description of the drawings
Fig. 1 is a flowchart of Embodiment 1 of the present invention.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
The core idea of the present invention is as follows. First identify the special words, then identify the model entities, and replace the special words identified as model entities in the text with a preset numeric string. On this basis, identify commodity entities, commodity category entities, brand entities, commodity attribute name entities and commodity attribute value entities. Recognition is then free from interference by irrelevant keywords, and entities such as commodities and commodity attributes are identified accurately.
Fig. 1 is a flowchart of Embodiment 1 of the present invention. As shown in Fig. 1, the named entity recognition method of this embodiment mainly comprises the following steps:
Step 101: take the sentence the user enters in the current session as the text to be processed. Identify the numbers and hyperlinks in the text that match preset rules, and replace the hyperlinks in the text with a preset hyperlink placeholder symbol.
The numbers and hyperlinks that match the preset rules are numbers that match the commodity-number rule and hyperlinks that match the commodity-page rule.
In this step the hyperlinks in the text are replaced with the preset hyperlink placeholder. This lets later steps recognize entities described in text, such as commodity attribute name entities and commodity attribute value entities.
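The patent does not publish its commodity-number or commodity-page rules. The Python sketch below of step 101 therefore uses purely hypothetical patterns: 8 to 10 digit standalone numbers, links of the form `item.example.com/<number>.html`, and a `<LINK>` placeholder. It shows the order of operations only. Page links are recognized first, every hyperlink is replaced, and bare commodity numbers are then found in the cleaned text.

```python
import re

# Hypothetical rules: the patent does not publish its actual patterns.
# Any http(s) link; the character class is ASCII-only so adjacent Chinese text is not swallowed.
HYPERLINK = re.compile(r"https?://[A-Za-z0-9./?=&%_:#-]+")
# Assumed commodity-page rule: item.example.com/<commodity number>.html
COMMODITY_PAGE = re.compile(r"https?://item\.example\.com/(\d+)\.html")
# Assumed commodity-number rule: a standalone run of 8 to 10 digits
COMMODITY_NUMBER = re.compile(r"(?<![A-Za-z0-9])\d{8,10}(?![A-Za-z0-9])")
LINK_PLACEHOLDER = "<LINK>"  # hypothetical preset hyperlink placeholder


def step_101(text):
    """Return (text with hyperlinks replaced, bare commodity numbers, numbers taken from page links)."""
    page_numbers = []
    for url in HYPERLINK.findall(text):
        match = COMMODITY_PAGE.fullmatch(url)
        if match:  # the link matches the (assumed) commodity-page rule
            page_numbers.append(match.group(1))
    replaced = HYPERLINK.sub(LINK_PLACEHOLDER, text)
    numbers = COMMODITY_NUMBER.findall(replaced)
    return replaced, numbers, page_numbers
```

For example, `step_101("这个 https://item.example.com/100012345.html 和 100067890 有货吗")` replaces the link with the placeholder. It reports `100067890` as a bare commodity number and `100012345` as a number taken from a page link.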
Step 102: identify the special words in the text to be processed, and mark each run of consecutive special words separated only by spaces as one special word string.
Special words consist of English letters, digits, and symbols other than the full stop and the comma.
Note that in the present invention the special words in the text must be identified first. Entities in the text, such as model entities, commodity attribute name entities and commodity attribute value entities, can then be identified accurately on this basis.
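The Python sketch below illustrates step 102. It reads "special word" as a maximal run of special characters, for example `K900`. That reading is an assumption, as is treating ASCII letters, digits and ASCII punctuation other than `.` and `,` as the special characters. Runs separated only by spaces are then merged into one special word string.

```python
import string
from itertools import groupby

# Assumption: ASCII letters, digits and ASCII symbols are special characters,
# except the full stop and the comma; a "special word" is a maximal run of them.
SPECIAL_CHARS = set(string.ascii_letters + string.digits + string.punctuation) - {".", ","}


def special_words(text):
    """Return (start, end, word) for each maximal run of special characters."""
    words, pos = [], 0
    for is_special, run in groupby(text, key=lambda ch: ch in SPECIAL_CHARS):
        run = "".join(run)
        if is_special:
            words.append((pos, pos + len(run), run))
        pos += len(run)
    return words


def special_word_strings(text):
    """Merge special words separated only by spaces into one special word string."""
    strings = []
    for start, end, word in special_words(text):
        gap = text[strings[-1][1]:start] if strings else ""
        if gap and set(gap) == {" "}:  # only spaces between this word and the previous string
            first = strings[-1][0]
            strings[-1] = (first, end, text[first:end])
        else:
            strings.append((start, end, word))
    return strings
```

On `"联想 K900 i 和 Galaxy S4 哪个好"` this finds the special words `K900`, `i`, `Galaxy` and `S4`, and the special word strings `"K900 i"` and `"Galaxy S4"`.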
Step 103: using each special word as a keyword, search the system's commodity brand and model database to identify the model entities in the text. Replace the special words identified as model entities in the text with a preset numeric string.
Preferably, this step identifies the model entities in the text as follows:
Using each special word as a keyword, search the system's commodity brand and model database. Match each piece of model information found against the text as whole words, and mark the special words that match the model information as model entities. For each model entity, record the special words used to find it in the entity's associated-character list attribute, and record the model entity in the associated-entity list attribute of each of those special words.
These two records link the special words with the related model entities.
After the model entities in the text have been identified, the special words associated with model entities are replaced in the text with the preset numeric string. Later steps recognize entities described in text, such as commodity attribute name entities and commodity attribute value entities, and the replacement keeps model entities from interfering with them. This improves recognition accuracy.
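The Python sketch below shows step 103 and the two-way links it records. Several names are hypothetical: the in-memory `MODEL_DB`, the `Entity` class, the `links` dictionary (special word to associated-entity list) and the `"000000"` numeric string. The patent only says that a brand and model database is searched and that a preset numeric string is used.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: two entities with the same name stay distinct
class Entity:
    kind: str                                  # "model", "brand", "category", ...
    name: str
    chars: list = field(default_factory=list)  # associated-character list


# Hypothetical brand-and-model database: special word -> model names it retrieves.
MODEL_DB = {"K900": ["K900"], "S4": ["Galaxy S4"]}
NUMERIC_STRING = "000000"  # hypothetical preset numeric string


def whole_word(term):
    """Pattern matching `term` only when it is not embedded in a longer ASCII token."""
    return re.compile(rf"(?<![A-Za-z0-9]){re.escape(term)}(?![A-Za-z0-9])")


def recognize_models(text, special_words, links):
    """Step 103 sketch; `links` maps special word -> associated-entity list."""
    models, seen = [], set()
    for word in special_words:
        for model in MODEL_DB.get(word, []):
            if model in seen or not whole_word(model).search(text):
                continue  # model information must match the text as whole words
            seen.add(model)
            entity = Entity("model", model, [w for w in model.split() if w in special_words])
            for w in entity.chars:  # two-way link: word <-> model entity
                links.setdefault(w, []).append(entity)
            models.append(entity)
    for word, entities in links.items():  # replace every special word linked to a model
        if any(e.kind == "model" for e in entities):
            text = whole_word(word).sub(NUMERIC_STRING, text)
    return text, models
```

With the toy database, `"联想 K900 有货吗"` becomes `"联想 000000 有货吗"`, and `links["K900"]` points back at the new model entity.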
Step 104: if the current session is not the first session, identify the commodity attribute name entities and commodity attribute value entities in the text according to the initial named entity determined during named entity recognition in the previous session. The initial named entity is a commodity entity or a commodity category entity.
To reduce interference from irrelevant keywords, this step first recognizes commodity attribute name entities and commodity attribute value entities using the commodity entity or commodity category entity identified in the previous session.
Preferably, this step identifies the commodity attribute name entities and commodity attribute value entities in the text as follows:
Step 1041: using the initial named entity as a keyword, search the system's association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Obtain the commodity attribute names and values associated with the initial named entity, together with the vocabulary associated with each of these attribute names and values.
Here the vocabulary includes synonyms and partial words. As in existing systems, a partial word is a word contained within another word, that is, a part of that word.
Step 1042: match each term in the obtained attribute names, attribute values and vocabulary against the text. For each term that matches in the text, record two things. First, record the attribute name or attribute value entity to which the term's vocabulary belongs in the associated-entity list attribute of each character forming the term. Second, record the characters forming the term in the associated-character list attribute of that attribute name or attribute value entity.
Recognizing commodity attribute name and value entities in this way keeps named entity recognition from being disturbed by other characters that are not part of entity names.
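A minimal Python sketch of steps 1041 and 1042 follows. It assumes a toy `ASSOC_DB` that maps an anchor entity such as the category "手机" (mobile phone) to attribute entities and their vocabularies. It also keys the associated-entity lists by character position in the text. Both the database layout and the position keys are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    kind: str
    name: str
    chars: list = field(default_factory=list)  # associated-character list (text positions)


# Hypothetical association database: anchor entity -> {(kind, name): vocabulary}.
# Vocabularies hold synonyms and partial words, as step 1041 describes.
ASSOC_DB = {
    "手机": {
        ("attr_name", "屏幕尺寸"): ["屏幕", "尺寸"],
        ("attr_value", "黑色"): ["黑"],
    },
}


def recognize_attributes(text, anchor, links):
    """Steps 1041-1042 sketch; `links` maps text position -> associated-entity list."""
    found = []
    for (kind, name), vocab in ASSOC_DB.get(anchor, {}).items():  # step 1041
        entity = Entity(kind, name)
        for term in [name, *vocab]:                                # step 1042
            start = text.find(term)
            while start != -1:
                for pos in range(start, start + len(term)):
                    if pos not in entity.chars:  # record each character once
                        entity.chars.append(pos)
                        links.setdefault(pos, []).append(entity)
                start = text.find(term, start + 1)
        if entity.chars:
            found.append(entity)
    return found
```

In `"黑色的屏幕多大"` ("how big is the black one's screen"), the partial word 屏幕 links positions 3 and 4 to the attribute name 屏幕尺寸 (screen size). The value 黑色 (black) claims positions 0 and 1.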
Step 105: perform word segmentation on the text obtained in step 103. Using each resulting word as an index, look up the system's brand and commodity category dictionaries to identify the brand entities and commodity category entities in the text.
Preferably, this step is implemented as follows:
Segment the text obtained in step 103 into words. Using each resulting word as an index, look up the system's brand and commodity category dictionaries to identify the brand entities and commodity category entities in the text. For each brand entity and commodity category entity, record the characters of the word that identified it in the entity's associated-character list attribute. Also record the entity in the associated-entity list attribute of each of those characters.
The specific word segmentation method is the same as in the prior art and is not described further here.
Preferably, this step may further include tagging each word obtained by segmentation with its part of speech.
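The Python sketch below shows the dictionary lookup of step 105. Segmentation is left to any existing segmenter, since the patent defers to the prior art, so the function simply takes its output as a list of words. `BRANDS` and `CATEGORIES` are hypothetical stand-ins for the system dictionaries, and character positions serve as the keys of the associated-entity lists.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    kind: str
    name: str
    chars: list = field(default_factory=list)  # associated-character list (text positions)


BRANDS = {"联想", "三星"}          # hypothetical brand dictionary
CATEGORIES = {"手机", "平板电脑"}  # hypothetical commodity category dictionary


def recognize_brands_and_categories(words, links):
    """`words` is a segmenter's output, in order, concatenating back to the text."""
    entities, pos = [], 0
    for word in words:
        kind = "brand" if word in BRANDS else "category" if word in CATEGORIES else None
        if kind:
            entity = Entity(kind, word, list(range(pos, pos + len(word))))
            for p in entity.chars:  # link each character back to the entity
                links.setdefault(p, []).append(entity)
            entities.append(entity)
        pos += len(word)
    return entities
```

For the step 103 output `"联想 000000 手机有货吗"` segmented as `["联想", " ", "000000", " ", "手机", "有货", "吗"]`, this yields a brand entity at positions 0 and 1 and a category entity at positions 10 and 11.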
Step 106: according to preset keyword rules, determine the current keywords for commodity search from the entities identified so far. Search the preset commodity database with these keywords and select a preset number W of commodities from the results. Screen the W commodities according to the longest-common-substring principle, mark each commodity that passes screening as a commodity entity, and record its commodity number.
The keywords determined in step 106 exclude one kind of special word. It is a special word whose associated-entity list records a commodity attribute name entity or commodity attribute value entity, but records neither a brand entity nor a model entity.
Preferably, the keyword rules include the following:
For each word containing a character that satisfies a preset first condition, combine the word with each special word whose associated-entity list records a model entity. Use each combination as a current keyword for commodity search. The first condition is that the character's associated-entity list records a brand entity;
For each word containing a character that satisfies the first condition, take everything in the original text that starts with this word and ends with a special word string. Use it as a current keyword for commodity search;
For each word containing a character that satisfies the first condition, take everything in the original text that starts with this word and ends with a word containing a character that satisfies a preset second condition. Use it as a current keyword for commodity search. The second condition is that the character's associated-entity list records a commodity category entity;
Take each special word whose associated-entity list records neither a commodity attribute name entity nor a commodity attribute value entity. If, in the original text, it is immediately followed by a word containing a character that satisfies the second condition, combine it with that following word. Use the combination as a current keyword for commodity search;
Use each special word whose associated-entity list records a model entity as a current keyword for commodity search.
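The Python sketch below gives one possible reading of the five keyword rules. It simplifies the text to a list of hypothetical `Token` records in original-text order. Each record carries the entity kinds found in its characters' associated-entity lists, and a `special` flag marks special words and special word strings. Several choices are assumptions: rule 2 stops at the first following special word string, rule 3 stops at the first following category word, and rule 1 joins brand and model with a space.

```python
from dataclasses import dataclass

ATTRIBUTE_KINDS = {"attr_name", "attr_value"}


@dataclass
class Token:
    text: str            # surface form in the ORIGINAL text (before step 103 replacement)
    kinds: frozenset     # entity kinds found in the associated-entity lists of its characters
    special: bool = False


def search_keywords(tokens):
    """Hypothetical reading of the five keyword rules of step 106."""
    keywords = []

    def add(keyword):
        if keyword not in keywords:
            keywords.append(keyword)

    model_words = [t for t in tokens if t.special and "model" in t.kinds]
    for i, tok in enumerate(tokens):
        if "brand" in tok.kinds:                             # first condition
            for model in model_words:                        # rule 1: brand word + model word
                add(f"{tok.text} {model.text}")
            for j in range(i + 1, len(tokens)):              # rule 2: up to next special word string
                if tokens[j].special:
                    add("".join(t.text for t in tokens[i:j + 1]))
                    break
            for j in range(i + 1, len(tokens)):              # rule 3: up to next category word
                if "category" in tokens[j].kinds:
                    add("".join(t.text for t in tokens[i:j + 1]))
                    break
        if tok.special and not (tok.kinds & ATTRIBUTE_KINDS):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is not None and "category" in nxt.kinds:  # rule 4: special word + category word
                add(tok.text + nxt.text)
        if tok.special and "model" in tok.kinds:             # rule 5: model word alone
            add(tok.text)
    return keywords
```

For "联想K900手机有货吗" (brand, model, category, then filler), the sketch produces `["联想 K900", "联想K900", "联想K900手机", "K900手机", "K900"]`.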
In practice, any W commodities may be selected from the search results. The specific value of W can be set by those skilled in the art according to actual performance requirements such as algorithmic complexity.
Preferably, this step screens the W commodities, marks each commodity that passes screening as a commodity entity, and records its commodity number as follows:
For each of the W commodities, match the commodity's name against the original text to obtain the longest common substring of the commodity name and the text;
Take the commodity whose longest common substring is the longest among all W commodities. Record each character and special word in that longest common substring in the associated-character list attribute of the corresponding commodity entity. Also record that commodity entity in the associated-entity list attribute of each of those characters and special words.
Step 107: according to the commodity entities and commodity category entities identified so far, query the system's association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Identify the corresponding commodity attribute name entities and commodity attribute value entities.
Preferably, this step is implemented as follows:
Commodity-related entities comprise commodity entities and commodity category entities. Take each commodity-related entity identified so far and use it as a keyword to search the system's association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Obtain the commodity attribute names and values associated with the entity, together with the vocabulary associated with each of them. The vocabulary includes synonyms and partial words;
Match each term in the obtained attribute names, attribute values and vocabulary against the text. For each term that matches in the text, record two things. First, record the attribute name or attribute value entity to which the term's vocabulary belongs in the associated-entity list attribute of each character forming the term. Second, record the characters forming the term in the associated-character list attribute of that attribute name or attribute value entity.
Step 108: some special words may currently be unrelated to all of the identified entities. If so, use these unrelated special words to search the commodity database and the association database of commodities, commodity categories, commodity attribute names and commodity attribute values. Identify the corresponding commodity attribute name entities and commodity attribute value entities.
Here, "all of the identified entities" means all entities identified so far. A special word unrelated to all of them is one that has not been associated with any entity, that is, a special word whose associated-entity list is empty. This step uses such not-yet-associated special words to recognize further commodity attribute name entities and commodity attribute value entities. This ensures that all relevant attribute name and value entities in the text are identified accurately.
Preferably, this step is implemented as follows:
Step 1081: for each special word whose associated-entity list is currently empty, check whether the text contains words that include a character satisfying the preset second condition. If it does, combine each such word with the special word and use each combination as a current commodity query keyword. Otherwise, use the special word itself as the current commodity query keyword. The second condition is that the character's associated-entity list records a commodity category entity.
Step 1082: search the commodity database with the current commodity query keywords, and select a preset number Q of commodities from the results.
In practice, the Q commodities may be selected in any manner.
The specific value of Q can be set by those skilled in the art according to actual performance requirements such as algorithmic complexity.
Step 1083: for each of the Q commodities, match the commodity name against the original text to obtain the longest common substring of the commodity name and the text.
Step 1084: take the commodity whose longest common substring is the longest among the Q commodities. Use it to query the system's association database of commodities, commodity categories, commodity attribute names and commodity attribute values, and identify the corresponding commodity attribute name entities and commodity attribute value entities.
Step 109: screen all entities identified so far to determine all named entities of this session. Determine the initial named entity for recognition in the next session.
In this step, the screening can be performed as follows:
Step 1091: build an entity candidate set from all entities identified so far.
Step 1092: sort all entities in the candidate set in descending order of the number of characters and special words recorded in their associated-character lists.
Step 1093: from the candidate set, select the highest-ranked entity that has not yet been selected as the current screening reference entity.
Step 1094: look at each character and special word in the associated-character list of the current screening reference entity. Delete from the candidate set every entity, other than the reference entity, that is recorded in its associated-entity list.
Step 1095: check whether the candidate set still contains entities that have not been selected. If it does, return to step 1093. Otherwise, take all entities remaining in the candidate set as the named entities of this session.
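Steps 1091 to 1095 amount to a greedy conflict resolution: entities covering more characters win, and any rival sharing a character with a winner is dropped. A minimal Python sketch follows. The `Entity` class and the position-keyed `links` mapping are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # hash/eq by identity, so entities can live in sets
class Entity:
    kind: str
    name: str
    chars: list = field(default_factory=list)  # associated-character list


def screen_entities(entities, links):
    """Steps 1091-1095 sketch; `links` maps character position -> associated-entity list."""
    candidates = sorted(entities, key=lambda e: len(e.chars), reverse=True)  # 1091-1092 (stable)
    selected = set()
    while True:
        reference = next((e for e in candidates if e not in selected), None)  # 1093
        if reference is None:
            return candidates                                                  # 1095
        selected.add(reference)
        rivals = {o for c in reference.chars for o in links.get(c, []) if o is not reference}
        candidates = [e for e in candidates if e not in rivals]                # 1094
```

For example, suppose the attribute value 黑色 (black, positions 0 and 1) and the attribute name 颜色 (color), matched only through its partial word 色 at position 1, both claim position 1. The longer 黑色 is kept and 颜色 is deleted.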
Preferably, determining in step 109 the initial named entity for recognition in the next session comprises:
If the named entities of this session determined in step 109 include a commodity entity, take the commodity entity that appears last in the text as the initial named entity for the next session. Otherwise, take the commodity category entity that appears last in the text as that initial named entity.
The initial named entity for the next session is determined in order to improve the efficiency and accuracy of named entity recognition in that session.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
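The Python sketch below picks the initial named entity for the next session. It assumes that "appears last" means the entity whose recorded character positions reach furthest into the text. The patent does not define this precisely, and the `Entity` class is again illustrative.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    kind: str
    name: str
    chars: list = field(default_factory=list)  # character positions in the text


def next_initial_entity(named_entities):
    """Last-occurring commodity entity, else last commodity category entity, else None."""
    for kind in ("commodity", "category"):
        hits = [e for e in named_entities if e.kind == kind and e.chars]
        if hits:
            return max(hits, key=lambda e: max(e.chars))  # "last" = largest position (assumed)
    return None
```

A commodity entity always takes precedence. A category entity is only used when the session produced no commodity entity at all.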

Claims (12)

1. a recognition methods for named entity, is characterized in that, comprising:
A, statement that user is inputted in current sessions, as pending text, are identified meeting numeral and the hyperlink of preset rules in described pending text, and the hyperlink in described pending text are replaced with to default hyperlink substitute symbol;
B, the special word in described pending text is identified, and only with all special word at interval, space, be labeled as a special word string by continuous, described special word comprises English character, numeral and the symbol except fullstop and comma;
C, to take special word described in each be respectively keyword, the commodity brand of search system and model database, model entity in described pending text is identified, and with default numeric string by identified in described pending text be that the special word of model entity is replaced;
If the non-session first of d current sessions,, according to the initial named entity of determining in the named entity recognition process of last session, identifies the item property name entity in described pending text and item property value entity; Described initial named entity is commodity entity or commodity classification entity;
E, the described pending text obtaining in step c is carried out to participle; And to take each word obtaining after participle be index, the brand of seeking system and commodity classification dictionary, identify the brand entity in described pending text and commodity classification entity;
F, regular according to default keyword, according to the current described entity identifying, determines the current keyword for commercial articles searching; Use described keyword, search for default merchandising database, and from searched for commodity, select default W commodity; According to maximum public substring principle, a described W commodity are screened, by each indication of goods screening, be commodity entity, and record the goods number of described commodity entity;
G, according to the current described commodity entity having identified and described commodity classification entity, the linked database of the commodity of inquiry system, commodity classification, item property name and item property value, identifies corresponding item property name entity and item property value entity;
If the special word that the current existence of h and all described entities are irrelevant, utilize described irrelevant special word, search for the linked database of described merchandising database and described commodity, commodity classification, item property name and item property value, identify corresponding item property name entity and item property value entity;
I, the current all entities that identified are screened, determine all named entities of this session; And determine the initial named entity for session next time identification.
2. method according to claim 1, is characterized in that, described in meet preset rules numeral and hyperlink be: meet the numeral of goods number rule and meet the hyperlink of commodity page rule.
3. method according to claim 1, is characterized in that, in step c, the model entity in described pending text is identified and is comprised:
The special word described in each of take is respectively keyword, the commodity brand of search system and model database; The type information searching is carried out to full word with described pending text and mate, the special word mating with described type information is labeled as to model entity; For model entity described in each, in the conjunctive word list attribute of this model entity, record is used for searching the special word of this model entity, and in the associated entity list attribute of this special word, records this model entity.
4. method according to claim 1, is characterized in that, in steps d, the item property name entity in described pending text and item property value entity is identified and is comprised:
The described initial named entity of take is keyword, the linked database of the commodity of seeking system, commodity classification, item property name and item property value, obtain item property name and the item property value of this initial named entity association, and obtaining the vocabulary of described item property name and each auto correlation of item property value, described vocabulary comprises synonym and part word;
Each word in obtained item property name, item property value and described vocabulary is mated with described pending text respectively, for each word that can mate in described pending text, in the associated entity list attribute of each word that forms this word, record this word place vocabulary affiliated item property name or item property value entity, and in item property name or the conjunctive word list attribute in item property value entity under this word place vocabulary, record forms the word of this word.
5. The method according to claim 1, wherein step e comprises:
Performing word segmentation on the pending text obtained in step c; using each word obtained by segmentation as an index into the brand and commodity classification dictionaries of the system to identify the brand entities and commodity classification entities in the pending text; for each brand entity and commodity classification entity, recording the characters of the word that identified the entity in the conjunctive word list attribute of that entity, and recording the entity in the associated entity list attribute of each of those characters.
6. The method according to claim 5, wherein step e further comprises: tagging each word obtained after segmentation with its part of speech.
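Claims 5 and 6 together can be sketched in Python as shown below. The claims do not name a segmenter, so a simple forward-maximum-matching segmenter stands in for it. The dictionaries, the part-of-speech lexicon and the fallback tag `"x"` are hypothetical.

```python
from collections import defaultdict


def fmm_segment(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: a simple stand-in for the system's segmenter."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in vocab:   # single characters always match
                words.append(text[i:i + n])
                i += n
                break
    return words


def identify_brand_category(text, vocab, brands, categories, pos_lexicon=None):
    """Claim 5: tag brand/category entities; claim 6: attach a part of speech."""
    entity_chars = defaultdict(list)   # entity -> conjunctive word list (positions)
    char_entities = defaultdict(list)  # character position -> associated entity list
    tagged, offset = [], 0
    for word in fmm_segment(text, vocab):
        positions = list(range(offset, offset + len(word)))
        for kind, dictionary in (("brand", brands), ("category", categories)):
            if word in dictionary:
                entity = (kind, word)
                entity_chars[entity].extend(positions)
                for p in positions:
                    char_entities[p].append(entity)
        tag = (pos_lexicon or {}).get(word, "x")  # hypothetical "unknown" tag
        tagged.append((word, tag))
        offset += len(word)
    return tagged, dict(entity_chars), dict(char_entities)
```

A production system would use a trained segmenter and tagger. Maximum matching is shown only because it is short and deterministic.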
7. The method according to claim 1, wherein the keywords determined in step f do not include special words whose associated entity list attribute records an item property name entity or an item property value entity but records neither a brand entity nor a model entity;
The keyword rules comprise:
For each word containing a character that meets a preset first condition, combining this word with each special word whose associated entity list attribute records a model entity, and using each combination as a current keyword for commodity search; the first condition is that the associated entity list attribute records a brand entity;
For each word containing a character that meets the first condition, taking the content of the original pending text that starts with this word and ends with a special word string as a current keyword for commodity search;
For each word containing a character that meets the first condition, taking the content of the original pending text that starts with this word and ends with a word containing a character that meets a preset second condition as a current keyword for commodity search; the second condition is that the associated entity list attribute records a commodity classification entity;
For each special word whose associated entity list attribute records neither an item property name entity nor an item property value entity, if in the original pending text this special word is immediately followed by a word containing a character that meets the second condition, combining this special word with that following word as a current keyword for commodity search;
Taking each special word whose associated entity list attribute records a model entity as a current keyword for commodity search.
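The five keyword rules can be sketched in Python over a token sequence in text order. Several details are interpretive assumptions. Each `Token.kinds` set stands for the entity kinds recorded on its characters' associated entity lists. Rules 2 and 3 stop at the nearest qualifying token. The exclusion clause is read as barring such special words from being a keyword or a rule-2 end point on their own.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    special: bool = False                     # special word vs. ordinary segmented word
    kinds: set = field(default_factory=set)   # entity kinds on its associated entity lists


def build_keywords(tokens: list[Token]) -> list[str]:
    def attr(t):
        return bool(t.kinds & {"attr_name", "attr_value"})

    def excluded(t):
        # Special words tied to attribute entities but to no brand/model entity.
        return t.special and attr(t) and not (t.kinds & {"brand", "model"})

    keywords: list[str] = []

    def add(kw: str) -> None:
        if kw not in keywords:
            keywords.append(kw)

    models = [t for t in tokens if t.special and "model" in t.kinds]
    ends = (lambda u: u.special and not excluded(u),   # rule 2: up to a special word
            lambda u: "category" in u.kinds)           # rule 3: up to a category word
    for i, t in enumerate(tokens):
        if "brand" not in t.kinds:          # first condition: brand entity recorded
            continue
        for m in models:                    # rule 1: brand word + each model special word
            add(t.text + m.text)
        for is_end in ends:
            j = next((j for j in range(i + 1, len(tokens)) if is_end(tokens[j])), None)
            if j is not None:
                add("".join(u.text for u in tokens[i:j + 1]))
    for t, nxt in zip(tokens, tokens[1:]):  # rule 4: special word + following category word
        if t.special and not attr(t) and "category" in nxt.kinds:
            add(t.text + nxt.text)
    for m in models:                        # rule 5: each model special word on its own
        add(m.text)
    return keywords
```

For "联想X1笔记本黑色", this yields 联想X1, 联想X1笔记本, X1笔记本 and X1. Duplicates produced by different rules are kept only once.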
8. The method according to claim 1, wherein, in step f, screening the W commodities according to the longest common substring principle, designating each commodity that passes the screening as a commodity entity, and recording the goods number of the commodity entity comprises:
For each of the W commodities, matching the trade name of the commodity against the original pending text to obtain the longest common substring of that trade name;
For the commodity whose longest common substring is the longest among all W commodities, recording each character and special word contained in its longest common substring in the conjunctive word list attribute of the corresponding commodity entity, and recording that commodity entity in the associated entity list attribute of each of those characters and special words.
9. The method according to claim 1, wherein step g comprises:
For each commodity-related entity identified so far, the commodity-related entities comprising commodity entities and commodity classification entities: taking the commodity-related entity as a keyword, searching the linked database of commodities, commodity classifications, item property names and item property values of the system to obtain the item property names and item property values associated with this commodity-related entity, and obtaining the vocabulary associated with each of these item property names and item property values, the vocabulary comprising synonyms and partial words;
Matching each word in the obtained item property names, item property values and vocabularies against the pending text; for each word that is matched in the pending text, recording, in the associated entity list attribute of each character that forms the word, the item property name or item property value entity to which the word's vocabulary belongs, and recording the characters that form the word in the conjunctive word list attribute of that item property name or item property value entity.
10. The method according to claim 1, wherein step h comprises:
h1. For each special word whose associated entity list attribute is currently empty: if the pending text contains words containing a character that meets a preset second condition, combining each such word with this special word and using each combination as a current commodity query keyword; otherwise, using this special word as the current commodity query keyword; the second condition is that the associated entity list attribute records a commodity classification entity;
h2. Searching the commodity database with the current commodity query keywords, and selecting a preset number Q of commodities from the search results;
h3. For each of the Q commodities, matching the trade name of the commodity against the original pending text to obtain the longest common substring of that trade name;
h4. Using the commodity whose longest common substring is the longest among all Q commodities, querying the linked database of commodities, commodity classifications, item property names and item property values of the system to identify the corresponding item property name entities and item property value entities.
11. The method according to claim 1, wherein the screening in step i comprises:
Step i1: building an entity candidate set from all entities identified so far;
Step i2: sorting all entities in the entity candidate set in descending order of the number of characters and special words recorded in their conjunctive word list attributes;
Step i3: selecting, from the entity candidate set, the frontmost entity that has not yet been selected as the current screening reference entity;
Step i4: for each character and special word in the conjunctive word list attribute of the current screening reference entity, deleting from the entity candidate set every entity, other than the screening reference entity, that is recorded in its associated entity list attribute;
Step i5: determining whether the entity candidate set still contains an entity that has not been selected; if so, returning to step i3; otherwise, determining all entities currently in the entity candidate set as the named entities of this session.
12. The method according to claim 1, wherein determining, in step i, the initial named entity for identification in the next session comprises:
If the named entities of this session determined in step i include a commodity entity, taking the commodity entity that appears last in the pending text as the initial named entity for identification in the next session; otherwise, taking the commodity classification entity that appears last in the pending text as the initial named entity for identification in the next session.
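The carry-over rule can be sketched in Python as follows. The `(kind, name, position)` triple format is an assumption. The claim does not cover a session with neither kind of entity, so the sketch simply returns `None` in that case.

```python
def next_initial_entity(entities):
    """entities: (kind, name, position) triples for this session's named entities,
    where position is where the entity occurs in the pending text."""
    for kind in ("commodity", "category"):      # commodity entities take priority
        hits = [e for e in entities if e[0] == kind]
        if hits:
            return max(hits, key=lambda e: e[2])[1]   # the one occurring last
    return None   # neither kind present: the claim does not say what happens here
```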
CN201310674046.7A 2013-12-11 2013-12-11 Identifying method of named entities Active CN103631948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310674046.7A CN103631948B (en) 2013-12-11 2013-12-11 Identifying method of named entities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310674046.7A CN103631948B (en) 2013-12-11 2013-12-11 Identifying method of named entities

Publications (2)

Publication Number Publication Date
CN103631948A true CN103631948A (en) 2014-03-12
CN103631948B CN103631948B (en) 2017-01-11

Family

ID=50212989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310674046.7A Active CN103631948B (en) 2013-12-11 2013-12-11 Identifying method of named entities

Country Status (1)

Country Link
CN (1) CN103631948B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331395A (en) * 2014-10-28 2015-02-04 北京京东尚科信息技术有限公司 Method and device for identifying Chinese product name from text
CN104572625A (en) * 2015-01-21 2015-04-29 北京云知声信息技术有限公司 Recognition method of named entity
CN104657514A (en) * 2015-03-24 2015-05-27 成都知数科技有限公司 Synonym identification method based on electronic commerce user behavior data
CN104750795A (en) * 2015-03-12 2015-07-01 北京云知声信息技术有限公司 Intelligent semantic searching system and method
CN104978356A (en) * 2014-04-10 2015-10-14 阿里巴巴集团控股有限公司 Synonym identification method and device
CN105320674A (en) * 2014-07-03 2016-02-10 腾讯科技(深圳)有限公司 Method and device for establishing domain ontology base and server
CN105574111A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Enterprise entity authentication method based on enterprise attribute library
WO2016154866A1 (en) * 2015-03-31 2016-10-06 王志强 Method for displaying commercial uses when searching for trademarks, and information alert system
WO2017028422A1 (en) * 2015-08-20 2017-02-23 小米科技有限责任公司 Knowledge base construction method and apparatus
CN106547733A (en) * 2016-10-19 2017-03-29 中国国防科技信息中心 A kind of name entity recognition method towards particular text
WO2017092556A1 (en) * 2015-12-01 2017-06-08 北京国双科技有限公司 Method and device for automatically judging judgement result of judgement document
CN106997390A (en) * 2017-04-05 2017-08-01 安徽机器猫电子商务股份有限公司 A kind of equipment part or parts commodity transaction information search method
CN107944025A (en) * 2017-12-12 2018-04-20 北京百度网讯科技有限公司 Information-pushing method and device
CN109726612A (en) * 2017-10-27 2019-05-07 北京搜狗科技发展有限公司 A kind of recognition methods, device and device for identification
CN109740159A (en) * 2018-12-29 2019-05-10 北京泰迪熊移动科技有限公司 For naming the processing method and processing device of Entity recognition
CN109933772A (en) * 2017-12-15 2019-06-25 Tcl集团股份有限公司 Semantic analysis and terminal device
CN110209812A (en) * 2019-05-07 2019-09-06 北京地平线机器人技术研发有限公司 File classification method and device
CN111178080A (en) * 2020-01-02 2020-05-19 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
CN111723575A (en) * 2020-06-12 2020-09-29 杭州未名信科科技有限公司 Method, device, electronic equipment and medium for recognizing text

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN109919175B (en) * 2019-01-16 2020-10-23 浙江大学 Entity multi-classification method combined with attribute information

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2005022408A1 (en) * 2003-08-28 2005-03-10 British Telecommunications Public Limited Company Method and apparatus for storing and retrieving data using ontologies
EP2043004A1 (en) * 2007-09-24 2009-04-01 Martin Bode Database system and method for collecting, storing and outputting data records
CN102262634A (en) * 2010-05-24 2011-11-30 北京大学深圳研究生院 Automatic questioning and answering method and system
CN102722558A (en) * 2012-05-29 2012-10-10 百度在线网络技术(北京)有限公司 User question recommending method and device

Cited By (29)

Publication number Priority date Publication date Assignee Title
CN104978356A (en) * 2014-04-10 2015-10-14 阿里巴巴集团控股有限公司 Synonym identification method and device
CN104978356B (en) * 2014-04-10 2019-09-06 阿里巴巴集团控股有限公司 A kind of recognition methods of synonym and device
CN105320674B (en) * 2014-07-03 2020-05-12 腾讯科技(深圳)有限公司 Method and device for establishing domain ontology base and server
CN105320674A (en) * 2014-07-03 2016-02-10 腾讯科技(深圳)有限公司 Method and device for establishing domain ontology base and server
CN104331395A (en) * 2014-10-28 2015-02-04 北京京东尚科信息技术有限公司 Method and device for identifying Chinese product name from text
CN104331395B (en) * 2014-10-28 2017-11-03 北京京东尚科信息技术有限公司 The method and apparatus that Chinese trade name is recognized from text
CN104572625A (en) * 2015-01-21 2015-04-29 北京云知声信息技术有限公司 Recognition method of named entity
CN104750795B (en) * 2015-03-12 2017-09-01 北京云知声信息技术有限公司 A kind of intelligent semantic searching system and method
CN104750795A (en) * 2015-03-12 2015-07-01 北京云知声信息技术有限公司 Intelligent semantic searching system and method
CN104657514B (en) * 2015-03-24 2018-05-25 成都知数科技有限公司 Near synonym recognition methods based on electric business user behavior data
CN104657514A (en) * 2015-03-24 2015-05-27 成都知数科技有限公司 Synonym identification method based on electronic commerce user behavior data
WO2016154866A1 (en) * 2015-03-31 2016-10-06 王志强 Method for displaying commercial uses when searching for trademarks, and information alert system
WO2017028422A1 (en) * 2015-08-20 2017-02-23 小米科技有限责任公司 Knowledge base construction method and apparatus
US10331648B2 (en) 2015-08-20 2019-06-25 Xiaomi Inc. Method, device and medium for knowledge base construction
WO2017092556A1 (en) * 2015-12-01 2017-06-08 北京国双科技有限公司 Method and device for automatically judging judgement result of judgement document
CN105574111A (en) * 2015-12-10 2016-05-11 天津海量信息技术有限公司 Enterprise entity authentication method based on enterprise attribute library
CN106547733A (en) * 2016-10-19 2017-03-29 中国国防科技信息中心 A kind of name entity recognition method towards particular text
CN106997390A (en) * 2017-04-05 2017-08-01 安徽机器猫电子商务股份有限公司 A kind of equipment part or parts commodity transaction information search method
CN109726612A (en) * 2017-10-27 2019-05-07 北京搜狗科技发展有限公司 A kind of recognition methods, device and device for identification
CN109726612B (en) * 2017-10-27 2021-04-16 北京搜狗科技发展有限公司 Identification method and device for identification
CN107944025A (en) * 2017-12-12 2018-04-20 北京百度网讯科技有限公司 Information-pushing method and device
CN109933772A (en) * 2017-12-15 2019-06-25 Tcl集团股份有限公司 Semantic analysis and terminal device
CN109740159A (en) * 2018-12-29 2019-05-10 北京泰迪熊移动科技有限公司 For naming the processing method and processing device of Entity recognition
CN109740159B (en) * 2018-12-29 2022-04-26 北京泰迪熊移动科技有限公司 Processing method and device for named entity recognition
CN110209812A (en) * 2019-05-07 2019-09-06 北京地平线机器人技术研发有限公司 File classification method and device
CN110209812B (en) * 2019-05-07 2022-04-22 北京地平线机器人技术研发有限公司 Text classification method and device
CN111178080A (en) * 2020-01-02 2020-05-19 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
CN111178080B (en) * 2020-01-02 2023-07-18 杭州涂鸦信息技术有限公司 Named entity identification method and system based on structured information
CN111723575A (en) * 2020-06-12 2020-09-29 杭州未名信科科技有限公司 Method, device, electronic equipment and medium for recognizing text

Also Published As

Publication number Publication date
CN103631948B (en) 2017-01-11

Similar Documents

Publication Publication Date Title
CN103631948A (en) Identifying method of named entities
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
US9304648B2 (en) Video segments for a video related to a task
CN107229668B (en) Text extraction method based on keyword matching
US9613166B2 (en) Search suggestions of related entities based on co-occurrence and/or fuzzy-score matching
CN103678576B (en) The text retrieval system analyzed based on dynamic semantics
CN101872351B (en) Method, device for identifying synonyms, and method and device for searching by using same
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN104361042B (en) A kind of information retrieval method and device
CN103942347B (en) A kind of segmenting method based on various dimensions synthesis dictionary
CN105608232B (en) A kind of bug knowledge modeling method based on graphic data base
EP3022660A2 (en) Performing an operation relative to tabular data based upon voice input
CN105868255A (en) Query recommendation method and apparatus
CN102314452B (en) A kind of method and system of being undertaken navigating by input method platform
CN101727454A (en) Method for automatic classification of objects and system
CN106326303A (en) Spoken language semantic analysis system and method
WO2020074017A1 (en) Deep learning-based method and device for screening for keywords in medical document
CN114911917B (en) Asset meta-information searching method and device, computer equipment and readable storage medium
CN102693279A (en) Method, device and system for fast calculating comment similarity
CN109522396B (en) Knowledge processing method and system for national defense science and technology field
CN103778122A (en) Searching method and system
CN106649605B (en) Method and device for triggering promotion keywords
CN102314464A (en) Lyrics searching method and lyrics searching engine
US9208204B2 (en) Search suggestions using fuzzy-score matching and entity co-occurrence

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant