CN108255806A - Name recognition method and device - Google Patents

Name recognition method and device

Info

Publication number
CN108255806A
Authority
CN
China
Prior art keywords
name
ambiguity
credible
language material
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711414492.9A
Other languages
Chinese (zh)
Other versions
CN108255806B (en)
Inventor
刘兵
苗艳军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711414492.9A (patent CN108255806B)
Publication of CN108255806A
Application granted
Publication of CN108255806B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a name recognition method and device that use a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the names included in the name knowledge base are comprehensive, a model trained with the name knowledge base and the video knowledge base recognizes ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.

Description

Name recognition method and device
Technical field
The present invention relates to the field of Internet search technology, and more specifically to a name recognition method and device.
Background
Named entity recognition (NER), also known as "proper name recognition", refers to identifying entity names with specific meaning in text, for example person names, place names, and organization names. Text in the video industry, such as video titles and entertainment news, contains a large number of person names, and how well the names in such text are recognized largely affects the adoption of video application products.
At present, name recognition is mainly implemented by building a general-purpose name recognition model, for example a classification model or a conditional random field model. However, mentions of ambiguous names, for example "dawn", are likely to appear in the text to be recognized. A general-purpose name recognition model has a very high error rate on such mentions, which degrades video application products such as video search and video push.
Summary of the invention
In view of this, the present invention provides a name recognition method and device to solve the problem that existing general-purpose name recognition models have a very high error rate on mentions of ambiguous names, which degrades video application products such as video search and video push. The technical solution is as follows:
A name recognition method, including:
Receiving a name corpus to be recognized;
Retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
Identifying the names contained in the name corpus to be recognized using the name recognition model.
Preferably, the process of obtaining the name recognition model in advance by training the preset machine learning classification model with the name knowledge base and the video knowledge base includes:
Extracting an ambiguous name list and a credible name list from the name knowledge base;
Extracting an ambiguous-name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous-name features from the ambiguous-name corpus;
Extracting a credible-name corpus from the video knowledge base based on the credible name list, and extracting credible-name features from the credible-name corpus;
Training the preset machine learning classification model with the ambiguous-name features and the credible-name features to obtain the name recognition model.
Preferably, extracting the ambiguous name list and the credible name list from the name knowledge base includes:
Obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list that includes the ambiguous names;
Obtaining the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base;
Retrieving a search log, and choosing credible names from the non-ambiguous names using the search log;
Generating a credible name list that includes the credible names.
Preferably, extracting the ambiguous-name corpus from the video knowledge base based on the ambiguous name list includes:
For each video text in the video knowledge base, obtaining the title of the video text;
Segmenting the title to obtain multiple title phrases;
Judging whether the multiple title phrases contain at least one ambiguous name from the ambiguous name list;
If so, determining the video text to be an ambiguous-name text;
Generating an ambiguous-name corpus that includes all ambiguous-name texts.
Preferably, before the video text is determined to be an ambiguous-name text, the method further includes:
Calculating the text similarity distance between the video text and each of the at least one ambiguous name contained in the video text;
Judging whether any of the calculated text similarity distances is greater than a distance threshold;
If so, performing the step of determining the video text to be an ambiguous-name text.
Preferably, extracting the ambiguous-name features from the ambiguous-name corpus includes:
Performing feature extraction on the ambiguous-name corpus to obtain a first feature list that contains multiple features;
Segmenting the ambiguous-name corpus to obtain multiple first corpus phrases;
Adding labels respectively to the features in the first feature list that match a first corpus phrase and to the ambiguous name used to extract the ambiguous-name corpus, and converting the labeled features and the ambiguous name into a feature sequence.
Preferably, extracting the credible-name features from the credible-name corpus includes:
Performing feature extraction on the credible-name corpus to obtain a second feature list that contains multiple features;
Segmenting the credible-name corpus to obtain multiple second corpus phrases;
Adding labels respectively to the features in the second feature list that match a second corpus phrase and to the credible name used to extract the credible-name corpus, and converting the labeled features and the credible name into a feature sequence.
A name recognition device, including: a corpus receiving module, a model retrieval module, and a name recognition module, the model retrieval module including a model generation unit;
The corpus receiving module is configured to receive a name corpus to be recognized;
The model generation unit is configured to obtain a name recognition model in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
The model retrieval module is configured to retrieve the name recognition model;
The name recognition module is configured to identify the names contained in the name corpus to be recognized using the name recognition model.
Preferably, the model generation unit is specifically configured to:
Extract an ambiguous name list and a credible name list from the name knowledge base; extract an ambiguous-name corpus from the video knowledge base based on the ambiguous name list, and extract ambiguous-name features from the ambiguous-name corpus; extract a credible-name corpus from the video knowledge base based on the credible name list, and extract credible-name features from the credible-name corpus; train the preset machine learning classification model with the ambiguous-name features and the credible-name features to obtain the name recognition model.
Preferably, the model generation unit, when extracting the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:
Obtain ambiguous names from the name knowledge base, and generate an ambiguous name list that includes the ambiguous names; obtain the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base; retrieve a search log, and choose credible names from the non-ambiguous names using the search log; generate a credible name list that includes the credible names.
The name recognition method and device provided by the present invention use a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the names included in the name knowledge base are comprehensive, a model trained with the name knowledge base and the video knowledge base recognizes ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of step S20 in the name recognition model generation method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of "extracting an ambiguous name list and a credible name list from the name knowledge base" in step S201 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of "extracting an ambiguous-name corpus from the video knowledge base based on the ambiguous name list" in step S202 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 5 is a flowchart of "extracting ambiguous-name features from the ambiguous-name corpus" in step S202 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of "extracting credible-name features from the credible-name corpus" in step S203 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of the name recognition model generation device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a name recognition method. The flowchart of the method is shown in Fig. 1, and the method includes the following steps:
S10, receiving a name corpus to be recognized;
S20, retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
In the embodiment of the present invention, the name knowledge base and the video knowledge base are provided in the backend database of a video search engine company.
The name knowledge base contains a large amount of name data. Names are divided into credible names and ambiguous names. A credible name is a conventional name: when a mention of a credible name appears, the mention can be determined to be a person name. An ambiguous name is an unconventional name: when a mention of an ambiguous name appears, it cannot be determined whether the mention is a person name or a non-name. In addition, the name knowledge base stores structured field information for each name, including native place, birthday, nickname, life story, related works, and related persons.
The video knowledge base contains video title text, video introduction text, and the like.
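As an illustration of the two knowledge bases described above, their records might be modeled as follows. This is only a sketch: the class names, field names, and the `knowledge_features` helper are assumptions made here, since the description lists the kinds of fields stored but not a concrete layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NameEntry:
    """One record of the name knowledge base (hypothetical layout)."""
    name: str
    native_place: str = ""
    birthday: str = ""
    nicknames: List[str] = field(default_factory=list)
    related_works: List[str] = field(default_factory=list)
    related_persons: List[str] = field(default_factory=list)

    def knowledge_features(self) -> List[str]:
        """Related persons and works, usable later as knowledge features."""
        return self.related_persons + self.related_works


@dataclass
class VideoText:
    """One record of the video knowledge base (hypothetical layout)."""
    title: str
    description: str = ""


entry = NameEntry(name="A", related_persons=["B", "C"], related_works=["Program 1"])
```

Keeping related persons and works on the name record makes it straightforward to reuse them when comparing a video text with a name.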
Ambiguous names and credible names are extracted separately from the name knowledge base. The ambiguous names are then used to extract an ambiguous-name corpus from the video knowledge base, and the credible names are used to extract a credible-name corpus from the video knowledge base. Finally, the preset machine learning classification model is trained with the ambiguous-name corpus and the credible-name corpus to obtain the name recognition model.
In a specific implementation, the process in step S20 of "obtaining the name recognition model in advance by training a preset machine learning classification model with the name knowledge base and the video knowledge base" may use the following steps, with the flowchart shown in Fig. 2:
S201, extracting an ambiguous name list and a credible name list from the name knowledge base;
In the present embodiment, because the number of credible names in the name knowledge base is much larger than the number of ambiguous names, the ambiguous name list can be obtained from the name knowledge base first, and the credible name list can then be obtained from the remaining non-ambiguous names.
In a specific implementation, the process in step S201 of "extracting an ambiguous name list and a credible name list from the name knowledge base" may use the following steps, with the flowchart shown in Fig. 3:
S1001, obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list that includes the ambiguous names;
During step S1001, the names in the name knowledge base can be compared with a word segmentation dictionary; a name that coincides with a word of non-name part of speech in the dictionary is treated as an ambiguous name. When a name is not present in the word segmentation dictionary, a prompt can be generated asking the user to determine whether the name is ambiguous, and the name is added to the dictionary after its part of speech is marked, which also makes the dictionary more comprehensive.
It should be noted that the part of speech of every phrase in the word segmentation dictionary has been marked in advance as either name or non-name; that is, each phrase has exactly one part of speech.
S1002, obtaining the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base;
S1003, retrieving a search log, and choosing credible names from the non-ambiguous names using the search log;
During step S1003, the search log is used to count the number of searches for each non-ambiguous name. Non-ambiguous names whose search count exceeds a threshold are regarded as popular names, and popular names are chosen as credible names.
Alternatively, credible names can be chosen by sorting the non-ambiguous names by search count in descending order and taking a preset number of the most-searched names.
S1004, generating a credible name list that includes the credible names.
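Steps S1001 to S1004 can be sketched as follows. The function name, the dictionary representation (word mapped to "name" or "non-name"), and the search-count mapping are assumptions for illustration. As a simplification, a name missing from the segmentation dictionary is treated as non-ambiguous here, whereas the description has the user decide in that case.

```python
from typing import Dict, Iterable, List, Tuple


def build_name_lists(
    kb_names: Iterable[str],
    seg_dict: Dict[str, str],
    search_counts: Dict[str, int],
    min_searches: int,
) -> Tuple[List[str], List[str]]:
    """Split knowledge-base names into an ambiguous list and a credible list.

    seg_dict maps a word to its single part of speech: "name" or "non-name".
    """
    ambiguous, non_ambiguous = [], []
    for name in kb_names:
        # S1001: a name that the dictionary lists as a non-name word is ambiguous.
        if seg_dict.get(name) == "non-name":
            ambiguous.append(name)
        else:
            # S1002: every other name is a non-ambiguous candidate.
            non_ambiguous.append(name)
    # S1003: keep only candidates searched more often than the threshold.
    credible = [n for n in non_ambiguous if search_counts.get(n, 0) > min_searches]
    # S1004: return the credible list together with the ambiguous list.
    return ambiguous, credible


ambiguous, credible = build_name_lists(
    ["dawn", "Zhang San", "Li Si"],
    {"dawn": "non-name", "Zhang San": "name"},
    {"Zhang San": 120, "Li Si": 3},
    min_searches=100,
)
```

The top-N alternative mentioned above would replace the threshold filter with a sort of the non-ambiguous names by search count followed by a slice.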
S202, extracting an ambiguous-name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous-name features from the ambiguous-name corpus;
During step S202, the texts in the video knowledge base that contain ambiguous names form the ambiguous-name corpus; ambiguous-name features are then extracted from that corpus according to preset features.
In a specific implementation, the process in step S202 of "extracting an ambiguous-name corpus from the video knowledge base based on the ambiguous name list" may use the following steps, with the flowchart shown in Fig. 4:
S1005, for each video text in the video knowledge base, obtaining the title of the video text;
In the present embodiment, because video title text is more likely to contain names, the application selects video title text as the source of the ambiguous-name corpus.
S1006, segmenting the title to obtain multiple title phrases;
During step S1006, suppose a video title is segmented into title phrase a, title phrase b, title phrase c, and title phrase d.
S1007, judging whether the multiple title phrases contain at least one ambiguous name from the ambiguous name list;
S1008, if so, determining the video text to be an ambiguous-name text;
In practice, to choose more useful ambiguous-name texts from the video knowledge base, the text similarity distance between the video text and each of the at least one ambiguous name it contains can be calculated first. It is then judged whether any of the calculated text similarity distances is greater than a distance threshold; if so, the step of "determining the video text to be an ambiguous-name text" is performed.
When calculating the text similarity distance, the structured field information stored for each name in the name knowledge base can be used as knowledge features. For example, if title phrase a is ambiguous name A, the knowledge features of ambiguous name A are "name B", "name C", "program 1", "program 2", and "program 3". "Calculating the similarity distance between the video text and the ambiguous name" can therefore be converted into calculating the similarity distance between the video text and the knowledge features, so the similarity distance can be computed with a text similarity calculation method.
S1009, generating an ambiguous-name corpus that includes all ambiguous-name texts.
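A minimal sketch of steps S1005 to S1009 follows. The description does not name a segmenter or a similarity measure, so two stand-ins are assumed here: whitespace splitting in place of word segmentation, and the fraction of a name's knowledge features found in the video text as the similarity score. All function and key names are hypothetical.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Stand-in word segmenter: splits on whitespace (assumption)."""
    return text.split()


def similarity(tokens: List[str], knowledge: Set[str]) -> float:
    """Fraction of the name's knowledge features that occur in the text."""
    if not knowledge:
        return 0.0
    return len(knowledge & set(tokens)) / len(knowledge)


def extract_ambiguous_corpus(
    videos: List[Dict[str, str]],
    ambiguous_knowledge: Dict[str, Set[str]],
    threshold: float,
) -> List[str]:
    corpus = []
    for video in videos:
        # S1005-S1006: take the title and segment it into title phrases.
        title_tokens = tokenize(video["title"])
        # The whole video text (title plus description) is compared
        # with the knowledge features of each contained name.
        all_tokens = title_tokens + tokenize(video.get("description", ""))
        # S1007: find the ambiguous names among the title phrases.
        hits = [t for t in title_tokens if t in ambiguous_knowledge]
        # S1008: keep the text if any contained name is similar enough.
        if any(similarity(all_tokens, ambiguous_knowledge[h]) > threshold for h in hits):
            corpus.append(video["title"])
    # S1009: the corpus is the collection of all kept texts.
    return corpus


videos = [
    {"title": "dawn concert live", "description": "with singer_b on program_1"},
    {"title": "sunrise at dawn", "description": "a nature documentary"},
    {"title": "cooking show", "description": "program_1"},
]
knowledge = {"dawn": {"singer_b", "program_1", "program_2"}}
corpus = extract_ambiguous_corpus(videos, knowledge, threshold=0.5)
```

Only the first title is kept: the second contains the ambiguous name but shares no knowledge features with it, and the third contains no ambiguous name at all.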
In a specific implementation, the process in step S202 of "extracting ambiguous-name features from the ambiguous-name corpus" may use the following steps, with the flowchart shown in Fig. 5:
S1010, performing feature extraction on the ambiguous-name corpus to obtain a first feature list that contains multiple features;
In selecting features, the inventors found through data observation and experimental analysis that the context of a name mention has two kinds of strongly correlated features. The first is word or part-of-speech features within the left and right windows, for example "concert of *" and "* divorces *". The second is knowledge features strongly correlated with the name mention, for example "name B" as a strongly correlated knowledge feature of ambiguous name A.
According to the above window features and contextual features, feature extraction is performed on the ambiguous-name corpus. For example, feature extraction on the above video title yields the feature list shown in Table 1. All features are numbered, so that each feature in the list is represented by an integer serial number; CONTEXT_KNOWLEDGE_FEA is the context knowledge feature.
Feature number    Feature
1                 CONTEXT_KNOWLEDGE_FEA
2                 T01/title phrase b
3                 T02/title phrase c
……                ……
Table 1
S1011, segmenting the ambiguous-name corpus to obtain multiple first corpus phrases;
S1012, adding labels respectively to the features in the first feature list that match a first corpus phrase and to the ambiguous name used to extract the ambiguous-name corpus, and converting the labeled features and the ambiguous name into a feature sequence;
During step S1012, a negative label is added to each feature in the first feature list that matches a first corpus phrase, and a positive label is added to the ambiguous name used when extracting the ambiguous-name corpus. For example, the conversion result for ambiguous name A, which carries a positive label, and the feature title phrase b, which carries a negative label, is as follows:
Label    Feature sequence
1        1:1 2:1 3:1 4:1 ……
2        5:1 6:1 7:1 ……
……       ……
Table 2
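The feature numbering of Table 1 and the "number:1" sequences of Table 2 can be sketched as follows. The function names and the `T+1/word` naming of window features are assumptions for illustration (the description writes window features as `T01/...` without defining the scheme), and label handling is left out of the sketch.

```python
from typing import Dict, List, Set


def extract_features(
    tokens: List[str], name: str, knowledge: Set[str], window: int = 2
) -> List[str]:
    """Window features around the first mention of `name`, plus a knowledge flag."""
    pos = tokens.index(name)
    feats = []
    # Knowledge feature: a strongly related term appears in the context.
    if knowledge & (set(tokens) - {name}):
        feats.append("CONTEXT_KNOWLEDGE_FEA")
    # Window features: words up to `window` positions left and right of the mention.
    for offset in range(-window, window + 1):
        i = pos + offset
        if offset != 0 and 0 <= i < len(tokens):
            feats.append(f"T{offset:+d}/{tokens[i]}")
    return feats


def to_sequence(feats: List[str], feature_ids: Dict[str, int]) -> str:
    """Number unseen features, then render sorted 'number:1' pairs."""
    for f in feats:
        feature_ids.setdefault(f, len(feature_ids) + 1)
    return " ".join(f"{i}:1" for i in sorted(feature_ids[f] for f in feats))


feature_ids: Dict[str, int] = {}
feats = extract_features(["dawn", "concert", "with", "singer_b"], "dawn", {"singer_b"})
sequence = to_sequence(feats, feature_ids)
```

Sharing one `feature_ids` mapping across all texts keeps the numbering consistent, so the same feature always maps to the same serial number.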
S203, extracting a credible-name corpus from the video knowledge base based on the credible name list, and extracting credible-name features from the credible-name corpus;
In the present embodiment, because a credible name is unambiguous, a mention of it in any context can be determined to be a person name, so all video texts containing credible names can be extracted from the video knowledge base as the credible-name corpus. Because video title text is more likely to contain names, the application selects video title text as the source of the credible-name corpus; that is, video title texts containing credible names are used as the credible-name corpus.
In a specific implementation, the process in step S203 of "extracting credible-name features from the credible-name corpus" may use the following steps, with the flowchart shown in Fig. 6:
S1013, performing feature extraction on the credible-name corpus to obtain a second feature list that contains multiple features;
In selecting features, the inventors found through data observation and experimental analysis that the context of a name mention has two kinds of strongly correlated features. The first is word or part-of-speech features within the left and right windows, for example "concert of *" and "* divorces *". The second is knowledge features strongly correlated with the name mention.
According to the above window features and contextual features, feature extraction is performed on the credible-name corpus.
S1014, segmenting the credible-name corpus to obtain multiple second corpus phrases;
S1015, adding labels to the features in the second feature list that match a second corpus phrase, and converting the labeled features into a feature sequence.
S204, training the preset machine learning classification model with the ambiguous-name features and the credible-name features to obtain the name recognition model.
In the present embodiment, the preset machine learning classification model includes, but is not limited to, a support vector machine model or a logistic regression model. It can be selected according to actual needs, and the present embodiment does not specifically limit it.
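To make step S204 concrete, the sketch below trains a logistic regression model, one of the model types named above, on sparse binary feature sequences. It is a minimal pure-Python stand-in rather than the configuration used in the patent; the function names, learning rate, epoch count, and the choice of 1 and 0 as label values are all assumptions.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[int, List[int]]  # (label 1 or 0, active feature numbers)


def train_logistic_regression(
    samples: List[Sample], epochs: int = 200, lr: float = 0.5
) -> Dict[int, float]:
    """Fit weights for binary features with plain stochastic gradient descent."""
    weights: Dict[int, float] = {0: 0.0}  # feature number 0 is the bias term
    for _ in range(epochs):
        for label, feats in samples:
            active = [0] + feats
            z = sum(weights.get(f, 0.0) for f in active)
            p = 1.0 / (1.0 + math.exp(-z))
            # Gradient step on the log-loss; every active feature has value 1.
            for f in active:
                weights[f] = weights.get(f, 0.0) + lr * (label - p)
    return weights


def predict(weights: Dict[int, float], feats: List[int]) -> float:
    """Estimated probability that the mention is a person name."""
    z = sum(weights.get(f, 0.0) for f in [0] + feats)
    return 1.0 / (1.0 + math.exp(-z))


samples = [(1, [1, 2]), (1, [1, 3]), (0, [4, 5]), (0, [4, 6])]
weights = train_logistic_regression(samples)
```

A production system would more likely rely on an existing library implementation; the hand-written loop only shows how labeled feature sequences feed the classifier.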
S30, identifying the names contained in the name corpus to be recognized using the name recognition model.
Steps S201 to S204 above are only one preferred implementation of the process in step S20 of "obtaining the name recognition model in advance by training a preset machine learning classification model with the name knowledge base and the video knowledge base". The specific implementation of this process can be set freely according to one's own needs and is not limited here.
Steps S1001 to S1004 above are only one preferred implementation of the process in step S201 of "extracting an ambiguous name list and a credible name list from the name knowledge base". The specific implementation of this process can be set freely according to one's own needs and is not limited here.
Steps S1005 to S1009 above are only one preferred implementation of the process in step S202 of "extracting an ambiguous-name corpus from the video knowledge base based on the ambiguous name list". The specific implementation of this process can be set freely according to one's own needs and is not limited here.
Steps S1010 to S1012 above are only one preferred implementation of the process in step S202 of "extracting ambiguous-name features from the ambiguous-name corpus". The specific implementation of this process can be set freely according to one's own needs and is not limited here.
Steps S1013 to S1015 above are only one preferred implementation of the process in step S203 of "extracting credible-name features from the credible-name corpus". The specific implementation of this process can be set freely according to one's own needs and is not limited here.
The name recognition method provided by the embodiment of the present invention uses a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the names included in the name knowledge base are comprehensive, a model trained with the name knowledge base and the video knowledge base recognizes ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
Based on the name recognition method provided by the above embodiment, an embodiment of the present invention correspondingly provides a device for performing the method. Its structure, shown in Fig. 7, includes: a corpus receiving module 10, a model retrieval module 20, and a name recognition module 30; the model retrieval module 20 includes a model generation unit 201;
The corpus receiving module 10 is configured to receive a name corpus to be recognized;
The model generation unit 201 is configured to obtain a name recognition model in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
The model retrieval module 20 is configured to retrieve the name recognition model;
The name recognition module 30 is configured to identify the names contained in the name corpus to be recognized using the name recognition model.
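The module structure of Fig. 7 might be mirrored in code as below. The class names, the `Model` callable type, and the toy generation unit are hypothetical; whitespace splitting stands in for word segmentation, and the trained classifier is replaced by a hand-written rule so that the sketch stays self-contained.

```python
from typing import Callable, List, Optional

# (tokens, candidate) -> is the candidate a person name in this context?
Model = Callable[[List[str], str], bool]


class ModelRetrievalModule:
    """Module 20: contains a model generation unit (201) and hands out its model."""

    def __init__(self, generate: Callable[[], Model]):
        self._generate = generate  # model generation unit 201
        self._model: Optional[Model] = None

    def retrieve(self) -> Model:
        if self._model is None:  # build the model once, before recognition
            self._model = self._generate()
        return self._model


class NameRecognitionModule:
    """Module 30: applies the model to every token of the corpus."""

    def recognize(self, model: Model, corpus: str) -> List[str]:
        tokens = corpus.split()  # whitespace split stands in for segmentation
        return [t for t in tokens if model(tokens, t)]


class NameRecognitionDevice:
    def __init__(self, generate: Callable[[], Model]):
        self.retrieval = ModelRetrievalModule(generate)
        self.recognition = NameRecognitionModule()

    def receive(self, corpus: str) -> List[str]:
        """Module 10 receives the corpus; modules 20 and 30 then process it."""
        model = self.retrieval.retrieve()
        return self.recognition.recognize(model, corpus)


def toy_generation_unit() -> Model:
    """Placeholder for training: a fixed rule instead of a learned model."""
    known = {"dawn"}
    return lambda tokens, candidate: candidate in known and "concert" in tokens


device = NameRecognitionDevice(toy_generation_unit)
names = device.receive("dawn concert tonight")
```

Passing the generation unit in as a callable keeps training separate from recognition, which matches the description of the model being obtained in advance.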
In some other embodiment, model generation unit 201 is specifically used for:
The list of ambiguity name and credible name list are extracted from name knowledge base;Known based on ambiguity name list from video Know and ambiguity name language material is extracted in library, and extract the ambiguity name feature of ambiguity name language material;Based on credible name list from regarding Credible name language material is extracted in frequency knowledge base, and extracts the credible name feature of credible name language material;Utilize ambiguity name feature Default machine learning classification model is trained with credible name feature, obtains name identification model.
In some other embodiment, for extracting the list of ambiguity name and credible name list from name knowledge base Model generation unit 201, is specifically used for:
Ambiguity name is obtained from name knowledge base, and generates the ambiguity name list for including ambiguity name;From name The non-ambiguity name in addition to ambiguity name is obtained in knowledge base;Search daily record is transferred, and utilizes and searches for daily record from non-ambiguity people Credible name is chosen in name;Generation includes the credible name list of credible name.
The name recognition device provided by the embodiment of the present invention uses the name recognition model to automatically recognize the names contained in the name corpus to be recognized. Because the names contained in the name knowledge base are comprehensive, the model trained with the name knowledge base and the video knowledge base recognizes ambiguous names with a certain accuracy and can also recognize credible names, which improves the overall effect of name recognition.
The name recognition method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementations and the scope of application may change in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article or device that comprises the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A name recognition method, characterized by comprising:
receiving a name corpus to be recognized;
retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
recognizing, using the name recognition model, the names contained in the name corpus to be recognized.
2. The method according to claim 1, characterized in that the process of training the preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model comprises:
extracting an ambiguous name list and a credible name list from the name knowledge base;
extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous name features from the ambiguous name corpus;
extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features from the credible name corpus;
training the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
3. The method according to claim 2, characterized in that the extracting of the ambiguous name list and the credible name list from the name knowledge base comprises:
obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list containing the ambiguous names;
obtaining, from the name knowledge base, the non-ambiguous names other than the ambiguous names;
retrieving a search log, and using the search log to select credible names from the non-ambiguous names;
generating a credible name list containing the credible names.
4. The method according to claim 2, characterized in that the extracting of the ambiguous name corpus from the video knowledge base based on the ambiguous name list comprises:
for each video text in the video knowledge base, obtaining the title of the video text;
segmenting the title into words to obtain a plurality of title words;
judging whether the plurality of title words contain at least one ambiguous name in the ambiguous name list;
if so, determining the video text as an ambiguous name text;
generating an ambiguous name corpus containing all the ambiguous name texts.
5. The method according to claim 4, characterized in that, before the determining of the video text as an ambiguous name text, the method further comprises:
calculating, for each of the at least one ambiguous name contained in the video text, the text similarity distance between the video text and that ambiguous name;
judging whether, among the at least one calculated text similarity distance, there is a text similarity distance greater than a distance threshold;
if so, performing the step of determining the video text as an ambiguous name text.
6. The method according to claim 2, characterized in that the extracting of the ambiguous name features from the ambiguous name corpus comprises:
performing feature extraction on the ambiguous name corpus to obtain a first feature list, the first feature list containing a plurality of features;
segmenting the ambiguous name corpus into words to obtain a plurality of first corpus words;
adding labels respectively to the features in the first feature list that are identical to the first corpus words and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and the labeled ambiguous names into a feature sequence.
7. The method according to claim 2, characterized in that the extracting of the credible name features from the credible name corpus comprises:
performing feature extraction on the credible name corpus to obtain a second feature list, the second feature list containing a plurality of features;
segmenting the credible name corpus into words to obtain a plurality of second corpus words;
adding labels respectively to the features in the second feature list that are identical to the second corpus words and to the credible names used to extract the credible name corpus, and converting the labeled features and the labeled credible names into a feature sequence.
8. A name recognition device, characterized by comprising: a corpus receiving module, a model retrieval module and a name recognition module, the model retrieval module including a model generation unit;
the corpus receiving module is configured to receive a name corpus to be recognized;
the model generation unit is configured to train, in advance, a preset machine learning classification model with a name knowledge base and a video knowledge base to obtain a name recognition model;
the model retrieval module is configured to retrieve the name recognition model;
the name recognition module is configured to recognize, using the name recognition model, the names contained in the name corpus to be recognized.
9. The device according to claim 8, characterized in that the model generation unit is specifically configured to:
extract an ambiguous name list and a credible name list from the name knowledge base; extract an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extract ambiguous name features from the ambiguous name corpus; extract a credible name corpus from the video knowledge base based on the credible name list, and extract credible name features from the credible name corpus; and train the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
10. The device according to claim 9, characterized in that the model generation unit, when extracting the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:
obtain ambiguous names from the name knowledge base, and generate an ambiguous name list containing the ambiguous names; obtain, from the name knowledge base, the non-ambiguous names other than the ambiguous names; retrieve a search log, and use the search log to select credible names from the non-ambiguous names; and generate a credible name list containing the credible names.
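As an illustration only, and not part of the claims, the corpus extraction of claims 4 and 5 can be sketched in Python. The claims do not specify how the text similarity distance between a video text and an ambiguous name is computed. The sketch assumes each ambiguous name has a short profile text in the name knowledge base and uses Jaccard similarity over words. Whitespace splitting stands in for word segmentation, and all names, fields and the threshold value are hypothetical.

```python
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two word collections (assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def extract_ambiguous_corpus(videos: List[Dict[str, str]],
                             ambiguous_profiles: Dict[str, str],
                             threshold: float = 0.2) -> List[Dict[str, str]]:
    """Keep video texts whose title holds an ambiguous name and whose body
    is similar enough to that name's profile text."""
    corpus = []
    for video in videos:
        title_words = video["title"].split()           # claim 4: segment the title
        hits = [n for n in ambiguous_profiles if n in title_words]
        if not hits:
            continue                                   # no ambiguous name in title
        text_words = video["text"].split()
        # claim 5: at least one similarity must exceed the threshold
        if any(jaccard(text_words, ambiguous_profiles[n].split()) > threshold
               for n in hits):
            corpus.append(video)
    return corpus


videos = [
    {"title": "Rose interview", "text": "actor Rose talks about her new film"},
    {"title": "Rose garden", "text": "how to plant a Rose in spring"},
    {"title": "city walk", "text": "a walk through the old city"},
]
profiles = {"Rose": "Rose actor film star"}
ambiguous_corpus = extract_ambiguous_corpus(videos, profiles)
```

In this toy data only the interview video survives: the gardening video mentions "Rose" in its title but shares too little vocabulary with the profile of the person.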
CN201711414492.9A 2017-12-22 2017-12-22 Name recognition method and device Active CN108255806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711414492.9A CN108255806B (en) 2017-12-22 2017-12-22 Name recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711414492.9A CN108255806B (en) 2017-12-22 2017-12-22 Name recognition method and device

Publications (2)

Publication Number Publication Date
CN108255806A true CN108255806A (en) 2018-07-06
CN108255806B CN108255806B (en) 2021-12-17

Family

ID=62722815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711414492.9A Active CN108255806B (en) 2017-12-22 2017-12-22 Name recognition method and device

Country Status (1)

Country Link
CN (1) CN108255806B (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101454750A (en) * 2006-03-31 2009-06-10 谷歌公司 Disambiguation of named entities
US20140142922A1 (en) * 2007-10-17 2014-05-22 Evri, Inc. Nlp-based entity recognition and disambiguation
CN102033879A (en) * 2009-09-27 2011-04-27 腾讯科技(深圳)有限公司 Method and device for identifying Chinese name
CN101833569A (en) * 2010-04-08 2010-09-15 中国科学院自动化研究所 Method for automatically identifying film human face image
CN101950284A (en) * 2010-09-27 2011-01-19 北京新媒传信科技有限公司 Chinese word segmentation method and system
CN102521321A (en) * 2011-12-02 2012-06-27 华中科技大学 Video search method based on search term ambiguity and user preferences
EP2626118A2 (en) * 2012-02-13 2013-08-14 Sony Computer Entertainment Europe Limited System and method of image augmentation
US20150161116A1 (en) * 2012-03-19 2015-06-11 Google Inc. Searching based on audio and/or visual features of documents
CN103714094A (en) * 2012-10-09 2014-04-09 富士通株式会社 Equipment and method for recognizing objects in video
CN104424332A (en) * 2013-09-11 2015-03-18 富士通株式会社 Unambiguous Japanese name list building method and name identification method and device
US20150154284A1 (en) * 2013-11-29 2015-06-04 Katja Pfeifer Aggregating results from named entity recognition services
CN105868193A (en) * 2015-01-19 2016-08-17 富士通株式会社 Device and method used to detect product relevant information in electronic text
CN106156051A (en) * 2015-03-27 2016-11-23 深圳市腾讯计算机系统有限公司 Build the method and device of name language material identification model
CN104918136A (en) * 2015-05-28 2015-09-16 北京奇艺世纪科技有限公司 Video positioning method and device
CN106708796A (en) * 2015-07-15 2017-05-24 中国科学院计算技术研究所 Text-based key personal name extraction method and system
CN106446754A (en) * 2015-08-11 2017-02-22 阿里巴巴集团控股有限公司 Image identification method, metric learning method, image source identification method and devices
CN106407180A (en) * 2016-08-30 2017-02-15 北京奇艺世纪科技有限公司 Entity disambiguation method and apparatus
CN106649272A (en) * 2016-12-23 2017-05-10 东北大学 Named entity recognizing method based on mixed model
CN106779080A (en) * 2017-01-13 2017-05-31 武汉理工数字传播工程有限公司 A kind of people information knowledge base method for auto constructing
CN107180087A (en) * 2017-05-09 2017-09-19 北京奇艺世纪科技有限公司 A kind of searching method and device
CN107391485A (en) * 2017-07-18 2017-11-24 中译语通科技(北京)有限公司 Entity recognition method is named based on the Korean of maximum entropy and neural network model

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
SATOH S. 等: "Name-it: Naming and detecting faces in news videos", 《IEEE MULTIMEDIA》 *
SINGH U. 等: "Named entity recognition system for Urdu", 《PROCEEDINGS OF COLING》 *
乐娟 等: "基于HMM的京剧机构命名实体识别寄法", 《计算机工程》 *
刘浩: "面向情感搜索的中文语料分析及其分词", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
塔什甫拉提·尼扎木丁 等: "统计与规则相结合的维吾尔语人名识别方法", 《自动化学报》 *
王元卓 等: "基于开放网络知识的信息检索与数据挖掘", 《计算机研究与发展》 *
王巍巍 等: "双语影视知识图谱的构建研究", 《北京大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401083A (en) * 2019-01-02 2020-07-10 阿里巴巴集团控股有限公司 Name identification method and device, storage medium and processor
CN111401083B (en) * 2019-01-02 2023-05-02 阿里巴巴集团控股有限公司 Name identification method and device, storage medium and processor

Also Published As

Publication number Publication date
CN108255806B (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN109145153B (en) Intention category identification method and device
CN105718586B (en) The method and device of participle
US8073877B2 (en) Scalable semi-structured named entity detection
CN111783518A (en) Training sample generation method and device, electronic equipment and readable storage medium
CN105045852A (en) Full-text search engine system for teaching resources
CN108984661A (en) Entity alignment schemes and device in a kind of knowledge mapping
CN104933039A (en) Entity link system for language lacking resources
CN111694927B (en) Automatic document review method based on improved word shift distance algorithm
CN112395420A (en) Video content retrieval method and device, computer equipment and storage medium
CN111291177A (en) Information processing method and device and computer storage medium
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN103823857A (en) Space information searching method based on natural language processing
CN110674378A (en) Chinese semantic recognition method based on cosine similarity and minimum editing distance
CN111488468A (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
Stern et al. A joint named entity recognition and entity linking system
CN106021234A (en) Label extraction method and system
CN106897274B (en) Cross-language comment replying method
CN108345694B (en) Document retrieval method and system based on theme database
CN110008314B (en) Intention analysis method and device
CN108255806A (en) A kind of name recognition methods and device
Ali et al. Arabic keyphrases extraction using a hybrid of statistical and machine learning methods
CN114548109B (en) Named entity recognition model training method and named entity recognition method
CN114430832A (en) Data processing method and device, electronic equipment and storage medium
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant