CN108255806A - Name recognition method and device
- Publication number
- CN108255806A (application number CN201711414492.9A)
- Authority
- CN
- China
- Prior art keywords
- name
- ambiguity
- credible
- language material
- knowledge base
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a name recognition method and device that use a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the name knowledge base covers names comprehensively, a model trained with the name knowledge base and the video knowledge base identifies ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
Description
Technical field
The present invention relates to the field of Internet search technology, and more specifically to a name recognition method and device.
Background art
Named entity recognition (NER), also known as "proper name recognition", refers to identifying entity names with a specific meaning in text, for example person names, place names and organization names. Text in the video industry, such as video titles and entertainment news, contains a large number of person names, and how well the names contained in such text are recognized largely affects the promotion of video application products.
At present, name recognition is mainly implemented by building a general-purpose name recognition model, for example a classification model or a conditional random field model. However, the text to be recognized may well contain ambiguous name mentions, for example "dawn", which may be either a person's name or an ordinary word. General-purpose name recognition models have a very high error rate on such ambiguous name mentions, which degrades video application products such as video search and video push.
Summary of the invention
In view of this, the present invention provides a name recognition method and device to solve the problem that existing general-purpose name recognition models have a very high error rate on ambiguous name mentions, which degrades video application products such as video search and video push. The technical solution is as follows:
A name recognition method, comprising:
Receiving a name corpus to be recognized;
Retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
Identifying the names contained in the name corpus to be recognized using the name recognition model.
Preferably, the process of training the preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model comprises:
Extracting an ambiguous name list and a credible name list from the name knowledge base;
Extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus;
Extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus;
Training the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
Preferably, extracting the ambiguous name list and the credible name list from the name knowledge base comprises:
Obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list containing the ambiguous names;
Obtaining the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base;
Retrieving a search log, and selecting credible names from the non-ambiguous names using the search log;
Generating a credible name list containing the credible names.
Preferably, extracting the ambiguous name corpus from the video knowledge base based on the ambiguous name list comprises:
For each video text in the video knowledge base, obtaining the title of the video text;
Segmenting the title into words to obtain multiple title words;
Judging whether the multiple title words contain at least one ambiguous name from the ambiguous name list;
If so, determining the video text to be an ambiguous name text;
Generating an ambiguous name corpus containing all the ambiguous name texts.
Preferably, before determining the video text to be an ambiguous name text, the method further comprises:
Calculating the text similarity distance between the video text and each of the at least one ambiguous name contained in the video text;
Judging whether, among the at least one calculated text similarity distance, there is a text similarity distance greater than a distance threshold;
If so, performing the step of determining the video text to be an ambiguous name text.
Preferably, extracting the ambiguous name features of the ambiguous name corpus comprises:
Performing feature extraction on the ambiguous name corpus to obtain a first feature list, the first feature list containing multiple features;
Segmenting the ambiguous name corpus into words to obtain multiple first corpus words;
Adding labels respectively to the features in the first feature list that are identical to the first corpus words and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and ambiguous names into feature sequences.
Preferably, extracting the credible name features of the credible name corpus comprises:
Performing feature extraction on the credible name corpus to obtain a second feature list, the second feature list containing multiple features;
Segmenting the credible name corpus into words to obtain multiple second corpus words;
Adding labels respectively to the features in the second feature list that are identical to the second corpus words and to the credible names used to extract the credible name corpus, and converting the labeled features and credible names into feature sequences.
A name recognition device, comprising: a corpus receiving module, a model retrieval module and a name recognition module, the model retrieval module including a model generation unit;
The corpus receiving module is configured to receive a name corpus to be recognized;
The model generation unit is configured to train a preset machine learning classification model in advance with a name knowledge base and a video knowledge base to obtain a name recognition model;
The model retrieval module is configured to retrieve the name recognition model;
The name recognition module is configured to identify the names contained in the name corpus to be recognized using the name recognition model.
Preferably, the model generation unit is specifically configured to:
Extract an ambiguous name list and a credible name list from the name knowledge base; extract an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extract ambiguous name features of the ambiguous name corpus; extract a credible name corpus from the video knowledge base based on the credible name list, and extract credible name features of the credible name corpus; train the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
Preferably, the model generation unit, when extracting the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:
Obtain ambiguous names from the name knowledge base, and generate an ambiguous name list containing the ambiguous names; obtain the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base; retrieve a search log, and select credible names from the non-ambiguous names using the search log; generate a credible name list containing the credible names.
The name recognition method and device provided by the present invention use a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the name knowledge base covers names comprehensively, a model trained with the name knowledge base and the video knowledge base identifies ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of step S20 in the name recognition model generation method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of "extracting an ambiguous name list and a credible name list from the name knowledge base" in step S201 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 4 is a flowchart of "extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list" in step S202 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 5 is a flowchart of "extracting the ambiguous name features of the ambiguous name corpus" in step S202 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of "extracting the credible name features of the credible name corpus" in step S203 of the name recognition model generation method provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of the name recognition model generating device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a name recognition method, whose flowchart is shown in Fig. 1 and which comprises the following steps:
S10, receiving a name corpus to be recognized;
S20, retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
In the embodiment of the present invention, the name knowledge base and the video knowledge base are provided in the back-end database of a video search engine company.
The name knowledge base holds a large amount of name data. Names can be divided into credible names and ambiguous names. A credible name is a conventional name: when a mention of a credible name appears, the mention can be determined to be a person name. An ambiguous name is an unconventional name: when a mention of an ambiguous name appears, it cannot be determined whether the mention is a person name or a non-name. In addition, the name knowledge base stores structured field information for each name, including native place, birthday, nickname, biography, related works and related persons.
The video knowledge base contains video title texts, video introduction texts and the like.
Ambiguous names and credible names are extracted separately from the name knowledge base; the ambiguous names are then used to extract an ambiguous name corpus from the video knowledge base, and the credible names are used to extract a credible name corpus from the video knowledge base; finally, the preset machine learning classification model is trained with the ambiguous name corpus and the credible name corpus to obtain the name recognition model.
In a specific implementation, the process of "training a preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model" in step S20 can use the following steps, whose flowchart is shown in Fig. 2:
S201, extracting an ambiguous name list and a credible name list from the name knowledge base;
In this embodiment, since credible names in the name knowledge base far outnumber ambiguous names, the ambiguous name list can be obtained from the name knowledge base first, and the credible name list can then be obtained from the remaining non-ambiguous names.
In a specific implementation, the process of "extracting an ambiguous name list and a credible name list from the name knowledge base" in step S201 can use the following steps, whose flowchart is shown in Fig. 3:
S1001, obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list containing the ambiguous names;
During step S1001, the names in the name knowledge base can be compared with a word segmentation dictionary; a name that coincides with a word tagged as non-name in the segmentation dictionary is an ambiguous name. When a name is absent from the segmentation dictionary, a prompt can be generated asking the user to decide whether the name is ambiguous, and the name is added to the segmentation dictionary once its part of speech has been tagged, which also makes the segmentation dictionary more comprehensive.
It should be noted that the part of speech of every word in the segmentation dictionary is tagged in advance as either name or non-name; that is, each word has exactly one part of speech, name or non-name.
S1002, obtaining the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base;
S1003, retrieving a search log, and selecting credible names from the non-ambiguous names using the search log;
During step S1003, the search log is used to count the number of searches for each non-ambiguous name; non-ambiguous names whose search count is higher than a threshold are regarded as popular names, and the popular names are selected as credible names.
Alternatively, the non-ambiguous names can be sorted by search count from high to low, and a preset number of the most-searched names can be selected as credible names.
S1004, generating a credible name list containing the credible names.
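Steps S1001 to S1004 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_name_lists`, the `"name"`/`"non-name"` tag strings, and the representation of the search log as a list of query strings are all hypothetical. Names missing from the segmentation dictionary are simply treated as non-ambiguous here, whereas the text above describes prompting a user in that case.

```python
from collections import Counter


def build_name_lists(kb_names, seg_dict, search_log, min_searches=100, top_n=None):
    """Return (ambiguous_names, credible_names) following steps S1001-S1004.

    kb_names:   names from the name knowledge base
    seg_dict:   word -> "name" or "non-name" (one part of speech per word)
    search_log: iterable of query strings (assumed format)
    """
    # S1001: a name that coincides with a non-name dictionary word is ambiguous.
    ambiguous = [n for n in kb_names if seg_dict.get(n) == "non-name"]

    # S1002: the remaining names are non-ambiguous.
    ambiguous_set = set(ambiguous)
    non_ambiguous = [n for n in kb_names if n not in ambiguous_set]

    # S1003: count searches per non-ambiguous name (exact-match queries only).
    wanted = set(non_ambiguous)
    counts = Counter(q for q in search_log if q in wanted)

    if top_n is not None:
        # Alternative described in the text: keep the top_n most-searched names.
        credible = [name for name, _ in counts.most_common(top_n)]
    else:
        # Default: keep names searched more often than the threshold.
        credible = [n for n in non_ambiguous if counts[n] > min_searches]

    # S1004: the credible list is returned alongside the ambiguous list.
    return ambiguous, credible
```

Counting only exact-match queries keeps the sketch short; a real search log would likely need query normalization before counting.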
S202, extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus;
During step S202, the texts in the video knowledge base that contain ambiguous names serve as the ambiguous name corpus; ambiguous name features are then extracted from the ambiguous name corpus according to preset features.
In a specific implementation, the process of "extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list" in step S202 can use the following steps, whose flowchart is shown in Fig. 4:
S1005, for each video text in the video knowledge base, obtaining the title of the video text;
In this embodiment, since video title texts are more likely to contain names, the present application selects video title texts as the source of the ambiguous name corpus.
S1006, segmenting the title into words to obtain multiple title words;
During step S1006, suppose a certain news title is segmented to obtain title word a, title word b, title word c and title word d.
S1007, judging whether the multiple title words contain at least one ambiguous name from the ambiguous name list;
S1008, if so, determining the video text to be an ambiguous name text;
In practical applications, to select more effective ambiguous name texts from the video knowledge base, the text similarity distance between the video text and each of the at least one ambiguous name it contains can be calculated first; it is then judged whether any of the calculated text similarity distances is greater than a distance threshold; if so, the step of "determining the video text to be an ambiguous name text" is performed.
When calculating the text similarity distance: the name knowledge base stores structured field information for each name, and this structured field information can be used as knowledge features. For example, title word a is ambiguous name A, and the knowledge features of ambiguous name A are "name B", "name C", "program 1", "program 2" and "program 3". "Calculating the similarity distance between the video text and the ambiguous name" can therefore be converted into calculating the similarity distance between the video text and the knowledge features, so the similarity distance can be computed with a text similarity calculation method.
S1009, generating an ambiguous name corpus containing all the ambiguous name texts.
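Steps S1005 to S1009, including the optional similarity filter, can be sketched as below. The patent does not name a specific similarity measure, so Jaccard overlap between the video text's words and the name's knowledge features is used purely as a stand-in. The dictionary keys `"title"` and `"intro"`, the whitespace tokenizer and the threshold value are likewise assumptions; Chinese text would need a real word segmenter.

```python
def jaccard(a, b):
    """Jaccard overlap of two word collections, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_ambiguous_corpus(videos, ambiguous_names, knowledge,
                             tokenize=str.split, threshold=0.1):
    """Return the video texts kept as the ambiguous name corpus.

    videos:          list of dicts with "title" and optional "intro" (assumed)
    ambiguous_names: set of ambiguous names
    knowledge:       name -> list of knowledge-feature words for that name
    """
    corpus = []
    for video in videos:
        # S1005-S1006: take the title and segment it into words.
        title_words = tokenize(video["title"])
        # S1007: which title words are ambiguous names?
        hits = [w for w in title_words if w in ambiguous_names]
        if not hits:
            continue
        # Optional filter: compare the whole video text with each matched
        # name's knowledge features and keep the text if any score exceeds
        # the threshold.
        text_words = tokenize(video["title"] + " " + video.get("intro", ""))
        if any(jaccard(text_words, knowledge.get(n, [])) > threshold for n in hits):
            corpus.append(video)  # S1008
    return corpus  # S1009
```

With this filter, a title in which "dawn" appears next to words such as "concert" is kept, while a title where no knowledge feature of the name appears is dropped.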
In a specific implementation, the process of "extracting the ambiguous name features of the ambiguous name corpus" in step S202 can use the following steps, whose flowchart is shown in Fig. 5:
S1010, performing feature extraction on the ambiguous name corpus to obtain a first feature list, the first feature list containing multiple features;
For feature selection, the inventors of the present application found by observing and analyzing data that the context of a name mention has two kinds of strongly correlated features. The first is the word or part-of-speech features within the left and right windows, for example "concert of *" and "* divorces *". The second is the knowledge features strongly correlated with the name mention, for example "name B", a strongly correlated knowledge feature of ambiguous name A.
Feature extraction is performed on the ambiguous name corpus according to the above window features and context knowledge features. For example, feature extraction on the above news title yields the feature list shown in Table 1. All features are numbered so that each feature in the feature list is represented by an integer serial number, where CONTEXT_KNOWLEDGE_FEA is the context knowledge feature.
Feature serial number | Feature |
1 | CONTEXT_KNOWLEDGE_FEA |
2 | T01/title word b |
3 | T02/title word c |
…… | …… |
Table 1
S1011, segmenting the ambiguous name corpus into words to obtain multiple first corpus words;
S1012, adding labels respectively to the features in the first feature list that are identical to the first corpus words and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and ambiguous names into feature sequences;
During step S1012, a negative label is added to the features in the first feature list that are identical to the first corpus words, and a positive label is added to the ambiguous name used when extracting the ambiguous name corpus. For example, the conversion result for ambiguous name A with a positive label and the feature title word b with a negative label is as follows:
Label | Characteristic sequence |
1 | 1:1 2:1 3:1 4:1…… |
2 | 5:1 6:1 7:1…… |
…… | …… |
Table 2
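The feature numbering of Table 1 and the "index:1" sequences of Table 2 can be illustrated with the short sketch below. It is a guess at one workable encoding rather than the patent's exact procedure: the helper names, the window size of 2, and the `T-1/...` naming for left-window features are hypothetical (the source only shows `T01` and `T02`).

```python
def mention_features(tokens, i, knowledge, window=2):
    """Features for the candidate name at tokens[i]."""
    name = tokens[i]
    feats = []
    # Knowledge feature: some other word in the text is a known
    # knowledge feature of this name.
    others = tokens[:i] + tokens[i + 1:]
    if any(t in knowledge.get(name, ()) for t in others):
        feats.append("CONTEXT_KNOWLEDGE_FEA")
    # Window features: words at fixed offsets left and right of the name.
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats.append(f"T{off:02d}/{tokens[j]}")
    return feats


def to_sequence(label, feats, index):
    """Map features to integer ids (assigned on first use, starting at 1)
    and emit a sparse 'label id:1 id:1 ...' line as in Table 2."""
    ids = sorted({index.setdefault(f, len(index) + 1) for f in feats})
    return f"{label} " + " ".join(f"{k}:1" for k in ids)


# Usage: title words a, b, c plus a knowledge feature "B" of name "a".
index = {}
feats = mention_features(["a", "b", "c", "B"], 0, {"a": {"B"}})
line = to_sequence(1, feats, index)
```

Here `feats` holds `CONTEXT_KNOWLEDGE_FEA`, `T01/b` and `T02/c`, which receive serial numbers 1, 2 and 3 as in Table 1, and `line` has the sparse form shown in Table 2.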
S203, extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus;
In this embodiment, since a credible name is unambiguous, a mention of it in any context can be determined to be a person name, so all video texts containing credible names can be extracted from the video knowledge base as the credible name corpus. Moreover, since video title texts are more likely to contain names, the present application selects video title texts as the source of the credible name corpus, so the video title texts that contain credible names can be used as the credible name corpus.
In a specific implementation, the process of "extracting the credible name features of the credible name corpus" in step S203 can use the following steps, whose flowchart is shown in Fig. 6:
S1013, performing feature extraction on the credible name corpus to obtain a second feature list, the second feature list containing multiple features;
For feature selection, the inventors of the present application found by observing and analyzing data that the context of a name mention has two kinds of strongly correlated features. The first is the word or part-of-speech features within the left and right windows, for example "concert of *" and "* divorces *". The second is the knowledge features strongly correlated with the name mention.
Feature extraction is performed on the credible name corpus according to the above window features and context knowledge features.
S1014, segmenting the credible name corpus into words to obtain multiple second corpus words;
S1015, adding labels to the features in the second feature list that are identical to the second corpus words, and converting the labeled features into feature sequences.
S204, training the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
In this embodiment, the preset machine learning classification model includes but is not limited to a support vector machine model, a logistic regression model and the like; it can be selected according to actual needs and is not specifically limited in this embodiment.
S30, identifying the names contained in the name corpus to be recognized using the name recognition model.
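Steps S204 and S30 can be sketched with a small hand-written logistic regression, one of the model families the embodiment mentions. Everything here is illustrative: the function names, learning rate and epoch count are assumptions, samples are given as lists of active feature ids (the "id:1" entries of a feature sequence), and labels are encoded as 1 for "name" and 0 for "non-name" rather than the 1 and 2 shown in Table 2.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, n_features, epochs=200, lr=0.5):
    """Train on (label, feature_ids) pairs with stochastic gradient descent.

    label is 1 (name) or 0 (non-name); feature ids start at 1.
    Returns the weight list, where w[0] is the bias.
    """
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for y, ids in samples:
            p = sigmoid(w[0] + sum(w[k] for k in ids))
            g = p - y  # gradient of the log loss with respect to the score
            w[0] -= lr * g
            for k in ids:
                w[k] -= lr * g
    return w


def predict(w, ids):
    """Probability that a mention with these active features is a name.
    Feature ids not seen in training are ignored."""
    return sigmoid(w[0] + sum(w[k] for k in ids if k < len(w)))
```

At recognition time, a candidate mention in the corpus to be recognized would be converted to feature ids in the same way as the training data and passed to `predict`; a probability above a chosen cut-off, for instance 0.5, would mark the mention as a name.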
Steps S201 to S204 above are only one preferred implementation of the process of "training a preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model" in step S20 of the embodiment of the present application; the specific implementation of this process may be set as needed and is not limited here.
Steps S1001 to S1004 above are only one preferred implementation of the process of "extracting an ambiguous name list and a credible name list from the name knowledge base" in step S201 of the embodiment of the present application; the specific implementation of this process may be set as needed and is not limited here.
Steps S1005 to S1009 above are only one preferred implementation of the process of "extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list" in step S202 of the embodiment of the present application; the specific implementation of this process may be set as needed and is not limited here.
Steps S1010 to S1012 above are only one preferred implementation of the process of "extracting the ambiguous name features of the ambiguous name corpus" in step S202 of the embodiment of the present application; the specific implementation of this process may be set as needed and is not limited here.
Steps S1013 to S1015 above are only one preferred implementation of the process of "extracting the credible name features of the credible name corpus" in step S203 of the embodiment of the present application; the specific implementation of this process may be set as needed and is not limited here.
The name recognition method provided by the embodiment of the present invention uses a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the name knowledge base covers names comprehensively, a model trained with the name knowledge base and the video knowledge base identifies ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
Based on the name recognition method provided by the above embodiment, an embodiment of the present invention correspondingly provides a device for performing the above name recognition method, whose structural diagram is shown in Fig. 7 and which comprises: a corpus receiving module 10, a model retrieval module 20 and a name recognition module 30, the model retrieval module 20 including a model generation unit 201;
The corpus receiving module 10 is configured to receive a name corpus to be recognized;
The model generation unit 201 is configured to train a preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model;
The model retrieval module 20 is configured to retrieve the name recognition model;
The name recognition module 30 is configured to identify the names contained in the name corpus to be recognized using the name recognition model.
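The module layout of Fig. 7 can be mirrored in code roughly as follows. This is only a structural sketch: the class and method names are invented for illustration, the model is assumed to be a callable that maps a corpus string to a list of names, and training is delegated to a hypothetical `train_fn` callback.

```python
class CorpusReceivingModule:
    """Module 10: receives the name corpus to be recognized."""

    def receive(self, text):
        return text.strip()


class ModelGenerationUnit:
    """Unit 201: trains the model from the two knowledge bases."""

    def __init__(self, train_fn):
        self._train_fn = train_fn  # hypothetical training callback

    def generate(self, name_kb, video_kb):
        return self._train_fn(name_kb, video_kb)


class ModelRetrievalModule:
    """Module 20: contains unit 201 and hands out the trained model."""

    def __init__(self, generation_unit, name_kb, video_kb):
        self._unit = generation_unit
        self._name_kb = name_kb
        self._video_kb = video_kb
        self._model = None

    def retrieve(self):
        if self._model is None:  # train once in advance, then reuse
            self._model = self._unit.generate(self._name_kb, self._video_kb)
        return self._model


class NameRecognitionModule:
    """Module 30: applies the model to the received corpus."""

    def recognize(self, model, corpus):
        return model(corpus)


class NameRecognitionDevice:
    """Wires modules 10, 20 and 30 together."""

    def __init__(self, retrieval_module):
        self.receiver = CorpusReceivingModule()
        self.retrieval = retrieval_module
        self.recognizer = NameRecognitionModule()

    def run(self, text):
        corpus = self.receiver.receive(text)
        model = self.retrieval.retrieve()
        return self.recognizer.recognize(model, corpus)
```

Keeping the generation unit inside the retrieval module follows the containment relation stated above, and caching the model reflects that it is trained in advance rather than on every request.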
In some other embodiments, the model generation unit 201 is specifically configured to:
Extract an ambiguous name list and a credible name list from the name knowledge base; extract an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extract ambiguous name features of the ambiguous name corpus; extract a credible name corpus from the video knowledge base based on the credible name list, and extract credible name features of the credible name corpus; train the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
In some other embodiments, the model generation unit 201, when extracting the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:
Obtain ambiguous names from the name knowledge base, and generate an ambiguous name list containing the ambiguous names; obtain the non-ambiguous names, that is, the names other than the ambiguous names, from the name knowledge base; retrieve a search log, and select credible names from the non-ambiguous names using the search log; generate a credible name list containing the credible names.
The name recognition device provided by the embodiment of the present invention uses a name recognition model to automatically identify the names contained in a name corpus to be recognized. Because the name knowledge base covers names comprehensively, a model trained with the name knowledge base and the video knowledge base identifies ambiguous names with a certain degree of accuracy while still recognizing credible names, which improves the overall effect of name recognition.
The name recognition method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific embodiments and the scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar between embodiments, reference may be made from one to another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes those elements and may further include elements inherent to that process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A name recognition method, characterized by comprising:
receiving a name corpus to be recognized;
retrieving a name recognition model, the name recognition model being obtained in advance by training a preset machine learning classification model with a name knowledge base and a video knowledge base;
recognizing the names contained in the name corpus to be recognized by using the name recognition model.
2. The method according to claim 1, characterized in that the process of training the preset machine learning classification model in advance with the name knowledge base and the video knowledge base to obtain the name recognition model comprises:
extracting an ambiguous name list and a credible name list from the name knowledge base;
extracting an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus;
extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus;
training the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
3. The method according to claim 2, characterized in that extracting the ambiguous name list and the credible name list from the name knowledge base comprises:
obtaining ambiguous names from the name knowledge base, and generating an ambiguous name list containing the ambiguous names;
obtaining from the name knowledge base the non-ambiguous names other than the ambiguous names;
retrieving a search log, and using the search log to select credible names from the non-ambiguous names;
generating a credible name list containing the credible names.
4. The method according to claim 2, characterized in that extracting the ambiguous name corpus from the video knowledge base based on the ambiguous name list comprises:
for each video text in the video knowledge base, obtaining the title of the video text;
segmenting the title into words to obtain a plurality of title phrases;
judging whether the plurality of title phrases contain at least one ambiguous name in the ambiguous name list;
if so, determining the video text to be an ambiguous name text;
generating an ambiguous name corpus containing all the ambiguous name texts.
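By way of non-limiting illustration only, and without limiting the claim, the steps of claim 4 may be sketched as follows. The sketch assumes that each video text is a dictionary with a `title` field and that titles are already separated by spaces, so that whitespace splitting stands in for a real word segmenter. The function name `extract_ambiguous_corpus` and the `segment` parameter are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def extract_ambiguous_corpus(
    video_texts: Iterable[Dict[str, str]],
    ambiguous_names: Set[str],
    segment: Callable[[str], List[str]] = str.split,
) -> List[Dict[str, str]]:
    """Keep every video text whose segmented title contains an ambiguous name."""
    corpus = []
    for video in video_texts:
        # Segment the title into title phrases (whitespace split is an assumption).
        title_phrases = segment(video["title"])
        if any(phrase in ambiguous_names for phrase in title_phrases):
            corpus.append(video)
    return corpus
```

With the ambiguous name 高峰, which is also an ordinary word meaning "peak", a title such as "早 高峰 路况" (morning rush-hour traffic) is retained by this step even though it does not refer to a person; this is the kind of match that the additional check of claim 5 is directed at.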
5. The method according to claim 4, characterized in that, before determining the video text to be an ambiguous name text, the method further comprises:
calculating, for each of the at least one ambiguous name contained in the video text, the text similarity distance between the video text and that ambiguous name;
judging whether any of the calculated text similarity distances is greater than a distance threshold;
if so, performing the step of determining the video text to be an ambiguous name text.
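By way of non-limiting illustration only, the check of claim 5 may be sketched as follows. The claim does not fix how the text similarity distance is computed, so the measure is passed in as a function. The example measure shown, a Jaccard score between the tokens of the video text and a hypothetical keyword profile of each name, is purely an assumption, as are the names `passes_similarity_check`, `profile_similarity` and `PROFILES`.

```python
from typing import Callable, Dict, Iterable, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def passes_similarity_check(
    video_tokens: Sequence[str],
    contained_names: Iterable[str],
    similarity: Callable[[Sequence[str], str], float],
    threshold: float,
) -> bool:
    """True if at least one contained ambiguous name scores above the threshold."""
    scores = [similarity(video_tokens, name) for name in contained_names]
    return any(score > threshold for score in scores)


# Hypothetical keyword profiles describing the person behind each ambiguous name.
PROFILES: Dict[str, Set[str]] = {"高峰": {"演唱会", "歌手", "专辑"}}


def profile_similarity(video_tokens: Sequence[str], name: str) -> float:
    """Toy measure (assumption): overlap between the text context and the name profile."""
    context = set(video_tokens) - {name}
    return jaccard(context, PROFILES.get(name, set()))
```

Under these assumptions a video text is kept as an ambiguous name text only when its context resembles the person behind the name, which filters out texts in which the same characters are used as an ordinary word.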
6. The method according to claim 2, characterized in that extracting the ambiguous name features of the ambiguous name corpus comprises:
performing feature extraction on the ambiguous name corpus to obtain a first feature list, the first feature list containing a plurality of features;
segmenting the ambiguous name corpus into words to obtain a plurality of first corpus phrases;
adding labels respectively to the features in the first feature list that are identical to the first corpus phrases and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and the labeled ambiguous names into a feature sequence.
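By way of non-limiting illustration only, the labeling of claim 6 may be sketched as follows. The claim does not state how the feature list is obtained or which labels are used, so the sketch assumes that a feature is any non-name word occurring at least `min_count` times in the segmented corpus and that the labels are `NAME` and `FEAT`. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def extract_feature_list(
    segmented_texts: Sequence[Sequence[str]],
    names: Set[str],
    min_count: int = 2,
) -> List[str]:
    """Assumed feature extraction: frequent non-name words of the corpus."""
    counts = Counter(token
                     for text in segmented_texts
                     for token in text
                     if token not in names)
    return [token for token, count in counts.items() if count >= min_count]


def to_feature_sequence(
    tokens: Sequence[str],
    feature_list: Sequence[str],
    names: Set[str],
) -> List[Tuple[str, str]]:
    """Label names and matching features, keeping them in text order."""
    features = set(feature_list)
    sequence = []
    for token in tokens:
        if token in names:
            sequence.append((token, "NAME"))
        elif token in features:
            sequence.append((token, "FEAT"))
        # Tokens that are neither a name nor a listed feature are left out.
    return sequence
```

The same two functions could be applied unchanged to a credible name corpus, with the credible names passed as `names`.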
7. The method according to claim 2, characterized in that extracting the credible name features of the credible name corpus comprises:
performing feature extraction on the credible name corpus to obtain a second feature list, the second feature list containing a plurality of features;
segmenting the credible name corpus into words to obtain a plurality of second corpus phrases;
adding labels respectively to the features in the second feature list that are identical to the second corpus phrases and to the credible names used to extract the credible name corpus, and converting the labeled features and the labeled credible names into a feature sequence.
8. A name recognition device, characterized by comprising: a corpus receiving module, a model retrieval module and a name recognition module, the model retrieval module including a model generation unit;
the corpus receiving module is configured to receive a name corpus to be recognized;
the model generation unit is configured to train a preset machine learning classification model in advance with a name knowledge base and a video knowledge base to obtain a name recognition model;
the model retrieval module is configured to retrieve the name recognition model;
the name recognition module is configured to recognize the names contained in the name corpus to be recognized by using the name recognition model.
9. The device according to claim 8, characterized in that the model generation unit is specifically configured to:
extract an ambiguous name list and a credible name list from the name knowledge base; extract an ambiguous name corpus from the video knowledge base based on the ambiguous name list, and extract ambiguous name features of the ambiguous name corpus; extract a credible name corpus from the video knowledge base based on the credible name list, and extract credible name features of the credible name corpus; and train the preset machine learning classification model with the ambiguous name features and the credible name features to obtain the name recognition model.
10. The device according to claim 9, characterized in that the model generation unit, when used to extract the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:
obtain ambiguous names from the name knowledge base, and generate an ambiguous name list containing the ambiguous names; obtain from the name knowledge base the non-ambiguous names other than the ambiguous names; retrieve a search log, and use the search log to select credible names from the non-ambiguous names; and generate a credible name list containing the credible names.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711414492.9A CN108255806B (en) | 2017-12-22 | 2017-12-22 | Name recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108255806A (en) | 2018-07-06 |
CN108255806B (en) | 2021-12-17 |
Family
ID=62722815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711414492.9A Active CN108255806B (en) | 2017-12-22 | 2017-12-22 | Name recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108255806B (en) |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101454750A (en) * | 2006-03-31 | 2009-06-10 | 谷歌公司 | Disambiguation of named entities |
CN101833569A (en) * | 2010-04-08 | 2010-09-15 | 中国科学院自动化研究所 | Method for automatically identifying film human face image |
CN101950284A (en) * | 2010-09-27 | 2011-01-19 | 北京新媒传信科技有限公司 | Chinese word segmentation method and system |
CN102033879A (en) * | 2009-09-27 | 2011-04-27 | 腾讯科技(深圳)有限公司 | Method and device for identifying Chinese name |
CN102521321A (en) * | 2011-12-02 | 2012-06-27 | 华中科技大学 | Video search method based on search term ambiguity and user preferences |
EP2626118A2 (en) * | 2012-02-13 | 2013-08-14 | Sony Computer Entertainment Europe Limited | System and method of image augmentation |
CN103714094A (en) * | 2012-10-09 | 2014-04-09 | 富士通株式会社 | Equipment and method for recognizing objects in video |
US20140142922A1 (en) * | 2007-10-17 | 2014-05-22 | Evri, Inc. | Nlp-based entity recognition and disambiguation |
CN104424332A (en) * | 2013-09-11 | 2015-03-18 | 富士通株式会社 | Unambiguous Japanese name list building method and name identification method and device |
US20150154284A1 (en) * | 2013-11-29 | 2015-06-04 | Katja Pfeifer | Aggregating results from named entity recognition services |
US20150161116A1 (en) * | 2012-03-19 | 2015-06-11 | Google Inc. | Searching based on audio and/or visual features of documents |
CN104918136A (en) * | 2015-05-28 | 2015-09-16 | 北京奇艺世纪科技有限公司 | Video positioning method and device |
CN105868193A (en) * | 2015-01-19 | 2016-08-17 | 富士通株式会社 | Device and method used to detect product relevant information in electronic text |
CN106156051A (en) * | 2015-03-27 | 2016-11-23 | 深圳市腾讯计算机系统有限公司 | Build the method and device of name language material identification model |
CN106407180A (en) * | 2016-08-30 | 2017-02-15 | 北京奇艺世纪科技有限公司 | Entity disambiguation method and apparatus |
CN106446754A (en) * | 2015-08-11 | 2017-02-22 | 阿里巴巴集团控股有限公司 | Image identification method, metric learning method, image source identification method and devices |
CN106649272A (en) * | 2016-12-23 | 2017-05-10 | 东北大学 | Named entity recognizing method based on mixed model |
CN106708796A (en) * | 2015-07-15 | 2017-05-24 | 中国科学院计算技术研究所 | Text-based key personal name extraction method and system |
CN106779080A (en) * | 2017-01-13 | 2017-05-31 | 武汉理工数字传播工程有限公司 | A kind of people information knowledge base method for auto constructing |
CN107180087A (en) * | 2017-05-09 | 2017-09-19 | 北京奇艺世纪科技有限公司 | A kind of searching method and device |
CN107391485A (en) * | 2017-07-18 | 2017-11-24 | 中译语通科技(北京)有限公司 | Entity recognition method is named based on the Korean of maximum entropy and neural network model |
Non-Patent Citations (7)
Title |
---|
SATOH S. 等: "Name-it: Naming and detecting faces in news videos", 《IEEE MULTIMEDIA》 * |
SINGH U. 等: "Named entity recognition system for Urdu", 《PROCEEDINGS OF COLING》 * |
乐娟 等: "基于HMM的京剧机构命名实体识别寄法", 《计算机工程》 * |
刘浩: "面向情感搜索的中文语料分析及其分词", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
塔什甫拉提·尼扎木丁 等: "统计与规则相结合的维吾尔语人名识别方法", 《自动化学报》 * |
王元卓 等: "基于开放网络知识的信息检索与数据挖掘", 《计算机研究与发展》 * |
王巍巍 等: "双语影视知识图谱的构建研究", 《北京大学学报(自然科学版)》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401083A (en) * | 2019-01-02 | 2020-07-10 | 阿里巴巴集团控股有限公司 | Name identification method and device, storage medium and processor |
CN111401083B (en) * | 2019-01-02 | 2023-05-02 | 阿里巴巴集团控股有限公司 | Name identification method and device, storage medium and processor |
Also Published As
Publication number | Publication date |
---|---|
CN108255806B (en) | 2021-12-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||