CN108255806B - Name recognition method and device

Info

Publication number: CN108255806B
Application number: CN201711414492.9A
Authority: CN
Inventors: 刘兵; 苗艳军
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2017-12-22
Filing date: 2017-12-22
Publication date: 2021-12-17
Anticipated expiration: 2037-12-22
Also published as: CN108255806A

Abstract

The invention provides a name recognition method and device that automatically recognize the person names contained in a name corpus to be recognized by using a name recognition model. Because the name knowledge base covers names comprehensively, a model trained on the name knowledge base and the video knowledge base identifies ambiguous names with reasonable accuracy while still recognizing credible names, so the overall effect of name recognition is improved.

Description

Name recognition method and device

Technical Field

The invention relates to the technical field of internet search, in particular to a name identification method and device.

Background

Named Entity Recognition (NER), also known as "proper name recognition," refers to recognizing entity names of specific significance in text, such as person names, place names, and organization names. In the video industry, for example, video titles and entertainment news contain a large number of person names, and how well those names are recognized strongly affects the promotion of video application products.

Currently, name recognition is mainly achieved by building a general-purpose name recognition model, such as a classification model or a conditional random field model. However, ambiguous name mentions, such as "dawn" (a word that can be either a common noun or a person's name), are likely to appear in the text to be recognized, and general-purpose models have a high error rate on such mentions, which degrades video application products such as video search and video push.

Disclosure of Invention

In view of the above, the present invention provides a method and an apparatus for identifying names, so as to solve the problem that the existing universal name identification model has a high error rate in identifying ambiguous names, thereby affecting the effects of video application products such as video search and video push. The technical scheme is as follows:

a person name recognition method, comprising:

receiving a name corpus to be recognized;

calling a name recognition model, wherein the name recognition model is obtained in advance by training a preset machine learning classification model using a name knowledge base and a video knowledge base;

and recognizing the names contained in the name corpus to be recognized by using the name recognition model.

Preferably, the process of training the preset machine learning classification model by using the name knowledge base and the video knowledge base in advance to obtain the name recognition model comprises the following steps:

extracting an ambiguous name list and a credible name list from a name knowledge base;

extracting ambiguous name corpus from a video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus;

extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus;

and training a preset machine learning classification model by using the ambiguous name features and the credible name features to obtain a name recognition model.

Preferably, the extracting the list of ambiguous names and the list of trusted names from the name knowledge base includes:

acquiring ambiguous names from a name knowledge base, and generating an ambiguous name list containing the ambiguous names;

acquiring non-ambiguous names except the ambiguous names from the name knowledge base;

calling a search log, and selecting a credible name from the non-ambiguous names by using the search log;

and generating a trusted person name list containing the trusted person name.

Preferably, the extracting the ambiguous person name corpus from the video knowledge base based on the ambiguous person name list includes:

for each video text in a video knowledge base, acquiring a title of the video text;

performing word segmentation on the titles to obtain a plurality of title phrases;

judging whether the plurality of title phrases contain at least one ambiguous name in the ambiguous name list;

if yes, determining the video text as an ambiguous name text;

and generating an ambiguous name corpus comprising all the ambiguous name texts.

Preferably, before the determining the video text as the ambiguous name text, the method further includes:

calculating a text similarity distance between the video text and each of the at least one ambiguous name contained in the video text;

judging whether any of the calculated text similarity distances is larger than a distance threshold;

and if so, executing the step of determining the video text as the ambiguous name text.

Preferably, the extracting the ambiguous person name feature of the ambiguous person name corpus includes:

extracting features of the ambiguous name corpus to obtain a first feature list, wherein the first feature list comprises a plurality of features;

performing word segmentation on the ambiguous name corpus to obtain a plurality of first corpus word groups;

and adding labels respectively to the features in the first feature list that are identical to the first corpus phrases and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and ambiguous names into feature sequences.

Preferably, the extracting of the credible name features of the credible name corpus includes:

performing feature extraction on the credible name corpus to obtain a second feature list, wherein the second feature list comprises a plurality of features;

performing word segmentation on the credible name corpus to obtain a plurality of second corpus word groups;

and adding labels respectively to the features in the second feature list that are identical to the second corpus phrases and to the credible names used to extract the credible name corpus, and converting the labeled features and credible names into feature sequences.

A person name recognition apparatus, comprising: a corpus receiving module, a model calling module and a name recognition module, wherein the model calling module comprises a model generation unit;

the corpus receiving module is used for receiving a corpus of names of people to be identified;

the model generation unit is used for training a preset machine learning classification model by utilizing a name knowledge base and a video knowledge base in advance to obtain a name recognition model;

the model calling module is used for calling the name recognition model;

and the name recognition module is used for recognizing names contained in the name corpus to be recognized by using the name recognition model.

Preferably, the model generating unit is specifically configured to:

extracting an ambiguous name list and a credible name list from a name knowledge base; extracting ambiguous name corpus from a video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus; extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus; and training a preset machine learning classification model by using the ambiguous name features and the credible name features to obtain a name recognition model.

Preferably, the model generating unit, when configured to extract the ambiguous name list and the credible name list from the name knowledge base, is specifically configured to:

acquiring ambiguous names from a name knowledge base, and generating an ambiguous name list containing the ambiguous names; acquiring non-ambiguous names except the ambiguous names from the name knowledge base; calling a search log, and selecting a credible name from the non-ambiguous names by using the search log; and generating a trusted person name list containing the trusted person name.

The name recognition method and device provided by the invention automatically recognize the person names contained in the name corpus to be recognized by using the name recognition model. Because the name knowledge base covers names comprehensively, a model trained on the name knowledge base and the video knowledge base identifies ambiguous names with reasonable accuracy while still recognizing credible names, so the overall effect of name recognition is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a method for generating a name recognition model according to an embodiment of the present invention;

fig. 2 is a flowchart of a method in step S20 of a person name recognition model generation method according to an embodiment of the present invention;

fig. 3 is a flowchart of a method for "extracting an ambiguous name list and a trusted name list from a name knowledge base" in step S201 in a method for generating a name recognition model according to an embodiment of the present invention;

fig. 4 is a flowchart of a method for extracting ambiguous-person-name corpus from a video knowledge base based on an ambiguous-person-name list in step S202 in a person-name recognition model generation method according to an embodiment of the present invention;

fig. 5 is a flowchart of a method for extracting ambiguous name features of an ambiguous name corpus in step S202 in a method for generating a name recognition model according to an embodiment of the present invention;

fig. 6 is a flowchart of a method for "extracting a trusted person name feature of a trusted person name corpus" in step S203 in the person name recognition model generation method according to the embodiment of the present invention;

fig. 7 is a schematic structural diagram of a person name recognition model generation apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a person name identification method, and the method has a flow chart as shown in figure 1, and comprises the following steps:

S10, receiving a name corpus to be recognized;

S20, calling a name recognition model, wherein the name recognition model is obtained in advance by training a preset machine learning classification model using a name knowledge base and a video knowledge base;

In the embodiment of the invention, the background database of a video search engine company holds a name knowledge base and a video knowledge base.

The name knowledge base holds a large amount of name data. A name is either a credible name or an ambiguous name. A credible name is a conventional name: whenever it is mentioned, the mention can be taken to refer to a person. An ambiguous name is an unconventional name: when it is mentioned, it cannot be determined from the mention alone whether it refers to a person or to something else. In addition, the name knowledge base stores structured field information for each name, including native place, birthday, alias, life story, related works, related persons and the like.

The video knowledge base includes video title text and video introduction text.

First, ambiguous names and credible names are extracted from the name knowledge base; then, an ambiguous name corpus is extracted from the video knowledge base by using the ambiguous names, and a credible name corpus is extracted from the video knowledge base by using the credible names; finally, a preset machine learning classification model is trained by using the ambiguous name corpus and the credible name corpus to obtain the name recognition model.

In the specific implementation process, the process in step S20 of "training the preset machine learning classification model by using the name knowledge base and the video knowledge base in advance to obtain the name recognition model" may adopt the following steps, and a flowchart of the method is shown in fig. 2:

S201, extracting an ambiguous name list and a credible name list from a name knowledge base;

In this embodiment, since the number of credible names in the name knowledge base is much larger than the number of ambiguous names, the ambiguous name list may be obtained from the name knowledge base first, and the credible name list may then be obtained from the remaining non-ambiguous names.

In a specific implementation process, in the process of "extracting an ambiguous name list and a trusted name list from a name knowledge base" in step S201, the following steps may be specifically adopted, and a flowchart of the method is shown in fig. 3:

S1001, acquiring ambiguous names from a name knowledge base and generating an ambiguous name list containing the ambiguous names;

In the process of executing step S1001, the names in the name knowledge base may be compared with a word segmentation dictionary; a name that coincides with a dictionary word whose part of speech is non-name is an ambiguous name. When a name does not exist in the word segmentation dictionary, prompt information can be generated to ask a user whether the name is an ambiguous name, and the name is added to the word segmentation dictionary once its part of speech has been labeled, which makes the dictionary more comprehensive.

It should be noted that the part of speech of each phrase in the word segmentation dictionary is labeled in advance as either person name or non-person-name; that is, a phrase has exactly one part of speech.
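The dictionary comparison in step S1001 can be sketched as below. This is a minimal illustration under assumptions, not the patented implementation: the tag values `PERSON` and `NON_PERSON`, the function name `split_ambiguous_names`, and the separate `needs_review` list (standing in for the user prompt) are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical part-of-speech tags; the source only says each dictionary
# entry is pre-labeled as a person name or a non-person-name word.
PERSON = "person"
NON_PERSON = "non_person"


def split_ambiguous_names(
    knowledge_base_names: Iterable[str],
    segmentation_dict: Dict[str, str],
) -> Tuple[List[str], List[str], List[str]]:
    """Return (ambiguous, non_ambiguous, needs_review) name lists."""
    ambiguous: List[str] = []
    non_ambiguous: List[str] = []
    needs_review: List[str] = []
    for name in knowledge_base_names:
        pos = segmentation_dict.get(name)
        if pos is None:
            # Name missing from the dictionary: defer to a human reviewer.
            needs_review.append(name)
        elif pos == NON_PERSON:
            # Name coincides with a non-name word, so it is ambiguous.
            ambiguous.append(name)
        else:
            non_ambiguous.append(name)
    return ambiguous, non_ambiguous, needs_review
```

For example, with a dictionary that labels "dawn" as a non-name word, a knowledge base entry "dawn" lands in the ambiguous list, while an entry absent from the dictionary is set aside for review.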

S1002, acquiring non-ambiguous names except for ambiguous names from a name knowledge base;

S1003, calling a search log, and selecting credible names from the non-ambiguous names by using the search log;

In the process of executing step S1003, the search log is used to count the number of times each non-ambiguous name has been searched; a non-ambiguous name whose search count is higher than a threshold is regarded as a popular name, and the popular names are selected as the credible names.

Alternatively, the names can be sorted from high to low by search count, and a preset number of top-ranked names can be selected as the credible names.

S1004, generating a credible name list containing the credible names.
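Both selection strategies in step S1003 (count threshold and top-N) can be sketched as follows. The function name, the parameters `threshold` and `top_n`, and the representation of the search log as a flat list of query strings are assumptions made for illustration; the source does not specify the log format.

```python
from collections import Counter
from typing import Iterable, List, Optional


def select_credible_names(
    non_ambiguous_names: Iterable[str],
    search_log_queries: Iterable[str],
    threshold: int = 2,
    top_n: Optional[int] = None,
) -> List[str]:
    """Pick popular names by search count, using a threshold or top-N."""
    candidates = set(non_ambiguous_names)
    # Count exact-match queries only; a real log would need normalization.
    counts = Counter(q for q in search_log_queries if q in candidates)
    ranked = [name for name, _ in counts.most_common()]
    if top_n is not None:
        # Alternative strategy: keep the N most searched names.
        return ranked[:top_n]
    # Default strategy: keep names searched more often than the threshold.
    return [name for name in ranked if counts[name] > threshold]
```

Names that never appear in the log have a count of zero and are therefore never selected under either strategy.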

S202, extracting ambiguous name corpus from a video knowledge base based on an ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus;

In the process of executing step S202, the texts in the video knowledge base that contain ambiguous names constitute the ambiguous name corpus; the ambiguous name features are then extracted from the ambiguous name corpus according to preset features.

In a specific implementation process, in the process of "extracting ambiguous-person-name corpus from the video knowledge base based on the ambiguous-person-name list" in step S202, the following steps may be specifically adopted, and a flowchart of the method is shown in fig. 4:

S1005, acquiring the title of the video text for each video text in the video knowledge base;

In this embodiment, since video title text is more likely to contain person names, the video title text is selected as the source of the ambiguous name corpus.

S1006, segmenting the title to obtain a plurality of title phrases;

In the process of executing step S1006, assume that a certain news title is segmented into a title phrase a, a title phrase b, a title phrase c, and a title phrase d.

S1007, determining whether the plurality of title phrases include at least one ambiguous name in the ambiguous name list;

S1008, if yes, determining the video text as an ambiguous name text;

In practical applications, in order to select more effective ambiguous name texts from the video knowledge base, a text similarity distance between the video text and each of the at least one ambiguous name contained in the video text can be calculated; it is then judged whether any of the calculated text similarity distances is larger than a distance threshold; if yes, the step of determining the video text as the ambiguous name text is executed.

In the process of calculating the text similarity distance, the structured field information of names stored in the name knowledge base can be used as knowledge features. For example, the title phrase a is an ambiguous name a, and the knowledge features of the ambiguous name a include: "person name B", "person name C", "program 1", "program 2", and "program 3". Calculating the similarity distance between the video text and the ambiguous name can therefore be converted into calculating the similarity distance between the video text and the knowledge features, so the similarity distance is computed with a text similarity calculation method.
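A minimal sketch of this filter follows. The source does not name a particular text similarity method, so the overlap ratio used here is a stand-in, and the value is treated as a similarity score in which larger means more related, since the text accepts values above the threshold. The function names, the default threshold, and the dictionary layout of the knowledge base are assumptions.

```python
from typing import Dict, Iterable, List, Set


def knowledge_similarity(
    text_tokens: Iterable[str], knowledge_features: Set[str]
) -> float:
    """Fraction of a name's knowledge features that occur in the text."""
    if not knowledge_features:
        return 0.0
    tokens = set(text_tokens)
    return len(tokens & knowledge_features) / len(knowledge_features)


def passes_similarity_check(
    text_tokens: List[str],
    ambiguous_names_in_text: Iterable[str],
    knowledge_base: Dict[str, Set[str]],
    threshold: float = 0.1,
) -> bool:
    """True if any contained ambiguous name scores above the threshold."""
    return any(
        knowledge_similarity(text_tokens, knowledge_base.get(name, set()))
        > threshold
        for name in ambiguous_names_in_text
    )
```

With the knowledge features listed above for ambiguous name a, a title that also mentions "program 1" matches one of five features and passes a threshold of 0.1, whereas a title with no related phrase scores zero and is discarded.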

S1009, generating an ambiguous name corpus including all the ambiguous name texts.
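Steps S1005 to S1009 amount to a filter over video titles, which might be sketched as follows. The whitespace segmenter is only a placeholder (Chinese titles would need a real word segmenter), and the function names are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def whitespace_segment(title: str) -> List[str]:
    # Placeholder segmenter; a real system would use a word segmenter
    # backed by the segmentation dictionary.
    return title.split()


def build_ambiguous_corpus(
    video_titles: Iterable[str],
    ambiguous_names: Set[str],
    segment: Callable[[str], List[str]] = whitespace_segment,
) -> List[str]:
    """Keep every title whose phrases include at least one ambiguous name."""
    corpus: List[str] = []
    for title in video_titles:                      # S1005: take each title
        phrases = segment(title)                    # S1006: segment it
        if ambiguous_names.intersection(phrases):   # S1007: any ambiguous name?
            corpus.append(title)                    # S1008: mark as ambiguous text
    return corpus                                   # S1009: the resulting corpus
```

The optional similarity check described for step S1008 would slot in as an extra condition before the title is appended.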

In a specific implementation process, the process of "extracting ambiguous person name features of ambiguous person name corpus" in step S202 may specifically adopt the following steps, and a flowchart of the method is shown in fig. 5:

S1010, extracting features of the ambiguous name corpus to obtain a first feature list, wherein the first feature list comprises a plurality of features;

In the feature selection process, the inventors found through data observation and analysis that the context in which a name is mentioned has two kinds of strongly related features: first, the words or part-of-speech features in the left and right windows, e.g., "concert"; second, the strongly related knowledge features of the mentioned name, e.g., the strongly related knowledge feature "person name B" of the ambiguous name a.

According to the window features and the context knowledge features, feature extraction is performed on the ambiguous name corpus; for example, feature extraction on the news title above yields the feature list shown in Table 1. All features are numbered, so each feature in the list is represented by an integer serial number, where CONTEXT_KNOWLEDGE_FEA is the context knowledge feature.

Feature number	Feature
1	CONTEXT_KNOWLEDGE_FEA
2	T01/title phrase b
3	T02/title phrase c
……	……

TABLE 1
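The numbering in Table 1 can be reproduced with a sketch like the one below. The source does not define the prefixes T01 and T02; the code assumes they denote the first and second window positions to the right of the mention, which is a guess made only for illustration. The function names and the `offsets` parameter are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Set

CONTEXT_KNOWLEDGE_FEA = "CONTEXT_KNOWLEDGE_FEA"


def extract_features(
    tokens: Sequence[str],
    name_index: int,
    knowledge_features: Set[str],
    offsets: Sequence[int] = (1, 2),
) -> List[str]:
    """Return feature strings for the name mention at tokens[name_index]."""
    features: List[str] = []
    context = [t for i, t in enumerate(tokens) if i != name_index]
    # Knowledge feature: some strongly related item appears in the context.
    if knowledge_features.intersection(context):
        features.append(CONTEXT_KNOWLEDGE_FEA)
    # Window features: assumed template "T<k>/<word at the k-th offset>".
    for k, offset in enumerate(offsets, start=1):
        pos = name_index + offset
        if 0 <= pos < len(tokens):
            features.append(f"T{k:02d}/{tokens[pos]}")
    return features


def assign_feature_ids(feature_lists: Sequence[List[str]]) -> Dict[str, int]:
    """Number features 1, 2, 3, ... in order of first appearance."""
    ids: Dict[str, int] = {}
    for features in feature_lists:
        for feature in features:
            ids.setdefault(feature, len(ids) + 1)
    return ids
```

Applied to a title segmented as name a, phrase b, phrase c, person name B, this yields the same three numbered entries as Table 1.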

S1011, performing word segmentation on ambiguous name corpora to obtain a plurality of first corpus phrases;

S1012, adding labels respectively to the features in the first feature list that are identical to the first corpus phrases and to the ambiguous names used to extract the ambiguous name corpus, and converting the labeled features and ambiguous names into feature sequences;

In the process of executing step S1012, a negative label is added to each feature in the first feature list that is identical to a first corpus phrase, and a positive label is added to the ambiguous name that was used to extract the ambiguous name corpus. For example, the conversion results for the ambiguous name a with a positive label and the feature "title phrase b" with a negative label are shown in Table 2:

Label	Feature sequence
1	1:1 2:1 3:1 4:1……
2	5:1 6:1 7:1……
……	……

TABLE 2
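The rows of Table 2 resemble the sparse "index:value" encoding used by LIBSVM-style tools, although the source does not name a format. A minimal conversion sketch, with a hypothetical function name and the assumption that every active feature has value 1, might look like this:

```python
from typing import Dict, Iterable


def to_feature_sequence(
    label: int, features: Iterable[str], feature_ids: Dict[str, int]
) -> str:
    """Encode one sample as '<label> id:1 id:1 ...' with ids ascending."""
    # Features missing from the numbered list are skipped; duplicates collapse.
    ids = sorted({feature_ids[f] for f in features if f in feature_ids})
    return " ".join([str(label)] + [f"{i}:1" for i in ids])
```

Each output line carries the label first, followed by the serial numbers of the features that are present in that sample.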

S203, extracting a credible name corpus from the video knowledge base based on the credible name list, and extracting credible name features of the credible name corpus;

In this embodiment, since a credible name is unambiguous, a mention of it in any context can be taken to be a person name, and all video texts containing credible names can be extracted from the video knowledge base as the credible name corpus. Because video title text is more likely to contain person names, it is selected as the source of the credible name corpus; that is, video title texts containing credible names can be used as the credible name corpus.

In the specific implementation process, the process of "extracting the credible name features of the credible name corpus" in step S203 may specifically adopt the following steps, and a flowchart of the method is shown in fig. 6:

S1013, performing feature extraction on the credible name corpus to obtain a second feature list, wherein the second feature list comprises a plurality of features;

In the feature selection process, the inventors found through data observation and analysis that the context in which a name is mentioned has two kinds of strongly related features: first, the words or part-of-speech features in the left and right windows, e.g., "concert"; second, the strongly related knowledge features of the mentioned name.

According to the window features and the context knowledge features, feature extraction is performed on the credible name corpus.

S1014, performing word segmentation on the credible name corpus to obtain a plurality of second corpus phrases;

S1015, adding labels respectively to the features in the second feature list that are identical to the second corpus phrases and to the credible names used to extract the credible name corpus, and converting the labeled features and credible names into feature sequences.

S204, training a preset machine learning classification model by using the ambiguous name features and the credible name features to obtain a name recognition model.

In this embodiment, the preset machine learning classification model includes, but is not limited to, a support vector machine model, a logistic regression model, and the like, and may be selected according to actual needs, and this embodiment is not particularly limited.
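As one illustration of step S204, the sketch below trains a small logistic regression model on sparse feature sequences using only the standard library. It is a toy under stated assumptions rather than the patented training procedure: labels are mapped to 1 (person name) and 0 (not a person name), feature id 0 is reserved for the bias, and the function names, epoch count, and learning rate are arbitrary choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One sample: (label in {0, 1}, serial numbers of the active features).
Sample = Tuple[int, List[int]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    samples: Sequence[Sample],
    epochs: int = 200,
    learning_rate: float = 0.5,
) -> Dict[int, float]:
    """Fit weights by stochastic gradient descent; key 0 holds the bias."""
    weights: Dict[int, float] = {0: 0.0}
    for _ in range(epochs):
        for label, feature_ids in samples:
            z = weights[0] + sum(weights.get(i, 0.0) for i in feature_ids)
            error = label - sigmoid(z)  # gradient of the log-likelihood
            weights[0] += learning_rate * error
            for i in feature_ids:
                weights[i] = weights.get(i, 0.0) + learning_rate * error
    return weights


def predict_is_name(weights: Dict[int, float], feature_ids: Sequence[int]) -> bool:
    """Classify a mention from the serial numbers of its active features."""
    z = weights[0] + sum(weights.get(i, 0.0) for i in feature_ids)
    return sigmoid(z) >= 0.5
```

A support vector machine, which the embodiment also mentions, could be substituted without changing the sparse input representation.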

S30, recognizing the names contained in the name corpus to be recognized by using the name recognition model.

The above steps S201 to S204 are only one preferred implementation manner of the process of "training the preset machine learning classification model to obtain the name recognition model by using the name knowledge base and the video knowledge base in advance" in step S20 of the embodiment of the present application, and the specific implementation manner of the process may be arbitrarily set according to the needs of the user, and is not limited herein.

The above steps S1001 to S1004 are only one preferred implementation manner of the process of "extracting the ambiguous name list and the trusted name list from the name knowledge base" in step S201 in the embodiment of the present application, and a specific implementation manner of the process may be arbitrarily set according to a requirement of the user, and is not limited herein.

The above steps S1005 to S1009 are only a preferred implementation manner of the process of "extracting the ambiguous-person-name corpus from the video knowledge base based on the ambiguous-person-name list" in step S202 in the embodiment of the present application, and a specific implementation manner of the process may be arbitrarily set according to a requirement of the user, which is not limited herein.

The above steps S1010 to S1012 are only a preferred implementation manner of the process of "extracting the ambiguous person name feature of the ambiguous person name corpus" in step S202 in the embodiment of the present application, and the specific implementation manner related to the process may be arbitrarily set according to the requirement thereof, and is not limited herein.

The above steps S1013 to S1015 are only one preferable implementation manner of the process of "extracting the credible name feature of the credible name corpus" in step S203 in this embodiment, and the specific implementation manner of this process may be arbitrarily set according to its own requirements, and is not limited herein.

The name recognition method provided by the embodiment of the invention automatically recognizes the person names contained in the name corpus to be recognized by using the name recognition model. Because the name knowledge base covers names comprehensively, a model trained on the name knowledge base and the video knowledge base identifies ambiguous names with reasonable accuracy while still recognizing credible names, so the overall effect of name recognition is improved.

Based on the person name recognition method provided in the foregoing embodiment, an apparatus for performing the person name recognition method in the embodiment of the present invention is shown in fig. 7, and includes: a corpus receiving module 10, a model calling module 20 and a name recognition module 30, wherein the model calling module 20 comprises a model generating unit 201;

the corpus receiving module 10 is used for receiving corpus of names of people to be identified;

the model generation unit 201 is used for training a preset machine learning classification model by using a name knowledge base and a video knowledge base in advance to obtain a name recognition model;

the model calling module 20 is used for calling a name recognition model;

and the name recognition module 30 is configured to recognize names included in the name corpus to be recognized by using the name recognition model.
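The division of responsibilities among the three modules can be illustrated with the sketch below. The class names mirror the module names in the text, but the method names, the `NameModel` callable interface, and the helper `recognize_names` are hypothetical; in particular, the model is assumed to have been produced in advance by the model generating unit.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical model interface: returns True when the token at the given
# index is judged to be a person name.
NameModel = Callable[[Sequence[str], int], bool]


@dataclass
class CorpusReceivingModule:
    segment: Callable[[str], List[str]]

    def receive(self, text: str) -> List[str]:
        """Receive the corpus to be recognized and segment it."""
        return self.segment(text)


@dataclass
class ModelCallingModule:
    model: NameModel  # trained in advance by the model generating unit

    def call(self) -> NameModel:
        return self.model


class NameRecognitionModule:
    def recognize(self, tokens: Sequence[str], model: NameModel) -> List[str]:
        """Return the tokens that the model classifies as person names."""
        return [t for i, t in enumerate(tokens) if model(tokens, i)]


def recognize_names(
    text: str,
    receiver: CorpusReceivingModule,
    caller: ModelCallingModule,
    recognizer: NameRecognitionModule,
) -> List[str]:
    tokens = receiver.receive(text)
    return recognizer.recognize(tokens, caller.call())
```

Keeping the model behind a callable means the recognition module does not depend on which classifier the model generating unit trained.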

In some other embodiments, the model generating unit 201 is specifically configured to:

extracting an ambiguous name list and a credible name list from a name knowledge base; extracting ambiguous name corpus from a video knowledge base based on the ambiguous name list, and extracting ambiguous name features of the ambiguous name corpus; extracting a credible name corpus from a video knowledge base based on a credible name list, and extracting credible name features of the credible name corpus; and training a preset machine learning classification model by using the ambiguous name features and the credible name features to obtain a name recognition model.

In some other embodiments, the model generating unit 201 for extracting the ambiguous name list and the trusted name list from the name knowledge base is specifically configured to:

acquiring ambiguous names from a name knowledge base, and generating an ambiguous name list containing the ambiguous names; acquiring non-ambiguous names except ambiguous names from a name knowledge base; calling a search log, and selecting a credible name from the non-ambiguous names by using the search log; and generating a trusted person name list containing the trusted person names.

The name recognition device provided by the embodiment of the invention automatically recognizes the person names contained in the name corpus to be recognized by using the name recognition model. Because the name knowledge base covers names comprehensively, a model trained on the name knowledge base and the video knowledge base identifies ambiguous names with reasonable accuracy while still recognizing credible names, so the overall effect of name recognition is improved.

The method and the device for identifying the name of the person provided by the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

It is further noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ……" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A person name recognition method is characterized by comprising the following steps:

receiving a name corpus to be recognized;

calling a name recognition model, wherein the name recognition model is obtained by utilizing a name knowledge base and a video knowledge base to train a preset machine learning classification model in advance, specifically, an ambiguous name list and a credible name list are respectively extracted from the name knowledge base, an ambiguous name corpus is extracted from the video knowledge base by utilizing the ambiguous name list, ambiguous name features of the ambiguous name corpus are extracted, credible name corpora are extracted from the video knowledge base by utilizing the credible name list, credible name features of the credible name corpus are extracted, and the preset machine learning classification model is trained by utilizing the ambiguous name features of the ambiguous name corpus and the credible name features of the credible name corpus to obtain the name recognition model;

recognizing the names contained in the person name corpus to be recognized by using the name recognition model;

wherein the extracting of the ambiguous name features of the ambiguous name corpus comprises: performing feature extraction on the ambiguous name corpus to obtain a first feature list, wherein the first feature list comprises a plurality of features; performing word segmentation on the ambiguous name corpus to obtain a plurality of first corpus phrases; and respectively adding labels to the features in the first feature list that are the same as the first corpus phrases and to the ambiguous names of the ambiguous name corpus, and converting the labeled features and the labeled ambiguous names into feature sequences;

wherein the extracting of the credible name features of the credible name corpus comprises: performing feature extraction on the credible name corpus to obtain a second feature list, wherein the second feature list comprises a plurality of features; performing word segmentation on the credible name corpus to obtain a plurality of second corpus phrases; and respectively adding labels to the features in the second feature list that are the same as the second corpus phrases and to the credible names of the credible name corpus, and converting the labeled features and the labeled credible names into feature sequences.
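
By way of non-limiting illustration only, the labeling step recited in claim 1 may be sketched as follows. The sketch assumes that word segmentation has already produced a list of tokens (whitespace splitting stands in for a real word segmenter), and the function name `to_feature_sequence` and the label names `NAME`, `FEAT`, and `O` are hypothetical choices that are not specified by the claims.

```python
def to_feature_sequence(tokens, feature_list, names):
    """Convert segmented tokens into a (token, label) feature sequence.

    Labels are hypothetical: NAME for a token found in the name list,
    FEAT for a token that matches an entry of the feature list,
    and O for every other token.
    """
    features = set(feature_list)
    name_set = set(names)
    sequence = []
    for token in tokens:
        if token in name_set:
            label = "NAME"
        elif token in features:
            label = "FEAT"
        else:
            label = "O"
        sequence.append((token, label))
    return sequence


# Whitespace splitting is an assumption standing in for word segmentation.
tokens = "dawn stars in the new film".split()
seq = to_feature_sequence(tokens, ["stars", "film"], ["dawn"])
```

The same routine could be applied to both the ambiguous name corpus and the credible name corpus, with the first or second feature list passed as `feature_list`.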

2. The method of claim 1, wherein the extracting of the ambiguous name list and the credible name list from the name knowledge base, respectively, comprises: acquiring ambiguous names from the name knowledge base, and generating an ambiguous name list containing the ambiguous names; acquiring non-ambiguous names other than the ambiguous names from the name knowledge base; calling a search log, and selecting credible names from the non-ambiguous names by using the search log; and generating a credible name list containing the credible names.
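
By way of non-limiting illustration only, the list extraction recited in claim 2 may be sketched as follows. The claim does not state how a name is judged ambiguous or how the search log selects credible names, so the sketch assumes that each knowledge base entry carries a hypothetical `ambiguous` flag and that a non-ambiguous name is credible when it appears in the search log at least `min_count` times. The function name `split_name_lists` and the threshold are likewise assumptions.

```python
from collections import Counter


def split_name_lists(knowledge_base, search_log, min_count=2):
    """Return (ambiguous_names, credible_names) from a name knowledge base.

    knowledge_base: list of dicts with hypothetical keys "name" and "ambiguous".
    search_log: list of searched query strings (assumed format).
    """
    ambiguous = [e["name"] for e in knowledge_base if e["ambiguous"]]
    non_ambiguous = [e["name"] for e in knowledge_base if not e["ambiguous"]]
    # Assumed credibility rule: the name was searched at least min_count times.
    counts = Counter(search_log)
    credible = [n for n in non_ambiguous if counts[n] >= min_count]
    return ambiguous, credible


kb = [
    {"name": "dawn", "ambiguous": True},
    {"name": "Alice Zhang", "ambiguous": False},
    {"name": "Bob Li", "ambiguous": False},
]
log = ["Alice Zhang", "dawn", "Alice Zhang", "Bob Li"]
ambiguous_names, credible_names = split_name_lists(kb, log)
```

In this example data, only the non-ambiguous name that meets the assumed frequency threshold is kept as credible.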

3. The method of claim 1, wherein the extracting of the ambiguous name corpus from the video knowledge base by using the ambiguous name list comprises:

if yes, determining the video text as an ambiguous name text;

4. The method of claim 3, wherein the determining of the video text as an ambiguous name text further comprises:

5. A person name recognition apparatus, comprising: a corpus receiving module, a model calling module, and a name recognition module, wherein the model calling module comprises a model generation unit;

the model generation unit is used for training a preset machine learning classification model by utilizing a name knowledge base and a video knowledge base in advance to obtain a name recognition model, specifically, respectively extracting an ambiguous name list and a credible name list from the name knowledge base, extracting ambiguous name corpora from the video knowledge base by utilizing the ambiguous name list, extracting ambiguous name features of the ambiguous name corpora, extracting credible name corpora from the video knowledge base by utilizing the credible name list, extracting credible name features of the credible name corpora, and training the preset machine learning classification model by utilizing the ambiguous name features of the ambiguous name corpora and the credible name features of the credible name corpora to obtain the name recognition model;

the model calling module is used for calling the name recognition model;

the name recognition module is used for recognizing names contained in the name corpus to be recognized by using the name recognition model;

wherein the model generation unit extracting the ambiguous name features of the ambiguous name corpus comprises: the model generation unit performing feature extraction on the ambiguous name corpus to obtain a first feature list, wherein the first feature list comprises a plurality of features; performing word segmentation on the ambiguous name corpus to obtain a plurality of first corpus phrases; and respectively adding labels to the features in the first feature list that are the same as the first corpus phrases and to the ambiguous names of the ambiguous name corpus, and converting the labeled features and the labeled ambiguous names into feature sequences;

wherein the model generation unit extracting the credible name features of the credible name corpus comprises: the model generation unit performing feature extraction on the credible name corpus to obtain a second feature list, wherein the second feature list comprises a plurality of features; performing word segmentation on the credible name corpus to obtain a plurality of second corpus phrases; and respectively adding labels to the features in the second feature list that are the same as the second corpus phrases and to the credible names of the credible name corpus, and converting the labeled features and the labeled credible names into feature sequences.

6. The apparatus according to claim 5, wherein the model generation unit, when extracting the ambiguous name list and the credible name list from the name knowledge base respectively, is specifically configured to: