CN112487816A - Named entity identification method based on network classification - Google Patents


Info

Publication number
CN112487816A
CN112487816A (application CN202011472395.7A; granted publication CN112487816B)
Authority
CN
China
Prior art keywords
named entity
individual
sample
classification
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011472395.7A
Other languages
Chinese (zh)
Other versions
CN112487816B (en)
Inventor
苏延森
张宽宏
程凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202011472395.7A priority Critical patent/CN112487816B/en
Publication of CN112487816A publication Critical patent/CN112487816A/en
Application granted granted Critical
Publication of CN112487816B publication Critical patent/CN112487816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a named entity identification method based on network classification. Training the named entity classification model comprises the following steps: step 1: inputting named entity training sample text data and converting it into vector data; step 2: preprocessing the named entity training sample data; step 3: constructing a classification network and training the named entity recognition model by iteratively selecting partial samples. Named entity recognition comprises: step 4: inputting the sample data of the named entity to be identified; step 5: preprocessing the sample data of the named entity to be identified; step 6: identifying the sample data of the named entity to be identified through the named entity classification model, and judging the category of the named entity to which it belongs. The method can quickly and effectively extract the key attributes of named entities from massive texts and identify the category of each entity, improves the efficiency of named entity identification, and provides a basis for information extraction, question answering systems, syntactic analysis, machine translation and the like.

Description

Named entity identification method based on network classification
Technical Field
The invention relates to the field of natural language processing technology and named entity identification, in particular to a named entity identification method based on network classification.
Background
Named Entity Recognition (NER), also called "proper name recognition", refers to recognizing entities with specific meaning in text, mainly including names of people, places, organizations, proper nouns and the like. It generally comprises two parts: (1) identifying entity boundaries; (2) determining entity categories (person name, place name, organization name, or others). NER is a fundamental key task in NLP. In the natural language processing pipeline, NER can be regarded as part of unknown-word recognition in lexical analysis; among unknown words it is the category with the largest number, the greatest recognition difficulty and the greatest influence on the word segmentation effect. Meanwhile, NER is also the basis of many NLP tasks such as relation extraction, event extraction, knowledge graphs, machine translation and question answering systems.
Named entity recognition is urgently needed by information extraction tasks in actual production, but named entities are unbounded in number, flexible in word formation and fuzzy in category, which makes them difficult to recognize. Traditional classification algorithms only take into account physical characteristics of the data (such as similarity, distance and distribution) and do not take into account semantic characteristics (such as the contextual semantic information that may be present in text).
Traditional classification learning methods, such as SVM and some other network-based classification algorithms, require all of the training data in practical implementations, and the noise present in such a large amount of data reduces the efficiency of named entity recognition.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a named entity identification method based on network classification, so that a classification network can be constructed from a selected part of the named entity recognition samples and used to identify the named entity samples to be detected, thereby improving the recognition efficiency of named entities and providing technical support for information extraction, question answering systems, syntactic analysis, machine translation and the like.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention relates to a named entity recognition method based on network classification, which is characterized by comprising the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: obtaining the text data of T named entity samples, and converting the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) by using the Word2Vec natural language processing tool, where (x_t, y_t) denotes the vector data of the t-th named entity sample, x_t = (x_t^1, x_t^2, …, x_t^d, …, x_t^D) denotes the attribute features of the t-th named entity sample, x_t^d denotes the d-th attribute feature of the t-th named entity sample, and y_t denotes the label of the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardizing the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, x̄_t^2, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, where x̄_t^d denotes the d-th standardized feature of the t-th named entity sample;
step 1.3: respectively constructing two objective functions f_1 and f_2 by using equation (1) and equation (2):
min f_1 = Rr(V_s)   (1)
min f_2 = −Acc(G(V_s))   (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) denotes the ratio of the selected vector data V_s to the T vector data Ψ;
In equation (2), G(V_s) denotes the classification network constructed by using the selected vector data V_s, and Acc(G(V_s)) denotes the classification accuracy of the classification network G(V_s);
step 1.4: taking a set of S candidate vector data subsets of the named entity samples as the initial population P = {p_1, …, p_S}, where p_S denotes the S-th candidate vector data subset of named entity samples, each subset being regarded as an individual; encoding the initial population P with binary codes of length T, so that if the i-th bit of the binary code of an individual p_S is 1, the attribute features x_i of the i-th named entity sample are selected and used to construct the classification network G(p_S);
Step 1.5: defining the current iteration times as N and the maximum iteration times as N; and initializing n-1; taking the initial population P as the parent population P of the nth iterationn
Step 1.6: parent population P iterated from nth through binary championshipsnIn which two individuals p are randomly selectedxAnd pyAnd respectively construct a classification network
Figure BDA0002834423140000028
And
Figure BDA0002834423140000029
if classifying the network
Figure BDA00028344231400000210
Higher accuracy than classification networks
Figure BDA00028344231400000211
The parent population P from the nth iterationnAcquiring higher than classified networks
Figure BDA00028344231400000212
All individuals of precision and randomly selecting an individual p from themz(ii) a For individual pyAnd pzPerforming cross mutation to obtain mutated individual p'yAnd p'z(ii) a From an individual py、p′yAnd'zThe individual with the highest classification network precision is selected to replace the individual py(ii) a Finally by the replaced individual pyWith the individual pxPerforming cross mutation to generate the offspring P of the nth iteration′n
Step 1.7: the parent population P of the nth iterationnAnd the child P of the nth iteration′nMerging to obtain a merged population of the nth iteration, and obtaining any individual p in the merged population of the nth iteration by using a formula (3)nImportance of (i) IMP (p)n):
IMP(pn)=α×Acc(pn)+(1-α)×(-Red(pn)) (3)
In the formula (3), alpha is a compromise factor Acc (p)n) Is an individual pnPrecision of (1), Red (p)n) Is an individual pnAnd has:
Red(pn)=(a1×b1+a2×b2+...+ai×bi+...+am×bm)/m (4)
in the formula (4), m is the nth timeIterative merging of populations except individual pnNumber of individuals other than; a isiIs an individual pnWith the n-th iteration dividing individual p in the combined populationnRedundancy of the i-th individual out of the others in source space, and by the individual pnThe number of samples of the same named entity as the ith individual chosen is divided by T, i ∈ { 1., m }; biIs an individual pnThe redundancy in the precision target space with the ith individual is obtained by equation (5):
Figure BDA0002834423140000031
in the formula (5), Acc (i) represents the accuracy of the classification network constructed by the ith individual, Acc (p)n) Representing an individual pnThe accuracy of the constructed classification network;
step 1.8: obtaining the importance of every individual p_n in the merged population of the n-th iteration according to equation (3), and selecting the top S individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition:
step 2.1: inputting text data of a named entity sample to be identified, processing according to the step 1.1 and the step 1.2, and obtaining a feature vector of the sample to be detected;
step 2.3: classifying the feature vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent the named entities corresponding to the samples to be detected.
The named entity recognition method based on network classification is characterized in that the classification network G(V_s) is constructed as a k-associative optimal graph based on the Euclidean distance of equation (6), as follows:
for the feature vectors, obtaining the Euclidean distance d_ti between the feature vector of the t-th named entity sample and the feature vector of the i-th named entity sample by using equation (6), and selecting the k nearest named entities of the same category to establish network connections, thereby forming the classification network:
d_ti = sqrt( Σ_{d=1}^{D} (x̄_t^d − x̄_i^d)² )   (6)
In equation (6), x̄_t^d denotes the d-th feature of the feature vector of the t-th named entity sample.
Compared with the prior art, the invention has the beneficial effects that:
1. Different from traditional classification methods, the invention provides a named entity identification method based on network classification, which comprehensively considers the physical and semantic characteristics of the named entity sample data, constructs a classification network by screening the training sample data of the named entities, and eliminates noise points, so that named entities can be identified more efficiently.
2. The invention defines a two-objective optimization problem over the number of samples in the selected named entity recognition sample set and the classification accuracy of the network constructed from that set; by optimizing these two objectives, high-quality named entity sample data are selected and a classification network with a better classification effect is constructed, thereby improving the performance and accuracy of named entity recognition.
3. In the iteration process, a solution generation strategy based on accuracy preference is adopted: low-accuracy named entity recognition sample sets are guided by accuracy to obtain better offspring, which effectively improves the quality of the classification network to be constructed, so that the classifier finally used for named entity recognition has a better classification effect and higher recognition accuracy.
4. In the process of selecting the next-generation named entity recognition sample sets, an importance-based solution selection strategy is adopted: all named entity recognition sample sets are ranked by importance and the better ones are selected to enter the next generation, which ensures continuous optimization during the iteration, so that the classifier finally used for named entity recognition has a better classification effect and better performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In this embodiment, a method for identifying a named entity based on network classification includes a step of training a named entity classification model and a step of identifying the named entity, and specifically, as shown in fig. 1, the method includes the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: taking named entity recognition of person names as an example, obtaining the text data of T named entity samples, and converting the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) by using the Word2Vec natural language processing tool, where (x_t, y_t) denotes the vector data of the t-th named entity sample, x_t = (x_t^1, x_t^2, …, x_t^d, …, x_t^D) denotes the attribute features of the t-th named entity sample, and x_t^d denotes the d-th attribute feature of the t-th named entity sample, i.e. an attribute describing the t-th person name, common attributes being date of birth, native place, height, weight, nickname, main contributions and the like; y_t denotes the label of the t-th named entity sample, i.e. the mark that the named entity belongs to a certain category, here a person name; the named entity recognition problem is thus converted into a multi-class classification problem, in which the label y_t describes the person name represented in the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardizing the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, x̄_t^2, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, where x̄_t^d denotes the d-th standardized feature of the t-th named entity sample;
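As an illustration of steps 1.1 and 1.2 (not part of the original disclosure), a minimal sketch of converting sample text into vectors with Word2Vec and then standardizing them is given below; the gensim API, the mean pooling of token vectors and the z-score standardization are assumptions, since the patent does not fix a concrete implementation.

```python
# Sketch of steps 1.1-1.2 (assumed implementation): text -> Word2Vec vectors -> standardized features.
import numpy as np
from gensim.models import Word2Vec

def build_samples(texts, labels, dim=100):
    """texts: list of token lists, one per named entity sample; labels: category of each sample."""
    w2v = Word2Vec(sentences=texts, vector_size=dim, min_count=1, workers=1)
    # x_t: mean of the token vectors of the t-th sample (one plausible pooling choice)
    X = np.array([np.mean([w2v.wv[tok] for tok in toks], axis=0) for toks in texts])
    # Step 1.2: standardize every attribute feature (z-score is an assumed scheme)
    X_bar = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return X_bar, np.asarray(labels)
```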
step 1.3: respectively constructing two objective functions f_1 and f_2 by using equation (1) and equation (2), both of which are to be minimized:
min f_1 = Rr(V_s)   (1)
min f_2 = −Acc(G(V_s))   (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) denotes the ratio of the selected vector data V_s to the T vector data Ψ;
In equation (2), G(V_s) denotes the classification network constructed by using the selected vector data V_s, and Acc(G(V_s)) denotes the classification accuracy of the classification network G(V_s);
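A hedged sketch of the two objectives of step 1.3 follows: f_1 is the fraction of the T training samples that an individual selects, and f_2 is the negative accuracy of the classification network built from that selection. The helper names build_network and evaluate are placeholders for the k-associative graph classifier described further below, and evaluating on the full data is an assumption.

```python
import numpy as np

def f1_ratio(mask):
    """f_1 = Rr(V_s): proportion of the T samples selected by the binary code."""
    return mask.sum() / mask.size

def f2_neg_accuracy(mask, X, y, build_network, evaluate):
    """f_2 = -Acc(G(V_s)): negative accuracy of the network built on the selection.
    build_network / evaluate are placeholders for the k-associative graph classifier."""
    sel = mask.astype(bool)
    net = build_network(X[sel], y[sel])
    return -evaluate(net, X, y)
```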
step 1.4: taking a set of S candidate vector data subsets of the named entity samples as the initial population P = {p_1, …, p_S}, where p_S denotes the S-th candidate vector data subset of named entity samples, each subset being regarded as an individual; encoding the initial population P with binary codes of length T, so that if the i-th bit of the binary code of an individual p_S is 1, the attribute features x_i of the i-th named entity sample are selected and used to construct the classification network G(p_S). For example, assume that there are 10 named entity samples in total and that bits 3, 5, 8 and 9 of p_S are 1; then the named entity recognition sample set selected by p_S is (x_3, x_5, x_8, x_9);
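The binary encoding of step 1.4 can be sketched as follows (illustrative only): each individual is a 0/1 vector of length T whose set bits name the selected training samples; the worked example with bits 3, 5, 8 and 9 reproduces the selection (x_3, x_5, x_8, x_9).

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(S, T):
    """Initial population P = {p_1, ..., p_S}: S random binary codes of length T."""
    return rng.integers(0, 2, size=(S, T))

def selected_samples(individual, X):
    """Decode an individual: the i-th sample is selected when its i-th bit is 1."""
    return X[individual.astype(bool)]

# Worked example from the description: T = 10, bits 3, 5, 8 and 9 set (1-based)
p = np.zeros(10, dtype=int)
p[[2, 4, 7, 8]] = 1           # 0-based positions of samples x_3, x_5, x_8, x_9
print(np.flatnonzero(p) + 1)  # -> [3 5 8 9]
```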
Step 1.5: defining the current iteration times as N and the maximum iteration times as N; and initializing n-1; taking the initial population P as the parent population P of the nth iterationn
Step 1.6: parent population P iterated from nth through binary championshipsnIn which two individuals p are randomly selectedxAnd pyAnd respectively construct a classification network
Figure BDA0002834423140000058
And
Figure BDA0002834423140000059
classifying by using the constructed network; if classifying the network
Figure BDA00028344231400000510
Higher accuracy than classification networks
Figure BDA00028344231400000511
The parent population P from the nth iterationnAcquiring higher than classified networks
Figure BDA00028344231400000512
All individuals of precision and randomly selecting an individual p from themz(ii) a For individual pyAnd pzPerforming cross mutation to obtain mutated individual p'yAnd p'z(ii) a From an individual py、p′yAnd p'zThe individual with the highest classification network precision is selected to replace the individual pyThus, poor ones of the two are guided and excellent guided individuals are obtained; finally by the replaced individual pyWith the individual pxPerforming cross mutation to generate the offspring P of the nth iteration′n
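One possible reading of the accuracy-preference offspring generation of step 1.6 is sketched below; the concrete operators (uniform crossover, bit-flip mutation) and the function acc, which returns the accuracy of the network built from an individual, are assumptions, since the patent only speaks of "cross mutation".

```python
import numpy as np

rng = np.random.default_rng(1)

def crossover_mutate(a, b, pm=0.01):
    """Assumed operators: uniform crossover followed by bit-flip mutation."""
    mask = rng.integers(0, 2, size=a.size).astype(bool)
    child = np.where(mask, a, b)
    flip = rng.random(a.size) < pm
    return np.where(flip, 1 - child, child)

def make_offspring(parents, acc):
    """acc(individual) -> accuracy of the classification network built from it."""
    offspring = []
    for _ in range(len(parents)):
        px, py = parents[rng.choice(len(parents), size=2, replace=False)]
        if acc(px) > acc(py):                       # guide the weaker of the two
            better = [p for p in parents if acc(p) > acc(py)]
            pz = better[rng.integers(len(better))]
            py_m, pz_m = crossover_mutate(py, pz), crossover_mutate(pz, py)
            py = max([py, py_m, pz_m], key=acc)     # keep the best variant of p_y
        offspring.append(crossover_mutate(px, py))
    return np.array(offspring)
```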
Step 1.7: the parent population P of the nth iterationnAnd the child P of the nth iteration′nMerging to obtain a merged population of the nth iteration, and obtaining any individual p in the merged population of the nth iteration by using a formula (3)nImportance of (i) IMP (p)n):
IMP(pn)=α×Acc(pn)+(1-α)×(-Red(pn)) (3)
In formula (3), α is a compromise factor, usually 0.8, Acc (p)n) Is an individual pnPrecision of (1), Red (p)n) Is an individual pnThe importance obtained by integrating the accuracy and the redundancy has a more balanced evaluation on the individuals, and the method comprises the following steps:
Red(pn)=(a1×b1+a2×b2+...+ai×bi+...+am×bm)/m (4)
in the formula (4), m is the dividing individual p in the combined population of the nth iterationnNumber of individuals other than; a isiIs an individual pnWith the n-th iteration dividing individual p in the combined populationnRedundancy of the i-th individual out of the others in source space, and by the individual pnThe number of samples of the same named entity as the ith individual chosen, i ∈ { 1., m }, a, is divided by T to yieldiThe larger the indication of an individual pnThe higher the redundancy in source space with the individual i; biIs an individual pnThe redundancy of the ith individual in the precision target space is combined with the redundancy of the source space and the precision target space, the redundancy analysis of each individual is clear and reasonable, and the judgment effect on the subsequent importance is larger, so thatFormula (5) is obtained:
Figure BDA0002834423140000061
in the formula (5), Acc (i) represents the accuracy of the classification network constructed by the ith individual, Acc (p)n) Representing an individual pnAccuracy of the constructed classification network, biThe larger the indication of an individual pnThe higher the spatial redundancy with the individual i at the precision target;
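The importance measure of equations (3) and (4) can be sketched as follows. The source-space redundancy a_i (shared selected samples divided by T) and the weight α = 0.8 follow the description; the accuracy-space redundancy b_i of equation (5) is rendered only as an assumed closeness term 1 − |Acc(i) − Acc(p_n)|, because the exact formula is given as an image in the original.

```python
import numpy as np

def importance(pn, pn_acc, others, others_acc, T, alpha=0.8):
    """IMP(p_n) = alpha*Acc(p_n) + (1 - alpha)*(-Red(p_n)), equations (3)-(4)."""
    red_terms = []
    for pi, acc_i in zip(others, others_acc):
        a_i = np.logical_and(pn, pi).sum() / T    # shared selected samples / T (source space)
        b_i = 1.0 - abs(acc_i - pn_acc)           # ASSUMED stand-in for equation (5)
        red_terms.append(a_i * b_i)
    red = float(np.mean(red_terms))               # equation (4)
    return alpha * pn_acc + (1 - alpha) * (-red)  # equation (3)
```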
step 1.8: obtaining the importance of every individual p_n in the merged population of the n-th iteration according to equation (3), and selecting the top S individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition, namely classifying the samples to be detected by using the optimal network classifier obtained in step one:
step 2.1: inputting the text data of the named entity samples to be identified, processing it according to step 1.1 and step 1.2, and obtaining the feature vectors of the samples to be detected, where common features include date of birth, native place, height, weight, nickname, main contributions and the like;
step 2.3: classifying the feature vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent the named entities corresponding to the samples to be detected.
2. The named entity recognition method based on network classification according to claim 1, characterized in that the classification network G(V_s) is constructed as a k-associative optimal graph based on the Euclidean distance of equation (6), as follows:
for the feature vectors, obtaining the Euclidean distance d_ti between the feature vector of the t-th named entity sample and the feature vector of the i-th named entity sample by using equation (6), and selecting the k nearest named entities of the same category to establish network connections, thereby forming the classification network:
d_ti = sqrt( Σ_{d=1}^{D} (x̄_t^d − x̄_i^d)² )   (6)
In equation (6), x̄_t^d denotes the d-th feature of the feature vector of the t-th named entity sample.
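A sketch of the k-associative optimal graph of equation (6) follows: each training vector is connected to its k nearest neighbours of the same category under the Euclidean distance d_ti. Representing the network as an adjacency list and the default k = 3 are implementation choices, not something fixed by the patent.

```python
import numpy as np

def build_k_associative_graph(X_bar, y, k=3):
    """Connect each sample to its k nearest same-category samples (equation (6))."""
    n = len(X_bar)
    adjacency = {t: [] for t in range(n)}
    for t in range(n):
        same = [i for i in range(n) if i != t and y[i] == y[t]]
        dists = [np.sqrt(np.sum((X_bar[t] - X_bar[i]) ** 2)) for i in same]  # d_ti
        for i in np.array(same)[np.argsort(dists)[:k]]:
            adjacency[t].append(int(i))
    return adjacency
```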
The method is tested and verified by objectively collected data.
1) Acquiring the text data of named entity samples related to person names, i.e. sentences or paragraphs concerning person names in documents, converting the real-world text data into vector data that a computer can process by using the Word2Vec tool, dividing the processed data set into training samples and test samples, selecting the optimal training samples through ten-fold cross validation to construct the classification network, and carrying out named entity recognition on the test samples.
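The ten-fold cross validation used here to choose the training samples could be organized as below; scikit-learn's KFold is an assumed convenience, and fit and score stand for building the classifier from a training fold and measuring its accuracy on the held-out fold.

```python
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_accuracy(X, y, fit, score):
    """Average accuracy over ten folds; fit builds the classifier, score evaluates it."""
    accs = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        clf = fit(X[train_idx], y[train_idx])
        accs.append(score(clf, X[test_idx], y[test_idx]))
    return float(np.mean(accs))
```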
2) Evaluation index;
The classification accuracy is used as the evaluation index of this example to evaluate the performance of named entity recognition: the higher the accuracy, the better the classification effect and the higher the recognition accuracy.
3) Experiments on the data set;
The effectiveness of the invention is verified by the experimental results on the data set. In today's environment of highly diversified information, accurately and efficiently identifying named entities from text and analyzing them is particularly important. Experiments show that the method can quickly and effectively extract the key attributes of named entities from massive texts and identify the categories of the entities, improves the efficiency of named entity recognition, and provides a basis for information extraction, question answering systems, syntactic analysis, machine translation and the like.

Claims (2)

1. A named entity recognition method based on network classification is characterized by comprising the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: obtaining the text data of T named entity samples, and converting the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) by using the Word2Vec natural language processing tool, where (x_t, y_t) denotes the vector data of the t-th named entity sample, x_t = (x_t^1, x_t^2, …, x_t^d, …, x_t^D) denotes the attribute features of the t-th named entity sample, x_t^d denotes the d-th attribute feature of the t-th named entity sample, and y_t denotes the label of the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardizing the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, x̄_t^2, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, where x̄_t^d denotes the d-th standardized feature of the t-th named entity sample;
step 1.3: respectively constructing two objective functions f_1 and f_2 by using equation (1) and equation (2):
min f_1 = Rr(V_s)   (1)
min f_2 = −Acc(G(V_s))   (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) denotes the ratio of the selected vector data V_s to the T vector data Ψ;
In equation (2), G(V_s) denotes the classification network constructed by using the selected vector data V_s, and Acc(G(V_s)) denotes the classification accuracy of the classification network G(V_s);
step 1.4: taking a set of S candidate vector data subsets of the named entity samples as the initial population P = {p_1, …, p_S}, where p_S denotes the S-th candidate vector data subset of named entity samples, each subset being regarded as an individual; encoding the initial population P with binary codes of length T, so that if the i-th bit of the binary code of an individual p_S is 1, the attribute features x_i of the i-th named entity sample are selected and used to construct the classification network G(p_S);
Step 1.5: defining the current iteration times as N and the maximum iteration times as N; and initializing n-1; taking the initial population P as the parent population P of the nth iterationn
Step 1.6: parent population P iterated from nth through binary championshipsnIn which two individuals p are randomly selectedxAnd pyAnd are constructed separatelyBuilding classification networks
Figure FDA00028344231300000110
And
Figure FDA00028344231300000111
if classifying the network
Figure FDA00028344231300000112
Higher accuracy than classification networks
Figure FDA00028344231300000113
The parent population P from the nth iterationnAcquiring higher than classified networks
Figure FDA0002834423130000021
All individuals of precision and randomly selecting an individual p from themz(ii) a For individual pyAnd pzPerforming cross mutation to obtain mutated individual p'yAnd p'z(ii) a From an individual py、p′yAnd p'zThe individual with the highest classification network precision is selected to replace the individual py(ii) a Finally by the replaced individual pyWith the individual pxPerforming cross-mutation to generate offspring P 'of n iteration'n
Step 1.7: the parent population P of the nth iterationnAnd child P 'of nth iteration'nMerging to obtain a merged population of the nth iteration, and obtaining any individual p in the merged population of the nth iteration by using a formula (3)nImportance of (i) IMP (p)n):
IMP(pn)=α×Acc(pn)+(1-α)×(-Red(pn)) (3)
In the formula (3), alpha is a compromise factor Acc (p)n) Is an individual pnPrecision of (1), Red (p)n) Is an individual pnAnd has:
Red(pn)=(a1×b1+a2×b2+...+ai×bi+...+am×bm)/m (4)
in the formula (4), m is the dividing individual p in the combined population of the nth iterationnNumber of individuals other than; a isiIs an individual pnWith the n-th iteration dividing individual p in the combined populationnRedundancy of the i-th individual out of the others in source space, and by the individual pnThe number of samples of the same named entity as the ith individual chosen is divided by T, i ∈ { 1., m }; biIs an individual pnThe redundancy in the precision target space with the ith individual is obtained by equation (5):
Figure FDA0002834423130000022
in the formula (5), Acc (i) represents the accuracy of the classification network constructed by the ith individual, Acc (p)n) Representing an individual pnThe accuracy of the constructed classification network;
step 1.8: obtaining the importance of every individual p_n in the merged population of the n-th iteration according to equation (3), and selecting the top S individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition:
step 2.1: inputting text data of a named entity sample to be identified, processing according to the step 1.1 and the step 1.2, and obtaining a feature vector of the sample to be detected;
step 2.3: classifying the feature vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent the named entities corresponding to the samples to be detected.
2. The named entity recognition method based on network classification as claimed in claim 1, characterized in that the classification network G(V_s) is constructed as a k-associative optimal graph based on the Euclidean distance of equation (6), as follows:
for the feature vectors, obtaining the Euclidean distance d_ti between the feature vector of the t-th named entity sample and the feature vector of the i-th named entity sample by using equation (6), and selecting the k nearest named entities of the same category to establish network connections, thereby forming the classification network:
d_ti = sqrt( Σ_{d=1}^{D} (x̄_t^d − x̄_i^d)² )   (6)
In equation (6), x̄_t^d denotes the d-th feature of the feature vector of the t-th named entity sample.
CN202011472395.7A 2020-12-14 2020-12-14 Named entity identification method based on network classification Active CN112487816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011472395.7A CN112487816B (en) 2020-12-14 2020-12-14 Named entity identification method based on network classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011472395.7A CN112487816B (en) 2020-12-14 2020-12-14 Named entity identification method based on network classification

Publications (2)

Publication Number Publication Date
CN112487816A true CN112487816A (en) 2021-03-12
CN112487816B CN112487816B (en) 2024-02-13

Family

ID=74916987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011472395.7A Active CN112487816B (en) 2020-12-14 2020-12-14 Named entity identification method based on network classification

Country Status (1)

Country Link
CN (1) CN112487816B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007137487A1 (en) * 2006-05-15 2007-12-06 Panasonic Corporation Method and apparatus for named entity recognition in natural language
CN107203511A (en) * 2017-05-27 2017-09-26 中国矿业大学 A kind of network text name entity recognition method based on neutral net probability disambiguation
WO2018072351A1 (en) * 2016-10-20 2018-04-26 北京工业大学 Method for optimizing support vector machine on basis of particle swarm optimization algorithm
CN109581339A (en) * 2018-11-16 2019-04-05 西安理工大学 A kind of sonar recognition methods based on brainstorming adjust automatically autoencoder network
CN110162795A (en) * 2019-05-30 2019-08-23 重庆大学 A kind of adaptive cross-cutting name entity recognition method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冯艳红; 于红; 孙庚; 孙娟娟: "Named entity recognition method based on BLSTM" (基于BLSTM的命名实体识别方法), Computer Science (计算机科学), No. 02, 16 May 2017 (2017-05-16) *

Also Published As

Publication number Publication date
CN112487816B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
Qu et al. Question answering over freebase via attentive RNN with similarity matrix based CNN
CN109635108B (en) Man-machine interaction based remote supervision entity relationship extraction method
CN107491531A (en) Chinese network comment sensibility classification method based on integrated study framework
CN112883732A (en) Method and device for identifying Chinese fine-grained named entities based on associative memory network
CN111462752B (en) Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN112784013B (en) Multi-granularity text recommendation method based on context semantics
CN110909116B (en) Entity set expansion method and system for social media
CN108563638A (en) A kind of microblog emotional analysis method based on topic identification and integrated study
CN110046356B (en) Label-embedded microblog text emotion multi-label classification method
CN110910175B (en) Image generation method for travel ticket product
CN112417132B (en) New meaning identification method for screening negative samples by using guest information
CN111222318A (en) Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
CN110992988A (en) Speech emotion recognition method and device based on domain confrontation
CN114611491A (en) Intelligent government affair public opinion analysis research method based on text mining technology
CN111159405B (en) Irony detection method based on background knowledge
CN115935998A (en) Multi-feature financial field named entity identification method
CN113222059B (en) Multi-label emotion classification method using cooperative neural network chain
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN117171413B (en) Data processing system and method for digital collection management
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
CN110245234A (en) A kind of multi-source data sample correlating method based on ontology and semantic similarity
CN112397201B (en) Intelligent inquiry system-oriented repeated sentence generation optimization method
Wu et al. Inferring users' emotions for human-mobile voice dialogue applications
CN112487816B (en) Named entity identification method based on network classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant