CN112487816A - Named entity identification method based on network classification - Google Patents
- Publication number
- CN112487816A (application number CN202011472395.7A)
- Authority
- CN
- China
- Prior art keywords
- named entity
- individual
- sample
- classification
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The invention discloses a named entity identification method based on network classification, which comprises the following steps. Model training: step 1: inputting named entity training sample text data and converting it into vector data; step 2: preprocessing the named entity training sample data; step 3: constructing a classification network by iteratively selecting partial samples to train the named entity recognition model. Named entity recognition: step 4: inputting sample data of the named entity to be identified; step 5: preprocessing the sample data of the named entity to be identified; step 6: identifying the sample data of the named entity to be identified through the named entity classification model, and judging the category of the named entity to which the sample data belongs. The method can quickly and effectively extract the key attributes of named entities from massive texts and identify their categories, improves the efficiency of named entity identification, and provides a basis for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
Description
Technical Field
The invention relates to the field of natural language processing technology and named entity identification, in particular to a named entity identification method based on network classification.
Background
Named Entity Recognition (NER), also called "proper name recognition", refers to recognizing entities with specific meaning in text, mainly including names of people, places, organizations, proper nouns, etc. It generally comprises two parts: (1) identifying entity boundaries; (2) determining entity categories (person name, place name, organization name, or other). NER is a fundamental key task in NLP. Within the natural language processing pipeline, NER can be regarded as part of unknown-word recognition in lexical analysis; among unknown words, named entities are the most numerous, the hardest to recognize, and have the greatest influence on word-segmentation quality. NER is also the basis of many NLP tasks such as relation extraction, event extraction, knowledge graphs, machine translation and question-answering systems.
Named entity recognition is the focus of information extraction tasks and is urgently needed in actual production, but named entities are unlimited in number, flexible in word formation and fuzzy in category, which makes them difficult to recognize. Traditional classification algorithms only consider the physical characteristics of the data (such as similarity, distance and distribution) and ignore its semantic characteristics (such as the contextual semantic information that may be present in text).
Traditional classification learning methods, such as SVM and some other network-based classification algorithms, require the use of all training data in practical implementations, and the noise present in the enormous amount of data can reduce the efficiency of named entity recognition.
Disclosure of Invention
The invention provides a named entity identification method based on network classification to overcome the defects of the prior art, so that a classification network can be constructed from a selected subset of named entity recognition samples and used to identify the named entity samples to be detected, thereby improving the recognition efficiency of named entities and providing technical support for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention relates to a named entity recognition method based on network classification, which is characterized by comprising the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: obtaining text data of T named entity samples, and converting the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) using the Word2Vec natural language processing tool, (x_t, y_t) representing the vector data of the t-th named entity sample, where x_t = (x_t^1, …, x_t^d, …, x_t^D) represents the attribute features of the t-th named entity sample, x_t^d representing the d-th named-entity attribute feature of the t-th sample; y_t represents the label of the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardizing the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, x̄_t^d representing the d-th named-entity feature of the t-th named entity sample;
step 1.3: respectively constructing two objective functions f_1 and f_2 using formula (1) and formula (2):

min f_1 = Rr(V_s)   (1)

min f_2 = −Acc(Net(V_s))   (2)

In formula (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) represents the ratio of the selected vector data V_s to the T vector data Ψ; in formula (2), Net(V_s) is the classification network constructed using the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the classification network Net(V_s);
step 1.4: taking S candidate sets of named-entity-sample vector data as the initial population P = {p_1, …, p_S}, where p_s represents the s-th candidate vector-data set and is treated as an individual;
encoding each individual in the initial population P as a binary code of length T; if the t-th bit of the binary code of individual p_s is 1, the attribute feature x_t of the t-th named entity sample is selected and used to construct the classification network Net(p_s);
Step 1.5: defining the current iteration number as n and the maximum iteration number as N; initializing n = 1; taking the initial population P as the parent population P_n of the n-th iteration;
Step 1.6: randomly selecting two individuals p_x and p_y from the parent population P_n of the n-th iteration by binary tournament, and constructing their classification networks Net(p_x) and Net(p_y) respectively; if the accuracy of Net(p_x) is higher than that of Net(p_y), acquiring from the parent population P_n all individuals whose classification networks are more accurate than Net(p_y) and randomly selecting one individual p_z from them; performing crossover and mutation on p_y and p_z to obtain the mutated individuals p′_y and p′_z; selecting from p_y, p′_y and p′_z the individual whose classification network has the highest accuracy to replace p_y; finally performing crossover and mutation on the replaced p_y and p_x to generate the offspring P′_n of the n-th iteration;
Step 1.7: merging the parent population P_n of the n-th iteration with the offspring P′_n of the n-th iteration to obtain the merged population of the n-th iteration, and obtaining the importance IMP(p_n) of any individual p_n in the merged population of the n-th iteration using formula (3):

IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n))   (3)
In formula (3), α is a trade-off factor, Acc(p_n) is the accuracy of individual p_n, and Red(p_n) is the redundancy of individual p_n, with:

Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m   (4)

In formula (4), m is the number of individuals in the merged population of the n-th iteration other than individual p_n; a_i is the redundancy in source space between individual p_n and the i-th of the other individuals in the merged population, obtained by dividing the number of named entity samples selected by both individual p_n and the i-th individual by T, i ∈ {1, …, m}; b_i is the redundancy between individual p_n and the i-th individual in the accuracy objective space, obtained by formula (5):

b_i = 1 − |Acc(i) − Acc(p_n)|   (5)
In formula (5), Acc(i) represents the accuracy of the classification network constructed by the i-th individual, and Acc(p_n) represents the accuracy of the classification network constructed by individual p_n;
step 1.8: obtaining the importance of every individual p_n in the merged population of the n-th iteration according to formula (3), and selecting the S most important individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition:
step 2.1: inputting text data of a named entity sample to be identified, processing according to the step 1.1 and the step 1.2, and obtaining a feature vector of the sample to be detected;
step 2.3: and classifying the characteristic vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent named entities corresponding to the samples to be detected.
The named entity recognition method based on network classification is further characterized in that the classification network Net(V_s) in formula (6) is constructed as a k-associative optimal graph using the Euclidean distance, as follows:
for the feature vectors, the Euclidean distance d_ti between the named-entity feature vector x̄_t of the t-th named entity sample and the named-entity feature vector x̄_i of the i-th named entity sample is obtained using formula (6), and each named entity is connected to the k nearest named entities of the same category, thereby forming the classification network:

d_ti = sqrt( (x̄_t^1 − x̄_i^1)² + … + (x̄_t^d − x̄_i^d)² + … + (x̄_t^D − x̄_i^D)² )   (6)

In formula (6), x̄_t^d represents the d-th named-entity feature of the t-th named entity sample.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention is different from the traditional classification method, provides a named entity identification method based on network classification, comprehensively considers the physical and semantic characteristics of sample data of a named entity, constructs a classification network by screening and training the sample data of the named entity, and eliminates noise points, thereby being capable of identifying the named entity more efficiently.
2. The present invention defines a two-objective optimization problem: the number of samples in the selected named entity recognition sample set, and the classification accuracy of the network constructed from that set. By optimizing these two objectives, high-quality named entity sample data are selected and a classification network with a better classification effect is constructed, thereby improving the performance and accuracy of named entity recognition.
3. In the iteration process, a solution generating strategy based on precision preference is adopted, and precision guidance is carried out on a low-precision named entity identification sample set to obtain more excellent filial generation, so that the quality of the to-be-constructed classification network is effectively improved, and the classifier finally used for named entity identification has better classification effect and higher identification accuracy.
4. In the process of selecting the next generation named entity identification sample set, the importance-based solution selection strategy is adopted, and the method can enter the next generation through more excellent importance sorting selection of all named entity identification sample sets, so that continuous optimization in the iteration process is ensured, and the classifier finally used for named entity identification has better classification effect and more excellent performance.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In this embodiment, a method for identifying a named entity based on network classification includes a step of training a named entity classification model and a step of identifying the named entity, and specifically, as shown in fig. 1, the method includes the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: taking named entity recognition of person names as an example, obtain text data of T named entity samples and convert the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) using the Word2Vec natural language processing tool, where (x_t, y_t) denotes the vector data of the t-th named entity sample; x_t = (x_t^1, …, x_t^d, …, x_t^D) represents the attribute features of the t-th named entity sample, i.e., the attribute features describing the t-th person name, common ones being birth date, native place, height, weight, nickname, main contribution and the like; y_t is the label of the t-th named entity sample, i.e., the mark that the named entity belongs to a certain category, here a person name. The named entity recognition problem is thus converted into a multi-class classification problem in which the label y_t describes the person name in the t-th named entity sample, t = 1, 2, …, T;
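For illustration, the vectorization of step 1.1 can be sketched as follows. In practice the vectors would come from a trained Word2Vec model (e.g. via the gensim library); the hash-based `text_to_vector` below is a hypothetical, deterministic stand-in so the sketch stays self-contained.

```python
import hashlib

def text_to_vector(text, dim=8):
    """Stand-in for Word2Vec: average deterministic hash-based word vectors,
    so each sample text maps to a fixed-length numeric vector."""
    words = text.lower().split()
    vec = [0.0] * dim
    for w in words:
        h = hashlib.md5(w.encode("utf-8")).digest()
        for d in range(dim):
            vec[d] += (h[d] - 127.5) / 127.5  # pseudo-component in [-1, 1]
    n = max(len(words), 1)
    return [v / n for v in vec]

# Build the sample set Psi = ((x_1, y_1), ..., (x_T, y_T))
samples = [
    ("born 1940 in Shaoshan height 180", "person_A"),
    ("born 1955 nickname Flying Man main contribution sprinting", "person_B"),
]
psi = [(text_to_vector(text), label) for text, label in samples]
```

With a real Word2Vec model, `text_to_vector` would instead average the model's learned word embeddings over the sample's tokens.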
step 1.2: standardize the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, where x̄_t^d denotes the d-th named-entity feature of the t-th named entity sample;
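The patent states that the features are standardized but does not fix a formula; a common choice, assumed in this sketch, is min-max scaling of each attribute dimension to [0, 1] over all T samples:

```python
def normalize_features(x_all):
    """Min-max standardization per attribute dimension d: each x_t^d is
    rescaled to [0, 1] using the min and max over all T samples."""
    dims = len(x_all[0])
    mins = [min(x[d] for x in x_all) for d in range(dims)]
    maxs = [max(x[d] for x in x_all) for d in range(dims)]
    out = []
    for x in x_all:
        out.append([
            (x[d] - mins[d]) / (maxs[d] - mins[d]) if maxs[d] > mins[d] else 0.0
            for d in range(dims)
        ])
    return out

# two attribute dimensions, e.g. height and weight
features = [[180.0, 75.0], [160.0, 55.0], [170.0, 65.0]]
norm = normalize_features(features)
```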
step 1.3: construct two objective functions f_1 and f_2 using formula (1) and formula (2) respectively; both are to be minimized:

min f_1 = Rr(V_s)   (1)

min f_2 = −Acc(Net(V_s))   (2)

In formula (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) represents the ratio of the selected vector data V_s to the T vector data Ψ; in formula (2), Net(V_s) is the classification network constructed using the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the classification network Net(V_s);
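The two objectives of step 1.3 can be evaluated for one candidate subset as follows; `accuracy_fn` is a hypothetical stand-in for building the classification network from the selected samples and measuring Acc(Net(V_s)):

```python
def objectives(bits, accuracy_fn):
    """Evaluate the two minimization objectives for a binary individual.
    f1 = Rr(V_s): ratio of selected samples to all T samples.
    f2 = -Acc(Net(V_s)): negated network accuracy, so that both
    objectives are minimized together."""
    T = len(bits)
    selected = [t for t in range(T) if bits[t] == 1]
    f1 = len(selected) / T
    f2 = -accuracy_fn(selected)
    return f1, f2

# toy accuracy: pretend accuracy grows with subset size, capped at 1.0
acc = lambda sel: min(1.0, 0.2 * len(sel))
f1, f2 = objectives([1, 0, 1, 1, 0], acc)
```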
step 1.4: take S candidate sets of named-entity-sample vector data as the initial population P = {p_1, …, p_S}, where p_s denotes the s-th candidate vector-data set and is treated as an individual;
encode each individual in the initial population P as a binary code of length T; if the t-th bit of the binary code of individual p_s is 1, the attribute feature x_t of the t-th named entity sample is selected and used to construct the classification network Net(p_s). For example, assume a total of 10 named entity samples; if bits 3, 5, 8 and 9 of p_s are 1, then the named entity recognition sample set selected by p_s is (x_3, x_5, x_8, x_9);
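The binary encoding of step 1.4 can be sketched directly, reproducing the 10-sample example above:

```python
def decode_individual(bits, samples):
    """Decode a length-T binary code: bit t (1-based) selects sample x_t
    for constructing the classification network, as in step 1.4."""
    return [samples[t] for t in range(len(bits)) if bits[t] == 1]

# 10 samples; bits 3, 5, 8, 9 (1-based) set to 1 -> (x3, x5, x8, x9)
samples = [f"x{t}" for t in range(1, 11)]
bits = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
subset = decode_individual(bits, samples)
```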
Step 1.5: define the current iteration number as n and the maximum iteration number as N; initialize n = 1; take the initial population P as the parent population P_n of the n-th iteration;
Step 1.6: randomly select two individuals p_x and p_y from the parent population P_n of the n-th iteration by binary tournament, and construct their classification networks Net(p_x) and Net(p_y) respectively, the constructed networks being used for classification; if the accuracy of Net(p_x) is higher than that of Net(p_y), acquire from the parent population P_n all individuals whose classification networks are more accurate than Net(p_y) and randomly select one individual p_z from them; perform crossover and mutation on p_y and p_z to obtain the mutated individuals p′_y and p′_z; from p_y, p′_y and p′_z select the individual whose classification network has the highest accuracy to replace p_y, so that the poorer of the two tournament individuals is guided toward a superior individual; finally perform crossover and mutation on the replaced p_y and p_x to generate the offspring P′_n of the n-th iteration;
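The accuracy-guided mating step can be sketched as follows. The uniform crossover, bit-flip mutation and toy accuracy function are illustrative assumptions; the patent does not specify the exact operators:

```python
import random

def crossover_mutate(pa, pb, rng, pm=0.05):
    """Uniform crossover of two binary parents followed by bit-flip mutation."""
    child = [rng.choice((a, b)) for a, b in zip(pa, pb)]
    return [b ^ 1 if rng.random() < pm else b for b in child]

def guided_offspring(population, acc_fn, rng):
    """One accuracy-guided mating step (step 1.6, sketched): binary
    tournament picks p_x and p_y; the weaker p_y is crossed with a
    randomly chosen higher-accuracy individual p_z, and the best of
    {p_y, p'_y, p'_z} replaces p_y before mating with p_x."""
    px, py = rng.sample(population, 2)
    if acc_fn(px) < acc_fn(py):
        px, py = py, px                      # ensure px is the stronger parent
    better = [p for p in population if acc_fn(p) > acc_fn(py)]
    pz = rng.choice(better) if better else px
    py_m = crossover_mutate(py, pz, rng)
    pz_m = crossover_mutate(pz, py, rng)
    py = max((py, py_m, pz_m), key=acc_fn)   # keep the most accurate variant
    return crossover_mutate(px, py, rng)

rng = random.Random(0)
pop = [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0]]
acc = lambda p: sum(p) / len(p)              # toy accuracy: fraction of bits set
child = guided_offspring(pop, acc, rng)
```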
Step 1.7: merge the parent population P_n of the n-th iteration with the offspring P′_n of the n-th iteration to obtain the merged population of the n-th iteration, and obtain the importance IMP(p_n) of any individual p_n in the merged population of the n-th iteration using formula (3):

IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n))   (3)
In formula (3), α is a trade-off factor, usually 0.8; Acc(p_n) is the accuracy of individual p_n and Red(p_n) is the redundancy of individual p_n. The importance obtained by combining accuracy and redundancy gives a more balanced evaluation of each individual, with:

Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m   (4)

In formula (4), m is the number of individuals in the merged population of the n-th iteration other than individual p_n; a_i is the redundancy in source space between individual p_n and the i-th of the other individuals in the merged population, obtained by dividing the number of named entity samples selected by both individual p_n and the i-th individual by T, i ∈ {1, …, m}; the larger a_i is, the higher the redundancy between individual p_n and individual i in source space. b_i is the redundancy between individual p_n and the i-th individual in the accuracy objective space; combining the redundancy of the source space with that of the accuracy objective space makes the redundancy analysis of each individual clear and reasonable and strengthens the subsequent importance judgment. b_i is obtained by formula (5):

b_i = 1 − |Acc(i) − Acc(p_n)|   (5)

In formula (5), Acc(i) represents the accuracy of the classification network constructed by the i-th individual and Acc(p_n) represents the accuracy of the classification network constructed by individual p_n; the larger b_i is, the higher the redundancy between individual p_n and individual i in the accuracy objective space;
step 1.8: obtain the importance of every individual p_n in the merged population of the n-th iteration according to formula (3), and select the S most important individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition, i.e., classifying the sample to be detected using the optimal network classifier obtained in step one:
step 2.1: input the text data of the named entity sample to be identified and process it according to step 1.1 and step 1.2 to obtain the feature vector of the sample to be detected; common features include birth date, native place, height, weight, nickname, main contribution and the like;
step 2.3: and classifying the characteristic vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent named entities corresponding to the samples to be detected.
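The patent does not spell out how the classification network assigns a label to a new sample; a common choice, sketched here as an assumption, is majority vote among the k Euclidean-nearest samples of the constructed network:

```python
import math
from collections import Counter

def classify(query, network_samples, k=3):
    """Classify a normalized feature vector by majority vote among its k
    Euclidean-nearest samples in the classification network (a sketch of
    step 2.3)."""
    dists = sorted((math.dist(query, x), y) for x, y in network_samples)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

net = [([0.1, 0.1], "A"), ([0.2, 0.0], "A"), ([0.9, 0.9], "B"), ([0.8, 1.0], "B")]
label = classify([0.15, 0.05], net, k=3)
```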
In this embodiment, the classification network Net(V_s) in formula (6) is constructed as a k-associative optimal graph using the Euclidean distance, as follows:
for the feature vectors, the Euclidean distance d_ti between the named-entity feature vector x̄_t of the t-th named entity sample and the named-entity feature vector x̄_i of the i-th named entity sample is obtained using formula (6), and each named entity is connected to the k nearest named entities of the same category, thereby forming the classification network:

d_ti = sqrt( (x̄_t^1 − x̄_i^1)² + … + (x̄_t^d − x̄_i^d)² + … + (x̄_t^D − x̄_i^D)² )   (6)

In formula (6), x̄_t^d represents the d-th named-entity feature of the t-th named entity sample.
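The k-associative graph construction above can be sketched as an adjacency list: each sample is linked to its k Euclidean-nearest neighbours of the same category:

```python
import math

def build_knn_network(samples, k=2):
    """Build the classification network of formula (6): connect each sample t
    to the k nearest samples (Euclidean distance) sharing its category,
    yielding the adjacency list of the k-associative optimal graph."""
    edges = {}
    for t, (xt, yt) in enumerate(samples):
        same = [
            (math.dist(xt, xi), i)
            for i, (xi, yi) in enumerate(samples)
            if i != t and yi == yt
        ]
        same.sort()
        edges[t] = [i for _, i in same[:k]]
    return edges

pts = [([0.0, 0.0], "A"), ([0.1, 0.0], "A"), ([1.0, 1.0], "A"),
       ([5.0, 5.0], "B"), ([5.1, 5.0], "B")]
net = build_knn_network(pts, k=1)
```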
The method was tested and verified on objectively collected data.
1) Acquire text data of named entity samples related to person names, i.e. sentences or paragraphs in documents that involve person names; convert the real-world text data into vector data that a computer can process using the Word2Vec tool; divide the processed data set into training samples and test samples; select the optimal training samples through ten-fold cross-validation to construct the classification network; and perform named entity recognition on the test samples.
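The ten-fold cross-validation split used in the verification step can be sketched as follows; the shuffling seed and fold layout are illustrative choices:

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Split n sample indices into k disjoint folds and return the
    (train, test) index pair for each fold, as used to pick the best
    training subset via cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        (sorted(set(idx) - set(test)), sorted(test))  # (train, test) per fold
        for test in folds
    ]

splits = kfold_indices(20, k=10)
```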
2) Evaluation index:
The classification accuracy is used as the evaluation index of this example to assess the performance of named entity recognition. Higher accuracy represents a better classification effect and higher recognition accuracy.
3) Experiments on the data set:
The effectiveness of the invention was verified by experimental results on a data set. In today's highly diversified information environment, accurately and efficiently identifying named entities from text, and analyzing those entities, is particularly important. Experiments show that the method can quickly and effectively extract the key attributes of named entities from massive texts and identify their categories, improves the efficiency of named entity identification, and provides a basis for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
Claims (2)
1. A named entity recognition method based on network classification is characterized by comprising the following steps:
the method comprises the following steps: training a named entity classification model:
step 1.1: obtaining text data of T named entity samples, and converting the text data into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) using the Word2Vec natural language processing tool, (x_t, y_t) representing the vector data of the t-th named entity sample, where x_t = (x_t^1, …, x_t^d, …, x_t^D) represents the attribute features of the t-th named entity sample, x_t^d representing the d-th named-entity attribute feature of the t-th sample; y_t represents the label of the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardizing the attribute features x_t of the t-th named entity sample to obtain the feature vector x̄_t = (x̄_t^1, …, x̄_t^d, …, x̄_t^D) of the t-th named entity sample, x̄_t^d representing the d-th named-entity feature of the t-th named entity sample;
step 1.3: respectively constructing two objective functions f_1 and f_2 using formula (1) and formula (2):

min f_1 = Rr(V_s)   (1)

min f_2 = −Acc(Net(V_s))   (2)

In formula (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) represents the ratio of the selected vector data V_s to the T vector data Ψ; in formula (2), Net(V_s) is the classification network constructed using the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the classification network Net(V_s);
step 1.4: taking S candidate sets of named-entity-sample vector data as the initial population P = {p_1, …, p_S}, where p_s represents the s-th candidate vector-data set and is treated as an individual;
encoding each individual in the initial population P as a binary code of length T; if the t-th bit of the binary code of individual p_s is 1, the attribute feature x_t of the t-th named entity sample is selected and used to construct the classification network Net(p_s);
Step 1.5: defining the current iteration number as n and the maximum iteration number as N; initializing n = 1; taking the initial population P as the parent population P_n of the n-th iteration;
Step 1.6: randomly selecting two individuals p_x and p_y from the parent population P_n of the n-th iteration by binary tournament, and constructing their classification networks Net(p_x) and Net(p_y) respectively; if the accuracy of Net(p_x) is higher than that of Net(p_y), acquiring from the parent population P_n all individuals whose classification networks are more accurate than Net(p_y) and randomly selecting one individual p_z from them; performing crossover and mutation on p_y and p_z to obtain the mutated individuals p′_y and p′_z; selecting from p_y, p′_y and p′_z the individual whose classification network has the highest accuracy to replace p_y; finally performing crossover and mutation on the replaced p_y and p_x to generate the offspring P′_n of the n-th iteration;
Step 1.7: merging the parent population P_n of the n-th iteration with the offspring P′_n of the n-th iteration to obtain the merged population of the n-th iteration, and obtaining the importance IMP(p_n) of any individual p_n in the merged population of the n-th iteration using formula (3):

IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n))   (3)
In formula (3), α is a trade-off factor, Acc(p_n) is the accuracy of individual p_n, and Red(p_n) is the redundancy of individual p_n, with:

Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m   (4)

In formula (4), m is the number of individuals in the merged population of the n-th iteration other than individual p_n; a_i is the redundancy in source space between individual p_n and the i-th of the other individuals in the merged population, obtained by dividing the number of named entity samples selected by both individual p_n and the i-th individual by T, i ∈ {1, …, m}; b_i is the redundancy between individual p_n and the i-th individual in the accuracy objective space, obtained by formula (5):

b_i = 1 − |Acc(i) − Acc(p_n)|   (5)
In formula (5), Acc(i) represents the accuracy of the classification network constructed by the i-th individual, and Acc(p_n) represents the accuracy of the classification network constructed by individual p_n;
step 1.8: obtaining the importance of every individual p_n in the merged population of the n-th iteration according to formula (3), and selecting the S most important individuals as the parent population P_{n+1} of the (n+1)-th iteration;
Step 1.9: assigning N +1 to N, judging whether N is greater than N, if so, selecting vector data of a named entity sample corresponding to an individual with the highest classification network precision in the parent population of the Nth iteration and using the vector data to construct an optimal network classifier, and executing the step two, otherwise, returning to the step 1.6 to execute;
step two: named entity recognition:
step 2.1: inputting text data of a named entity sample to be identified, processing according to the step 1.1 and the step 1.2, and obtaining a feature vector of the sample to be detected;
step 2.3: and classifying the characteristic vectors of the samples to be detected by using the optimal network classifier, wherein the obtained labels represent named entities corresponding to the samples to be detected.
2. The named entity recognition based on network classification as claimed in claim 1Method, characterized in that the classification network in said formula (6)The method is a construction mode of a k-associative optimal graph adopting Euclidean distance, and comprises the following steps:
For the feature vectors, the Euclidean distance d_ti between the d feature vectors related to the named entity in the tth named entity sample and the d feature vectors related to the named entity in the ith named entity sample is obtained using formula (6), and the k nearest named entities of the same category are selected to establish network connections, thereby forming the classification network:

d_ti = sqrt( Σ_{j=1}^{d} (x_tj − x_ij)² )   (6)
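The k-associative optimal graph construction described above can be sketched as follows; the function name and data layout are assumptions:

```python
import numpy as np

def build_class_network(X, y, k):
    """For each sample t, compute Euclidean distances d_ti to all
    samples of the same category and link t to its k nearest same-class
    neighbours, yielding the classification network as an edge set."""
    n = len(X)
    edges = set()
    for t in range(n):
        same = [i for i in range(n) if i != t and y[i] == y[t]]
        d = {i: np.linalg.norm(X[t] - X[i]) for i in same}
        for i in sorted(d, key=d.get)[:k]:
            edges.add((min(t, i), max(t, i)))  # store undirected edges once
    return edges

# Toy data: three class-0 points near the origin, two class-1 points far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.0]])
y = [0, 0, 1, 1, 0]
print(sorted(build_class_network(X, y, k=1)))
```

Linking only same-category samples is what lets the resulting graph act as a classifier: a test vector inherits the label of the component it attaches to.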
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011472395.7A CN112487816B (en) | 2020-12-14 | 2020-12-14 | Named entity identification method based on network classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112487816A true CN112487816A (en) | 2021-03-12 |
CN112487816B CN112487816B (en) | 2024-02-13 |
Family
ID=74916987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011472395.7A Active CN112487816B (en) | 2020-12-14 | 2020-12-14 | Named entity identification method based on network classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112487816B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007137487A1 (en) * | 2006-05-15 | 2007-12-06 | Panasonic Corporation | Method and apparatus for named entity recognition in natural language |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | Network text named entity recognition method based on neural network probability disambiguation |
WO2018072351A1 (en) * | 2016-10-20 | 2018-04-26 | 北京工业大学 | Method for optimizing support vector machine on basis of particle swarm optimization algorithm |
CN109581339A (en) * | 2018-11-16 | 2019-04-05 | 西安理工大学 | Sonar recognition method based on a brainstorming auto-adjusting autoencoder network |
CN110162795A (en) * | 2019-05-30 | 2019-08-23 | 重庆大学 | Adaptive cross-domain named entity recognition method and system |
Non-Patent Citations (1)
Title |
---|
FENG Yanhong; YU Hong; SUN Geng; SUN Juanjuan: "Named Entity Recognition Method Based on BLSTM", Computer Science, no. 02, 16 May 2017 (2017-05-16) * |
Also Published As
Publication number | Publication date |
---|---|
CN112487816B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108399158B (en) | Attribute emotion classification method based on dependency tree and attention mechanism | |
Qu et al. | Question answering over freebase via attentive RNN with similarity matrix based CNN | |
CN109635108B (en) | Man-machine interaction based remote supervision entity relationship extraction method | |
CN107491531A (en) | Chinese network comment sensibility classification method based on integrated study framework | |
CN112883732A (en) | Method and device for identifying Chinese fine-grained named entities based on associative memory network | |
CN111462752B (en) | Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method | |
CN112784013B (en) | Multi-granularity text recommendation method based on context semantics | |
CN110909116B (en) | Entity set expansion method and system for social media | |
CN108563638A | Microblog sentiment analysis method based on topic identification and ensemble learning | |
CN110046356B (en) | Label-embedded microblog text emotion multi-label classification method | |
CN110910175B (en) | Image generation method for travel ticket product | |
CN112417132B (en) | New meaning identification method for screening negative samples by using guest information | |
CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network | |
CN110992988A (en) | Speech emotion recognition method and device based on domain confrontation | |
CN114611491A (en) | Intelligent government affair public opinion analysis research method based on text mining technology | |
CN111159405B (en) | Irony detection method based on background knowledge | |
CN115935998A (en) | Multi-feature financial field named entity identification method | |
CN113222059B (en) | Multi-label emotion classification method using cooperative neural network chain | |
CN114416991A (en) | Method and system for analyzing text emotion reason based on prompt | |
CN117171413B (en) | Data processing system and method for digital collection management | |
CN113535928A (en) | Service discovery method and system of long-term and short-term memory network based on attention mechanism | |
CN110245234A | Multi-source data sample correlation method based on ontology and semantic similarity | |
CN112397201B (en) | Intelligent inquiry system-oriented repeated sentence generation optimization method | |
Wu et al. | Inferring users' emotions for human-mobile voice dialogue applications | |
CN112487816B (en) | Named entity identification method based on network classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||