CN110569510A - Method for identifying named entity of user request data

Info

Publication number
CN110569510A
Authority
CN
China
Prior art keywords
entity
word
named entity
request data
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910877939.9A
Other languages
Chinese (zh)
Inventor
杜忠和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201910877939.9A
Publication of CN110569510A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method for identifying named entities in user request data. Part-of-speech tagging yields a set of segmented words with parts of speech, from which the nominal morphemes are selected as a first entity set. An entity extraction method obtains the person names, place names and organization names in the request data as a second entity set. A dependency analysis method extracts the subject-predicate relations, the verb-object relations and the components semantically related to the core word of the sentence to form a third entity set. Finally, the first, second and third entity sets are cleaned on the basis of semantic analysis to obtain the final named entity recognition result. The method addresses the low accuracy of named entity recognition for request data during interaction between a user and a smart device, and thereby improves the recognition rate of user intent.

Description

Method for identifying named entity of user request data
Technical Field
The invention relates to the technical field of computer natural language processing, in particular to a method for identifying a named entity of user request data.
Background
With the rapid development of computer technology and the information industry, the volume of text information is growing explosively, and extracting useful information from massive text quickly and accurately has become very important. Natural language processing is an important direction in computer science and artificial intelligence. Research on natural language processing and related technologies helps people interact with smart devices more conveniently and effectively, so that their real intentions can be fulfilled. Named entity recognition is an important basic tool for information extraction and machine learning and plays an important role in putting natural language processing into practice. Its task is to find the text units that carry actual meaning, namely entities, in a given text. Common entities include person names, place names, organization names, times, and so on.
Early named entity recognition methods were mostly rule-based; such systems were costly to build and their portability was limited. Currently popular methods are mainly feature-template-based methods and neural-network-based methods. The feature-template methods rest on statistical machine learning: they treat named entity recognition as a sequence labeling task and learn a labeling model from a large corpus in order to label each position of a sentence. Commonly used models include the HMM (hidden Markov model) and the CRF (conditional random field). These methods need a large amount of labeled corpus and place certain demands on feature extraction, and the quality of the feature templates affects the recognition result. The neural-network methods extract features automatically, so that model training becomes an end-to-end process that does not depend on feature engineering and is driven by data. However, there are many network variants, the results depend heavily on parameter settings, and the models are hard to interpret. In addition, Chinese named entity recognition has difficulties of its own: the extension of an entity differs between fields, entity types are varied and numerous, and the result is strongly affected by the quality of Chinese word segmentation.
Disclosure of Invention
The invention aims to overcome the above defects of the background art by providing a method for identifying named entities in user request data. The method uses the Stanford natural language processing tool and combines part-of-speech tagging, entity extraction and dependency analysis to address the low accuracy of named entity recognition for request data during interaction between a user and a smart device, thereby improving the recognition rate of user intent.
To achieve this technical effect, the invention adopts the following technical solution:
A method for identifying named entities in user request data, comprising the following steps:
A. Introduce the Stanford natural language processing tool, perform word segmentation and part-of-speech tagging on the request text to obtain a set of segmented words with parts of speech, and select the nominal morphemes in that set, such as common nouns NN, proper nouns NR and temporal nouns NT, to form a first entity set;
B. Obtain the person names, place names and organization names in the request data by an entity extraction method to serve as a second entity set;
C. Extract the subject-predicate relations and verb-object relations in the request text by a dependency analysis method, and form a third entity set from the subjects of the subject-predicate relations, the direct and indirect objects of the verb-object relations, and the sentence core word ROOT;
D. On the basis of semantic analysis, clean the first entity set, the second entity set and the third entity set to obtain the final named entity recognition result.
Further, word segmentation of the request text in step A specifically comprises: using a CRF model to label the position of each character within a word as word-initial, word-medial, word-final or single-character word, and then forming the segmented words from the characters running from each word-initial position to the matching word-final position, together with the single-character words.
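The decoding half of this step can be sketched as follows. This is a minimal illustration that assumes the CRF has already produced one position tag per character; the tag letters B, M, E, S (word-initial, word-medial, word-final, single-character word), the function name `decode_bmes` and the sample sentence are assumptions made for the example and do not come from the patent.

```python
def decode_bmes(chars, tags):
    """Group characters into words from per-character position tags.

    B = word-initial, M = word-medial, E = word-final, S = single-character word.
    """
    words, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "S":
            if current:  # flush an unfinished word defensively
                words.append("".join(current))
                current = []
            words.append(ch)
        elif tag == "B":
            if current:
                words.append("".join(current))
            current = [ch]
        elif tag == "M":
            current.append(ch)
        elif tag == "E":
            current.append(ch)
            words.append("".join(current))
            current = []
        else:
            raise ValueError(f"unknown position tag: {tag}")
    if current:
        words.append("".join(current))
    return words


# Hypothetical input: "I want to watch Hong Kong movies", with assumed tags.
chars = list("我想看香港电影")
tags = ["S", "S", "S", "B", "E", "B", "E"]
segmented = decode_bmes(chars, tags)
```

Predicting the tags themselves is the job of the trained CRF and is not shown here.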
Further, part-of-speech tagging of the request text in step A specifically comprises: defining a set of feature functions over all possible part-of-speech tag sequences of the segmented text, assigning a weight to each feature function, scoring each tag sequence, and taking the highest-scoring sequence as the part-of-speech tagging result.
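The scoring idea can be illustrated with a toy example. The sketch below assumes four hand-written feature functions with made-up weights and a three-tag inventory, and it enumerates every tag sequence by brute force; a trained CRF would learn the weights from data and would normally use Viterbi decoding instead of enumeration. All names and numbers here are hypothetical.

```python
from itertools import product

TAGS = ["PN", "VV", "NN"]  # assumed, deliberately tiny tag inventory


# Hypothetical feature functions: each looks at position i of a candidate
# tag sequence and returns 1.0 when its pattern fires, else 0.0.
def f_pronoun(words, tags, i):
    return 1.0 if words[i] == "I" and tags[i] == "PN" else 0.0


def f_verb_after_pronoun(words, tags, i):
    return 1.0 if i > 0 and tags[i - 1] == "PN" and tags[i] == "VV" else 0.0


def f_movie_noun(words, tags, i):
    return 1.0 if words[i] == "movie" and tags[i] == "NN" else 0.0


def f_noun_after_verb(words, tags, i):
    return 1.0 if i > 0 and tags[i - 1] == "VV" and tags[i] == "NN" else 0.0


# (feature function, weight) pairs; the weights are illustrative only.
FEATURES = [
    (f_pronoun, 2.0),
    (f_verb_after_pronoun, 1.5),
    (f_movie_noun, 2.0),
    (f_noun_after_verb, 1.0),
]


def score(words, tags):
    """Weighted sum of all feature functions over all positions."""
    return sum(w * f(words, tags, i) for f, w in FEATURES for i in range(len(words)))


def best_tagging(words):
    """Score every possible tag sequence and return the highest-scoring one."""
    return max(product(TAGS, repeat=len(words)), key=lambda tags: score(words, tags))


words = ["I", "see", "movie"]
best = best_tagging(words)
```

Enumeration grows exponentially with sentence length, which is why it is only suitable for a demonstration of the scoring rule.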
Further, when part-of-speech tagging is performed on the request text in step A, custom part-of-speech tags are also applied to words from specific fields. For example, "Love You for Ten Thousand Years" is tagged N_song, a song noun, and "Speed and Passion" is tagged N_video, a movie noun.
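One simple way to realize such custom tags is a domain lexicon that overrides the tagger output. The sketch below assumes the titles arrive as single tokens and uses English glosses of the two titles mentioned above; the dictionary and the function `apply_domain_tags` are hypothetical and not part of any Stanford tool.

```python
# Hypothetical domain lexicon: surface form -> custom part-of-speech tag.
DOMAIN_LEXICON = {
    "Love You for Ten Thousand Years": "N_song",
    "Speed and Passion": "N_video",
}


def apply_domain_tags(tagged, lexicon=DOMAIN_LEXICON):
    """Replace the tagger's tag with the custom tag for words in the lexicon."""
    return [(word, lexicon.get(word, tag)) for word, tag in tagged]


# Assumed tagger output for a request asking to hear a song.
tagged = [
    ("I", "PN"),
    ("want", "VV"),
    ("hear", "VV"),
    ("Love You for Ten Thousand Years", "NN"),
]
retagged = apply_domain_tags(tagged)
```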
Further, step B specifically comprises training a CRF model on labeled data and then extracting the person names, place names and organization names from the data to be predicted.
Further, in step B, for request data from which entity recognition extracts nothing, syntactic analysis is used to identify the redundant components of the sentence that do not help entity recognition, such as determiners, ordinals and measure words, and to identify the simple clauses, verb phrases and noun phrases in the sentence.
Further, in step C, when the core word is a verb it is generally the predicate of the sentence, and the predicate is followed by an object or a predicative; the object is very often an entity or part of one. When the core word is a noun, that noun is generally the entity to be extracted, or part of it.
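This rule of thumb can be written as a small function. The sketch assumes dependency triples in the order (relation, governor, dependent), Penn Chinese Treebank style tags in which verb tags begin with V and noun tags with N, and two invented sentences; `candidates_from_root` is a hypothetical helper, not a parser API.

```python
OBJECT_RELATIONS = {"dobj", "iobj"}  # direct and indirect object


def candidates_from_root(root_word, root_pos, deps):
    """Return entity candidates suggested by the sentence core word.

    deps holds (relation, governor, dependent) triples.
    """
    if root_pos.startswith("V"):
        # Verb core word: its objects are the likely entities.
        return [dep for rel, gov, dep in deps
                if gov == root_word and rel in OBJECT_RELATIONS]
    if root_pos.startswith("N"):
        # Noun core word: the word itself is the likely entity.
        return [root_word]
    return []


# Hypothetical parses for "you play movie" and "tomorrow weather".
verb_case = candidates_from_root(
    "play", "VV", [("nsubj", "play", "you"), ("dobj", "play", "movie")]
)
noun_case = candidates_from_root(
    "weather", "NN", [("nmod", "weather", "tomorrow")]
)
```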
Further, the cleaning in step D specifically comprises a second pass that removes the components not belonging to named entities.
Further, the components not belonging to named entities include at least pronouns, verbs and prepositions. Before the final set is produced, the candidate set may contain pronouns (e.g. "you", "I", "her"), verbs (e.g. "see") and prepositions (e.g. "after") that serve as the subject, predicate or other components of the sentence; these fall outside the scope of named entities and must be removed from the set.
Compared with the prior art, the invention has the following beneficial effects:
By introducing the Stanford natural language processing tool and combining part-of-speech tagging, entity extraction and dependency analysis, the method recognizes, on the basis of semantic analysis, named entities in domain-specific user request data that include the three categories of person, place and organization names as well as other entities. This improves the recall of Chinese entity recognition and enriches the semantic features. The method is also easier to understand and implement than feature-template and neural-network methods, and it improves the efficiency of named entity recognition.
Drawings
FIG. 1 is a flow diagram illustrating a method for named entity identification of user requested data in accordance with the present invention.
Detailed Description
The invention will be further elucidated and described with reference to the embodiments of the invention described hereinafter.
Examples:
Embodiment 1:
As shown in FIG. 1, a method for identifying named entities in user request data comprises the following steps.
In this embodiment, assume that a user speaks the request "I want to watch the 1994 Hong Kong movie Dahua X You starring Zhou X Chi and Zhu X" to a smart device (parts of the names are masked with X). To better identify the user's intent, the named entities in the request text are identified as follows:
Step 1: Introduce Stanford CoreNLP and obtain, by word segmentation and part-of-speech tagging, the tagged word list [ I/PN, want/VV, see/VV, 1994/NT, Zhou X Chi/NR, and/CC, Zhu X/NR, star/VV, de/DEC, Hong Kong/NR, movie/NN, Dahua X You/NR ]. Select the nominal morphemes in the list to form the first named entity set [ 1994/NT, Zhou X Chi/NR, Zhu X/NR, Hong Kong/NR, movie/NN, Dahua X You/NR ].
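The selection in Step 1 amounts to a filter on the part-of-speech tag. The sketch below hard-codes the tagged list instead of calling a tagger, and the token strings are illustrative English glosses of the example tokens; only the tag names NN, NR and NT are taken from the text.

```python
# Tags treated as nominal: common noun, proper noun, temporal noun.
NOMINAL_TAGS = {"NN", "NR", "NT"}

# Tagger output for the example request (English glosses, hard-coded).
tagged = [
    ("I", "PN"), ("want", "VV"), ("see", "VV"), ("1994", "NT"),
    ("Zhou X Chi", "NR"), ("and", "CC"), ("Zhu X", "NR"),
    ("star", "VV"), ("de", "DEC"), ("Hong Kong", "NR"),
    ("movie", "NN"), ("Dahua X You", "NR"),
]

# Keep only the words whose tag is nominal, preserving sentence order.
first_set = [(word, tag) for word, tag in tagged if tag in NOMINAL_TAGS]
```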
Step 2: Obtain the person names (PERSON), place names (LOCATION) and organization names (ORGANIZATION) in the request text by the entity extraction method to form the second named entity set [ Zhou X Chi/PERSON, Zhu X/PERSON, Hong Kong/LOCATION ].
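Once an entity extractor has labeled each token, building the second set is again a filter. In the sketch below the per-token labels are assumed rather than produced by a real model, and the label "O" for tokens outside any entity is a common convention adopted here for illustration.

```python
ENTITY_LABELS = {"PERSON", "LOCATION", "ORGANIZATION"}

# Assumed extractor output: one label per token, "O" meaning no entity.
ner_tagged = [
    ("I", "O"), ("want", "O"), ("see", "O"), ("1994", "O"),
    ("Zhou X Chi", "PERSON"), ("and", "O"), ("Zhu X", "PERSON"),
    ("star", "O"), ("de", "O"), ("Hong Kong", "LOCATION"),
    ("movie", "O"), ("Dahua X You", "O"),
]

# Keep person, place and organization names only.
second_set = [(word, label) for word, label in ner_tagged if label in ENTITY_LABELS]
```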
Step 3: Extract the subject-predicate relations and verb-object relations in the request text by the dependency analysis method.
Specifically, dependency syntax explains syntactic structure by analyzing the dependency relations between the components of a language unit. It holds that the core verb of a sentence is the central component that governs the other components while being governed by none, and that every governed component depends on its governor through some relation. The dependency analysis result in this example is [ (ROOT, want), (nsubj, want, I), (ccomp, want, see), (dep, Dahua X You, 1994), (conj, Zhu X, Zhou X Chi), (cc, Zhu X, and), (nsubj, star, Zhu X), (acl, Dahua X You, star), (mark, star, de), (nmod, movie, Hong Kong), (appos, Dahua X You, movie), (dobj, see, Dahua X You) ].
Each element of the list is a triple whose three members are the dependency relation, the governing word and the dependent word, in that order. The subjects of the subject-predicate relations, the direct and indirect objects of the verb-object relations, and the sentence core word ROOT are extracted to form the third named entity set [ I/PN, want/VV, Zhu X/NR, Dahua X You/NR ].
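The extraction of the third set can be sketched as a scan over the dependency triples. The code assumes the order (relation, governor, dependent), represents the root entry with a governor of None, and uses illustrative English glosses for the tokens; `third_entity_set` is a hypothetical helper.

```python
# Dependency result of the example, as (relation, governor, dependent).
deps = [
    ("ROOT", None, "want"),
    ("nsubj", "want", "I"),
    ("ccomp", "want", "see"),
    ("dep", "Dahua X You", "1994"),
    ("conj", "Zhu X", "Zhou X Chi"),
    ("cc", "Zhu X", "and"),
    ("nsubj", "star", "Zhu X"),
    ("acl", "Dahua X You", "star"),
    ("mark", "star", "de"),
    ("nmod", "movie", "Hong Kong"),
    ("appos", "Dahua X You", "movie"),
    ("dobj", "see", "Dahua X You"),
]

# Relations whose dependent is kept: core word, subject, direct/indirect object.
KEPT_RELATIONS = {"ROOT", "nsubj", "dobj", "iobj"}


def third_entity_set(deps):
    """Collect the core word, subjects and objects, without duplicates."""
    kept = []
    for relation, _governor, dependent in deps:
        if relation in KEPT_RELATIONS and dependent not in kept:
            kept.append(dependent)
    return kept


third_set = third_entity_set(deps)
```

The words come out in the order in which their relations appear in the list, so the core word is first here; the membership is the same as in the set given in the text.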
Step 4: Take the union of the first, second and third named entity sets to obtain [ 1994/NT, movie/NN, Dahua X You/NR, Zhou X Chi/PERSON, Zhu X/PERSON, Hong Kong/LOCATION, I/PN, want/VV ]. Then clean out the components that do not belong to named entities, such as pronouns (PN), verbs (VV) and prepositions (P), to obtain the final named entity set [ 1994/NT, Dahua X You/NR, Zhou X Chi/PERSON, Zhu X/PERSON, Hong Kong/LOCATION ]. This completes named entity identification for the whole user request.
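A minimal sketch of the union and cleaning follows. It assumes that when a word appears with both a part-of-speech tag and an entity label, the entity label wins, which matches the union shown in the text; the function `merge_and_clean` and the English token glosses are illustrative only.

```python
# Tags removed during cleaning: pronoun, verb, preposition.
REMOVED_TAGS = {"PN", "VV", "P"}

first_set = [
    ("1994", "NT"), ("Zhou X Chi", "NR"), ("Zhu X", "NR"),
    ("Hong Kong", "NR"), ("movie", "NN"), ("Dahua X You", "NR"),
]
second_set = [
    ("Zhou X Chi", "PERSON"), ("Zhu X", "PERSON"), ("Hong Kong", "LOCATION"),
]
third_set = [
    ("I", "PN"), ("want", "VV"), ("Zhu X", "NR"), ("Dahua X You", "NR"),
]


def merge_and_clean(first, second, third):
    """Union the three sets by word, then drop non-entity tags."""
    merged = {}
    for word, label in first + third:
        merged.setdefault(word, label)  # keep the first tag seen for a word
    for word, label in second:
        merged[word] = label  # assumed rule: entity label overrides POS tag
    return [(w, l) for w, l in merged.items() if l not in REMOVED_TAGS]


final_set = merge_and_clean(first_set, second_set, third_set)
```

With only the pronoun, verb and preposition filter, movie/NN remains in the result, whereas the final set given in the text omits it; the text does not state which additional rule removes it, so none is guessed at here.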
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (9)

1. A method for named entity identification of user requested data, comprising the steps of:
A. Introducing the Stanford natural language processing tool, performing word segmentation and part-of-speech tagging on the request text to obtain a set of segmented words with parts of speech, and selecting the nominal morphemes in that set to form a first entity set;
B. Obtaining the person names, place names and organization names in the request data by an entity extraction method to serve as a second entity set;
C. Extracting the subject-predicate relations and verb-object relations in the request text by a dependency analysis method, and forming a third entity set from the subjects of the subject-predicate relations, the direct and indirect objects of the verb-object relations, and the sentence core word ROOT;
D. On the basis of semantic analysis, cleaning the first entity set, the second entity set and the third entity set to obtain the final named entity recognition result.
2. The method for identifying the named entity of the user request data according to claim 1, wherein word segmentation of the request text in step A specifically comprises: using a CRF model to label the position of each character within a word as word-initial, word-medial, word-final or single-character word, and forming the segmented words from the characters running from each word-initial position to the matching word-final position, together with the single-character words.
3. The method for identifying the named entity of the user request data according to claim 2, wherein part-of-speech tagging of the request text in step A specifically comprises: defining a set of feature functions over all possible part-of-speech tag sequences of the segmented text, assigning a weight to each feature function, scoring each tag sequence, and taking the highest-scoring sequence as the part-of-speech tagging result.
4. The method for identifying the named entity of the user request data according to claim 3, wherein part-of-speech tagging of the request text in step A further comprises applying custom part-of-speech tags to words from specific fields.
5. The method for identifying the named entity of the user request data according to claim 3, wherein step B specifically comprises training a CRF model on labeled data and then extracting the person names, place names and organization names from the data to be predicted.
6. The method for identifying the named entity of the user request data according to claim 5, wherein, for request data from which entity recognition in step B extracts nothing, syntactic analysis is used to identify the redundant components of the sentence that do not help entity recognition, namely determiners, ordinals and measure words, and to identify the simple clauses, verb phrases and noun phrases in the sentence.
7. The method for identifying the named entity of the user request data according to claim 5, wherein in step C, when the core word is a verb, it is the predicate of the sentence and is followed by an object or a predicative, the object being an entity or part of one; and when the core word is a noun, that noun is the entity to be extracted or part of it.
8. The method for identifying the named entity of the user request data according to claim 7, wherein the cleaning in step D specifically comprises removing the components not belonging to named entities.
9. The method for identifying the named entity of the user request data according to claim 8, wherein the components not belonging to named entities comprise at least pronouns, verbs and prepositions.
CN201910877939.9A 2019-09-17 2019-09-17 method for identifying named entity of user request data Pending CN110569510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910877939.9A CN110569510A (en) 2019-09-17 2019-09-17 method for identifying named entity of user request data

Publications (1)

Publication Number Publication Date
CN110569510A true CN110569510A (en) 2019-12-13

Family

ID=68780594

Country Status (1)

Country Link
CN (1) CN110569510A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255343A (en) * 2021-06-21 2021-08-13 中国平安人寿保险股份有限公司 Semantic identification method and device for label data, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb
CN109271626A (en) * 2018-08-31 2019-01-25 北京工业大学 Text semantic analysis method
CN109359291A (en) * 2018-08-28 2019-02-19 昆明理工大学 A kind of name entity recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕秀才: "基于条件随机场(CRF)的命名实体识别", 《CNBLOGS.COM/NOCML/P/3543236.HTML》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191213
