CN115169352A - Named entity recognition method, device, equipment and storage medium - Google Patents
- Publication number
- CN115169352A (application number CN202211092788.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- entity
- recognized
- word
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Character Discrimination (AREA)
Abstract
The application discloses a named entity recognition method, apparatus, device and storage medium. The method comprises: acquiring a text to be recognized; inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized; and acquiring an entity word text feature preset reference item and correcting the output result based on it to obtain a recognition result of the text to be recognized, wherein the entity word text feature preset reference item represents the text features before and after an entity word in the text to be recognized. Because the output result is corrected using the entity word text feature preset reference item, which takes into account the text features before and after the entity word, the case in which two entity words with identical text appear in one sentence can be recognized accurately, which improves the accuracy of named entity recognition for the text to be recognized.
Description
Technical Field
The present invention relates generally to the field of machine learning technologies, and in particular to a named entity recognition method, apparatus, device, and storage medium.
Background
With the continuous development of artificial intelligence algorithms, the Named Entity Recognition (NER) task is increasingly applied in various fields. Named entity recognition identifies the type and position of entities with specific meanings in a text, so that NER labels can be added to each unit of the text.
At present, the related art combines a named entity recognition model with dictionary-based correction to perform named entity recognition and output the result. However, when one sentence contains two entities with the same text content but different entity types, this scheme yields only one entity type for both, so the accuracy of the recognition result is low.
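The limitation described above can be illustrated with a minimal sketch. It assumes that the correction dictionary maps each entity word text to a single entity type; the sentence, the labels `PER` and `LOC`, and the function `dictionary_correct` are hypothetical and are not taken from the application.

```python
# Hypothetical illustration: a dictionary keyed only by the entity text
# cannot assign different types to two identical entity words.

def dictionary_correct(entities, type_dictionary):
    """Overwrite each predicted type with the dictionary's type, if present."""
    corrected = []
    for text, predicted_type in entities:
        corrected.append((text, type_dictionary.get(text, predicted_type)))
    return corrected

# Assumed model output for "Washington flew to Washington":
# the first mention is a person, the second a location.
model_output = [("Washington", "PER"), ("Washington", "LOC")]
type_dictionary = {"Washington": "LOC"}  # one type per surface string

result = dictionary_correct(model_output, type_dictionary)
print(result)  # [('Washington', 'LOC'), ('Washington', 'LOC')]
```

Both mentions end up with the same type, because the lookup never sees the text surrounding each mention.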
Disclosure of Invention
In view of the above-mentioned shortcomings or drawbacks of the prior art, it is desirable to provide a named entity recognition method, apparatus, device and storage medium.
In a first aspect, an embodiment of the present application provides a named entity recognition method, including:
acquiring a text to be recognized;
inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized;
and acquiring an entity word text feature preset reference item and correcting the output result based on the entity word text feature preset reference item to obtain a recognition result of the text to be recognized, wherein the entity word text feature preset reference item is used for representing text features before and after an entity word in the text to be recognized.
In one embodiment, correcting the output result based on the entity word text feature preset reference item to obtain the recognition result of the text to be recognized includes:
correcting the entity word text by adopting a preset dictionary to obtain an intermediate result, wherein the preset dictionary comprises a standard field word segmentation dictionary and a word frequency word segmentation dictionary corresponding to the text to be recognized;
and correcting the entity type based on the intermediate result and the text features before and after the entity word in the text to be recognized, to obtain the recognition result of the text to be recognized.
In one embodiment, correcting the entity type based on the intermediate result and the text features before and after the entity word in the text to be recognized to obtain the recognition result of the text to be recognized includes:
determining, based on the text features before and after the entity word in the text to be recognized, a feature identifier and a structural relationship between those text features and the entity word text;
and correcting the entity type according to the feature identifier, the structural relationship between the preceding and following text features and the entity word text, and the intermediate result, to obtain the recognition result of the text to be recognized.
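As a rough sketch of how the entity type could be corrected from the surrounding text, the example below treats a short marker string as the feature identifier and its position ("before" or "after" the entity word) as the structural relationship. The rule table, the labels, the `Entity` class and the function `correct_types` are illustrative assumptions; the application does not specify this representation.

```python
# Hypothetical sketch: correct entity types from the text before and after
# each entity word. Rules and labels are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Entity:
    text: str
    start: int  # character offset of the entity word in the sentence
    type: str

# Each rule: (feature identifier, position relative to the entity, corrected type)
CONTEXT_RULES = [
    ("Mr. ", "before", "PER"),
    (" flew", "after", "PER"),
    ("to ", "before", "LOC"),
    ("in ", "before", "LOC"),
]

def correct_types(sentence, entities, rules=CONTEXT_RULES):
    """Return new entities whose types follow the first matching context rule."""
    corrected = []
    for ent in entities:
        before = sentence[:ent.start]
        after = sentence[ent.start + len(ent.text):]
        new_type = ent.type  # keep the intermediate type if no rule matches
        for marker, position, rule_type in rules:
            if position == "before" and before.endswith(marker):
                new_type = rule_type
                break
            if position == "after" and after.startswith(marker):
                new_type = rule_type
                break
        corrected.append(Entity(ent.text, ent.start, new_type))
    return corrected

sentence = "Washington flew to Washington"
# Assumed intermediate result: both mentions carry the same type.
intermediate = [Entity("Washington", 0, "LOC"), Entity("Washington", 19, "LOC")]
recognized = correct_types(sentence, intermediate)
print([(e.text, e.start, e.type) for e in recognized])
# [('Washington', 0, 'PER'), ('Washington', 19, 'LOC')]
```

Because each mention is evaluated against its own context, the two identical entity words can receive different types.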
In one embodiment, correcting the entity word text by using a preset dictionary to obtain an intermediate result includes:
correcting the entity word text according to the occurrence frequency of different words in the word frequency word segmentation dictionary;
and selecting, according to the standard field word segmentation dictionary, one of a plurality of word segmentation modes to be confirmed as the intermediate result.
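The two dictionary steps above might be sketched as follows. This is only one possible reading: candidate segmentations are first ranked by total word frequency, and the standard field dictionary then picks one of the remaining candidates. The dictionaries, the candidate list, the cutoff of two candidates, and both function names are hypothetical.

```python
# Hypothetical sketch of dictionary-based correction of an entity word text.

def rank_by_frequency(candidates, word_freq):
    """Order candidate segmentations by total word frequency, highest first."""
    return sorted(
        candidates,
        key=lambda seg: sum(word_freq.get(w, 0) for w in seg),
        reverse=True,
    )

def select_by_domain(candidates, domain_words):
    """Pick the candidate with the most words found in the domain dictionary."""
    return max(candidates, key=lambda seg: sum(w in domain_words for w in seg))

# Assumed word frequency dictionary and standard field dictionary.
word_freq = {"new": 50, "york": 5, "new york": 40, "bank": 30, "york bank": 1}
domain_words = {"new york", "bank"}

candidates = [
    ["new", "york", "bank"],
    ["new york", "bank"],
    ["new", "york bank"],
]
ranked = rank_by_frequency(candidates, word_freq)
to_confirm = ranked[:2]  # segmentation modes still to be confirmed
intermediate_result = select_by_domain(to_confirm, domain_words)
print(intermediate_result)  # ['new york', 'bank']
```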
In one embodiment, inputting the text to be recognized into an entity recognition model for recognition processing to obtain an output result of the text to be recognized includes:
inputting the text to be recognized into an entity recognition model, and obtaining a feature vector of the text to be recognized through a vectorization processing module;
performing feature extraction on the feature vector through a feature extraction module to obtain attribute information of the sample to be recognized, wherein the attribute information comprises the part of speech and the language structure of the sample to be recognized;
and processing the attribute information of the sample to be recognized through a recognition module to obtain an output result of the text to be recognized.
In one embodiment, processing the attribute information of the sample to be recognized through the recognition module to obtain the output result of the text to be recognized includes:
processing the attribute information of the sample to be recognized through a fully connected layer in the recognition module to obtain a fully connected vector;
processing the fully connected vector with an activation function to obtain a prediction result set of the sample to be recognized, wherein the prediction result set comprises a plurality of label types;
and taking the label type corresponding to the maximum probability value in the prediction result set as the output result of the text to be recognized.
In one embodiment, the training process of the entity recognition model includes:
acquiring historical text data, and dividing the historical text data into a training set and a verification set;
training the entity recognition model to be constructed by using the training set to obtain an entity recognition model to be verified;
and optimizing the entity recognition model to be verified on the verification set by minimizing a loss function, to obtain the entity recognition model.
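The data split and the loss-based selection in the training process can be sketched as below. The split ratio, the seed, the checkpoint names and the loss values are placeholders chosen for illustration; the application does not state them, and the training itself is omitted.

```python
# Hypothetical sketch: split historical text data and keep the model state
# with the lowest loss on the verification set.
import random

def split_history(samples, validation_ratio=0.2, seed=0):
    """Shuffle the samples and split them into (training set, verification set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_ratio))
    return shuffled[n_val:], shuffled[:n_val]

history = [f"sentence {i}" for i in range(10)]
train_set, validation_set = split_history(history)
print(len(train_set), len(validation_set))  # 8 2

# Placeholder verification losses for three hypothetical checkpoints.
validation_losses = {"epoch_1": 0.92, "epoch_2": 0.41, "epoch_3": 0.47}
best_checkpoint = min(validation_losses, key=validation_losses.get)
print(best_checkpoint)  # epoch_2
```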
In a second aspect, the present application provides a named entity recognition apparatus, comprising:
the acquisition module is used for acquiring a text to be recognized;
the recognition module is used for inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized;
and the correction module is used for acquiring an entity word text feature preset reference item and correcting the output result on the basis of the entity word text feature preset reference item to obtain the recognition result of the text to be recognized, wherein the entity word text feature preset reference item is used for representing the text features of the text to be recognized before and after the entity word.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the named entity recognition method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, the computer program being configured to implement the named entity recognition method according to the first aspect.
According to the named entity recognition method, apparatus, device and storage medium provided in the embodiments of the application, a text to be recognized is obtained and input into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, the output result comprising an entity type and an entity word text in the text to be recognized. An entity word text feature preset reference item is then obtained, and the output result is corrected based on it to obtain a recognition result of the text to be recognized; the entity word text feature preset reference item represents the text features before and after an entity word in the text to be recognized. Compared with the prior art, on the one hand, performing recognition with the trained entity recognition model produces an output result that provides comprehensive and accurate guidance for the subsequent correction. On the other hand, the output result is corrected using the entity word text feature preset reference item, which takes into account the text features before and after the entity word, so the case in which two entity words with identical text appear in one sentence can be recognized accurately, and the accuracy of named entity recognition for the text to be recognized is higher.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is a system architecture diagram of an application system for named entity recognition provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a named entity recognition method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of named entity recognition provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of a method for determining a recognition result of a text to be recognized according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for training an entity recognition model according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a named entity recognition apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a named entity recognition apparatus according to another embodiment of the present application;
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises. The present application is described in detail below with reference to the embodiments and the attached drawings. For ease of understanding, some technical terms related to the embodiments of the present application are explained below:
Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multidisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formula learning.
It can be understood that natural-language-based automatic human-machine response systems are an important application of natural language understanding. Named entity recognition is an important component of natural language understanding; its main task is to find and mark named entities in natural language text. At present, the related art combines a named entity recognition model with dictionary-based correction to perform named entity recognition and output the result. Specifically, the text to be recognized is recognized by the named entity recognition model to obtain a model recognition result, and the model recognition result is corrected with the dictionary to obtain the final recognition result. However, when one sentence contains two entities with the same text content but different entity types, this scheme yields a result for only one of the entity types, so the accuracy of the recognition result is low.
To address the above defects, embodiments of the present application provide a named entity recognition method, apparatus, device, and storage medium. Compared with the prior art, on the one hand, performing recognition with a trained entity recognition model produces an output result that provides comprehensive and accurate guidance for the subsequent correction. On the other hand, the output result is corrected using the entity word text feature preset reference item, which takes into account the text features before and after the entity word in the text to be recognized, so the case in which two entity words with identical text appear in one sentence can be recognized accurately, and the accuracy of named entity recognition for the text to be recognized is higher.
The scheme provided by the embodiments of the application relates to artificial intelligence technologies such as natural language processing and machine learning, and is explained through the following embodiments.
Fig. 1 is an implementation environment architecture diagram of a named entity recognition method according to an embodiment of the present application. As shown in Fig. 1, the implementation environment architecture includes: a terminal 100 and a server 200.
The terminal 100 may be a terminal device in various AI application scenarios. For example, the terminal 100 may be a smart home device such as a smart television and a smart television set-top box, or the terminal 100 may be a mobile portable terminal such as a smart phone, a tablet computer and an electronic book reader, or the terminal 100 may be a smart wearable device such as smart glasses and a smart watch, which is not limited in this embodiment.
An AI application based on natural language processing may be installed on the terminal 100. For example, the AI application may be intelligent search, intelligent question answering, or the like.
The server 200 may be a single server, a server cluster composed of several servers, one or more virtualization platforms, or a cloud computing service center.
The server 200 may be a server device that provides a background service for the AI application installed in the terminal 100.
The terminal 100 and the server 200 establish a communication connection through a wired or wireless network. Optionally, the wireless or wired network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, a virtual private network, or any combination of these.
In the process of providing the AI application service, the AI application system based on natural language processing can recognize the text to be recognized through the entity recognition model and the entity word text feature preset reference item, and provide the AI application service according to the recognition result. The entity recognition model may be deployed on the server 200 and trained and applied by the server; alternatively, the entity recognition model may be deployed on the terminal 100 and trained and updated by the server 200.
For ease of understanding and explanation, the named entity recognition method, apparatus, device and storage medium provided in the embodiments of the present application are described in detail below with reference to Figs. 2 to 8.
Fig. 2 is a flowchart illustrating a named entity recognition method according to an embodiment of the present application. The method may be executed by a computer device, where the computer device may be the server 200 or the terminal 100 in the system shown in Fig. 1, or a combination of the terminal 100 and the server 200. As shown in Fig. 2, the method includes:
S101, obtaining a text to be recognized.
Specifically, the text to be recognized is the text on which named entity recognition is to be performed. There may be one or more types of text to be recognized; for example, it may be a sentence from a user or the content of an article. The text to be recognized may include multiple characters or multiple words, or may be a sentence or a paragraph composed of one or more words.
In the embodiment of the application, the text to be recognized can be acquired from the cloud, from a database or a blockchain, or imported from an external device.
S102, inputting the text to be recognized into the trained entity recognition model for recognition processing, and obtaining an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized.
The entity recognition model is a network model that has been trained on sample data and is capable of recognizing entity types and entity word texts. It is a neural network model whose input is the text to be recognized and whose output includes the entity type and the entity word text in the text to be recognized. The entity recognition model may comprise a multi-layer network structure: each layer processes the data passed to it differently and passes its output to the next layer, until the last layer produces the output result. Optionally, the entity recognition model includes a BERT model. BERT (Bidirectional Encoder Representations from Transformers) is a word vector model that can extract text information from the text to be recognized.
The entity recognition model may include an input layer, a fusion layer, a word embedding layer, a convolution layer, a fully connected layer, an output layer, and the like, connected in series, with each layer performing a different function.
Optionally, before the text to be recognized is input into the trained entity recognition model, it may be preprocessed; for example, word segmentation may be performed to provide a basis for naming. Word segmentation divides the characters of a sentence into one or more words. Many specific word segmentation approaches exist; for example, the words may be determined by a mechanical matching method, a feature word library method, a constraint matrix method, a grammar analysis method, and the like.
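As an illustration of the mechanical matching approach mentioned above, the following sketch implements forward maximum matching, a common dictionary-based segmentation technique. The application does not say which matching variant it uses, so this choice, the vocabulary, and the function name `forward_max_match` are assumptions.

```python
# Hypothetical sketch of mechanical (dictionary) matching for word segmentation.

def forward_max_match(text, vocabulary, max_len=4):
    """Greedy longest-match-first segmentation (forward maximum matching)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words

# Assumed vocabulary: "Beijing", "Peking University", "university", "student".
vocabulary = {"北京", "北京大学", "大学", "学生"}
segments = forward_max_match("北京大学学生", vocabulary)
print(segments)  # ['北京大学', '学生']
```

The longest dictionary entry wins at each position, which is why the first four characters are kept together rather than split into two shorter words.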
After the text to be recognized is preprocessed, the preprocessed result is input into the trained entity recognition model for recognition. Specifically, a feature vector of the text to be recognized is obtained through a vectorization processing module; feature extraction is then performed on the feature vector through a feature extraction module to obtain attribute information of the sample to be recognized, where the attribute information comprises the part of speech and the language structure of the sample to be recognized; finally, the recognition module processes the attribute information of the sample to be recognized to obtain the output result of the text to be recognized.
In the embodiment of the application, the vectorization processing module vectorizes the different word segments to obtain the corresponding feature vectors, and the feature extraction module then performs feature extraction on the feature vectors to obtain the attribute information of the sample to be recognized. The feature extraction module can convert abstract text into vectors that can be operated on mathematically, and can fully describe features at the character level, the word level, the sentence level, and even the relationships between sentences. The recognition module may comprise a fully connected layer and an activation function; it classifies the attribute information of the text to be recognized output by the feature extraction module to obtain the output result of the text to be recognized. The output result is the entity type corresponding to the text to be recognized and may also include the entity word text. The attribute information comprises the part of speech and the language structure of the text to be recognized.
The vectorization processing module converts the semantic space relationship into a vector space relationship, namely converts the semantic text into a vector which can be processed by the computer equipment.
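A minimal way to picture this conversion is a lookup from word segments to rows of an embedding table. The sketch below uses a static table as a stand-in; it is not the BERT-based vectorization the application mentions, and the vocabulary, the vector values and the function `vectorize` are made up for illustration.

```python
# Hypothetical sketch: map word segments to vectors with a static lookup table.

def vectorize(tokens, vocab, embeddings, unk="<unk>"):
    """Return one vector per token; unknown tokens share the <unk> vector."""
    return [embeddings[vocab.get(tok, vocab[unk])] for tok in tokens]

vocab = {"<unk>": 0, "named": 1, "entity": 2}           # token -> row index
embeddings = [[0.0, 0.0], [0.1, 0.3], [0.7, 0.2]]       # assumed 2-dim vectors

vectors = vectorize(["named", "entity", "text"], vocab, embeddings)
print(vectors)  # [[0.1, 0.3], [0.7, 0.2], [0.0, 0.0]]
```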
Specifically, after the text to be recognized is obtained and preprocessed to obtain its characters or words, the characters or words can be input into the trained entity recognition model. A feature vector of the text to be recognized is obtained through the vectorization processing module, and feature extraction is then performed by the feature extraction module to obtain attribute information of the text to be recognized. The attribute information is processed through the fully connected layer in the recognition module to obtain a fully connected vector, and the fully connected vector is processed through the activation function to obtain a prediction result set of the text to be recognized, the prediction result set comprising a plurality of label types and their corresponding probability values. The maximum of the probability values corresponding to the same label type across the prediction result sets is used as the output result of the text to be recognized. Optionally, the probability values corresponding to the label types in the prediction result set may be sorted in descending order, and the label type with the largest probability value is taken as the label type of the text to be recognized.
In particular, the recognition module may include, but is not limited to, a fully connected layer and an activation function. The fully connected layer may comprise one layer or multiple layers, and is mainly used for classifying the attribute information of the text to be recognized.
The activation function may be a softmax function. The activation function introduces a non-linear factor, because a purely linear model has insufficient expressive ability; softmax transforms the continuous real-valued inputs into outputs between 0 and 1 that sum to 1, so they can be read as probability values.
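The classification step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the label names, weights and bias values are hypothetical, and a single fully connected layer is assumed.

```python
import math
from typing import List, Tuple


def fully_connected(features: List[float],
                    weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Compute one logit per label: dot(weights[k], features) + bias[k]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Map real-valued logits to probabilities in (0, 1) that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(features: List[float],
                  weights: List[List[float]],
                  bias: List[float],
                  labels: List[str]) -> Tuple[str, List[float]]:
    """Return the label type with the largest probability, plus all probabilities."""
    probs = softmax(fully_connected(features, weights, bias))
    best = max(range(len(labels)), key=lambda k: probs[k])
    return labels[best], probs


# Hypothetical toy values, chosen only for illustration.
LABELS = ["TIME", "MOVIE_NAME", "OTHER"]
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
BIAS = [0.0, 0.0, 0.0]

label, probs = predict_label([2.0, 0.5], WEIGHTS, BIAS, LABELS)
```

Here the logits are 2.0, 0.5 and 1.25, so the first label receives the largest probability.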
In the embodiment of the application, the text to be recognized is obtained and is input into the trained entity recognition model for recognition processing, so that the output result comprising the entity type and the entity word text in the text to be recognized is obtained, and comprehensive and accurate guide information can be provided for subsequent correction processing.
S103, acquiring an entity word text feature preset reference item and correcting an output result based on the entity word text feature preset reference item to obtain a recognition result of the text to be recognized, wherein the entity word text feature preset reference item is used for representing text features of the text to be recognized before and after an entity word.
It should be noted that the preset reference item for the text feature of the entity word refers to a feature related to the entity word in the text to be recognized. The entity word text feature preset reference item may include a text feature before and after an entity word in the text to be recognized, and a relationship between the text before and after the entity word and the entity word in the text to be recognized.
The text characteristics before and after the entity word in the text to be recognized may include text types before and after the entity word, text characteristic marks before and after the entity word, and the like.
Further, referring to fig. 3, a text to be recognized 3-1 may be obtained and input into the trained entity recognition model 3-2 for recognition processing to obtain an output result 3-3 of the text to be recognized. An entity word text feature preset reference item 3-4 is then obtained, and the output result 3-3 is corrected based on the entity word text feature preset reference item 3-4 to obtain a recognition result 3-5 of the text to be recognized, where the entity word text feature preset reference item is used for representing text features before and after an entity word in the text to be recognized.
Further, on the basis of the foregoing embodiment, fig. 4 is a schematic flowchart of a named entity recognition method provided in the embodiment of the present application. The named entity recognition method may be applied to a computer device, and as shown in fig. 4, the method may include the following steps:
S201, correcting the entity word text by adopting a preset dictionary to obtain an intermediate result, wherein the preset dictionary comprises a standard field word segmentation dictionary and a word frequency word segmentation dictionary corresponding to the text to be recognized.
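The flow of fig. 3 can be sketched as a composition of two callables. The stub model, the stub corrector, the sample sentence and the type names below are all hypothetical stand-ins; in particular, the stub corrector hard-codes a single context rule in place of the preset reference item.

```python
from typing import Callable, Dict, List

Entity = Dict[str, str]  # {"text": entity word text, "type": entity type}


def recognize_entities(text: str,
                       model: Callable[[str], List[Entity]],
                       corrector: Callable[[str, List[Entity]], List[Entity]]
                       ) -> List[Entity]:
    """Run the trained model, then correct its output using context features."""
    output = model(text)            # output result: entity types and entity word text
    return corrector(text, output)  # recognition result after correction


def stub_model(text: str) -> List[Entity]:
    """Hypothetical model that proposes two entity types for the same word."""
    return [{"text": "Titanic", "type": "TIME"},
            {"text": "Titanic", "type": "MOVIE_NAME"}]


def stub_corrector(text: str, entities: List[Entity]) -> List[Entity]:
    """Hypothetical rule: keep a movie name only if the word 'movie' follows it."""
    return [e for e in entities
            if e["type"] == "MOVIE_NAME" and (e["text"] + " movie") in text]


result = recognize_entities("watch the Titanic movie", stub_model, stub_corrector)
```

Passing the model and corrector as parameters keeps the correction step independent of how the model is trained.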
The preset dictionary may be a reference naming dictionary in which the word units to be recognized are recorded; the reference naming dictionary may be understood as a named entity recognition dictionary. Generally, a plurality of reference naming dictionaries may be prepared in advance, either covering different fields or collecting words on different aspects of the same field. By using reference naming dictionaries of different categories, a sentence (the pre-acquired text to be recognized) can be fully analyzed, making the determined named entity recognition result more accurate. The named entity recognition dictionaries of different fields can also be organized into multiple levels to achieve more accurate recognition. For example, the entity recognition dictionaries may be divided into natural sciences and social sciences, and the natural science category may be further divided into biology, electricity, chemistry, and the like. When determining the reference naming dictionary in which a word unit to be recognized is recorded, whether to use the dictionary of a given field as that reference naming dictionary can be decided according to how frequently the word unit occurs in that dictionary.
The standard domain dictionary is provided by a user and is preset according to entity words corresponding to different domains. For example, the mining field, the instrument testing field, the aerospace field, the industrial manufacturing field, the movie and television field, the literature field, and the like can be included.
When the entity word text is corrected by adopting the preset dictionary, the entity word text can be corrected according to the occurrence frequency of different words in the word frequency word segmentation dictionary, and one of multiple word segmentation modes to be confirmed is selected as an intermediate result according to the standard field word segmentation dictionary.
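One possible reading of this dictionary-based correction is sketched below. The scoring scheme (domain dictionary hits first, summed word frequency as tie-breaker) and the sample dictionaries are assumptions made for illustration; the source does not specify how the two dictionaries are combined.

```python
from typing import Dict, List, Set, Tuple


def score_segmentation(segments: List[str],
                       word_freq: Dict[str, int],
                       domain_words: Set[str]) -> Tuple[int, int]:
    """Score a candidate segmentation by domain hits, then by total word frequency."""
    domain_hits = sum(1 for seg in segments if seg in domain_words)
    freq_score = sum(word_freq.get(seg, 0) for seg in segments)
    return (domain_hits, freq_score)


def choose_segmentation(candidates: List[List[str]],
                        word_freq: Dict[str, int],
                        domain_words: Set[str]) -> List[str]:
    """Select one of the candidate segmentations as the intermediate result."""
    return max(candidates,
               key=lambda segs: score_segmentation(segs, word_freq, domain_words))


# Hypothetical dictionaries for a movie-related domain.
WORD_FREQ = {"star": 50, "wars": 20, "star wars": 30}
DOMAIN_WORDS = {"star wars"}

intermediate = choose_segmentation([["star", "wars"], ["star wars"]],
                                   WORD_FREQ, DOMAIN_WORDS)
```

With these values the single-segment candidate wins because it matches the domain dictionary, even though the split candidate has a higher summed frequency.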
S202, based on the intermediate result and the text characteristics of the entity words in the text to be recognized, the entity types are corrected, and the recognition result of the text to be recognized is obtained.
Specifically, after the intermediate result is determined, the feature identifier and the structural relationship between the preceding and following text features and the entity word text can be determined based on the text features before and after the entity word in the text to be recognized, and the entity type can be corrected according to the feature identifier, the structural relationship and the intermediate result, so as to obtain the recognition result of the text to be recognized.
For example, when correcting based on the feature identifier, the structural relationship and the intermediate result, suppose the entity types determined for the same entity word text include both a time entity and a movie name entity. If the text type adjacent to the entity word determined as a time entity is a verb type, and the text type adjacent to the entity word determined as a movie name entity is a noun type, the entity word texts corresponding to redundant entity types are deleted according to the processing rule, the entity word text corresponding to the required entity type is retained, and the recognition result of the text to be recognized is obtained.
It should be noted that the processing rule may be set by the user in advance according to actual requirements, for example, only a movie name entity needs to be identified, only a weather entity needs to be identified, only a time entity needs to be identified, and the like.
Illustratively, suppose the acquired text to be recognized is "acquired from the acquired day to the acquired day of the movie". Word segmentation is performed on the text using a preset word segmentation algorithm to obtain word segmentation results, for example "acquired from the acquired day", "go", "see" and "movie". The word segmentation results are input into the trained entity recognition model for recognition to obtain an output result comprising the entity types and the entity word text in the text to be recognized, where the entity types comprise a time entity and a movie name entity and the entity word is "acquired day". The entity word text is then corrected using the preset dictionary to obtain an intermediate result, and the feature identifiers and their structural relationship with the entity word text are determined based on the text features before and after the entity word in the text to be recognized. Here the surrounding text features comprise "go" and "movie": the feature identifier corresponding to "go" is a verb type, and the feature identifier corresponding to "movie" is a noun type. The entity types are then matched, compared and corrected by an algorithm according to the requirement preset by the user to obtain the recognition result of the text to be recognized. For example, of the time entity "acquired day" and the movie name entity "acquired day", only the movie name entity needs to be recognized for a viewing intention, so the movie name entity "acquired day" is determined as the recognition result of the text to be recognized.
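The type correction in this example can be sketched as a rule lookup on the part of speech of the segment that follows each entity occurrence. The rule table, the `Candidate` structure, the segment indices and the choice to look only at the following segment are assumptions for illustration; the entity text is kept as it appears in the example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    text: str         # entity word text
    entity_type: str  # entity type proposed by the model
    start: int        # index of the entity word among the word segments
    next_pos: str     # feature identifier (part of speech) of the following segment


# Hypothetical rule table: part of speech expected right after each entity type.
EXPECTED_NEXT_POS: Dict[str, str] = {"TIME": "VERB", "MOVIE_NAME": "NOUN"}


def correct_entity_types(candidates: List[Candidate],
                         required_type: str) -> List[Candidate]:
    """Keep candidates of the required type whose context matches the rule table."""
    kept = []
    for cand in candidates:
        if cand.entity_type != required_type:
            continue  # processing rule: drop entity types the user did not ask for
        if EXPECTED_NEXT_POS.get(cand.entity_type) == cand.next_pos:
            kept.append(cand)
    return kept


# Both occurrences of the same entity word receive both types from the model.
candidates = [
    Candidate("acquired day", "TIME", 0, "VERB"),        # followed by "go"
    Candidate("acquired day", "MOVIE_NAME", 0, "VERB"),
    Candidate("acquired day", "TIME", 3, "NOUN"),        # followed by "movie"
    Candidate("acquired day", "MOVIE_NAME", 3, "NOUN"),
]

movie_entities = correct_entity_types(candidates, "MOVIE_NAME")
time_entities = correct_entity_types(candidates, "TIME")
```

Under these assumed rules, the occurrence followed by a noun is kept as the movie name entity, and the occurrence followed by a verb is kept as the time entity.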
The named entity recognition method provided in the embodiment of the application obtains a text to be recognized, inputs the text to be recognized into a trained entity recognition model for recognition processing, obtains an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized, then obtains an entity word text feature preset reference item and corrects the output result based on the entity word text feature preset reference item, and obtains a recognition result of the text to be recognized, wherein the entity word text feature preset reference item is used for representing text features before and after an entity word in the text to be recognized. Compared with the prior art, on one hand, the output result is obtained by carrying out recognition processing through the trained entity recognition model, so that comprehensive and accurate guide information is provided for subsequent correction processing, on the other hand, the output result is corrected through the entity word text feature preset reference item, the text features before and after the entity words in the text to be recognized are combined, the condition that two identical text entity words exist in one sentence is recognized accurately, and the accuracy of recognizing the named entity of the text to be recognized is higher.
In one embodiment, the output result in the above embodiment is determined by a pre-trained entity recognition model; the training process of the entity recognition model is described below. Referring to fig. 5, the method may include:
S301, historical text data are obtained and are divided into a training set and a verification set.
It should be noted that there may be one or more pieces of historical text data, where each piece of historical text data may include at least one character or word.
Specifically, after the historical text data is obtained, the historical text data can be randomly divided into a training set and a verification set according to a certain proportion, wherein the training set is used for training an initial entity recognition model to obtain a trained entity recognition model, and the verification set is used for verifying the trained entity recognition model to verify the performance of the entity recognition model.
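A random split of this kind can be sketched as follows. The 80/20 ratio and the fixed seed are assumptions; the source only says the data is divided "according to a certain proportion".

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T],
                  train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly divide samples into a training set and a verification set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_set, verification_set = split_dataset(range(10))
```

Every sample lands in exactly one of the two sets, so the verification set measures performance on data the model was not trained on.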
S302, training the entity recognition model to be constructed by using the training set to obtain the entity recognition model to be verified.
S303, optimizing the entity recognition model to be verified by using the verification set according to loss function minimization, to obtain the entity recognition model.
After the historical text data is divided into a training set and a verification set, the training set is input into the entity recognition model to be constructed, which comprises a vectorization processing module, a feature extraction module and a recognition module that are connected to one another. The training set is processed by the vectorization processing module to obtain initial word vectors, the initial word vectors are input into the feature extraction module to obtain a corresponding result, and the result is input into the recognition module to obtain an output result. The vectorization processing module, feature extraction module and recognition module to be constructed are trained using the training set to obtain the vectorization processing module, feature extraction module and recognition module to be verified.
In the process of training the entity recognition model, the computer device uses the verification set to optimize the vectorization processing module, feature extraction module and recognition module to be verified according to loss function minimization, so as to obtain the final vectorization processing module, feature extraction module and recognition module. Parameters in the entity recognition model to be verified are updated according to the difference between the result obtained by inputting the verification set into the entity recognition model to be verified and the labeling result, thereby training the entity recognition model. The labeling result can be obtained by manually labeling the historical text data and can comprise the entity type and entity word text corresponding to the historical text data.
Optionally, updating the parameters in the entity recognition model to be verified may mean updating matrix parameters such as the weight matrices and bias matrices in the model. The weight matrices and bias matrices include, but are not limited to, the matrix parameters in the self-attention layer, the feedforward network layer and the fully connected layer of the entity recognition model to be verified.
In the embodiment of the application, a loss function can be used to calculate the loss value between the result obtained by inputting the verification set into the entity recognition model to be verified and the labeling result, so that the parameters in the entity recognition model to be verified are updated. Optionally, the loss function may be a cross-entropy loss function, a normalized cross-entropy loss function, or the like.
When the parameters in the entity recognition model to be verified are updated through the loss function and the model is determined, according to the loss function, not to have converged, the parameters in the model can be adjusted until the model converges, so that the entity recognition model is obtained. Convergence of the entity recognition model to be verified may mean that the difference between the output result of the model on the verification set and the corresponding labeling result is smaller than a preset threshold, or that the rate of change of this difference approaches some low value. When the calculated loss is small, or the difference between the calculated loss and the loss output in the previous iteration is close to 0, the entity recognition model to be verified is considered to have converged.
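The loss and convergence criteria above can be sketched as follows. The threshold values are hypothetical, and the cross-entropy shown is the standard single-sample form for a one-hot label, which is only one of the loss functions the text mentions.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target_index: int, eps: float = 1e-12) -> float:
    """Cross-entropy of predicted probabilities against a one-hot labeling result."""
    return -math.log(max(probs[target_index], eps))  # eps avoids log(0)


def has_converged(loss_history: List[float],
                  loss_threshold: float = 0.05,
                  delta_threshold: float = 1e-4) -> bool:
    """Converged if the latest loss is small or barely changed since the last iteration."""
    if not loss_history:
        return False
    current = loss_history[-1]
    if current < loss_threshold:
        return True
    if len(loss_history) >= 2 and abs(loss_history[-2] - current) < delta_threshold:
        return True
    return False


loss = cross_entropy([0.5, 0.5], target_index=0)
```

A training loop would append the loss after each iteration and stop adjusting parameters once `has_converged` returns true.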
In the embodiment, the entity recognition model is trained, so that the text to be recognized can be recognized through the trained entity recognition model, an output result of the text to be recognized can be accurately obtained, and comprehensive and accurate guide information is provided for subsequent correction processing.
In another aspect, fig. 6 is a schematic structural diagram of a named entity recognition apparatus according to an embodiment of the present disclosure. The apparatus may be an apparatus in a terminal or a server. As shown in fig. 6, the apparatus 700 includes:
an obtaining module 710, configured to obtain a text to be recognized;
the recognition module 720 is configured to input the text to be recognized into the trained entity recognition model for recognition processing, so as to obtain an output result of the text to be recognized, where the output result includes an entity type and an entity word text in the text to be recognized;
the correcting module 730 is configured to obtain an entity word text feature preset reference item and correct the output result based on the entity word text feature preset reference item to obtain an identification result of the text to be identified, where the entity word text feature preset reference item is used to represent text features before and after an entity word in the text to be identified.
Optionally, referring to fig. 7, the correcting module 730 includes:
the first correcting unit 731 is configured to correct the entity word text by using a preset dictionary to obtain an intermediate result, where the preset dictionary includes a standard field word segmentation dictionary and a word frequency word segmentation dictionary corresponding to the text to be recognized;
the second correcting unit 732 is configured to correct the entity type based on the intermediate result and the text features before and after the entity word in the text to be recognized, so as to obtain a recognition result of the text to be recognized.
Optionally, the second correcting unit 732 is specifically configured to:
determining feature identification and structural relation between the front and rear text features and the entity word text based on the front and rear text features of the entity word in the text to be recognized;
and correcting the entity type according to the feature identification, the structural relationship among the entity word texts and the intermediate result to obtain the recognition result of the text to be recognized.
Optionally, the first correcting unit 731 is specifically configured to:
correcting the entity word text according to the occurrence frequency of different words in the word frequency word segmentation dictionary;
and selecting one of a plurality of word segmentation modes to be confirmed as an intermediate result according to the standard field word segmentation dictionary.
Optionally, the identifying module 720 is specifically configured to:
inputting a text to be recognized into an entity recognition model, and obtaining a feature vector of the text to be recognized through a vectorization processing module;
extracting the characteristic of the characteristic vector through a characteristic extraction module to obtain attribute information of the sample to be identified, wherein the attribute information comprises the part of speech and the language structure of the sample to be identified;
and processing the sample to be recognized through the recognition module based on the attribute information of the sample to be recognized to obtain an output result of the text to be recognized.
Optionally, the identifying module 720 is further configured to:
processing attribute information of a sample to be identified through a full-connection layer in an identification module to obtain a full-connection vector;
processing the full-connection vector by adopting an activation function to obtain a prediction result set of the sample to be identified, wherein the prediction result set comprises a plurality of label types;
and taking the maximum value of the probability values corresponding to the same label type in the plurality of prediction result sets as the output result of the text to be recognized.
Optionally, the training process of the entity recognition model includes:
acquiring historical text data, and dividing the historical text data into a training set and a verification set;
training an entity recognition model to be constructed by using a training set to obtain an entity recognition model to be verified;
and optimizing the entity recognition model to be verified by using the verification set according to loss function minimization, to obtain the entity recognition model.
It can be understood that the functions of the functional modules of the named entity identifying device in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
To sum up, the named entity recognition device provided by the embodiment of the application obtains an output result by performing recognition processing through the trained entity recognition model on the one hand, so that comprehensive and accurate guidance information is provided for subsequent correction processing, and on the other hand, corrects the output result through the entity word text feature preset reference item, combines the text features of the entity words in the text to be recognized before and after the entity words, and then accurately recognizes the condition that two identical text entity words exist in one sentence, so that the accuracy of named entity recognition of the text to be recognized is higher.
In another aspect, a computer device provided in this embodiment includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the named entity identifying method as described above.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer system of a terminal device according to an embodiment of the present application.
As shown in fig. 8, the computer system 300 includes a Central Processing Unit (CPU) 301 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage section 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the system 300 are also stored. The CPU 301, ROM 302, and RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
The following components are connected to the I/O interface 305: an input portion 306 including a keyboard, a mouse, and the like; an output portion 307 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 308 including a hard disk and the like; and a communication section 309 including a network interface card such as a LAN card, a modem, or the like. The communication section 309 performs communication processing via a network such as the internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 310 as necessary, so that a computer program read out therefrom is mounted into the storage section 308 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 309, and/or installed from the removable medium 311. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 301.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, which may be described as: a processor comprising an acquisition module, a recognition module and a correction module. The names of these units or modules do not in some cases constitute a limitation on the units or modules themselves; for example, the acquisition module may also be described as "a module for acquiring a text to be recognized".
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may be separate and not incorporated into the electronic device. The computer-readable storage medium stores one or more programs that, when executed by one or more processors, perform the named entity recognition method described herein:
acquiring a text to be identified;
inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized;
and acquiring an entity word text feature preset reference item and correcting the output result based on the entity word text feature preset reference item to obtain an identification result of the text to be identified, wherein the entity word text feature preset reference item is used for representing text features before and after an entity word in the text to be identified.
To sum up, in the method, the apparatus, the device, and the storage medium for recognizing a named entity provided in the embodiments of the present application, a text to be recognized is obtained, and the text to be recognized is input to a trained entity recognition model for recognition processing, so as to obtain an output result of the text to be recognized, where the output result includes an entity type and an entity word text in the text to be recognized, and then an entity word text feature preset reference item is obtained and the output result is corrected based on the entity word text feature preset reference item, so as to obtain a recognition result of the text to be recognized, where the entity word text feature preset reference item is used to represent text features before and after an entity word in the text to be recognized. Compared with the prior art, on one hand, the technical scheme obtains an output result by carrying out recognition processing through the trained entity recognition model, so that comprehensive and accurate guide information is provided for subsequent correction processing, on the other hand, the output result is corrected through the entity word text characteristic preset reference item, and the front and rear text characteristics of the entity words in the text to be recognized are combined, so that the condition that two identical text entity words exist in one sentence is recognized accurately, and the accuracy of recognizing the named entities of the text to be recognized is higher.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention according to the present application is not limited to the specific combination of the above-mentioned features, but also covers other embodiments where any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (10)
1. A named entity recognition method, comprising:
acquiring a text to be recognized;
inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized;
and acquiring an entity word text feature preset reference item and correcting the output result based on the entity word text feature preset reference item to obtain an identification result of the text to be identified, wherein the entity word text feature preset reference item is used for representing text features before and after an entity word in the text to be identified.
2. The method according to claim 1, wherein the correcting the output result based on the entity word text feature preset reference item to obtain the recognition result of the text to be recognized comprises:
correcting the entity word text by adopting a preset dictionary to obtain an intermediate result, wherein the preset dictionary comprises a standard field word segmentation dictionary and a word frequency word segmentation dictionary corresponding to the text to be recognized;
and correcting the entity type based on the intermediate result and the text characteristics before and after the entity word in the text to be recognized to obtain the recognition result of the text to be recognized.
3. The method of claim 2, wherein the correcting the entity type based on the intermediate result and the text features before and after the entity word in the text to be recognized to obtain the recognition result of the text to be recognized comprises:
determining feature identification and structural relation between the front and rear text features and the entity word text based on the front and rear text features of the entity word in the text to be recognized;
and correcting the entity type according to the feature identification, the structural relationship between the front and rear text features and the entity word text and the intermediate result to obtain the recognition result of the text to be recognized.
4. The method of claim 2, wherein the correcting the entity word text by adopting a preset dictionary to obtain an intermediate result comprises:
correcting the entity word text according to the occurrence frequency of different words in the word frequency segmentation dictionary;
and selecting one of a plurality of word segmentation modes to be confirmed as an intermediate result according to the standard field word segmentation dictionary.
5. The method of claim 1, wherein the inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized comprises:
inputting the text to be recognized into the entity recognition model, and obtaining a feature vector of the text to be recognized through a vectorization processing module;
performing feature extraction on the feature vector through a feature extraction module to obtain attribute information of the sample to be recognized, wherein the attribute information comprises the part of speech and the language structure of the sample to be recognized;
and processing the sample to be recognized through a recognition module based on the attribute information of the sample to be recognized to obtain an output result of the text to be recognized.
6. The method according to claim 5, wherein the processing by the recognition module based on the attribute information of the sample to be recognized to obtain the output result of the text to be recognized comprises:
processing the attribute information of the sample to be recognized through a fully connected layer in the recognition module to obtain a fully connected vector;
processing the fully connected vector by adopting an activation function to obtain a prediction result set of the sample to be recognized, wherein the prediction result set comprises a plurality of label types;
and taking the maximum value of the probability values corresponding to the same label type in the prediction result set as the output result of the text to be recognized.
7. The method of claim 1, wherein the training process of the entity recognition model comprises:
acquiring historical text data, and dividing the historical text data into a training set and a verification set;
training the entity recognition model to be constructed by using the training set to obtain the entity recognition model to be verified;
and optimizing, by utilizing the verification set, the entity recognition model to be verified by minimizing a loss function, to obtain the entity recognition model.
8. An apparatus for named entity recognition, the apparatus comprising:
the acquisition module is used for acquiring a text to be recognized;
the recognition module is used for inputting the text to be recognized into a trained entity recognition model for recognition processing to obtain an output result of the text to be recognized, wherein the output result comprises an entity type and an entity word text in the text to be recognized;
and the correction module is used for acquiring an entity word text feature preset reference item and correcting the output result on the basis of the entity word text feature preset reference item to obtain the recognition result of the text to be recognized, wherein the entity word text feature preset reference item is used for representing the text features of the text to be recognized before and after the entity word.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor being adapted to implement the named entity recognition method according to any one of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program for implementing a named entity recognition method according to any one of claims 1-7 is stored.
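The recognition step recited in claim 6 (a fully connected layer, an activation function, and selection of the highest-probability label) can be sketched as follows. This is an illustrative sketch only: the claims do not name the activation function, so softmax is an assumption, and the function names, weights, and labels are hypothetical.

```python
import math


def fully_connected(features, weights, bias):
    """Compute one score per label: weights[i] . features + bias[i]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(scores):
    """Assumed activation function: turn scores into probabilities."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(features, weights, bias, labels):
    """Return the label with the highest probability and that probability."""
    probs = softmax(fully_connected(features, weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical attribute vector, layer parameters, and label types.
labels = ["PER", "LOC", "O"]
weights = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
bias = [0.0, 0.0, 0.0]
label, prob = predict_label([1.0, 0.0], weights, bias, labels)
print(label, round(prob, 3))
```

A trained model would apply this per token with learned parameters; the fixed numbers here only make the sketch runnable.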
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211092788.4A CN115169352A (en) | 2022-09-08 | 2022-09-08 | Named entity recognition method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211092788.4A CN115169352A (en) | 2022-09-08 | 2022-09-08 | Named entity recognition method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115169352A (en) | 2022-10-11 |
Family
ID=83480407
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211092788.4A Pending CN115169352A (en) | 2022-09-08 | 2022-09-08 | Named entity recognition method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115169352A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852106A (en) * | 2019-11-06 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Named entity processing method and device based on artificial intelligence and electronic equipment |
CN112215008A (en) * | 2020-10-23 | 2021-01-12 | 中国平安人寿保险股份有限公司 | Entity recognition method and device based on semantic understanding, computer equipment and medium |
-
2022
- 2022-09-08 CN CN202211092788.4A patent/CN115169352A/en active Pending
Non-Patent Citations (1)
Title |
---|
AKSHAY KULKARNI et al.: "Natural Language Processing Projects: Build Next-Generation NLP Applications Using AI Techniques", APRESS, pages: 1 - 327 * |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111444340B (en) | Text classification method, device, equipment and storage medium | |
CN107679039B (en) | Method and device for determining statement intention | |
CN111737476B (en) | Text processing method and device, computer readable storage medium and electronic equipment | |
CN109697239B (en) | Method for generating teletext information | |
CN115422944A (en) | Semantic recognition method, device, equipment and storage medium | |
CN113627447B (en) | Label identification method, label identification device, computer equipment, storage medium and program product | |
CN113344206A (en) | Knowledge distillation method, device and equipment integrating channel and relation feature learning | |
CN113779358B (en) | Event detection method and system | |
CN113158656B (en) | Ironic content recognition method, ironic content recognition device, electronic device, and storage medium | |
CN112131881A (en) | Information extraction method and device, electronic equipment and storage medium | |
CN111858898A (en) | Text processing method and device based on artificial intelligence and electronic equipment | |
CN117521675A (en) | Information processing method, device, equipment and storage medium based on large language model | |
CN113254613A (en) | Dialogue question-answering method, device, equipment and storage medium | |
CN113743101A (en) | Text error correction method and device, electronic equipment and computer storage medium | |
CN116245097A (en) | Method for training entity recognition model, entity recognition method and corresponding device | |
CN115759254A (en) | Question-answering method, system and medium based on knowledge-enhanced generative language model | |
CN116467417A (en) | Method, device, equipment and storage medium for generating answers to questions | |
CN111858860B (en) | Search information processing method and system, server and computer readable medium | |
CN114419514A (en) | Data processing method and device, computer equipment and storage medium | |
CN115169352A (en) | Named entity recognition method, device, equipment and storage medium | |
CN113537372B (en) | Address recognition method, device, equipment and storage medium | |
CN117009532B (en) | Semantic type recognition method and device, computer readable medium and electronic equipment | |
CN113421551B (en) | Speech recognition method, speech recognition device, computer readable medium and electronic equipment | |
CN117152467B (en) | Image recognition method, device, medium and electronic equipment | |
CN117077656B (en) | Demonstration relation mining method and device, medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20221011 |