CN113536790A - Model training method and device based on natural language processing - Google Patents

Model training method and device based on natural language processing

Info

Publication number
CN113536790A
Authority
CN
China
Prior art keywords
model
natural language
loss function
language processing
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010293248.7A
Other languages
Chinese (zh)
Inventor
王潇斌
徐光伟
龙定坤
马春平
丁瑞雪
谢朋峻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010293248.7A
Publication of CN113536790A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs


Abstract

The invention discloses a model training method and device based on natural language processing, relates to the technical field of natural language processing, and mainly aims to improve the accuracy of model recognition. The main technical scheme of the invention is as follows: constructing a natural language processing model that comprises a first model and a second model, wherein the first model is used for recognizing named entities and the second model is used for recognizing leading words of named entities; training the natural language processing model in a multi-task learning mode to obtain a loss function of the natural language processing model; and optimizing according to the loss function of the natural language processing model to obtain the optimized natural language processing model.

Description

Model training method and device based on natural language processing
Technical Field
The invention relates to the technical field of natural language processing, in particular to a model training method and device based on natural language processing.
Background
Natural Language Processing (NLP) is a discipline that studies the language problems of human interaction with computers. It studies theories and methods that enable efficient communication between humans and computers in natural language, and consists of two major technical fields: natural language understanding and natural language generation. NLP technology builds on technologies and resources such as big data, knowledge graphs, machine learning and linguistics, and can form specific application systems such as machine translation, deep question answering and dialogue systems, thereby serving various actual services and products.
Mathematical models based on natural language processing, such as hidden Markov models, maximum entropy models and conditional random fields, need to be trained before being applied to an actual scene in order to ensure the accuracy of the model's output. However, during training, the model is prone to over-fitting because natural language in the training samples contains words with multiple meanings, which reduces accuracy. For example, in named entity recognition, training with a sample containing a polysemous word (for instance "People's Hospital", which may be an organization name or a place name) easily causes the model to over-fit on the noun's features, which lowers the model's accuracy.
Disclosure of Invention
In view of the above problems, the present invention provides a model training method and apparatus based on natural language processing, and mainly aims to improve the accuracy of model recognition.
In order to achieve the purpose, the invention mainly provides the following technical scheme:
in one aspect, the present invention provides a model training method based on natural language processing, which specifically includes:
the method comprises the steps of constructing a natural language processing model, wherein the natural language processing model comprises a first model and a second model, the first model is used for identifying a named entity, and the second model is used for identifying a leading word of the named entity;
training the natural language processing model by adopting a multi-task learning mode to obtain a loss function of the natural language processing model;
and optimizing according to the loss function of the natural language processing model to obtain the optimized natural language processing model.
Preferably, the first model adopts a BiLSTM-CRF model, and the structure of the first model comprises:
an input layer to convert an input sentence into a sequence of vectors, the input sentence comprising a plurality of words;
the BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors;
the fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution probability of its output label over all labels;
and the CRF layer is used for determining the output sequence over all the labels according to the distribution probability output by the fully connected layer and a preset rule.
Preferably, the converting the input sentence into the vector sequence includes:
splitting the input sentence into a word sequence;
and obtaining the vector representation of each word to obtain the vector sequence of the input sentence.
Preferably, the BiLSTM layer comprises a plurality of LSTM units, each LSTM unit being used for outputting a fixed-length feature vector corresponding to one word vector in the vector sequence, and the plurality of LSTM units being associated with one another according to the arrangement order of the word vectors in the vector sequence.
Preferably, the structure of the second model includes: an input layer and a classification layer;
the input layer is used for acquiring the vector representation of each word in the input sentence and the feature vectors generated by the BiLSTM layer, and splicing the vector representation of a specific word with the vector corresponding to that specific word in the feature vectors to obtain a spliced vector;
and the classification layer is used for determining, by using a fully connected neural network and according to the spliced vector, the distribution probability that the next word adjacent to the specific word is a named entity.
Preferably, the loss function of the natural language processing model is obtained by:
setting weights for the loss function of the first model and the loss function of the second model respectively;
and calculating the loss function of the natural language processing model according to the loss function of the first model and the loss function of the second model after the weights are set.
Preferably, the loss function of the first model adopts a CRF likelihood function; and/or,
the loss function of the second model adopts a cross entropy loss function, the cross entropy loss function being the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity.
In another aspect, the present invention provides a model training apparatus based on natural language processing, which specifically includes:
a setting unit, configured to construct a natural language processing model, the natural language processing model comprising a first model and a second model, wherein the first model is used for identifying a named entity, and the second model is used for identifying a leading word of the named entity;
the training unit is used for training the natural language processing model constructed by the setting unit in a multi-task learning mode to obtain a loss function of the natural language processing model;
and the optimization unit is used for optimizing according to the loss function of the natural language processing model obtained by the training unit to obtain the optimized natural language processing model.
Preferably, the first model adopts a BiLSTM-CRF model, and the structure of the first model comprises:
an input layer to convert an input sentence into a sequence of vectors, the input sentence comprising a plurality of words;
the BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors;
the fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution probability of its output label over all labels;
and the CRF layer is used for determining the output sequence over all the labels according to the distribution probability output by the fully connected layer and a preset rule.
Preferably, the input layer is specifically configured to: split the input sentence into a word sequence; and obtain the vector representation of each word to obtain the vector sequence of the input sentence.
Preferably, the BiLSTM layer comprises a plurality of LSTM units, each LSTM unit being used for outputting a fixed-length feature vector corresponding to one word vector in the vector sequence, and the plurality of LSTM units being associated with one another according to the arrangement order of the word vectors in the vector sequence.
Preferably, the structure of the second model includes: an input layer and a classification layer;
the input layer is used for acquiring the vector representation of each word in the input sentence and the feature vectors generated by the BiLSTM layer, and splicing the vector representation of a specific word with the vector corresponding to that specific word in the feature vectors to obtain a spliced vector;
and the classification layer is used for determining, by using a fully connected neural network and according to the spliced vector, the distribution probability that the next word adjacent to the specific word is a named entity.
Preferably, the loss function of the natural language processing model is obtained by:
setting weights for the loss function of the first model and the loss function of the second model respectively;
and calculating the loss function of the natural language processing model according to the loss function of the first model and the loss function of the second model after the weights are set.
Preferably, the loss function of the first model adopts a CRF likelihood function; and/or,
the loss function of the second model adopts a cross entropy loss function, the cross entropy loss function being the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity.
In another aspect, the present invention provides a processor for executing a program, where the program executes the above model training method based on natural language processing.
By means of the above technical scheme, the model training method and device based on natural language processing provided by the invention construct a model structure having a first model and a second model, so that the second model can recognize the leading words of named entities and thereby provide reference information for the first model when it recognizes named entities. The constructed natural language processing model therefore recognizes named entities with higher accuracy and avoids the over-fitting phenomenon. When the natural language processing model is trained, the loss function of the model is obtained in a multi-task learning mode, that is, the optimal solution of the model loss function is obtained by optimizing a combination of the loss functions of the first model and the second model, so that, when recognizing named entities, the trained natural language processing model can perform a comprehensive analysis based on the recognition of leading words, thereby improving the accuracy with which the model identifies named entities.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a model training method based on natural language processing according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating the structure of a natural language processing model according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a model training apparatus based on natural language processing according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating another model training apparatus based on natural language processing according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Natural language processing, that is, achieving man-machine communication in natural language, or achieving natural language understanding and natural language generation, is very difficult. The underlying cause of the difficulty is the wide variety of ambiguities that exist across the various levels of natural language text and dialog. A Chinese text is, formally, a string of Chinese characters (including punctuation marks and the like). Characters can be composed into words, words into phrases, phrases into sentences, and sentences further into paragraphs, sections, chapters and whole documents. At any of these levels, ambiguity can arise with respect to the levels below it: a character string with the same surface form can be understood as different word strings or phrase strings, and can have different meanings in different scenes or contexts.
Natural language processing builds models of language ability and language use; such a model is realized by establishing a computer algorithm framework, refined and evaluated through training, and finally used in various practical systems. Scenarios to which natural language processing models apply include information retrieval, machine translation, document classification, information extraction, text mining, and the like.
The embodiment of the invention provides a model training method based on natural language processing whose main application scenario is text content analysis, in particular the recognition of named entities in text. The specific steps of the method are shown in FIG. 1 and comprise the following:
step 101, constructing a natural language processing model, wherein the natural language processing model comprises a first model and a second model.
The first model is used for identifying the named entities in the text, and the second model is used for identifying the leading words of the named entities.
In this step, the leading word to be identified by the second model refers to a character or word segment that precedes a named-entity word segment. That is, in natural language expression, the character or word that follows a leading word is, with high probability, a named entity. Therefore, in the natural language processing model constructed in this step, the purpose of having the second model identify leading words is to provide reference information for the first model when it identifies named entities, the reference information being determined according to the association relationships between word segments in the text.
And 102, training the natural language processing model by adopting a multi-task learning mode to obtain a loss function of the natural language processing model.
In the training process of a model, samples with labeling information are input into the model, and the output results are compared with the actual labeling information of the samples so that the relevant parameters of the model can be adjusted, bringing the output of the model closer to the actual results and giving the model accurate predictions. Therefore, when a model is constructed, a loss function is determined for the model, and the loss function is optimized using training samples to obtain the optimal solution, that is, the point at which the output of the model is closest to the actual results.
In this step, the loss function of the natural language processing model can be regarded as formed by combining the loss function of the first model and the loss function of the second model, and the specific combination manner is not limited to weighted summation, averaging, taking the maximum value, and the like. The process of solving for the optimal solution of the loss function of the natural language processing model is therefore that of solving for the optimal solution of the combination of the loss function of the first model and the loss function of the second model, so that both the features of the input text content and those of the text content associated with it are considered in the output result, and over-fitting of the model is avoided. Therefore, this step adopts a multi-task learning mode, in which the loss function of the first model and the loss function of the second model are set as two different tasks for joint training.
And 103, optimizing according to the loss function of the natural language processing model to obtain an optimized natural language processing model.
In this step, the process of optimizing the loss function of the natural language processing model is the process of solving for the optimal solution in the above step by means of multi-task learning; because multi-task learning is widely applied in the field of natural language processing, the solving process is not described in detail here.
As described in the above embodiment, the model training method based on natural language processing provided by the invention constructs a model structure having a first model and a second model, so that the second model can identify the leading words of named entities and thereby provide reference information for the first model when it identifies named entities. When the natural language processing model is trained, the loss function of the model is likewise formed by combining the loss functions corresponding to the first model and the second model, that is, the optimal solution of the model loss function is obtained by optimizing the combination of the loss functions of the first and second models. When the trained natural language processing model identifies the named entities in a text, it can therefore perform a comprehensive analysis based on the leading words to determine the corresponding named entities, which improves the accuracy of the model output and avoids over-fitting of the model.
Further, regarding the natural language processing model illustrated in fig. 1, the following detailed description describes a specific process of constructing and training the model when the model is applied to a scene of a named entity recognition task:
firstly, for constructing a named entity recognition model, a named entity recognition task is a basic task in natural language processing, namely, a named term is recognized from a text and is laid for tasks such as relation extraction and the like. In general, named entity recognition mainly refers to recognizing three types of named entities, namely, a person name, a place name and an organization structure name in a text. The current popular method in the field of named entity identification is to convert the named entity identification problem into a sequence labeling problem and then solve the problem by a sequence labeling method. The solution of general sequence labeling is as follows: hidden Markov models HMM or conditional random field CRF or BilSTM-maximum entropy. The first two of these are statistical learning methods and the last two are neural network methods.
In the embodiment of the invention, the named entity recognition model is constructed on the basis of a BiLSTM-CRF model. Specifically, the structure of the BiLSTM-CRF model, which serves as the first model, can be divided into four layers: an input layer, a BiLSTM layer, a fully connected layer and a CRF layer.
The input layer is used for converting the sentences in the input text into vector sequences, that is, the sentences in the input text are represented as character vector sequences or word vector sequences. In this embodiment, the input layer splits an input sentence into a character/word sequence and then obtains the vector representation of each item by looking it up in a table, thereby obtaining the vector sequence of the whole sentence. The table consulted is a preset word-vector lookup table in which the mapping relation between words and vectors is recorded.
The BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors. The BiLSTM layer is composed of a plurality of LSTM units, each LSTM unit outputting a fixed-length feature vector corresponding to one word vector in the vector sequence; this feature vector is determined based on the features of the preceding and following word vectors, that is, it can be regarded as the feature vector of each word fused with its context information. Therefore, the LSTM units are associated with one another according to the arrangement order of the word vectors in the vector sequence.
The fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution of its output label over all labels. For example, assume that there are two entity types, Person and Organization. If the BIO tagging scheme is adopted, five entity labels are obtained: B-Person (beginning of a person-name segment), I-Person (inside of a person-name segment), B-Organization (beginning of an organization-name segment), I-Organization (inside of an organization-name segment) and O (not belonging to any class). For each word, the fully connected layer outputs its score on each of these labels, e.g. B-Person (1.5), I-Person (0.9), B-Organization (0.1), I-Organization (0.08), O (0.05).
The CRF layer is used for determining the output sequence over all labels according to the distribution output by the fully connected layer and a preset rule. The rationality of the output sequence is analyzed according to the preset rule so as to obtain a reasonable prediction result and realize the labeling of the named-entity word segments in the sentence. That is, according to the score distribution of each word, the CRF layer searches for the globally optimal output sequence, combines the characters into word segments, and labels the named entities on the word segments thus combined.
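To make the four-layer structure concrete, the following is a minimal PyTorch sketch of the first model. All class, variable and dimension names are illustrative assumptions rather than the patented implementation, and the CRF layer is only indicated in a comment.

```python
import torch.nn as nn

class FirstModel(nn.Module):
    """Illustrative sketch of the input / BiLSTM / fully connected layers (names assumed)."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        # Input layer: look up a preset vector for each word of the sentence.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # BiLSTM layer: fuse each word vector with its left and right context.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Fully connected layer: score every word against every entity label (e.g. BIO tags).
        self.fc = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids):
        vectors = self.embedding(word_ids)      # (batch, seq_len, embed_dim)
        features, _ = self.bilstm(vectors)      # (batch, seq_len, 2 * hidden_dim)
        emissions = self.fc(features)           # per-word scores over all labels
        # A CRF layer would combine these emission scores with learned transition
        # scores and decode the globally best label sequence; it is omitted here.
        return features, emissions
```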
The above is a description of the structure of the BiLSTM-CRF model, which is a common model currently used for named entity identification, and a detailed implementation principle thereof is not described in detail in this embodiment. For the named entity recognition model to be constructed in this embodiment, in addition to using the BiLSTM-CRF model as the first model, a second model needs to be constructed, where the second model is used to determine, through analysis of the input word vector, a distribution probability that a next word adjacent to the input word vector is a named entity, and a specific structure of the second model includes: an input layer and a classification layer.
The input layer is used for acquiring the vector of each word in a sentence of the input text. Specifically, the input data of this input layer is derived from the outputs of the first model's input layer and BiLSTM layer; that is, a first vector output by the input layer of the first model is spliced with a second vector output by the BiLSTM layer, where the first vector is the vector representation of a word in the input sentence and the second vector is the vector corresponding to that specific word in the feature vectors. For example, if a first vector output by the input layer is [1,2] and the corresponding second vector in the feature vectors output by the BiLSTM layer is [3,4], the input layer of the second model splices the two vectors into the vector [1,2,3,4], which is passed as its output to the classification layer.
The classification layer is used for determining, by using a fully connected neural network and according to the spliced vector output by the input layer, the distribution probability that the next word adjacent to the specific word is a named entity.
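A corresponding sketch of the second model, again with assumed names: it splices the word vector with the BiLSTM feature for the same position and feeds the result through a small fully connected network that scores whether the next word is a named entity.

```python
import torch
import torch.nn as nn

class SecondModel(nn.Module):
    """Leading-word classifier sketch: concatenation followed by a fully connected net."""
    def __init__(self, embed_dim, feature_dim, hidden_dim=64):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim + feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # two classes: next word is / is not a named entity
        )

    def forward(self, word_vectors, bilstm_features):
        # Splice the two representations position by position,
        # e.g. [1, 2] and [3, 4] become [1, 2, 3, 4].
        spliced = torch.cat([word_vectors, bilstm_features], dim=-1)
        return self.classifier(spliced)        # per-word logits over the two classes
```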
In this embodiment, the purpose of the second model is to determine the leading words in the text, where a leading word refers to a character or word that precedes a named-entity word. For example, in the sentence "I go to Beijing", "Beijing" is a place name and the corresponding word "go to" is the leading word. By identifying the probability that an input word is a leading word, the second model gives the distribution probability that the next word belongs to the named-entity class; that is, the larger the probability that a word identified by the second model is a leading word, the larger the probability that the next word is a named entity. The identification of leading words adds a dimension to named entity recognition, which avoids the model over-fitting caused by a polysemous word being used differently in different scenes.
With the structures of the first model and the second model described above, a loss function further needs to be determined for the natural language processing model. Specifically, the loss function of the first model adopts the CRF likelihood function, that is, the probability distribution over output label sequences given the input, from which the optimal label sequence is determined. The loss function of the second model adopts a cross entropy loss function, namely the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity. As described in the above embodiments, the predicted distribution probability refers to the predicted probability of whether a word is a leading word, and the actual distribution probability refers to whether the word in the sample actually is a leading word. In practical applications, the training samples only label the word segments corresponding to named entities and do not label leading words; therefore, when a word segment is processed, it is necessary to read whether the next character or word is a labeled named entity: if so, the word is considered a leading word; otherwise, it is determined not to be a leading word.
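Since the training samples only label the named entities themselves, the leading-word labels for the second model can be derived by reading whether the next token starts a labelled entity. A minimal sketch, assuming BIO-style tags (the tag names are illustrative):

```python
def leading_word_labels(bio_tags):
    """Mark a token as a leading word (1) when the token right after it begins a labelled entity."""
    labels = []
    for i in range(len(bio_tags)):
        next_tag = bio_tags[i + 1] if i + 1 < len(bio_tags) else "O"
        labels.append(1 if next_tag.startswith("B-") else 0)
    return labels

# "I go-to Beijing": the word before the place name is marked as a leading word.
print(leading_word_labels(["O", "O", "B-Location"]))  # [0, 1, 0]
```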
Further, the loss function of the natural language processing model is obtained by weighting and summing the loss function of the first model and the loss function of the second model according to preset weights; that is, the larger the weight of the loss function of the second model, the greater the influence of the leading words on the recognition of named-entity word segments. For this reason, weights need to be set in advance for the loss function of the first model and the loss function of the second model respectively.
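The weighted combination described above can be written as a single training objective. In the sketch below, the weights w1 and w2 are assumed hyper-parameters, and crf_neg_log_likelihood stands for whatever CRF likelihood term the first model produces; none of these names come from the patent itself.

```python
import torch.nn.functional as F

def combined_loss(crf_neg_log_likelihood, second_logits, leading_labels, w1=1.0, w2=0.5):
    # Cross entropy between the second model's predicted distribution and the actual
    # leading-word labels derived from the annotated named entities.
    ce = F.cross_entropy(second_logits.view(-1, 2), leading_labels.view(-1))
    # Weighted sum of the two task losses; a larger w2 lets the leading-word task
    # influence named-entity recognition more strongly.
    return w1 * crf_neg_log_likelihood + w2 * ce
```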
According to the above description of the construction of the natural language processing model, its specific structure is shown in FIG. 2. The first model shows the input layer, the BiLSTM layer and the CRF layer (the fully connected layer is not shown); the second model comprises an input layer and a classification layer, whose input data come from the input layer and the BiLSTM layer of the first model. The loss functions of the first model and the second model are combined to form the loss function of the natural language processing model.
Finally, the natural language processing model is trained with labeled training samples, that is, the training samples are used to solve for the optimal solution of the loss function of the natural language processing model. In the invention, because the natural language processing model contains several independent models and the different models have different tasks to process, the natural language processing model can be trained in a multi-task learning mode, so that the loss functions of the several models are optimized simultaneously and the loss function of the natural language processing model reaches its optimal solution. Specifically, in this embodiment, stochastic gradient descent is used to optimize the loss function; since stochastic gradient descent is a common method for solving loss functions in machine learning, its principle and the process of solving for the optimal solution are not described in this embodiment.
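Put together, multi-task training with stochastic gradient descent might look like the following loop. Here first_model, second_model, combined_loss and leading_word_labels refer to the sketches above, while crf_negative_log_likelihood and train_loader are hypothetical placeholders for the CRF likelihood term and the labelled data source.

```python
import torch

params = list(first_model.parameters()) + list(second_model.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

for word_ids, entity_tags, leading_labels in train_loader:
    # Forward pass through both tasks.
    features, emissions = first_model(word_ids)
    word_vectors = first_model.embedding(word_ids)
    second_logits = second_model(word_vectors, features)

    # Combine the CRF likelihood loss and the leading-word cross entropy loss.
    crf_nll = crf_negative_log_likelihood(emissions, entity_tags)
    loss = combined_loss(crf_nll, second_logits, leading_labels)

    # One stochastic gradient descent step over the joint objective.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```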
The above describes how the natural language processing model provided by the embodiment of the invention is applied in a named entity recognition scenario. The constructed natural language processing model can effectively recognize the leading words in a sentence through the second model, which helps the first model recognize the named entities in the sentence more accurately, avoids over-fitting on ambiguous words, and improves the recognition accuracy of the model. The named entity recognition scenario may involve recognizing entities such as person names, place names and organization names, and may be applied to real-time scenes such as conversations and bullet-screen comments as well as to recognition of ordinary text.
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides a model training device based on natural language processing, and the device mainly aims to improve accuracy of natural language processing model recognition. For convenience of reading, details in the foregoing method embodiments are not described in detail again in this apparatus embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement all the contents in the foregoing method embodiments. As shown in fig. 3, the apparatus specifically includes:
a setting unit 21, configured to construct a natural language processing model, the natural language processing model comprising a first model and a second model, wherein the first model is used for identifying a named entity, and the second model is used for identifying a leading word of the named entity;
the training unit 22 is configured to train the natural language processing model constructed by the setting unit 21 in a multi-task learning manner to obtain a loss function of the natural language processing model;
and the optimizing unit 23 is configured to perform optimization according to the loss function of the natural language processing model obtained by the training unit 22 to obtain an optimized natural language processing model.
Further, the first model set by the setting unit 21 adopts a BiLSTM-CRF model, and the structure thereof includes:
an input layer to convert an input sentence into a sequence of vectors, the input sentence comprising a plurality of words;
the BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors;
the fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution probability of its output label over all labels;
and the CRF layer is used for determining the output sequence over all the labels according to the distribution probability output by the fully connected layer and a preset rule.
Further, the input layer is specifically configured to: split the input sentence into a word sequence; and obtain the vector representation of each word to obtain the vector sequence of the input sentence.
Further, the BiLSTM layer comprises a plurality of LSTM units, each LSTM unit being used for outputting a fixed-length feature vector corresponding to one word vector in the vector sequence, and the plurality of LSTM units being associated with one another according to the arrangement order of the word vectors in the vector sequence.
Further, the structure of the second model set by the setting unit 21 includes: an input layer and a classification layer;
the input layer is used for acquiring the vector representation of each word in the input sentence and the feature vectors generated by the BiLSTM layer, and splicing the vector representation of a specific word with the vector corresponding to that specific word in the feature vectors to obtain a spliced vector;
and the classification layer is used for determining, by using a fully connected neural network and according to the spliced vector, the distribution probability that the next word adjacent to the specific word is a named entity.
Further, as shown in fig. 4, the loss function of the natural language processing model obtained by the training unit 22 is obtained by:
a weight setting module 221, configured to set weights for the loss function of the first model and the loss function of the second model, respectively;
a determining module 222, configured to calculate a loss function of the natural language processing model according to the loss function of the first model and the loss function of the second model after the weight setting module 221 sets the weight.
Further, the loss function of the first model adopts a CRF likelihood function; and/or,
the loss function of the second model adopts a cross entropy loss function, the cross entropy loss function being the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity.
In addition, the embodiment of the present invention further provides a processor, where the processor is configured to run a program, where the program executes the model training method based on natural language processing provided in any one of the above embodiments when running.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose preferred embodiments of the invention.
In addition, the memory may include volatile memory in a computer readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A method of model training based on natural language processing, the method comprising:
the method comprises the steps of constructing a natural language processing model, wherein the natural language processing model comprises a first model and a second model, the first model is used for identifying a named entity, and the second model is used for identifying a leading word of the named entity;
training the natural language processing model by adopting a multi-task learning mode to obtain a loss function of the natural language processing model;
and optimizing according to the loss function of the natural language processing model to obtain the optimized natural language processing model.
2. The method of claim 1, wherein the first model is a BiLSTM-CRF model, and the structure thereof comprises:
an input layer to convert an input sentence into a sequence of vectors, the input sentence comprising a plurality of words;
the BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors;
the fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution probability of its output label over all labels;
and the CRF layer is used for determining the output sequence over all the labels according to the distribution probability output by the fully connected layer and a preset rule.
3. The method of claim 2, wherein converting the input sentence into a sequence of vectors comprises:
splitting the input sentence into a word sequence;
and obtaining the vector representation of each word to obtain the vector sequence of the input sentence.
4. The method of claim 2, wherein the BiLSTM layer comprises a plurality of LSTM units, each LSTM unit being configured to output a fixed-length feature vector corresponding to one word vector in the vector sequence, and the plurality of LSTM units being associated with one another according to the arrangement order of the word vectors in the vector sequence.
5. The method according to any of claims 2-4, wherein the structure of the second model comprises: an input layer and a classification layer;
the input layer is used for acquiring the vector representation of each word in the input sentence and the feature vectors generated by the BiLSTM layer, and splicing the vector representation of a specific word with the vector corresponding to that specific word in the feature vectors to obtain a spliced vector;
and the classification layer is used for determining, by using a fully connected neural network and according to the spliced vector, the distribution probability that the next word adjacent to the specific word is a named entity.
6. The method of claim 1, wherein the loss function of the natural language processing model is obtained by:
setting weights for the loss function of the first model and the loss function of the second model respectively;
and calculating the loss function of the natural language processing model according to the loss function of the first model and the loss function of the second model after the weights are set.
7. The method of claim 6, wherein the loss function of the first model adopts a CRF likelihood function; and/or,
the loss function of the second model adopts a cross entropy loss function, the cross entropy loss function being the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity.
8. A model training apparatus based on natural language processing, the apparatus comprising:
a setting unit, configured to construct a natural language processing model which comprises a first model and a second model, wherein the first model is used for identifying a named entity, and the second model is used for identifying a leading word of the named entity;
the training unit is used for training the natural language processing model constructed by the setting unit in a multi-task learning mode to obtain a loss function of the natural language processing model;
and the optimization unit is used for optimizing according to the loss function of the natural language processing model obtained by the training unit to obtain the optimized natural language processing model.
9. The apparatus of claim 8, wherein the first model is a BiLSTM-CRF model, the structure of which comprises:
an input layer to convert an input sentence into a sequence of vectors, the input sentence comprising a plurality of words;
the BiLSTM layer is used for combining the vector sequence converted by the input layer with context information to generate corresponding feature vectors;
the fully connected layer is used for receiving the feature vectors generated by the BiLSTM layer and calculating, for each word, the distribution probability of its output label over all labels;
and the CRF layer is used for determining the output sequence over all the labels according to the distribution probability output by the fully connected layer and a preset rule.
10. The apparatus of claim 9, wherein the structure of the second model comprises: an input layer and a classification layer;
the input layer is used for acquiring the vector representation of each word in the input sentence and the feature vectors generated by the BiLSTM layer, and splicing the vector representation of a specific word with the vector corresponding to that specific word in the feature vectors to obtain a spliced vector;
and the classification layer is used for determining, by using a fully connected neural network and according to the spliced vector, the distribution probability that the next word adjacent to the specific word is a named entity.
11. The apparatus of claim 8, wherein the loss function of the natural language processing model is obtained by:
setting weights for the loss function of the first model and the loss function of the second model respectively;
and calculating the loss function of the natural language processing model according to the loss function of the first model and the loss function of the second model after the weights are set.
12. The apparatus of claim 11, wherein the loss function of the first model adopts a CRF likelihood function; and/or,
the loss function of the second model adopts a cross entropy loss function, the cross entropy loss function being the cross entropy between the predicted distribution probability and the actual distribution probability of the second model for the named entity.
13. A processor, configured to run a program, wherein the program is configured to execute the natural language processing based model training method according to any one of claims 1 to 7 when the program is run.
CN202010293248.7A 2020-04-15 2020-04-15 Model training method and device based on natural language processing Pending CN113536790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010293248.7A CN113536790A (en) 2020-04-15 2020-04-15 Model training method and device based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010293248.7A CN113536790A (en) 2020-04-15 2020-04-15 Model training method and device based on natural language processing

Publications (1)

Publication Number Publication Date
CN113536790A true CN113536790A (en) 2021-10-22

Family

ID=78088121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010293248.7A Pending CN113536790A (en) 2020-04-15 2020-04-15 Model training method and device based on natural language processing

Country Status (1)

Country Link
CN (1) CN113536790A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640810A (en) * 2022-12-26 2023-01-24 国网湖北省电力有限公司信息通信公司 Method, system and storage medium for identifying communication sensitive information of power system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150088511A1 (en) * 2013-09-24 2015-03-26 Verizon Patent And Licensing Inc. Named-entity based speech recognition
WO2017097166A1 (en) * 2015-12-11 2017-06-15 北京国双科技有限公司 Domain named entity recognition method and apparatus
CN109062901A (en) * 2018-08-14 2018-12-21 第四范式(北京)技术有限公司 Neural network training method and device and name entity recognition method and device
CN109902307A (en) * 2019-03-15 2019-06-18 北京金山数字娱乐科技有限公司 Name the training method and device of entity recognition method, Named Entity Extraction Model
CN109992773A (en) * 2019-03-20 2019-07-09 华南理工大学 Term vector training method, system, equipment and medium based on multi-task learning
US20190244604A1 (en) * 2016-09-16 2019-08-08 Nippon Telegraph And Telephone Corporation Model learning device, method therefor, and program
CN110705294A (en) * 2019-09-11 2020-01-17 苏宁云计算有限公司 Named entity recognition model training method, named entity recognition method and device
CN110807325A (en) * 2019-10-18 2020-02-18 腾讯科技(深圳)有限公司 Predicate identification method and device and storage medium
CN110852108A (en) * 2019-11-11 2020-02-28 中山大学 Joint training method, apparatus and medium for entity recognition and entity disambiguation
CN110852103A (en) * 2019-10-28 2020-02-28 青岛聚好联科技有限公司 Named entity identification method and device
WO2020043123A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Named-entity recognition method, named-entity recognition apparatus and device, and medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150088511A1 (en) * 2013-09-24 2015-03-26 Verizon Patent And Licensing Inc. Named-entity based speech recognition
WO2017097166A1 (en) * 2015-12-11 2017-06-15 北京国双科技有限公司 Domain named entity recognition method and apparatus
US20190244604A1 (en) * 2016-09-16 2019-08-08 Nippon Telegraph And Telephone Corporation Model learning device, method therefor, and program
CN109062901A (en) * 2018-08-14 2018-12-21 第四范式(北京)技术有限公司 Neural network training method and device and name entity recognition method and device
WO2020043123A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Named-entity recognition method, named-entity recognition apparatus and device, and medium
CN109902307A (en) * 2019-03-15 2019-06-18 北京金山数字娱乐科技有限公司 Name the training method and device of entity recognition method, Named Entity Extraction Model
CN109992773A (en) * 2019-03-20 2019-07-09 华南理工大学 Term vector training method, system, equipment and medium based on multi-task learning
CN110705294A (en) * 2019-09-11 2020-01-17 苏宁云计算有限公司 Named entity recognition model training method, named entity recognition method and device
CN110807325A (en) * 2019-10-18 2020-02-18 腾讯科技(深圳)有限公司 Predicate identification method and device and storage medium
CN110852103A (en) * 2019-10-28 2020-02-28 青岛聚好联科技有限公司 Named entity identification method and device
CN110852108A (en) * 2019-11-11 2020-02-28 中山大学 Joint training method, apparatus and medium for entity recognition and entity disambiguation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI ZHAO;NA SHI;ZHEN SA;HUA-XING WANG;CHUN-HUA LU;XIAO-YING XU;: "Text Mining and Analysis of Treatise on Febrile Diseases Based on Natural Language Processing", WORLD JOURNAL OF TRADITIONAL CHINESE MEDICINE, no. 01, 13 March 2020 (2020-03-13) *
祖木然提古丽・库尔班; 艾山・吾买尔: "Comparative Analysis of Chinese Named Entity Recognition Models" (中文命名实体识别模型对比分析), Modern Computer (现代计算机), no. 14, 15 May 2019 (2019-05-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640810A (en) * 2022-12-26 2023-01-24 国网湖北省电力有限公司信息通信公司 Method, system and storage medium for identifying communication sensitive information of power system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination