CN115510869A - End-to-end Tibetan La lattice shallow semantic analysis method - Google Patents
- Publication number: CN115510869A (application CN202210602138.3A)
- Authority: CN (China)
- Prior art keywords: semantic, lattice, LSTM, Tibetan, layer
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, and in particular to an end-to-end Tibetan La lattice shallow semantic analysis method comprising the following steps: 1. map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors; 2. place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate unobstructed between different layers; 3. use softmax to compute the locally normalized distribution of the semantic label at each time step, for constrained decoding by the output layer; 4. during Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels. The method performs Tibetan La lattice shallow semantic analysis effectively.
Description
Technical Field
The invention relates to the technical field of Tibetan La lattice shallow semantic analysis, in particular to an end-to-end Tibetan La lattice shallow semantic analysis method.
Background
The aim of Tibetan La lattice shallow semantic analysis is to find the predicate of a given sentence, determine the main semantic components related to it with the predicate as the core, and mark them with the corresponding semantic labels, thereby obtaining the sentence-pattern type of the given sentence and the semantic roles played by its semantic components.
The La lattice is a key and difficult point in the classical Tibetan grammar treatises (the "Thirty Verses" and the treatise on phonology), and it is also a main research topic among the eight cases. For natural language processing, a Tibetan La lattice shallow semantic analysis technique based on machine learning can support many downstream Tibetan natural language understanding tasks, such as semantic role labeling, semantic parsing, information extraction, automatic question answering, reading comprehension, and machine translation. In addition, the La lattice is essential knowledge in Tibetan textbooks: only by mastering its concepts and usage can the types of Tibetan La lattice sentences be accurately distinguished, the main semantic components of each sentence be found, and the actual meaning of each sentence be further analyzed. Therefore, a Tibetan La lattice shallow semantic analysis technique based on machine learning also has practical application value for learning the La lattice.
The traditional shallow semantic analysis task is closely tied to syntactic analysis and depends heavily on syntactic analysis results, which increases the complexity of shallow semantic analysis. In recent years, as deep learning technology has matured, end-to-end models requiring no syntactic input have achieved good results on shallow semantic analysis tasks. Prior work on end-to-end LSTM shallow semantic analysis obtained results superior to traditional methods that introduce syntactic information, demonstrating the latent ability of LSTMs to capture the underlying syntactic structure of a sentence and providing a theoretical and reference basis for further research and improvement. At present, no Tibetan shallow semantic analysis method based on an end-to-end model has been found, and no literature on La lattice shallow semantic analysis has been reported.
Disclosure of Invention
The invention provides an end-to-end Tibetan La lattice shallow semantic analysis method that can overcome some of the defects of the prior art.
The end-to-end Tibetan La lattice shallow semantic analysis method disclosed by the invention comprises the following steps:
1. Map the input feature sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
Preferably, in step one, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}. The raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
Preferably, in step two, the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction; this lays the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. Each LSTM layer is defined with a direction parameter δ_l: the layer runs forward when δ_l = 1 and backward when δ_l = -1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
Preferably, in step two, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction. Denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
preferably, in step two, to reduce overfitting, a rate of conjugate Dropout is used, with a shared Dropout mask D l Application to hidden states:
inputting a characteristic sequence x = { w) of La lattice sentences 1 ,w 2 ,…,w n H, the corresponding correct semantic tag sequence y = { y = 1 ,y 2 ,…,y n The log-likelihood of is:
preferably, in step three, the hidden state h according to the model l,t Output semantic tag y can be computed using softmax t Local normalized distribution of (c):
w in the above formula o Is a parameter matrix for softmax,is Kronecker delta, and the dimension is consistent with the number of semantic labels; the model training objective is givenThe probability of the correct label is maximized when entered.
The invention first draws on the design idea of the LSTM and balances the transfer of information in the vertical direction by placing a gated highway connection mechanism (GM) in the vertical direction of the LSTM. With the GM, information can propagate more smoothly along both the spatial and temporal dimensions with only minor information loss. Most importantly, the GM contains a "gating" function that dynamically selects or ignores the vertically propagated information, so that abstract representations at different levels can be passed to the output layer more easily. Softmax is then used to compute the locally normalized distribution of the semantic labels at each time step for constrained decoding by the output layer. Finally, during Viterbi decoding, the BIO and La lattice shallow semantic labeling constraints defined herein are enforced to regularize the structural relationships among the output semantic labels, thereby improving the accuracy of the final predicted semantic labels.
Drawings
FIG. 1 is a schematic diagram of a shallow semantic analysis model architecture of La lattice in an embodiment;
FIG. 2 is a diagram illustrating the difference between the GM-based LSTM and the ordinary LSTM in the embodiment;
FIG. 3 is a graph showing the effect of GM in the examples on experimental performance;
FIG. 4 is a diagram illustrating the effect of constrained decoding on experimental performance in an example embodiment;
FIG. 5 is a diagram illustrating an influence of a timing feature learning manner on experimental performance in an embodiment.
Detailed Description
For a further understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not limiting.
Examples
The embodiment provides an end-to-end Tibetan La lattice shallow semantic analysis method, which comprises the following steps:
1. Map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate smoothly between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
Semantic role labeling (SRL) is a shallow semantic representation with the advantages of being simple and easy to use, applicable across many languages, and supported by comparatively deep research into models and algorithms, and the goal of Tibetan La lattice shallow semantic analysis is similar to that of the semantic role labeling task. The model architecture shown in FIG. 1 is therefore designed by drawing on representative SRL research, namely work based on deep BiLSTM and self-attention frameworks, and mainly comprises the following parts:
(1) Embedding layer: maps the input feature sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
(2) LSTM layer: to improve the model's ability to learn temporal features and to strengthen its semantic-space representation capacity, a BiLSTM is used to learn the temporal features and contextual semantic information of the input sentence;
(3) Gated highway connection: to alleviate the vanishing-gradient problem when training the BiLSTM, this embodiment uses the GM, which controls the weights of the linear and nonlinear transformations between layers;
(4) Softmax layer: maps the semantic labels that may be output at each time step into a locally normalized distribution over (0, 1) using the softmax function, for constrained decoding by the output layer;
(5) Constrained decoding layer: to constrain the structural relationships between the output semantic labels during decoding, this embodiment enforces its BIO and La lattice shallow semantic labeling constraints when decoding with the Viterbi algorithm.
Embedding layer
In the Tibetan La lattice shallow semantic analysis task, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}. The raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C. The vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
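To make the lookup-and-concatenate step concrete, a minimal sketch follows; the vocabulary size, embedding dimensions, and the helper name embed are invented for illustration, and the GloVe table is replaced by random values.

```python
import numpy as np

# Hypothetical sizes: |V| words, GloVe dimension, and a binary predicate mark {0, 1}.
V_SIZE, D_WORD, D_MARK = 10000, 100, 2

rng = np.random.default_rng(0)
glove = rng.normal(size=(V_SIZE, D_WORD))      # stand-in for the pretrained GloVe table
mark_table = rng.normal(size=(2, D_MARK))      # embeddings for the mark m_t in {0, 1}

def embed(word_ids, mark_ids):
    """Map word ids w_t and marks m_t to x_{1,t} = [e(w_t); e(m_t)]."""
    e_w = glove[word_ids]                       # (T, D_WORD)
    e_m = mark_table[mark_ids]                  # (T, D_MARK)
    return np.concatenate([e_w, e_m], axis=-1)  # (T, D_WORD + D_MARK)

x1 = embed(np.array([12, 7, 401]), np.array([0, 1, 0]))
print(x1.shape)  # (3, 102): input to the first LSTM layer
```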
Bidirectional LSTM
This embodiment converts Tibetan La lattice shallow semantic analysis into an end-to-end sequence labeling task. Sequence labeling depends heavily on the ability to learn temporal features and textual context information, so the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction; this lays the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step. Each LSTM layer is defined with a direction parameter δ_l: the layer runs forward when δ_l = 1 and backward when δ_l = -1.
To stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input. The input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
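One way to realize the interleaved stacking described above, with the direction alternating from layer to layer and each layer consuming the previous layer's output, is sketched below in PyTorch. The class name, layer sizes, and the convention that even-indexed layers run forward are assumptions for illustration; the sketch does not reproduce the patent's exact cell equations.

```python
import torch
import torch.nn as nn

class InterleavedLSTM(nn.Module):
    """Stack of unidirectional LSTMs whose direction alternates layer by layer."""
    def __init__(self, input_size, hidden_size, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(input_size if l == 0 else hidden_size, hidden_size, batch_first=True)
            for l in range(num_layers)
        )

    def forward(self, x):                      # x: (batch, T, input_size)
        h = x
        for l, lstm in enumerate(self.layers):
            delta = 1 if l % 2 == 0 else -1    # direction parameter delta_l (assumed convention)
            if delta == -1:                    # reverse the sequence for backward layers
                h = torch.flip(h, dims=[1])
            h, _ = lstm(h)
            if delta == -1:                    # restore original time order
                h = torch.flip(h, dims=[1])
        return h                               # (batch, T, hidden_size)

model = InterleavedLSTM(input_size=102, hidden_size=64)
print(model(torch.randn(2, 5, 102)).shape)     # torch.Size([2, 5, 64])
```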
GM based on LSTM
In order to alleviate the vanishing-gradient problem when training the BiLSTM, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction. Denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
for clarity, a difference plot of GM based LSTM and normal LSTM is given, see fig. 2 for details.
In FIG. 2, h_{l-1,t} denotes the output of the previous layer, which is also the input of the current layer, and h'_{l,t} denotes the candidate output, i.e. the output of the LSTM. The GM makes a linear connection between h_{l-1,t} and h'_{l,t}, which greatly aids the high-speed transfer of information in the vertical direction, and λ_{l,t} decides how much information from the lower layer is passed directly up to the current layer. During training, the closer λ_{l,t} is to 1, the more information is passed upward unchanged; when λ_{l,t} = 1, the input is copied directly to the output without any change, so the GM mechanism lets bottom-layer information flow more smoothly to the top layer. Conversely, the closer λ_{l,t} is to 0, the less information is passed upward directly; when λ_{l,t} = 0, the GM degrades to the traditional LSTM. Since the GM operates inside the neuron, the transfer of lower-layer information along the temporal direction is not affected.
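A minimal sketch of the gating behaviour described above is given below: λ_{l,t} mixes the layer input h_{l-1,t} with the candidate LSTM output h'_{l,t}, so that λ close to 1 copies the input upward and λ close to 0 reduces to the ordinary LSTM. The mixing rule and the way λ is computed from the layer input are assumptions consistent with this description, not the patent's exact equations.

```python
import torch
import torch.nn as nn

class GatedHighwayLSTMLayer(nn.Module):
    """One LSTM layer with a gated highway (GM-style) connection in the vertical direction."""
    def __init__(self, size):
        super().__init__()
        self.lstm = nn.LSTM(size, size, batch_first=True)
        self.gate = nn.Linear(size, size)       # produces lambda_{l,t} from the layer input

    def forward(self, h_prev_layer):             # h_prev_layer: (batch, T, size) = h_{l-1,t}
        h_candidate, _ = self.lstm(h_prev_layer)  # h'_{l,t}, the ordinary LSTM output
        lam = torch.sigmoid(self.gate(h_prev_layer))  # lambda_{l,t} in (0, 1)
        # lambda -> 1: copy the lower layer straight up; lambda -> 0: plain LSTM behaviour.
        return lam * h_prev_layer + (1.0 - lam) * h_candidate

layer = GatedHighwayLSTMLayer(size=64)
print(layer(torch.randn(2, 5, 64)).shape)         # torch.Size([2, 5, 64])
```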
To reduce overfitting, dropout is applied with a shared dropout mask D_l on the hidden states.
Given the input feature sequence x = {w_1, w_2, …, w_n} of a La lattice sentence and the corresponding correct semantic label sequence y = {y_1, y_2, …, y_n}, the log-likelihood is
log p(y | x) = Σ_{t=1}^{n} log p(y_t | x).
softmax layer
From the model's hidden state h_{l,t}, the locally normalized distribution of the output semantic label y_t can be computed using softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}
where W_o is the softmax parameter matrix and δ_{y_t} is a Kronecker delta (one-hot) vector whose dimension equals the number of semantic labels; the goal of model training is to maximize the probability of the correct labels given the input.
Constraint decoding layer
In order to incorporate constraints on the output structure during decoding, this embodiment defines BIO and La lattice shallow semantic labeling constraints based on the BIO sequence labeling scheme and the La lattice shallow semantic label specification. Both constraints are then enforced when decoding with the Viterbi algorithm; they are exemplified as follows, and a small sketch of such constrained decoding is given after Table 1:
(1) BIO constraint
BIO is a commonly used sequence labeling scheme in NLP: B marks the beginning of a labeled segment, I marks the middle or end of a labeled segment, and O marks everything else. The constraint rejects any sequence that does not produce valid BIO transitions, e.g. B-A0 immediately followed by I-A1.
(2) La lattice shallow semantic annotation constraint
Unique semantic tags: for each La lattice sentence pattern, the semantic tags A0, A1, A2 and the pattern-specific tags in Table 1 may each appear at most once;
Restricted semantic tags: any cross-appearance of the pattern-specific tags in Table 1 across different sentence patterns is rejected, e.g. AM-Bas followed by AM-L_2;
Sequential semantic tags: any pattern-specific tag AM-L_i in Table 1 that appears out of order before another pattern-specific tag is rejected, e.g. AM-L_1 followed by AM-Bas;
Continuation semantic tags: a continuation tag may appear only if its corresponding base tag appears before it, e.g. I-A0 may only follow B-A0.
TABLE 1 Definition of pattern-specific and common semantic tags
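A small sketch of constrained Viterbi decoding of the kind described above follows. Only the generic BIO transition constraint is implemented (an I-X tag may only continue B-X or I-X and may not start a sentence); the La lattice specific constraints of Table 1 are not reproduced here, and the tag inventory and scores are invented for illustration.

```python
import numpy as np

def allowed(prev_tag, tag):
    """BIO constraint: I-X may only follow B-X or I-X with the same role X."""
    if tag.startswith("I-"):
        return prev_tag in ("B-" + tag[2:], tag)
    return True                                    # B-* and O transitions are unrestricted here

def constrained_viterbi(log_probs, tags):
    """Viterbi decoding over per-step log-probabilities with hard transition constraints."""
    T, n = log_probs.shape
    start_ok = np.array([not t.startswith("I-") for t in tags])
    score = np.where(start_ok, log_probs[0], -np.inf)  # I-X may not start the sequence
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        new_score = np.full(n, -np.inf)
        for j, tag in enumerate(tags):
            for i, prev in enumerate(tags):
                if allowed(prev, tag) and score[i] + log_probs[t, j] > new_score[j]:
                    new_score[j] = score[i] + log_probs[t, j]
                    back[t, j] = i
        score = new_score
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[j] for j in reversed(path)]

tags = ["O", "B-A0", "I-A0", "B-A1", "I-A1"]
rng = np.random.default_rng(2)
log_probs = np.log(rng.dirichlet(np.ones(len(tags)), size=6))
print(constrained_viterbi(log_probs, tags))        # never emits I-X without a preceding B-X/I-X
```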
Because the La lattice particles can in principle be substituted for one another in the five usages of the La lattice (the objective, purpose, locative, identity, and time cases), they are collectively called La lattice particles. The La lattice includes two kinds of particles, free and non-free: the attachment of a non-free particle is restricted by the suffix letter of the preceding syllable, whereas a free particle may be attached regardless of the preceding syllable.
According to the semantic functions, attachment rules, and different usages of the La lattice particles, La lattice sentences can be divided into five sentence patterns: the objective-case, purpose-case, locative-case, identity-case, and time-case patterns:
business grid sentence: sentences that represent actions that have been or are being performed and actions to be performed at a certain place of implementation [9] . One characteristic of a sentence is that there is an implementation place in the sentenceTo perform ground movementAnd La lattice particleThree main semantic components, one of which is absent. Such as " (in library learning)', the place where the action is performed is "(bookstore) ", the action of implementation is"(learning)' and La lattice syllabary areIf the machine can correctly recognize and distinguish these semantic components, the semantics can be basically correctly understood.
Purpose-case sentence: a sentence expressing an action performed for a certain purpose. One characteristic of this pattern is that the sentence contains three main semantic components, namely the purpose, the action performed to achieve that purpose, and the La lattice particle, none of which may be missing. For example, in "(striving to gain knowledge)", the purpose is "(to gain knowledge)", the action performed to achieve it is "(striving)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly.
Locative-case sentence: a sentence expressing that something exists at, or depends on, a certain place to which a La lattice particle is attached. This pattern has two main characteristics. First, the sentence contains three main semantic components: the place to which the La lattice particle is attached, the dependent object, and the La lattice particle. For example, in "(there are many students in the classroom)", the place is "(classroom)", the dependent object is "(students)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, when the predicate of this pattern is an existential auxiliary word, the dependent-object component of the first characteristic may be absent; in that case the place is "(classroom)", the existential auxiliary is "(there is)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Identity-case sentence: a sentence expressing that something changes into something else, so that the two share the property of "identity" as the result of the change brought about by the action. This pattern has two main characteristics. First, the sentence contains three main semantic components: the thing that changes, the thing it changes into, and the La lattice particle. For example, in "(translating Chinese into Tibetan)", the thing that changes is "(Chinese)", the thing it changes into is "(Tibetan)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, the sentence may instead contain the result of the change brought about by the action, the action itself, and the La lattice particle as its three main semantic components; for example, in "(making him happy)", the result of the change is "(happy)", the action is "(making)", and the La lattice particle links them. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Time-case sentence: a sentence expressing the time at which a certain action is performed. This pattern has two main characteristics. First, the sentence contains three main semantic components: the time at which the action is performed, the action performed, and the La lattice particle. For example, in "(studied for three years)", the time at which the action is performed is "(three years)", the action performed is "(studying)", and the La lattice particle links them; if the machine can correctly recognize and distinguish these semantic components, the semantics can basically be understood correctly. Second, the sentence may contain only two semantic components, the time at which the action is performed and the La lattice particle, with the performed action itself absent; this phenomenon is fairly common, the time again being "(three years)" with the La lattice particle attached to it. If the machine can correctly recognize and distinguish these semantic components, it can likewise basically understand the semantics correctly.
Experiment of
Since there is no publicly available Tibetan La lattice shallow semantic analysis dataset, this embodiment first extracts 20,000 sentences containing exactly one La lattice particle from the Tibetan corpus built by the laboratory's research group. After preprocessing, 12,000 sentences are selected for La lattice shallow semantic annotation. Finally, following the La lattice shallow semantic annotation specification formulated in this embodiment, the Tibetan La lattice shallow semantic analysis dataset is constructed by manual annotation; for convenience it is referred to as TLSD below. In the experiments, the TLSD dataset is divided into a training set, a validation set, and a test set, with the training set taking the largest share (a ratio of 8).
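A simple way to produce such a split is sketched below; the 8:1:1 proportions and the random seed are assumptions for illustration, since the source gives only the training share of the ratio.

```python
import random

def split_dataset(sentences, train=0.8, dev=0.1, seed=42):
    """Shuffle and split annotated sentences into train/validation/test subsets."""
    data = list(sentences)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_dev = int(len(data) * dev)
    return data[:n_train], data[n_train:n_train + n_dev], data[n_train + n_dev:]

train_set, dev_set, test_set = split_dataset(range(12000))
print(len(train_set), len(dev_set), len(test_set))  # 9600 1200 1200
```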
During the experiments, to ensure the comparability of experimental results, the tuning ranges of the hyperparameters of all models are restricted, and the current optimal hyperparameter combination is selected within the restricted range after multiple rounds of tuning; the model parameters are detailed in Table 2.
TABLE 2 model parameters
Baseline method
Since no literature on Tibetan La lattice shallow semantic analysis is currently available for reference, and no Tibetan La lattice shallow semantic analysis dataset has been published, the effectiveness of the model of this embodiment cannot be verified by direct comparison with previous work.
For this reason, several methods from the references used in building the model of this embodiment are selected as baseline models to verify the effectiveness of the model of this embodiment.
(1) LSTM+CRF: a semantic role labeling method based on deep bidirectional RNNs.
(2) DBLSTM: a semantic role labeling method based on deep bidirectional LSTMs.
(3) Self-Attention: a semantic role labeling method based on the self-attention mechanism.
(4) End-to-End: a span-based end-to-end semantic role labeling method.
(5) BiLSTM+GM+CD and BiLSTM+GM+CD(V) are the models of this embodiment, referring respectively to the Tibetan La lattice shallow semantic analysis model when the model must predict the predicates in the sentence itself and when the predicates are given in advance.
Evaluation index
In this embodiment, accuracy (ACC), a commonly used evaluation metric for sequence labeling tasks, is selected to evaluate model performance. Let TP denote positive samples predicted as positive, FP negative samples predicted as positive, FN positive samples predicted as negative, and TN negative samples predicted as negative; the accuracy (ACC) is calculated as:
ACC = (TP + TN) / (TP + FP + FN + TN)
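For sequence labeling this reduces to the fraction of positions whose predicted label matches the gold label; a small helper illustrating that token-level computation follows, with the label names invented for the example.

```python
def accuracy(gold_tags, pred_tags):
    """Token-level accuracy: fraction of positions whose predicted label equals the gold label."""
    assert len(gold_tags) == len(pred_tags)
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / len(gold_tags)

print(accuracy(["B-A0", "I-A0", "O", "B-A1"], ["B-A0", "I-A0", "O", "O"]))  # 0.75
```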
results and analysis of the experiments
Comparison of model Performance on TLSD datasets
In order to verify the effectiveness and superiority of the model of the embodiment, the Tibetan language La lattice shallow semantic analysis performance of the baseline model and the model of the embodiment is compared, and the experimental comparison result of each model is shown in Table 3.
TABLE 3 experimental comparison of the models
As can be seen from the experimental results in Table 3, the performance of the model of this embodiment is improved over the several baseline models. When the model predicts the predicates itself, its La lattice shallow semantic analysis accuracy on the test set is 3.33, 3.01, 1.58, and 1.87 percentage points higher than that of LSTM+CRF, DBLSTM, Self-Attention, and End-to-End respectively, which shows that the Tibetan La lattice shallow semantic analysis performance of the model of this embodiment is better. In addition, the model of this embodiment also performs well when jointly predicting the predicates and the other corresponding semantic labels: its La lattice shallow semantic analysis accuracy on the test set is only 0.93 percentage points lower than when the predicates are given in advance, which verifies the superiority of the model of this embodiment.
The Tibetan La lattice shallow semantic analysis task yields good results for both the baseline models and the model of this embodiment, for three main reasons: first, all sentences in the TLSD dataset are single Tibetan sentences containing exactly one La lattice particle; second, the sentences in the TLSD dataset are not long, with lengths between 4 and 30 words; third, compared with the semantic role labeling task, La lattice shallow semantic labels are easier to identify and annotate. The reasons why the model of this embodiment performs better than the baseline models are twofold: first, the GM balances the transfer of information in the vertical direction and further alleviates the vanishing-gradient problem; second, constraining the output structure during decoding yields a more reasonable output semantic label structure.
Validation of GM
To verify the effectiveness of the GM, the La lattice shallow semantic analysis performance of the model is examined when using the GM-based BiLSTM and when using the ordinary BiLSTM. The first configuration uses the ordinary BiLSTM and the second uses the GM-based BiLSTM; the experimental results on the test set are shown in FIG. 3.
As can be seen from FIG. 3, the accuracy with the GM-based BiLSTM is 1.06% higher than with the ordinary BiLSTM, confirming the effectiveness of the GM.
Validity verification of constrained decoding method
To verify the effectiveness of the constrained decoding method, the La lattice shallow semantic analysis performance of the model is examined with and without constrained decoding. The first configuration does not use constrained decoding and the second does; the experimental results on the test set are shown in FIG. 4.
As can be seen from FIG. 4, the La lattice shallow semantic analysis accuracy of the model is 0.76% higher with constrained decoding than without, verifying the effectiveness of the constrained decoding method.
Impact of the temporal-feature learning method on model performance
To examine the influence of the temporal-feature learning method on model performance, the La lattice shallow semantic analysis performance of the model is compared when using an LSTM and when using a BiLSTM. The first configuration learns temporal features with an LSTM and the second with a BiLSTM; the experimental results on the test set are shown in FIG. 5.
As can be seen from FIG. 5, the La lattice shallow semantic analysis accuracy of the model is 2.57% higher when the temporal features are learned with the BiLSTM than with the LSTM, indicating that the model performs better when the temporal features are learned with the BiLSTM.
Conclusion
To alleviate the vanishing-gradient problem and balance the transfer of information in the vertical direction, this embodiment mixes linear and nonlinear information by placing a GM in the vertical direction of the LSTM, so that information propagates more smoothly along the spatial and temporal dimensions. To regularize the structural relationships between the output semantic labels and improve the accuracy of the predicted semantic labels, the BIO and La lattice shallow semantic labeling constraints defined in this embodiment are enforced during Viterbi decoding. The experimental results show that on the test set the La lattice shallow semantic analysis accuracy of this method reaches 90.59%, outperforming several baseline models.
The invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, similar structures and embodiments designed by a person skilled in the art in light of this teaching, without inventive effort and without departing from the spirit of the invention, shall fall within the protection scope of the invention.
Claims (6)
1. An end-to-end Tibetan La lattice shallow semantic analysis method, characterized by comprising the following steps:
1. Map the input character sequence, taken word by word, and the corresponding mark sequence into low-dimensional real-valued vectors;
2. Place a gated highway connection mechanism (GM) in the vertical direction of the LSTM and use a BiLSTM to learn the temporal features and contextual semantic information of the input sentence; the GM contains linear connections to the cell's internal inputs and outputs, allowing information to propagate unobstructed between different layers;
3. Use softmax to compute the locally normalized distribution of the semantic labels at each time step, for constrained decoding by the output layer;
4. During Viterbi decoding, enforce the predefined BIO and La lattice shallow semantic labeling constraints to regularize the structural relationships among the output semantic labels.
2. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step one, the pretrained GloVe word vectors are used, the vocabulary is denoted by V, and the mark set by C ∈ {0, 1}; the raw input sequence {w_1, w_2, …, w_T} and mark sequence {m_1, m_2, …, m_T} are mapped by a lookup table into the low-dimensional real-valued vectors e(w_t) and e(m_t), where w_t ∈ V and the corresponding mark m_t ∈ C; the vectors e(w_t) and e(m_t) are then concatenated into x_{l,t} as the input of the first LSTM layer:
x_{l,t} = [e(w_t), e(m_t)]
where x_{l,t} is the input to the LSTM at time t of layer l, with l = 1 and t ∈ [1, T].
3. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step two, the first LSTM layer processes the input sentence in the forward direction, and the output of that layer is then used as the input of the next layer, which processes it in the reverse direction, laying the foundation for improving the learning of temporal features and for fully acquiring the contextual semantic information at each time step; each LSTM layer is defined with a direction parameter δ_l, the layer running forward when δ_l = 1 and backward when δ_l = -1;
to stack the LSTMs in interleaved fashion, the input x_{l,t} and the direction parameter δ_l of each layer are arranged so that the direction alternates from layer to layer and each layer takes the previous layer's output as its input; the input vector x_{1,t} is the concatenation of the word embedding of w_t and the embedding of the binary feature (t = v) marking whether w_t is the given predicate.
4. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 3, characterized in that: in step two, the linear and nonlinear transformation weights between layers are controlled by the GM placed in the vertical direction of the LSTM, which serves to balance the transfer of information in the vertical direction; denoting the gate of the GM by λ_{l,t}, the hidden-layer output h_{l,t} after applying the GM is obtained from the candidate output
h'_{l,t} = LSTM(h_{l-1,t}, h_{l,t-1}).
5. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 4, characterized in that: in step two, to reduce overfitting, dropout is applied with a shared dropout mask D_l on the hidden states; given the input feature sequence x = {w_1, w_2, …, w_n} of a La lattice sentence and the corresponding correct semantic label sequence y = {y_1, y_2, …, y_n}, the log-likelihood is
log p(y | x) = Σ_{t=1}^{n} log p(y_t | x).
6. The end-to-end Tibetan La lattice shallow semantic analysis method according to claim 1, characterized in that: in step three, from the model's hidden state h_{l,t}, the locally normalized distribution of the output semantic label y_t can be computed using softmax:
p(y_t | x) = softmax(W_o h_{l,t})^T δ_{y_t}.
Priority Application (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210602138.3A | 2022-05-30 | 2022-05-30 | End-to-end Tibetan La lattice shallow semantic analysis method

Publications (2)

Publication Number | Publication Date
---|---
CN115510869A | 2022-12-23
CN115510869B | 2023-08-01
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103440236A (en) * | 2013-09-16 | 2013-12-11 | 中央民族大学 | United labeling method for syntax of Tibet language and semantic roles |
CN109408812A (en) * | 2018-09-30 | 2019-03-01 | 北京工业大学 | A method of the sequence labelling joint based on attention mechanism extracts entity relationship |
CN111062210A (en) * | 2019-12-25 | 2020-04-24 | 贵州大学 | Neural network-based predicate center word identification method |
CN114239574A (en) * | 2021-12-20 | 2022-03-25 | 淄博矿业集团有限责任公司 | Miner violation knowledge extraction method based on entity and relationship joint learning |
Non-Patent Citations (2)
Title |
---|
LUHENG HE et al.: "Deep Semantic Role Labeling: What Works and What's Next", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 473-483 *
班玛宝: "Automatic classification model for Tibetan La lattice example sentences fusing dual-channel syllable features", Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), vol. 58, no. 1, pages 91-98 *
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant